# Overview

<!-- file:///home/strokach/documents/teaching/csc343/2018-fall/slides/SQL-DML.pdf#page=43 -->

I'm covering slides 41-90 for Sina.

- Table joins:
  - Cross join vs. natural join vs. theta join.
  - Inner join vs. full / left / right outer join.


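- Impact of having null values:
  - Be careful when columns used in `JOIN` or `WHERE` conditions contain nulls: comparisons such as `=` or `<` with a null evaluate to unknown, so those rows are silently filtered out.

- Subqueries:
  - In `FROM`.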
  - In `WHERE` (`ANY`, `ALL`, `IN`, `EXISTS`).

# Imports

In [None]:
import pandas as pd
import sqlalchemy as sa

In [None]:
%run sql_magic.ipynb

In [None]:
NOTEBOOK_NAME = "lecture_5"

# Start database

In [None]:
%run start_db.ipynb

In [None]:
engine = sa.create_engine(DB_URL, connect_args={'options': '-csearch_path=University'})

In [None]:
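# `Engine.table_names()` is deprecated in SQLAlchemy 1.4 and removed in 2.0; use the inspector instead
sa.inspect(engine).get_table_names()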

# Examples from lecture

## Avoid natural joins

In [None]:
%%sql
-- Select student id, course id, instructor name
-- for each course taken by each student
SELECT sID, oID, instructor
FROM Student NATURAL JOIN Took NATURAL JOIN Offering
LIMIT 5;

In [None]:
%%sql
select * from offering limit 2;

In [None]:
%%sql
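-- Student also has a column named campus, so after this change the NATURAL JOIN
-- above also requires Student.campus = Offering.campus (and, since the new
-- column is null, it returns no rows at all)
alter table offering
add column campus varchar(255) default null;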

In [None]:
%%sql
alter table offering drop column campus;
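The effect of the `campus` column added above can be reproduced without the course database. This is a minimal sketch that swaps in Python's built-in `sqlite3` for PostgreSQL and uses hypothetical, trimmed-down `Student`, `Took`, and `Offering` tables (the rows are made up). After the new column appears, `NATURAL JOIN` matches on it too and loses every row, while the join with explicit `ON` conditions is unaffected.

```python
import sqlite3

# Hypothetical, trimmed-down copies of the lecture tables (SQLite, not PostgreSQL)
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Student (sID INTEGER, surName TEXT, campus TEXT);
    CREATE TABLE Took (sID INTEGER, oID INTEGER, grade INTEGER);
    CREATE TABLE Offering (oID INTEGER, dept TEXT, instructor TEXT);
    INSERT INTO Student VALUES (1, 'Lee', 'StG'), (2, 'Singh', 'UTM');
    INSERT INTO Took VALUES (1, 10, 85), (2, 10, 72);
    INSERT INTO Offering VALUES (10, 'CSC', 'Horton');
""")

NATURAL = "SELECT sID, oID, instructor FROM Student NATURAL JOIN Took NATURAL JOIN Offering"
EXPLICIT = """
    SELECT s.sID, t.oID, o.instructor
    FROM Student s JOIN Took t ON s.sID = t.sID
                   JOIN Offering o ON t.oID = o.oID
"""

before = conn.execute(NATURAL).fetchall()

# Add a column whose name collides with Student.campus
conn.execute("ALTER TABLE Offering ADD COLUMN campus TEXT DEFAULT NULL")

after = conn.execute(NATURAL).fetchall()     # now also requires campus = campus
explicit = conn.execute(EXPLICIT).fetchall()  # only joins on the columns we named

print(len(before), len(after), len(explicit))  # 2 0 2
```

Naming the join columns explicitly (with `ON` or `USING`) keeps a query stable when unrelated columns are added to a table later.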

## Dangling tuples
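A dangling tuple is a row that finds no partner in a join: an inner join silently drops it, while an outer join keeps it and pads the missing columns with nulls. A minimal sketch using Python's `sqlite3` in place of PostgreSQL, on hypothetical rows in which student 3 never took a course:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical data: student 3 has no rows in Took, so it dangles
    CREATE TABLE Student (sID INTEGER, surName TEXT);
    CREATE TABLE Took (sID INTEGER, oID INTEGER, grade INTEGER);
    INSERT INTO Student VALUES (1, 'Lee'), (2, 'Singh'), (3, 'Ortiz');
    INSERT INTO Took VALUES (1, 10, 85), (2, 10, 72), (2, 11, 90);
""")

inner = conn.execute("""
    SELECT s.sID, t.oID FROM Student s JOIN Took t ON s.sID = t.sID
    ORDER BY s.sID, t.oID
""").fetchall()

left = conn.execute("""
    SELECT s.sID, t.oID FROM Student s LEFT JOIN Took t ON s.sID = t.sID
    ORDER BY s.sID, t.oID
""").fetchall()

# The padded nulls let us find the dangling tuples themselves
dangling = conn.execute("""
    SELECT s.sID FROM Student s LEFT JOIN Took t ON s.sID = t.sID
    WHERE t.sID IS NULL
""").fetchall()

print(inner)     # [(1, 10), (2, 10), (2, 11)]
print(left)      # [(1, 10), (2, 10), (2, 11), (3, None)]
print(dangling)  # [(3,)]
```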

## Null is special

In [None]:
%%sql
select * from student;

In [None]:
%%sql
-- Copy of Student in which students 157 and 11111 have a null cgpa
drop table if exists student_2;
create table student_2 as (select * from student);
update student_2 set cgpa = null where sid = 157;
update student_2 set cgpa = null where sid = 11111;

In [None]:
%%sql
select * from student_2;

In [None]:
%%sql
-- avg skips nulls: it averages only the non-null cgpa values
select avg(cgpa) from student_2;

In [None]:
%%sql
-- DISTINCT keeps null as one of the values in the result
select distinct cgpa from student_2;

In [None]:
%%sql
select count(distinct cgpa) from student_2;
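The three cells above can be reproduced on a toy column. This sketch uses Python's `sqlite3` instead of PostgreSQL and hypothetical values (two of five cgpas are null): `avg` and `count(cgpa)` skip nulls, `count(*)` counts every row, and `SELECT DISTINCT` keeps one null while `count(DISTINCT ...)` ignores it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student_2 (sid INTEGER, cgpa REAL)")
# Hypothetical values: two of the five cgpas are null
conn.executemany(
    "INSERT INTO student_2 VALUES (?, ?)",
    [(1, 3.0), (2, None), (3, 4.0), (4, None), (5, 4.0)],
)

avg_cgpa, n_rows, n_cgpa, n_distinct = conn.execute(
    "SELECT avg(cgpa), count(*), count(cgpa), count(DISTINCT cgpa) FROM student_2"
).fetchone()
distinct_values = {value for (value,) in conn.execute("SELECT DISTINCT cgpa FROM student_2")}

print(n_rows, n_cgpa, n_distinct)  # 5 3 2
print(avg_cgpa)  # (3.0 + 4.0 + 4.0) / 3: nulls are skipped, not treated as zeros
print(distinct_values)  # contains 3.0, 4.0 and None
```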

In [None]:
%%sql
-- Set operations treat two nulls as duplicates of each other
select cgpa from student_2
union
-- intersect
-- except
select cgpa from student_2;
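Set operations are one place where SQL does treat nulls as equal to each other. A sketch with Python's `sqlite3` (standing in for PostgreSQL) on a hypothetical column with two nulls: `UNION` and `INTERSECT` each return a single null row, and `EXCEPT` removes it entirely.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student_2 (cgpa REAL)")
# Hypothetical column with one real value and two nulls
conn.executemany("INSERT INTO student_2 VALUES (?)", [(3.0,), (None,), (None,)])

results = {}
for op in ("UNION", "INTERSECT", "EXCEPT"):
    # Combine the column with itself using each set operator
    results[op] = conn.execute(
        f"SELECT cgpa FROM student_2 {op} SELECT cgpa FROM student_2"
    ).fetchall()
    n_nulls = sum(value is None for (value,) in results[op])
    print(op, len(results[op]), n_nulls)
# UNION 2 1
# INTERSECT 2 1
# EXCEPT 0 0
```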

In [None]:
%%sql
-- Rows with a null cgpa are missing: both comparisons are unknown for them
select *
from student_2
where cgpa <= 3.6 or cgpa > 3.6
-- or cgpa is null;
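The missing rows come from SQL's three-valued logic: a comparison with null is *unknown*, `OR` only turns unknown into true when the other side is true, and `WHERE` keeps a row only when its condition is exactly true. A small Python model of those rules, using `None` for unknown (an illustrative sketch, not how a database engine implements it):

```python
from typing import Optional

Truth = Optional[bool]  # True, False, or None standing for "unknown"

def sql_le(a, b) -> Truth:
    """a <= b under SQL rules: unknown if either side is null."""
    return None if a is None or b is None else a <= b

def sql_gt(a, b) -> Truth:
    """a > b under SQL rules: unknown if either side is null."""
    return None if a is None or b is None else a > b

def sql_or(p: Truth, q: Truth) -> Truth:
    """OR in three-valued logic: true wins, then unknown, then false."""
    if p is True or q is True:
        return True
    if p is None or q is None:
        return None
    return False

def where_keeps(condition: Truth) -> bool:
    """WHERE keeps a row only when the condition is exactly true."""
    return condition is True

cgpas = [3.2, 3.9, None]

kept = [c for c in cgpas if where_keeps(sql_or(sql_le(c, 3.6), sql_gt(c, 3.6)))]
# IS NULL is never unknown, so adding it back recovers the null row
kept_with_null = [
    c for c in cgpas
    if where_keeps(sql_or(sql_or(sql_le(c, 3.6), sql_gt(c, 3.6)), c is None))
]

print(kept)            # [3.2, 3.9]
print(kept_with_null)  # [3.2, 3.9, None]
```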

In [None]:
%%sql
-- Rows with a null cgpa are skipped in joins: null = null is unknown, not true
select *
from student_2 s1
join student_2 s2 on (s1.cgpa = s2.cgpa)
-- join student_2 s2 ON (s1.cgpa = s2.cgpa or (s1.cgpa is null and s2.cgpa is null))
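A sketch of the same self-join with Python's `sqlite3` (in place of PostgreSQL) on hypothetical rows: the plain equality join drops both null-cgpa rows, and the commented-out condition ("equal, or both null") brings them back. PostgreSQL also provides `IS NOT DISTINCT FROM` as a null-safe equality; the explicit `OR` form is used here because it is portable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student_2 (sid INTEGER, cgpa REAL)")
# Hypothetical rows: students 3 and 4 have a null cgpa
conn.executemany(
    "INSERT INTO student_2 VALUES (?, ?)",
    [(1, 3.5), (2, 3.8), (3, None), (4, None)],
)

plain = conn.execute("""
    SELECT s1.sid, s2.sid FROM student_2 s1
    JOIN student_2 s2 ON (s1.cgpa = s2.cgpa)
""").fetchall()

null_safe = conn.execute("""
    SELECT s1.sid, s2.sid FROM student_2 s1
    JOIN student_2 s2
      ON (s1.cgpa = s2.cgpa OR (s1.cgpa IS NULL AND s2.cgpa IS NULL))
""").fetchall()

print(sorted(plain))      # [(1, 1), (2, 2)]
print(sorted(null_safe))  # [(1, 1), (2, 2), (3, 3), (3, 4), (4, 3), (4, 4)]
```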

In [None]:
%%sql
-- Create a unique constraint on cgpa
ALTER TABLE student_2 ADD CONSTRAINT unique_cgpa UNIQUE (cgpa);

In [None]:
%%sql
-- Can't insert a tuple with a duplicate value for cgpa
insert into student_2 values (1, 'Hello', 'World', 'StG', null, 3.13);

In [None]:
%%sql
-- **Can** insert multiple tuples with cgpa = null
insert into student_2 values (1, 'Hello', 'World', 'StG', null, null);

In [None]:
%%sql
select *
from student_2;

From the PostgreSQL documentation on constraints: https://www.postgresql.org/docs/8.2/static/ddl-constraints.html#AEN2058

> In general, a unique constraint is violated when there are two or more rows in the table where the values of all of the columns included in the constraint are equal. However, two null values are not considered equal in this comparison. **That means even in the presence of a unique constraint it is possible to store duplicate rows that contain a null value in at least one of the constrained columns. This behavior conforms to the SQL standard, but we have heard that other SQL databases may not follow this rule.** So be careful when developing applications that are intended to be portable.
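SQLite follows the same rule, so the behaviour can be checked without a PostgreSQL server. A minimal sketch on a hypothetical two-column table: a second `3.13` violates the constraint, but any number of nulls are accepted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student_2 (sid INTEGER, cgpa REAL, CONSTRAINT unique_cgpa UNIQUE (cgpa))"
)
conn.execute("INSERT INTO student_2 VALUES (1, 3.13)")

try:
    conn.execute("INSERT INTO student_2 VALUES (2, 3.13)")  # duplicate non-null value
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Nulls are never equal to each other, so they never collide
conn.executemany("INSERT INTO student_2 VALUES (?, NULL)", [(3,), (4,), (5,)])
n_nulls = conn.execute("SELECT count(*) FROM student_2 WHERE cgpa IS NULL").fetchone()[0]

print(duplicate_rejected, n_nulls)  # True 3
```

PostgreSQL 15 and later can opt out of this behaviour per constraint with `UNIQUE NULLS NOT DISTINCT`.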



## Subqueries

### Worksheet, Q1

In [None]:
%%sql
SELECT sid, dept||cnum as course, grade
FROM Took,
(
    SELECT *
    FROM Offering
    WHERE instructor = 'Horton'
) Hoffering
WHERE Took.oid = Hoffering.oid;
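A subquery in `FROM` behaves like a temporary, named table. A sketch with Python's `sqlite3` (not the lecture's PostgreSQL database) on hypothetical rows, checking that Q1 gives the same answer as a plain join with the filter moved into `WHERE`:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical rows; only the columns Q1 needs
    CREATE TABLE Offering (oid INTEGER, dept TEXT, cnum INTEGER, instructor TEXT);
    CREATE TABLE Took (sid INTEGER, oid INTEGER, grade INTEGER);
    INSERT INTO Offering VALUES (1, 'CSC', 343, 'Horton'), (2, 'CSC', 148, 'Liu');
    INSERT INTO Took VALUES (100, 1, 88), (100, 2, 75), (200, 1, 64);
""")

with_subquery = conn.execute("""
    SELECT sid, dept || cnum AS course, grade
    FROM Took,
         (SELECT * FROM Offering WHERE instructor = 'Horton') Hoffering
    WHERE Took.oid = Hoffering.oid
    ORDER BY sid
""").fetchall()

flat_join = conn.execute("""
    SELECT sid, dept || cnum AS course, grade
    FROM Took JOIN Offering ON Took.oid = Offering.oid
    WHERE instructor = 'Horton'
    ORDER BY sid
""").fetchall()

print(with_subquery)  # [(100, 'CSC343', 88), (200, 'CSC343', 64)]
print(with_subquery == flat_join)  # True
```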

### Worksheet, Q2

In [None]:
%%sql
SELECT sid, surname
FROM Student
WHERE cgpa >
(
    SELECT cgpa
    FROM Student
    WHERE sid = 99999  -- 11111
);

In [None]:
%%sql
select * from student_2;

In [None]:
%%sql
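-- Be careful with nulls: student 11111 has a null cgpa in student_2, so the
-- subquery returns null, cgpa > null is unknown for every row, and no rows come back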
SELECT sid, surname
FROM Student
WHERE cgpa >
(
    SELECT cgpa
    FROM student_2
    WHERE sid = 11111
);
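The same empty result also appears when the scalar subquery finds no row at all, because an empty scalar subquery evaluates to null. A sketch with Python's `sqlite3` (standing in for PostgreSQL) on hypothetical rows, where student 11111 has a null cgpa and student 99999 does not exist:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Student (sid INTEGER, surname TEXT, cgpa REAL)")
# Hypothetical rows; 11111 has a null cgpa, 99999 is absent on purpose
conn.executemany(
    "INSERT INTO Student VALUES (?, ?, ?)",
    [(1, 'Lee', 3.9), (2, 'Singh', 3.1), (11111, 'Ortiz', None)],
)

QUERY = """
    SELECT sid, surname FROM Student
    WHERE cgpa > (SELECT cgpa FROM Student WHERE sid = ?)
"""

normal = conn.execute(QUERY, (2,)).fetchall()
null_value = conn.execute(QUERY, (11111,)).fetchall()  # subquery value is null
no_row = conn.execute(QUERY, (99999,)).fetchall()      # subquery returns no row

print(normal)      # [(1, 'Lee')]
print(null_value)  # []
print(no_row)      # []
```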

### The operator `ANY` / `ALL`

In [None]:
%%sql
-- Students whose cgpa is higher than that of every StG student
SELECT sid, surname
FROM Student
WHERE cgpa > all
(
    SELECT cgpa
    FROM Student
    WHERE campus = 'StG'
);
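`ALL` and `ANY` follow the same three-valued logic as single comparisons. `x > ALL (S)` is false as soon as one comparison is false, otherwise unknown if any comparison is unknown, and true otherwise (including when S is empty); `ANY` is the mirror image. SQLite does not support `ALL`/`ANY`, so this is a plain-Python model of those rules rather than a database run, with `None` standing for both null and unknown:

```python
from typing import Iterable, Optional

def gt(x: Optional[float], y: Optional[float]) -> Optional[bool]:
    """SQL '>': unknown (None) if either operand is null."""
    return None if x is None or y is None else x > y

def gt_all(x, values: Iterable) -> Optional[bool]:
    """Model of x > ALL (values)."""
    results = [gt(x, v) for v in values]
    if False in results:
        return False
    if None in results:
        return None
    return True  # also the answer for an empty subquery

def gt_any(x, values: Iterable) -> Optional[bool]:
    """Model of x > ANY (values)."""
    results = [gt(x, v) for v in values]
    if True in results:
        return True
    if None in results:
        return None
    return False  # also the answer for an empty subquery

print(gt_all(3.9, [3.1, 3.5]))   # True
print(gt_all(3.9, [3.1, None]))  # None: one comparison is unknown
print(gt_all(3.9, []))           # True
print(gt_any(3.2, [3.1, None]))  # True
print(gt_any(3.0, [3.1, None]))  # None
```

Because `WHERE` keeps only true rows, a single null in the subquery is enough to make `> ALL` return nothing; that cannot happen in the cell above, which queries `Student` rather than `student_2`.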

In [None]:
%%sql
select * from student;

### Worksheet, Q3

In [None]:
%%sql
-- Grades of at least 80 in courses that Lakemeyer also took (in any offering)
SELECT sid, dept||cnum AS course, grade
FROM Took NATURAL JOIN Offering
WHERE grade >= 80 AND
(cnum, dept) IN (
    SELECT cnum, dept
    FROM Took NATURAL JOIN Offering NATURAL JOIN Student
    WHERE surname = 'Lakemeyer'
);
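A row-value `IN` such as `(cnum, dept) IN (...)` can also be written as a correlated `EXISTS`. A sketch with Python's `sqlite3` (which supports row values in `IN` since version 3.15) on hypothetical, trimmed-down tables, checking that both forms of Q3 agree. Note that student 2 matches through a *different* offering of the same course, since the comparison is on `(cnum, dept)`, not `oid`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical rows; student 1 is Lakemeyer
    CREATE TABLE Student (sid INTEGER, surname TEXT);
    CREATE TABLE Offering (oid INTEGER, dept TEXT, cnum INTEGER);
    CREATE TABLE Took (sid INTEGER, oid INTEGER, grade INTEGER);
    INSERT INTO Student VALUES (1, 'Lakemeyer'), (2, 'Lee'), (3, 'Singh');
    INSERT INTO Offering VALUES (10, 'CSC', 343), (11, 'CSC', 343), (12, 'MAT', 137);
    INSERT INTO Took VALUES (1, 10, 70), (2, 11, 85), (3, 12, 95), (3, 11, 60);
""")

with_in = conn.execute("""
    SELECT sid, dept || cnum AS course, grade
    FROM Took NATURAL JOIN Offering
    WHERE grade >= 80 AND (cnum, dept) IN (
        SELECT cnum, dept
        FROM Took NATURAL JOIN Offering NATURAL JOIN Student
        WHERE surname = 'Lakemeyer'
    )
    ORDER BY sid, course
""").fetchall()

with_exists = conn.execute("""
    SELECT sid, dept || cnum AS course, grade
    FROM Took NATURAL JOIN Offering AS o1
    WHERE grade >= 80 AND EXISTS (
        SELECT *
        FROM Took NATURAL JOIN Offering AS o2 NATURAL JOIN Student
        WHERE surname = 'Lakemeyer'
          AND o2.cnum = o1.cnum AND o2.dept = o1.dept
    )
    ORDER BY sid, course
""").fetchall()

print(with_in)                 # [(2, 'CSC343', 85)]
print(with_in == with_exists)  # True
```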

### The operator `EXISTS`

In [None]:
%%sql
-- Students with at least one grade above 85
SELECT surname, cgpa
FROM Student
WHERE EXISTS (
    SELECT *
    FROM Took
    WHERE Student.sid = Took.sid and
    grade > 85
);
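`EXISTS` only asks whether the correlated subquery returns at least one row, so for each student it reads like Python's `any()` over that student's grades, and (unlike a join) a student with several qualifying grades appears only once. A sketch with Python's `sqlite3` (not PostgreSQL) on hypothetical rows, comparing the SQL result with that Python reading:

```python
import sqlite3

# Hypothetical rows; Lee has two grades above 85
students = [(1, 'Lee', 3.9), (2, 'Singh', 3.1), (3, 'Ortiz', 3.5)]
took = [(1, 10, 90), (1, 11, 95), (2, 10, 80), (3, 12, 86)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Student (sid INTEGER, surname TEXT, cgpa REAL)")
conn.execute("CREATE TABLE Took (sid INTEGER, oid INTEGER, grade INTEGER)")
conn.executemany("INSERT INTO Student VALUES (?, ?, ?)", students)
conn.executemany("INSERT INTO Took VALUES (?, ?, ?)", took)

sql_result = conn.execute("""
    SELECT surname, cgpa FROM Student
    WHERE EXISTS (
        SELECT * FROM Took
        WHERE Student.sid = Took.sid AND grade > 85
    )
    ORDER BY surname
""").fetchall()

# The same question in Python: does any Took row for this student have grade > 85?
py_result = sorted(
    (surname, cgpa)
    for sid, surname, cgpa in students
    if any(t_sid == sid and grade > 85 for t_sid, _, grade in took)
)

print(sql_result)               # [('Lee', 3.9), ('Ortiz', 3.5)]
print(sql_result == py_result)  # True
```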

### Worksheet, Q5

In [None]:
%%sql
-- Instructors who taught exactly one offering
SELECT instructor
FROM Offering Off1
WHERE NOT EXISTS (
    SELECT *
    FROM Offering
    WHERE oid <> Off1.oid
    AND instructor = Off1.instructor
);
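Q5 keeps an instructor only if no *other* offering has the same instructor. A sketch with Python's `sqlite3` (standing in for PostgreSQL) on hypothetical rows, checking it against a `GROUP BY ... HAVING count(*) = 1` formulation:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Offering (oid INTEGER, instructor TEXT)")
# Hypothetical rows: Horton taught twice, Liu and Smith once each
conn.executemany(
    "INSERT INTO Offering VALUES (?, ?)",
    [(1, 'Horton'), (2, 'Horton'), (3, 'Liu'), (4, 'Smith')],
)

not_exists = conn.execute("""
    SELECT instructor FROM Offering Off1
    WHERE NOT EXISTS (
        SELECT * FROM Offering
        WHERE oid <> Off1.oid AND instructor = Off1.instructor
    )
    ORDER BY instructor
""").fetchall()

grouped = conn.execute("""
    SELECT instructor FROM Offering
    GROUP BY instructor HAVING count(*) = 1
    ORDER BY instructor
""").fetchall()

print(not_exists)             # [('Liu',), ('Smith',)]
print(not_exists == grouped)  # True
```

The two forms agree here, but they can diverge when `instructor` is null: `GROUP BY` puts all nulls in one group, while `instructor = Off1.instructor` is unknown for nulls, so `NOT EXISTS` would return every null-instructor offering.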

### Worksheet, Q6

In [None]:
%%sql
-- Offerings taken by a student who also took a different CSC offering
SELECT DISTINCT oid
FROM Took
WHERE EXISTS (
    SELECT *
    FROM Took t, Offering o
    WHERE t.oid = o.oid
    AND t.oid <> Took.oid
    AND o.dept = 'CSC'
    AND took.sid = t.sid
);
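Q6 can also be written as a self-join on `Took` with `DISTINCT` to remove repeats. A sketch with Python's `sqlite3` (not the lecture's PostgreSQL database) on hypothetical rows, checking that both forms return the same offerings:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical rows: student 100 took two CSC offerings, student 200 took
    -- one CSC and one MAT offering, student 300 took only ENG
    CREATE TABLE Offering (oid INTEGER, dept TEXT);
    CREATE TABLE Took (sid INTEGER, oid INTEGER);
    INSERT INTO Offering VALUES (1, 'CSC'), (2, 'CSC'), (3, 'MAT'), (4, 'ENG');
    INSERT INTO Took VALUES (100, 1), (100, 2), (200, 1), (200, 3), (300, 4);
""")

with_exists = conn.execute("""
    SELECT DISTINCT oid
    FROM Took
    WHERE EXISTS (
        SELECT *
        FROM Took t, Offering o
        WHERE t.oid = o.oid
        AND t.oid <> Took.oid
        AND o.dept = 'CSC'
        AND Took.sid = t.sid
    )
    ORDER BY oid
""").fetchall()

self_join = conn.execute("""
    SELECT DISTINCT t1.oid
    FROM Took t1
    JOIN Took t2 ON t1.sid = t2.sid AND t1.oid <> t2.oid
    JOIN Offering o ON t2.oid = o.oid
    WHERE o.dept = 'CSC'
    ORDER BY t1.oid
""").fetchall()

print(with_exists)               # [(1,), (2,), (3,)]
print(with_exists == self_join)  # True
```

Offering 1 is missing from student 200's contribution because that student's only *other* offering is in MAT; it still appears through student 100.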