In [2]:
# The following commands load the requisite modules.
# NOTE: If a warning appears, it can be ignored; it does not seem to affect anything.

%load_ext sql
%sql postgresql://postgres:postgres@localhost/university

The sql extension is already loaded. To reload it, use:
  %reload_ext sql


We can now run SQL commands using `magic` commands, an extensibility mechanism provided by Jupyter.

- `%sql` is for single-line commands
- `%%sql` allows multi-line SQL commands
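
For intuition only: conceptually, a `%sql` cell hands the SQL text to a database driver and displays the rows that come back. The sketch below imitates that round trip in plain Python. It is not how the extension is implemented; it assumes an in-memory SQLite database in place of the PostgreSQL server, and `run_sql` is a hypothetical helper introduced just for this illustration.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the PostgreSQL
# server, so the sketch runs without any setup.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE instructor (id TEXT, name TEXT, dept_name TEXT, salary REAL)"
)
conn.executemany(
    "INSERT INTO instructor VALUES (?, ?, ?, ?)",
    [
        ("22222", "Einstein", "Physics", 95000),
        ("33456", "Gold", "Physics", 87000),
        ("15151", "Mozart", "Music", 40000),
    ],
)


def run_sql(connection, statement):
    """Hypothetical helper: run one statement, return (column names, rows)."""
    cursor = connection.execute(statement)
    # cursor.description is None for statements that return no result set.
    columns = [col[0] for col in (cursor.description or [])]
    return columns, cursor.fetchall()


columns, rows = run_sql(
    conn, "SELECT name FROM instructor WHERE dept_name = 'Physics' ORDER BY name"
)
print(columns)
print(rows)
```

The magic commands add conveniences on top of this, such as remembering the connection and rendering the result as a table.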

# University Database
Below we use the University database from the class textbook. It has the same schema as the one discussed in the book and contains randomly populated information about students, courses, and instructors in a university.

You should follow the rest of the Notebook along with the appropriate sections in the book.
Each section in the notebook is tagged with the corresponding section in the book.

The schema diagram for the database is as follows:
<center><img src="https://github.com/umddb/cmsc424-fall2015/raw/master/postgresql-setup/university.png" width=800px></center>

One drawback of this way of accessing the database is that we can only run valid SQL -- the commands like `\d` provided by `psql` are not available to us.

Instead, we need to query the system catalog (metadata) directly:
- The first command below is equivalent to `\d`
- The second one is similar to `\d instructor`.

In [3]:
%%sql
-- Print all the tables.
SELECT table_schema, table_name FROM information_schema.tables
    WHERE table_type = 'BASE TABLE' AND
    table_schema NOT IN ('pg_catalog', 'information_schema', 'priv');

 * postgresql://postgres:***@localhost/university
11 rows affected.


table_schema,table_name
public,department
public,course
public,instructor
public,section
public,classroom
public,teaches
public,student
public,takes
public,advisor
public,time_slot


You can see that there are:
- some tables that describe objects (e.g., `student`, `course`, `time_slot`, `classroom`, `instructor`); and
- other tables that describe "relationships" between objects (e.g., `takes`)

- `department`: information about departments
- `course`: information about courses
- `instructor`: information about instructors
- `takes`: links students to the course sections they have taken
- `section`: links courses to a time and location
- `student`: information about students
- `advisor`: links students to instructors
- `time_slot`: schedule of each time slot
- `classroom`: information about classrooms
- `teaches`: links instructors to the sections they teach
- `prereq`: prerequisite relationship between courses
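
To make the object/relationship distinction concrete, here is a minimal sketch, assuming an in-memory SQLite database and a deliberately simplified schema (the real `takes` table also keys on `sec_id`, `semester`, and `year`). The student names are placeholders, not data from the University database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    -- Object tables: one row per student, one row per course.
    CREATE TABLE student (id TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE course  (course_id TEXT PRIMARY KEY, title TEXT);
    -- Relationship table: each row links one student to one course.
    CREATE TABLE takes (
        id        TEXT REFERENCES student(id),
        course_id TEXT REFERENCES course(course_id),
        grade     TEXT,
        PRIMARY KEY (id, course_id)
    );
    INSERT INTO student VALUES ('128', 'Student A'), ('12345', 'Student B');
    INSERT INTO course VALUES ('CS-101', 'Intro. to Computer Science'),
                              ('CS-347', 'Database System Concepts');
    INSERT INTO takes VALUES ('128', 'CS-101', 'A'), ('128', 'CS-347', 'A-'),
                             ('12345', 'CS-101', 'C');
    """
)

# Following the relationship in both directions recovers readable facts.
enrolled = conn.execute(
    """
    SELECT student.name, course.title, takes.grade
        FROM student, takes, course
        WHERE student.id = takes.id AND takes.course_id = course.course_id
        ORDER BY student.name, course.course_id
    """
).fetchall()

# A relationship row that points at a missing student is rejected.
try:
    conn.execute("INSERT INTO takes VALUES ('99999', 'CS-101', 'B')")
    dangling_row_rejected = False
except sqlite3.IntegrityError:
    dangling_row_rejected = True

print(enrolled)
print(dangling_row_rejected)
```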

In [4]:
%%sql
-- Print schema for instructor.
SELECT column_name, data_type
    FROM INFORMATION_SCHEMA.COLUMNS WHERE table_name = 'instructor';

 * postgresql://postgres:***@localhost/university
4 rows affected.


column_name,data_type
salary,numeric
id,character varying
name,character varying
dept_name,character varying
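
Other engines expose their metadata in a similar way, although the catalog names differ. As a rough analogue of the two catalog queries above, the sketch below uses SQLite from Python: `sqlite_master` and `PRAGMA table_info` are SQLite features, not PostgreSQL ones, and the two-table schema is a simplified assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE department (dept_name TEXT PRIMARY KEY, building TEXT,
                             budget NUMERIC);
    CREATE TABLE instructor (id TEXT PRIMARY KEY, name TEXT NOT NULL,
                             dept_name TEXT, salary NUMERIC);
    """
)

# Analogue of "\d": list the user tables recorded in SQLite's catalog.
tables = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]

# Analogue of "\d instructor": PRAGMA table_info yields one row per column,
# laid out as (cid, name, type, notnull, dflt_value, pk).
columns = [
    (row[1], row[2]) for row in conn.execute("PRAGMA table_info(instructor)")
]

print(tables)
print(columns)
```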


In [5]:
%%sql
--SELECT * FROM takes LIMIT 4;
--SELECT * FROM student LIMIT 4;
--SELECT * FROM section LIMIT 4;
--SELECT * FROM course LIMIT 4;
--SELECT * FROM department LIMIT 4;
--SELECT * FROM advisor LIMIT 4;
--SELECT * FROM time_slot LIMIT 4;
--SELECT * FROM classroom LIMIT 4;
--SELECT * FROM teaches LIMIT 4;
--SELECT * FROM prereq LIMIT 4;
SELECT * FROM instructor LIMIT 4;

 * postgresql://postgres:***@localhost/university
4 rows affected.


id,name,dept_name,salary
10101,Srinivasan,Comp. Sci.,65000.0
12121,Wu,Finance,90000.0
15151,Mozart,Music,40000.0
22222,Einstein,Physics,95000.0


In [6]:
%%sql
-- Print table instructor.
SELECT * FROM instructor;

 * postgresql://postgres:***@localhost/university
12 rows affected.


id,name,dept_name,salary
10101,Srinivasan,Comp. Sci.,65000.0
12121,Wu,Finance,90000.0
15151,Mozart,Music,40000.0
22222,Einstein,Physics,95000.0
32343,El Said,History,60000.0
33456,Gold,Physics,87000.0
45565,Katz,Comp. Sci.,75000.0
58583,Califieri,History,62000.0
76543,Singh,Finance,80000.0
76766,Crick,Biology,72000.0


## Creating schema

You can take a look at the `DDL.sql` file to see how the tables we are using are created. We won't try to run those commands here since they will only give errors.

In [6]:
!cat DDL.sql

drop table if exists prereq;
drop table if exists time_slot;
drop table if exists advisor;
drop table if exists takes;
drop table if exists student;
drop table if exists teaches;
drop table if exists section;
drop table if exists instructor;
drop table if exists course;
drop table if exists department;
drop table if exists classroom;

create table classroom
	(building		varchar(15),
	 room_number		varchar(7),
	 capacity		numeric(4,0),
	 primary key (building, room_number)
	);

create table department
	(dept_name		varchar(20), 
	 building		varchar(15), 
	 budget		        numeric(12,2) check (budget > 0),
	 primary key (dept_name)
	);

create table course
	(course_id		varchar(8), 
	 title			varchar(50), 
	 dept_name		varchar(20),
	 credits		numeric(2,0) check (credits > 0),
	 primary key (course_id),
	 foreign key (dept_name) references department
		on delete set null
	);

create table instructor
	(ID			varchar(5), 
	 name			varchar(20) not null, 
	 

## Populating data

The database is populated with one of the following scripts:
- smallRelationsInsertFile.sql
- largeRelationsInsertFile.sql

In [7]:
!cat smallRelationsInsertFile.sql

delete from prereq;
delete from time_slot;
delete from advisor;
delete from takes;
delete from student;
delete from teaches;
delete from section;
delete from instructor;
delete from course;
delete from department;
delete from classroom;
insert into classroom values ('Packard', '101', '500');
insert into classroom values ('Painter', '514', '10');
insert into classroom values ('Taylor', '3128', '70');
insert into classroom values ('Watson', '100', '30');
insert into classroom values ('Watson', '120', '50');
insert into department values ('Biology', 'Watson', '90000');
insert into department values ('Comp. Sci.', 'Taylor', '100000');
insert into department values ('Elec. Eng.', 'Taylor', '85000');
insert into department values ('Finance', 'Painter', '120000');
insert into department values ('History', 'Painter', '50000');
insert into department values ('Music', 'Packard', '80000');
insert into department values ('Physics', 'Watson', '70000');
insert into course valu

In [8]:
%%sql
-- Test connection showing one table.
SELECT * FROM takes;

 * postgresql://postgres:***@localhost/university
22 rows affected.


id,course_id,sec_id,semester,year,grade
128,CS-101,1,Fall,2009,A
128,CS-347,1,Fall,2009,A-
12345,CS-101,1,Fall,2009,C
12345,CS-190,2,Spring,2009,A
12345,CS-315,1,Spring,2010,A
12345,CS-347,1,Fall,2009,A
19991,HIS-351,1,Spring,2010,B
23121,FIN-201,1,Spring,2010,C+
44553,PHY-101,1,Fall,2009,B-
45678,CS-101,1,Fall,2009,F


In [9]:
%%sql
-- Find the names of all instructors.
SELECT name FROM instructor;


## (3.2) SQL data definition

In [10]:
%%sql
-- Delete the relation.
DROP TABLE IF EXISTS department_tmp;
-- Create a relation.
CREATE TABLE department_tmp (
    dept_name varchar(20),
    building varchar(15),
    -- 12 digits, 2 digits after decimal point.
    budget numeric(12, 2),
    PRIMARY KEY (dept_name)
);

 * postgresql://postgres:***@localhost/university
Done.
Done.


[]

In [11]:
%%sql
-- Empty relation.
DELETE FROM department_tmp;
-- Insert.
INSERT INTO department_tmp VALUES ('Packard', '101', '500');
SELECT * FROM department_tmp;

 * postgresql://postgres:***@localhost/university
0 rows affected.
1 rows affected.
1 rows affected.


dept_name,building,budget
Packard,101,500.0


In [12]:
%%sql
-- Empty relation.
DELETE FROM department_tmp;

 * postgresql://postgres:***@localhost/university
1 rows affected.


[]

In [13]:
%%sql
SELECT * FROM department_tmp;

 * postgresql://postgres:***@localhost/university
0 rows affected.


dept_name,building,budget


In [14]:
%%sql
-- Insert.
INSERT INTO department_tmp VALUES ('Packard', '101', '500');
SELECT * FROM department_tmp;

 * postgresql://postgres:***@localhost/university
1 rows affected.
1 rows affected.


dept_name,building,budget
Packard,101,500.0


In [15]:
%%sql
-- Add an attribute.
ALTER TABLE department_tmp ADD city VARCHAR(20);
SELECT * FROM department_tmp;

 * postgresql://postgres:***@localhost/university
Done.
1 rows affected.


dept_name,building,budget,city
Packard,101,500.0,


In [16]:
%%sql
-- Remove an attribute.
ALTER TABLE department_tmp DROP city;
SELECT * FROM department_tmp;

 * postgresql://postgres:***@localhost/university
Done.
1 rows affected.


dept_name,building,budget
Packard,101,500.0


### (3.3.1) Queries on a single relation

In [17]:
%%sql
-- Projection.
SELECT name FROM instructor;

 * postgresql://postgres:***@localhost/university
12 rows affected.


name
Srinivasan
Wu
Mozart
Einstein
El Said
Gold
Katz
Califieri
Singh
Crick


In [18]:
%%sql
SELECT dept_name FROM instructor;

 * postgresql://postgres:***@localhost/university
12 rows affected.


dept_name
Comp. Sci.
Finance
Music
Physics
History
Physics
Comp. Sci.
History
Finance
Biology


In [19]:
%%sql
SELECT DISTINCT dept_name FROM instructor;

 * postgresql://postgres:***@localhost/university
7 rows affected.


dept_name
Finance
History
Physics
Music
Comp. Sci.
Biology
Elec. Eng.


In [20]:
%%sql
SELECT id, name, dept_name, salary FROM instructor;

 * postgresql://postgres:***@localhost/university
12 rows affected.


id,name,dept_name,salary
10101,Srinivasan,Comp. Sci.,65000.0
12121,Wu,Finance,90000.0
15151,Mozart,Music,40000.0
22222,Einstein,Physics,95000.0
32343,El Said,History,60000.0
33456,Gold,Physics,87000.0
45565,Katz,Comp. Sci.,75000.0
58583,Califieri,History,62000.0
76543,Singh,Finance,80000.0
76766,Crick,Biology,72000.0


In [21]:
%%sql
SELECT id, name, dept_name, salary * 1.1 FROM instructor LIMIT 4;

 * postgresql://postgres:***@localhost/university
4 rows affected.


id,name,dept_name,?column?
10101,Srinivasan,Comp. Sci.,71500.0
12121,Wu,Finance,99000.0
15151,Mozart,Music,44000.0
22222,Einstein,Physics,104500.0


In [22]:
%%sql
SELECT name FROM instructor WHERE dept_name = 'Comp. Sci.';

 * postgresql://postgres:***@localhost/university
3 rows affected.


name
Srinivasan
Katz
Brandt


In [23]:
%%sql
SELECT name FROM instructor WHERE dept_name = 'Comp. Sci.' AND salary > 70000;

 * postgresql://postgres:***@localhost/university
2 rows affected.


name
Katz
Brandt


### (3.3.2) Queries on multiple relations

In [24]:
%%sql
SELECT * FROM instructor LIMIT 4;

 * postgresql://postgres:***@localhost/university
4 rows affected.


id,name,dept_name,salary
10101,Srinivasan,Comp. Sci.,65000.0
12121,Wu,Finance,90000.0
15151,Mozart,Music,40000.0
22222,Einstein,Physics,95000.0


In [25]:
%%sql
SELECT * FROM department LIMIT 4;

 * postgresql://postgres:***@localhost/university
4 rows affected.


dept_name,building,budget
Biology,Watson,90000.0
Comp. Sci.,Taylor,100000.0
Elec. Eng.,Taylor,85000.0
Finance,Painter,120000.0


In [26]:
%%sql
-- Find the name of instructors with their dept name and dept building name.
-- It is a join.
SELECT name, instructor.dept_name, building
    FROM instructor, department
    WHERE instructor.dept_name = department.dept_name;

 * postgresql://postgres:***@localhost/university
12 rows affected.


name,dept_name,building
Srinivasan,Comp. Sci.,Taylor
Wu,Finance,Painter
Mozart,Music,Packard
Einstein,Physics,Watson
El Said,History,Painter
Gold,Physics,Watson
Katz,Comp. Sci.,Taylor
Califieri,History,Painter
Singh,Finance,Painter
Crick,Biology,Watson


In [27]:
%%sql
-- Cartesian product of two relations.
SELECT * FROM instructor, teaches;

 * postgresql://postgres:***@localhost/university
180 rows affected.


id,name,dept_name,salary,id_1,course_id,sec_id,semester,year
10101,Srinivasan,Comp. Sci.,65000.0,10101,CS-101,1,Fall,2009
12121,Wu,Finance,90000.0,10101,CS-101,1,Fall,2009
15151,Mozart,Music,40000.0,10101,CS-101,1,Fall,2009
22222,Einstein,Physics,95000.0,10101,CS-101,1,Fall,2009
32343,El Said,History,60000.0,10101,CS-101,1,Fall,2009
33456,Gold,Physics,87000.0,10101,CS-101,1,Fall,2009
45565,Katz,Comp. Sci.,75000.0,10101,CS-101,1,Fall,2009
58583,Califieri,History,62000.0,10101,CS-101,1,Fall,2009
76543,Singh,Finance,80000.0,10101,CS-101,1,Fall,2009
76766,Crick,Biology,72000.0,10101,CS-101,1,Fall,2009


In [28]:
%%sql
-- Find instructors who have taught some course and the courses they taught.
-- Note that the duplicates are not removed.
SELECT name, course_id
    FROM instructor, teaches
    WHERE instructor.ID = teaches.ID;

 * postgresql://postgres:***@localhost/university
15 rows affected.


name,course_id
Srinivasan,CS-101
Srinivasan,CS-315
Srinivasan,CS-347
Wu,FIN-201
Mozart,MU-199
Einstein,PHY-101
El Said,HIS-351
Katz,CS-101
Katz,CS-319
Crick,BIO-101


In [29]:
%%sql
-- Removing the duplicates.
SELECT DISTINCT name, course_id
    FROM instructor, teaches
    WHERE instructor.ID = teaches.ID;

 * postgresql://postgres:***@localhost/university
14 rows affected.


name,course_id
Srinivasan,CS-347
Katz,CS-101
Crick,BIO-301
El Said,HIS-351
Srinivasan,CS-315
Kim,EE-181
Einstein,PHY-101
Brandt,CS-190
Srinivasan,CS-101
Brandt,CS-319


In [30]:
%%sql
-- Find instructors who have taught some course in the CS dept and courses they taught.
SELECT DISTINCT name, course_id
    FROM instructor, teaches
    WHERE instructor.ID = teaches.ID AND
        instructor.dept_name = 'Comp. Sci.';

 * postgresql://postgres:***@localhost/university
7 rows affected.


name,course_id
Brandt,CS-190
Brandt,CS-319
Katz,CS-101
Katz,CS-319
Srinivasan,CS-101
Srinivasan,CS-315
Srinivasan,CS-347


## (3.4) Additional basic operations

In [31]:
%%sql
-- Rename in the SELECT clause.
-- name can be confusing so we can rename it
SELECT DISTINCT name AS instructor_name, course_id
    FROM instructor, teaches
    WHERE instructor.ID = teaches.ID;

 * postgresql://postgres:***@localhost/university
14 rows affected.


instructor_name,course_id
Srinivasan,CS-347
Katz,CS-101
Crick,BIO-301
El Said,HIS-351
Srinivasan,CS-315
Kim,EE-181
Einstein,PHY-101
Brandt,CS-190
Srinivasan,CS-101
Brandt,CS-319


In [32]:
%%sql
-- Rename relations in the FROM clause.
SELECT DISTINCT T.name, S.course_id
    FROM instructor AS T, teaches AS S
    WHERE T.ID = S.ID;

 * postgresql://postgres:***@localhost/university
14 rows affected.


name,course_id
Srinivasan,CS-347
Katz,CS-101
Crick,BIO-301
El Said,HIS-351
Srinivasan,CS-315
Kim,EE-181
Einstein,PHY-101
Brandt,CS-190
Srinivasan,CS-101
Brandt,CS-319


In [33]:
%%sql
-- Find the names of all instructors whose salary is greater than that of at least one instructor in the Biology dept.
-- I.e., greater than the minimum salary in the Biology dept.
SELECT DISTINCT T.name, T.salary
    FROM instructor AS T, instructor AS S
    WHERE T.salary > S.salary AND S.dept_name = 'Biology';

 * postgresql://postgres:***@localhost/university
7 rows affected.


name,salary
Einstein,95000.0
Wu,90000.0
Katz,75000.0
Brandt,92000.0
Gold,87000.0
Singh,80000.0
Kim,80000.0
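
The "greater than at least one" condition can be checked against the "greater than the minimum" reading directly. The sketch below is an illustration under assumptions: it loads the twelve instructor rows shown in this notebook into an in-memory SQLite database (not PostgreSQL) and compares the self-join with a `MIN` subquery.

```python
import sqlite3

INSTRUCTORS = [
    ("10101", "Srinivasan", "Comp. Sci.", 65000),
    ("12121", "Wu", "Finance", 90000),
    ("15151", "Mozart", "Music", 40000),
    ("22222", "Einstein", "Physics", 95000),
    ("32343", "El Said", "History", 60000),
    ("33456", "Gold", "Physics", 87000),
    ("45565", "Katz", "Comp. Sci.", 75000),
    ("58583", "Califieri", "History", 62000),
    ("76543", "Singh", "Finance", 80000),
    ("76766", "Crick", "Biology", 72000),
    ("83821", "Brandt", "Comp. Sci.", 92000),
    ("98345", "Kim", "Elec. Eng.", 80000),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE instructor (id TEXT PRIMARY KEY, name TEXT, "
    "dept_name TEXT, salary REAL)"
)
conn.executemany("INSERT INTO instructor VALUES (?, ?, ?, ?)", INSTRUCTORS)

# Self-join: T beats at least one Biology instructor S.
self_join = conn.execute(
    """
    SELECT DISTINCT T.name
        FROM instructor AS T, instructor AS S
        WHERE T.salary > S.salary AND S.dept_name = 'Biology'
        ORDER BY T.name
    """
).fetchall()

# Subquery: salary above the lowest Biology salary.
min_subquery = conn.execute(
    """
    SELECT name
        FROM instructor
        WHERE salary > (SELECT MIN(salary) FROM instructor
                            WHERE dept_name = 'Biology')
        ORDER BY name
    """
).fetchall()

print(self_join == min_subquery)
print([row[0] for row in self_join])
```

The subquery form avoids the `DISTINCT`, which the self-join needs because an instructor who beats several Biology salaries would otherwise appear once per match.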


In [34]:
%%sql
-- Pattern matching with LIKE ('%' matches any substring).
SELECT dept_name, building
    FROM department
    WHERE building like '%Wats%';

 * postgresql://postgres:***@localhost/university
2 rows affected.


dept_name,building
Biology,Watson
Physics,Watson
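
`LIKE` has two wildcards: `%` matches any substring (including the empty one) and `_` matches exactly one character. A small sketch, assuming an in-memory SQLite database loaded with the seven department rows from the insert script; note that SQLite's `LIKE` ignores ASCII case by default while PostgreSQL's is case-sensitive, so the patterns below use the exact case.

```python
import sqlite3

DEPARTMENTS = [
    ("Biology", "Watson", 90000),
    ("Comp. Sci.", "Taylor", 100000),
    ("Elec. Eng.", "Taylor", 85000),
    ("Finance", "Painter", 120000),
    ("History", "Painter", 50000),
    ("Music", "Packard", 80000),
    ("Physics", "Watson", 70000),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE department (dept_name TEXT PRIMARY KEY, building TEXT, "
    "budget NUMERIC)"
)
conn.executemany("INSERT INTO department VALUES (?, ?, ?)", DEPARTMENTS)


def depts_in_building_like(pattern):
    """Hypothetical helper: department names whose building matches pattern."""
    rows = conn.execute(
        "SELECT dept_name FROM department WHERE building LIKE ? "
        "ORDER BY dept_name",
        (pattern,),
    )
    return [row[0] for row in rows]


contains_wats = depts_in_building_like("%Wats%")   # substring anywhere
starts_with_pa = depts_in_building_like("Pa%")     # prefix match
six_chars_ending_r = depts_in_building_like("_____r")  # 5 characters, then 'r'

print(contains_wats)
print(starts_with_pa)
print(six_chars_ending_r)
```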


In [35]:
%%sql
-- Select all the columns of both relations after a join.
SELECT DISTINCT instructor.*, teaches.*
    FROM instructor, teaches
    WHERE instructor.ID = teaches.ID;

 * postgresql://postgres:***@localhost/university
15 rows affected.


id,name,dept_name,salary,id_1,course_id,sec_id,semester,year
32343,El Said,History,60000.0,32343,HIS-351,1,Spring,2010
45565,Katz,Comp. Sci.,75000.0,45565,CS-319,1,Spring,2010
83821,Brandt,Comp. Sci.,92000.0,83821,CS-190,2,Spring,2009
10101,Srinivasan,Comp. Sci.,65000.0,10101,CS-315,1,Spring,2010
83821,Brandt,Comp. Sci.,92000.0,83821,CS-190,1,Spring,2009
98345,Kim,Elec. Eng.,80000.0,98345,EE-181,1,Spring,2009
76766,Crick,Biology,72000.0,76766,BIO-101,1,Summer,2009
12121,Wu,Finance,90000.0,12121,FIN-201,1,Spring,2010
45565,Katz,Comp. Sci.,75000.0,45565,CS-101,1,Spring,2010
22222,Einstein,Physics,95000.0,22222,PHY-101,1,Fall,2009


In [36]:
%%sql
SELECT name
    FROM instructor
    WHERE dept_name = 'Physics'
    ORDER BY name;

 * postgresql://postgres:***@localhost/university
2 rows affected.


name
Einstein
Gold


In [37]:
%%sql
-- Sorting on multiple attributes.
SELECT * FROM instructor
    ORDER BY salary DESC, name ASC;

 * postgresql://postgres:***@localhost/university
12 rows affected.


id,name,dept_name,salary
22222,Einstein,Physics,95000.0
83821,Brandt,Comp. Sci.,92000.0
12121,Wu,Finance,90000.0
33456,Gold,Physics,87000.0
98345,Kim,Elec. Eng.,80000.0
76543,Singh,Finance,80000.0
45565,Katz,Comp. Sci.,75000.0
76766,Crick,Biology,72000.0
10101,Srinivasan,Comp. Sci.,65000.0
58583,Califieri,History,62000.0


## (3.5) Set operations

In [38]:
%%sql
SELECT * FROM course LIMIT 4;

 * postgresql://postgres:***@localhost/university
4 rows affected.


course_id,title,dept_name,credits
BIO-101,Intro. to Biology,Biology,4
BIO-301,Genetics,Biology,4
BIO-399,Computational Biology,Biology,3
CS-101,Intro. to Computer Science,Comp. Sci.,4


In [39]:
%%sql
SELECT * FROM section LIMIT 4;

 * postgresql://postgres:***@localhost/university
4 rows affected.


course_id,sec_id,semester,year,building,room_number,time_slot_id
BIO-101,1,Summer,2009,Painter,514,B
BIO-301,1,Summer,2010,Painter,514,A
CS-101,1,Fall,2009,Packard,101,H
CS-101,1,Spring,2010,Packard,101,F


In [40]:
%%sql
-- Set of all courses taught in Fall 2009 semester.
SELECT DISTINCT c.course_id
    FROM course AS c, section AS s
    WHERE c.course_id = s.course_id
        AND s.semester = 'Fall' AND s.year = 2009
    ORDER BY c.course_id;

 * postgresql://postgres:***@localhost/university
3 rows affected.


course_id
CS-101
CS-347
PHY-101

In [41]:
%%sql
-- Set of all courses taught in Spring 2009 semester.
SELECT DISTINCT c.course_id
    FROM course AS c, section AS s
    WHERE c.course_id = s.course_id
        AND s.semester = 'Spring' AND s.year = 2009
    ORDER BY c.course_id;

 * postgresql://postgres:***@localhost/university
2 rows affected.


course_id
CS-190
EE-181

In [42]:
%%sql
(SELECT DISTINCT c.course_id
     FROM course AS c, section AS s
     WHERE c.course_id = s.course_id
         AND s.semester = 'Spring' AND s.year = 2009)
UNION
(SELECT DISTINCT c.course_id
     FROM course AS c, section AS s
     WHERE c.course_id = s.course_id
         AND s.semester = 'Fall' AND s.year = 2009)
ORDER BY course_id;

 * postgresql://postgres:***@localhost/university
5 rows affected.


course_id
CS-101
CS-190
CS-347
EE-181
PHY-101

In [43]:
%%sql
(SELECT DISTINCT c.course_id
     FROM course AS c, section AS s
     WHERE c.course_id = s.course_id
         AND s.semester = 'Spring' AND s.year = 2009)
INTERSECT
(SELECT DISTINCT c.course_id
     FROM course AS c, section AS s
     WHERE c.course_id = s.course_id
         AND s.semester = 'Fall' AND s.year = 2007);

 * postgresql://postgres:***@localhost/university
0 rows affected.


course_id
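
The set operations can also be tried on the `section` table alone. The sketch below is a minimal illustration, assuming an in-memory SQLite database and only the subset of section rows visible in this notebook. SQLite does not accept parentheses around the branches of a compound query, so they are omitted here; PostgreSQL accepts both forms.

```python
import sqlite3

# Subset of section rows shown in this notebook: (course_id, sec_id, semester, year).
SECTIONS = [
    ("BIO-101", "1", "Summer", 2009),
    ("BIO-301", "1", "Summer", 2010),
    ("CS-101", "1", "Fall", 2009),
    ("CS-101", "1", "Spring", 2010),
    ("CS-190", "1", "Spring", 2009),
    ("CS-190", "2", "Spring", 2009),
    ("CS-347", "1", "Fall", 2009),
    ("EE-181", "1", "Spring", 2009),
    ("PHY-101", "1", "Fall", 2009),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE section (course_id TEXT, sec_id TEXT, semester TEXT, "
    "year INTEGER)"
)
conn.executemany("INSERT INTO section VALUES (?, ?, ?, ?)", SECTIONS)

FALL_2009 = "SELECT course_id FROM section WHERE semester = 'Fall' AND year = 2009"
SPRING_2009 = "SELECT course_id FROM section WHERE semester = 'Spring' AND year = 2009"
SPRING_2010 = "SELECT course_id FROM section WHERE semester = 'Spring' AND year = 2010"


def course_ids(left, operator, right):
    """Hypothetical helper: combine two queries with a set operator."""
    sql = f"{left} {operator} {right} ORDER BY course_id"
    return [row[0] for row in conn.execute(sql)]


either = course_ids(FALL_2009, "UNION", SPRING_2009)        # duplicates removed
with_dups = course_ids(FALL_2009, "UNION ALL", SPRING_2009)  # duplicates kept
both = course_ids(FALL_2009, "INTERSECT", SPRING_2010)
fall_only = course_ids(FALL_2009, "EXCEPT", SPRING_2010)

print(either)
print(with_dups)
print(both)
print(fall_only)
```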


## (3.6) NULL values

## (3.7) Aggregate functions

### Count

In [44]:
%%sql
SELECT * FROM instructor;

 * postgresql://postgres:***@localhost/university
12 rows affected.


id,name,dept_name,salary
10101,Srinivasan,Comp. Sci.,65000.0
12121,Wu,Finance,90000.0
15151,Mozart,Music,40000.0
22222,Einstein,Physics,95000.0
32343,El Said,History,60000.0
33456,Gold,Physics,87000.0
45565,Katz,Comp. Sci.,75000.0
58583,Califieri,History,62000.0
76543,Singh,Finance,80000.0
76766,Crick,Biology,72000.0


In [45]:
%%sql
-- Count instructors by department.
SELECT dept_name, count(*)
    FROM instructor
    GROUP BY dept_name
    ORDER BY count;

 * postgresql://postgres:***@localhost/university
7 rows affected.


dept_name,count
Music,1
Biology,1
Elec. Eng.,1
Finance,2
History,2
Physics,2
Comp. Sci.,3


In [46]:
%%sql
-- Compute the average salary of instructors in the CS dept.
SELECT AVG(salary) AS avg_salary
    FROM instructor
    WHERE dept_name = 'Comp. Sci.';

 * postgresql://postgres:***@localhost/university
1 rows affected.


avg_salary
77333.33333333333


In [47]:
%%sql
-- Count the elements in a table.
SELECT COUNT(*) FROM instructor;

 * postgresql://postgres:***@localhost/university
1 rows affected.


count
12


In [48]:
%%sql
-- Count the distinct ids.
SELECT COUNT(DISTINCT ID) FROM instructor;

 * postgresql://postgres:***@localhost/university
1 rows affected.


count
12


In [49]:
%%sql
SELECT *
    FROM teaches
    WHERE semester = 'Spring' and year = '2009';

 * postgresql://postgres:***@localhost/university
3 rows affected.


id,course_id,sec_id,semester,year
83821,CS-190,1,Spring,2009
83821,CS-190,2,Spring,2009
98345,EE-181,1,Spring,2009


In [50]:
%%sql
-- Count the distinct instructors who taught in Spring 2009.
SELECT COUNT (DISTINCT ID)
    FROM teaches
    WHERE semester = 'Spring' and year = '2009';

 * postgresql://postgres:***@localhost/university
1 rows affected.


count
2


In [51]:
%%sql
SELECT COUNT (*) FROM course;

 * postgresql://postgres:***@localhost/university
1 rows affected.


count
13


In [52]:
# %%sql
# -- DISTINCT * is not allowed inside COUNT; name a column or expression instead.
# -- SELECT COUNT (DISTINCT *) FROM course;
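
One common workaround for counting distinct rows is to apply `DISTINCT` in a subquery and count its result. The sketch below is an illustration, assuming an in-memory SQLite database that holds only the three Spring 2009 `teaches` rows shown above.

```python
import sqlite3

# The Spring 2009 teaches rows: (id, course_id, sec_id, semester, year).
TEACHES = [
    ("83821", "CS-190", "1", "Spring", 2009),
    ("83821", "CS-190", "2", "Spring", 2009),
    ("98345", "EE-181", "1", "Spring", 2009),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE teaches (id TEXT, course_id TEXT, sec_id TEXT, "
    "semester TEXT, year INTEGER)"
)
conn.executemany("INSERT INTO teaches VALUES (?, ?, ?, ?, ?)", TEACHES)


def scalar(sql):
    """Hypothetical helper: return the single value produced by a query."""
    return conn.execute(sql).fetchone()[0]


all_rows = scalar("SELECT COUNT(*) FROM teaches")
distinct_instructors = scalar("SELECT COUNT(DISTINCT id) FROM teaches")
# Distinct (instructor, course) pairs: DISTINCT runs first, COUNT second.
distinct_pairs = scalar(
    "SELECT COUNT(*) FROM (SELECT DISTINCT id, course_id FROM teaches) AS t"
)
# Distinct whole rows, the intent behind COUNT(DISTINCT *).
distinct_rows = scalar("SELECT COUNT(*) FROM (SELECT DISTINCT * FROM teaches) AS t")

print(all_rows, distinct_instructors, distinct_pairs, distinct_rows)
```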

In [53]:
%%sql
-- Find the average salary in each department.
SELECT dept_name, AVG(salary) AS avg_salary
    FROM instructor 
    GROUP BY dept_name;

 * postgresql://postgres:***@localhost/university
7 rows affected.


dept_name,avg_salary
Finance,85000.0
History,61000.0
Physics,91000.0
Music,40000.0
Comp. Sci.,77333.33333333333
Biology,72000.0
Elec. Eng.,80000.0


In [54]:
%%sql
-- Find the number of instructors in each dept who teach a course in Spring 2009.
SELECT dept_name, COUNT(DISTINCT instructor.ID) AS instr_count
    FROM instructor, teaches
    WHERE instructor.ID = teaches.ID
        AND semester = 'Spring' AND year = 2009
    GROUP BY dept_name;

 * postgresql://postgres:***@localhost/university
2 rows affected.


dept_name,instr_count
Comp. Sci.,1
Elec. Eng.,1


### Having

In [55]:
%%sql
-- Get the departments whose average instructor salary is larger than $42k.
SELECT dept_name, AVG(salary) AS avg_salary
    FROM instructor 
    GROUP BY dept_name
    HAVING AVG(salary) > 42000;

 * postgresql://postgres:***@localhost/university
6 rows affected.


dept_name,avg_salary
Finance,85000.0
History,61000.0
Physics,91000.0
Comp. Sci.,77333.33333333333
Biology,72000.0
Elec. Eng.,80000.0


In [56]:
%%sql
-- For each course section offered in 2009 with at least 2 students,
-- report the average total credits of the students enrolled in it.
SELECT course_id, semester, year, sec_id, AVG(tot_cred)
    FROM student, takes
    WHERE student.ID = takes.ID AND year = 2009
    GROUP BY course_id, semester, year, sec_id
    HAVING COUNT(student.ID) >= 2;

 * postgresql://postgres:***@localhost/university
3 rows affected.


course_id,semester,year,sec_id,avg
CS-101,Fall,2009,1,65.0
CS-190,Spring,2009,2,43.0
CS-347,Fall,2009,1,67.0
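
The difference between `WHERE` and `HAVING` is when the filter runs: `WHERE` removes rows before grouping, `HAVING` removes whole groups afterwards. A minimal sketch, assuming the twelve instructor rows from this notebook loaded into an in-memory SQLite database:

```python
import sqlite3

INSTRUCTORS = [
    ("10101", "Srinivasan", "Comp. Sci.", 65000),
    ("12121", "Wu", "Finance", 90000),
    ("15151", "Mozart", "Music", 40000),
    ("22222", "Einstein", "Physics", 95000),
    ("32343", "El Said", "History", 60000),
    ("33456", "Gold", "Physics", 87000),
    ("45565", "Katz", "Comp. Sci.", 75000),
    ("58583", "Califieri", "History", 62000),
    ("76543", "Singh", "Finance", 80000),
    ("76766", "Crick", "Biology", 72000),
    ("83821", "Brandt", "Comp. Sci.", 92000),
    ("98345", "Kim", "Elec. Eng.", 80000),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE instructor (id TEXT PRIMARY KEY, name TEXT, "
    "dept_name TEXT, salary REAL)"
)
conn.executemany("INSERT INTO instructor VALUES (?, ?, ?, ?)", INSTRUCTORS)

# HAVING only: every instructor is grouped, then groups are filtered.
high_avg_depts = [
    row[0]
    for row in conn.execute(
        """
        SELECT dept_name
            FROM instructor
            GROUP BY dept_name
            HAVING AVG(salary) > 42000
            ORDER BY dept_name
        """
    )
]

# WHERE then HAVING: drop salaries of 70000 or less first, then keep the
# departments that still have at least two instructors left.
well_paid_pairs = conn.execute(
    """
    SELECT dept_name, COUNT(*)
        FROM instructor
        WHERE salary > 70000
        GROUP BY dept_name
        HAVING COUNT(*) >= 2
        ORDER BY dept_name
    """
).fetchall()

print(high_avg_depts)
print(well_paid_pairs)
```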


## (3.8) Nested subqueries

In [57]:
%%sql
SELECT course_id FROM section WHERE semester = 'Fall' and year=2009

 * postgresql://postgres:***@localhost/university
3 rows affected.


course_id
CS-101
CS-347
PHY-101


In [58]:
%%sql
SELECT course_id FROM section WHERE semester = 'Spring' and year=2009

 * postgresql://postgres:***@localhost/university
3 rows affected.


course_id
CS-190
CS-190
EE-181


In [59]:
%%sql
-- Find all the courses in either fall 2009 or spring 2009, using nested subquery.
SELECT course_id
    FROM section
    WHERE semester = 'Fall' AND year=2009
        OR course_id IN
            -- Nested query.
            (SELECT course_id FROM section
                WHERE semester = 'Spring' AND year=2009)

 * postgresql://postgres:***@localhost/university
6 rows affected.


course_id
CS-101
CS-190
CS-190
CS-347
EE-181
PHY-101


In [60]:
%%sql
-- Find all the instructors that are not Mozart or Einstein.
SELECT DISTINCT name
    FROM instructor
    WHERE name NOT IN ('Mozart', 'Einstein');

 * postgresql://postgres:***@localhost/university
10 rows affected.


name
Katz
Singh
Kim
Brandt
El Said
Wu
Srinivasan
Crick
Gold
Califieri


In [61]:
%%sql
-- Find the depts with an average salary per instructor larger than $42k.
-- This is an alternative query to the HAVING query.
SELECT tmp.dept_name, tmp.avg_salary
    FROM 
        (SELECT dept_name, AVG(salary) AS avg_salary
          FROM instructor
          GROUP BY dept_name) AS tmp
    WHERE avg_salary > 42000

 * postgresql://postgres:***@localhost/university
6 rows affected.


dept_name,avg_salary
Finance,85000.0
History,61000.0
Physics,91000.0
Comp. Sci.,77333.33333333333
Biology,72000.0
Elec. Eng.,80000.0


In many cases you might find it easier to create temporary tables, especially for queries that involve finding a "max" or "min". This also lets you break down the full query and makes it easier to debug. It is preferable to use the WITH construct for this purpose. The syntax and support differ across systems, but here is the link to the PostgreSQL documentation: http://www.postgresql.org/docs/9.0/static/queries-with.html

These are also called Common Table Expressions (CTEs).

In [62]:
%%sql
-- Find department with the maximum budget.
WITH max_budget(value) as (
        SELECT MAX(budget) FROM department)
    SELECT department.dept_name, budget
        FROM department, max_budget
        WHERE department.budget = max_budget.value

 * postgresql://postgres:***@localhost/university
1 rows affected.


dept_name,budget
Finance,120000.0
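
CTEs can also be chained, with a later one reading from an earlier one, which is where the "break down the full query" advantage shows. The sketch below is an illustration under assumptions: it uses an in-memory SQLite database (which supports `WITH` too) and a subset of the instructor rows from this notebook to find the department with the highest average salary in two named steps.

```python
import sqlite3

# Subset of the instructor rows shown in this notebook.
INSTRUCTORS = [
    ("10101", "Srinivasan", "Comp. Sci.", 65000),
    ("12121", "Wu", "Finance", 90000),
    ("15151", "Mozart", "Music", 40000),
    ("22222", "Einstein", "Physics", 95000),
    ("33456", "Gold", "Physics", 87000),
    ("45565", "Katz", "Comp. Sci.", 75000),
    ("76543", "Singh", "Finance", 80000),
    ("83821", "Brandt", "Comp. Sci.", 92000),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE instructor (id TEXT PRIMARY KEY, name TEXT, "
    "dept_name TEXT, salary REAL)"
)
conn.executemany("INSERT INTO instructor VALUES (?, ?, ?, ?)", INSTRUCTORS)

highest_avg = conn.execute(
    """
    WITH dept_avg(dept_name, avg_salary) AS (
            -- Step 1: one row per department with its average salary.
            SELECT dept_name, AVG(salary) FROM instructor GROUP BY dept_name),
         top_avg(value) AS (
            -- Step 2: the largest of those averages.
            SELECT MAX(avg_salary) FROM dept_avg)
    SELECT dept_avg.dept_name, dept_avg.avg_salary
        FROM dept_avg, top_avg
        WHERE dept_avg.avg_salary = top_avg.value
    """
).fetchall()

print(highest_avg)
```

Each step can be debugged on its own by temporarily replacing the final `SELECT` with `SELECT * FROM dept_avg`.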


## (3.9) Modification of the DB

# Other queries

In [63]:
%%sql
SELECT * FROM course;

 * postgresql://postgres:***@localhost/university
13 rows affected.


course_id,title,dept_name,credits
BIO-101,Intro. to Biology,Biology,4
BIO-301,Genetics,Biology,4
BIO-399,Computational Biology,Biology,3
CS-101,Intro. to Computer Science,Comp. Sci.,4
CS-190,Game Design,Comp. Sci.,4
CS-315,Robotics,Comp. Sci.,3
CS-319,Image Processing,Comp. Sci.,3
CS-347,Database System Concepts,Comp. Sci.,3
EE-181,Intro. to Digital Systems,Elec. Eng.,3
FIN-201,Investment Banking,Finance,3


In [64]:
%%sql
-- Reports the courses with titles containing Biology.
SELECT *
    FROM course
    WHERE title LIKE '%Biology%';

 * postgresql://postgres:***@localhost/university
2 rows affected.


course_id,title,dept_name,credits
BIO-101,Intro. to Biology,Biology,4
BIO-399,Computational Biology,Biology,3


In [65]:
%%sql
-- There are two such courses. How many students have ever enrolled in the first one?
SELECT *
    FROM takes
    WHERE course_id = 'BIO-101';

 * postgresql://postgres:***@localhost/university
1 rows affected.


id,course_id,sec_id,semester,year,grade
98988,BIO-101,1,Summer,2009,A


In [66]:
%%sql
-- What about in Summer 2009?
SELECT *
    FROM takes
    WHERE course_id = 'BIO-101' AND year = 2009 AND semester = 'Summer';

 * postgresql://postgres:***@localhost/university
1 rows affected.


id,course_id,sec_id,semester,year,grade
98988,BIO-101,1,Summer,2009,A


### Aggregates


In [67]:
%%sql
--  Count the number of instructors in Finance.
SELECT COUNT(*)
    FROM instructor WHERE dept_name = 'Finance';

 * postgresql://postgres:***@localhost/university
1 rows affected.


count
2


In [68]:
%%sql
-- Find the instructor with the maximum salary using subquery.
SELECT *
    FROM instructor
    WHERE salary =
        (SELECT MAX(salary) FROM instructor);

 * postgresql://postgres:***@localhost/university
1 rows affected.


id,name,dept_name,salary
22222,Einstein,Physics,95000.0


### (3.3.2) Joins and Cartesian Product

In [69]:
%%sql
-- To find building names for all instructors, we must do a join between two relations.
SELECT name, instructor.dept_name, building
    FROM instructor, department
    WHERE instructor.dept_name = department.dept_name;

 * postgresql://postgres:***@localhost/university
12 rows affected.


name,dept_name,building
Srinivasan,Comp. Sci.,Taylor
Wu,Finance,Painter
Mozart,Music,Packard
Einstein,Physics,Watson
El Said,History,Painter
Gold,Physics,Watson
Katz,Comp. Sci.,Taylor
Califieri,History,Painter
Singh,Finance,Painter
Crick,Biology,Watson


In [70]:
%%sql 
-- Since this is an equality join on the attribute common to the two relations, we can use NATURAL JOIN:
SELECT name, instructor.dept_name, building
    FROM instructor NATURAL JOIN department;

 * postgresql://postgres:***@localhost/university
12 rows affected.


name,dept_name,building
Srinivasan,Comp. Sci.,Taylor
Wu,Finance,Painter
Mozart,Music,Packard
Einstein,Physics,Watson
El Said,History,Painter
Gold,Physics,Watson
Katz,Comp. Sci.,Taylor
Califieri,History,Painter
Singh,Finance,Painter
Crick,Biology,Watson


In [71]:
%%sql
-- On the other hand, just doing the following (i.e., just the Cartesian Product) will lead to a large number of tuples, most
-- of which are not meaningful.
SELECT name, instructor.dept_name, building
    FROM instructor, department;

 * postgresql://postgres:***@localhost/university
84 rows affected.


name,dept_name,building
Srinivasan,Comp. Sci.,Watson
Wu,Finance,Watson
Mozart,Music,Watson
Einstein,Physics,Watson
El Said,History,Watson
Gold,Physics,Watson
Katz,Comp. Sci.,Watson
Califieri,History,Watson
Singh,Finance,Watson
Crick,Biology,Watson
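
The row counts follow directly from the table sizes: 12 instructors times 7 departments gives 84 pairs, of which only the 12 with matching `dept_name` are meaningful. A minimal sketch that checks this arithmetic, assuming an in-memory SQLite database loaded with the instructor and department rows from this notebook:

```python
import sqlite3

INSTRUCTORS = [
    ("10101", "Srinivasan", "Comp. Sci.", 65000),
    ("12121", "Wu", "Finance", 90000),
    ("15151", "Mozart", "Music", 40000),
    ("22222", "Einstein", "Physics", 95000),
    ("32343", "El Said", "History", 60000),
    ("33456", "Gold", "Physics", 87000),
    ("45565", "Katz", "Comp. Sci.", 75000),
    ("58583", "Califieri", "History", 62000),
    ("76543", "Singh", "Finance", 80000),
    ("76766", "Crick", "Biology", 72000),
    ("83821", "Brandt", "Comp. Sci.", 92000),
    ("98345", "Kim", "Elec. Eng.", 80000),
]
DEPARTMENTS = [
    ("Biology", "Watson", 90000),
    ("Comp. Sci.", "Taylor", 100000),
    ("Elec. Eng.", "Taylor", 85000),
    ("Finance", "Painter", 120000),
    ("History", "Painter", 50000),
    ("Music", "Packard", 80000),
    ("Physics", "Watson", 70000),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE instructor (id TEXT PRIMARY KEY, name TEXT, "
    "dept_name TEXT, salary REAL)"
)
conn.execute(
    "CREATE TABLE department (dept_name TEXT PRIMARY KEY, building TEXT, "
    "budget NUMERIC)"
)
conn.executemany("INSERT INTO instructor VALUES (?, ?, ?, ?)", INSTRUCTORS)
conn.executemany("INSERT INTO department VALUES (?, ?, ?)", DEPARTMENTS)


def count(sql):
    """Hypothetical helper: return the single COUNT(*) value of a query."""
    return conn.execute(sql).fetchone()[0]


product_rows = count("SELECT COUNT(*) FROM instructor, department")
join_rows = count(
    "SELECT COUNT(*) FROM instructor, department "
    "WHERE instructor.dept_name = department.dept_name"
)
natural_rows = count("SELECT COUNT(*) FROM instructor NATURAL JOIN department")

print(product_rows, join_rows, natural_rows)
```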


### Renaming using "as"

In [72]:
%%sql
-- AS can be used to rename tables and simplify queries.
EXPLAIN
    -- ANALYZE
    SELECT DISTINCT T.name
        FROM instructor AS T, instructor AS S  
        WHERE T.salary > S.salary AND S.dept_name = 'Biology';

 * postgresql://postgres:***@localhost/university
8 rows affected.


QUERY PLAN
HashAggregate (cost=43.84..45.84 rows=200 width=58)
Group Key: t.name
-> Nested Loop (cost=0.00..43.10 rows=293 width=58)
Join Filter: (t.salary > s.salary)
-> Seq Scan on instructor t (cost=0.00..14.40 rows=440 width=72)
-> Materialize (cost=0.00..15.51 rows=2 width=14)
-> Seq Scan on instructor s (cost=0.00..15.50 rows=2 width=14)
Filter: ((dept_name)::text = 'Biology'::text)


**Self-joins** (where two of the relations in the FROM clause are the same) are impossible without renaming at least one of them, with or without the optional `AS` keyword. The following query associates a course with the prerequisite of one of its prerequisites. There is no way to disambiguate the columns without some form of renaming.

In [73]:
%%sql
EXPLAIN 
    ANALYZE
        SELECT p1.course_id, p2.prereq_id AS pre_prereq_id
            FROM prereq p1, prereq p2
            WHERE p1.prereq_id = p2.course_id;

 * postgresql://postgres:***@localhost/university
8 rows affected.


QUERY PLAN
Hash Join (cost=29.12..176.18 rows=3612 width=68) (actual time=0.020..0.024 rows=4 loops=1)
Hash Cond: ((p1.prereq_id)::text = (p2.course_id)::text)
-> Seq Scan on prereq p1 (cost=0.00..18.50 rows=850 width=68) (actual time=0.005..0.006 rows=8 loops=1)
-> Hash (cost=18.50..18.50 rows=850 width=68) (actual time=0.009..0.009 rows=8 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Seq Scan on prereq p2 (cost=0.00..18.50 rows=850 width=68) (actual time=0.002..0.004 rows=8 loops=1)
Planning Time: 0.192 ms
Execution Time: 0.041 ms


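The small University database doesn't have any chains of this kind out of the box. You can create one by inserting a new tuple into `prereq`; the query will then return an answer. If the tuple was already inserted in an earlier run, the insert fails with a unique-constraint violation, as in the output below.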

In [74]:
%sql insert into prereq values ('CS-101', 'PHY-101');

 * postgresql://postgres:***@localhost/university
(psycopg2.errors.UniqueViolation) duplicate key value violates unique constraint "prereq_pkey"
DETAIL:  Key (course_id, prereq_id)=(CS-101, PHY-101) already exists.

[SQL: insert into prereq values ('CS-101', 'PHY-101' );]
(Background on this error at: https://sqlalche.me/e/14/gkpj)


In [75]:
%%sql
SELECT p1.course_id, p2.prereq_id AS pre_prereq_id
    FROM prereq p1, prereq p2
    WHERE p1.prereq_id = p2.course_id;

 * postgresql://postgres:***@localhost/university
4 rows affected.


course_id,pre_prereq_id
CS-190,PHY-101
CS-315,PHY-101
CS-319,PHY-101
CS-347,PHY-101
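
The effect of the extra tuple can be reproduced in isolation. The sketch below is an illustration under assumptions: it uses an in-memory SQLite database, and the four starting `prereq` pairs are an assumed subset chosen to be consistent with the output above, not a dump of the real table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE prereq (course_id TEXT, prereq_id TEXT, "
    "PRIMARY KEY (course_id, prereq_id))"
)
# Assumed starting rows: four courses that all require CS-101.
conn.executemany(
    "INSERT INTO prereq VALUES (?, ?)",
    [
        ("CS-190", "CS-101"),
        ("CS-315", "CS-101"),
        ("CS-319", "CS-101"),
        ("CS-347", "CS-101"),
    ],
)

CHAIN_QUERY = """
    SELECT p1.course_id, p2.prereq_id AS pre_prereq_id
        FROM prereq p1, prereq p2
        WHERE p1.prereq_id = p2.course_id
        ORDER BY p1.course_id
"""

# No course's prerequisite has a prerequisite of its own yet.
before = conn.execute(CHAIN_QUERY).fetchall()

# Give CS-101 a prerequisite, which creates four two-step chains.
conn.execute("INSERT INTO prereq VALUES ('CS-101', 'PHY-101')")
after = conn.execute(CHAIN_QUERY).fetchall()

# Inserting the same pair again violates the primary key.
try:
    conn.execute("INSERT INTO prereq VALUES ('CS-101', 'PHY-101')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(before)
print(after)
print(duplicate_rejected)
```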


### LIMIT
PostgreSQL allows you to limit the number of rows returned, which
is useful for debugging. Here is an example.

In [76]:
%sql SELECT * FROM instructor limit 2;

 * postgresql://postgres:***@localhost/university
2 rows affected.


id,name,dept_name,salary
10101,Srinivasan,Comp. Sci.,65000.0
12121,Wu,Finance,90000.0
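
Without an `ORDER BY`, the database is free to return any rows that satisfy the `LIMIT`, so pairing the two gives reproducible results. A small sketch, assuming an in-memory SQLite database with a subset of the instructor rows from this notebook; `OFFSET` is shown as well since it is the usual companion for paging.

```python
import sqlite3

# Subset of the instructor rows shown in this notebook.
INSTRUCTORS = [
    ("12121", "Wu", "Finance", 90000),
    ("15151", "Mozart", "Music", 40000),
    ("22222", "Einstein", "Physics", 95000),
    ("33456", "Gold", "Physics", 87000),
    ("83821", "Brandt", "Comp. Sci.", 92000),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE instructor (id TEXT PRIMARY KEY, name TEXT, "
    "dept_name TEXT, salary REAL)"
)
conn.executemany("INSERT INTO instructor VALUES (?, ?, ?, ?)", INSTRUCTORS)

# The two highest-paid instructors: sort first, then cut.
top_two = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM instructor ORDER BY salary DESC, name LIMIT 2"
    )
]

# The next two: skip the first two rows of the same ordering.
next_two = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM instructor ORDER BY salary DESC, name LIMIT 2 OFFSET 2"
    )
]

print(top_two)
print(next_two)
```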


### Try your own queries
Feel free to use the cells below to write new queries. You can also just modify the above queries directly if you'd like.