# Some more SQL, and Database modeling

In [1]:
%load_ext sql

In [2]:
%sql sqlite:///lect04.sqlite

u'Connected: @lect04.sqlite'

## Two queries from the last lecture

Last time we solved the following two problems:

**Exercise:** Show the average `size_hs` for applications to
the different colleges, order by descending size.

We solved the problem using a view:

In [3]:
%%sql
DROP VIEW IF EXISTS student_applications;
CREATE VIEW student_applications(s_id, c_name) AS
    SELECT DISTINCT s_id, c_name
    FROM            applications

 * sqlite:///lect04.sqlite
Done.
Done.


[]

and used the view in a simple `SELECT` statement:

In [4]:
%%sql
SELECT   c_name, avg(size_hs)
FROM     student_applications
JOIN     students
USING    (s_id)
GROUP BY c_name
ORDER BY avg(size_hs) DESC

 * sqlite:///lect04.sqlite
Done.


c_name,avg(size_hs)
Berkeley,1100.0
Cornell,1000.0
MIT,966.666666667
Stanford,780.0


Now, let's try to solve the problem with a subquery instead of a view:

In [6]:
%%sql
SELECT   c_name, avg(size_hs)
FROM     (SELECT DISTINCT s_id, c_name
          FROM            applications)
JOIN     students
USING    (s_id)
GROUP BY c_name
ORDER BY avg(size_hs) DESC

 * sqlite:///lect04.sqlite
Done.


c_name,avg(size_hs)
Berkeley,1100.0
Cornell,1000.0
MIT,966.666666667
Stanford,780.0
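
To check that the view and the subquery really are interchangeable, here is a small Python sketch using the standard `sqlite3` module and an in-memory database. The rows are made up for illustration (they are not the lecture data). The point is only that both formulations return the same result, and that `DISTINCT` keeps a student who applies for two majors at the same college from being counted twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE students(s_id INTEGER PRIMARY KEY, size_hs INT);
    CREATE TABLE applications(s_id INT, c_name TEXT, major TEXT);

    -- made-up rows, not the lecture data
    INSERT INTO students VALUES (1, 1000), (2, 500);
    INSERT INTO applications VALUES
        (1, 'MIT', 'CS'), (1, 'MIT', 'EE'),
        (2, 'MIT', 'CS'), (2, 'Cornell', 'CS');

    CREATE VIEW student_applications(s_id, c_name) AS
        SELECT DISTINCT s_id, c_name
        FROM            applications;
""")

# Formulation 1: go through the view.
via_view = conn.execute("""
    SELECT   c_name, avg(size_hs)
    FROM     student_applications
    JOIN     students USING (s_id)
    GROUP BY c_name
    ORDER BY avg(size_hs) DESC
""").fetchall()

# Formulation 2: inline the view's definition as a subquery.
via_subquery = conn.execute("""
    SELECT   c_name, avg(size_hs)
    FROM     (SELECT DISTINCT s_id, c_name FROM applications)
    JOIN     students USING (s_id)
    GROUP BY c_name
    ORDER BY avg(size_hs) DESC
""").fetchall()

print(via_view)
print(via_view == via_subquery)
```

Student 1 applies to MIT twice, but contributes only once to MIT's average, so MIT ends up with (1000 + 500) / 2 = 750.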


**Exercise:** (from last time) _Who has shared the chemistry prize
with exactly one other laureate in years when the summer olympics were
held in Europe?_ Solve the problem using several CTEs.

We came up with the following solution:

In [8]:
%%sql
WITH
  european_summer_olympics(year) AS (
      SELECT year
      FROM   olympics
      WHERE  season = 'summer' AND continent = 'Europe'
  ),
  chemistry_laureates(name, year) AS (
      SELECT name, year
      FROM   nobel
      WHERE  category = 'chemistry'
  ),
  years_with_two_chemistry_laureates(year) AS (
      SELECT year
      FROM   chemistry_laureates
      GROUP BY year
      HAVING count() = 2
  ),
  relevant_years(year) AS (
      SELECT year
      FROM   european_summer_olympics
      
      INTERSECT
      
      SELECT year
      FROM   years_with_two_chemistry_laureates
  )
SELECT year, name
FROM   chemistry_laureates
WHERE  year IN (
    SELECT year
    FROM   relevant_years
)

 * sqlite:///lect04.sqlite
Done.


year,name
1912,Paul Sabatier
1912,Victor Grignard
1952,Archer John Porter Martin
1952,Richard Laurence Millington Synge
2012,Brian K. Kobilka
2012,Robert J. Lefkowitz


Now, try to replace the subquery with something else:

In [7]:
%%sql
WITH
  european_summer_olympics(year) AS (
      SELECT year
      FROM   olympics
      WHERE  season = 'summer' AND continent = 'Europe'
  ),
  chemistry_laureates(name, year) AS (
      SELECT name, year
      FROM   nobel
      WHERE  category = 'chemistry'
  ),
  years_with_two_chemistry_laureates(year) AS (
      SELECT year
      FROM   chemistry_laureates
      GROUP BY year
      HAVING count() = 2
  ),
  relevant_years(year) AS (
      SELECT year
      FROM   european_summer_olympics
      
      INTERSECT
      
      SELECT year
      FROM   years_with_two_chemistry_laureates
  )
SELECT year, name
FROM   chemistry_laureates
JOIN relevant_years
USING (year)


 * sqlite:///lect04.sqlite
Done.


year,name
1912,Paul Sabatier
1912,Victor Grignard
1952,Archer John Porter Martin
1952,Richard Laurence Millington Synge
2012,Brian K. Kobilka
2012,Robert J. Lefkowitz


## Defining a database in SQL

As we saw in the slides, we have a number of different
primitive types at our disposal -- exactly which depends on
what database we're using. These are some common types:

- integers: `INT`, `INTEGER`
- real numbers: `REAL`, `DECIMAL(p,s)`
- strings: `TEXT`, `CHAR(n)`, `VARCHAR(n)`
- date/time: `DATE`, `TIME`, `TIMESTAMP`

Some databases also have `BLOB` (binary large object), and
other types, but we won't discuss them in this course.

To create a table, we use the `CREATE TABLE` statement. To
create the `students` table we've seen above, we write:

In [9]:
%%sql
DROP TABLE IF EXISTS students;

CREATE TABLE students(
  s_id     INT,
  s_name   TEXT,
  gpa      REAL,
  size_hs  INT
);

 * sqlite:///lect04.sqlite
Done.
Done.


[]

We want `s_id` to be the primary key of this table, and
there are two ways to do that:

In [10]:
%%sql
DROP TABLE IF EXISTS students;

CREATE TABLE students(
  s_id     INT PRIMARY KEY,
  s_name   TEXT,
  gpa      REAL,
  size_hs  INT
);

 * sqlite:///lect04.sqlite
Done.
Done.


[]

or

In [11]:
%%sql
DROP TABLE IF EXISTS students;

CREATE TABLE students(
  s_id     INT,
  s_name   TEXT,
  gpa      REAL,
  size_hs  INT,
  PRIMARY KEY (s_id)
);

 * sqlite:///lect04.sqlite
Done.
Done.


[]

The latter form also works when we want more than one column in our key.

If we change the type of `s_id` to `INTEGER`, we can make
SQLite3 generate unique values of `s_id` for us when we
insert new students -- more about that below.


Often we make sure there is no table with the same name
before we create a table:

In [None]:
%%sql
DROP TABLE IF EXISTS students;

CREATE TABLE students ...

For our `applications` table, we want a boolean attribute
`decision`, but not all databases have a `BOOLEAN` type, and
then we often use an `INT` instead (0 = false, 1 = true).

In [None]:
%%sql
DROP TABLE IF EXISTS applications;

CREATE TABLE applications(
  s_id      INT,
  c_name    TEXT,
  major     TEXT,
  decision  INT     -- 0 = false, 1 = true
);

As in the example we've used before, we can use a character
instead:

In [None]:
%%sql
DROP TABLE IF EXISTS applications;

CREATE TABLE applications(
  s_id      INT,
  c_name    TEXT,
  major     TEXT,
  decision  CHAR(1)    -- 'Y' or 'N'
);

In this table, we want `s_id` to be a value in the
`students` table, and `c_name` to be a value in the
`colleges` table, and we make sure they are by marking them
as `FOREIGN KEY`:

In [None]:
%%sql
DROP TABLE IF EXISTS applications;

CREATE TABLE applications(
  s_id        INT,
  c_name      TEXT,
  major       TEXT,
  decision    CHAR(1),    -- 'Y' or 'N'
  FOREIGN KEY (s_id) REFERENCES students(s_id),
  FOREIGN KEY (c_name) REFERENCES colleges(c_name)
);
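
One caveat worth checking for ourselves: in most builds, SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON` has been issued on the connection. The sketch below uses Python's standard `sqlite3` module and an in-memory database with a cut-down version of our tables; the rows (including the non-existent student 999) are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Enforcement is normally off by default, so switch it on first.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
    CREATE TABLE students(s_id INT PRIMARY KEY, s_name TEXT);
    CREATE TABLE colleges(c_name TEXT PRIMARY KEY);
    CREATE TABLE applications(
      s_id        INT,
      c_name      TEXT,
      major       TEXT,
      decision    CHAR(1),    -- 'Y' or 'N'
      FOREIGN KEY (s_id) REFERENCES students(s_id),
      FOREIGN KEY (c_name) REFERENCES colleges(c_name)
    );
    INSERT INTO students VALUES (123, 'Amy');
    INSERT INTO colleges VALUES ('MIT');
""")

# A valid application: both the student and the college exist.
conn.execute("INSERT INTO applications VALUES (123, 'MIT', 'CS', 'N')")

# An application for a student that does not exist should be refused.
try:
    conn.execute("INSERT INTO applications VALUES (999, 'MIT', 'CS', 'N')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

n_rows = conn.execute("SELECT count(*) FROM applications").fetchone()[0]
print(rejected, n_rows)
```

Without the `PRAGMA`, the second `INSERT` would typically be accepted, which is why the sketch turns enforcement on before creating any tables.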

In its current state, `applications` is a weak entity set. We
can define (`s_id`, `c_name`, `major`) as a key like this:

In [None]:
%%sql
DROP TABLE IF EXISTS applications;

CREATE TABLE applications(
  s_id        INT,
  c_name      TEXT,
  major       TEXT,
  decision    CHAR(1),    -- 'Y' or 'N'
  PRIMARY KEY (s_id, c_name, major),
  FOREIGN KEY (s_id) REFERENCES students(s_id),
  FOREIGN KEY (c_name) REFERENCES colleges(c_name)
);

This ensures that we will not have duplicate applications
(i.e., one student applies to the same college/major more
than one time). Another way of doing that is to use the
`UNIQUE` keyword:

In [None]:
%%sql
DROP TABLE IF EXISTS applications;

CREATE TABLE applications(
  s_id      INT,
  c_name    TEXT,
  major     TEXT,
  decision  CHAR(1),    -- 'Y' or 'N'
  UNIQUE (s_id, c_name, major),
  FOREIGN KEY (s_id) REFERENCES students(s_id),
  FOREIGN KEY (c_name) REFERENCES colleges(c_name)
);
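
To see the constraint at work, here is a minimal Python sketch (standard `sqlite3` module, in-memory database, foreign keys left out to keep it short). It tries to insert the same (`s_id`, `c_name`, `major`) combination twice; the rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE applications(
      s_id      INT,
      c_name    TEXT,
      major     TEXT,
      decision  CHAR(1),    -- 'Y' or 'N'
      UNIQUE (s_id, c_name, major)
    )
""")

conn.execute("INSERT INTO applications VALUES (123, 'MIT', 'CS', 'N')")
# Same student and college, but another major: this is a different application.
conn.execute("INSERT INTO applications VALUES (123, 'MIT', 'EE', 'N')")

# Same student, college and major again: this violates the UNIQUE constraint.
try:
    conn.execute("INSERT INTO applications VALUES (123, 'MIT', 'CS', 'Y')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT count(*) FROM applications").fetchone()[0]
print(duplicate_rejected, row_count)
```

Replacing `UNIQUE (...)` with `PRIMARY KEY (...)` should reject the duplicate in the same way.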

**Exercise:** _Write SQL statements to create the tables in
our database for books and authors (assume that books can be
written by several authors)._

In [13]:
%%sql

DROP TABLE IF EXISTS books;

CREATE TABLE books(
  isbn      INT PRIMARY KEY,
  author    TEXT    -- only one author per isbn; several authors need a separate table
);


 * sqlite:///lect04.sqlite
Done.
Done.


[]
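
Since a book can have several authors (and an author can write several books), one possible design uses three tables: one for books, one for authors, and one that pairs them up. The sketch below is only one way to do it, written in Python with the standard `sqlite3` module. The table name `authorships`, the columns `title`, `author_id` and `name`, and all the rows are assumptions made up for illustration, not part of the course database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE books(
      isbn       TEXT,
      title      TEXT,
      PRIMARY KEY (isbn)
    );

    CREATE TABLE authors(
      author_id  INTEGER,     -- invented key, generated by SQLite
      name       TEXT,
      PRIMARY KEY (author_id)
    );

    -- One row per (book, author) pair.
    CREATE TABLE authorships(
      isbn       TEXT,
      author_id  INTEGER,
      PRIMARY KEY (isbn, author_id),
      FOREIGN KEY (isbn) REFERENCES books(isbn),
      FOREIGN KEY (author_id) REFERENCES authors(author_id)
    );

    -- placeholder rows, purely for illustration
    INSERT INTO books VALUES ('000-0', 'A Placeholder Title');
    INSERT INTO authors(name) VALUES ('First Author'), ('Second Author');
    INSERT INTO authorships VALUES ('000-0', 1), ('000-0', 2);
""")

names = [row[0] for row in conn.execute("""
    SELECT   name
    FROM     authorships
    JOIN     authors USING (author_id)
    WHERE    isbn = '000-0'
    ORDER BY name
""")]
print(names)
```

The composite primary key on `authorships` plays the same role as the key on `applications` above: it prevents the same author from being listed twice for the same book.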


## Inserting, updating, and removing rows in SQL

Let's first set things up:

In [14]:
%%sql
DROP TABLE IF EXISTS students;

CREATE TABLE students(
  s_id        INTEGER,
  s_name      TEXT,
  gpa         REAL,
  size_hs     INT,
  PRIMARY KEY (s_id)
);

DROP TABLE IF EXISTS colleges;

CREATE TABLE colleges(
  c_name      TEXT,
  state       TEXT,
  enrollment  INT,
  PRIMARY KEY (c_name)
);

DROP TABLE IF EXISTS applications;

CREATE TABLE applications(
  s_id        INTEGER,
  c_name      TEXT,
  major       TEXT,
  decision    CHAR(1) DEFAULT 'N',
  PRIMARY KEY (s_id, c_name, major),
  FOREIGN KEY (s_id) REFERENCES students(s_id),
  FOREIGN KEY (c_name) REFERENCES colleges(c_name)
);

DELETE FROM students;
DELETE FROM colleges;
DELETE FROM applications;

 * sqlite:///lect04.sqlite
Done.
Done.
Done.
Done.
Done.
Done.
0 rows affected.
0 rows affected.
0 rows affected.


[]

To insert a row, we can use the `INSERT` statement:

In [None]:
%%sql
INSERT
INTO   students(s_id, s_name, gpa, size_hs)
VALUES (123, 'Amy', 3.9, 1000),
       (234, 'Bob', 3.6, 1500),
       (345, 'Craig', 3.5, 500),
       (456, 'Doris', 3.9, 1000),
       (567, 'Edward', 2.9, 2000),
       (678, 'Fay', 3.8, 200),
       (789, 'Gary', 3.4, 800),
       (987, 'Helen', 3.7, 800),
       (876, 'Irene', 3.9, 400),
       (765, 'Jay', 2.9, 1500),
       (654, 'Amy', 3.9, 1000),
       (543, 'Craig', 3.4, 2000);

Here we have an invented key (`s_id`). Since it is declared as
`INTEGER` and is the `PRIMARY KEY`, SQLite3 can generate
values for it automatically, so we don't have to provide `s_id`
ourselves:

In [None]:
%%sql
INSERT
INTO   students(s_name, gpa, size_hs)
VALUES ('Amy', 3.9, 1000),
       ('Bob', 3.6, 1500),
       (...);
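
A quick way to see which values SQLite hands out is to try it on an empty table. This is a minimal Python sketch using the standard `sqlite3` module and an in-memory database; only the first two students from above are inserted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE students(
      s_id        INTEGER,
      s_name      TEXT,
      gpa         REAL,
      size_hs     INT,
      PRIMARY KEY (s_id)
    )
""")

# No s_id is given, so SQLite has to pick one for each row.
conn.execute("""
    INSERT
    INTO   students(s_name, gpa, size_hs)
    VALUES ('Amy', 3.9, 1000),
           ('Bob', 3.6, 1500)
""")

rows = conn.execute(
    "SELECT s_id, s_name FROM students ORDER BY s_id"
).fetchall()
print(rows)
```

On an empty table the generated keys start at 1 and count upwards.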

One potential problem with doing this is that we will
probably get consecutive integers from 1 and upwards, so
anyone who gets hold of our `s_id`s could pretty easily
figure out how big our database is. It wouldn't be a major
problem in this case, but for a web shop it could be very
damaging.

Instead of generating consecutive integers as invented keys,
we can use a version 4 UUID, which is a 128-bit value with
122 random bits. Some databases have built-in support for
UUIDs, but unfortunately SQLite3 doesn't (at least not without
an extension).

As a primitive replacement for a UUID, SQLite3 provides the
`randomblob` function, which we can use like this:

In [None]:
%%sql
DROP TABLE IF EXISTS students;

CREATE TABLE students(
  s_id        TEXT DEFAULT (lower(hex(randomblob(16)))),
  s_name      TEXT,
  gpa         REAL,
  size_hs     INT,
  PRIMARY KEY (s_id)
);
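
To get a feeling for what these keys look like, we can insert a couple of rows and inspect the generated `s_id` values. The sketch uses Python's standard `sqlite3` module and an in-memory database. Note that 16 random bytes give 32 hexadecimal digits, without the version bits and dashes of a real UUID.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE students(
      s_id        TEXT DEFAULT (lower(hex(randomblob(16)))),
      s_name      TEXT,
      gpa         REAL,
      size_hs     INT,
      PRIMARY KEY (s_id)
    )
""")

# s_id is left out, so the DEFAULT expression is evaluated for each new row.
conn.execute("""
    INSERT
    INTO   students(s_name, gpa, size_hs)
    VALUES ('Amy', 3.9, 1000),
           ('Bob', 3.6, 1500)
""")

keys = [row[0] for row in conn.execute("SELECT s_id FROM students")]
for key in keys:
    print(key)  # 32 lowercase hex digits, different on every run
```

Since the keys are random, they reveal nothing about how many rows the table holds.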

Inserting into the `applications` table is a bit more
involved than inserting into `students`. We use an invented
key (`s_id`) to refer to our students in `applications`, so
we must find the `s_id` for a student before we insert an
application for them.

In the case of 'Irene' (who has a unique name), we can do it
using a special `INSERT INTO ... SELECT`-statement:

In [None]:
%%sql
INSERT
INTO   applications(s_id, c_name, major)
SELECT s_id, 'Stanford', 'CS'
FROM   students
WHERE  s_name = 'Irene';
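
The following Python sketch (standard `sqlite3` module, in-memory database, cut-down tables) runs the same kind of `INSERT INTO ... SELECT` for both 'Irene' and 'Amy'. It is only meant to illustrate why the unique name matters: the `SELECT` produces one row per matching student, so we get one new application for each of them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE students(
      s_id        INTEGER,
      s_name      TEXT,
      PRIMARY KEY (s_id)
    );
    CREATE TABLE applications(
      s_id        INTEGER,
      c_name      TEXT,
      major       TEXT,
      decision    CHAR(1) DEFAULT 'N',
      PRIMARY KEY (s_id, c_name, major)
    );
    INSERT INTO students VALUES (876, 'Irene'), (123, 'Amy'), (654, 'Amy');
""")

insert_by_name = """
    INSERT
    INTO   applications(s_id, c_name, major)
    SELECT s_id, 'Stanford', 'CS'
    FROM   students
    WHERE  s_name = ?
"""

# 'Irene' is unique, so exactly one application is inserted.
conn.execute(insert_by_name, ("Irene",))
irene_apps = conn.execute("SELECT * FROM applications").fetchall()

# There are two students named 'Amy', so both get an application.
conn.execute(insert_by_name, ("Amy",))
amy_apps = conn.execute(
    "SELECT s_id FROM applications WHERE s_id IN (123, 654) ORDER BY s_id"
).fetchall()

print(irene_apps)
print(amy_apps)
```

The `decision` column is not mentioned in the `INSERT`, so it gets its default value `'N'`.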

It works, but it's much harder than it would have been if we
had a simple natural key like the name. Since we have two
'Amy's, this is a non-starter, but with an `s_name` column in
`applications` (which our table doesn't have) we could have written:

In [None]:
%%sql
INSERT
INTO   applications(s_name, c_name, major)
VALUES ('Irene', 'Stanford', 'CS'),
       ('Irene', 'MIT', 'biology'),
       ('Irene', 'MIT', 'marine biology');

To remove rows from a table, we can use the `DELETE` statement.
To remove all of Bob's applications, we write:

In [None]:
%%sql
DELETE
FROM   applications
WHERE  s_id = 234;

If we don't have Bob's `s_id`, we can find it using a subquery
(this removes the applications of every student named 'Bob'):

In [None]:
%%sql
DELETE
FROM   applications
WHERE  s_id IN (
         SELECT s_id
         FROM   students
         WHERE  s_name = 'Bob');

We must be careful when we use `DELETE`, since the
innocent-looking

In [None]:
%%sql
DELETE
FROM   applications

will remove _everything_ from the table.
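
The difference between the two forms of `DELETE` is easy to check on a toy table. This is a small Python sketch using the standard `sqlite3` module and an in-memory database; the applications are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE students(s_id INTEGER, s_name TEXT, PRIMARY KEY (s_id));
    CREATE TABLE applications(s_id INTEGER, c_name TEXT, major TEXT);

    -- made-up rows
    INSERT INTO students VALUES (234, 'Bob'), (876, 'Irene');
    INSERT INTO applications VALUES
        (234, 'Berkeley', 'CS'),
        (234, 'MIT', 'EE'),
        (876, 'MIT', 'biology');
""")

def count_applications():
    """Return the number of rows currently in applications."""
    return conn.execute("SELECT count(*) FROM applications").fetchone()[0]

# With a WHERE clause, only Bob's two applications disappear.
conn.execute("""
    DELETE
    FROM   applications
    WHERE  s_id IN (
             SELECT s_id
             FROM   students
             WHERE  s_name = 'Bob')
""")
after_bob = count_applications()

# Without a WHERE clause, every remaining row is removed.
conn.execute("DELETE FROM applications")
after_all = count_applications()

print(after_bob, after_all)
```

A useful habit is to run the corresponding `SELECT ... WHERE ...` first and check which rows it returns before turning it into a `DELETE`.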

We can update rows in a similar vein. To set Irene's `gpa`
to 4.0, we write:

In [None]:
%%sql
UPDATE students
SET    gpa = 4.0
WHERE  s_name = 'Irene';

We can mark that her application to study biology at MIT has
been approved (this would approve applications to biology at
MIT by all students named 'Irene'):

In [None]:
%%sql
UPDATE applications
SET    decision = 'Y'
WHERE  c_name = 'MIT' AND
       major = 'biology' AND
       s_id IN (
           SELECT s_id
           FROM   students
           WHERE  s_name = 'Irene');
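
As a final check, the sketch below runs this `UPDATE` against a few made-up applications and lists the decisions afterwards. It uses Python's standard `sqlite3` module with an in-memory database and cut-down tables, so it is an illustration rather than the lecture database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE students(s_id INTEGER, s_name TEXT, PRIMARY KEY (s_id));
    CREATE TABLE applications(
      s_id        INTEGER,
      c_name      TEXT,
      major       TEXT,
      decision    CHAR(1) DEFAULT 'N',
      PRIMARY KEY (s_id, c_name, major)
    );

    -- made-up rows
    INSERT INTO students VALUES (876, 'Irene'), (123, 'Amy');
    INSERT INTO applications(s_id, c_name, major) VALUES
        (876, 'MIT', 'biology'),
        (876, 'MIT', 'marine biology'),
        (123, 'MIT', 'biology');
""")

# Approve biology at MIT for every student named 'Irene'.
conn.execute("""
    UPDATE applications
    SET    decision = 'Y'
    WHERE  c_name = 'MIT' AND
           major = 'biology' AND
           s_id IN (
               SELECT s_id
               FROM   students
               WHERE  s_name = 'Irene')
""")

decisions = conn.execute("""
    SELECT   s_id, major, decision
    FROM     applications
    ORDER BY s_id, major
""").fetchall()
print(decisions)
```

Only Irene's biology application changes; her marine biology application and Amy's biology application keep the default `'N'`.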