# Introduction to SQL with Python

## Introduction and basic syntax

### Topics

01. Introduction
    * What is SQL and why use it
    * Magic

02. Basic syntax 
    * Connecting to our database
    * Table commands : CREATE, DROP
    * Simple queries : SELECT, WHERE, AND, OR, IN, ORDER
    * INSERT, SELECT, UPDATE, DELETE
    * More syntax : operators, LIKE, NULL, NOT

03. Functions
    * AVG(), COUNT(), SUM(), GROUP BY, DISTINCT

04. Combining Tables
    * JOIN

### Introduction
#### Why use a relational database?

* relational data
* concurrent access
* largish dataset
* advanced query language

#### Different relational databases

##### Server based:

* PostgreSQL
* MySQL (MariaDB)
* Oracle
* MS SQL

##### File based:

* MS Access
* SQLite

##### Big Data:

* Google BigQuery

#### Database Normalization

> Database normalization, or simply normalization, is the process of organizing the columns (attributes) and tables (relations) of a relational database to reduce data redundancy and improve data integrity.

First normal form enforces these criteria:

* Eliminate repeating groups in individual tables.
* Create a separate table for each set of related data.
* Identify each set of related data with a primary key.


#### Customer Table

| Customer ID | First Name | Surname   | Telephone Number |
|-------------|------------|-----------|------------------|
| 123         | Robert     | Ingram    | 555-861-2025     |
| 456         | Jane       | Wright    | 555-403-1659     |
| 789         | Maria      | Fernandez | 555-808-9633     |


#### Customer Table with multiple Telephone numbers

| Customer ID | First Name | Surname   | Telephone Number          |
|-------------|------------|-----------|---------------------------|
| 123         | Robert     | Ingram    | 555-861-2025              |
| 456         | Jane       | Wright    | 555-403-1659 555-776-4100 |
| 789         | Maria      | Fernandez | 555-808-9633              |

#### Customer Name

| Customer ID | First Name | Surname   |
|-------------|------------|-----------|
| 123         | Robert     | Ingram    |
| 456         | Jane       | Wright    |
| 789         | Maria      | Fernandez |


#### Customer Telephone Number

| Customer ID | Telephone Number |
|-------------|------------------|
| 123         | 555-861-2025     |
| 456         | 555-403-1659     |
| 456         | 555-776-4100     |
| 789         | 555-808-9633     |
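
The split into a name table and a telephone table can be sketched with Python's built-in `sqlite3` module. This is only an illustration: the table names `CustomerName` and `CustomerTelephone` and the column types are assumptions made for this example, not part of the course database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table names; one row per customer, one row per telephone number.
conn.executescript("""
    CREATE TABLE CustomerName (
        CustomerID INTEGER PRIMARY KEY,
        FirstName  TEXT,
        Surname    TEXT
    );
    CREATE TABLE CustomerTelephone (
        CustomerID      INTEGER,
        TelephoneNumber TEXT
    );
""")
conn.executemany("INSERT INTO CustomerName VALUES (?, ?, ?)", [
    (123, "Robert", "Ingram"),
    (456, "Jane", "Wright"),
    (789, "Maria", "Fernandez"),
])
conn.executemany("INSERT INTO CustomerTelephone VALUES (?, ?)", [
    (123, "555-861-2025"),
    (456, "555-403-1659"),
    (456, "555-776-4100"),
    (789, "555-808-9633"),
])

# Jane Wright (456) now has two rows instead of two numbers packed in one cell.
rows = conn.execute(
    "SELECT TelephoneNumber FROM CustomerTelephone WHERE CustomerID = ?", (456,)
)
jane_numbers = sorted(row[0] for row in rows)
print(jane_numbers)  # ['555-403-1659', '555-776-4100']
```

Each telephone number sits in its own row, so adding a third number for a customer is a plain insert rather than an edit of an existing cell.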

### Magic

To simplify SQL handling we're going to use some magic.<br/>
This is only available in notebooks, so you can't use it in Python scripts.<br/>
See the Pure_python notebook for the examples without this magic.

First make sure the extension is installed.<br/>
This will generate some noise.<br/>
This only needs to be done once per Anaconda installation; comment out the command once it's done.

In [None]:
# !pip install --user ipython-sql

Load the extension so we can use the magic.<br/>
This will probably result in a warning, which can be ignored.

In [None]:
%load_ext sql

### Basic syntax
#### Syntax pointers
* Single line comment : '--'
* Multi line comment : /\* ... \*/
* Single '=' for equality
* Using %%sql allows a multi-line statement
* In statements with multiple tables, always prefix column names with the table name

#### Connecting

Let's start by creating a database in memory.<br/>That means it isn't stored on disk, and will be lost when the notebook is closed.<br/>
The output will show we're connected.

In [None]:
%sql sqlite://
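
Outside a notebook, roughly the same connection can be made with the standard-library `sqlite3` module. This is a minimal sketch; the variable names are chosen for this example only.

```python
import sqlite3

# ":memory:" asks SQLite for a database that lives only in RAM,
# just like the 'sqlite://' connection string used by the magic.
conn = sqlite3.connect(":memory:")

# A trivial query to confirm the connection works.
version = conn.execute("SELECT sqlite_version()").fetchone()[0]
print("Connected to SQLite", version)
```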

#### CREATE

We'll start by creating a table 'writer' with 4 columns : <br/>
first name, last name, year of birth and year of death (omitting the column types for now)<br/>

In [None]:
%%sql

-- This creates the table 'Writer' with its columns (without types)
CREATE TABLE Writer (
    FirstName,
    LastName,
    YearOfBirth,
    YearOfDeath
);
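
One way to check which columns a table ended up with is SQLite's `PRAGMA table_info`. The sketch below uses plain `sqlite3` and a fresh in-memory database, so it runs independently of the notebook connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Writer (
        FirstName,
        LastName,
        YearOfBirth,
        YearOfDeath
    )
""")

# PRAGMA table_info returns one row per column; index 1 holds the column name.
columns = [row[1] for row in conn.execute("PRAGMA table_info(Writer)")]
print(columns)  # ['FirstName', 'LastName', 'YearOfBirth', 'YearOfDeath']
```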

#### DROP
We can also drop tables.<br/>
Make sure you know what you're doing!

In [None]:
%sql DROP TABLE Writer;

Selecting from a non-existing table produces a "no such table" error.

In [None]:
try:
    %sql SELECT * FROM Writer;
except Exception as e:
    print(e)
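
In plain Python the same failure surfaces as `sqlite3.OperationalError`. The sketch below also shows `DROP TABLE IF EXISTS`, which does nothing when the table is already gone; it uses its own in-memory database rather than the notebook connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Writer (FirstName, LastName, YearOfBirth, YearOfDeath)")
conn.execute("DROP TABLE Writer")

try:
    conn.execute("SELECT * FROM Writer")
    message = None
except sqlite3.OperationalError as e:
    message = str(e)
print(message)  # no such table: Writer

# IF EXISTS avoids an error when the table has already been dropped.
conn.execute("DROP TABLE IF EXISTS Writer")
```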

In [None]:
%%sql

-- Recreate the table
CREATE TABLE Writer (
    FirstName,
    LastName,
    YearOfBirth,
    YearOfDeath
);

#### INSERT
Let's insert some data into the table.

In [None]:
%%sql

-- Insert one row per statement
INSERT INTO Writer VALUES
    ('William', 'Shakespeare', NULL, 1616);


In [None]:
%%sql

-- Insert multiple rows in a single statement
INSERT INTO Writer VALUES 
    ('Bertold', 'Brecht', 1898, 1956),
    ('Ernest', 'Hemingway', 1899, 1961),
    ('Oliver', 'Sacks', 1933, 2015), 
    ('Richard', 'Bird', 1943, NULL),
    ('Hans Petter', 'Langtangen', 1962, NULL), 
    ('Jan Jacob', 'Slauerhoff', 1898, 1936),
    ('William', 'Burroughs', 1914, 1987), 
    ('Ira', 'Kalet', 1944, NULL);
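
When the values come from Python variables, `?` placeholders are usually safer than pasting values into the SQL string. A minimal sketch with `sqlite3`, using a small subset of the rows above; Python's `None` is stored as `NULL`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Writer (FirstName, LastName, YearOfBirth, YearOfDeath)")

writers = [
    ("Oliver", "Sacks", 1933, 2015),
    ("Richard", "Bird", 1943, None),  # None becomes NULL
]

# One '?' per column; executemany runs the statement once per tuple.
conn.executemany("INSERT INTO Writer VALUES (?, ?, ?, ?)", writers)
conn.commit()

stored = conn.execute("SELECT * FROM Writer").fetchall()
print(len(stored))  # 2
```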

#### SELECT
Verify that the data is indeed in the database.

In [None]:
%%sql

SELECT *
FROM Writer;

#### SELECT

__*__ selects all of the columns.<br/>
Specify the column names for a subset.

In [None]:
%%sql 

SELECT FirstName, LastName 
FROM Writer;
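
In a Python script, selected columns can be read by name when the connection uses `sqlite3.Row` as its row factory. A small sketch, independent of the notebook connection:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # rows can then be indexed by column name
conn.execute("CREATE TABLE Writer (FirstName, LastName, YearOfBirth, YearOfDeath)")
conn.execute("INSERT INTO Writer VALUES ('Oliver', 'Sacks', 1933, 2015)")

row = conn.execute("SELECT FirstName, LastName FROM Writer").fetchone()
print(row.keys())  # ['FirstName', 'LastName']

full_name = f"{row['FirstName']} {row['LastName']}"
print(full_name)  # Oliver Sacks
```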

### Simple queries

#### WHERE
Suppose we only want writers with the first name William.

In [None]:
%%sql 

-- Notice the use of '=' instead of '==' 
SELECT * 
FROM Writer 
WHERE FirstName = 'William';

#### AND
Suppose we're not interested in just any William, but also want to filter on last name.

In [None]:
%%sql 

SELECT * 
FROM Writer 
WHERE FirstName = 'William' 
AND LastName = 'Shakespeare'; 

#### OR
What if we want every writer with the first name 'Bertold' or 'Oliver'?

In [None]:
%%sql

SELECT * 
FROM Writer 
WHERE FirstName = 'Bertold' 
OR FirstName = 'Oliver';

#### IN
When looking for a lot of matches, a chain of ORs quickly becomes very long.<br/>
IN checks a column against a list of values in a single condition.

In [None]:
%%sql

SELECT * 
FROM Writer 
WHERE FirstName IN ('Bertold', 'Oliver'); 
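
If the list of names lives in a Python variable, one common approach is to generate one `?` placeholder per value. This is a sketch with `sqlite3`; the helper variables `names` and `placeholders` are made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Writer (FirstName, LastName, YearOfBirth, YearOfDeath)")
conn.executemany("INSERT INTO Writer VALUES (?, ?, ?, ?)", [
    ("Bertold", "Brecht", 1898, 1956),
    ("Ernest", "Hemingway", 1899, 1961),
    ("Oliver", "Sacks", 1933, 2015),
])

names = ["Bertold", "Oliver"]

# Build "?, ?" so the values themselves never end up in the SQL text.
placeholders = ", ".join("?" for _ in names)
query = f"SELECT LastName FROM Writer WHERE FirstName IN ({placeholders})"

last_names = sorted(row[0] for row in conn.execute(query, names))
print(last_names)  # ['Brecht', 'Sacks']
```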

#### ORDER
List all the writers, ordered by their first name.

In [None]:
%%sql

SELECT * 
FROM Writer 
ORDER BY FirstName ASC; -- ASC is the default, so it is optional here

#### ORDER
Reverse order is also possible with the DESC keyword.

In [None]:
%%sql

SELECT * 
FROM Writer 
ORDER BY FirstName DESC;

#### UPDATE
Unfortunately we made a mistake: William Burroughs passed away in 1997, not 1987.<br/>

In [None]:
%%sql

-- Correct the data in a single row
UPDATE Writer 
SET YearOfDeath = 1997 
WHERE FirstName = 'William'
AND LastName = 'Burroughs';

-- Verify
SELECT * FROM Writer WHERE LastName = 'Burroughs';
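
From Python, the cursor's `rowcount` attribute reports how many rows an UPDATE touched, which is a quick sanity check that the WHERE clause matched what was intended. A minimal sketch with `sqlite3` and two sample rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Writer (FirstName, LastName, YearOfBirth, YearOfDeath)")
conn.executemany("INSERT INTO Writer VALUES (?, ?, ?, ?)", [
    ("William", "Shakespeare", None, 1616),
    ("William", "Burroughs", 1914, 1987),
])

cur = conn.execute(
    "UPDATE Writer SET YearOfDeath = ? WHERE FirstName = ? AND LastName = ?",
    (1997, "William", "Burroughs"),
)
updated = cur.rowcount
print(updated)  # 1: only Burroughs matched both conditions

year = conn.execute(
    "SELECT YearOfDeath FROM Writer WHERE LastName = 'Burroughs'"
).fetchone()[0]
print(year)  # 1997
```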

#### DELETE
It's also possible to delete rows.
We'll start with a single row.

In [None]:
%%sql 

-- Delete all writers named 'Ernest Hemingway'
DELETE FROM Writer 
WHERE FirstName = 'Ernest' 
AND LastName = 'Hemingway';

In [None]:
%sql SELECT * FROM Writer;

Delete multiple rows.

In [None]:
%%sql 

-- Delete all writers with first name 'William'
DELETE FROM Writer 
WHERE FirstName = 'William';

-- Verify
SELECT * FROM Writer;
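
A cautious habit is to run the WHERE clause as a SELECT first and only then reuse it in the DELETE. The sketch below does this with `sqlite3`; the `condition` and `params` variables are illustrative names, not part of any API.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Writer (FirstName, LastName, YearOfBirth, YearOfDeath)")
conn.executemany("INSERT INTO Writer VALUES (?, ?, ?, ?)", [
    ("William", "Shakespeare", None, 1616),
    ("William", "Burroughs", 1914, 1997),
    ("Oliver", "Sacks", 1933, 2015),
])

condition = "FirstName = ?"
params = ("William",)

# Preview: which rows would the DELETE remove?
preview = sorted(
    row[0]
    for row in conn.execute(f"SELECT LastName FROM Writer WHERE {condition}", params)
)
print(preview)  # ['Burroughs', 'Shakespeare']

# Same condition, now for real; rowcount is the number of deleted rows.
deleted = conn.execute(f"DELETE FROM Writer WHERE {condition}", params).rowcount
remaining = [row[0] for row in conn.execute("SELECT LastName FROM Writer")]
print(deleted, remaining)  # 2 ['Sacks']
```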

In [None]:
%%sql 

-- Re-insert the deleted rows
INSERT INTO Writer VALUES
    ('William', 'Shakespeare', NULL, 1616),
    ('William', 'Burroughs', 1914, 1997),
    ('Ernest', 'Hemingway', 1899, 1961);

### More syntax

#### Operators
SQL uses mostly the same comparison operators as Python (<, <=, >, >=, !=); the exception is equality, which is a single '=' in SQL.

In [None]:
%%sql

SELECT * 
FROM Writer 
WHERE YearOfDeath < 1960;

#### Operators

In [None]:
%%sql 

SELECT * 
FROM Writer 
WHERE YearOfDeath >= 1961;

#### LIKE

Use the LIKE operator in a WHERE clause to search for a specified pattern in a column.<br/>
Use '%' to match zero or more characters and '_' to match exactly one character.

In [None]:
%%sql 

-- Search for last names starting with 'slauer' (case-insensitive)
SELECT * 
FROM Writer 
WHERE LastName LIKE 'slauer%';

#### LIKE

In [None]:
%%sql 

-- Search for last names containing an 'e'
SELECT * 
FROM Writer 
WHERE LastName LIKE '%e%';
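
The two wildcards can be compared side by side in a short `sqlite3` sketch. The helper `last_names` is a made-up function for this example, and the case-insensitive matching shown is SQLite's default behaviour for ASCII characters; other databases may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Writer (FirstName, LastName)")
conn.executemany("INSERT INTO Writer VALUES (?, ?)", [
    ("Bertold", "Brecht"),
    ("Richard", "Bird"),
    ("William", "Burroughs"),
    ("Jan Jacob", "Slauerhoff"),
    ("Oliver", "Sacks"),
])

def last_names(pattern):
    """Return the sorted last names matching a LIKE pattern."""
    rows = conn.execute(
        "SELECT LastName FROM Writer WHERE LastName LIKE ?", (pattern,)
    )
    return sorted(row[0] for row in rows)

print(last_names("slauer%"))  # ['Slauerhoff']  (case-insensitive for ASCII)
print(last_names("%e%"))      # ['Brecht', 'Slauerhoff']
print(last_names("B_r%"))     # ['Bird', 'Burroughs']  ('_' is exactly one character)
```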

#### NULL
Find all the still living writers, i.e. with no year of death.

In [None]:
%%sql 

-- NULL doesn't work with '=': the comparison is never true, so no rows are returned
SELECT * 
FROM Writer 
WHERE YearOfDeath = NULL;

#### NULL

In [None]:
%%sql 

-- For NULL 'IS' is needed
SELECT * 
FROM Writer 
WHERE YearOfDeath IS NULL;
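
The reason '=' fails is that any comparison with NULL yields NULL rather than true or false. A tiny `sqlite3` sketch makes this visible; NULL arrives in Python as `None`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Evaluate three expressions directly, without any table.
result = conn.execute("SELECT NULL = NULL, NULL IS NULL, 1936 = NULL").fetchone()
print(result)  # (None, 1, None)
```

Only `IS NULL` produces a true value (1); both '=' comparisons produce NULL, which a WHERE clause treats as "not a match".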

#### NOT
Find all the writers that didn't die in 1936.

In [None]:
%%sql

-- Notice that the writers without a YearOfDeath are missing
SELECT * 
FROM Writer 
WHERE YearOfDeath != 1936;

#### NOT

In [None]:
%%sql

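-- In SQLite, IS also compares against non-NULL values, so writers without a YearOfDeath are included
SELECT * 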
FROM Writer 
WHERE NOT YearOfDeath IS 1936;
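
The difference between the two NOT queries can be checked with a three-row `sqlite3` sketch. Using `IS` or `IS NOT` with a non-NULL value is SQLite behaviour; the last query shows a more portable spelling that should work in most databases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Writer (LastName, YearOfDeath)")
conn.executemany("INSERT INTO Writer VALUES (?, ?)", [
    ("Slauerhoff", 1936),
    ("Brecht", 1956),
    ("Bird", None),
])

def names(where):
    """Return the sorted last names for a given WHERE clause."""
    rows = conn.execute(f"SELECT LastName FROM Writer WHERE {where}")
    return sorted(row[0] for row in rows)

not_equal = names("YearOfDeath != 1936")      # NULL rows are dropped
is_not = names("YearOfDeath IS NOT 1936")     # SQLite: NULL rows are kept
portable = names("YearOfDeath != 1936 OR YearOfDeath IS NULL")

print(not_equal)  # ['Brecht']
print(is_not)     # ['Bird', 'Brecht']
print(portable)   # ['Bird', 'Brecht']
```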