# Introduction to relational databases

## What is a relational database?
- Based on the relational model of data
- First described by Edgar “Ted” Codd

## Relational model
- Widely adopted
- Codd’s 12 Rules/Commandments
    - Consists of 13 rules (zero-indexed!)
    - Describe what a Relational Database Management System should adhere to in order to be considered relational

## Relational Database Management Systems
- PostgreSQL
- MySQL
- SQLite
- SQL = Structured Query Language

---
# Let's practice!

In [1]:
#!pip install sqlalchemy

# Creating a database engine in Python

## Creating a database engine
- SQLite database
    - Fast and simple
- SQLAlchemy
    - Works with many Relational Database Management Systems


```python
from sqlalchemy import create_engine
engine = create_engine('sqlite:///Northwind.sqlite')
```

## Getting table names


In [2]:
from sqlalchemy import create_engine
engine = create_engine('sqlite:///Northwind.sqlite')

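# Note: engine.table_names() was removed in SQLAlchemy 2.0;
# there, use sqlalchemy.inspect(engine).get_table_names() instead.
table_names = engine.table_names()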

print(table_names)

['Category', 'Customer', 'CustomerCustomerDemo', 'CustomerDemographic', 'Employee', 'EmployeeTerritory', 'Order', 'OrderDetail', 'Product', 'Region', 'Shipper', 'Supplier', 'Territory']
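
For readers without SQLAlchemy or the Northwind file at hand, the same idea can be sketched with Python's built-in `sqlite3` module (a swapped-in technique, not the SQLAlchemy call used above). The in-memory database and its two tables are assumptions made up for illustration; SQLite keeps its table catalog in the built-in `sqlite_master` table.

```python
import sqlite3

# In-memory database with two hypothetical tables (not the Northwind file).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Category (Id INTEGER PRIMARY KEY, CategoryName TEXT)")
con.execute("CREATE TABLE Product (Id INTEGER PRIMARY KEY, ProductName TEXT)")

# sqlite_master is SQLite's catalog; every user table has a row with type 'table'.
rows = con.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
).fetchall()
table_names = [name for (name,) in rows]
con.close()

print(table_names)
```

The result is a plain Python list of table names, comparable in shape to the list printed above.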


### Category
```python
Index(['Id', 'CategoryName', 'Description'], dtype='object')
```


### Customer
```python
Index(['Id', 'CompanyName', 'ContactName', 'ContactTitle', 'Address', 'City',
       'Region', 'PostalCode', 'Country', 'Phone', 'Fax'],
      dtype='object')
```      
      

### Employee

```python
Index(['Id', 'LastName', 'FirstName', 'Title', 'TitleOfCourtesy', 'BirthDate',
       'HireDate', 'Address', 'City', 'Region', 'PostalCode', 'Country',
       'HomePhone', 'Extension', 'Photo', 'Notes', 'ReportsTo', 'PhotoPath'],
      dtype='object')
```
      
      
### EmployeeTerritory

```python
Index(['Id', 'EmployeeId', 'TerritoryId'], dtype='object')

```


### OrderDetail

```python
Index(['Id', 'OrderId', 'ProductId', 'UnitPrice', 'Quantity', 'Discount'], dtype='object')

```


### Region

```python
Index(['Id', 'RegionDescription'], dtype='object')

```


### Shipper

```python

Index(['Id', 'CompanyName', 'Phone'], dtype='object')
```


### Territory

```python
Index(['Id', 'TerritoryDescription', 'RegionId'], dtype='object')

```


### Product

```python
Index(['Id', 'ProductName', 'SupplierId', 'CategoryId', 'QuantityPerUnit',
       'UnitPrice', 'UnitsInStock', 'UnitsOnOrder', 'ReorderLevel',
       'Discontinued'],
      dtype='object')
```
      

### Supplier

```python
Index(['Id', 'CompanyName', 'ContactName', 'ContactTitle', 'Address', 'City',
       'Region', 'PostalCode', 'Country', 'Phone', 'Fax', 'HomePage'],
      dtype='object')
```

---
# Let’s practice!

In [3]:
# Import necessary module
from sqlalchemy import create_engine

# Create engine: engine
engine = create_engine('sqlite:///Chinook.sqlite')

In [4]:
# Import necessary module
from sqlalchemy import create_engine

# Create engine: engine
engine = create_engine('sqlite:///Chinook.sqlite')

# Save the table names to a list: table_names
table_names = engine.table_names()

# Print the table names to the shell
print(table_names)

['Album', 'Artist', 'Customer', 'Employee', 'Genre', 'Invoice', 'InvoiceLine', 'MediaType', 'Playlist', 'PlaylistTrack', 'Track']


# Querying relational databases in Python

## Basic SQL query

```sql
SELECT * FROM Table_Name
```

- Returns all columns of all rows of the table
- Example:

```sql
SELECT * FROM Orders
```

- We’ll use SQLAlchemy and pandas
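
To see what `SELECT *` returns without the course's database files, here is a minimal sketch using the standard-library `sqlite3` module rather than SQLAlchemy and pandas. The `Shipper` table below is a made-up, two-column stand-in with invented rows.

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Hypothetical table and rows, used only for illustration.
con.execute("CREATE TABLE Shipper (Id INTEGER, CompanyName TEXT)")
con.executemany(
    "INSERT INTO Shipper VALUES (?, ?)",
    [(1, "Shipper A"), (2, "Shipper B")],
)

# SELECT * returns every column of every row, one tuple per row.
all_rows = con.execute("SELECT * FROM Shipper").fetchall()
con.close()

print(all_rows)
```

Each tuple carries one value per column, in the order the columns were declared.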

## Workflow of SQL querying
- Import packages and functions
- Create the database engine
- Connect to the engine
- Query the database
- Save query results to a DataFrame
- Close the connection
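
The steps above can be sketched end to end with the standard-library `sqlite3` module. This is an illustration under stated assumptions, not the SQLAlchemy workflow itself: `sqlite3` has no separate engine step, a list of tuples stands in for the DataFrame, and the `Region` rows are invented.

```python
import sqlite3  # Step 1: import packages.

# Steps 2-3: open a connection (sqlite3 has no separate engine object).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Region (Id INTEGER, RegionDescription TEXT)")
con.executemany(
    "INSERT INTO Region VALUES (?, ?)",
    [(1, "North"), (2, "South")],  # hypothetical rows
)

# Step 4: query the database.
cur = con.execute("SELECT Id, RegionDescription FROM Region ORDER BY Id")

# Step 5: save the query results (here a list of tuples, not a DataFrame).
records = cur.fetchall()

# Step 6: close the connection.
con.close()

print(records)
```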

```
['Category', 'Customer', 'CustomerCustomerDemo', 'CustomerDemographic', 'Employee', 'EmployeeTerritory', 'Order', 'OrderDetail', 'Product', 'Region', 'Shipper', 'Supplier', 'Territory']
```

## Your first SQL query

In [5]:
from sqlalchemy import create_engine
import pandas as pd

engine = create_engine('sqlite:///Northwind.sqlite')

con = engine.connect()

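# Note: SQLAlchemy 2.0 no longer accepts a plain string here;
# wrap the SQL with sqlalchemy.text(), e.g. con.execute(text("SELECT * FROM Category")).
rs = con.execute("SELECT * FROM Category")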

df = pd.DataFrame(rs.fetchall())
con.close()

## Printing your query results

In [6]:
df.head()

|  | 0 | 1 | 2 |
|---|---|---|---|
| 0 | 1 | Beverages | Soft drinks, coffees, teas, beers, and ales |
| 1 | 2 | Condiments | Sweet and savory sauces, relishes, spreads, an... |
| 2 | 3 | Confections | Desserts, candies, and sweet breads |
| 3 | 4 | Dairy Products | Cheeses |
| 4 | 5 | Grains/Cereals | Breads, crackers, pasta, and cereal |


## Set the DataFrame column names

In [7]:
from sqlalchemy import create_engine
import pandas as pd

engine = create_engine('sqlite:///Northwind.sqlite')

con = engine.connect()

rs = con.execute("SELECT * FROM Category")

df = pd.DataFrame(rs.fetchall())

df.columns = rs.keys()
con.close()

In [8]:
df.columns

Index(['Id', 'CategoryName', 'Description'], dtype='object')
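
The same labelling step exists outside SQLAlchemy. As a rough sketch with the standard-library `sqlite3` module (an assumption swapped in for `rs.keys()`), the column names are available from `cursor.description`. The two-column `Category` table here is a cut-down stand-in.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Cut-down, hypothetical version of the Category table.
con.execute("CREATE TABLE Category (Id INTEGER, CategoryName TEXT)")
con.execute("INSERT INTO Category VALUES (1, 'Beverages')")

cur = con.execute("SELECT * FROM Category")

# cursor.description holds one tuple per result column; item 0 is the column name.
column_names = [col[0] for col in cur.description]

# Pair each value with its column name so rows are no longer labelled 0, 1, 2, ...
records = [dict(zip(column_names, row)) for row in cur.fetchall()]
con.close()

print(column_names)
print(records)
```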

## Using the context manager

In [9]:
from sqlalchemy import create_engine
import pandas as pd

engine = create_engine('sqlite:///Northwind.sqlite')


with engine.connect() as con:
    rs = con.execute("SELECT * FROM Customer")
    df = pd.DataFrame(rs.fetchmany(size=5))
    df.columns = rs.keys()


In [10]:
df.head()

|  | Id | CompanyName | ContactName | ContactTitle | Address | City | Region | PostalCode | Country | Phone | Fax |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ALFKI | Alfreds Futterkiste | Maria Anders | Sales Representative | Obere Str. 57 | Berlin |  | 12209 | Germany | 030-0074321 | 030-0076545 |
| 1 | ANATR | Ana Trujillo Emparedados y helados | Ana Trujillo | Owner | Avda. de la Constitución 2222 | México D.F. |  | 05021 | Mexico | (5) 555-4729 | (5) 555-3745 |
| 2 | ANTON | Antonio Moreno Taquería | Antonio Moreno | Owner | Mataderos 2312 | México D.F. |  | 05023 | Mexico | (5) 555-3932 |  |
| 3 | AROUT | Around the Horn | Thomas Hardy | Sales Representative | 120 Hanover Sq. | London |  | WA1 1DP | UK | (171) 555-7788 | (171) 555-6750 |
| 4 | BERGS | Berglunds snabbköp | Christina Berglund | Order Administrator | Berguvsvägen 8 | Luleå |  | S-958 22 | Sweden | 0921-12 34 65 | 0921-12 34 67 |
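
A small sketch of the two ideas in this cell, using the standard-library `sqlite3` module in place of SQLAlchemy: a context manager that closes the connection automatically, and `fetchmany(size=...)` to pull only a few rows. The `Numbers` table is invented for illustration. With `sqlite3`, the connection's own `with` block manages transactions rather than closing, so `contextlib.closing` is used here.

```python
import sqlite3
from contextlib import closing

# closing() guarantees con.close() runs when the block exits, even on error.
with closing(sqlite3.connect(":memory:")) as con:
    con.execute("CREATE TABLE Numbers (n INTEGER)")  # hypothetical table
    con.executemany("INSERT INTO Numbers VALUES (?)", [(i,) for i in range(10)])

    cur = con.execute("SELECT n FROM Numbers ORDER BY n")
    first_three = cur.fetchmany(size=3)  # at most 3 rows
    next_three = cur.fetchmany(size=3)   # continues where the previous call stopped

print(first_three)
print(next_three)
```

The fetched lists stay usable after the block ends because the rows were copied into Python objects before the connection closed.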



---
# Let’s practice!

In [11]:
# Import packages
from sqlalchemy import create_engine
import pandas as pd

# Create engine: engine
engine = create_engine('sqlite:///Chinook.sqlite')

# Open engine connection: con
con = engine.connect()

# Perform query: rs
rs = con.execute("SELECT * FROM Album")

# Save results of the query to DataFrame: df
df = pd.DataFrame(rs.fetchall())

# Close connection
con.close()

# Print head of DataFrame df
print(df.head())

   0                                      1  2
0  1  For Those About To Rock We Salute You  1
1  2                      Balls to the Wall  2
2  3                      Restless and Wild  2
3  4                      Let There Be Rock  1
4  5                               Big Ones  3


In [12]:
# Open engine in context manager
# Perform query and save results to DataFrame: df
with engine.connect() as con:
    rs = con.execute("SELECT LastName, Title FROM Employee")
    df = pd.DataFrame(rs.fetchmany(size=3))
    df.columns = rs.keys()

# Print the length of the DataFrame df
print(len(df))

# Print the head of the DataFrame df
print(df.head())

3
  LastName                Title
0    Adams      General Manager
1  Edwards        Sales Manager
2  Peacock  Sales Support Agent


In [13]:
# Create engine: engine
engine = create_engine('sqlite:///Chinook.sqlite')

# Open engine in context manager
# Perform query and save results to DataFrame: df
with engine.connect() as con:
    rs = con.execute("SELECT * FROM Employee WHERE EmployeeId >= 6")
    df = pd.DataFrame(rs.fetchall())
    df.columns = rs.keys()

# Print the head of the DataFrame df
print(df.head())


   EmployeeId  LastName FirstName       Title  ReportsTo            BirthDate  \
0           6  Mitchell   Michael  IT Manager          1  1973-07-01 00:00:00   
1           7      King    Robert    IT Staff          6  1970-05-29 00:00:00   
2           8  Callahan     Laura    IT Staff          6  1968-01-09 00:00:00   

              HireDate                      Address        City State Country  \
0  2003-10-17 00:00:00         5827 Bowness Road NW     Calgary    AB  Canada   
1  2004-01-02 00:00:00  590 Columbia Boulevard West  Lethbridge    AB  Canada   
2  2004-03-04 00:00:00                  923 7 ST NW  Lethbridge    AB  Canada   

  PostalCode              Phone                Fax                    Email  
0    T3B 0C5  +1 (403) 246-9887  +1 (403) 246-9899  michael@chinookcorp.com  
1    T1K 5N8  +1 (403) 456-9986  +1 (403) 456-8485   robert@chinookcorp.com  
2    T1H 1Y8  +1 (403) 467-3351  +1 (403) 467-8772    laura@chinookcorp.com  
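
To isolate what the `WHERE` clause does, here is a minimal sketch with the standard-library `sqlite3` module and a two-column stand-in for the `Employee` table (the real table has many more columns; this cut-down schema is an assumption).

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Cut-down, hypothetical version of the Employee table.
con.execute("CREATE TABLE Employee (EmployeeId INTEGER, LastName TEXT)")
con.executemany(
    "INSERT INTO Employee VALUES (?, ?)",
    [(5, "Johnson"), (6, "Mitchell"), (7, "King"), (8, "Callahan")],
)

# Without WHERE, every row comes back.
everyone = con.execute("SELECT EmployeeId, LastName FROM Employee").fetchall()

# WHERE keeps only the rows for which the condition is true.
filtered = con.execute(
    "SELECT EmployeeId, LastName FROM Employee WHERE EmployeeId >= 6"
).fetchall()
con.close()

print(len(everyone), len(filtered))
```

The row with `EmployeeId` 5 is dropped by the filter, while the three rows with ids 6 to 8 remain.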


In [14]:
# Create engine: engine

engine = create_engine('sqlite:///Chinook.sqlite')

# Open engine in context manager
with engine.connect() as con:
    rs = con.execute("SELECT * FROM Employee ORDER BY BirthDate")
    df = pd.DataFrame(rs.fetchall())

    # Set the DataFrame's column names
    df.columns = rs.keys()

# Print head of DataFrame
print(df.head())


   EmployeeId  LastName FirstName                Title  ReportsTo  \
0           4      Park  Margaret  Sales Support Agent        2.0   
1           2   Edwards     Nancy        Sales Manager        1.0   
2           1     Adams    Andrew      General Manager        NaN   
3           5   Johnson     Steve  Sales Support Agent        2.0   
4           8  Callahan     Laura             IT Staff        6.0   

             BirthDate             HireDate              Address        City  \
0  1947-09-19 00:00:00  2003-05-03 00:00:00     683 10 Street SW     Calgary   
1  1958-12-08 00:00:00  2002-05-01 00:00:00         825 8 Ave SW     Calgary   
2  1962-02-18 00:00:00  2002-08-14 00:00:00  11120 Jasper Ave NW    Edmonton   
3  1965-03-03 00:00:00  2003-10-17 00:00:00         7727B 41 Ave     Calgary   
4  1968-01-09 00:00:00  2004-03-04 00:00:00          923 7 ST NW  Lethbridge   

  State Country PostalCode              Phone                Fax  \
0    AB  Canada    T2P 5G3  +1 (403)

# Querying relational databases directly with pandas

## The pandas way to query

In [15]:
from sqlalchemy import create_engine
import pandas as pd

engine = create_engine('sqlite:///Northwind.sqlite')

with engine.connect() as con:
    rs = con.execute("SELECT * FROM Customer")
    df = pd.DataFrame(rs.fetchall())
    df.columns = rs.keys()

In [16]:
df = pd.read_sql_query('SELECT * FROM Customer', engine)

In [17]:
df.head()

Unnamed: 0,Id,CompanyName,ContactName,ContactTitle,Address,City,Region,PostalCode,Country,Phone,Fax
0,ALFKI,Alfreds Futterkiste,Maria Anders,Sales Representative,Obere Str. 57,Berlin,,12209,Germany,030-0074321,030-0076545
1,ANATR,Ana Trujillo Emparedados y helados,Ana Trujillo,Owner,Avda. de la Constitución 2222,México D.F.,,05021,Mexico,(5) 555-4729,(5) 555-3745
2,ANTON,Antonio Moreno Taquería,Antonio Moreno,Owner,Mataderos 2312,México D.F.,,05023,Mexico,(5) 555-3932,
3,AROUT,Around the Horn,Thomas Hardy,Sales Representative,120 Hanover Sq.,London,,WA1 1DP,UK,(171) 555-7788,(171) 555-6750
4,BERGS,Berglunds snabbköp,Christina Berglund,Order Administrator,Berguvsvägen 8,Luleå,,S-958 22,Sweden,0921-12 34 65,0921-12 34 67


---
# Let’s practice!

In [18]:
# Import packages
from sqlalchemy import create_engine
import pandas as pd

# Create engine: engine
engine = create_engine('sqlite:///Chinook.sqlite')

# Execute query and store records in DataFrame: df
df = pd.read_sql_query("SELECT * FROM Album", engine)

# Print head of DataFrame
print(df.head())

# Open engine in context manager
# Perform query and save results to DataFrame: df1
with engine.connect() as con:
    rs = con.execute("SELECT * FROM Album")
    df1 = pd.DataFrame(rs.fetchall())
    df1.columns = rs.keys()

# Confirm that both methods yield the same result: does df = df1 ?
print(df.equals(df1))


   AlbumId                                  Title  ArtistId
0        1  For Those About To Rock We Salute You         1
1        2                      Balls to the Wall         2
2        3                      Restless and Wild         2
3        4                      Let There Be Rock         1
4        5                               Big Ones         3
True


In [19]:
# Import packages
from sqlalchemy import create_engine
import pandas as pd

# Create engine: engine
engine = create_engine('sqlite:///Chinook.sqlite')

# Execute query and store records in DataFrame: df
df = pd.read_sql_query('''
    SELECT *
    FROM Employee
    WHERE EmployeeId >= 6
    ORDER BY BirthDate
''', engine)

# Print head of DataFrame
print(df.head())

   EmployeeId  LastName FirstName       Title  ReportsTo            BirthDate  \
0           8  Callahan     Laura    IT Staff          6  1968-01-09 00:00:00   
1           7      King    Robert    IT Staff          6  1970-05-29 00:00:00   
2           6  Mitchell   Michael  IT Manager          1  1973-07-01 00:00:00   

              HireDate                      Address        City State Country  \
0  2004-03-04 00:00:00                  923 7 ST NW  Lethbridge    AB  Canada   
1  2004-01-02 00:00:00  590 Columbia Boulevard West  Lethbridge    AB  Canada   
2  2003-10-17 00:00:00         5827 Bowness Road NW     Calgary    AB  Canada   

  PostalCode              Phone                Fax                    Email  
0    T1H 1Y8  +1 (403) 467-3351  +1 (403) 467-8772    laura@chinookcorp.com  
1    T1K 5N8  +1 (403) 456-9986  +1 (403) 456-8485   robert@chinookcorp.com  
2    T3B 0C5  +1 (403) 246-9887  +1 (403) 246-9899  michael@chinookcorp.com  
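
The way `WHERE` and `ORDER BY` combine can be checked on a toy table. The sketch below uses the standard-library `sqlite3` module instead of pandas, and a three-column stand-in for `Employee`. It assumes birth dates are stored as `YYYY-MM-DD HH:MM:SS` text, as the output above suggests, so that string order matches chronological order.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Cut-down, hypothetical version of the Employee table.
con.execute(
    "CREATE TABLE Employee (EmployeeId INTEGER, LastName TEXT, BirthDate TEXT)"
)
con.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?)",
    [
        (5, "Johnson", "1965-03-03 00:00:00"),
        (6, "Mitchell", "1973-07-01 00:00:00"),
        (7, "King", "1970-05-29 00:00:00"),
        (8, "Callahan", "1968-01-09 00:00:00"),
    ],
)

# WHERE filters the rows first; ORDER BY then sorts whatever survives.
ordered = con.execute(
    """
    SELECT EmployeeId, LastName
    FROM Employee
    WHERE EmployeeId >= 6
    ORDER BY BirthDate
    """
).fetchall()
con.close()

print(ordered)
```

The oldest remaining employee comes first, which matches the 8, 7, 6 ordering in the output above.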


# Advanced querying: exploiting table relationships

In [20]:
from sqlalchemy import create_engine
import pandas as pd

engine = create_engine('sqlite:///Northwind.sqlite')


df = pd.read_sql_query("SELECT ProductName, UnitPrice, CompanyName FROM Supplier INNER JOIN Product on Supplier.Id = Product.SupplierId", engine)


In [21]:
df.head()

|  | ProductName | UnitPrice | CompanyName |
|---|---|---|---|
| 0 | Chai | 18.0 | Exotic Liquids |
| 1 | Chang | 19.0 | Exotic Liquids |
| 2 | Aniseed Syrup | 10.0 | Exotic Liquids |
| 3 | Chef Anton's Cajun Seasoning | 22.0 | New Orleans Cajun Delights |
| 4 | Chef Anton's Gumbo Mix | 21.35 | New Orleans Cajun Delights |
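
An `INNER JOIN` pairs each row of one table with the rows of another that satisfy the `ON` condition, and drops rows with no match. The sketch below illustrates this with the standard-library `sqlite3` module and cut-down `Supplier` and `Product` tables. The ids and the supplier "No Products Co" are invented for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Cut-down, hypothetical versions of the Supplier and Product tables.
con.execute("CREATE TABLE Supplier (Id INTEGER, CompanyName TEXT)")
con.execute(
    "CREATE TABLE Product (Id INTEGER, ProductName TEXT, SupplierId INTEGER)"
)
con.executemany(
    "INSERT INTO Supplier VALUES (?, ?)",
    [
        (1, "Exotic Liquids"),
        (2, "New Orleans Cajun Delights"),
        (3, "No Products Co"),  # made-up supplier with no products
    ],
)
con.executemany(
    "INSERT INTO Product VALUES (?, ?, ?)",
    [
        (1, "Chai", 1),
        (2, "Chang", 1),
        (3, "Chef Anton's Cajun Seasoning", 2),
    ],
)

# Each product row is matched to the supplier whose Id equals its SupplierId.
joined = con.execute(
    """
    SELECT ProductName, CompanyName
    FROM Supplier
    INNER JOIN Product ON Supplier.Id = Product.SupplierId
    ORDER BY Product.Id
    """
).fetchall()
con.close()

for product_name, company_name in joined:
    print(product_name, "<-", company_name)
```

The supplier without products does not appear in the result, because an inner join keeps only rows that have a match on both sides.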



In [22]:
# Open engine in context manager
# Perform query and save results to DataFrame: df
with engine.connect() as con:
    rs = con.execute("SELECT Title, Name FROM Album INNER JOIN  Artist on Album.ArtistID = Artist.ArtistID")
    df = pd.DataFrame(rs.fetchall())
    df.columns = rs.keys()

# Print head of DataFrame df
print(df.head())


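OperationalError: (sqlite3.OperationalError) no such table: Album [SQL: 'SELECT Title, Name FROM Album INNER JOIN  Artist on Album.ArtistID = Artist.ArtistID'] (Background on this error at: http://sqlalche.me/e/e3q8)

The query fails because `engine` was last re-created for `Northwind.sqlite` in In [20], and that database has no `Album` table. Re-creating the engine with `create_engine('sqlite:///Chinook.sqlite')` before this cell should resolve the error.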

In [23]:
# Execute query and store records in DataFrame: df
df = pd.read_sql_query('''
SELECT * 
FROM PlaylistTrack 
INNER JOIN Track on PlaylistTrack.TrackId = Track.TrackId
WHERE  Milliseconds < 250000
''', engine)

# Print head of DataFrame
print(df.head())

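OperationalError: (sqlite3.OperationalError) no such table: PlaylistTrack [SQL: '\nSELECT * \nFROM PlaylistTrack \nINNER JOIN Track on PlaylistTrack.TrackId = Track.TrackId\nWHERE  Milliseconds < 250000\n'] (Background on this error at: http://sqlalche.me/e/e3q8)

Here `engine` still points to `Northwind.sqlite` (set in In [20]), which has no `PlaylistTrack` table. `PlaylistTrack` and `Track` belong to the Chinook database, so the engine needs to be re-created with `create_engine('sqlite:///Chinook.sqlite')` before running this query.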