# Exercise 1 - Sakila Star Schema & ETL

All the database tables in this demo are based on public database samples and transformations:
- `Sakila` is a sample database created by `MySQL` [Link](https://dev.mysql.com/doc/sakila/en/sakila-structure.html)
- The PostgreSQL version of it is called `Pagila` [Link](https://github.com/devrimgunduz/pagila)
- The fact and dimension table design is based on O'Reilly's public dimensional modelling tutorial schema [Link](http://archive.oreilly.com/oreillyschool/courses/dba3/index.html)

<img src="files/sakila.png">

# STEP0: Using ipython-sql

- Load ipython-sql: `%load_ext sql`

- To execute SQL queries, write one of the following at the top of your cell:
    - `%sql`
        - For a one-line SQL query
        - You can access a Python variable using `$`
    - `%%sql`
        - For a multi-line SQL query
        - You can **NOT** access a Python variable using `$`


- Running `%sql` with a connection string such as
`postgresql://postgres:postgres@db:5432/pagila` connects to the database
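The `$` substitution splices a Python value directly into the SQL text before the query runs. For comparison outside Jupyter, here is a hedged sketch that uses Python's built-in `sqlite3` module (an in-memory stand-in for PostgreSQL, assumed only for illustration) and passes the Python value as a named bind parameter instead. The `film` rows and the helper `films_longer_than` are hypothetical.

```python
import sqlite3

def films_longer_than(conn, min_length):
    """Return titles of films longer than min_length, passed as a bind parameter."""
    rows = conn.execute(
        "SELECT title FROM film WHERE length > :min_length ORDER BY title",
        {"min_length": min_length},  # the driver fills :min_length; no string splicing
    ).fetchall()
    return [title for (title,) in rows]

# Hypothetical in-memory stand-in for the Pagila film table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE film (title TEXT, length INTEGER)")
conn.executemany(
    "INSERT INTO film VALUES (?, ?)",
    [("FILM A", 86), ("FILM B", 48), ("FILM C", 50)],
)

min_length = 49  # the kind of Python variable `$min_length` would splice into a %sql line
print(films_longer_than(conn, min_length))  # ['FILM A', 'FILM C']
```

Bind parameters keep the value separate from the SQL text, so quotes or other special characters in the value cannot change the query.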


## 1.1 Create the pagila db and fill it with data
- Adding `"!"` at the beginning of a line in a Jupyter cell runs that line as a shell command, i.e. we are not running Python code but the `createdb` and `psql` PostgreSQL command-line utilities

In [2]:
!PGPASSWORD=macbook createdb -h 127.0.0.1 -U student pagila
!PGPASSWORD=macbook psql -q -h 127.0.0.1 -U student -d pagila -f Data/pagila-schema.sql
!PGPASSWORD=macbook psql -q -h 127.0.0.1 -U student -d pagila -f Data/pagila-data.sql

/bin/sh: createdb: command not found
/bin/sh: psql: command not found
/bin/sh: psql: command not found
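The `command not found` messages above mean the PostgreSQL client tools are not on the shell's `PATH`. The following is a minimal Python sketch, assuming the same host, user, password, and `Data/` file paths as the cell above. It checks for `createdb` and `psql` with `shutil.which` before anything runs. The helpers `missing_tools`, `build_load_commands`, and `load_database` are hypothetical names.

```python
import os
import shlex
import shutil
import subprocess

def missing_tools(tools=("createdb", "psql")):
    """Return the command-line tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]

def build_load_commands(host, user, db, sql_files):
    """Build argv lists mirroring the createdb/psql shell lines above."""
    commands = [["createdb", "-h", host, "-U", user, db]]
    for path in sql_files:
        commands.append(["psql", "-q", "-h", host, "-U", user, "-d", db, "-f", path])
    return commands

def load_database(password, commands):
    """Run each command with PGPASSWORD set only in the child processes' environment."""
    env = dict(os.environ, PGPASSWORD=password)
    for argv in commands:
        # check=True raises CalledProcessError if a command exits with a non-zero status
        subprocess.run(argv, env=env, check=True)

commands = build_load_commands(
    "127.0.0.1", "student", "pagila",
    ["Data/pagila-schema.sql", "Data/pagila-data.sql"],
)
missing = missing_tools()
if missing:
    print("Missing PostgreSQL client tools:", ", ".join(missing))
for argv in commands:
    print(shlex.join(argv))  # the equivalent shell line, for inspection only
```

Once `missing_tools()` returns an empty list (for example, after installing the PostgreSQL client package), `load_database("macbook", commands)` runs the three steps in order and stops at the first one that fails.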


## 1.2 Connect to the newly created db

In [3]:
%load_ext sql

In [4]:
db_endpoint="127.0.0.1"
db="pagila"
db_user="postgres"
db_password="macbook"
db_port='5432'
# postgresql://username:password@host:port/database

conn_string="postgresql://{}:{}@{}:{}/{}".format(db_user,db_password,db_endpoint,db_port,db)
print(conn_string)

postgresql://postgres:macbook@127.0.0.1:5432/pagila
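The `.format` call works for this password. However, characters such as `@`, `:` or `/` in a password would break the URL. SQLAlchemy, which ipython-sql uses to connect, expects such characters to be URL-encoded. Below is a hedged sketch that escapes the credentials with the standard library's `urllib.parse.quote_plus` and checks the pieces with `urlsplit`. The password `p@ss:word` and the helper `make_conn_string` are hypothetical.

```python
from urllib.parse import quote_plus, unquote_plus, urlsplit

def make_conn_string(user, password, host, port, db):
    """Build postgresql://username:password@host:port/database with escaped credentials."""
    return "postgresql://{}:{}@{}:{}/{}".format(
        quote_plus(user), quote_plus(password), host, port, db
    )

conn_string = make_conn_string("postgres", "p@ss:word", "127.0.0.1", "5432", "pagila")
print(conn_string)  # postgresql://postgres:p%40ss%3Aword@127.0.0.1:5432/pagila

# urlsplit does not decode the password, so unquote it to recover the original value
parts = urlsplit(conn_string)
print(parts.username, unquote_plus(parts.password), parts.hostname, parts.port, parts.path)
# postgres p@ss:word 127.0.0.1 5432 /pagila
```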


In [5]:
%sql $conn_string

'Connected: postgres@pagila'

# STEP2: Explore the 3NF Schema

## 2.1 Count the rows in each table

In [7]:
# Each %sql line magic returns a ResultSet that can be assigned to a Python variable
nStores=%sql select count(*) from store;
nFilm=%sql select count(*) from film;
nInventory=%sql select count(*) from inventory;
nPayment=%sql select count(*) from payment;

print(type(nStores),nFilm,nInventory,nPayment)

 * postgresql://postgres:***@127.0.0.1:5432/pagila
1 rows affected.
 * postgresql://postgres:***@127.0.0.1:5432/pagila
1 rows affected.
 * postgresql://postgres:***@127.0.0.1:5432/pagila
1 rows affected.
 * postgresql://postgres:***@127.0.0.1:5432/pagila
1 rows affected.
<class 'sql.run.ResultSet'> +-------+
| count |
+-------+
|  1000 |
+-------+ +-------+
| count |
+-------+
|  4581 |
+-------+ +-------+
| count |
+-------+
| 16049 |
+-------+
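`print(type(nStores))` shows that each `%sql` call returns a `ResultSet`, which is a table of rows rather than a plain number. The hedged sketch below reproduces the counting step with Python's `sqlite3` module as an in-memory stand-in. The toy rows are hypothetical and are not Pagila data. The sketch pulls the scalar out of the first column of the single result row; indexing a `ResultSet` the same way (`nStores[0][0]`) should give the count directly. The helper `count_rows` is a hypothetical name.

```python
import sqlite3

TABLES = ("store", "film", "inventory", "payment")

def count_rows(conn, tables=TABLES):
    """Return {table: row count}; names come from a fixed tuple since they cannot be bind parameters."""
    counts = {}
    for table in tables:
        row = conn.execute(f"SELECT count(*) FROM {table}").fetchone()
        counts[table] = row[0]  # first column of the single result row
    return counts

# Hypothetical toy database standing in for Pagila
conn = sqlite3.connect(":memory:")
for table in TABLES:
    conn.execute(f"CREATE TABLE {table} (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO store (id) VALUES (?)", [(1,), (2,)])
conn.executemany("INSERT INTO film (id) VALUES (?)", [(i,) for i in range(1, 6)])
conn.executemany("INSERT INTO inventory (id) VALUES (?)", [(i,) for i in range(1, 11)])
conn.executemany("INSERT INTO payment (id) VALUES (?)", [(i,) for i in range(1, 21)])

print(count_rows(conn))  # {'store': 2, 'film': 5, 'inventory': 10, 'payment': 20}
```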


## 2.2 When? What time period do the payments cover?

In [17]:
%sql select min(payment_date),max(payment_date) from payment;


 * postgresql://postgres:***@127.0.0.1:5432/pagila
1 rows affected.


| min | max |
|---|---|
| 2007-01-24 21:21:56.996577 | 2007-05-14 13:44:29.996577 |
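To turn the two bounds into a duration, here is a small Python sketch that uses only the `datetime` module. It parses the `min` and `max` values printed above and subtracts them.

```python
from datetime import datetime

# Bounds copied from the min/max query output above
first_payment = datetime.fromisoformat("2007-01-24 21:21:56.996577")
last_payment = datetime.fromisoformat("2007-05-14 13:44:29.996577")

span = last_payment - first_payment  # a datetime.timedelta
print(span)       # 109 days, 16:22:33
print(span.days)  # 109
```

In PostgreSQL, the same subtraction can be done inside the query itself with `select max(payment_date) - min(payment_date) from payment;`, which returns an `interval`.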


## 2.3 Where? Where do events in this database occur?
TODO: Write a query that displays the number of addresses by district in the address table. Limit the table to the top 10 districts. Your results should match the table below.

In [21]:
%sql select district,count(*) as f from address group by district order by f desc limit 10

 * postgresql://postgres:***@127.0.0.1:5432/pagila
10 rows affected.


| district | f |
|---|---|
| Buenos Aires | 10 |
| Shandong | 9 |
| California | 9 |
| West Bengali | 9 |
| Uttar Pradesh | 8 |
| So Paulo | 8 |
| England | 7 |
| Maharashtra | 7 |
| Southern Tagalog | 6 |
| Gois | 5 |


| district | n |
|---|---|
| Buenos Aires | 10 |
| California | 9 |
| Shandong | 9 |
| West Bengali | 9 |
| So Paulo | 8 |
| Uttar Pradesh | 8 |
| Maharashtra | 7 |
| England | 7 |
| Southern Tagalog | 6 |
| Punjab | 5 |
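The expected table lists tied districts in a different order than the query output. For example, `California` and `Shandong` both have 9 addresses. This happens because `order by f desc` leaves the order of ties unspecified. The hedged sketch below uses `sqlite3` and a hypothetical toy `address` table. It adds `district` as a secondary sort key so that ties come out alphabetically. Note that `limit` can still cut through a tie at the boundary, as with `Gois` and `Punjab` at 5. The helper `top_districts` is a hypothetical name.

```python
import sqlite3

TOP_DISTRICTS_SQL = """
    SELECT district, count(*) AS n
    FROM address
    GROUP BY district
    ORDER BY n DESC, district ASC  -- secondary key makes the order of ties deterministic
    LIMIT :k
"""

def top_districts(conn, k=10):
    """Return (district, n) pairs for the k districts with the most addresses."""
    return conn.execute(TOP_DISTRICTS_SQL, {"k": k}).fetchall()

# Hypothetical toy address table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE address (address_id INTEGER PRIMARY KEY, district TEXT)")
districts = ["Shandong"] * 3 + ["California"] * 3 + ["England"] * 2 + ["Punjab"]
conn.executemany("INSERT INTO address (district) VALUES (?)", [(d,) for d in districts])

print(top_districts(conn, k=3))  # [('California', 3), ('Shandong', 3), ('England', 2)]
```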