# STEP3: Perform some simple data analysis

Start by connecting to the database by running the cells below. If you are coming back to this exercise, then uncomment and run the first cell to recreate the database. If you recently completed steps 1 and 2, then skip to the second cell.

In [1]:
%load_ext sql

DB_ENDPOINT = "127.0.0.1"
DB = 'pagila'
DB_USER = 'postgres'
DB_PASSWORD = 'macbook'
DB_PORT = '5432'

# postgresql://username:password@host:port/database
conn_string = "postgresql://{}:{}@{}:{}/{}".format(
    DB_USER, DB_PASSWORD, DB_ENDPOINT, DB_PORT, DB
)

print(conn_string)
%sql $conn_string

postgresql://postgres:macbook@127.0.0.1:5432/pagila


'Connected: postgres@pagila'
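The connection cell hard-codes the password and prints it as part of the URL. A minimal sketch of an alternative follows, assuming the password is supplied through an environment variable; the helper name `build_conn_string` and the variable name `PAGILA_PASSWORD` are hypothetical and not part of the original exercise. Percent-encoding the credentials keeps characters such as `@` or `/` from breaking the URL.

```python
import os
from urllib.parse import quote


def build_conn_string(user, password, host, port, database):
    """Build a postgresql:// URL, percent-encoding the user and password."""
    return "postgresql://{}:{}@{}:{}/{}".format(
        quote(user, safe=""), quote(password, safe=""), host, port, database
    )


# PAGILA_PASSWORD is a hypothetical variable name; fall back to the exercise default.
password = os.environ.get("PAGILA_PASSWORD", "macbook")
conn_string = build_conn_string("postgres", password, "127.0.0.1", "5432", "pagila")

# A password with reserved characters is encoded instead of corrupting the URL.
encoded_example = build_conn_string("postgres", "p@ss/word", "127.0.0.1", "5432", "pagila")
```

The resulting `conn_string` can be passed to `%sql $conn_string` in the same way as before.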

### 3NF - Entity Relationship Diagram

<img src="./sakila.png" >

## 3.1 Insight 1: Top grossing movies
- Payment amounts are in the `payment` table
- Movies are in the `film` table
- The two tables are not directly linked: a `payment` refers to a `rental`, a `rental` refers to an `inventory` item, and an `inventory` item refers to a `film`
- `payment` &rarr; `rental` &rarr; `inventory` &rarr; `film`
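The join chain above can be tried on a toy dataset before running it against Pagila. The sketch below uses Python's built-in `sqlite3` with an in-memory database as a stand-in for PostgreSQL; the tables contain only the key columns, and the films and amounts are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table film      (film_id integer primary key, title text);
    create table inventory (inventory_id integer primary key, film_id integer);
    create table rental    (rental_id integer primary key, inventory_id integer);
    create table payment   (payment_id integer primary key, rental_id integer, amount real);

    insert into film values (1, 'FILM A'), (2, 'FILM B');
    -- FILM A has two physical copies, FILM B has one
    insert into inventory values (10, 1), (11, 1), (12, 2);
    insert into rental values (100, 10), (101, 11), (102, 12);
    insert into payment values (1000, 100, 2.5), (1001, 101, 4.0), (1002, 102, 5.0);
""")

# Walk payment -> rental -> inventory -> film, then total the amounts per film.
top_films = conn.execute("""
    select f.title, sum(p.amount) as revenue
    from payment as p
    join rental as r    on r.rental_id = p.rental_id
    join inventory as i on i.inventory_id = r.inventory_id
    join film as f      on f.film_id = i.film_id
    group by f.film_id, f.title
    order by revenue desc
""").fetchall()
print(top_films)
```

Payments for both copies of FILM A roll up into a single row because the grouping is done on the film, not on the inventory item.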

In [2]:
%%sql
select f.title, sum(p.amount) as amm
from film as f
join inventory as i on i.film_id = f.film_id
join rental as r on i.inventory_id = r.inventory_id
join payment as p on p.rental_id = r.rental_id
group by f.film_id
order by amm desc
limit 10

 * postgresql://postgres:***@127.0.0.1:5432/pagila
10 rows affected.


title,amm
TELEGRAPH VOYAGE,231.73
WIFE TURN,223.69
ZORRO ARK,214.69
GOODFELLAS SALUTE,209.69
SATURDAY LAMBS,204.72
TITANS JERK,201.71
TORQUE BOUND,198.72
HARRY IDAHO,195.7
INNOCENT USUAL,191.74
HUSTLER PARTY,190.78


<div class="p-Widget jp-RenderedHTMLCommon jp-RenderedHTML jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/html"><table>
    <tbody><tr>
        <th>title</th>
        <th>revenue</th>
    </tr>
    <tr>
        <td>TELEGRAPH VOYAGE</td>
        <td>231.73</td>
    </tr>
    <tr>
        <td>WIFE TURN</td>
        <td>223.69</td>
    </tr>
    <tr>
        <td>ZORRO ARK</td>
        <td>214.69</td>
    </tr>
    <tr>
        <td>GOODFELLAS SALUTE</td>
        <td>209.69</td>
    </tr>
    <tr>
        <td>SATURDAY LAMBS</td>
        <td>204.72</td>
    </tr>
    <tr>
        <td>TITANS JERK</td>
        <td>201.71</td>
    </tr>
    <tr>
        <td>TORQUE BOUND</td>
        <td>198.72</td>
    </tr>
    <tr>
        <td>HARRY IDAHO</td>
        <td>195.70</td>
    </tr>
    <tr>
        <td>INNOCENT USUAL</td>
        <td>191.74</td>
    </tr>
    <tr>
        <td>HUSTLER PARTY</td>
        <td>190.78</td>
    </tr>
</tbody></table></div>

## 3.2 Insight 2: Top grossing cities
- Payment amounts are in the `payment` table
- Cities are in the `city` table
- `payment` &rarr; `customer` &rarr; `address` &rarr; `city`
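Each hop in this chain has to match on the key that actually links the two tables: `customer` points to `address` through `address_id`, not through `customer_id`. Because both columns are integers, joining on the wrong one raises no error and simply attributes revenue to the wrong city. The sketch below illustrates this with an in-memory `sqlite3` database standing in for Pagila; the rows are invented so that customer ids and address ids deliberately differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table city     (city_id integer primary key, city text);
    create table address  (address_id integer primary key, city_id integer);
    create table customer (customer_id integer primary key, address_id integer);
    create table payment  (payment_id integer primary key, customer_id integer, amount real);

    insert into city values (1, 'City A'), (2, 'City B');
    insert into address values (1, 1), (2, 2);
    -- customer 1 lives at address 2 and customer 2 at address 1
    insert into customer values (1, 2), (2, 1);
    insert into payment values (1, 1, 5.0), (2, 1, 2.5), (3, 2, 1.0);
""")

# The same query is run twice; only the customer -> address join key changes.
query = """
    select ci.city, sum(p.amount) as revenue
    from payment as p
    join customer as c on c.customer_id = p.customer_id
    join address as a  on a.address_id = {key}
    join city as ci    on ci.city_id = a.city_id
    group by ci.city
    order by revenue desc
"""
correct = conn.execute(query.format(key="c.address_id")).fetchall()
wrong = conn.execute(query.format(key="c.customer_id")).fetchall()
print(correct)
print(wrong)
```

Both queries return plausible-looking totals, which is why comparing against a reference result is a useful check.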

In [3]:
%%sql select p.customer_id, p.rental_id, p.amount, ci.city from payment as p 
join customer as c on c.customer_id=p.customer_id 
join address as a on a.address_id=c.address_id 
join city as ci on ci.city_id=a.city_id  order by p.payment_date limit 10

 * postgresql://postgres:***@127.0.0.1:5432/pagila
10 rows affected.


customer_id,rental_id,amount,city
130,1,2.99,guas Lindas de Gois
459,2,2.99,Qomsheh
408,3,3.99,Jaffna
333,4,4.99,Baku
222,5,6.99,Jaroslavl
549,6,0.99,Santiago de Compostela
269,7,1.99,Salinas
239,8,4.99,Ciomas
126,9,4.99,Po
399,10,5.99,Okara


### 3.2.2 Top grossing cities
TODO: Write a query that returns the total revenue by city, measured by the `amount` column of the `payment` table. Limit the results to the top 10 cities. Your result should match the table below.

In [4]:
%%sql select 
    city.city,sum(payment.amount) as revenue 
from payment 
join customer as c on c.customer_id=payment.customer_id 
join address as a on a.address_id=c.address_id 
join city on city.city_id=a.city_id
group by city.city
order by 
revenue desc limit 10

 * postgresql://postgres:***@127.0.0.1:5432/pagila
10 rows affected.


city,revenue
Cape Coral,221.55
Saint-Denis,216.54
Aurora,198.5
Molodetno,195.58
Santa Brbara dOeste,194.61
Apeldoorn,194.61
Qomsheh,186.62
London,180.52
Ourense (Orense),177.6
Bijapur,175.61


<div class="p-Widget jp-RenderedHTMLCommon jp-RenderedHTML jp-mod-trusted jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/html"><table>
    <tbody><tr>
        <th>city</th>
        <th>revenue</th>
    </tr>
    <tr>
        <td>Cape Coral</td>
        <td>221.55</td>
    </tr>
    <tr>
        <td>Saint-Denis</td>
        <td>216.54</td>
    </tr>
    <tr>
        <td>Aurora</td>
        <td>198.50</td>
    </tr>
    <tr>
        <td>Molodetno</td>
        <td>195.58</td>
    </tr>
    <tr>
        <td>Apeldoorn</td>
        <td>194.61</td>
    </tr>
    <tr>
        <td>Santa Brbara dOeste</td>
        <td>194.61</td>
    </tr>
    <tr>
        <td>Qomsheh</td>
        <td>186.62</td>
    </tr>
    <tr>
        <td>London</td>
        <td>180.52</td>
    </tr>
    <tr>
        <td>Ourense (Orense)</td>
        <td>177.60</td>
    </tr>
    <tr>
        <td>Bijapur</td>
        <td>175.61</td>
    </tr>
</tbody></table></div>


## 3.3 Insight 3: Revenue of a movie by customer city and by month

### 3.3.1 Total revenue by month


In [9]:
%%sql
SELECT sum(p.amount) as revenue, EXTRACT(month FROM p.payment_date) as month
from payment p
group by month
order by revenue desc
limit 10;

 * postgresql://postgres:***@127.0.0.1:5432/pagila
5 rows affected.


revenue,month
28559.46,4.0
23886.56,3.0
9631.88,2.0
4824.43,1.0
514.18,5.0


### 3.3.2 Each movie by customer city and by month (data cube)

In [12]:
%%sql select f.title,p.amount,p.customer_id,extract(month from p.payment_date) as month
from payment p join rental r on r.rental_id=p.rental_id
join customer c on c.customer_id=p.customer_id
join inventory i on i.inventory_id=r.inventory_id
join film f on f.film_id=i.film_id 
join address a on a.address_id=c.address_id
join city ci on ci.city_id=a.city_id
order by p.payment_date limit 10

 * postgresql://postgres:***@127.0.0.1:5432/pagila
10 rows affected.


title,amount,customer_id,month
BLANKET BEVERLY,2.99,130,1.0
FREAKY POCUS,2.99,459,1.0
GRADUATE LORD,3.99,408,1.0
LOVE SUICIDES,4.99,333,1.0
IDOLS SNATCHERS,6.99,222,1.0
MYSTIC TRUMAN,0.99,549,1.0
SWARM GOLD,1.99,269,1.0
LAWLESS VISION,4.99,239,1.0
MATRIX SNOWMAN,4.99,126,1.0
HANGING DEEP,5.99,399,1.0


### 3.3.3 Sum of revenue of each movie by customer city and by month

TODO: Write a query that returns the total amount of revenue for each movie by customer city and by month. Limit the results to the top 10 movies. Your result should match the table below.

In [55]:
%%sql select f.title,ci.city,sum(p.amount) as revenue, EXTRACT(month FROM p.payment_date) as month
from film f 
join inventory i on i.film_id=f.film_id 
join rental r on r.inventory_id=i.inventory_id
join payment p on p.rental_id=r.rental_id
join customer c on c.customer_id=r.customer_id
join address a on a.address_id=c.address_id
join city ci on ci.city_id=a.city_id
group by  f.title ,ci.city, EXTRACT(month FROM p.payment_date)
order by EXTRACT(month FROM p.payment_date),sum(p.amount) desc
limit 100

 * postgresql://postgres:***@127.0.0.1:5432/pagila
100 rows affected.


title,city,revenue,month
SHOW LORD,Mannheim,11.99,1.0
KISSING DOLLS,Toulon,10.99,1.0
CASUALTIES ENCINO,Warren,10.99,1.0
TELEGRAPH VOYAGE,Naala-Porto,10.99,1.0
AMERICAN CIRCUS,Callao,10.99,1.0
MIDSUMMER GROUNDHOG,Vaduz,9.99,1.0
MOONSHINE CABIN,Balaiha,9.99,1.0
MILLION ACE,Bergamo,9.99,1.0
TITANS JERK,Kimberley,9.99,1.0
DAY UNFAITHFUL,Baybay,9.99,1.0


<div class="p-Widget jp-RenderedHTMLCommon jp-RenderedHTML jp-mod-trusted jp-OutputArea-output jp-OutputArea-executeResult" data-mime-type="text/html"><table>
    <tbody><tr>
        <th>title</th>
        <th>city</th>
        <th>month</th>
        <th>revenue</th>
    </tr>
    <tr>
        <td>SHOW LORD</td>
        <td>Mannheim</td>
        <td>1.0</td>
        <td>11.99</td>
    </tr>
    <tr>
        <td>AMERICAN CIRCUS</td>
        <td>Callao</td>
        <td>1.0</td>
        <td>10.99</td>
    </tr>
    <tr>
        <td>CASUALTIES ENCINO</td>
        <td>Warren</td>
        <td>1.0</td>
        <td>10.99</td>
    </tr>
    <tr>
        <td>TELEGRAPH VOYAGE</td>
        <td>Naala-Porto</td>
        <td>1.0</td>
        <td>10.99</td>
    </tr>
    <tr>
        <td>KISSING DOLLS</td>
        <td>Toulon</td>
        <td>1.0</td>
        <td>10.99</td>
    </tr>
    <tr>
        <td>MILLION ACE</td>
        <td>Bergamo</td>
        <td>1.0</td>
        <td>9.99</td>
    </tr>
    <tr>
        <td>TITANS JERK</td>
        <td>Kimberley</td>
        <td>1.0</td>
        <td>9.99</td>
    </tr>
    <tr>
        <td>DARKO DORADO</td>
        <td>Bhilwara</td>
        <td>1.0</td>
        <td>9.99</td>
    </tr>
    <tr>
        <td>SUNRISE LEAGUE</td>
        <td>Nagareyama</td>
        <td>1.0</td>
        <td>9.99</td>
    </tr>
    <tr>
        <td>MILLION ACE</td>
        <td>Gaziantep</td>
        <td>1.0</td>
        <td>9.99</td>
    </tr>
</tbody></table></div>

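### Note: the result differs from the expected table only in the order of tied rows
Rows with the same month and revenue are tied under this `order by`, so the database may return them in any order, and a `limit` may keep a different subset of the tied rows.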
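Rows that tie on month and revenue can come back in any order, which would explain an ordering mismatch like the one noted above. One way to make the order reproducible is to add tie-breaking columns to the `order by` clause. The sketch below demonstrates this with an in-memory `sqlite3` table (the table name `cube` is hypothetical) filled with five rows taken from the query output above. It makes the order deterministic, but it does not necessarily reproduce the order of the reference table, whose tie-breaking rule is not stated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table cube (title text, city text, month integer, revenue real)")
conn.executemany(
    "insert into cube values (?, ?, ?, ?)",
    [
        ("KISSING DOLLS", "Toulon", 1, 10.99),
        ("SHOW LORD", "Mannheim", 1, 11.99),
        ("TELEGRAPH VOYAGE", "Naala-Porto", 1, 10.99),
        ("AMERICAN CIRCUS", "Callao", 1, 10.99),
        ("CASUALTIES ENCINO", "Warren", 1, 10.99),
    ],
)

# title and city act as tie-breakers after month and revenue
ordered = conn.execute("""
    select title, city, month, revenue
    from cube
    order by month, revenue desc, title, city
""").fetchall()
titles = [row[0] for row in ordered]
print(titles)
```

With the extra sort keys, the four rows at 10.99 always appear in alphabetical order by title, regardless of insertion order.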