# OLAP Cubes - Slicing and Dicing

All the databases table in this demo are based on public database samples and transformations
- `Sakila` is a sample database created by `MySql` [Link](https://video.udacity-data.com/topher/2021/August/61120e06_pagila-3nf/pagila-3nf.png)
- The postgresql version of it is called `Pagila` [Link](https://github.com/devrimgunduz/pagila)
- The facts and dimension tables design is based on O'Reilly's public dimensional modelling tutorial schema [Link](https://video.udacity-data.com/topher/2021/August/61120d38_pagila-star/pagila-star.png)

In [1]:
# creating and connecting to the database
# !PGPASSWORD=student createdb -h 127.0.0.1 -U student pagila_star
# !PGPASSWORD=student psql -q -h 127.0.0.1 -U student -d pagila_star -f Data/pagila-star.sql

In [2]:
# connecting to the local database where Pagila is loaded
import psycopg2
%load_ext sql

DB_ENDPOINT = "127.0.0.1"
DB = 'pagila'
DB_USER = 'postgres'
DB_PASSWORD = 'natbernard'
DB_PORT = '5432'

# postgresql://username:password@host:port/database
conn_string = "postgresql://{}:{}@{}:{}/{}" \
                        .format(DB_USER, DB_PASSWORD, DB_ENDPOINT, DB_PORT, DB)

print(conn_string)
%sql $conn_string

postgresql://postgres:natbernard@127.0.0.1:5432/pagila


'Connected: postgres@pagila'

### Creating a simple cube by calculates the revenue (sales_amount) by day, rating, and city

In [5]:
%%time
%%sql

SELECT d.day, m.rating, c.city, sum(f.sales_amount) AS revenue
FROM factSales AS f
JOIN dimDate AS d ON(f.date_key = d.date_key)
JOIN dimMovie AS m ON(f.movie_key = m.movie_key)
JOIN dimCustomer AS c ON(f.store_key = c.customer_key)
GROUP BY (d.day, m.rating, c.city)
ORDER BY revenue desc
LIMIT 20;

 * postgresql://postgres:***@127.0.0.1:5432/pagila
20 rows affected.
CPU times: user 6.23 ms, sys: 384 µs, total: 6.61 ms
Wall time: 154 ms


day,rating,city,revenue
30,PG-13,Sasebo,784.21
30,G,Sasebo,730.48
30,R,Sasebo,683.46
30,NC-17,San Bernardino,667.49
30,NC-17,Sasebo,646.51
30,PG-13,San Bernardino,635.48
30,PG,San Bernardino,593.57
30,G,San Bernardino,587.58
20,PG-13,Sasebo,538.93
30,PG,Sasebo,521.78


### Slicing

In [6]:
%%time
%%sql

SELECT d.day, m.rating, c.city, sum(f.sales_amount) AS revenue
FROM factSales AS f
JOIN dimDate AS d ON(f.date_key = d.date_key)
JOIN dimMovie AS m ON(f.movie_key = m.movie_key)
JOIN dimCustomer AS c ON(f.store_key = c.customer_key)
WHERE m.rating = 'PG-13'
GROUP BY (d.day, m.rating, c.city)
ORDER BY revenue desc
LIMIT 20;

 * postgresql://postgres:***@127.0.0.1:5432/pagila
20 rows affected.
CPU times: user 5.6 ms, sys: 209 µs, total: 5.81 ms
Wall time: 168 ms


day,rating,city,revenue
30,PG-13,Sasebo,784.21
30,PG-13,San Bernardino,635.48
20,PG-13,Sasebo,538.93
21,PG-13,Sasebo,499.92
17,PG-13,San Bernardino,488.83
18,PG-13,Sasebo,466.92
19,PG-13,Sasebo,465.87
28,PG-13,Sasebo,455.97
27,PG-13,San Bernardino,444.9
19,PG-13,San Bernardino,430.01


### Dicing

In [9]:
%%time
%%sql

SELECT d.day, m.rating, c.city, sum(f.sales_amount) AS revenue
FROM factSales AS f
JOIN dimDate AS d ON(f.date_key = d.date_key)
JOIN dimMovie AS m ON(f.movie_key = m.movie_key)
JOIN dimCustomer AS c ON(f.store_key = c.customer_key)
WHERE m.rating in ('PG', 'PG-13') 
    AND c.city in ('Sasebo') 
    AND d.day in ('1','15','30')
GROUP BY (d.day, m.rating, c.city)
ORDER BY revenue desc
LIMIT 20;

 * postgresql://postgres:***@127.0.0.1:5432/pagila
6 rows affected.
CPU times: user 2.9 ms, sys: 3.84 ms, total: 6.73 ms
Wall time: 35.7 ms


day,rating,city,revenue
30,PG-13,Sasebo,784.21
30,PG,Sasebo,521.78
1,PG-13,Sasebo,310.3
1,PG,Sasebo,296.3
15,PG-13,Sasebo,151.61
15,PG,Sasebo,140.65
