# OLAP Cubes

All the database tables in this demo are based on public sample databases and transformations of them:
- `Sakila` is a sample database created by MySQL [Link](https://video.udacity-data.com/topher/2021/August/61120e06_pagila-3nf/pagila-3nf.png)
- The PostgreSQL version of it is called `Pagila` [Link](https://github.com/devrimgunduz/pagila)
- The fact and dimension table design is based on O'Reilly's public dimensional modelling tutorial schema [Link](https://video.udacity-data.com/topher/2021/August/61120d38_pagila-star/pagila-star.png)

In [5]:
# creating and connecting to the database
# !PGPASSWORD=student createdb -h 127.0.0.1 -U student pagila_star
# !PGPASSWORD=student psql -q -h 127.0.0.1 -U student -d pagila_star -f Data/pagila-star.sql

In [6]:
# connecting to the local database where Pagila is loaded
import psycopg2
%load_ext sql

DB_ENDPOINT = "127.0.0.1"
DB = 'pagila'
DB_USER = 'postgres'
DB_PASSWORD = 'natbernard'
DB_PORT = '5432'

# postgresql://username:password@host:port/database
conn_string = "postgresql://{}:{}@{}:{}/{}" \
                        .format(DB_USER, DB_PASSWORD, DB_ENDPOINT, DB_PORT, DB)

print(conn_string)
%sql $conn_string

postgresql://postgres:natbernard@127.0.0.1:5432/pagila


'Connected: postgres@pagila'
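
The connection string above is assembled with plain string formatting and then printed in full. As a minimal sketch, assuming the password may contain characters such as `@` or `/`, the credentials can be percent-encoded before they go into the URL, and a masked copy can be printed instead. The helper names `build_conn_string` and `mask_password` and the sample password are hypothetical and not part of the notebook.

```python
from urllib.parse import quote_plus


def build_conn_string(user, password, host, port, database):
    """Build a postgresql:// URL, percent-encoding the user and password."""
    return "postgresql://{}:{}@{}:{}/{}".format(
        quote_plus(user), quote_plus(password), host, port, database
    )


def mask_password(conn_string):
    """Return the URL with the password replaced by ***, for printing."""
    scheme, rest = conn_string.split("://", 1)
    # Split on the last "@" so an encoded password cannot confuse the parse.
    credentials, location = rest.rsplit("@", 1)
    user = credentials.split(":", 1)[0]
    return "{}://{}:***@{}".format(scheme, user, location)


# Placeholder credentials for illustration only.
example = build_conn_string("postgres", "p@ss/word", "127.0.0.1", "5432", "pagila")
print(mask_password(example))
```

The masked form matches the `postgresql://postgres:***@...` style that the `%%sql` cells echo in their own output.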

### Creating a simple cube that calculates the revenue (sales_amount) by day, rating, and city

In [7]:
%%time
%%sql

SELECT d.day, m.rating, c.city, sum(f.sales_amount) AS revenue
FROM factSales AS f
JOIN dimDate AS d ON(f.date_key = d.date_key)
JOIN dimMovie AS m ON(f.movie_key = m.movie_key)
JOIN dimCustomer AS c ON(f.store_key = c.customer_key)
GROUP BY (d.day, m.rating, c.city)
ORDER BY revenue desc
LIMIT 20;

 * postgresql://postgres:***@127.0.0.1:5432/pagila
20 rows affected.
CPU times: user 6.24 ms, sys: 0 ns, total: 6.24 ms
Wall time: 49.2 ms


day,rating,city,revenue
30,PG-13,Sasebo,784.21
30,G,Sasebo,730.48
30,R,Sasebo,683.46
30,NC-17,San Bernardino,667.49
30,NC-17,Sasebo,646.51
30,PG-13,San Bernardino,635.48
30,PG,San Bernardino,593.57
30,G,San Bernardino,587.58
20,PG-13,Sasebo,538.93
30,PG,Sasebo,521.78
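
To try the same query shape without a PostgreSQL server, the sketch below builds a tiny star schema in an in-memory SQLite database. The table and column names mirror the query above, but the rows are made up for illustration. The customer dimension is joined on `customer_key` here, as in the grouping-sets queries later in this notebook.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dimDate     (date_key INTEGER PRIMARY KEY, day INTEGER);
CREATE TABLE dimMovie    (movie_key INTEGER PRIMARY KEY, rating TEXT);
CREATE TABLE dimCustomer (customer_key INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE factSales   (date_key INTEGER, movie_key INTEGER,
                          customer_key INTEGER, sales_amount REAL);

-- Hypothetical rows, not taken from Pagila.
INSERT INTO dimDate     VALUES (1, 1), (2, 30);
INSERT INTO dimMovie    VALUES (1, 'PG'), (2, 'PG-13');
INSERT INTO dimCustomer VALUES (1, 'Sasebo'), (2, 'San Bernardino');
INSERT INTO factSales   VALUES
    (2, 2, 1, 4.0), (2, 2, 1, 2.0),   -- day 30, PG-13, Sasebo
    (2, 1, 2, 3.0),                   -- day 30, PG, San Bernardino
    (1, 1, 1, 1.0);                   -- day 1, PG, Sasebo
""")

# One output row per (day, rating, city) cell of the cube.
rows = conn.execute("""
SELECT d.day, m.rating, c.city, SUM(f.sales_amount) AS revenue
FROM factSales AS f
JOIN dimDate     AS d ON f.date_key = d.date_key
JOIN dimMovie    AS m ON f.movie_key = m.movie_key
JOIN dimCustomer AS c ON f.customer_key = c.customer_key
GROUP BY d.day, m.rating, c.city
ORDER BY revenue DESC
""").fetchall()

for row in rows:
    print(row)
```

The two day-30 PG-13 sales in Sasebo collapse into a single cell with revenue 6.0, which is the aggregation step that turns fact rows into cube cells.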


## Slicing

In [8]:
%%time
%%sql

SELECT d.day, m.rating, c.city, sum(f.sales_amount) AS revenue
FROM factSales AS f
JOIN dimDate AS d ON(f.date_key = d.date_key)
JOIN dimMovie AS m ON(f.movie_key = m.movie_key)
JOIN dimCustomer AS c ON(f.store_key = c.customer_key)
WHERE m.rating = 'PG-13'
GROUP BY (d.day, m.rating, c.city)
ORDER BY revenue desc
LIMIT 20;

 * postgresql://postgres:***@127.0.0.1:5432/pagila
20 rows affected.
CPU times: user 5.1 ms, sys: 103 µs, total: 5.2 ms
Wall time: 18.4 ms


day,rating,city,revenue
30,PG-13,Sasebo,784.21
30,PG-13,San Bernardino,635.48
20,PG-13,Sasebo,538.93
21,PG-13,Sasebo,499.92
17,PG-13,San Bernardino,488.83
18,PG-13,Sasebo,466.92
19,PG-13,Sasebo,465.87
28,PG-13,Sasebo,455.97
27,PG-13,San Bernardino,444.9
19,PG-13,San Bernardino,430.01
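
Slicing fixes one dimension to a single value, so an N-dimensional cube becomes an (N-1)-dimensional one. The sketch below shows that idea on a toy cube held in a Python dict. The data and the `slice_cube` helper are hypothetical. Unlike the SQL above, which still returns the constant `rating` column, this version drops the fixed dimension from the key.

```python
# Toy cube: (day, rating, city) -> revenue. Values are made up.
cube = {
    (30, "PG-13", "Sasebo"): 6.0,
    (30, "PG", "Sasebo"): 3.0,
    (1, "PG-13", "San Bernardino"): 2.0,
    (1, "G", "Sasebo"): 1.0,
}


def slice_cube(cube, axis, value):
    """Fix one dimension to a single value and drop it from the key."""
    return {
        key[:axis] + key[axis + 1:]: revenue
        for key, revenue in cube.items()
        if key[axis] == value
    }


RATING_AXIS = 1  # position of rating in the (day, rating, city) key
pg13 = slice_cube(cube, RATING_AXIS, "PG-13")
print(pg13)
```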


## Dicing

In [9]:
%%time
%%sql

SELECT d.day, m.rating, c.city, sum(f.sales_amount) AS revenue
FROM factSales AS f
JOIN dimDate AS d ON(f.date_key = d.date_key)
JOIN dimMovie AS m ON(f.movie_key = m.movie_key)
JOIN dimCustomer AS c ON(f.store_key = c.customer_key)
WHERE m.rating in ('PG', 'PG-13') 
    AND c.city in ('Sasebo') 
    AND d.day in ('1','15','30')
GROUP BY (d.day, m.rating, c.city)
ORDER BY revenue desc
LIMIT 20;

 * postgresql://postgres:***@127.0.0.1:5432/pagila
6 rows affected.
CPU times: user 7.03 ms, sys: 0 ns, total: 7.03 ms
Wall time: 32.7 ms


day,rating,city,revenue
30,PG-13,Sasebo,784.21
30,PG,Sasebo,521.78
1,PG-13,Sasebo,310.3
1,PG,Sasebo,296.3
15,PG-13,Sasebo,151.61
15,PG,Sasebo,140.65
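
Dicing keeps the number of dimensions but restricts some of them to a set of allowed values, producing a smaller sub-cube. A minimal sketch of that filter on a toy dict-based cube follows. The data and the `dice_cube` helper are assumptions for illustration.

```python
# Toy cube: (day, rating, city) -> revenue. Values are made up.
cube = {
    (30, "PG-13", "Sasebo"): 6.0,
    (30, "PG", "Sasebo"): 3.0,
    (1, "PG-13", "San Bernardino"): 2.0,
    (2, "PG", "Sasebo"): 1.5,
    (1, "G", "Sasebo"): 1.0,
}


def dice_cube(cube, allowed):
    """Keep cells whose coordinate on every listed axis is in its allowed set."""
    return {
        key: revenue
        for key, revenue in cube.items()
        if all(key[axis] in values for axis, values in allowed.items())
    }


# Axis positions follow the (day, rating, city) key order.
sub_cube = dice_cube(
    cube,
    {0: {1, 15, 30}, 1: {"PG", "PG-13"}, 2: {"Sasebo"}},
)
print(sub_cube)
```

Each entry in `allowed` plays the role of one `IN (...)` condition in the `WHERE` clause, and the conditions are combined with `AND`.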


## Grouping Sets

### Total Revenue

In [10]:
%%sql
SELECT sum(sales_amount) AS revenue
FROM factSales; 

 * postgresql://postgres:***@127.0.0.1:5432/pagila
1 rows affected.


revenue
67416.51


### Revenue by Country

In [11]:
%%sql
SELECT c.country, sum(f.sales_amount) AS revenue
FROM factSales AS f
JOIN dimCustomer AS c ON(f.customer_key = c.customer_key)
GROUP BY c.country
ORDER BY c.country, revenue desc
LIMIT 5;

 * postgresql://postgres:***@127.0.0.1:5432/pagila
5 rows affected.


country,revenue
Afghanistan,67.82
Algeria,383.1
American Samoa,71.8
Angola,215.48
Anguilla,106.65


### Revenue by Month

In [12]:
%%sql
SELECT d.month, sum(f.sales_amount) AS revenue
FROM factSales AS f
JOIN dimDate AS d ON(f.date_key = d.date_key)
GROUP BY d.month
ORDER BY d.month, revenue desc
LIMIT 5;

 * postgresql://postgres:***@127.0.0.1:5432/pagila
5 rows affected.


month,revenue
1,4824.43
2,9631.88
3,23886.56
4,28559.46
5,514.18


### Revenue by Month & Country

In [13]:
%%sql
SELECT d.month, c.country, sum(f.sales_amount) AS revenue
FROM factSales AS f
JOIN dimDate AS d ON(f.date_key = d.date_key)
JOIN dimCustomer AS c ON(f.customer_key = c.customer_key)
GROUP BY (d.month, c.country)
ORDER BY d.month, c.country, revenue desc
LIMIT 5;

 * postgresql://postgres:***@127.0.0.1:5432/pagila
5 rows affected.


month,country,revenue
1,Algeria,33.92
1,Angola,27.93
1,Anguilla,6.97
1,Argentina,81.78
1,Austria,22.92


### Revenue Total, by Month, by Country, and by Month & Country, All in One Shot

In [14]:
%%sql
SELECT d.month, c.country, sum(f.sales_amount) AS revenue
FROM factSales AS f
JOIN dimDate AS d ON(f.date_key = d.date_key)
JOIN dimCustomer AS c ON(f.customer_key = c.customer_key)
GROUP BY GROUPING SETS ((), d.month, c.country, (d.month, c.country))
LIMIT 5;

 * postgresql://postgres:***@127.0.0.1:5432/pagila
5 rows affected.


month,country,revenue
,,67416.51
1.0,France,39.92
4.0,Puerto Rico,134.68
3.0,Italy,314.26
1.0,Kazakstan,5.97
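
The `GROUPING SETS` query returns the rows of the four separate queries in one result, with `NULL` in every column that a given grouping set does not group by. The sketch below emulates this in plain Python on made-up fact rows. The `grouping_sets` helper is hypothetical and only meant to show where the empty cells in the output come from.

```python
from collections import defaultdict

# Hypothetical (month, country, sales_amount) fact rows.
facts = [
    (1, "France", 10.0),
    (1, "Italy", 5.0),
    (2, "France", 20.0),
]


def grouping_sets(rows, sets):
    """Aggregate revenue once per grouping set; ungrouped columns become None."""
    result = []
    for keep in sets:
        totals = defaultdict(float)
        for month, country, amount in rows:
            key = (
                month if "month" in keep else None,
                country if "country" in keep else None,
            )
            totals[key] += amount
        result.extend((m, c, total) for (m, c), total in totals.items())
    return result


out = grouping_sets(facts, [(), ("month",), ("country",), ("month", "country")])
for row in out:
    print(row)
```

The row `(None, None, 35.0)` corresponds to the grand total produced by the empty grouping set `()`.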


## CUBE

In [15]:
%%sql
SELECT d.month, c.country, sum(f.sales_amount) AS revenue
FROM factSales AS f
JOIN dimDate AS d ON(f.date_key = d.date_key)
JOIN dimCustomer AS c ON(f.customer_key = c.customer_key)
GROUP BY CUBE (d.month, c.country)
LIMIT 5;

 * postgresql://postgres:***@127.0.0.1:5432/pagila
5 rows affected.


month,country,revenue
,,67416.51
1.0,France,39.92
4.0,Puerto Rico,134.68
3.0,Italy,314.26
1.0,Kazakstan,5.97
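
`CUBE (d.month, c.country)` is shorthand for the grouping sets made of every subset of the listed columns, which is why it covers the same four groupings as the explicit `GROUPING SETS` query. A small sketch of that expansion follows. The `cube_grouping_sets` helper is hypothetical.

```python
from itertools import combinations


def cube_grouping_sets(columns):
    """Return every subset of `columns`, largest first, as CUBE would expand them."""
    return [
        subset
        for size in range(len(columns), -1, -1)
        for subset in combinations(columns, size)
    ]


sets = cube_grouping_sets(["month", "country"])
print(sets)
# [('month', 'country'), ('month',), ('country',), ()]
```

With N columns the expansion has 2^N grouping sets, so `CUBE` over many dimensions can become expensive.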
