### Data-Driven Decision Making with OLAP SQL Queries
- Apply the OLAP extensions in SQL to aggregate data on multiple levels.
- These extensions are the CUBE, ROLLUP, and GROUPING SETS operators.
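
Each of these operators is shorthand for a list of grouping sets, that is, column combinations to aggregate over. The sketch below only enumerates those combinations; the helper names `cube_sets` and `rollup_sets` are illustrative and not part of any library.

```python
from itertools import combinations

def cube_sets(columns):
    """All subsets of the columns, largest first: the levels CUBE aggregates over."""
    cols = list(columns)
    return [combo
            for size in range(len(cols), -1, -1)
            for combo in combinations(cols, size)]

def rollup_sets(columns):
    """Prefixes of the column list, longest first: the levels ROLLUP aggregates over."""
    cols = tuple(columns)
    return [cols[:size] for size in range(len(cols), -1, -1)]

print(cube_sets(["country", "gender"]))
# [('country', 'gender'), ('country',), ('gender',), ()]
print(rollup_sets(["country", "gender"]))
# [('country', 'gender'), ('country',), ()]
```

GROUPING SETS amounts to writing such a list out by hand, so `CUBE (country, gender)` and `GROUPING SETS ((country, gender), (country), (gender), ())` describe the same aggregation levels.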

#### Introduction to OLAP
- OLAP stands for online analytical processing.
- Aggregate data for a better overview, for example:
    - Count the number of rentals for each customer.
    - Compute the average rating of movies for each genre and each country.
- Produce pivot tables to present aggregation results.


In [1]:
import pandas as pd
import psycopg2
from sqlalchemy.engine import create_engine

# SQLAlchemy engine, used by DataFrame.to_sql to load the CSV data
engine = create_engine('postgresql://postgres:123456@localhost:5432/MovieNow', paramstyle='format')

# ipython-sql magic, used by the %%sql cells below
%reload_ext sql
%sql postgresql://postgres:123456@localhost:5432/MovieNow

# Direct psycopg2 connection and cursor
conn = psycopg2.connect(host='localhost',
                        dbname='MovieNow',
                        user='postgres',
                        password='123456',
                        port='5432')
cursor = conn.cursor()



In [5]:
df1 = pd.read_csv('actsin_181127_2.csv')
df2 = pd.read_csv('actors_181127_2.csv')
df3 = pd.read_csv('customers_181127_2.csv')
df4 = pd.read_csv('movies_181127_2.csv')
df5 = pd.read_csv('renting_181127_2.csv')

In [8]:
%%sql
DROP TABLE IF EXISTS "actsin";
DROP TABLE IF EXISTS "actors";
DROP TABLE IF EXISTS "customers";
DROP TABLE IF EXISTS "movies";
DROP TABLE IF EXISTS "renting";


 * postgresql://postgres:***@localhost:5432/MovieNow
Done.
Done.
Done.
Done.
Done.


[]

In [7]:
df1.to_sql('actsin', engine)
df2.to_sql('actors', engine)
df3.to_sql('customers', engine)
df4.to_sql('movies', engine)
df5.to_sql('renting', engine)

### Groups of customers
Use the CUBE operator to extract the content of a pivot table from the database. 
Create a table with the total number of male and female customers from each country.

In [11]:
%%sql
SELECT count(*), -- Extract information of a pivot table of gender and country for the number of customers
      gender,
      country
FROM customers
GROUP BY CUBE (country, gender)
ORDER BY country;

 * postgresql://postgres:***@localhost:5432/MovieNow
36 rows affected.


count,gender,country
3,female,Austria
2,male,Austria
5,,Austria
3,male,Belgium
6,,Belgium
3,female,Belgium
7,,Denmark
4,female,Denmark
3,male,Denmark
8,male,France
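
The rows above are the pivot table in long format: an empty gender cell marks the subtotal for that country. As a minimal sketch, the first nine output rows can be rearranged into a grid in plain Python. The helper `to_pivot` and the label `"All"` are made up for illustration, and the approach assumes the data itself contains no genuine NULLs in these columns.

```python
# First nine rows of the output above, as (count, gender, country).
# None stands for an empty cell, i.e. the subtotal over that column.
rows = [
    (3, "female", "Austria"), (2, "male", "Austria"), (5, None, "Austria"),
    (3, "male", "Belgium"), (6, None, "Belgium"), (3, "female", "Belgium"),
    (7, None, "Denmark"), (4, "female", "Denmark"), (3, "male", "Denmark"),
]

def to_pivot(rows, total_label="All"):
    """Arrange long-format CUBE rows into {country: {gender: count}}."""
    pivot = {}
    for count, gender, country in rows:
        row_key = country if country is not None else total_label
        col_key = gender if gender is not None else total_label
        pivot.setdefault(row_key, {})[col_key] = count
    return pivot

pivot = to_pivot(rows)
for country in sorted(pivot):
    cells = pivot[country]
    print(country, cells["female"], cells["male"], cells["All"])
# Austria 3 2 5
# Belgium 3 3 6
# Denmark 4 3 7
```

In each row the female and male counts add up to the subtotal, which is a quick sanity check on the CUBE result.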


### Categories of movies
- Give an overview on the movies available on MovieNow. 
- List the number of movies for different genres and release years.

In [19]:
%%sql
SELECT year_of_releas,
       genre,
       COUNT(*)
FROM movies
GROUP BY CUBE (year_of_releas, genre)
ORDER BY year_of_releas
LIMIT 10;

 * postgresql://postgres:***@localhost:5432/MovieNow
10 rows affected.


year_of_releas,genre,count
2001,Drama,2
2001,Science Fiction & Fantasy,2
2001,,6
2001,Comedy,2
2002,Drama,2
2002,Science Fiction & Fantasy,2
2002,Comedy,3
2002,,7
2003,Science Fiction & Fantasy,1
2003,Mystery & Suspense,2


### Analyzing average ratings
Prepare a table for a report on the national preferences of MovieNow customers, comparing the average rating of movies across countries and genres.

In [21]:
%%sql
SELECT 
    country, 
    genre, 
    AVG(r.rating) AS avg_rating -- Calculate the average rating 
FROM renting AS r
LEFT JOIN movies AS m
ON m.movie_id = r.movie_id
LEFT JOIN customers AS c
ON r.customer_id = c.customer_id
GROUP BY CUBE(country, genre) -- For all aggregation levels of country and genre
LIMIT 10;

 * postgresql://postgres:***@localhost:5432/MovieNow
10 rows affected.


country,genre,avg_rating
,,7.9390243902439
France,Mystery & Suspense,6.0
Slovenia,Action & Adventure,
Spain,Animation,
Poland,Comedy,8.2
Denmark,Drama,7.6
Italy,Science Fiction & Fantasy,7.66666666666667
France,Science Fiction & Fantasy,8.125
USA,Mystery & Suspense,3.0
Austria,Mystery & Suspense,


### Number of customers
Give an overview of the number of customers for a presentation.

In [25]:
%%sql
-- Count the total number of customers, the number of customers for each country, and the number of female and male customers for each country
SELECT country,
       gender,
       COUNT(*)
FROM customers
GROUP BY ROLLUP (country, gender)
ORDER BY country, gender;

 * postgresql://postgres:***@localhost:5432/MovieNow
34 rows affected.


country,gender,count
Austria,female,3
Austria,male,2
Austria,,5
Belgium,female,3
Belgium,male,3
Belgium,,6
Denmark,female,4
Denmark,male,3
Denmark,,7
France,female,5
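
ROLLUP returns 34 rows here, while CUBE on the same columns returned 36. The difference can be reproduced by counting aggregation levels. The figures used below (11 countries, 2 genders, every country/gender pair present) are an assumption inferred from those row counts, not something the notebook states directly.

```python
def cube_row_count(n_pairs, n_countries, n_genders):
    """Rows from CUBE (country, gender): pairs, both subtotal levels, grand total."""
    return n_pairs + n_countries + n_genders + 1

def rollup_row_count(n_pairs, n_countries):
    """Rows from ROLLUP (country, gender): pairs, country subtotals, grand total."""
    return n_pairs + n_countries + 1

# Assumed shape of the customers table: 11 countries, 2 genders,
# and every country/gender pair present (22 pairs).
print(cube_row_count(22, 11, 2))  # 36
print(rollup_row_count(22, 11))   # 34
```

The two rows that ROLLUP omits are the gender-only subtotals, because ROLLUP only drops columns from the right of the list.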


### Analyzing preferences of genres across countries
Study the preferences of genres across countries. 
- Are there particular genres which are more popular in specific countries? 
- Evaluate the preferences of customers by averaging their ratings and counting the number of movies rented from each genre.

In [28]:
%%sql
SELECT 
    c.country, -- Select country
    m.genre, -- Select genre
    avg(r.rating), -- Average ratings
    count(*)  -- Count number of movie rentals
FROM renting AS r
LEFT JOIN movies AS m
ON m.movie_id = r.movie_id
LEFT JOIN customers AS c
ON r.customer_id = c.customer_id
GROUP BY c.country, m.genre -- Aggregate for each country and each genre
ORDER BY c.country, m.genre
LIMIT 10;

 * postgresql://postgres:***@localhost:5432/MovieNow
10 rows affected.


country,genre,avg,count
Austria,Animation,10.0,1
Austria,Comedy,8.0,2
Austria,Drama,6.0,4
Austria,Mystery & Suspense,,1
Austria,Science Fiction & Fantasy,6.66666666666667,4
Belgium,Comedy,,1
Belgium,Drama,9.16666666666667,15
Belgium,Mystery & Suspense,,1
Belgium,Science Fiction & Fantasy,8.5,7
Denmark,Drama,7.6,12


In [30]:
%%sql
-- Group by each country and genre with the ROLLUP extension
SELECT 
    c.country, 
    m.genre, 
    AVG(r.rating) AS avg_rating, 
    COUNT(*) AS num_rating
FROM renting AS r
LEFT JOIN movies AS m
ON m.movie_id = r.movie_id
LEFT JOIN customers AS c
ON r.customer_id = c.customer_id
GROUP BY ROLLUP(country, genre)
ORDER BY c.country, m.genre
LIMIT 10;

 * postgresql://postgres:***@localhost:5432/MovieNow
10 rows affected.


country,genre,avg_rating,num_rating
Austria,Animation,10.0,1
Austria,Comedy,8.0,2
Austria,Drama,6.0,4
Austria,Mystery & Suspense,,1
Austria,Science Fiction & Fantasy,6.66666666666667,4
Austria,,7.14285714285714,12
Belgium,Comedy,,1
Belgium,Drama,9.16666666666667,15
Belgium,Mystery & Suspense,,1
Belgium,Science Fiction & Fantasy,8.5,7


### Exploring nationality and gender of actors

For each movie in the database, the three most important actors are identified and listed in the table actors. This table includes the nationality and gender of each actor.

We are interested in how diverse the nationalities of the actors are and how many actors and actresses are in the list.

In [31]:
%%sql
SELECT 
    nationality, -- Select nationality of the actors
    gender, -- Select gender of the actors
    COUNT(*) -- Count the number of actors
FROM actors
GROUP BY GROUPING SETS ((nationality), (gender), ()); -- Counts per nationality, per gender, and overall

 * postgresql://postgres:***@localhost:5432/MovieNow
20 rows affected.


nationality,gender,count
,,145
Somalia,,1
,,2
Argentina,,1
Spain,,3
Italy,,1
Puerto Rico,,1
Iran,,1
Northern Ireland,,2
USA,,91
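
The output above contains two rows with both columns empty (counts 145 and 2). Presumably one is the grand total and the other is a group whose grouping column is NULL in the data itself, but the output alone cannot tell them apart. PostgreSQL provides the `GROUPING()` function for this purpose. The pure-Python sketch below mimics the idea with a per-column flag; the function name and the three actor records are hypothetical.

```python
from collections import Counter

def grouping_sets_count(records, columns, grouping_sets):
    """Count records per grouping set and flag which columns were rolled up.

    Returns tuples (values, rolled_up, count). `values` holds None for every
    column outside the grouping set; `rolled_up` holds True for those columns.
    """
    result = []
    for gset in grouping_sets:
        counts = Counter(
            tuple(rec[col] if col in gset else None for col in columns)
            for rec in records
        )
        rolled_up = tuple(col not in gset for col in columns)
        for values, count in counts.items():
            result.append((values, rolled_up, count))
    return result

# Hypothetical actors; one has no recorded nationality.
actors = [
    {"nationality": "USA", "gender": "male"},
    {"nationality": "USA", "gender": "female"},
    {"nationality": None, "gender": "male"},
]
out = grouping_sets_count(
    actors, ("nationality", "gender"), [("nationality",), ("gender",), ()]
)
for values, rolled_up, count in out:
    print(values, rolled_up, count)
```

Two of the printed rows have the values `(None, None)`: the flags `(False, True)` identify the group of actors with a missing nationality, and `(True, True)` identifies the grand total.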


### Exploring rating by country and gender
Investigate the average rating of customers aggregated by country and gender.

In [33]:
%%sql
SELECT 
    c.country, 
    c.gender,
    avg(r.rating) -- Calculate average rating
FROM renting AS r
LEFT JOIN customers AS c
ON r.customer_id = c.customer_id
GROUP BY country, gender -- Order and group by country and gender
ORDER BY country, gender;

 * postgresql://postgres:***@localhost:5432/MovieNow
22 rows affected.


country,gender,avg
Austria,female,7.0
Austria,male,7.33333333333333
Belgium,female,9.125
Belgium,male,8.0
Denmark,female,8.44444444444444
Denmark,male,7.33333333333333
France,female,8.0
France,male,7.66666666666667
Great Britan,female,7.27272727272727
Great Britan,male,7.71428571428571


In [35]:
%%sql
SELECT 
    c.country, 
    c.gender,
    AVG(r.rating)
FROM renting AS r
LEFT JOIN customers AS c
ON r.customer_id = c.customer_id
GROUP BY GROUPING SETS ((country, gender)); -- Group by country and gender with GROUPING SETS

 * postgresql://postgres:***@localhost:5432/MovieNow
22 rows affected.


country,gender,avg
Austria,male,7.33333333333333
France,female,8.0
Hungary,female,7.28571428571429
Spain,female,7.61290322580645
Belgium,male,8.0
USA,male,7.5
Denmark,female,8.44444444444444
Austria,female,7.0
Slovenia,male,8.09090909090909
Belgium,female,9.125


In [37]:
%%sql
SELECT 
    c.country, 
    c.gender,
    AVG(r.rating)
FROM renting AS r
LEFT JOIN customers AS c
ON r.customer_id = c.customer_id
-- Report all info from a Pivot table for country and gender
GROUP BY GROUPING SETS ((country, gender), (country), (gender), ());

 * postgresql://postgres:***@localhost:5432/MovieNow
36 rows affected.


country,gender,avg
,,7.9390243902439
Austria,male,7.33333333333333
France,female,8.0
Hungary,female,7.28571428571429
Spain,female,7.61290322580645
Belgium,male,8.0
USA,male,7.5
Denmark,female,8.44444444444444
Austria,female,7.0
Slovenia,male,8.09090909090909
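
Each grouping set in the query above corresponds to one ordinary GROUP BY, and the combined result is their union. A minimal sketch of that equivalence, using an in-memory SQLite database only because it ships with Python; the table `ratings` and its four rows are hypothetical.

```python
import sqlite3

demo_db = sqlite3.connect(":memory:")
demo_db.execute("CREATE TABLE ratings (country TEXT, gender TEXT, rating REAL)")
demo_db.executemany(
    "INSERT INTO ratings VALUES (?, ?, ?)",
    [
        ("Austria", "female", 6.0),
        ("Austria", "female", 8.0),
        ("Austria", "male", 7.0),
        ("Belgium", "male", 9.0),
    ],
)

# One SELECT per grouping set; NULL marks the columns that are aggregated away.
query = """
SELECT country, gender, AVG(rating) FROM ratings GROUP BY country, gender
UNION ALL
SELECT country, NULL, AVG(rating) FROM ratings GROUP BY country
UNION ALL
SELECT NULL, gender, AVG(rating) FROM ratings GROUP BY gender
UNION ALL
SELECT NULL, NULL, AVG(rating) FROM ratings
"""
result = demo_db.execute(query).fetchall()
for row in result:
    print(row)
demo_db.close()
```

The GROUPING SETS form expresses the same thing in a single statement, without repeating the FROM clause for every level.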


### Customer preference for genres
- Customers have no clear preference for more recent movies over older ones.
- MovieNow is considering investing money in movies of the best-rated genres.

In [41]:
%%sql
SELECT *
FROM renting AS r
LEFT JOIN movies AS m
ON m.movie_id = r.movie_id
WHERE r.movie_id IN ( -- Select records of movies with more than 4 ratings
    SELECT movie_id
    FROM renting
    GROUP BY movie_id
    HAVING COUNT(rating) > 4)
AND r.date_renting >= '2018-04-01' -- Select records of movie rentals since 2018-04-01
LIMIT 10;

 * postgresql://postgres:***@localhost:5432/MovieNow
10 rows affected.


index,renting_id,customer_id,movie_id,rating,date_renting,index_1,movie_id_1,title,genre,runtime,year_of_releas,renting_price
2,3,108,45,4.0,2018-06-08,44,45,Burn After Reading,Drama,96,2008,2.39
3,4,39,66,8.0,2018-10-22,65,66,The Hunger Games,Drama,142,2012,1.59
7,8,73,65,10.0,2018-06-05,64,65,Ghost Rider: Spirit of Vengeance,Action & Adventure,96,2012,1.79
11,12,52,65,10.0,2018-06-29,64,65,Ghost Rider: Spirit of Vengeance,Action & Adventure,96,2012,1.79
13,14,8,29,,2018-08-03,28,29,Two for the Money,Drama,122,2005,2.79
18,19,99,12,,2018-09-16,11,12,The Two Towers,Science Fiction & Fantasy,179,2002,2.39
22,23,46,6,9.0,2018-04-09,5,6,Harry Potter and the Philosopher's Stone,Science Fiction & Fantasy,152,2001,2.69
23,24,57,41,7.0,2018-08-16,40,41,The Kingdom,Drama,110,2007,2.09
25,26,118,27,8.0,2018-12-01,26,27,Monster,Drama,109,2004,2.09
26,27,7,36,,2019-03-14,35,36,World Trade Center,Drama,129,2006,1.59
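
The subquery in the cell above keeps only movies whose number of non-NULL ratings passes a threshold. A rough Python equivalent of that filter, with hypothetical rental tuples and a made-up helper name, shows how the threshold and the missing ratings interact.

```python
from collections import Counter

def movies_with_min_ratings(rentals, min_ratings):
    """Return the movie_ids that have at least `min_ratings` non-null ratings."""
    rated = Counter(movie_id for movie_id, rating in rentals if rating is not None)
    return {movie_id for movie_id, n in rated.items() if n >= min_ratings}

# Hypothetical (movie_id, rating) pairs; None means the rental was not rated.
rentals = (
    [(1, r) for r in (8, 9, 7, 10, 6)]       # five ratings
    + [(2, r) for r in (5, 6, None, 7, 8)]   # four ratings, one unrated rental
    + [(3, 9)]                               # a single rating
)
print(movies_with_min_ratings(rentals, 5))  # {1}
print(movies_with_min_ratings(rentals, 4))
```

With integer counts, `COUNT(rating) > 4` matches `min_ratings=5`, while `COUNT(rating) >= 4` matches `min_ratings=4` and would also keep movie 2.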


In [43]:
%%sql
SELECT genre,
    AVG(rating) AS avg_rating,
    COUNT(rating) AS n_rating,
    COUNT(*) AS n_rentals,     
    COUNT(DISTINCT m.movie_id) AS n_movies 
FROM renting AS r
LEFT JOIN movies AS m
ON m.movie_id = r.movie_id
WHERE r.movie_id IN ( 
    SELECT movie_id
    FROM renting
    GROUP BY movie_id
    HAVING COUNT(rating) >= 3 )
AND r.date_renting >= '2018-01-01'
GROUP BY genre
ORDER BY avg_rating desc; -- Order the table by decreasing average rating

 * postgresql://postgres:***@localhost:5432/MovieNow
8 rows affected.


genre,avg_rating,n_rating,n_rentals,n_movies
Action & Adventure,8.71428571428571,7,9,2
Art House & International,8.5,4,5,1
Other,8.42857142857143,7,16,2
Science Fiction & Fantasy,8.27659574468085,47,70,10
Comedy,7.95,20,31,5
Animation,7.83333333333333,6,10,2
Drama,7.74825174825175,143,245,34
Mystery & Suspense,7.42857142857143,7,19,3
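
The gap between n_rating and n_rentals comes from rentals without a rating: `COUNT(rating)` and `AVG(rating)` skip NULLs, while `COUNT(*)` counts every row. A small sketch of this behaviour on hypothetical data, run against in-memory SQLite only because it ships with Python:

```python
import sqlite3

demo_db = sqlite3.connect(":memory:")
demo_db.execute("CREATE TABLE renting (movie_id INTEGER, rating REAL)")
demo_db.executemany(
    "INSERT INTO renting VALUES (?, ?)",
    [(1, 8.0), (1, None), (1, 10.0), (2, None)],
)
stats = demo_db.execute(
    """
    SELECT movie_id,
           AVG(rating)   AS avg_rating,  -- NULL ratings are skipped
           COUNT(rating) AS n_rating,    -- counts non-NULL ratings only
           COUNT(*)      AS n_rentals    -- counts every rental row
    FROM renting
    GROUP BY movie_id
    ORDER BY movie_id
    """
).fetchall()
print(stats)
# [(1, 9.0, 2, 3), (2, None, 0, 1)]
demo_db.close()
```

A group with rentals but no ratings therefore shows an empty average next to a non-zero rental count, as in several of the country and genre rows earlier in this notebook.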


### Customer preference for actors


In [45]:
%%sql
SELECT a.nationality,
       a.gender,
    AVG(r.rating) AS avg_rating,
    COUNT(r.rating) AS n_rating,
    COUNT(*) AS n_rentals,
    COUNT(DISTINCT a.actor_id) AS n_actors
FROM renting AS r
LEFT JOIN actsin AS ai
ON ai.movie_id = r.movie_id
LEFT JOIN actors AS a
ON ai.actor_id = a.actor_id
WHERE r.movie_id IN ( 
    SELECT movie_id
    FROM renting
    GROUP BY movie_id
    HAVING COUNT(rating) >= 4)
AND r.date_renting >= '2018-04-01'
GROUP BY CUBE (a.nationality, a.gender); -- Provide results for all aggregation levels represented in a pivot table

 * postgresql://postgres:***@localhost:5432/MovieNow
36 rows affected.


nationality,gender,avg_rating,n_rating,n_rentals,n_actors
Argentina,male,8.5,4,5,1
Argentina,,8.5,4,5,1
Australia,female,8.66666666666667,3,5,1
Australia,male,7.45454545454545,11,17,3
Australia,,7.71428571428571,14,22,4
Austria,male,8.5,2,6,1
Austria,,8.5,2,6,1
British,female,7.83333333333333,54,78,3
British,male,8.10526315789474,114,175,9
British,,8.01785714285714,168,253,12
