# 4 Data-Driven Decision Making with OLAP SQL queries

This chapter introduces the OLAP extensions in SQL, namely the CUBE, ROLLUP, and GROUPING SETS operators, and applies them to aggregate data on multiple levels.
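
As a rough illustration of how the operators differ, the Python sketch below lists the grouping sets that CUBE and ROLLUP expand to for two columns. The helper names `cube_sets` and `rollup_sets` are hypothetical and exist only for this example; GROUPING SETS corresponds to writing such a list out by hand.

```python
from itertools import combinations

def cube_sets(columns):
    """All subsets of the columns, from the most detailed level to the grand total."""
    return [combo
            for size in range(len(columns), -1, -1)
            for combo in combinations(columns, size)]

def rollup_sets(columns):
    """Prefixes of the column list: the full list, then one column fewer each time."""
    return [tuple(columns[:size]) for size in range(len(columns), -1, -1)]

cube = cube_sets(["country", "gender"])
rollup = rollup_sets(["country", "gender"])
print(cube)    # four aggregation levels
print(rollup)  # three aggregation levels, no gender-only level
```

The empty tuple stands for the grand total. ROLLUP depends on the column order because it only drops columns from the right.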

# Groups of customers

Use the CUBE operator to extract the content of a pivot table from the database. Create a table with the total number of male and female customers from each country.

# Instructions:

- Create a table with the total number of customers, the number of female and male customers overall, the number of customers in each country, and the number of female and male customers in each country.

In [None]:
SELECT Country, -- Extract information of a pivot table of gender and country for the number of customers
	   Gender,
	   COUNT(*)
FROM customers
GROUP BY CUBE (Country, Gender)
ORDER BY country;
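
To see which rows `CUBE (Country, Gender)` adds, the sketch below spells out the four aggregation levels as separate queries joined with `UNION ALL`. It runs on SQLite only because SQLite ships with Python; SQLite has no CUBE operator, and the four customers are made-up sample data, not the course database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (country TEXT, gender TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [("Spain", "female"), ("Spain", "male"), ("Spain", "male"), ("Italy", "female")],
)

# One SELECT per aggregation level that CUBE (country, gender) would produce.
cube_emulation = """
SELECT country, gender, COUNT(*) FROM customers GROUP BY country, gender
UNION ALL
SELECT country, NULL, COUNT(*) FROM customers GROUP BY country
UNION ALL
SELECT NULL, gender, COUNT(*) FROM customers GROUP BY gender
UNION ALL
SELECT NULL, NULL, COUNT(*) FROM customers
"""
rows = conn.execute(cube_emulation).fetchall()
for row in rows:
    print(row)
```

A NULL (shown as `None`) in a grouping column marks a subtotal over that column, and the row with NULL in both columns is the overall total.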

# Categories of movies

Give an overview of the movies available on MovieNow. List the number of movies for different genres and release years.

# Instructions:

- List the number of movies for different genres and the year of release on all aggregation levels by using the CUBE operator.

In [None]:
SELECT year_of_release,
       COUNT(*),
       Genre
FROM movies
GROUP BY 
    CUBE (Genre, year_of_release)
ORDER BY year_of_release;

# Question

Which statement is NOT correct about the result table?

# Possible answers

( ) Across all genres (ignoring the year of release), the category with the most movies is Drama.

( ) In total there are 71 movies available on MovieNow.

(x) The year of release with most movies is 2014.

( ) From 2002 there are 2 dramas available on MovieNow.

# Analyzing average ratings

Prepare a table for a report about the national preferences of the customers from MovieNow comparing the average rating of movies across countries and genres.

# Instructions:

- Augment the records of movie rentals with information about movies and customers, in this order. Use the first letter of the table names as alias.

In [None]:
-- Augment the records of movie rentals with information about movies and customers
SELECT *
FROM renting AS r
LEFT JOIN movies AS m
ON m.movie_id = r.movie_id
LEFT JOIN customers AS c
ON r.customer_id = c.customer_id;

- Calculate the average rating for each country.

In [None]:
-- Calculate the average rating for each country
SELECT 
	c.country,
	AVG(r.rating)
FROM renting AS r
LEFT JOIN movies AS m
ON m.movie_id = r.movie_id
LEFT JOIN customers AS c
ON r.customer_id = c.customer_id
GROUP BY 
	c.country;

- Calculate the average rating for all aggregation levels of country and genre.

In [None]:
SELECT 
	c.country, 
	m.Genre, 
	AVG(r.rating) AS avg_rating -- Calculate the average rating 
FROM renting AS r
LEFT JOIN movies AS m
ON m.movie_id = r.movie_id
LEFT JOIN customers AS c
ON r.customer_id = c.customer_id
GROUP BY 
    CUBE(c.country, m.genre); -- For all aggregation levels of country and genre

# Question

What is the average rating over all records, rounded to two digits?

# Possible answers

(x) 7.94

( ) null

( ) .80

( ) 7.86
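
The average over all records is the grand-total row of the CUBE result, where both country and genre are NULL. It is generally not the same as the mean of the per-country averages. The sketch below shows this with made-up ratings; `None` stands for a rental without a rating, which AVG skips.

```python
from statistics import mean

# Hypothetical (country, rating) pairs; None marks a rental without a rating.
rentals = [("Spain", 8), ("Spain", 9), ("Spain", 10), ("Italy", 5), ("Italy", None)]

# AVG skips NULLs, so only rated rentals enter the calculation.
rated = [(country, rating) for country, rating in rentals if rating is not None]

# Grand-total level: one average over every rated rental.
overall_avg = mean([rating for _, rating in rated])

# Per-country level: one average per country.
by_country = {}
for country, rating in rated:
    by_country.setdefault(country, []).append(rating)
country_avgs = {country: mean(values) for country, values in by_country.items()}

# Averaging the per-country averages weights each country equally,
# so it does not reproduce the grand-total value.
avg_of_avgs = mean(list(country_avgs.values()))

print(overall_avg, country_avgs, avg_of_avgs)
```

With this sample the overall average is 8, while the mean of the two country averages is 7, because Spain contributes three ratings and Italy only one.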

# Number of customers

You have to give an overview of the number of customers for a presentation.

# Instructions:

- Generate a table with the total number of customers, the number of customers for each country, and the number of female and male customers for each country.
- Order the result by country and gender.

In [None]:
-- Count the total number of customers, the number of customers for each country, and the number of female and male customers for each country
SELECT country,
       gender,
	   COUNT(*)
FROM customers
GROUP BY ROLLUP (country, gender)
ORDER BY country, gender; -- Order the result by country and gender
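
Where the subtotal rows land after `ORDER BY country, gender` depends on how the database sorts NULLs. The sketch below assumes the PostgreSQL default for ascending order, where NULLs sort after all other values, and imitates it in Python on made-up ROLLUP output. The function `nulls_last` is a hypothetical helper.

```python
# Hypothetical ROLLUP (country, gender) output; None plays the role of NULL.
rows = [
    (None, None, 4),          # grand total
    ("Spain", None, 3),       # subtotal for Spain
    ("Italy", "female", 1),
    ("Spain", "male", 2),
    ("Italy", None, 1),       # subtotal for Italy
    ("Spain", "female", 1),
]

def nulls_last(row):
    """Sort key for ORDER BY country, gender with NULLs placed after real values."""
    country, gender, _ = row
    return (country is None, country or "", gender is None, gender or "")

ordered = sorted(rows, key=nulls_last)
for row in ordered:
    print(row)
```

Under this assumption each country's subtotal follows its detail rows and the grand total comes last.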

# Analyzing preferences of genres across countries

You are asked to study the preferences of genres across countries. Are there particular genres which are more popular in specific countries? Evaluate the preferences of customers by averaging their ratings and counting the number of movies rented from each genre.

# Instructions:

- Augment the renting records with information about movies and customers.

In [None]:
-- Join the tables
SELECT *
FROM renting AS r
LEFT JOIN movies AS m
ON m.movie_id = r.movie_id
LEFT JOIN customers AS c
ON r.customer_id = c.customer_id;

- Calculate the average ratings and the number of ratings for each country and each genre. Include the columns country and genre in the SELECT clause.

In [None]:
SELECT 
	c.country, -- Select country
	m.genre, -- Select genre
	AVG(r.rating), -- Average ratings
	COUNT(*)  -- Count number of movie rentals
FROM renting AS r
LEFT JOIN movies AS m
ON m.movie_id = r.movie_id
LEFT JOIN customers AS c
ON r.customer_id = c.customer_id
GROUP BY c.country, m.genre -- Aggregate for each country and each genre
ORDER BY c.country, m.genre;

- Finally, calculate the average ratings and the number of ratings for each country and genre, as well as an aggregation over all genres for each country and the overall average and total number.

In [None]:
-- Group by each country and genre with OLAP extension
SELECT 
	c.country, 
	m.genre, 
	AVG(r.rating) AS avg_rating, 
	COUNT(*) AS num_rating
FROM renting AS r
LEFT JOIN movies AS m
ON m.movie_id = r.movie_id
LEFT JOIN customers AS c
ON r.customer_id = c.customer_id
GROUP BY ROLLUP (c.country, m.genre)
ORDER BY c.country, m.genre;

# Exploring nationality and gender of actors

For each movie in the database, the three most important actors are identified and listed in the table actors. This table includes the nationality and gender of the actors. We are interested in how much diversity there is in the nationalities of the actors and how many actors and actresses are in the list.

# Instructions:

- Count the number of actors in the table actors for each nationality, the number of male and female actors, and the total number of actors.

In [None]:
SELECT 
	nationality, -- Select nationality of the actors
    gender, -- Select gender of the actors
    COUNT(*) -- Count the number of actors
FROM actors
GROUP BY GROUPING SETS ((nationality), (gender), ()); -- Use the correct GROUPING SETS operation
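
Unlike CUBE, the grouping sets `(nationality)`, `(gender)`, and `()` contain no level that combines nationality and gender. The sketch below imitates the aggregation in plain Python to make that visible. The function `grouping_sets` and the four sample actors are hypothetical.

```python
from collections import Counter

# Hypothetical sample of (nationality, gender) pairs standing in for the actors table.
actors = [("USA", "male"), ("USA", "female"), ("British", "male"), ("USA", "male")]
columns = ("nationality", "gender")

def grouping_sets(rows, sets):
    """Count rows once per grouping set; columns outside the set become None."""
    result = Counter()
    for grouping in sets:
        for row in rows:
            key = tuple(value if name in grouping else None
                        for name, value in zip(columns, row))
            result[key] += 1
    return result

counts = grouping_sets(actors, [("nationality",), ("gender",), ()])
for key, n in counts.items():
    print(key, n)
```

Every row of the result has `None` in at least one column, so a combination such as `("USA", "male")` never appears.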

# Exploring rating by country and gender

Now you will investigate the average rating of customers aggregated by country and gender.

# Instructions:

- Select the columns country, gender, and rating and use the correct join to combine the table renting with customers.

In [None]:
SELECT 
	c.country, -- Select country, gender and rating
	c.gender,
	r.rating
FROM renting AS r
LEFT JOIN customers AS c -- Use the correct join
ON r.customer_id = c.customer_id;

- Use GROUP BY to calculate the average rating over country and gender. Order the table by country and gender.

In [None]:
SELECT 
	c.country, 
    c.gender,
	AVG(r.rating) -- Calculate average rating
FROM renting AS r
LEFT JOIN customers AS c
ON r.customer_id = c.customer_id
GROUP BY country, gender  -- Order and group by country and gender
ORDER BY country, gender;

- Now, use GROUPING SETS to get the same result, i.e. the average rating over country and gender.

In [None]:
SELECT 
	c.country, 
    c.gender,
	AVG(r.rating)
FROM renting AS r
LEFT JOIN customers AS c
ON r.customer_id = c.customer_id
GROUP BY GROUPING SETS ((country, gender)); -- Group by country and gender with GROUPING SETS

- Report all information that is included in a pivot table for country and gender in one SQL table.

In [None]:
SELECT 
	c.country, 
    c.gender,
	AVG(r.rating)
FROM renting AS r
LEFT JOIN customers AS c
ON r.customer_id = c.customer_id
-- Report all info from a Pivot table for country and gender
GROUP BY GROUPING SETS ((country, gender), (country), (gender), ());

# Customer preference for genres

You just saw that customers have no clear preference for more recent movies over older ones. Now the management considers investing money in movies of the best rated genres.

# Instructions:

- Augment the records of movie rentals with information about movies. Use the first letter of each table name as its alias.

In [None]:
SELECT *
FROM renting AS r
LEFT JOIN movies AS m -- Augment the table with information about movies
ON r.movie_id = m.movie_id;

- Select records of movies with at least 4 ratings, starting from 2018-04-01.

In [None]:
SELECT *
FROM renting AS r
LEFT JOIN movies AS m
ON m.movie_id = r.movie_id
WHERE r.movie_id IN ( -- Select records of movies with at least 4 ratings
	SELECT movie_id
	FROM renting
	GROUP BY movie_id
	HAVING COUNT(rating) >= 4)
AND r.date_renting >= '2018-04-01'; -- Select records of movie rentals since 2018-04-01
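
Note that the subquery counts ratings over all rental dates, while the date condition only filters the outer rows. The sketch below shows the effect on a made-up `renting` table in SQLite, where dates are stored as ISO text (an assumption of this example, not of the course database).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE renting (movie_id INTEGER, rating INTEGER, date_renting TEXT)")
conn.executemany(
    "INSERT INTO renting VALUES (?, ?, ?)",
    [
        (1, 8, "2018-05-01"), (1, 7, "2018-06-01"),
        (1, 9, "2018-03-01"), (1, 6, "2018-07-01"),     # movie 1: four ratings
        (2, 9, "2018-05-01"), (2, None, "2018-06-01"),  # movie 2: one rating
    ],
)

# The subquery keeps movies with at least 4 ratings on any date;
# the date filter is applied to the outer rows afterwards.
rows = conn.execute("""
    SELECT movie_id, rating, date_renting
    FROM renting
    WHERE movie_id IN (
        SELECT movie_id
        FROM renting
        GROUP BY movie_id
        HAVING COUNT(rating) >= 4)
    AND date_renting >= '2018-04-01'
    ORDER BY date_renting
""").fetchall()
for row in rows:
    print(row)
```

Movie 1 qualifies with four ratings in total, yet only its three rentals since 2018-04-01 are returned. Movie 2 is excluded because `COUNT(rating)` ignores its NULL rating.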

- For each genre, calculate the average rating (use the alias avg_rating), the number of ratings (use the alias n_rating), the number of movie rentals (use the alias n_rentals), and the number of distinct movies (use the alias n_movies).

In [None]:
SELECT m.genre, -- For each genre, calculate:
	   AVG(r.rating) AS avg_rating, -- The average rating and use the alias avg_rating
	   COUNT(r.rating) AS n_rating, -- The number of ratings and use the alias n_rating
	   COUNT(*) AS n_rentals,     -- The number of movie rentals and use the alias n_rentals
	   COUNT(DISTINCT m.movie_id) AS n_movies -- The number of distinct movies and use the alias n_movies
FROM renting AS r
LEFT JOIN movies AS m
ON m.movie_id = r.movie_id
WHERE r.movie_id IN ( 
	SELECT movie_id
	FROM renting
	GROUP BY movie_id
	HAVING COUNT(rating) >= 4) -- At least 4 ratings, as in the previous step
AND r.date_renting >= '2018-04-01' -- Rentals since 2018-04-01
GROUP BY m.genre;
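
The three counts in this query measure different things: `COUNT(rating)` skips NULL ratings, `COUNT(*)` counts every rental, and `COUNT(DISTINCT movie_id)` counts each movie once. A minimal sketch on made-up data, run in SQLite:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE renting (movie_id INTEGER, rating INTEGER)")
conn.executemany(
    "INSERT INTO renting VALUES (?, ?)",
    [(1, 8), (1, None), (2, 6), (2, 10), (2, None)],  # hypothetical rentals
)

avg_rating, n_rating, n_rentals, n_movies = conn.execute("""
    SELECT AVG(rating),              -- NULL ratings are ignored
           COUNT(rating),            -- rentals that have a rating
           COUNT(*),                 -- all rentals
           COUNT(DISTINCT movie_id)  -- each movie once
    FROM renting
""").fetchone()
print(avg_rating, n_rating, n_rentals, n_movies)
```

Here five rentals of two movies carry three ratings, and the average is taken over those three ratings only.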

- Order the table by decreasing average rating.

In [None]:
SELECT genre,
	   AVG(rating) AS avg_rating,
	   COUNT(rating) AS n_rating,
       COUNT(*) AS n_rentals,     
	   COUNT(DISTINCT m.movie_id) AS n_movies 
FROM renting AS r
LEFT JOIN movies AS m
ON m.movie_id = r.movie_id
WHERE r.movie_id IN ( 
	SELECT movie_id
	FROM renting
	GROUP BY movie_id
	HAVING COUNT(rating) >= 4) -- At least 4 ratings, as in the earlier step
AND r.date_renting >= '2018-04-01' -- Rentals since 2018-04-01
GROUP BY genre
ORDER BY avg_rating DESC; -- Order the table by decreasing average rating

# Customer preference for actors

The last aspect you have to analyze is customer preferences for certain actors.

# Instructions:

- Join the tables renting, actsin, and actors.

In [None]:
-- Join the tables
SELECT *
FROM renting AS r
LEFT JOIN actsin AS ai
ON ai.movie_id = r.movie_id
LEFT JOIN actors AS a
ON ai.actor_id = a.actor_id;
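
Joining `renting` to `actsin` repeats each rental once per actor listed for the movie, so a `COUNT(*)` on the joined table counts rental-actor pairs rather than rentals. The sketch below shows this on made-up data in SQLite; the key column `renting_id` is an assumption of this example and is not shown in the queries above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE renting (renting_id INTEGER, movie_id INTEGER);
    CREATE TABLE actsin (movie_id INTEGER, actor_id INTEGER);
    INSERT INTO renting VALUES (1, 10), (2, 10);
    INSERT INTO actsin VALUES (10, 100), (10, 101), (10, 102);
""")

# Two rentals of one movie that lists three actors.
rentals = conn.execute("SELECT COUNT(*) FROM renting").fetchone()[0]

# After the join, every rental appears once per actor.
joined_rows = conn.execute("""
    SELECT COUNT(*)
    FROM renting AS r
    LEFT JOIN actsin AS ai
    ON ai.movie_id = r.movie_id
""").fetchone()[0]

# Counting distinct rental keys recovers the number of rentals.
distinct_rentals = conn.execute("""
    SELECT COUNT(DISTINCT r.renting_id)
    FROM renting AS r
    LEFT JOIN actsin AS ai
    ON ai.movie_id = r.movie_id
""").fetchone()[0]
print(rentals, joined_rows, distinct_rentals)
```

Keep this in mind when reading counts computed on the joined table: they are per rental-actor pair unless a distinct count on a rental key is used.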

- For each combination of the actors' nationality and gender, calculate the average rating, the number of ratings, the number of movie rentals, and the number of actors.

In [None]:
SELECT a.nationality,
       a.gender,
	   AVG(r.rating) AS avg_rating, -- The average rating
	   COUNT(r.rating) AS n_rating, -- The number of ratings
	   COUNT(*) AS n_rentals, -- The number of movie rentals
	   COUNT(DISTINCT a.actor_id) AS n_actors -- The number of actors
FROM renting AS r
LEFT JOIN actsin AS ai
ON ai.movie_id = r.movie_id
LEFT JOIN actors AS a
ON ai.actor_id = a.actor_id
WHERE r.movie_id IN ( 
	SELECT movie_id
	FROM renting
	GROUP BY movie_id
	HAVING COUNT(rating) >=4 )
AND r.date_renting >= '2018-04-01'
GROUP BY a.nationality, a.gender;

- Provide results for all aggregation levels represented in a pivot table.

In [None]:
SELECT a.nationality,
       a.gender,
	   AVG(r.rating) AS avg_rating,
	   COUNT(r.rating) AS n_rating,
	   COUNT(*) AS n_rentals,
	   COUNT(DISTINCT a.actor_id) AS n_actors
FROM renting AS r
LEFT JOIN actsin AS ai
ON ai.movie_id = r.movie_id
LEFT JOIN actors AS a
ON ai.actor_id = a.actor_id
WHERE r.movie_id IN ( 
	SELECT movie_id
	FROM renting
	GROUP BY movie_id
	HAVING COUNT(rating) >= 4)
AND r.date_renting >= '2018-04-01'
GROUP BY CUBE (a.nationality, a.gender); -- Provide results for all aggregation levels represented in a pivot table