# Window Functions

Please remember to use the `EXPLAIN` before you execute a query to help avoid unnecessary load on the DBMS and indefinite waits by you for results.

Therefore, for each question, we are providing a cell for the `EXPLAIN` as well as the final SQL.


## Our practice schema:

We will use the same database as in the Day 1 practice.

A PDF of the _Entity-Relationship Diagrams_ (ERD) is available [here](https://web.dsa.missouri.edu/static/PDF/DVD_Rental_ERD2.pdf).   
Printing it out is recommended.


**NOTE**: These queries are more complex that the previous day's.
If you get stuck on one, skip and come back to it later.


**NOTE**: For this notebook, it is desired taht you construct solutions using Window Functions.

In [2]:
%load_ext sql
%sql postgres://dsa_ro_user:readonly@pgsql.dsa.lan/dvdrental

'Connected: dsa_ro_user@dvdrental'

# 1

### For the following customers: list each movie they have rented, its `film.rental_duration`, and the comparison of the  `film.rental_duration` versus their average `rental` duration `(return_date  - rental_date)` as a column named `cmp`.

Customer IDs: 
  * 318
  * 110
  * 281
  * 61

In [None]:
%%sql
EXPLAIN
SELECT r.customer_id, f.title, f.rental_duration, sum(f.rental_duration - x.i) as cmp
from film f
JOIN (
    SELECT r.customer_id, avg(r.return_date::date - r.rental_date::date) as i
    from rental r
    JOIN inventory i USING (inventory_id)
    WHERE r.customer IN (318, 110, 281, 61)
    AND r.return_date IS NOT NULL
    GROUP BY r.customer_id
) as x
GROUP BY r.customer_id ;


In [20]:
%%sql
SELECT r.customer_id, f.title, f.rental_duration, sum(f.rental_duration - x.i) as cmp
from film f
JOIN (
    SELECT r.customer_id, avg(r.return_date::date - r.rental_date::date) as i
    from rental r
    JOIN inventory i USING (inventory_id)
    WHERE r.customer IN (318, 110, 281, 61)
    AND r.return_date IS NOT NULL
    GROUP BY r.customer_id
) as x
GROUP BY r.customer_id ;


 * postgres://dsa_ro_user:***@pgsql.dsa.lan/dvdrental
(psycopg2.errors.SyntaxError) syntax error at or near ";"
LINE 10: ) as x;
               ^

[SQL: SELECT r.customer_id, f.title, f.rental_duration, sum(f.rental_duration - x.i) as cmp
from film f
JOIN (
    SELECT r.customer_id, avg(r.return_date::date - r.rental_date::date) as i
    from rental r
    JOIN inventory i USING (inventory_id)
    WHERE r.customer IN (318, 110, 281, 61)
    AND r.return_date IS NOT NULL
    GROUP BY r.customer_id
) as x;]
(Background on this error at: http://sqlalche.me/e/f405)


[Helpful Hints Video](https://youtu.be/cm1_d1qWLhg)  
 

--- 

# 2

### For each store (inventory.store_id), list the top three films that have been rented based on accumulated rental durations.

Hint: Use the `rank()` function and a derived table

In [None]:
%%sql
EXPLAIN
SELECT i.store_id, i.film_id, f.title, f.rental_duration, rank
from film f
JOIN (
    SELECT store_id
    , rank() OVER (PARTITION BY store_id ORDER BY rental_duration DESC)
    , i.film_id, f.title, f.rental_duration
    FROM inventory i
    JOIN rental r USING (inventory_id)
    WHERE r.return_date IS NOT NULL
    GROUP BY i.store_id, i.film_id, f.title, f.rental_duration
) as x
WHERE RANK <= 3;

In [29]:
%%sql
SELECT i.store_id, i.film_id, f.title, f.rental_duration, rank
from film f
JOIN (
    SELECT store_id
    , rank() OVER (PARTITION BY store_id ORDER BY rental_duration DESC)
    , i.film_id, f.title, f.rental_duration
    FROM inventory i
    JOIN rental r USING (inventory_id)
    WHERE r.return_date IS NOT NULL
    GROUP BY i.store_id, i.film_id, f.title, f.rental_duration
) as x
WHERE RANK <= 3;

 * postgres://dsa_ro_user:***@pgsql.dsa.lan/dvdrental
(psycopg2.errors.SyntaxError) syntax error at or near "WHERE"
LINE 12: WHERE RANK <= 3;
         ^

[SQL: SELECT i.store_id, i.film_id, f.title, f.rental_duration, rank
from film f
JOIN (
    SELECT store_id
    , rank() OVER (PARTITION BY store_id ORDER BY rental_duration DESC)
    , i.film_id, f.title, f.rental_duration
    FROM inventory i
    JOIN rental r USING (inventory_id)
    WHERE r.return_date IS NOT NULL
    GROUP BY i.store_id, i.film_id, f.title, f.rental_duration
) as x
WHERE RANK <= 3;]
(Background on this error at: http://sqlalche.me/e/f405)


In [24]:
%%sql
SELECT store_id, film_id, title, rental_duration, rank
FROM (
    SELECT i.store_id, i.film_id
    , rank() OVER (PARTITION BY store_id ORDER BY rental_duration DESC)
    , sum(r.return_date-r.rental_date) as duration
    FROM inventory i
    JOIN rental r USING (inventory_id)
    JOIN film f USING (film_id)
    WHERE r.return_date IS NOT NULL
    GROUP BY i.store_id, i.film_id
) AS x
GROUP BY 
WHERE RANK <= 3;


 * postgres://dsa_ro_user:***@pgsql.dsa.lan/dvdrental
(psycopg2.errors.GroupingError) column "f.rental_duration" must appear in the GROUP BY clause or be used in an aggregate function
LINE 4:     , rank() OVER (PARTITION BY store_id ORDER BY rental_dur...
                                                          ^

[SQL: SELECT store_id, film_id, title, rental_duration, rank
FROM (
    SELECT i.store_id, i.film_id
    , rank() OVER (PARTITION BY store_id ORDER BY rental_duration DESC)
    , sum(r.return_date-r.rental_date) as duration
    FROM inventory i
    JOIN rental r USING (inventory_id)
    JOIN film f USING (film_id)
    WHERE r.return_date IS NOT NULL
    GROUP BY i.store_id, i.film_id
) AS x
WHERE RANK <= 3;]
(Background on this error at: http://sqlalche.me/e/f405)


[Helpful Hints Video](https://youtu.be/COQem8x3kR4)  
 

--- 

# 3

### For each category, list the three longest movies

In [None]:
%%sql
EXPLAIN
SELECT name, title, length, rank
FROM (
    SELECT name
    , rank() OVER (PARTITION BY name ORDER BY length DESC)
    , title, length
    FROM film 
    JOIN film_category USING (film_id)
    JOIN category USING (category_id)
) as x
WHERE RANK <= 3;

In [41]:
%%sql
SELECT name, title, length, rank
FROM (
    SELECT name
    , rank() OVER (PARTITION BY name ORDER BY length DESC)
    , title, length
    FROM film 
    JOIN film_category USING (film_id)
    JOIN category USING (category_id)
) as x
WHERE RANK <= 3;


 * postgres://dsa_ro_user:***@pgsql.dsa.lan/dvdrental
53 rows affected.


name,title,length,rank
Action,Worst Banger,185,1
Action,Darn Forrester,185,1
Action,Casualties Encino,179,3
Animation,Pond Seattle,185,1
Animation,Gangs Pride,185,1
Animation,Theory Mermaid,184,3
Animation,Sons Interview,184,3
Children,Wrong Behavior,178,1
Children,Fury Murder,178,1
Children,Empire Malkovich,177,3


# 4

### For each customer, list their two shortest movie rentals.

In [46]:
%%sql
SELECT first_name, last_name, title, rank
FROM (
    SELECT last_name
    , rank() OVER (PARTITION BY last_name ORDER BY (return_date-rental_date) ASC)
    ,first_name, title
    FROM customer
    JOIN rental USING (customer_id)
    JOIN inventory USING (inventory_id)
    JOIN film USING (film_id)
) as x
WHERE RANK <= 2
ORDER BY first_name ASC;



 * postgres://dsa_ro_user:***@pgsql.dsa.lan/dvdrental
20 rows affected.


QUERY PLAN
Sort (cost=2637.61..2650.98 rows=5348 width=36)
Sort Key: x.first_name
-> Subquery Scan on x (cost=1785.01..2306.44 rows=5348 width=36)
Filter: (x.rank <= 2)
-> WindowAgg (cost=1785.01..2105.89 rows=16044 width=36)
-> Sort (cost=1785.01..1825.12 rows=16044 width=28)
"Sort Key: customer.last_name, film.title DESC"
-> Hash Join (cost=227.05..664.36 rows=16044 width=28)
Hash Cond: (inventory.film_id = film.film_id)
-> Hash Join (cost=150.55..545.57 rows=16044 width=15)


In [63]:
%%sql
SELECT first_name, last_name, title, rank
FROM (
    SELECT last_name
    , rank() OVER (PARTITION BY last_name ORDER BY (return_date-rental_date) ASC)
    ,first_name, title
    FROM customer
    JOIN rental USING (customer_id)
    JOIN inventory USING (inventory_id)
    JOIN film USING (film_id)
) as x
WHERE RANK <= 2
ORDER BY first_name ASC;



 * postgres://dsa_ro_user:***@pgsql.dsa.lan/dvdrental
1198 rows affected.


first_name,last_name,title,rank
Aaron,Selby,Drumline Cyclone,1
Aaron,Selby,Liaisons Sweet,2
Adam,Gooch,Spiking Element,1
Adam,Gooch,Polish Brooklyn,2
Adrian,Clary,Volcano Texas,1
Adrian,Clary,Hollywood Anonymous,2
Agnes,Bishop,Lucky Flying,1
Agnes,Bishop,Igby Maker,2
Alan,Kahn,Holocaust Highball,2
Alan,Kahn,Roses Treasure,1


# 5

### List the quartile statististics of the movie lengths, grouped by release year.

In [None]:
%%sql
EXPLAIN
SELECT 
  release_year,
  percentile_cont(array[0.25, 0.5, 0.75]) WITHIN GROUP (ORDER BY length)
FROM film
GROUP BY release_year


In [61]:
%%sql
SELECT 
  release_year,
  percentile_cont(array[0.25, 0.5, 0.75]) WITHIN GROUP (ORDER BY length)
FROM film
GROUP BY release_year


 * postgres://dsa_ro_user:***@pgsql.dsa.lan/dvdrental
1 rows affected.


release_year,percentile_cont
2006,"[80.0, 114.0, 149.25]"


# Save your notebook, then `File > Close and Halt`