# Window Functions

Remember to run `EXPLAIN` on a query before you execute it. This helps avoid unnecessary load on the DBMS and long waits for results.

For each question, we therefore provide one cell for the `EXPLAIN` and one cell for the final SQL.
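
When reading the `EXPLAIN` output in this notebook, note that each plan node carries an estimate block of the form `cost=<startup>..<total> rows=<estimated rows> width=<average row bytes>`. Costs are in the planner's arbitrary units, not milliseconds. The sketch below pulls those numbers out of a single plan line so they can be checked before running the real query. The helper name `parse_plan_line` is hypothetical and exists only for illustration.

```python
import re

# Estimate block printed on each PostgreSQL plan node:
# (cost=<startup>..<total> rows=<estimated rows> width=<avg row bytes>)
PLAN_ESTIMATE = re.compile(
    r"cost=(?P<startup>\d+\.\d+)\.\.(?P<total>\d+\.\d+) "
    r"rows=(?P<rows>\d+) width=(?P<width>\d+)"
)


def parse_plan_line(line):
    """Return the planner estimates from one EXPLAIN line, or None if absent."""
    match = PLAN_ESTIMATE.search(line)
    if match is None:
        return None
    return {
        "startup_cost": float(match["startup"]),
        "total_cost": float(match["total"]),
        "rows": int(match["rows"]),
        "width": int(match["width"]),
    }


# Top node of the plan recorded for question 3 in this notebook.
top_node = parse_plan_line("Sort (cost=182.76..182.80 rows=15 width=16)")
print(top_node)

# Lines without an estimate block (for example sort keys) yield None.
print(parse_plan_line("Sort Key: i.inventory_id"))
```

The total cost of the top node is the figure to watch: a value that is orders of magnitude larger than expected is a signal to revisit the joins or filters before executing the query.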


## Our practice schema:

We will use the DVD rental database.

A PDF of the _Entity-Relationship Diagrams_ (ERD) is available [here](https://web.dsa.missouri.edu/static/PDF/DVD_Rental_ERD2.pdf).   
Printing it out is recommended.


**NOTE**: These queries are more complex than the previous day's.
If you get stuck on one, skip and come back to it later.


**NOTE**: For this notebook, construct your solutions using window functions.

In [2]:
%load_ext sql
%sql postgres://dsa_ro_user:readonly@pgsql.dsa.lan/dvdrental

'Connected: dsa_ro_user@dvdrental'

# 1

### For the following customers, list each movie they have rented, its `film.rental_duration`, and a comparison of `film.rental_duration` with their average `rental` duration `(return_date - rental_date)` as a column named `cmp`.

Customer IDs: 
  * 318
  * 110
  * 281
  * 61
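
One way to read "their average `rental` duration" is as each customer's own average across all of their rentals, which would make `customer_id` the partition key. The solution cells in this notebook partition by `inventory_id` instead. The sketch below illustrates the per-customer reading only. It runs against an in-memory SQLite table through Python's `sqlite3` module rather than the `dvdrental` database, and the table `toy_rental`, its column `days_kept`, and all rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Invented stand-in for rental joined to inventory and film.
    CREATE TABLE toy_rental (
        customer_id     INTEGER,
        film_id         INTEGER,
        rental_duration INTEGER,  -- allowed rental length of the film
        days_kept       INTEGER   -- return_date - rental_date, in days
    );
    INSERT INTO toy_rental VALUES
        (1, 27, 7, 4),
        (1, 46, 3, 8),
        (2, 5, 5, 2),
        (2, 9, 6, 4);
""")

# The window function adds the customer's average to every row
# without collapsing the rows the way GROUP BY would.
rows = conn.execute("""
    SELECT customer_id,
           film_id,
           rental_duration,
           rental_duration
             - AVG(days_kept) OVER (PARTITION BY customer_id) AS cmp
    FROM toy_rental
    ORDER BY customer_id, film_id
""").fetchall()

for row in rows:
    print(row)
```

Customer 1 keeps films for 6 days on average, so a film with a 7-day allowance gets `cmp = 1.0` and a film with a 3-day allowance gets `cmp = -3.0`.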

In [2]:
%%sql
EXPLAIN
SELECT customer_id, film_id, rental_duration, (rental_duration - rental_duration_2) AS cmp
FROM (
    SELECT AVG(return_date::date - rental_date::date) OVER (PARTITION BY i.inventory_id) AS rental_duration_2,
           customer_id, i.film_id, rental_duration
    FROM rental AS r
    JOIN inventory AS i ON r.inventory_id = i.inventory_id
    JOIN film AS f ON i.film_id = f.film_id
    WHERE customer_id = 318 OR customer_id = 110 OR customer_id = 281 OR customer_id = 61
    GROUP BY rental_duration, return_date, rental_date, customer_id, i.film_id, i.inventory_id
) AS cool
GROUP BY film_id, customer_id, rental_duration, cool.rental_duration_2



 * postgres://dsa_ro_user:***@pgsql.dsa.lan/dvdrental
21 rows affected.


QUERY PLAN
Group (cost=635.01..636.81 rows=103 width=70)
"Group Key: cool.film_id, cool.customer_id, cool.rental_duration, cool.rental_duration_2"
-> Sort (cost=635.01..635.26 rows=103 width=38)
"Sort Key: cool.film_id, cool.customer_id, cool.rental_duration, cool.rental_duration_2"
-> Subquery Scan on cool (cost=627.96..631.56 rows=103 width=38)
-> WindowAgg (cost=627.96..630.53 rows=103 width=58)
-> Sort (cost=627.96..628.21 rows=103 width=26)
Sort Key: i.inventory_id
-> Group (cost=622.97..624.51 rows=103 width=26)
"Group Key: f.rental_duration, r.return_date, r.rental_date, r.customer_id, i.inventory_id"


In [3]:
%%sql
SELECT customer_id, film_id, rental_duration, rental_duration_2, (rental_duration - rental_duration_2) AS cmp
FROM (
    SELECT AVG(return_date::date - rental_date::date) OVER (PARTITION BY i.inventory_id) AS rental_duration_2,
           customer_id, i.film_id, rental_duration
    FROM rental AS r
    JOIN inventory AS i ON r.inventory_id = i.inventory_id
    JOIN film AS f ON i.film_id = f.film_id
    WHERE customer_id = 318 OR customer_id = 110 OR customer_id = 281 OR customer_id = 61
    GROUP BY rental_duration, return_date, rental_date, customer_id, i.film_id, i.inventory_id
) AS cool
GROUP BY film_id, customer_id, rental_duration, cool.rental_duration_2
ORDER BY customer_id, film_id






 * postgres://dsa_ro_user:***@pgsql.dsa.lan/dvdrental
54 rows affected.


customer_id,film_id,rental_duration,rental_duration_2,cmp
61,27,7,4.0,3.0
61,46,3,8.0,-5.0
61,56,6,5.0,1.0
61,317,4,2.0,2.0
61,454,6,9.0,-3.0
61,469,7,1.0,6.0
61,618,3,4.0,-1.0
61,630,7,5.0,2.0
61,723,3,8.0,-5.0
61,724,5,9.0,-4.0


[Helpful Hints Video](https://youtu.be/cm1_d1qWLhg)  
 

--- 

# 2

### For each store (`inventory.store_id`), list the top three films that have been rented, ranked by accumulated rental duration.

Hint: Use the `rank()` function and a derived table
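
The derived table in the hint is needed because window functions are evaluated after the `WHERE` clause of their own query level, so a rank cannot be filtered in the same `SELECT` that computes it. The sketch below shows the pattern on an in-memory SQLite table through Python's `sqlite3` module. The table `toy_totals`, its column `total_days`, and all rows are invented and stand in for the per-store, per-film sums from `dvdrental`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Invented per-store, per-film totals (already aggregated).
    CREATE TABLE toy_totals (store_id INTEGER, film_id INTEGER, total_days INTEGER);
    INSERT INTO toy_totals VALUES
        (1, 10, 90), (1, 11, 120), (1, 12, 75), (1, 13, 110),
        (2, 10, 60), (2, 14, 95),  (2, 15, 80), (2, 16, 40);
""")

# Inner query: rank films within each store.
# Outer query: keep only ranks 1 to 3.
top_three = conn.execute("""
    SELECT store_id, film_id, total_days, rnk
    FROM (
        SELECT store_id, film_id, total_days,
               RANK() OVER (PARTITION BY store_id
                            ORDER BY total_days DESC) AS rnk
        FROM toy_totals
    ) AS ranked
    WHERE rnk <= 3
    ORDER BY store_id, rnk
""").fetchall()

for row in top_three:
    print(row)
```

Because `PARTITION BY store_id` restarts the ranking for every store, the same film (film 10 here) can appear in the top three of more than one store.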

In [11]:
%%sql
EXPLAIN
SELECT rnked, film_id, store_id, duration
FROM (
SELECT RANK() OVER (PARTITION BY store_id ORDER BY duration DESC) as rnked, film_id, store_id, duration
FROM(
    SELECT SUM(return_date - rental_date) as duration, f.film_id, store_id
    FROM rental as r JOIN inventory as i ON r.inventory_id = i.inventory_id JOIN film as f ON i.film_id = f.film_id
    WHERE return_date - rental_date IS NOT NULL
    GROUP BY f.film_id, store_id) as first) as second
WHERE rnked = 1 OR rnked = 2 OR rnked = 3

 * postgres://dsa_ro_user:***@pgsql.dsa.lan/dvdrental
18 rows affected.


QUERY PLAN
Subquery Scan on second (cost=948.44..1023.44 rows=30 width=30)
Filter: ((second.rnked = 1) OR (second.rnked = 2) OR (second.rnked = 3))
-> WindowAgg (cost=948.44..988.44 rows=2000 width=30)
-> Sort (cost=948.44..953.44 rows=2000 width=22)
"Sort Key: first.store_id, first.duration DESC"
-> Subquery Scan on first (cost=798.78..838.78 rows=2000 width=22)
-> HashAggregate (cost=798.78..818.78 rows=2000 width=22)
"Group Key: f.film_id, i.store_id"
-> Hash Join (cost=204.57..639.14 rows=15964 width=22)
Hash Cond: (i.film_id = f.film_id)


In [12]:
%%sql

SELECT rnked, film_id, store_id, duration
FROM (
SELECT RANK() OVER (PARTITION BY store_id ORDER BY duration DESC) as rnked, film_id, store_id, duration
FROM(
    SELECT SUM(return_date - rental_date) as duration, f.film_id, store_id
    FROM rental as r JOIN inventory as i ON r.inventory_id = i.inventory_id JOIN film as f ON i.film_id = f.film_id
    WHERE return_date - rental_date IS NOT NULL
    GROUP BY f.film_id, store_id) as first) as second
WHERE rnked = 1 OR rnked = 2 OR rnked = 3


 * postgres://dsa_ro_user:***@pgsql.dsa.lan/dvdrental
6 rows affected.


rnked,film_id,store_id,duration
1,971,1,"111 days, 10:21:00"
2,109,1,"105 days, 14:17:00"
3,852,1,"104 days, 9:19:00"
1,552,2,"104 days, 17:25:00"
2,891,2,"100 days, 19:05:00"
3,491,2,"100 days, 8:14:00"


[Helpful Hints Video](https://youtu.be/COQem8x3kR4)  
 

--- 

# 3

### For each category, list the three longest movies.

In [111]:
%%sql
EXPLAIN

SELECT done, film_id, category_id, length
FROM(
SELECT RANK() over (PARTITION BY category_id ORDER BY LENGTH DESC) as done, f.film_id, category_id, length
FROM film_category as fc JOIN film as f ON fc.film_id = f.film_id ) as ranked
WHERE done = 1 OR done = 2 OR done = 3 
ORDER BY category_id, done




 * postgres://dsa_ro_user:***@pgsql.dsa.lan/dvdrental
12 rows affected.


QUERY PLAN
Sort (cost=182.76..182.80 rows=15 width=16)
"Sort Key: ranked.category_id, ranked.done"
-> Subquery Scan on ranked (cost=144.97..182.47 rows=15 width=16)
Filter: ((ranked.done = 1) OR (ranked.done = 2) OR (ranked.done = 3))
-> WindowAgg (cost=144.97..164.97 rows=1000 width=16)
-> Sort (cost=144.97..147.47 rows=1000 width=8)
"Sort Key: fc.category_id, f.length DESC"
-> Hash Join (cost=76.50..95.14 rows=1000 width=8)
Hash Cond: (fc.film_id = f.film_id)
-> Seq Scan on film_category fc (cost=0.00..16.00 rows=1000 width=4)


In [110]:
%%sql
SELECT done, film_id, category_id, length
FROM(
SELECT RANK() over (PARTITION BY category_id ORDER BY LENGTH DESC) as done, f.film_id, category_id, length
FROM film_category as fc JOIN film as f ON fc.film_id = f.film_id ) as ranked
WHERE done = 1 OR done = 2 OR done = 3 
ORDER BY category_id, done





 * postgres://dsa_ro_user:***@pgsql.dsa.lan/dvdrental
53 rows affected.


done,film_id,category_id,length
1,991,1,185
1,212,1,185
3,126,1,179
1,690,2,185
1,349,2,185
3,886,2,184
3,820,2,184
1,993,3,178
1,344,3,178
3,280,3,177
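
The output above contains rank values 1, 1, 3 for category 1 and four rows for category 2. This follows from how `RANK()` handles ties: tied rows share a rank and the next rank is skipped, so filtering on ranks 1 to 3 can return more than three rows. The sketch below compares `RANK()`, `DENSE_RANK()`, and `ROW_NUMBER()` on an in-memory SQLite table through Python's `sqlite3` module. The table `toy_film` and its rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Invented film lengths with two pairs of ties.
    CREATE TABLE toy_film (film_id INTEGER, length INTEGER);
    INSERT INTO toy_film VALUES (1, 185), (2, 185), (3, 179), (4, 179), (5, 170);
""")

numbered = conn.execute("""
    SELECT film_id, length,
           RANK()       OVER (ORDER BY length DESC)          AS rnk,
           DENSE_RANK() OVER (ORDER BY length DESC)          AS dense_rnk,
           ROW_NUMBER() OVER (ORDER BY length DESC, film_id) AS row_num
    FROM toy_film
    ORDER BY length DESC, film_id
""").fetchall()

for row in numbered:
    print(row)

# How many rows survive a "<= 3" filter under each numbering scheme.
kept_by_rank = sum(1 for r in numbered if r[2] <= 3)
kept_by_dense_rank = sum(1 for r in numbered if r[3] <= 3)
kept_by_row_number = sum(1 for r in numbered if r[4] <= 3)
print(kept_by_rank, kept_by_dense_rank, kept_by_row_number)
```

Which function fits depends on the intent: `ROW_NUMBER()` returns exactly three rows but breaks ties arbitrarily unless the `ORDER BY` adds a tiebreaker, `RANK()` keeps all rows tied for the top three positions, and `DENSE_RANK()` keeps all rows with one of the three largest distinct lengths.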


# 4

### For each customer, list their two shortest movie rentals.

In [122]:
%%sql
EXPLAIN
SELECT rnked, customer_id, rental_id, duration
FROM (
SELECT RANK() OVER (PARTITION BY customer_id ORDER BY duration ASC) as rnked, customer_id, rental_id, duration
FROM(
    SELECT return_date - rental_date as duration, rental_id, customer_id
    FROM rental as r 
    WHERE return_date - rental_date IS NOT NULL) as first) as second
WHERE rnked = 1 OR rnked = 2 



 * postgres://dsa_ro_user:***@pgsql.dsa.lan/dvdrental
7 rows affected.


QUERY PLAN
Subquery Scan on second (cost=1504.95..2103.60 rows=159 width=30)
Filter: ((second.rnked = 1) OR (second.rnked = 2))
-> WindowAgg (cost=1504.95..1864.14 rows=15964 width=30)
-> Sort (cost=1504.95..1544.86 rows=15964 width=22)
"Sort Key: r.customer_id, ((r.return_date - r.rental_date))"
-> Seq Scan on rental r (cost=0.00..390.46 rows=15964 width=22)
Filter: ((return_date - rental_date) IS NOT NULL)


In [121]:
%%sql
SELECT rnked, customer_id, rental_id, duration
FROM (
SELECT RANK() OVER (PARTITION BY customer_id ORDER BY duration ASC) as rnked, customer_id, rental_id, duration
FROM(
    SELECT return_date - rental_date as duration, rental_id, customer_id
    FROM rental as r 
    WHERE return_date - rental_date IS NOT NULL) as first) as second
WHERE rnked = 1 OR rnked = 2 





 * postgres://dsa_ro_user:***@pgsql.dsa.lan/dvdrental
1198 rows affected.


rnked,customer_id,rental_id,duration
1,1,14762,"1 day, 1:57:00"
2,1,8074,"1 day, 2:44:00"
1,2,10918,19:13:00
2,2,320,"1 day, 4:21:00"
1,3,7911,19:33:00
2,3,15038,"1 day, 5:02:00"
1,4,12151,23:38:00
2,4,1633,"1 day, 3:04:00"
1,5,5016,20:10:00
2,5,10625,21:49:00


# 5

### List the quartile statistics of the movie lengths, grouped by release year.

In [141]:
%%sql
EXPLAIN

SELECT percentile_cont(.25) WITHIN GROUP (ORDER BY length),
        percentile_cont(.5) WITHIN GROUP (ORDER BY length),
        percentile_cont(.75) WITHIN GROUP (ORDER BY length),
        percentile_cont(1) WITHIN GROUP (ORDER BY length), 
        release_year
FROM film as f
GROUP BY release_year ; 






 * postgres://dsa_ro_user:***@pgsql.dsa.lan/dvdrental
5 rows affected.


QUERY PLAN
GroupAggregate (cost=113.83..138.85 rows=1 width=36)
Group Key: release_year
-> Sort (cost=113.83..116.33 rows=1000 width=6)
Sort Key: release_year
-> Seq Scan on film f (cost=0.00..64.00 rows=1000 width=6)


In [140]:
%%sql



SELECT percentile_cont(.25) WITHIN GROUP (ORDER BY length),
        percentile_cont(.5) WITHIN GROUP (ORDER BY length),
        percentile_cont(.75) WITHIN GROUP (ORDER BY length),
        percentile_cont(1) WITHIN GROUP (ORDER BY length), 
        release_year
FROM film as f
GROUP BY release_year ; 



 * postgres://dsa_ro_user:***@pgsql.dsa.lan/dvdrental
1 rows affected.


percentile_cont,percentile_cont_1,percentile_cont_2,percentile_cont_3,release_year
80.0,114.0,149.25,185.0,2006
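
The fractional third quartile in the output above is consistent with how `percentile_cont` works: it interpolates linearly between the two neighbouring values in the sorted data, so the result need not be an actual film length. The sketch below reimplements that interpolation in plain Python to make the arithmetic visible. It follows the documented behaviour rather than PostgreSQL's source code, and the sample values are invented.

```python
import math


def percentile_cont(values, fraction):
    """Continuous percentile with linear interpolation between neighbours."""
    ordered = sorted(values)
    # Zero-based fractional position of the requested percentile.
    position = fraction * (len(ordered) - 1)
    lower = math.floor(position)
    upper = math.ceil(position)
    weight = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * weight


# Invented lengths, not taken from the film table.
sample_lengths = [10, 20, 30, 40]
quartiles = [percentile_cont(sample_lengths, f) for f in (0.25, 0.5, 0.75, 1)]
print(quartiles)
```

With four values, the 0.25 fraction falls at position 0.75, three quarters of the way from 10 to 20, which gives 17.5. A fraction of 1 always returns the maximum, so the fourth column in the query is the longest film rather than a quartile boundary in the strict sense.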


# Save your notebook, then `File > Close and Halt`