# Advanced Aggregates

Remember to run `EXPLAIN` on a query before you execute it. Checking the plan first helps avoid unnecessary load on the DBMS and long waits for results.

For each question, therefore, there is one cell for the `EXPLAIN` and one for the final SQL.


## Our practice schema:

We will use the DVD Rental database.

A PDF of the _Entity-Relationship Diagrams_ (ERD) is available [here](https://web.dsa.missouri.edu/static/PDF/DVD_Rental_ERD2.pdf).   
Printing it out is recommended.


**NOTE**: These queries are more complex than the others.
If you get stuck on one, skip and come back to it later.

**NOTE**: For this notebook, build your solutions using advanced aggregates and derived tables.

In [1]:
%load_ext sql
%sql postgres://dsa_ro_user:readonly@pgsql.dsa.lan/dvdrental

'Connected: dsa_ro_user@dvdrental'

### 1
### What are the average, variance, and standard deviation of film length?


In [2]:
%%sql
EXPLAIN
SELECT AVG(length), VARIANCE(length), STDDEV(length)
FROM film





 * postgres://dsa_ro_user:***@pgsql.dsa.lan/dvdrental
2 rows affected.


QUERY PLAN
Aggregate (cost=71.51..71.52 rows=1 width=96)
-> Seq Scan on film (cost=0.00..64.00 rows=1000 width=2)


In [4]:
%%sql

SELECT AVG(length), VARIANCE(length), STDDEV(length)
FROM film



 * postgres://dsa_ro_user:***@pgsql.dsa.lan/dvdrental
1 rows affected.


avg,variance,stddev
115.272,1634.2883043043043,40.426331818559845
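
In PostgreSQL, `VARIANCE` and `STDDEV` are aliases for the sample statistics `VAR_SAMP` and `STDDEV_SAMP`. As a minimal sketch, assuming the rows in `film` are treated as the whole population rather than a sample, the population versions can be requested explicitly and compared side by side. The column aliases and the rounding to two decimals are illustrative choices, not part of the assignment.

```sql
-- Sample vs. population statistics for film length.
-- Treating `film` as the full population is an assumption for illustration.
SELECT ROUND(AVG(length), 2)         AS avg_length,
       ROUND(VAR_SAMP(length), 2)    AS sample_variance,      -- same as VARIANCE(length)
       ROUND(VAR_POP(length), 2)     AS population_variance,
       ROUND(STDDEV_SAMP(length), 2) AS sample_stddev,        -- same as STDDEV(length)
       ROUND(STDDEV_POP(length), 2)  AS population_stddev
FROM film;
```

With many rows the two versions differ only slightly, since they divide by `n - 1` and `n` respectively.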


### 2
### What are the average, variance, and standard deviation of film length, broken down by film category?

In [7]:
%%sql
EXPLAIN

SELECT  DISTINCT name, AVG(length), VARIANCE(length), STDDEV(length)
FROM category as C JOIN film_category as fc ON c.category_id = fc.category_id
JOIN film as f ON fc.film_id = f.film_id
GROUP BY c.name




 * postgres://dsa_ro_user:***@pgsql.dsa.lan/dvdrental
14 rows affected.


QUERY PLAN
Unique (cost=110.41..110.61 rows=16 width=232)
-> Sort (cost=110.41..110.45 rows=16 width=232)
"Sort Key: c.name, (avg(f.length)), (variance(f.length)), (stddev(f.length))"
-> HashAggregate (cost=109.81..110.09 rows=16 width=232)
Group Key: c.name
-> Hash Join (cost=77.86..99.81 rows=1000 width=70)
Hash Cond: (fc.film_id = f.film_id)
-> Hash Join (cost=1.36..20.67 rows=1000 width=70)
Hash Cond: (fc.category_id = c.category_id)
-> Seq Scan on film_category fc (cost=0.00..16.00 rows=1000 width=4)


In [8]:
%%sql

SELECT  DISTINCT name, AVG(length), VARIANCE(length), STDDEV(length)
FROM category as C JOIN film_category as fc ON c.category_id = fc.category_id
JOIN film as f ON fc.film_id = f.film_id
GROUP BY c.name






 * postgres://dsa_ro_user:***@pgsql.dsa.lan/dvdrental
16 rows affected.


name,avg,variance,stddev
Action,111.609375,1848.3687996031745,42.99265983401323
Animation,111.01515151515152,1723.0920745920744,41.51014423718706
Children,109.8,1500.9084745762711,38.74156004314064
Classics,111.66666666666666,1475.2261904761904,38.40867337563471
Comedy,115.82758620689656,1781.3381730187537,42.2059021111829
Documentary,108.75,1814.6679104477607,42.598919122998424
Drama,120.83870967741936,1658.399788471708,40.72345501638715
Family,114.78260869565216,1523.2314578005116,39.02859794817784
Foreign,121.6986301369863,1861.9079147640791,43.14983099345905
Games,127.83606557377048,1262.4726775956285,35.531291527266895
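
Because `GROUP BY c.name` already yields one row per category, the `DISTINCT` in the query above is redundant. The plan shows it adding `Sort` and `Unique` steps on top of the `HashAggregate`. A sketch of a query that should return the same rows without `DISTINCT`, sorted explicitly by category name (the column aliases are illustrative):

```sql
-- One row per category; GROUP BY alone guarantees uniqueness of name.
SELECT c.name,
       AVG(f.length)      AS avg_length,
       VARIANCE(f.length) AS variance_length,
       STDDEV(f.length)   AS stddev_length
FROM category AS c
JOIN film_category AS fc ON c.category_id = fc.category_id
JOIN film AS f ON fc.film_id = f.film_id
GROUP BY c.name
ORDER BY c.name;  -- explicit ordering instead of relying on DISTINCT's sort
```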


[Helpful Hints Video](https://youtu.be/jy9H2KLI4Iw) 

### 3
### A movie's "cumulative rented duration" is the sum of the durations of all of its rentals in the `rental` table. What is the average _cumulative rented duration_ per store (`inventory.store_id`)?

In [9]:
%%sql
EXPLAIN

SELECT film_id, SUM(rental_id) as cumulative_rented_duration, store_id
FROM inventory as i JOIN rental as r ON i.inventory_id = r.inventory_id
GROUP BY film_id, store_id





 * postgres://dsa_ro_user:***@pgsql.dsa.lan/dvdrental
7 rows affected.


QUERY PLAN
HashAggregate (cost=601.00..610.58 rows=958 width=12)
"Group Key: i.film_id, i.store_id"
-> Hash Join (cost=128.07..480.67 rows=16044 width=8)
Hash Cond: (r.inventory_id = i.inventory_id)
-> Seq Scan on rental r (cost=0.00..310.44 rows=16044 width=8)
-> Hash (cost=70.81..70.81 rows=4581 width=8)
-> Seq Scan on inventory i (cost=0.00..70.81 rows=4581 width=8)


In [14]:
%%sql

SELECT film_id, SUM(rental_id) as cumulative_rented_duration, store_id
FROM inventory as i JOIN rental as r ON i.inventory_id = r.inventory_id
GROUP BY film_id, store_id
ORDER BY store_id, film_id





 * postgres://dsa_ro_user:***@pgsql.dsa.lan/dvdrental
1521 rows affected.


film_id,cumulative_rented_duration,store_id
1,114242,1
4,110300,1
6,80480,1
7,59551,1
9,83550,1
10,106642,1
11,108801,1
12,80121,1
15,67355,1
16,56254,1
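
The query above sums `rental_id`, which is an identifier rather than a length of time, and it stops at the per-film totals instead of averaging them per store. As a hedged sketch, assuming that a rental's duration means `return_date - rental_date` and that totals are taken per film within each store, a derived table can compute the per-film sums and the outer query can average them by store. The alias `film_totals` and the output column names are illustrative.

```sql
SELECT store_id,
       AVG(cumulative_rented_duration) AS avg_cumulative_rented_duration
FROM (
    -- One row per (store, film): total time that film's copies were rented out.
    -- Rentals with a NULL return_date contribute NULL and are ignored by SUM.
    SELECT i.store_id,
           i.film_id,
           SUM(r.return_date - r.rental_date) AS cumulative_rented_duration
    FROM inventory AS i
    JOIN rental AS r ON i.inventory_id = r.inventory_id
    GROUP BY i.store_id, i.film_id
) AS film_totals
GROUP BY store_id
ORDER BY store_id;
```

Subtracting two timestamps yields an `interval`, so both the inner `SUM` and the outer `AVG` return intervals under this reading.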


[Helpful Hints Video](https://youtu.be/Scyn7exzUcY)  

### 4
### Which three categories of film have the highest average number of actors per film?

In [33]:
%%sql

EXPLAIN
SELECT AVG(act_count), name
FROM (
    SELECT COUNT(actor_id) AS act_count, name, fa.film_id
    FROM category AS c
    JOIN film_category AS fc ON c.category_id = fc.category_id
    JOIN film AS f ON fc.film_id = f.film_id
    JOIN film_actor AS fa ON f.film_id = fa.film_id
    GROUP BY name, fa.film_id
) AS count
GROUP BY name


 * postgres://dsa_ro_user:***@pgsql.dsa.lan/dvdrental
17 rows affected.


QUERY PLAN
HashAggregate (cost=449.55..452.05 rows=200 width=100)
Group Key: c.name
-> HashAggregate (cost=313.00..367.62 rows=5462 width=78)
"Group Key: c.name, fa.film_id"
-> Hash Join (cost=112.31..272.03 rows=5462 width=72)
Hash Cond: (fa.film_id = fc.film_id)
-> Seq Scan on film_actor fa (cost=0.00..84.62 rows=5462 width=4)
-> Hash (cost=99.81..99.81 rows=1000 width=74)
-> Hash Join (cost=77.86..99.81 rows=1000 width=74)
Hash Cond: (fc.film_id = f.film_id)


In [34]:
%%sql

SELECT AVG(act_count), name
FROM (
    SELECT COUNT(actor_id) AS act_count, name, fa.film_id
    FROM category AS c
    JOIN film_category AS fc ON c.category_id = fc.category_id
    JOIN film AS f ON fc.film_id = f.film_id
    JOIN film_actor AS fa ON f.film_id = fa.film_id
    GROUP BY name, fa.film_id
) AS count
GROUP BY name



 * postgres://dsa_ro_user:***@pgsql.dsa.lan/dvdrental
16 rows affected.


avg,name
6.0410958904109595,Sports
5.385964912280701,Classics
5.444444444444445,New
5.028985507246377,Family
4.931034482758621,Comedy
5.46969696969697,Animation
5.732142857142857,Travel
5.509803921568628,Music
5.660714285714286,Horror
5.737704918032787,Drama
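
The query above returns one row per category in no particular order, so the top three still have to be picked out by eye. A sketch that lets the database do it by sorting on the average and keeping three rows follows. The aliases `avg_actors` and `per_film` are illustrative, and the join to `film` is left out on the assumption that every `film_category.film_id` refers to an existing film.

```sql
SELECT name, AVG(act_count) AS avg_actors
FROM (
    -- One row per (category, film) with that film's actor count.
    -- Films with no rows in film_actor are excluded by the inner join.
    SELECT c.name, fa.film_id, COUNT(fa.actor_id) AS act_count
    FROM category AS c
    JOIN film_category AS fc ON c.category_id = fc.category_id
    JOIN film_actor AS fa ON fc.film_id = fa.film_id
    GROUP BY c.name, fa.film_id
) AS per_film
GROUP BY name
ORDER BY avg_actors DESC  -- highest average first
LIMIT 3;                  -- keep only the top three categories
```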


### 5
### For each staff member, list their average daily payment amount processed.

In [36]:
%%sql
EXPLAIN

SELECT s.staff_id, payment_date::date, AVG(amount)
FROM staff as s JOIN payment as p ON s.staff_id = p.staff_id
GROUP BY s.staff_id, payment_date::date
ORDER BY s.staff_id, payment_date::date;





 * postgres://dsa_ro_user:***@pgsql.dsa.lan/dvdrental
7 rows affected.


QUERY PLAN
HashAggregate (cost=483.98..666.43 rows=14596 width=44)
"Group Key: s.staff_id, p.payment_date"
-> Hash Join (cost=1.04..374.51 rows=14596 width=18)
Hash Cond: (p.staff_id = s.staff_id)
-> Seq Scan on payment p (cost=0.00..253.96 rows=14596 width=16)
-> Hash (cost=1.02..1.02 rows=2 width=4)
-> Seq Scan on staff s (cost=0.00..1.02 rows=2 width=4)


In [45]:
%%sql

SELECT s.staff_id, payment_date::date, AVG(amount)
FROM staff as s JOIN payment as p ON s.staff_id = p.staff_id
GROUP BY s.staff_id, payment_date::date
ORDER BY s.staff_id, payment_date::date;


 * postgres://dsa_ro_user:***@pgsql.dsa.lan/dvdrental
64 rows affected.


staff_id,payment_date,avg
1,2007-02-14,4.59
1,2007-02-15,3.7229192546583847
1,2007-02-16,4.131891891891892
1,2007-02-17,4.188412698412698
1,2007-02-18,4.038780487804878
1,2007-02-19,4.095590062111801
1,2007-02-20,4.183103448275863
1,2007-02-21,4.435544554455445
1,2007-03-01,4.116760563380281
1,2007-03-02,4.434444444444445
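
The query above returns the average amount of a single payment for each staff member and day. If the question is instead read as the average of each staff member's daily totals, which is an assumption about the intended meaning, a derived table can sum the payments per day and the outer query can average those sums. The aliases `daily`, `pay_day`, `daily_total`, and `avg_daily_amount` are illustrative.

```sql
SELECT staff_id, AVG(daily_total) AS avg_daily_amount
FROM (
    -- One row per (staff member, calendar day) with the total processed that day.
    SELECT p.staff_id,
           p.payment_date::date AS pay_day,
           SUM(p.amount)        AS daily_total
    FROM payment AS p
    GROUP BY p.staff_id, p.payment_date::date
) AS daily
GROUP BY staff_id
ORDER BY staff_id;
```

Days on which a staff member processed no payments do not appear in the derived table, so they do not pull the average down.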


### 6
### What is the statistical correlation between film length and rental rate?

In [47]:
%%sql
EXPLAIN

SELECT corr(length, rental_rate)
FROM film as f




 * postgres://dsa_ro_user:***@pgsql.dsa.lan/dvdrental
2 rows affected.


QUERY PLAN
Aggregate (cost=71.50..71.51 rows=1 width=8)
-> Seq Scan on film f (cost=0.00..64.00 rows=1000 width=8)


In [48]:
%%sql
SELECT corr(length, rental_rate)
FROM film as f






 * postgres://dsa_ro_user:***@pgsql.dsa.lan/dvdrental
1 rows affected.


corr
0.0297892586459086
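
A correlation this close to zero suggests almost no linear relationship between the two columns. As an optional sketch, PostgreSQL's regression aggregates can be run alongside `CORR` to see the fitted slope and how many non-null pairs were used. Treating `rental_rate` as the dependent variable is an arbitrary choice for illustration, and the column aliases are hypothetical.

```sql
-- CORR, REGR_SLOPE, and REGR_COUNT all take (Y, X) and skip rows where either input is NULL.
SELECT CORR(rental_rate, length)       AS corr_rate_length,      -- symmetric in its arguments
       REGR_SLOPE(rental_rate, length) AS slope_rate_per_length, -- least-squares slope of Y on X
       REGR_COUNT(rental_rate, length) AS pairs_used
FROM film;
```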


[Helpful Hints Video](https://youtu.be/3d2vgLn9KVs)  

# Save your Notebook, then `File > Close and Halt`