# Basic Statistics Practice

Remember to run `EXPLAIN` on a query before you execute it. Checking the plan first helps avoid unnecessary load on the DBMS and keeps you from waiting indefinitely for results.

Therefore, for each question, we are providing a cell for the `EXPLAIN` as well as the final SQL.
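
Each node in a PostgreSQL plan carries the planner's estimates in the form `(cost=startup..total rows=N width=W)`. The cost is in arbitrary planner units rather than milliseconds, and `rows` is an estimate, not an actual count. As a minimal sketch, the hypothetical helper `parse_plan_line` below (not part of the course material) pulls those numbers out of one line of `EXPLAIN` output so that plans can be compared before running the real query:

```python
import re

# Matches the parenthesised estimates PostgreSQL prints on each plan node, e.g.
# "Aggregate (cost=66.50..66.51 rows=1 width=32)".
PLAN_ESTIMATE = re.compile(
    r"\(cost=(?P<startup>\d+\.\d+)\.\.(?P<total>\d+\.\d+) "
    r"rows=(?P<rows>\d+) width=(?P<width>\d+)\)"
)


def parse_plan_line(line):
    """Return the planner's estimates from one EXPLAIN line, or None if absent."""
    match = PLAN_ESTIMATE.search(line)
    if match is None:
        return None
    return {
        "startup_cost": float(match["startup"]),
        "total_cost": float(match["total"]),
        "rows": int(match["rows"]),
        "width": int(match["width"]),
    }


top = parse_plan_line("Aggregate (cost=66.50..66.51 rows=1 width=32)")
print(top["total_cost"])
```

A very large total cost on the top node is a reasonable signal to rethink the query before executing it.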


## Our practice schema:

We will use the same database as in the Day 1 practice.

A PDF of the _Entity-Relationship Diagrams_ (ERD) is available [here](https://web.dsa.missouri.edu/static/PDF/DVD_Rental_ERD2.pdf).   
Printing it out is recommended.


In [1]:
%load_ext sql
%sql postgres://dsa_ro_user:readonly@pgsql.dsa.lan/dvdrental

'Connected: dsa_ro_user@dvdrental'

# 1

### List the average length of the movies (film.length) in the database.

In [5]:
%%sql
EXPLAIN 
SELECT AVG(length) FROM film;

 * postgres://dsa_ro_user:***@pgsql.dsa.lan/dvdrental
2 rows affected.


QUERY PLAN
Aggregate (cost=66.50..66.51 rows=1 width=32)
-> Seq Scan on film (cost=0.00..64.00 rows=1000 width=2)


In [4]:
%%sql
SELECT AVG(length) FROM film;

 * postgres://dsa_ro_user:***@pgsql.dsa.lan/dvdrental
1 rows affected.


avg
115.272


[Helpful Hints](https://youtu.be/yMrP0cr_rqo)  
 

--- 

# 2

### List the number of rows in the payment table.

In [8]:
%%sql
EXPLAIN 
SELECT COUNT(*) FROM payment;

 * postgres://dsa_ro_user:***@pgsql.dsa.lan/dvdrental
2 rows affected.


QUERY PLAN
Aggregate (cost=290.45..290.46 rows=1 width=8)
-> Seq Scan on payment (cost=0.00..253.96 rows=14596 width=0)


In [7]:
%%sql
SELECT COUNT(*) FROM payment;

 * postgres://dsa_ro_user:***@pgsql.dsa.lan/dvdrental
1 rows affected.


count
14596


# 3

### List each category (`category.name`) and the number of films in that category.

In [23]:
%%sql
EXPLAIN 
SELECT name, COUNT(*)
FROM category AS C
JOIN film_category AS S
  ON (C.category_id = S.category_id)
GROUP BY name;

 * postgres://dsa_ro_user:***@pgsql.dsa.lan/dvdrental
7 rows affected.


QUERY PLAN
HashAggregate (cost=25.67..25.83 rows=16 width=76)
Group Key: c.name
-> Hash Join (cost=1.36..20.67 rows=1000 width=68)
Hash Cond: (s.category_id = c.category_id)
-> Seq Scan on film_category s (cost=0.00..16.00 rows=1000 width=2)
-> Hash (cost=1.16..1.16 rows=16 width=72)
-> Seq Scan on category c (cost=0.00..1.16 rows=16 width=72)


In [27]:
%%sql
SELECT name, COUNT(*)
FROM category
JOIN film_category
  ON (film_category.category_id = category.category_id)
GROUP BY name;

 * postgres://dsa_ro_user:***@pgsql.dsa.lan/dvdrental
16 rows affected.


name,count
Family,69
Games,61
Animation,66
Classics,57
Documentary,68
New,63
Sports,74
Children,60
Music,51
Travel,57


[Helpful Hints](https://youtu.be/YRMI8myh9WY)  
 

--- 

# 4

### List each film title and the number of actors in that film.

In [31]:
%%sql
EXPLAIN 
SELECT title, COUNT(*)
FROM film
JOIN film_actor
  ON (film_actor.film_id = film.film_id)
GROUP BY title;

 * postgres://dsa_ro_user:***@pgsql.dsa.lan/dvdrental
7 rows affected.


QUERY PLAN
HashAggregate (cost=202.82..212.82 rows=1000 width=23)
Group Key: film.title
-> Hash Join (cost=76.50..175.51 rows=5462 width=15)
Hash Cond: (film_actor.film_id = film.film_id)
-> Seq Scan on film_actor (cost=0.00..84.62 rows=5462 width=2)
-> Hash (cost=64.00..64.00 rows=1000 width=19)
-> Seq Scan on film (cost=0.00..64.00 rows=1000 width=19)


In [32]:
%%sql
SELECT title, COUNT(*)
FROM film
JOIN film_actor
  ON (film_actor.film_id = film.film_id)
GROUP BY title;

 * postgres://dsa_ro_user:***@pgsql.dsa.lan/dvdrental
997 rows affected.


title,count
Graceland Dynamite,6
Opus Ice,4
Braveheart Human,4
Wonderful Drop,4
Rush Goodfellas,5
Purple Movie,8
Minority Kiss,1
Luke Mummy,9
Fantasy Troopers,9
Grinch Massage,6


# 5

### List each film title and the number of actors in that film, for films with more than 10 actors.

In [34]:
%%sql
EXPLAIN 
SELECT title, COUNT(*)
FROM film
JOIN film_actor
  ON (film_actor.film_id = film.film_id)
GROUP BY title
HAVING COUNT(*) > 10;

 * postgres://dsa_ro_user:***@pgsql.dsa.lan/dvdrental
8 rows affected.


QUERY PLAN
HashAggregate (cost=216.48..226.48 rows=1000 width=23)
Group Key: film.title
Filter: (count(*) > 10)
-> Hash Join (cost=76.50..175.51 rows=5462 width=15)
Hash Cond: (film_actor.film_id = film.film_id)
-> Seq Scan on film_actor (cost=0.00..84.62 rows=5462 width=2)
-> Hash (cost=64.00..64.00 rows=1000 width=19)
-> Seq Scan on film (cost=0.00..64.00 rows=1000 width=19)


In [33]:
%%sql
SELECT title, COUNT(*)
FROM film
JOIN film_actor
  ON (film_actor.film_id = film.film_id)
GROUP BY title
HAVING COUNT(*) > 10;

 * postgres://dsa_ro_user:***@pgsql.dsa.lan/dvdrental
27 rows affected.


title,count
Arabia Dogma,12
Lonely Elephant,12
Maker Gables,11
Lesson Cleopatra,12
Boondock Ballroom,13
Lambs Cincinatti,15
Crazy Home,13
Spice Sorority,11
Dracula Crystal,13
Chitty Lock,13


[Helpful Hints](https://youtu.be/4dZJoRfP7Kw)  
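
`HAVING` filters whole groups after aggregation, whereas `WHERE` filters individual rows before grouping, so a condition on `COUNT(*)` belongs in `HAVING`. The sketch below illustrates this on a tiny in-memory stand-in. It assumes SQLite (through Python's standard `sqlite3` module) instead of the course PostgreSQL server, and the two toy films and their actor rows are made up for illustration:

```python
import sqlite3

# In-memory toy database; these rows are NOT from dvdrental.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE film (film_id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE film_actor (actor_id INTEGER, film_id INTEGER);
    INSERT INTO film VALUES (1, 'Toy Film A'), (2, 'Toy Film B');
    INSERT INTO film_actor VALUES (1, 1), (2, 1), (3, 1), (1, 2);
""")

# Same shape as the practice query; the threshold is passed as a parameter.
query = """
    SELECT title, COUNT(*)
    FROM film
    JOIN film_actor ON (film_actor.film_id = film.film_id)
    GROUP BY title
    HAVING COUNT(*) > ?
    ORDER BY title
"""
all_groups = conn.execute(query, (0,)).fetchall()  # every group passes
big_groups = conn.execute(query, (2,)).fetchall()  # only groups with 3+ actors
print(all_groups)
print(big_groups)
```

Raising the threshold removes entire groups from the result; it never changes the counts of the groups that remain.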
 

--- 

# 6

### List the average length of the movies in the database, per `language_id`

In [43]:
%%sql
EXPLAIN 
SELECT language_id, AVG(length) 
FROM film
GROUP BY language_id;

 * postgres://dsa_ro_user:***@pgsql.dsa.lan/dvdrental
3 rows affected.


QUERY PLAN
HashAggregate (cost=69.00..69.01 rows=1 width=34)
Group Key: language_id
-> Seq Scan on film (cost=0.00..64.00 rows=1000 width=4)


In [42]:
%%sql
SELECT language_id, AVG(length) 
FROM film
GROUP BY language_id;

 * postgres://dsa_ro_user:***@pgsql.dsa.lan/dvdrental
1 rows affected.


language_id,avg
1,115.272


# 7

### List the average length of the movies in the database, per language name

In [None]:
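%%sql
EXPLAIN 
SELECT name, AVG(length) 
FROM language
JOIN film
  ON (film.language_id = language.language_id)
GROUP BY name;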


In [44]:
%%sql 
SELECT name, AVG(length) 
FROM language
JOIN film
  ON (film.language_id = language.language_id)
GROUP BY name;

 * postgres://dsa_ro_user:***@pgsql.dsa.lan/dvdrental
1 rows affected.


name,avg
English,115.272


# 8

### List each film title and its average rental duration in days.

**HINT** `return_date::date` casts the return _timestamp_ to a date. PostgreSQL can do date math natively, so subtracting one date from another yields a number of days.
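
To make the hint concrete, here is a small Python analogue of the cast and the subtraction. It is only a sketch: the timestamps are made up, and `.date()` stands in for PostgreSQL's `::date`. Note that the cells below average `rental_duration`, which appears to be a per-film column (the averages come out as whole numbers); one way to apply the hint instead might be an expression along the lines of `AVG(return_date::date - rental_date::date)`.

```python
from datetime import datetime

# Made-up timestamps for one rental (not taken from the database).
rental_ts = datetime(2005, 8, 17, 23, 30, 0)
return_ts = datetime(2005, 8, 20, 1, 15, 0)

# Subtracting full timestamps keeps the time of day in the result.
elapsed = return_ts - rental_ts

# Dropping the time first (the analogue of ::date) yields whole calendar days.
days = (return_ts.date() - rental_ts.date()).days

print(elapsed)
print(days)
```

The two results differ because the cast discards the time of day before subtracting, which is usually what "duration in days" means for a rental.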

In [48]:
%%sql
EXPLAIN 
SELECT title, AVG(rental_duration)
FROM film
JOIN inventory ON (film.film_id = inventory.film_id)
JOIN rental ON (inventory.inventory_id = rental.inventory_id)
GROUP BY title;

 * postgres://dsa_ro_user:***@pgsql.dsa.lan/dvdrental
11 rows affected.


QUERY PLAN
HashAggregate (cost=679.69..692.19 rows=1000 width=47)
Group Key: film.title
-> Hash Join (cost=204.57..599.47 rows=16044 width=17)
Hash Cond: (inventory.film_id = film.film_id)
-> Hash Join (cost=128.07..480.67 rows=16044 width=2)
Hash Cond: (rental.inventory_id = inventory.inventory_id)
-> Seq Scan on rental (cost=0.00..310.44 rows=16044 width=4)
-> Hash (cost=70.81..70.81 rows=4581 width=6)
-> Seq Scan on inventory (cost=0.00..70.81 rows=4581 width=6)
-> Hash (cost=64.00..64.00 rows=1000 width=21)


In [47]:
%%sql
SELECT title, AVG(rental_duration)
FROM film
JOIN inventory ON (film.film_id = inventory.film_id)
JOIN rental ON (inventory.inventory_id = rental.inventory_id)
GROUP BY title;

 * postgres://dsa_ro_user:***@pgsql.dsa.lan/dvdrental
958 rows affected.


title,avg
Graceland Dynamite,5.0
Opus Ice,5.0
Braveheart Human,7.0
Wonderful Drop,3.0
Rush Goodfellas,3.0
Purple Movie,4.0
Minority Kiss,4.0
Luke Mummy,5.0
Fantasy Troopers,6.0
Grinch Massage,7.0


# 9

### List the film title and the number of times it has been rented in each city (name and country).

In [54]:
%%sql
EXPLAIN 
SELECT title, city, country, COUNT(*)
FROM film
JOIN inventory ON (film.film_id = inventory.film_id)
JOIN rental ON (inventory.inventory_id = rental.inventory_id)
JOIN customer ON (rental.customer_id = customer.customer_id)
JOIN address ON (customer.address_id = address.address_id)
JOIN city ON (address.city_id = city.city_id)
JOIN country ON (city.country_id = country.country_id)
GROUP BY film.title, city.city, country.country;

 * postgres://dsa_ro_user:***@pgsql.dsa.lan/dvdrental
27 rows affected.


QUERY PLAN
HashAggregate (cost=996.90..1157.34 rows=16044 width=74)
"Group Key: film.title, city.city, country.country"
-> Hash Join (cost=270.57..836.46 rows=16044 width=33)
Hash Cond: (city.country_id = country.country_id)
-> Hash Join (cost=267.12..789.26 rows=16044 width=26)
Hash Cond: (address.city_id = city.city_id)
-> Hash Join (cost=248.62..728.34 rows=16044 width=17)
Hash Cond: (customer.address_id = address.address_id)
-> Hash Join (cost=227.05..664.36 rows=16044 width=17)
Hash Cond: (rental.customer_id = customer.customer_id)


In [53]:
%%sql 
SELECT title, city, country, COUNT(*)
FROM film
JOIN inventory ON (film.film_id = inventory.film_id)
JOIN rental ON (inventory.inventory_id = rental.inventory_id)
JOIN customer ON (rental.customer_id = customer.customer_id)
JOIN address ON (customer.address_id = address.address_id)
JOIN city ON (address.city_id = city.city_id)
JOIN country ON (city.country_id = country.country_id)
GROUP BY film.title, city.city, country.country;

 * postgres://dsa_ro_user:***@pgsql.dsa.lan/dvdrental
15827 rows affected.


title,city,country,count
Greek Everyone,Tabuk,Saudi Arabia,1
Insider Arizona,Pak Kret,Thailand,1
Fiddler Lost,Kurashiki,Japan,1
Trap Guys,Atlixco,Mexico,1
Mosquito Armageddon,Sungai Petani,Malaysia,1
Fish Opus,Kilis,Turkey,1
Camelot Vacation,Cuautla,Mexico,1
Barbarella Streetcar,Uruapan,Mexico,1
Army Flintstones,Kurashiki,Japan,1
Wedding Apollo,Ciomas,Indonesia,1


# 10

### List the film title, number of times it has been rented, and the most recent rental date, in order of least recently rented, then most rentals.

In [71]:
%%sql
EXPLAIN 
SELECT title, COUNT(*), MAX(rental_date)
FROM film
JOIN inventory ON (film.film_id = inventory.film_id)
JOIN rental ON (rental.inventory_id = inventory.inventory_id)
GROUP BY film.title
ORDER BY MAX(rental.rental_date)::date, COUNT(*) DESC;

 * postgres://dsa_ro_user:***@pgsql.dsa.lan/dvdrental
13 rows affected.


QUERY PLAN
Sort (cost=822.23..824.73 rows=1000 width=50)
"Sort Key: ((max(rental.rental_date))::date), (count(*)) DESC"
-> HashAggregate (cost=759.91..772.41 rows=1000 width=50)
Group Key: film.title
-> Hash Join (cost=204.57..599.47 rows=16044 width=23)
Hash Cond: (inventory.film_id = film.film_id)
-> Hash Join (cost=128.07..480.67 rows=16044 width=10)
Hash Cond: (rental.inventory_id = inventory.inventory_id)
-> Seq Scan on rental (cost=0.00..310.44 rows=16044 width=12)
-> Hash (cost=70.81..70.81 rows=4581 width=6)


In [70]:
%%sql
SELECT title, COUNT(*), MAX(rental_date)
FROM film
JOIN inventory ON (film.film_id = inventory.film_id)
JOIN rental ON (rental.inventory_id = inventory.inventory_id)
GROUP BY film.title
ORDER BY MAX(rental.rental_date)::date, COUNT(*) DESC;

 * postgres://dsa_ro_user:***@pgsql.dsa.lan/dvdrental
958 rows affected.


title,count,max
Papi Necklace,8,2005-08-17 17:16:42
Graceland Dynamite,6,2005-08-17 07:46:54
Towers Hurricane,11,2005-08-18 02:37:07
Superfly Trip,10,2005-08-18 18:58:35
Dares Pluto,9,2005-08-18 08:07:45
Miracle Virtual,9,2005-08-18 15:15:44
Impact Aladdin,9,2005-08-18 03:37:31
Hoosiers Birdcage,8,2005-08-18 00:14:03
Lights Deer,8,2005-08-18 13:42:14
Eve Resurrection,7,2005-08-18 19:36:05
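
The result above can look out of order at first glance: `Papi Necklace` is listed before `Graceland Dynamite` even though its latest rental is later on the same day. That follows from the sort key, which casts the latest rental timestamp to a date and then breaks ties by rental count, descending. A minimal Python sketch of the same two-level ordering, using four rows copied from the output above:

```python
from datetime import datetime

# Four rows from the result above: (title, count, max rental timestamp).
rows = [
    ("Graceland Dynamite", 6, datetime(2005, 8, 17, 7, 46, 54)),
    ("Superfly Trip", 10, datetime(2005, 8, 18, 18, 58, 35)),
    ("Papi Necklace", 8, datetime(2005, 8, 17, 17, 16, 42)),
    ("Towers Hurricane", 11, datetime(2005, 8, 18, 2, 37, 7)),
]

# Primary key: calendar date of the latest rental (ascending).
# Secondary key: rental count (descending, hence the negation).
ordered = sorted(rows, key=lambda r: (r[2].date(), -r[1]))
titles = [r[0] for r in ordered]
print(titles)
```

Sorting on the full timestamp instead would make the count tie-breaker almost never apply, since two films rarely share an identical latest rental time.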


# Save your Notebook, then `File > Close and Halt`

---