# Basic Statistics Practice

Please remember to run `EXPLAIN` before you execute a query. It helps you avoid putting unnecessary load on the DBMS and waiting indefinitely for results.

For each question, we therefore provide one cell for the `EXPLAIN` and one for the final SQL.
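You can practice the same check-the-plan-first habit offline. The sketch below is an analogue, not the course setup: it assumes Python's built-in `sqlite3` module and a tiny hypothetical `film` table, and it uses SQLite's `EXPLAIN QUERY PLAN`, whose output format differs from PostgreSQL's `EXPLAIN`. Like PostgreSQL's plain `EXPLAIN` (without `ANALYZE`), it describes the plan without running the query.

```python
import sqlite3

# Hypothetical toy table standing in for dvdrental's film table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE film (film_id INTEGER PRIMARY KEY, title TEXT, length INTEGER)")
conn.executemany(
    "INSERT INTO film (title, length) VALUES (?, ?)",
    [("Alpha", 90), ("Beta", 120), ("Gamma", 150)],
)

query = "SELECT AVG(length) FROM film"

# Step 1: inspect the plan. EXPLAIN QUERY PLAN describes the plan without running the query.
plan = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()
for row in plan:
    print(row[-1])  # the last column holds the readable step, e.g. a scan of film

# Step 2: only then run the real query.
(avg_length,) = conn.execute(query).fetchone()
print(avg_length)
```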


## Our practice schema

A PDF of the _Entity-Relationship Diagrams_ (ERD) is available [here](https://web.dsa.missouri.edu/static/PDF/DVD_Rental_ERD2.pdf).   
Printing it out is recommended.


In [1]:
%load_ext sql
%sql postgres://dsa_ro_user:readonly@pgsql.dsa.lan/dvdrental


'Connected: dsa_ro_user@dvdrental'



# 1

### List the average length of the movies (`film.length`) in the database.

In [9]:
%%sql
EXPLAIN 
SELECT AVG(length)
FROM film ;




 * postgres://dsa_ro_user:***@pgsql.dsa.lan/dvdrental
2 rows affected.


QUERY PLAN
Aggregate (cost=66.50..66.51 rows=1 width=32)
-> Seq Scan on film (cost=0.00..64.00 rows=1000 width=2)


In [10]:
%%sql
SELECT AVG(length)
FROM film ;






 * postgres://dsa_ro_user:***@pgsql.dsa.lan/dvdrental
1 rows affected.


avg
115.272


[Helpful Hints](https://youtu.be/yMrP0cr_rqo)  
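It is worth knowing how `AVG` treats missing values: it skips `NULL`s. The result is `SUM(col) / COUNT(col)`, not `SUM(col) / COUNT(*)`. The sketch below checks this on a hypothetical three-row table in SQLite, where one length is unknown. Whether any `film.length` values are actually `NULL` in dvdrental is not shown here.

```python
import sqlite3

# Hypothetical toy data: one film has an unknown (NULL) length.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE film (film_id INTEGER PRIMARY KEY, length INTEGER)")
conn.executemany("INSERT INTO film (length) VALUES (?)", [(100,), (120,), (None,)])

avg_len, total, n_rows, n_lengths = conn.execute(
    "SELECT AVG(length), SUM(length), COUNT(*), COUNT(length) FROM film"
).fetchone()

# AVG ignores the NULL row: 220 / 2, not 220 / 3.
print(avg_len, total, n_rows, n_lengths)
```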
 

--- 

# 2

### List the number of rows in the payment table.

In [11]:
%%sql
EXPLAIN 
SELECT count(*)
FROM payment  ;






 * postgres://dsa_ro_user:***@pgsql.dsa.lan/dvdrental
2 rows affected.


QUERY PLAN
Aggregate (cost=290.45..290.46 rows=1 width=8)
-> Seq Scan on payment (cost=0.00..253.96 rows=14596 width=0)


In [12]:
%%sql
SELECT count(*)
FROM payment ;






 * postgres://dsa_ro_user:***@pgsql.dsa.lan/dvdrental
1 rows affected.


count
14596
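`count(*)` counts rows. A closely related question, how many different customers made those payments, needs `COUNT(DISTINCT customer_id)` instead. The sketch below runs both aggregates against a hypothetical four-row `payment` table in SQLite. The rows are made up for illustration.

```python
import sqlite3

# Hypothetical mini payment table: four payments from three customers.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE payment (payment_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO payment (customer_id, amount) VALUES (?, ?)",
    [(1, 2.99), (1, 0.99), (2, 5.99), (3, 4.99)],
)

# COUNT(*) counts every row; COUNT(DISTINCT ...) counts unique non-NULL values.
n_rows, n_customers = conn.execute(
    "SELECT COUNT(*), COUNT(DISTINCT customer_id) FROM payment"
).fetchone()
print(n_rows, n_customers)
```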


# 3

### List each category (`category.name`) and the number of films in that category.

In [24]:
%%sql
EXPLAIN 
SELECT name, count(*)
FROM category 
INNER JOIN film_category
ON category.category_id = film_category.category_id
GROUP BY name ;






 * postgres://dsa_ro_user:***@pgsql.dsa.lan/dvdrental
7 rows affected.


QUERY PLAN
HashAggregate (cost=25.67..25.83 rows=16 width=76)
Group Key: category.name
-> Hash Join (cost=1.36..20.67 rows=1000 width=68)
Hash Cond: (film_category.category_id = category.category_id)
-> Seq Scan on film_category (cost=0.00..16.00 rows=1000 width=2)
-> Hash (cost=1.16..1.16 rows=16 width=72)
-> Seq Scan on category (cost=0.00..1.16 rows=16 width=72)


In [23]:
%%sql
SELECT name, count(*)
FROM category 
INNER JOIN film_category
ON category.category_id = film_category.category_id
GROUP BY name ;





 * postgres://dsa_ro_user:***@pgsql.dsa.lan/dvdrental
16 rows affected.


name,count
Family,69
Games,61
Animation,66
Classics,57
Documentary,68
New,63
Sports,74
Children,60
Music,51
Travel,57


[Helpful Hints](https://youtu.be/YRMI8myh9WY)  
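The category counts above come back in no particular order, because the plan uses a `HashAggregate` and `GROUP BY` alone does not guarantee any ordering. If order matters, add an explicit `ORDER BY`. The sketch below demonstrates this in SQLite with hypothetical counts. Sorting by count and then by name makes the order fully deterministic.

```python
import sqlite3

# Hypothetical toy data; the counts are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE category (category_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE film_category (film_id INTEGER, category_id INTEGER);
INSERT INTO category VALUES (1, 'Action'), (2, 'Comedy'), (3, 'Drama');
INSERT INTO film_category VALUES (10, 1), (11, 2), (12, 2), (13, 3), (14, 3), (15, 3);
""")

rows = conn.execute("""
    SELECT name, COUNT(*) AS film_count
    FROM category
    INNER JOIN film_category ON category.category_id = film_category.category_id
    GROUP BY name
    ORDER BY film_count DESC, name  -- explicit order; GROUP BY alone guarantees none
""").fetchall()
print(rows)
```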
 

--- 

# 4

### List each film title and the number of actors in that film.

In [25]:
%%sql
EXPLAIN 
SELECT title, count(*)
FROM film JOIN film_actor
ON film.film_id = film_actor.film_id
GROUP BY title   ;





 * postgres://dsa_ro_user:***@pgsql.dsa.lan/dvdrental
7 rows affected.


QUERY PLAN
HashAggregate (cost=202.82..212.82 rows=1000 width=23)
Group Key: film.title
-> Hash Join (cost=76.50..175.51 rows=5462 width=15)
Hash Cond: (film_actor.film_id = film.film_id)
-> Seq Scan on film_actor (cost=0.00..84.62 rows=5462 width=2)
-> Hash (cost=64.00..64.00 rows=1000 width=19)
-> Seq Scan on film (cost=0.00..64.00 rows=1000 width=19)


In [26]:
%%sql
SELECT title, count(*)
FROM film JOIN film_actor
ON film.film_id = film_actor.film_id
GROUP BY title ;





 * postgres://dsa_ro_user:***@pgsql.dsa.lan/dvdrental
997 rows affected.


title,count
Graceland Dynamite,6
Opus Ice,4
Braveheart Human,4
Wonderful Drop,4
Rush Goodfellas,5
Purple Movie,8
Minority Kiss,1
Luke Mummy,9
Fantasy Troopers,9
Grinch Massage,6
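The query above returned 997 rows, while the planner estimates about 1,000 rows in `film`. That gap suggests a few films have no `film_actor` rows, and an `INNER JOIN` silently drops them. To list those films with a count of 0, use a `LEFT JOIN` and count a column from the right-hand table. `count(*)` would report 1 for them, because the unmatched row still exists. The sketch below shows both versions in SQLite on hypothetical data.

```python
import sqlite3

# Hypothetical toy data: Gamma has no actors.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE film (film_id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE film_actor (actor_id INTEGER, film_id INTEGER);
INSERT INTO film VALUES (1, 'Alpha'), (2, 'Beta'), (3, 'Gamma');
INSERT INTO film_actor VALUES (100, 1), (101, 1), (100, 2);
""")

inner = conn.execute("""
    SELECT title, COUNT(*) FROM film
    JOIN film_actor ON film.film_id = film_actor.film_id
    GROUP BY title ORDER BY title
""").fetchall()

# COUNT(film_actor.actor_id) skips the NULL produced for films with no match.
outer = conn.execute("""
    SELECT title, COUNT(film_actor.actor_id) FROM film
    LEFT JOIN film_actor ON film.film_id = film_actor.film_id
    GROUP BY title ORDER BY title
""").fetchall()

print(inner)  # Gamma is missing
print(outer)  # Gamma appears with 0
```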


# 5

### List each film title and the number of actors in that film, for films with more than 10 actors.

In [27]:
%%sql
EXPLAIN 
SELECT title, count(*)
FROM film JOIN film_actor
ON film.film_id = film_actor.film_id
GROUP BY title
HAVING count(*) > 10 ;





 * postgres://dsa_ro_user:***@pgsql.dsa.lan/dvdrental
8 rows affected.


QUERY PLAN
HashAggregate (cost=216.48..226.48 rows=1000 width=23)
Group Key: film.title
Filter: (count(*) > 10)
-> Hash Join (cost=76.50..175.51 rows=5462 width=15)
Hash Cond: (film_actor.film_id = film.film_id)
-> Seq Scan on film_actor (cost=0.00..84.62 rows=5462 width=2)
-> Hash (cost=64.00..64.00 rows=1000 width=19)
-> Seq Scan on film (cost=0.00..64.00 rows=1000 width=19)


In [28]:
%%sql
SELECT title, count(*)
FROM film JOIN film_actor
ON film.film_id = film_actor.film_id
GROUP BY title
HAVING count(*) > 10 ;





 * postgres://dsa_ro_user:***@pgsql.dsa.lan/dvdrental
27 rows affected.


title,count
Arabia Dogma,12
Lonely Elephant,12
Maker Gables,11
Lesson Cleopatra,12
Boondock Ballroom,13
Lambs Cincinatti,15
Crazy Home,13
Spice Sorority,11
Dracula Crystal,13
Chitty Lock,13


[Helpful Hints](https://youtu.be/4dZJoRfP7Kw)  
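`HAVING` is needed here because the condition applies to an aggregate. `WHERE` filters individual rows before grouping, and `HAVING` filters whole groups after the aggregates are computed. PostgreSQL, like SQLite, rejects aggregate functions inside `WHERE`. The sketch below uses a hypothetical `film_actor` table in SQLite to show both clauses in one query, followed by the error you get from misplacing the aggregate.

```python
import sqlite3

# Hypothetical toy data: (actor_id, film_id) pairs.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE film_actor (actor_id INTEGER, film_id INTEGER);
INSERT INTO film_actor VALUES (1, 1), (2, 1), (3, 1), (1, 2), (2, 2), (1, 3);
""")

rows = conn.execute("""
    SELECT film_id, COUNT(*) AS actor_count
    FROM film_actor
    WHERE actor_id <> 3          -- row filter, applied before grouping
    GROUP BY film_id
    HAVING COUNT(*) > 1          -- group filter, applied after counting
    ORDER BY film_id
""").fetchall()
print(rows)

where_error = None
try:
    conn.execute("SELECT film_id FROM film_actor WHERE COUNT(*) > 1 GROUP BY film_id")
except sqlite3.Error as exc:
    where_error = exc  # aggregates are not allowed in WHERE
print(where_error)
```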
 

--- 

# 6

### List the average length of the movies in the database, per `language_id`.

In [20]:
%%sql
EXPLAIN 
SELECT language.language_id, AVG(length)
FROM film RIGHT JOIN language
ON film.language_id = language.language_id
GROUP BY language.language_id ;




 * postgres://dsa_ro_user:***@pgsql.dsa.lan/dvdrental
7 rows affected.


QUERY PLAN
HashAggregate (cost=74.62..74.69 rows=6 width=36)
Group Key: language.language_id
-> Hash Right Join (cost=1.14..69.62 rows=1000 width=6)
Hash Cond: (film.language_id = language.language_id)
-> Seq Scan on film (cost=0.00..64.00 rows=1000 width=4)
-> Hash (cost=1.06..1.06 rows=6 width=4)
-> Seq Scan on language (cost=0.00..1.06 rows=6 width=4)


In [21]:
%%sql
SELECT language.language_id, AVG(length)
FROM film RIGHT JOIN language
ON film.language_id = language.language_id
GROUP BY language.language_id ;





 * postgres://dsa_ro_user:***@pgsql.dsa.lan/dvdrental
6 rows affected.


language_id,avg
6,
1,115.272
3,
5,
4,
2,
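The `RIGHT JOIN` keeps every language, including languages with no films. `AVG` over a group with no non-`NULL` lengths returns `NULL`, which is why five languages show an empty average. `film RIGHT JOIN language` is equivalent to `language LEFT JOIN film`. The sketch below uses the `LEFT JOIN` spelling, because older SQLite releases do not support `RIGHT JOIN`, and runs it on hypothetical data.

```python
import sqlite3

# Hypothetical toy data: there are no Italian films.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE language (language_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE film (film_id INTEGER PRIMARY KEY, language_id INTEGER, length INTEGER);
INSERT INTO language VALUES (1, 'English'), (2, 'Italian');
INSERT INTO film VALUES (1, 1, 100), (2, 1, 130);
""")

# language LEFT JOIN film keeps every language, as film RIGHT JOIN language does.
rows = conn.execute("""
    SELECT language.language_id, AVG(film.length)
    FROM language LEFT JOIN film ON film.language_id = language.language_id
    GROUP BY language.language_id
    ORDER BY language.language_id
""").fetchall()
print(rows)  # the language with no films gets None (SQL NULL)
```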


# 7

### List the average length of the movies in the database, per language name.

In [17]:
%%sql
EXPLAIN 
SELECT DISTINCT(name),AVG(length)
FROM film RIGHT JOIN language
ON film.language_id = language.language_id
GROUP BY name   ;





 * postgres://dsa_ro_user:***@pgsql.dsa.lan/dvdrental
10 rows affected.


QUERY PLAN
Unique (cost=74.77..74.82 rows=6 width=116)
-> Sort (cost=74.77..74.79 rows=6 width=116)
"Sort Key: language.name, (avg(film.length))"
-> HashAggregate (cost=74.62..74.69 rows=6 width=116)
Group Key: language.name
-> Hash Right Join (cost=1.14..69.62 rows=1000 width=86)
Hash Cond: (film.language_id = language.language_id)
-> Seq Scan on film (cost=0.00..64.00 rows=1000 width=4)
-> Hash (cost=1.06..1.06 rows=6 width=88)
-> Seq Scan on language (cost=0.00..1.06 rows=6 width=88)


In [16]:
%%sql 
SELECT DISTINCT(name),AVG(length)
FROM film RIGHT JOIN language
ON film.language_id = language.language_id
GROUP BY name  ;





 * postgres://dsa_ro_user:***@pgsql.dsa.lan/dvdrental
6 rows affected.


name,avg
English,115.272
French,
German,
Italian,
Japanese,
Mandarin,
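In the query above, `DISTINCT(name)` does not apply to `name` alone. `DISTINCT` applies to the whole output row, and the parentheses only wrap a column expression. Because `GROUP BY name` already produces one row per name, the `DISTINCT` is redundant. In the PostgreSQL plan it adds the extra `Unique` and `Sort` steps, and that sort is likely why this output happens to be alphabetical. The sketch below uses hypothetical SQLite data to show that the results match with and without `DISTINCT`. It uses an explicit `ORDER BY name` so the order is guaranteed.

```python
import sqlite3

# Hypothetical toy data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE language (language_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE film (film_id INTEGER PRIMARY KEY, language_id INTEGER, length INTEGER);
INSERT INTO language VALUES (1, 'English'), (2, 'French'), (3, 'German');
INSERT INTO film VALUES (1, 1, 100), (2, 1, 130), (3, 2, 90);
""")

with_distinct = conn.execute("""
    SELECT DISTINCT(name), AVG(length)
    FROM language LEFT JOIN film ON film.language_id = language.language_id
    GROUP BY name ORDER BY name
""").fetchall()

# GROUP BY already yields one row per name, so DISTINCT adds nothing.
without_distinct = conn.execute("""
    SELECT name, AVG(length)
    FROM language LEFT JOIN film ON film.language_id = language.language_id
    GROUP BY name ORDER BY name
""").fetchall()

print(without_distinct)
```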


# 8

### List each film title and its average rental duration in days.

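**HINT** `return_date::date` casts the return _timestamp_ to a date. PostgreSQL can do date math natively: subtracting one `date` from another yields an integer number of days, whereas subtracting one timestamp from another yields an `interval`.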

In [3]:
%%sql
EXPLAIN 
SELECT title, AVG(return_date::timestamp - rental_date::timestamp) as average_rental_duration
FROM film JOIN inventory
ON film.film_id = inventory.film_id
JOIN rental 
ON inventory.inventory_id = rental.inventory_id
GROUP BY title  ;





 * postgres://dsa_ro_user:***@pgsql.dsa.lan/dvdrental
11 rows affected.


QUERY PLAN
HashAggregate (cost=719.80..732.30 rows=1000 width=31)
Group Key: film.title
-> Hash Join (cost=204.57..599.47 rows=16044 width=31)
Hash Cond: (inventory.film_id = film.film_id)
-> Hash Join (cost=128.07..480.67 rows=16044 width=18)
Hash Cond: (rental.inventory_id = inventory.inventory_id)
-> Seq Scan on rental (cost=0.00..310.44 rows=16044 width=20)
-> Hash (cost=70.81..70.81 rows=4581 width=6)
-> Seq Scan on inventory (cost=0.00..70.81 rows=4581 width=6)
-> Hash (cost=64.00..64.00 rows=1000 width=19)


In [4]:
%%sql
SELECT title, AVG(return_date::timestamp - rental_date::timestamp) as average_rental_duration
FROM film JOIN inventory
ON film.film_id = inventory.film_id
JOIN rental 
ON inventory.inventory_id = rental.inventory_id
GROUP BY title   ;






 * postgres://dsa_ro_user:***@pgsql.dsa.lan/dvdrental
958 rows affected.


title,average_rental_duration
Graceland Dynamite,"5 days, 7:15:00"
Opus Ice,"4 days, 22:39:10.909091"
Braveheart Human,"4 days, 0:28:36"
Wonderful Drop,"6 days, 0:31:20"
Rush Goodfellas,"4 days, 11:33:52.258064"
Purple Movie,"3 days, 23:57:46.153846"
Minority Kiss,"4 days, 23:39:49.090909"
Luke Mummy,"5 days, 0:10:42.857142"
Fantasy Troopers,"5 days, 5:32:46.153846"
Grinch Massage,"5 days, 5:10:04.285715"
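The question asks for days, but the query above averages timestamp differences, so it returns `interval` values such as `5 days, 7:15:00`. Following the hint, `AVG(return_date::date - rental_date::date)` would instead average whole calendar days. The two measures can differ: a rental from late evening to early morning two days later spans 2 calendar days but only about 1.08 elapsed days. Rentals with no `return_date` (not yet returned) produce `NULL` differences, and `AVG` skips them. SQLite has no `interval` type, so the sketch below computes both measures with its `julianday()` and `date()` functions on hypothetical rentals.

```python
import sqlite3

# Hypothetical rentals, stored as ISO-8601 text.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE rental (rental_id INTEGER PRIMARY KEY, rental_date TEXT, return_date TEXT)"
)
conn.executemany(
    "INSERT INTO rental (rental_date, return_date) VALUES (?, ?)",
    [
        ("2005-05-24 23:00:00", "2005-05-26 01:00:00"),  # 2 calendar days, 1 day 2 h elapsed
        ("2005-05-25 10:00:00", "2005-05-29 10:00:00"),  # exactly 4 days either way
        ("2005-05-26 12:00:00", None),                   # not yet returned; AVG skips it
    ],
)

calendar_days, elapsed_days = conn.execute("""
    SELECT
        AVG(julianday(date(return_date)) - julianday(date(rental_date))),  -- like date - date
        AVG(julianday(return_date) - julianday(rental_date))               -- fractional days
    FROM rental
""").fetchone()
print(calendar_days, round(elapsed_days, 4))
```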


# 9

### List each film title and the number of times it has been rented in each city (city name and country).

In [44]:
%%sql
EXPLAIN 
SELECT title, city, country, count(*)
FROM film JOIN inventory 
ON film.film_id = inventory.film_id
JOIN rental 
ON inventory.inventory_id = rental.inventory_id
JOIN customer 
ON customer.customer_id = rental.customer_id
JOIN address 
ON customer.address_id = address.address_id
JOIN city 
ON address.city_id = city.city_id
JOIN country 
ON country.country_id = city.country_id
GROUP BY title, city, country   ;




 * postgres://dsa_ro_user:***@pgsql.dsa.lan/dvdrental
27 rows affected.


QUERY PLAN
HashAggregate (cost=996.90..1157.34 rows=16044 width=41)
"Group Key: film.title, city.city, country.country"
-> Hash Join (cost=270.57..836.46 rows=16044 width=33)
Hash Cond: (city.country_id = country.country_id)
-> Hash Join (cost=267.12..789.26 rows=16044 width=26)
Hash Cond: (address.city_id = city.city_id)
-> Hash Join (cost=248.62..728.34 rows=16044 width=17)
Hash Cond: (customer.address_id = address.address_id)
-> Hash Join (cost=227.05..664.36 rows=16044 width=17)
Hash Cond: (rental.customer_id = customer.customer_id)


In [45]:
%%sql 
SELECT title, city, country, count(*)
FROM film JOIN inventory 
ON film.film_id = inventory.film_id
JOIN rental 
ON inventory.inventory_id = rental.inventory_id
JOIN customer 
ON customer.customer_id = rental.customer_id
JOIN address 
ON customer.address_id = address.address_id
JOIN city 
ON address.city_id = city.city_id
JOIN country 
ON country.country_id = city.country_id
GROUP BY title, city, country  ;





 * postgres://dsa_ro_user:***@pgsql.dsa.lan/dvdrental
15827 rows affected.


title,city,country,count
Greek Everyone,Tabuk,Saudi Arabia,1
Insider Arizona,Pak Kret,Thailand,1
Fiddler Lost,Kurashiki,Japan,1
Trap Guys,Atlixco,Mexico,1
Mosquito Armageddon,Sungai Petani,Malaysia,1
Fish Opus,Kilis,Turkey,1
Camelot Vacation,Cuautla,Mexico,1
Barbarella Streetcar,Uruapan,Mexico,1
Army Flintstones,Kurashiki,Japan,1
Wedding Apollo,Ciomas,Indonesia,1
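With this many joins, one quick sanity check is to confirm that the per-group counts add up to the number of joined rows. Every joined rental falls into exactly one (title, city, country) group, so the two totals must match. The 15,827 groups are fewer than the planner's estimate of 16,044 joined rentals, presumably because some film and city pairs repeat. The sketch below runs this check in SQLite on a simplified, hypothetical schema. It is not the dvdrental layout: it collapses `address`, `city`, and `country` into one `city` table and stores the title directly on `rental`.

```python
import sqlite3

# Hypothetical simplified schema and made-up rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE city (city_id INTEGER PRIMARY KEY, city TEXT, country TEXT);
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, city_id INTEGER);
CREATE TABLE rental (rental_id INTEGER PRIMARY KEY, customer_id INTEGER, title TEXT);
INSERT INTO city VALUES (1, 'Springfield', 'Country X'), (2, 'Rivertown', 'Country Y');
INSERT INTO customer VALUES (1, 1), (2, 2), (3, 2);
INSERT INTO rental (customer_id, title) VALUES
    (1, 'Film A'), (2, 'Film A'), (3, 'Film A'), (3, 'Film B');
""")

joins = """
    FROM rental
    JOIN customer ON customer.customer_id = rental.customer_id
    JOIN city ON city.city_id = customer.city_id
"""

groups = conn.execute(
    "SELECT title, city, country, COUNT(*) " + joins
    + " GROUP BY title, city, country ORDER BY title, city"
).fetchall()
joined_rows = conn.execute("SELECT COUNT(*) " + joins).fetchone()[0]

# Every joined rental lands in exactly one group, so the counts must add up.
assert sum(row[3] for row in groups) == joined_rows
print(groups)
```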


# 10

### List the film title, number of times it has been rented, and the most recent rental date, in order of least recently rented, then most rentals.

In [53]:
%%sql
EXPLAIN 
SELECT title, Count(*), MAX(rental_date)
FROM film JOIN inventory 
ON film.film_id = inventory.film_id
JOIN rental 
ON inventory.inventory_id = rental.inventory_id
GROUP BY title
ORDER BY MAX(rental_date) asc ;






 * postgres://dsa_ro_user:***@pgsql.dsa.lan/dvdrental
13 rows affected.


QUERY PLAN
Sort (cost=779.62..782.12 rows=1000 width=31)
Sort Key: (max(rental.rental_date))
-> HashAggregate (cost=719.80..729.80 rows=1000 width=31)
Group Key: film.title
-> Hash Join (cost=204.57..599.47 rows=16044 width=23)
Hash Cond: (inventory.film_id = film.film_id)
-> Hash Join (cost=128.07..480.67 rows=16044 width=10)
Hash Cond: (rental.inventory_id = inventory.inventory_id)
-> Seq Scan on rental (cost=0.00..310.44 rows=16044 width=12)
-> Hash (cost=70.81..70.81 rows=4581 width=6)


In [52]:
%%sql
SELECT title, Count(*), MAX(rental_date)
FROM film JOIN inventory 
ON film.film_id = inventory.film_id
JOIN rental 
ON inventory.inventory_id = rental.inventory_id
GROUP BY title
ORDER BY MAX(rental_date) asc   ;






 * postgres://dsa_ro_user:***@pgsql.dsa.lan/dvdrental
958 rows affected.


title,count,max
Graceland Dynamite,6,2005-08-17 07:46:54
Papi Necklace,8,2005-08-17 17:16:42
Hoosiers Birdcage,8,2005-08-18 00:14:03
Towers Hurricane,11,2005-08-18 02:37:07
Impact Aladdin,9,2005-08-18 03:37:31
Leathernecks Dwarfs,7,2005-08-18 07:16:58
Dares Pluto,9,2005-08-18 08:07:45
Conspiracy Spirit,5,2005-08-18 12:58:40
Lights Deer,8,2005-08-18 13:42:14
Miracle Virtual,9,2005-08-18 15:15:44
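The prompt asks for a secondary sort ("then most rentals"), but the query above orders only by `MAX(rental_date)`. In PostgreSQL, adding the tie-breaker means writing `ORDER BY MAX(rental_date) ASC, count(*) DESC`. The sketch below applies that ordering in SQLite to hypothetical data in which two films share the same latest rental time. To stay short, it assumes the title is stored directly on `rental` and skips the `film` and `inventory` joins.

```python
import sqlite3

# Hypothetical rentals; Film A and Film B tie on their latest rental time.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE rental (rental_id INTEGER PRIMARY KEY, title TEXT, rental_date TEXT);
INSERT INTO rental (title, rental_date) VALUES
    ('Film A', '2005-08-17 07:46:54'),
    ('Film B', '2005-08-10 09:00:00'), ('Film B', '2005-08-17 07:46:54'),
    ('Film C', '2005-08-20 12:00:00');
""")

# ISO-8601 text sorts chronologically, so MAX() and ORDER BY work on it directly.
rows = conn.execute("""
    SELECT title, COUNT(*) AS times_rented, MAX(rental_date) AS last_rented
    FROM rental
    GROUP BY title
    ORDER BY MAX(rental_date) ASC, COUNT(*) DESC  -- least recent first, then most rentals
""").fetchall()
print(rows)
```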


# Save your Notebook, then `File > Close and Halt`

---