# Nested Query Exercise

Remember to run `EXPLAIN` on a query before you execute it. Checking the plan first helps you avoid putting unnecessary load on the DBMS and waiting indefinitely for results.

For each question, we therefore provide one cell for the `EXPLAIN` and one cell for the final SQL.
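
As a rough illustration of asking for a plan before running a query, here is a minimal sketch using Python's built-in `sqlite3` module and a hypothetical two-row `film` table. It is an analogy only: SQLite spells the command `EXPLAIN QUERY PLAN`, and its output looks different from the PostgreSQL `EXPLAIN` output shown in this notebook.

```python
import sqlite3

# Hypothetical toy table; not the real dvdrental data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE film (film_id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany("INSERT INTO film VALUES (?, ?)",
                 [(1, "Film A"), (2, "Film B")])

query = "SELECT title FROM film WHERE film_id IN (SELECT film_id FROM film)"

# Ask for the plan only; the query's result rows are not returned.
plan = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()
for row in plan:
    print(row[-1])  # the last column is the human-readable plan step
```

The habit is the same in both systems: read the plan first, and only run the query once the plan looks reasonable.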


## Our practice schema:

We will be using the DVD rental schema for this exercise.

The ERD is available [here](../images/ERD-Rental.pdf).  
Printing is recommended.


<span style="font-weight:900; background:yellow">Each query should be implemented with at least one nested query.</span>

In [1]:
%load_ext sql
%sql postgres://dsa_ro_user:readonly@pgsql.dsa.lan/dvdrental

'Connected: dsa_ro_user@dvdrental'

# 1

### Which films have no rentals on the date 2005-05-31?

**HINT:** PostgreSQL can cast a _timestamp_ to a _date_ like so: `rental.rental_date::date`.
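
The idea behind the hint can be sketched on toy data with Python's built-in `sqlite3` module. The tables and rows below are hypothetical stand-ins, and SQLite has no `::date` cast, so `date(rental_date)` is assumed here to play the same role as `rental_date::date` in PostgreSQL.

```python
import sqlite3

# Toy stand-ins for the film, inventory, and rental tables (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE film (film_id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE inventory (inventory_id INTEGER PRIMARY KEY, film_id INTEGER);
CREATE TABLE rental (rental_id INTEGER PRIMARY KEY, inventory_id INTEGER,
                     rental_date TEXT);
INSERT INTO film VALUES (1, 'Film A'), (2, 'Film B'), (3, 'Film C');
INSERT INTO inventory VALUES (10, 1), (20, 2), (30, 3);
INSERT INTO rental VALUES
    (100, 10, '2005-05-31 08:15:00'),
    (101, 20, '2005-06-01 09:30:00');
""")

# SQLite's date() stands in for PostgreSQL's rental_date::date.
no_rentals = conn.execute("""
SELECT film_id
FROM film
WHERE film_id NOT IN (SELECT i.film_id
                      FROM inventory i
                      JOIN rental r ON r.inventory_id = i.inventory_id
                      WHERE date(r.rental_date) = '2005-05-31')
ORDER BY film_id
""").fetchall()

print(no_rentals)  # [(2,), (3,)]
```

Film 1 is excluded because its timestamp falls on 2005-05-31 once the time of day is dropped; comparing the raw timestamp to the date string would not match it.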

In [17]:
%%sql
EXPLAIN 
SELECT DISTINCT film_id 
FROM film 
WHERE film_id NOT IN (SELECT f.film_id
                  FROM film f JOIN inventory i on f.film_id = i.film_id 
                  JOIN rental r ON i.inventory_id = r.inventory_id
                  WHERE rental_date::date = '2005-05-31'
                  GROUP BY  f.film_id, rental_date::date
                 )       
ORDER BY film_id ;

 * postgres://dsa_ro_user:***@pgsql.dsa.lan/dvdrental
17 rows affected.


QUERY PLAN
Unique (cost=534.88..631.27 rows=500 width=4)
-> Index Only Scan using film_pkey on film (cost=534.88..630.02 rows=500 width=4)
Filter: (NOT (hashed SubPlan 1))
SubPlan 1
-> Group (cost=533.60..534.40 rows=80 width=8)
"Group Key: f.film_id, ((r.rental_date)::date)"
-> Sort (cost=533.60..533.80 rows=80 width=8)
Sort Key: f.film_id
-> Nested Loop (cost=391.93..531.08 rows=80 width=8)
-> Hash Join (cost=391.66..503.35 rows=80 width=10)


In [18]:
%%sql
SELECT DISTINCT film_id 
FROM film 
WHERE film_id NOT IN (SELECT f.film_id
                  FROM film f JOIN inventory i on f.film_id = i.film_id 
                  JOIN rental r ON i.inventory_id = r.inventory_id
                  WHERE rental_date::date = '2005-05-31'
                  GROUP BY  f.film_id, rental_date::date
                 )       
ORDER BY film_id ;


 * postgres://dsa_ro_user:***@pgsql.dsa.lan/dvdrental
851 rows affected.


film_id
1
2
5
6
7
9
10
11
12
13


[Helpful Hints](https://youtu.be/MWpp2ioeAb8)  
 

--- 

# 2

### Which customers (name, phone number) have outstanding rentals (film name, rental_date)?

In [21]:
%%sql
EXPLAIN 
SELECT title, rental_date, first_name, last_name, phone
FROM film 
JOIN inventory ON film.film_id = inventory.film_id
JOIN rental ON inventory.inventory_id = rental.inventory_id
JOIN customer ON rental.customer_id = customer.customer_id
JOIN address ON customer.address_id = address.address_id 
WHERE customer.customer_id IN (SELECT customer_id FROM rental WHERE return_date IS NULL)












 * postgres://dsa_ro_user:***@pgsql.dsa.lan/dvdrental
24 rows affected.


QUERY PLAN
Hash Join (cost=560.32..1007.94 rows=5320 width=48)
Hash Cond: (inventory.film_id = film.film_id)
-> Hash Join (cost=483.82..917.42 rows=5320 width=35)
Hash Cond: (rental.inventory_id = inventory.inventory_id)
-> Hash Join (cost=355.75..775.37 rows=5320 width=37)
Hash Cond: (rental.customer_id = customer.customer_id)
-> Seq Scan on rental (cost=0.00..310.44 rows=16044 width=14)
-> Hash (cost=353.46..353.46 rows=183 width=31)
-> Hash Join (cost=335.34..353.46 rows=183 width=31)
Hash Cond: (address.address_id = customer.address_id)


In [22]:
%%sql
SELECT title, rental_date, first_name, last_name, phone
FROM film 
JOIN inventory ON film.film_id = inventory.film_id
JOIN rental ON inventory.inventory_id = rental.inventory_id
JOIN customer ON rental.customer_id = customer.customer_id
JOIN address ON customer.address_id = address.address_id 
WHERE customer.customer_id IN (SELECT customer_id FROM rental WHERE return_date IS NULL)









 * postgres://dsa_ro_user:***@pgsql.dsa.lan/dvdrental
4388 rows affected.


title,rental_date,first_name,last_name,phone
Swarm Gold,2005-05-24 23:11:53,Cassandra,Walters,685010736240
Whale Bikini,2005-05-25 00:09:02,April,Burns,581174211853
King Evolution,2005-05-25 00:22:55,Raymond,Mcwhorter,834626715837
Apache Divine,2005-05-25 01:59:46,Craig,Morrell,426255288071
Whale Bikini,2005-05-25 02:40:21,Barry,Lovelace,689681677428
Chainsaw Uptown,2005-05-25 03:47:12,Marie,Turner,177727722820
Witches Panic,2005-05-25 04:05:17,Fred,Wheat,561729882725
Microcosmos Paradise,2005-05-25 04:19:28,Freddie,Duggan,644021380889
Massage Image,2005-05-25 05:39:25,Neil,Renner,478380208348
Gun Bonnie,2005-05-25 06:20:46,Jenny,Castro,62781725285
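
Note that the query above filters on `customer_id`, so it returns every rental made by any customer who has at least one outstanding rental, including rentals that were already returned. The sketch below, using Python's built-in `sqlite3` module and hypothetical toy rows, contrasts that with filtering on `rental_id`, which keeps only the rentals that are themselves still open. It is one possible reading of the question, not the required answer.

```python
import sqlite3

# Hypothetical toy data: Ann has one returned and one open rental.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, first_name TEXT);
CREATE TABLE rental (rental_id INTEGER PRIMARY KEY, customer_id INTEGER,
                     rental_date TEXT, return_date TEXT);
INSERT INTO customer VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO rental VALUES
    (100, 1, '2005-05-25', '2005-05-28'),  -- Ann, returned
    (101, 1, '2005-06-01', NULL),          -- Ann, still out
    (102, 2, '2005-06-02', '2005-06-05');  -- Bob, returned
""")

# Filter on customer_id: every rental by a customer with any open rental.
by_customer = conn.execute("""
SELECT r.rental_id FROM rental r
WHERE r.customer_id IN (SELECT customer_id FROM rental
                        WHERE return_date IS NULL)
ORDER BY r.rental_id
""").fetchall()

# Filter on rental_id: only the rentals that are themselves still open.
by_rental = conn.execute("""
SELECT r.rental_id FROM rental r
WHERE r.rental_id IN (SELECT rental_id FROM rental
                      WHERE return_date IS NULL)
ORDER BY r.rental_id
""").fetchall()

print(by_customer)  # [(100,), (101,)]
print(by_rental)    # [(101,)]
```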


# 3

### List the movies that are not categorized as children's movies.

In [56]:
%%sql
EXPLAIN 
SELECT title FROM film as f JOIN film_category as fc ON f.film_id = fc.film_id
WHERE NOT EXISTS (SELECT * FROM category WHERE fc.category_id = category.category_id
                 AND name = 'Children')






 * postgres://dsa_ro_user:***@pgsql.dsa.lan/dvdrental
10 rows affected.


QUERY PLAN
Hash Join (cost=77.71..108.26 rows=938 width=15)
Hash Cond: (fc.film_id = f.film_id)
-> Hash Anti Join (cost=1.21..29.29 rows=938 width=2)
Hash Cond: (fc.category_id = category.category_id)
-> Seq Scan on film_category fc (cost=0.00..16.00 rows=1000 width=4)
-> Hash (cost=1.20..1.20 rows=1 width=4)
-> Seq Scan on category (cost=0.00..1.20 rows=1 width=4)
Filter: ((name)::text = 'Children'::text)
-> Hash (cost=64.00..64.00 rows=1000 width=19)
-> Seq Scan on film f (cost=0.00..64.00 rows=1000 width=19)


In [63]:
%%sql
SELECT title FROM film as f JOIN film_category as fc ON f.film_id = fc.film_id
WHERE NOT EXISTS (SELECT * FROM category WHERE fc.category_id = category.category_id
                 AND name = 'Children') ;








 * postgres://dsa_ro_user:***@pgsql.dsa.lan/dvdrental
940 rows affected.


title
Academy Dinosaur
Ace Goldfinger
Adaptation Holes
Affair Prejudice
African Egg
Agent Truman
Airplane Sierra
Airport Pollock
Alabama Devil
Aladdin Calendar


[Helpful Hints](https://youtu.be/9WR0ByMn__E)  
 

--- 

# 4

### List the names of the customers who have rented the 5 least popular movies.

**The five least popular movies are the five movies with the fewest rentals.**

(Do not include movies that have never been rented. Do not worry about ties either: take just 5 movies, even if other movies were rented the same number of times as some of those 5.)
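
The definition above can be sketched on toy data with Python's built-in `sqlite3` module. The tables and rows are hypothetical, and `LIMIT 2` is used instead of `LIMIT 5` only because the toy data has so few films. The inner join drops films that were never rented, and the derived table keeps the films with the fewest rentals.

```python
import sqlite3

# Hypothetical toy data; film 4 has inventory but was never rented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE inventory (inventory_id INTEGER PRIMARY KEY, film_id INTEGER);
CREATE TABLE rental (rental_id INTEGER PRIMARY KEY, inventory_id INTEGER);
INSERT INTO inventory VALUES (10, 1), (20, 2), (30, 3), (40, 4);
-- film 1: 3 rentals, film 2: 1 rental, film 3: 2 rentals, film 4: none
INSERT INTO rental VALUES (100, 10), (101, 10), (102, 10),
                          (103, 20), (104, 30), (105, 30);
""")

least_popular = conn.execute("""
SELECT film_id
FROM (SELECT i.film_id, COUNT(*) AS rentals
      FROM inventory i
      JOIN rental r ON r.inventory_id = i.inventory_id
      GROUP BY i.film_id
      ORDER BY rentals ASC
      LIMIT 2) AS least
ORDER BY film_id
""").fetchall()

print(least_popular)  # [(2,), (3,)]
```

A list of film ids like this one can then feed an outer `WHERE film_id IN (...)` filter.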

In [39]:
%%sql
EXPLAIN
select first_name, last_name 
FROM customer c JOIN rental r on c.customer_id = r.customer_id JOIN inventory i ON r.inventory_id = i.inventory_id
WHERE film_id IN (SELECT film_id 
                  FROM(
                    select f.film_id, count(*)
                     FROM film f JOIN inventory i ON f.film_id = i.film_id JOIN rental r ON r.inventory_id = i.inventory_id
                     GROUP BY f.film_id 
                     ORDER BY COUNT(*) ASC
                     LIMIT 5 
                      ) as cool  )



 * postgres://dsa_ro_user:***@pgsql.dsa.lan/dvdrental
25 rows affected.


QUERY PLAN
Nested Loop (cost=706.98..827.74 rows=84 width=13)
-> Nested Loop (cost=706.71..802.90 rows=84 width=2)
-> Hash Join (cost=706.42..789.31 rows=24 width=4)
Hash Cond: (i.film_id = cool.film_id)
-> Seq Scan on inventory i (cost=0.00..70.81 rows=4581 width=6)
-> Hash (cost=706.36..706.36 rows=5 width=4)
-> Subquery Scan on cool (cost=706.30..706.36 rows=5 width=4)
-> Limit (cost=706.30..706.31 rows=5 width=12)
-> Sort (cost=706.30..708.80 rows=1000 width=12)
Sort Key: (count(*))


In [40]:
%%sql
select first_name, last_name 
FROM customer c JOIN rental r on c.customer_id = r.customer_id JOIN inventory i ON r.inventory_id = i.inventory_id
WHERE film_id IN (SELECT film_id 
                  FROM(
                    select f.film_id, count(*)
                     FROM film f JOIN inventory i ON f.film_id = i.film_id JOIN rental r ON r.inventory_id = i.inventory_id
                     GROUP BY f.film_id 
                     ORDER BY COUNT(*) ASC
                     LIMIT 5 
                      ) as cool  )



 * postgres://dsa_ro_user:***@pgsql.dsa.lan/dvdrental
22 rows affected.


first_name,last_name
Ramon,Choate
William,Satterfield
Raul,Fortier
Hector,Poindexter
Craig,Morrell
Marsha,Douglas
April,Burns
Bob,Pfeiffer
Julia,Flores
Pauline,Henry


# 5

### List the movies that have been rented by the top ten renters.

In [171]:
%%sql
EXPLAIN 
SELECT title from film WHERE film_id in (
SELECT film_id from inventory WHERE inventory_id in (
SELECT inventory_id from rental WHERE customer_id in ( 
SELECT customer_id from rental GROUP BY customer_id 
                               ORDER BY count(customer_id) DESC LIMIT 10)))





 * postgres://dsa_ro_user:***@pgsql.dsa.lan/dvdrental
19 rows affected.


QUERY PLAN
Hash Semi Join (cost=855.22..924.82 rows=268 width=15)
Hash Cond: (film.film_id = inventory.film_id)
-> Seq Scan on film (cost=0.00..64.00 rows=1000 width=19)
-> Hash (cost=851.87..851.87 rows=268 width=2)
-> Hash Semi Join (cost=766.05..851.87 rows=268 width=2)
Hash Cond: (inventory.inventory_id = rental.inventory_id)
-> Seq Scan on inventory (cost=0.00..70.81 rows=4581 width=6)
-> Hash (cost=762.70..762.70 rows=268 width=4)
-> Hash Join (cost=409.84..762.70 rows=268 width=4)
"Hash Cond: (rental.customer_id = ""ANY_subquery"".customer_id)"


In [172]:
%%sql
SELECT title from film WHERE film_id in (
SELECT film_id from inventory WHERE inventory_id in (
SELECT inventory_id from rental WHERE customer_id in ( 
SELECT customer_id from rental GROUP BY customer_id 
                               ORDER BY count(customer_id) DESC LIMIT 10)))






 * postgres://dsa_ro_user:***@pgsql.dsa.lan/dvdrental
324 rows affected.


title
Chamber Italian
Airport Pollock
Adaptation Holes
Affair Prejudice
Alabama Devil
Date Speed
Ali Forever
Alone Trip
Alter Victory
American Circus


# 6

### Consider the previous question and its answer SQL.  Now add a column to the result that gives, per film, the total number of rentals by the _top-ten renters_.

In [63]:
%%sql
EXPLAIN
SELECT count(*), title
FROM rental r JOIN inventory i on i.inventory_id = r.inventory_id JOIN film f ON f.film_id = i.film_id
WHERE r.customer_id in ( 
    SELECT customer_id 
    FROM (
            SELECT COUNT(*),r.customer_id
            FROM rental r 
            GROUP BY r.customer_id 
            ORDER BY COUNT(*) DESC  
            LIMIT 10 ) as neat )
GROUP BY f.title
ORDER BY f.title



 * postgres://dsa_ro_user:***@pgsql.dsa.lan/dvdrental
22 rows affected.


QUERY PLAN
GroupAggregate (cost=933.78..938.47 rows=268 width=38)
Group Key: f.title
-> Sort (cost=933.78..934.45 rows=268 width=15)
Sort Key: f.title
-> Hash Join (cost=486.63..922.98 rows=268 width=15)
Hash Cond: (i.film_id = f.film_id)
-> Nested Loop (cost=410.13..845.77 rows=268 width=2)
-> Hash Join (cost=409.84..762.70 rows=268 width=4)
Hash Cond: (r.customer_id = neat.customer_id)
-> Seq Scan on rental r (cost=0.00..310.44 rows=16044 width=6)


In [59]:
%%sql
SELECT count(*), title
FROM rental r JOIN inventory i on i.inventory_id = r.inventory_id JOIN film f ON f.film_id = i.film_id
WHERE r.customer_id in ( 
    SELECT customer_id 
    FROM (
            SELECT COUNT(*),r.customer_id
            FROM rental r 
            GROUP BY r.customer_id 
            ORDER BY COUNT(*) DESC  
            LIMIT 10 ) as neat )
GROUP BY f.title
ORDER BY f.title






 * postgres://dsa_ro_user:***@pgsql.dsa.lan/dvdrental
324 rows affected.


count,title
1,Adaptation Holes
2,Affair Prejudice
1,Airport Pollock
1,Alabama Devil
1,Ali Forever
1,Alone Trip
1,Alter Victory
3,American Circus
1,Amistad Midsummer
2,Anaconda Confessions


# 7

### For each rental store, list its city, its `store_id`, and the movies that have not been rented from that store.

**Note:** A video walkthrough for this challenging SQL is provided below.
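
One way to think about this task is to pair every store with every film, then keep only the pairs for which no rental exists. The sketch below shows that pattern with Python's built-in `sqlite3` module on hypothetical toy data. As a simplifying assumption, it links rentals to stores through a `store_id` column on `inventory`, which is not the same path as going through `payment` and `staff`.

```python
import sqlite3

# Hypothetical toy data: store 1 rented both films, store 2 rented none.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE store (store_id INTEGER PRIMARY KEY);
CREATE TABLE film (film_id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE inventory (inventory_id INTEGER PRIMARY KEY, film_id INTEGER,
                        store_id INTEGER);
CREATE TABLE rental (rental_id INTEGER PRIMARY KEY, inventory_id INTEGER);
INSERT INTO store VALUES (1), (2);
INSERT INTO film VALUES (1, 'Film A'), (2, 'Film B');
INSERT INTO inventory VALUES (10, 1, 1), (20, 1, 2), (30, 2, 1);
INSERT INTO rental VALUES (100, 10), (101, 30);
""")

# Cross join builds every (store, film) pair; the correlated NOT EXISTS
# keeps a pair only when no rental links that film to that store.
never_rented = conn.execute("""
SELECT s.store_id, f.film_id
FROM store s CROSS JOIN film f
WHERE NOT EXISTS (SELECT 1
                  FROM inventory i
                  JOIN rental r ON r.inventory_id = i.inventory_id
                  WHERE i.film_id = f.film_id
                    AND i.store_id = s.store_id)
ORDER BY s.store_id, f.film_id
""").fetchall()

print(never_rented)  # [(2, 1), (2, 2)]
```

Store 2 appears with film 1 (stocked but never rented) and with film 2 (not stocked at all), which is why the pairing has to start from all stores and all films rather than from `inventory`.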

In [188]:
%%sql
EXPLAIN 
SELECT c.city, s.store_id, f.title, f.film_id
FROM store s 
JOIN address a using (address_id)
JOIN city c using (city_id)
, film f
WHERE NOT EXISTS(
    SELECT 'Z'
    FROM film f2 JOIN inventory i USING (film_id)
    JOIN rental r USING (inventory_id)
    JOIN payment p USING (rental_id, staff_id)
    JOIN staff ss USING (staff_id)
    WHERE f2.film_id = f.film_id 
    AND s.store_id = ss.store_id
)

 * postgres://dsa_ro_user:***@pgsql.dsa.lan/dvdrental
31 rows affected.


QUERY PLAN
Hash Anti Join (cost=1296.26..1427.01 rows=1 width=32)
Hash Cond: ((s.store_id = ss.store_id) AND (f.film_id = f2.film_id))
-> Nested Loop (cost=1.32..107.07 rows=2000 width=32)
-> Seq Scan on film f (cost=0.00..64.00 rows=1000 width=19)
-> Materialize (cost=1.32..18.07 rows=2 width=13)
-> Nested Loop (cost=1.32..18.06 rows=2 width=13)
-> Hash Join (cost=1.04..17.36 rows=2 width=6)
Hash Cond: (a.address_id = s.address_id)
-> Seq Scan on address a (cost=0.00..14.03 rows=603 width=6)
-> Hash (cost=1.02..1.02 rows=2 width=6)


In [189]:
%%sql
SELECT c.city, s.store_id, f.title, f.film_id
FROM store s 
JOIN address a using (address_id)
JOIN city c using (city_id)
, film f
WHERE NOT EXISTS(
    SELECT 'Z'
    FROM film f2 JOIN inventory i USING (film_id)
    JOIN rental r USING (inventory_id)
    JOIN payment p USING (rental_id, staff_id)
    JOIN staff ss USING (staff_id)
    WHERE f2.film_id = f.film_id 
    AND s.store_id = ss.store_id
)






 * postgres://dsa_ro_user:***@pgsql.dsa.lan/dvdrental
145 rows affected.


city,store_id,title,film_id
Lethbridge,1,Grosse Wonderful,384
Lethbridge,1,Alice Fantasia,14
Woodridge,2,Alice Fantasia,14
Lethbridge,1,Apollo Teen,33
Woodridge,2,Apollo Teen,33
Lethbridge,1,Argonauts Town,36
Woodridge,2,Argonauts Town,36
Lethbridge,1,Ark Ridgemont,38
Woodridge,2,Ark Ridgemont,38
Lethbridge,1,Arsenic Independence,41


#### Helpful Hints
  1. For the first hint, watch only the first 5:57 of the video, where the conceptual aspects of the task are discussed.
  1. Then attempt to construct the SQL based on the video's explanation of the concept.
  1. If you get stuck again, the remainder of the video looks directly at the SQL construction.
  
[Helpful Hints](https://youtu.be/GyMODTEDfu4)  


# Save your Notebook, then `File > Close and Halt`