In [1]:
# import the necessary libraries
import psycopg2
import pandas as pd
import pandas.io.sql as sqlio

# create the database connection
conn = psycopg2.connect(user="postgres", password="root1234", host="localhost", database="DVDRental")

### Subquery: 
* A subquery is a query nested inside another query.

Suppose you want to find the films whose rental rate is higher than the average rental rate.

In [2]:
# average rental rate across all films
query = """SELECT AVG(rental_rate) FROM film;"""
sqlio.read_sql_query(query,conn)

Unnamed: 0,avg
0,2.98


In [3]:
# films whose rental rate is above the average, hard-coded from the previous cell
query = """SELECT film_id, title, rental_rate
           FROM film
           WHERE rental_rate > 2.98 limit 5;"""
sqlio.read_sql_query(query,conn)

Unnamed: 0,film_id,title,rental_rate
0,133,Chamber Italian,4.99
1,384,Grosse Wonderful,4.99
2,8,Airport Pollock,4.99
3,98,Bright Encounters,4.99
4,2,Ace Goldfinger,4.99


In [4]:
# same filter, but the average is computed by a subquery instead of being hard-coded
query = """SELECT film_id, title, rental_rate
           FROM film
           WHERE rental_rate > (SELECT AVG(rental_rate) FROM film)
           LIMIT 5;"""
sqlio.read_sql_query(query,conn)

Unnamed: 0,film_id,title,rental_rate
0,133,Chamber Italian,4.99
1,384,Grosse Wonderful,4.99
2,8,Airport Pollock,4.99
3,98,Bright Encounters,4.99
4,2,Ace Goldfinger,4.99
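
The subquery form can also be tried without the DVDRental database. The following is a minimal sketch under stated assumptions: it swaps PostgreSQL for Python's built-in `sqlite3` with an in-memory table, and the rows (`Film A` to `Film D` and their rates) are made up for illustration. It shows the inner `SELECT AVG(...)` being evaluated first and its single value feeding the outer `WHERE` clause.

```python
import sqlite3

# In-memory stand-in for the film table; the rows are illustrative, not DVDRental data.
demo = sqlite3.connect(":memory:")
demo.execute(
    "CREATE TABLE film (film_id INTEGER PRIMARY KEY, title TEXT, rental_rate REAL)"
)
demo.executemany(
    "INSERT INTO film VALUES (?, ?, ?)",
    [(1, "Film A", 0.99), (2, "Film B", 2.99), (3, "Film C", 4.99), (4, "Film D", 4.99)],
)

# The scalar subquery computes the average; the outer query filters against it.
above_avg_sql = """SELECT film_id, title, rental_rate
                   FROM film
                   WHERE rental_rate > (SELECT AVG(rental_rate) FROM film)
                   ORDER BY film_id;"""

above_avg = demo.execute(above_avg_sql).fetchall()
print(above_avg)
```

Unlike the hard-coded `2.98`, the subquery version keeps working when rows are added or prices change, because the average is recomputed on every run.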


ANY: <br>
Compares a value with each value in the set returned by a subquery; the condition is true if the comparison holds for at least one of them.

#### Write a query that finds the films whose lengths are greater than or equal to the maximum length of any film category.

It has two parts: <br>
    1. Calculate the maximum length of any film category <br>
    2. Find the films whose lengths are greater than or equal to the maximum length of any film category

In [5]:
# Calculate the maximum length of any film category
query = """SELECT fc.category_id, MAX( length )
           FROM film f
           INNER JOIN film_category fc
           on f.film_id = fc.film_id
           GROUP BY category_id;"""
sqlio.read_sql_query(query,conn)

Unnamed: 0,category_id,max
0,4,184
1,14,185
2,3,178
3,10,185
4,7,181
5,13,183
6,9,184
7,1,185
8,5,185
9,2,185


In [7]:
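# Find the films whose length exceeds the maximum length of at least one category.
# Note: this query uses strict `>`; write `>= ANY` to also match films whose
# length equals the smallest category maximum.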
query = """select title, f.film_id  from film f where 
           length > any 
           (select MAX( length ) FROM film f
           INNER JOIN film_category fc
           on f.film_id = fc.film_id
           GROUP BY category_id);"""
sqlio.read_sql_query(query,conn)

Unnamed: 0,title,film_id
0,Alley Evolution,16
1,Analyze Hoosiers,24
2,Anonymous Human,27
3,Baked Cleopatra,50
4,Casualties Encino,126
5,Born Spinal,88
6,Catch Amistad,128
7,Cause Date,129
8,Chicago North,141
9,Confidential Interview,174
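
To make the semantics of `ANY` concrete, here is a small pure-Python model. It is a sketch, not PostgreSQL's implementation: the helper names `gt_any` and `ge_any` are hypothetical, NULL handling is ignored, and `category_max` holds only the ten category maxima shown in the output of cell `In [5]`.

```python
# Category maxima copied from the ten rows shown in the output of cell In [5].
category_max = [184, 185, 178, 185, 181, 183, 184, 185, 185, 185]


def gt_any(value, values):
    """Model of `value > ANY (values)`: true if at least one comparison holds."""
    return any(value > v for v in values)


def ge_any(value, values):
    """Model of `value >= ANY (values)`."""
    return any(value >= v for v in values)


# `> ANY` reduces to a comparison against the smallest value in the set.
for length in (177, 178, 179, 185):
    assert gt_any(length, category_max) == (length > min(category_max))

strict_178 = gt_any(178, category_max)
inclusive_178 = ge_any(178, category_max)
print(strict_178, inclusive_178)  # False True
```

The last line shows the only place where `>` and `>=` differ: a length equal to the smallest value in the set.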


### ANY vs IN
`= ANY` is equivalent to the `IN` operator.

In [8]:
query = """select title, f.film_id  from film f where 
           length = any 
           (select MAX( length ) FROM film f
           INNER JOIN film_category fc
           on f.film_id = fc.film_id
           GROUP BY category_id);"""
sqlio.read_sql_query(query,conn)

Unnamed: 0,title,film_id
0,Analyze Hoosiers,24
1,Catch Amistad,128
2,Chicago North,141
3,Conspiracy Spirit,180
4,Control Anthem,182
5,Crystal Breaking,198
6,Darn Forrester,212
7,Drop Waterfront,256
8,Express Lonely,296
9,Frontier Cabin,340
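
The equivalence can be checked with a short pure-Python model. This is a sketch with hypothetical helper names (`eq_any`, `ne_all`, `ne_any`) and illustrative values, and it ignores NULLs. It also shows a common pitfall: the negation of `IN` corresponds to `<> ALL`, not to `<> ANY`.

```python
values = [178, 181, 185]  # illustrative subquery result


def eq_any(x, vals):
    """Model of `x = ANY (vals)`."""
    return any(x == v for v in vals)


def ne_all(x, vals):
    """Model of `x <> ALL (vals)`."""
    return all(x != v for v in vals)


def ne_any(x, vals):
    """Model of `x <> ANY (vals)`."""
    return any(x != v for v in vals)


for x in (178, 180, 185):
    assert eq_any(x, values) == (x in values)      # = ANY behaves like IN
    assert ne_all(x, values) == (x not in values)  # <> ALL behaves like NOT IN

# 178 is in the set, yet `<> ANY` is true because 178 differs from 181.
pitfall = ne_any(178, values)
print(pitfall)  # True
```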


### ALL
* Compares a value with every value in the list returned by a subquery.

* The ALL operator must be preceded by a comparison operator such as equal (=), not equal (!=), greater than (>), greater than or equal to (>=), less than (<), and less than or equal to (<=).

* The ALL operator must be followed by a subquery, which must be surrounded by parentheses.

With the assumption that the subquery returns some rows, the ALL operator works as follows:

1. **column_name > ALL (subquery)** - the expression evaluates to true if the value is greater than the biggest value returned by the subquery.
2. **column_name >= ALL (subquery)** - the expression evaluates to true if the value is greater than or equal to the biggest value returned by the subquery.
3. **column_name < ALL (subquery)** - the expression evaluates to true if the value is less than the smallest value returned by the subquery.
4. **column_name <= ALL (subquery)** - the expression evaluates to true if the value is less than or equal to the smallest value returned by the subquery.
5. **column_name = ALL (subquery)** - the expression evaluates to true if the value is equal to every value returned by the subquery.
6. **column_name != ALL (subquery)** - the expression evaluates to true if the value differs from every value returned by the subquery (equivalent to NOT IN).

If the subquery returns no rows, the ALL operator always evaluates to true.
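
The rules above can be modeled in a few lines of Python. This is a sketch under assumptions: `sql_all` is a hypothetical helper, `rows` holds illustrative values, and NULLs are ignored. Python's built-in `all()` returns `True` for an empty iterable, which mirrors the empty-subquery case.

```python
import operator


def sql_all(op, value, rows):
    """Model of `value <op> ALL (rows)`: the comparison must hold for every row."""
    return all(op(value, r) for r in rows)


rows = [3, 5, 8]  # illustrative subquery result

results = {
    "9 > ALL": sql_all(operator.gt, 9, rows),    # True: 9 exceeds the biggest value
    "8 > ALL": sql_all(operator.gt, 8, rows),    # False: 8 is not greater than 8
    "8 >= ALL": sql_all(operator.ge, 8, rows),   # True
    "2 < ALL": sql_all(operator.lt, 2, rows),    # True: 2 is below the smallest value
    "5 = ALL": sql_all(operator.eq, 5, rows),    # False: 5 is not equal to every row
    "4 != ALL": sql_all(operator.ne, 4, rows),   # True: 4 differs from every row
    "0 > ALL ()": sql_all(operator.gt, 0, []),   # True: empty subquery result
}
for label, outcome in results.items():
    print(label, outcome)
```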

### Write a query to find all films whose lengths are greater than every value in the list of average lengths.

Part 1. Find the average length of all films grouped by rating.

In [9]:
query = """SELECT rating,ROUND(AVG(length), 2) as avg_length
           FROM film 
           GROUP BY rating
           ORDER BY avg_length DESC; """
sqlio.read_sql_query(query,conn)

Unnamed: 0,rating,avg_length
0,PG-13,120.44
1,R,118.66
2,NC-17,113.23
3,PG,112.01
4,G,111.05


Part 2: Find all films whose lengths are greater than every average length in the list above.

In [14]:
query = """SELECT film_id, title, length
           FROM film WHERE
           length > ALL (
                SELECT ROUND(AVG (length),2)
                FROM film
                GROUP BY rating)
            ORDER BY length;"""
sqlio.read_sql_query(query,conn)

Unnamed: 0,film_id,title,length
0,207,Dangerous Uptown,121
1,86,Boogie Amelie,121
2,403,Harry Idaho,121
3,93,Brannigan Sunrise,121
4,704,Pure Runner,121
...,...,...,...
452,426,Home Pity,185
453,872,Sweet Brotherhood,185
454,817,Soldiers Evolution,185
455,690,Pond Seattle,185


The query returns all films whose lengths are greater than the biggest value in the average length list returned by the subquery.
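
As a quick sanity check of that statement, the threshold can be worked out by hand. The sketch below copies the averages from the output of cell `In [9]` and assumes, as the output suggests, that film lengths are whole numbers.

```python
import math

# Per-rating average lengths copied from the output of cell In [9].
avg_lengths = [120.44, 118.66, 113.23, 112.01, 111.05]

# `length > ALL (...)` must beat the largest average.
threshold = max(avg_lengths)

# With whole-number lengths, the shortest qualifying film is the first
# integer strictly above the threshold.
shortest_qualifying = math.floor(threshold) + 1
print(threshold, shortest_qualifying)  # 120.44 121
```

This agrees with the first rows of the result, which all have a length of 121.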

### EXISTS Operator: 
* Checks for the existence of rows returned by a subquery.
* If the subquery returns at least one row, the result of **EXISTS** is true.
* If the subquery returns no rows, the result of **EXISTS** is false.
* The result of EXISTS depends only on whether the subquery returns any row, not on the contents of those rows.
* Therefore, the columns that appear in the SELECT clause of the subquery are not important.

### Syntax
**WHERE EXISTS (subquery);** <br>

* **subquery:** A SELECT statement. It is often written with SELECT * rather than a list of expressions or column names.
* Because only the presence of rows matters, a common convention is to write SELECT 1 instead of SELECT *; the subquery's select list does not affect the result.
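
The behavior of `EXISTS` and `NOT EXISTS` with a correlated subquery can be sketched on toy data. The example below is an assumption-based stand-in: it uses Python's built-in `sqlite3` rather than PostgreSQL, and the `customer` and `payment` rows are made up for illustration.

```python
import sqlite3

# In-memory stand-in tables; the rows are illustrative, not DVDRental data.
demo = sqlite3.connect(":memory:")
demo.executescript("""
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, first_name TEXT);
    CREATE TABLE payment (payment_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customer VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO payment VALUES (1, 1, 4.99), (2, 1, 11.99), (3, 2, 2.99);
""")

# Customers with at least one payment above 11.
big_spenders = [row[0] for row in demo.execute("""
    SELECT first_name FROM customer c
    WHERE EXISTS (SELECT 1 FROM payment p
                  WHERE p.customer_id = c.customer_id AND p.amount > 11)
    ORDER BY c.customer_id;""")]

# Customers with no such payment, including those with no payments at all.
others = [row[0] for row in demo.execute("""
    SELECT first_name FROM customer c
    WHERE NOT EXISTS (SELECT 1 FROM payment p
                      WHERE p.customer_id = c.customer_id AND p.amount > 11)
    ORDER BY c.customer_id;""")]

print(big_spenders, others)  # ['Ann'] ['Bob', 'Cy']
```

Note that `Cy`, who has no payments, is returned by the `NOT EXISTS` query: the subquery yields no rows for that customer, so `EXISTS` is false.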

#### Find customers who have at least one payment whose amount is greater than 11.

In [15]:
query = '''SELECT first_name, last_name, customer_id
           FROM customer c WHERE 
           EXISTS
            (SELECT 1 FROM payment p
             WHERE p.customer_id = c.customer_id
             AND amount > 11);'''
sqlio.read_sql_query(query,conn)

Unnamed: 0,first_name,last_name,customer_id
0,Karen,Jackson,13
1,Victoria,Gibson,116
2,Vanessa,Sims,195
3,Rosemary,Schmidt,204
4,Tanya,Gilbert,237
5,Nicholas,Barfield,362
6,Kent,Arsenault,591
7,Terrance,Roush,592


The statement above returns customers who have made at least one payment with an amount greater than 11. The payments of customer 195 confirm this (note the 11.99 payment):

In [17]:
query = '''SELECT customer_id, amount FROM payment p
             WHERE p.customer_id = 195;'''
sqlio.read_sql_query(query,conn)

Unnamed: 0,customer_id,amount
0,195,4.99
1,195,7.99
2,195,2.99
3,195,0.99
4,195,0.99
5,195,0.99
6,195,7.99
7,195,11.99
8,195,6.99
9,195,2.99


To get the customers who have not made any payment greater than 11, use NOT EXISTS:

In [18]:
query = '''SELECT first_name, last_name, customer_id
           FROM customer c WHERE 
           NOT EXISTS
            (SELECT 1 FROM payment p
             WHERE p.customer_id = c.customer_id
             AND amount > 11);'''
sqlio.read_sql_query(query,conn)

Unnamed: 0,first_name,last_name,customer_id
0,Jared,Ely,524
1,Mary,Smith,1
2,Patricia,Johnson,2
3,Linda,Williams,3
4,Barbara,Jones,4
...,...,...,...
586,Terrence,Gunderson,595
587,Enrique,Forsythe,596
588,Freddie,Duggan,597
589,Wade,Delvalle,598


In [19]:
query = '''SELECT customer_id, amount FROM payment p
             WHERE p.customer_id = 595;'''
sqlio.read_sql_query(query,conn)

Unnamed: 0,customer_id,amount
0,595,2.99
1,595,4.99
2,595,2.99
3,595,2.99
4,595,0.99
5,595,0.99
6,595,0.99
7,595,2.99
8,595,0.99
9,595,2.99


#### EXISTS and NULL Relation
* If the subquery returns a row whose value is NULL, EXISTS still returns true, because a row was returned.

In [20]:
query = '''SELECT first_name, last_name
           FROM customer
           WHERE EXISTS(SELECT NULL)
           ORDER BY first_name, last_name;'''
sqlio.read_sql_query(query,conn)

Unnamed: 0,first_name,last_name
0,Aaron,Selby
1,Adam,Gooch
2,Adrian,Clary
3,Agnes,Bishop
4,Alan,Kahn
...,...,...
594,Willie,Markham
595,Wilma,Richards
596,Yolanda,Weaver
597,Yvonne,Watkins


In [21]:
query = '''SELECT first_name, last_name
           FROM customer;'''
sqlio.read_sql_query(query,conn)

Unnamed: 0,first_name,last_name
0,Jared,Ely
1,Mary,Smith
2,Patricia,Johnson
3,Linda,Williams
4,Barbara,Jones
...,...,...
594,Terrence,Gunderson
595,Enrique,Forsythe
596,Freddie,Duggan
597,Wade,Delvalle
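
Both queries return all 598 customers, which shows that `EXISTS (SELECT NULL)` filters nothing out. The same point can be isolated in a minimal sketch; it assumes Python's built-in `sqlite3` as a stand-in for PostgreSQL, and SQLite reports booleans as `1` and `0`.

```python
import sqlite3

demo = sqlite3.connect(":memory:")

# One row whose only value is NULL: a row exists, so EXISTS is true (1).
null_row = demo.execute("SELECT EXISTS (SELECT NULL)").fetchone()[0]

# No rows at all: EXISTS is false (0).
no_row = demo.execute("SELECT EXISTS (SELECT NULL WHERE 1 = 0)").fetchone()[0]

print(null_row, no_row)  # 1 0
```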
