# Install packages to work with PostgreSQL DB
* ipython-sql ==> enable SQL commands to work in Jupyter notebook environments
* psycopg2 ==> the driver used to connect to a PostgreSQL database

In [1]:
!pip install ipython-sql psycopg2



# Activate the SQL environment

In [2]:
%load_ext sql

# Connect to the PostgreSQL database
* These exercises use the provided testdb.sql database file.
* Make sure PostgreSQL is installed on your machine.
* The connection command below assumes that you have created a test database named "testdb" by running
``` postgres=# create database testdb;```
* Import the provided test database (testdb.sql) into the newly created testdb database by running the following command from the command line. It assumes "postgres" as the database user; the `-W` flag makes psql prompt for that user's password.
``` psql -U postgres -W -d testdb -f testdb.sql```

Note: If your database user has a password, connect with a string of the following form, substituting your own values:
```%sql postgresql://<username>:<password>@<host>:<port>/<database>```

In [4]:
%sql postgresql://postgres:070804@localhost:5432/testdb_1

# Basic Query Structure

A typical SQL query has the following form:
```
select A_1, A_2,...,A_n
from R_1, R_2,..R_m
where P
```
Here
- $A_i$ represents an attribute
- $R_i$ represents a relation (table)
- $P$ is a predicate

The result of an SQL query is a relation (table)
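
The general form can also be tried without a PostgreSQL server. The following is a minimal sketch that uses Python's built-in `sqlite3` module as a stand-in for PostgreSQL; the toy `film` table and its rows are made up for illustration and are not taken from testdb. It shows that the result of a query is again a relation: a list of attribute names plus a set of tuples.

```python
import sqlite3

# In-memory SQLite database, used here as a stand-in for PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("create table film (film_id integer, title text, rating text)")
conn.executemany(
    "insert into film values (?, ?, ?)",
    [(1, "Alpha", "PG"), (2, "Beta", "R"), (3, "Gamma", "PG")],  # made-up rows
)

# select A_1, A_2 from R_1 where P
cursor = conn.execute("select film_id, title from film where rating = 'PG'")

columns = [d[0] for d in cursor.description]  # attribute names of the result
rows = sorted(cursor.fetchall())              # tuples of the result relation

print(columns)
print(rows)
```

Only the two selected attributes appear in the result, and only the tuples that satisfy the predicate are kept.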

### The select clause
- The **select** clause lists the attributes desired in the result of a query
    - corresponds to the projection operation $\pi$ of relational algebra

**Query:** Retrieve first and last name of all actors

In [None]:
%sql select first_name, last_name from actor;

*Note:* SQL keywords and unquoted names are case insensitive (you may use upper- or lower-case letters)

In [None]:
%sql select FIRST_NAME, last_Name from actor;

**Query:** List names of all films and their descriptions

In [None]:
%sql select title, description from film;

**Exercise** Try out these simple queries
1. Show the email addresses of all customers from the `customer` relation
2. List the names of all languages available in the `language` relation
3. Which ratings can a film have? (Try this query with the keyword **distinct** inserted after **select** to force duplicate elimination.)

In [5]:
%sql select distinct rating from film;

 * postgresql://postgres:***@localhost:5432/testdb_1
5 rows affected.


rating
PG-13
PG
R
G
NC-17


An asterisk (`*`) in the select clause denotes "all attributes"

In [None]:
%sql select * from customer;

An attribute can be a literal with no **from** clause

In [6]:
%sql select '123';

 * postgresql://postgres:***@localhost:5432/testdb_1
1 rows affected.


?column?
123


You can give the column a name using **as**

In [7]:
%sql select '123' as foo;

 * postgresql://postgres:***@localhost:5432/testdb_1
1 rows affected.


foo
123


An attribute can be a literal with a **from** clause

In [9]:
%sql select '2006' from film limit 5;

 * postgresql://postgres:***@localhost:5432/testdb_1
5 rows affected.


?column?
2006
2006
2006
2006
2006


#### Generalized Projection

- Recall from the previous lecture
- The **select** clause can contain arithmetic expressions involving the operations `+`, `-`, `*`, and `/`, operating on constants or attributes of tuples

Try out the following query

In [10]:
%%sql select customer_id, payment_date, amount, amount*10 as inflated_amount 
from payment limit 10; 

 * postgresql://postgres:***@localhost:5432/testdb_1
10 rows affected.


customer_id,payment_date,amount,inflated_amount
341,2007-02-15 22:25:46.996577,7.99,79.9
341,2007-02-16 17:23:14.996577,1.99,19.9
341,2007-02-16 22:41:45.996577,7.99,79.9
341,2007-02-19 19:39:56.996577,2.99,29.9
341,2007-02-20 17:31:48.996577,7.99,79.9
341,2007-02-21 12:33:49.996577,5.99,59.9
342,2007-02-17 23:58:17.996577,5.99,59.9
342,2007-02-20 02:11:44.996577,5.99,59.9
342,2007-02-20 13:57:39.996577,2.99,29.9
343,2007-02-16 00:10:50.996577,4.99,49.9


### The Where Clause
- The **where** clause specifies conditions that the result must satisfy
    - Corresponds to the selection predicate ($\theta$) of the relational algebra
- SQL allows the use of the logical connectives `and`, `or`, and `not`
- The operands of the logical connectives can be expressions involving the comparison operators `<`, `<=`, `>`, `>=`, `=`, and `<>`
- Comparisons can be applied to the results of arithmetic expressions
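
The connectives and comparison operators listed above can be combined in a single predicate. The following is a minimal sketch that uses Python's built-in `sqlite3` module as a stand-in for PostgreSQL; the toy `film` table, its columns, and its rows are assumptions made up for illustration. It combines `not`, `and`, and `or`, and compares the result of an arithmetic expression against a constant.

```python
import sqlite3

# In-memory SQLite database, used here as a stand-in for PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table film (film_id integer, rating text, "
    "rental_duration integer, rental_rate real)"
)
conn.executemany(
    "insert into film values (?, ?, ?, ?)",
    [  # made-up rows
        (1, "PG-13", 6, 0.99),
        (2, "R", 7, 4.99),
        (3, "PG", 3, 2.99),
        (4, "G", 5, 4.99),
    ],
)

query = """
select film_id
from film
where not (rating = 'R')
  and (rental_duration > 5 or rental_rate * rental_duration > 20)
"""
film_ids = sorted(row[0] for row in conn.execute(query))
print(film_ids)
```

Film 2 is excluded by the `not` condition, and film 3 fails both branches of the `or`. Film 4 passes only through the arithmetic comparison (4.99 * 5 = 24.95).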

**Query:**  Find all films that were released in or after 2005

In [None]:
%%sql
select film_id, title
from film
where release_year >= 2005

**Query:** Find all films that are rated PG-13 and have a rental duration of more than 5 days.

In [None]:
%%sql select film_id, title
from film
where rating = 'PG-13' and rental_duration > 5


### The from clause
- The **from** clause lists the relations involved in the query
    - Corresponds to the Cartesian product ($\times$) operation of the relational algebra
- A Cartesian product is rarely useful on its own, but becomes useful when combined with a condition in the `where` clause
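
To see why the bare Cartesian product is rarely what you want, it helps to count rows. The following is a minimal sketch that uses Python's built-in `sqlite3` module as a stand-in for PostgreSQL; the toy `rental` and `payment` tables and their rows are made up for illustration.

```python
import sqlite3

# In-memory SQLite database, used here as a stand-in for PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("create table rental (rental_id integer, rental_date text)")
conn.execute("create table payment (payment_id integer, rental_id integer, amount real)")
conn.executemany(
    "insert into rental values (?, ?)",
    [(1, "2007-02-15"), (2, "2007-02-16"), (3, "2007-02-17")],  # made-up rows
)
conn.executemany(
    "insert into payment values (?, ?, ?)",
    [(10, 1, 7.99), (11, 2, 1.99)],  # made-up rows
)

# Listing two relations in the from clause pairs every rental with every payment.
product = conn.execute("select * from rental, payment").fetchall()

# The where condition keeps only the pairs that belong together.
matched = sorted(conn.execute(
    "select rental.rental_id, payment.amount "
    "from rental, payment "
    "where rental.rental_id = payment.rental_id"
).fetchall())

print(len(product))  # 3 rental rows * 2 payment rows = 6
print(matched)
```

Rental 3 has no payment in this toy data, so it does not appear in the filtered result.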

**Query:** Retrieve information about rentals and corresponding payments made.

In [None]:
%%sql select rental.rental_id, rental.rental_date, payment.amount
from rental, payment
where rental.rental_id = payment.rental_id;

**Query:** Find names of all action movies

In [12]:
%%sql select title, name 
from film, film_category, category 
where film.film_id = film_category.film_id 
    and film_category.category_id = category.category_id 
    and name = 'Action' limit 2;

 * postgresql://postgres:***@localhost:5432/testdb_1
2 rows affected.


title,name
Amadeus Holy,Action
American Circus,Action


### The Rename Operation
- SQL allows renaming relations and attributes using the **as** clause
- *Note:* the **as** keyword is optional and may be omitted

**Query:** Find all films that have a longer duration than some PG rated film.

In [None]:
%%sql select distinct X.title
from film as X, film Y
where X.length > Y.length and Y.rating = 'PG'

### Joins
- A **join operation** takes two relations and returns another relation as its result.
- A join operation is a Cartesian product that requires tuples in the two relations to match (under some condition). It also specifies the attributes that are present in the result of the join.
- Join operations are typically used as subquery expressions in the **from** clause
- Three types of joins:
    * Natural join
    * Inner join
    * Outer join

**Query:** List names of actors along with film ID of the films they have acted in

In [None]:
%%sql
select film_id, first_name, last_name
from film_actor, actor
where film_actor.actor_id = actor.actor_id

**Q:** Why does the query below return an empty result?

In [None]:
%%sql
select film_id, first_name, last_name
from film_actor natural join actor

**Query** Retrieve information about rentals and corresponding payments made.

In [None]:
%%sql SELECT rental.rental_id, rental.rental_date, payment.amount
from rental
join payment on rental.rental_id = payment.rental_id

**Query:** Retrieve all customers and their corresponding rentals, even if they have not rented any films. *Here we can use the left outer join between customer and rental tables.*

In [None]:
%%sql SELECT customer.customer_id, customer.first_name, customer.last_name,
rental.rental_id, rental.rental_date
FROM customer
LEFT JOIN rental ON customer.customer_id = rental.customer_id;
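
The difference between an inner join and a left outer join is easiest to see on a tiny example. The following is a minimal sketch that uses Python's built-in `sqlite3` module as a stand-in for PostgreSQL; the toy `customer` and `rental` tables and their rows are made up for illustration.

```python
import sqlite3

# In-memory SQLite database, used here as a stand-in for PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("create table customer (customer_id integer, first_name text)")
conn.execute("create table rental (rental_id integer, customer_id integer)")
conn.executemany("insert into customer values (?, ?)", [(1, "Ann"), (2, "Bob")])  # made-up rows
conn.executemany("insert into rental values (?, ?)", [(100, 1), (101, 1)])        # Bob has no rentals

# Inner join: only customers with at least one matching rental.
inner_rows = conn.execute(
    "select c.customer_id, r.rental_id "
    "from customer c join rental r on c.customer_id = r.customer_id "
    "order by c.customer_id, r.rental_id"
).fetchall()

# Left outer join: every customer; missing rental attributes are filled with NULL.
left_rows = conn.execute(
    "select c.customer_id, r.rental_id "
    "from customer c left join rental r on c.customer_id = r.customer_id "
    "order by c.customer_id, r.rental_id"
).fetchall()

print(inner_rows)
print(left_rows)  # SQL NULL is returned as Python None
```

Customer 2 is missing from the inner join but appears in the left outer join with `None` in place of a rental ID.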

### Grouping and Aggregation

- Aggregate functions operate on the multiset of values of a column of a relation and return a single value
    * `avg`: average value
    * `min`: minimum value
    * `max`: maximum value
    * `sum`: sum of values
    * `count`: number of values
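
Because aggregates work on a multiset, duplicate values are counted unless `distinct` is used. The following is a minimal sketch that uses Python's built-in `sqlite3` module as a stand-in for PostgreSQL; the toy `payment` table and its integer amounts are made up for illustration.

```python
import sqlite3

# In-memory SQLite database, used here as a stand-in for PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("create table payment (customer_id integer, amount integer)")
conn.executemany(
    "insert into payment values (?, ?)",
    [(1, 5), (1, 12), (2, 12), (3, 11), (1, 20)],  # made-up rows; 12 occurs twice
)

# All five aggregate functions over the same column.
summary = conn.execute(
    "select avg(amount), min(amount), max(amount), sum(amount), count(amount) "
    "from payment"
).fetchone()

# count keeps duplicates; count(distinct ...) removes them first.
payments_over_10 = conn.execute(
    "select count(customer_id) from payment where amount > 10"
).fetchone()[0]
customers_over_10 = conn.execute(
    "select count(distinct customer_id) from payment where amount > 10"
).fetchone()[0]

print(summary)
print(payments_over_10, customers_over_10)
```

Customer 1 made two payments above 10, so the plain `count` reports 4 payments while `count(distinct customer_id)` reports 3 customers.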

**Query:** Find the average rental duration of PG rated films.

In [13]:
%%sql 
select avg(rental_duration) as average_duration
from film
where rating = 'PG'

 * postgresql://postgres:***@localhost:5432/testdb_1
1 rows affected.


average_duration
5.082474226804124


**Query** Find total number of customers who made a payment of more than 10. *What happens when you remove the `distinct` keyword?*

In [None]:
%%sql
select count (distinct customer_id)
from payment
where amount > 10

**Query:** find the total number of films

In [14]:
%sql select count(*) from film;

 * postgresql://postgres:***@localhost:5432/testdb_1
1 rows affected.


count
1000


**Query:** What is the total amount paid by each customer?
- Here the `order by` clause lists tuples in ascending order. 
- We can also specify `order by customer_id desc`

In [None]:
%%sql select customer_id, sum(amount) 
from payment 
group by customer_id 
order by customer_id
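
The effect of `group by` together with ascending and descending `order by` can be checked on a small table. The following is a minimal sketch that uses Python's built-in `sqlite3` module as a stand-in for PostgreSQL; the toy `payment` table and its integer amounts are made up for illustration.

```python
import sqlite3

# In-memory SQLite database, used here as a stand-in for PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("create table payment (customer_id integer, amount integer)")
conn.executemany(
    "insert into payment values (?, ?)",
    [(1, 5), (2, 3), (1, 7), (3, 4)],  # made-up rows
)

base = "select customer_id, sum(amount) from payment group by customer_id "

# One output tuple per customer; ascending order is the default.
ascending = conn.execute(base + "order by customer_id").fetchall()
descending = conn.execute(base + "order by customer_id desc").fetchall()

print(ascending)
print(descending)
```

The two payments of customer 1 are collapsed into a single tuple with the summed amount.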

**Query:** For each store, find the number of customers that are the members of that store

In [None]:
%%sql select store_id, count(customer_id) 
from customer 
group by store_id

**Query** Retrieve each movie and the number of times it got rented

In [None]:
%%sql select film_id, count(film_id) 
from rental r join inventory i on r.inventory_id = i.inventory_id 
group by film_id 
order by film_id;

**Query:** Find first names that are shared by more than one actor.
- What is the having clause doing here? How is it different from the where clause?

In [None]:
%%sql select first_name, count(*) 
from actor 
group by 
first_name 
having count(*) > 1
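
One way to explore the question above is to run the same grouping with and without a `where` condition. The following is a minimal sketch that uses Python's built-in `sqlite3` module as a stand-in for PostgreSQL; the toy `actor` table and its rows are made up for illustration.

```python
import sqlite3

# In-memory SQLite database, used here as a stand-in for PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("create table actor (first_name text, last_name text)")
conn.executemany(
    "insert into actor values (?, ?)",
    [  # made-up rows
        ("Penelope", "Guiness"), ("Penelope", "Pinkett"), ("Nick", "Wahlberg"),
        ("Ed", "Chase"), ("Ed", "Mansfield"), ("Ed", "Guiness"),
    ],
)

# having filters whole groups, after the groups have been formed.
having_only = conn.execute(
    "select first_name, count(*) from actor "
    "group by first_name having count(*) > 1 order by first_name"
).fetchall()

# where filters individual tuples, before the groups are formed.
where_then_having = conn.execute(
    "select first_name, count(*) from actor "
    "where last_name <> 'Guiness' "
    "group by first_name having count(*) > 1 order by first_name"
).fetchall()

print(having_only)
print(where_then_having)
```

With the `where` condition in place, one Penelope and one Ed are removed before grouping, so the Penelope group shrinks to a single tuple and no longer satisfies the `having` condition.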

**Query:** Find actors who share both their first and last names.

In [None]:
%%sql select distinct a1.first_name, a1.last_name 
from actor a1 join actor a2 on a1.actor_id <> a2.actor_id 
and a1.first_name = a2.first_name and a1.last_name = a2.last_name

## Exercises
- Write SQL queries for the following

**Query** Find the total number of films in each category

**Query** Retrieve the top 5 customers who have rented the most films

**Query** List all films that are currently not available for rent

**Query** Show the average rental duration for each film category:

**Query** Write a query that finds the number of movies each actor has acted in

**Query** Write a query that finds names of actors who had more than 40 film releases in a year

In [None]:
%%sql select customer_id, count(*) as rental_count
from rental
group by customer_id
order by rental_count desc
limit 5