# Sakila Database

## Origin

The Sakila sample database was initially developed by Mike Hillyer, a former member of the MySQL AB documentation team, and is intended to provide a standard schema that can be used for examples in books, tutorials, articles, samples, and so forth.

https://dev.mysql.com/doc/sakila/en/

## Schema

![Image](https://raw.githubusercontent.com/nicshub/sakila-sqlite3/main/sakila.png)

## SQLite

SQLite is often already available as part of the operating system; otherwise, the library and the sqlite3 command-line tool need to be installed in the way specific to each OS.

## Google Colab

Google Colab runs on a Linux container, where the sqlite3 command-line tool needs to be installed first:

In [1]:
!apt-get -y install sqlite3

Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
Suggested packages:
  sqlite3-doc
The following NEW packages will be installed:
  sqlite3
0 upgraded, 1 newly installed, 0 to remove and 33 not upgraded.
Need to get 768 kB of archives.
After this operation, 1,873 kB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu jammy-updates/main amd64 sqlite3 amd64 3.37.2-2ubuntu0.3 [768 kB]
Fetched 768 kB in 0s (2,116 kB/s)
Selecting previously unselected package sqlite3.
(Reading database ... 121749 files and directories currently installed.)
Preparing to unpack .../sqlite3_3.37.2-2ubuntu0.3_amd64.deb ...
Unpacking sqlite3 (3.37.2-2ubuntu0.3) ...
Setting up sqlite3 (3.37.2-2ubuntu0.3) ...
Processing triggers for man-db (2.10.2-1) ...


Let's check that it's installed:

In [2]:
!sqlite3 -version

3.37.2 2022-01-06 13:25:41 872ba256cbf61d9290b571c0e6d82a20c224ca3ad82971edc46b29818d5dalt1


## Get a database

In this notebook we are going to use a port of Sakila for SQLite by Bradley Grant

https://github.com/bradleygrant/sakila-sqlite3/

In [3]:
# Remove any previous copy of the database
!rm -f datasets/sakila*
# Create the target directory
!mkdir -p datasets/
# Download the database
!wget https://github.com/nicshub/sakila-sqlite3/raw/main/sakila_master.db -O datasets/sakila.db

--2024-02-16 08:55:30--  https://github.com/nicshub/sakila-sqlite3/raw/main/sakila_master.db
Resolving github.com (github.com)... 140.82.112.4
Connecting to github.com (github.com)|140.82.112.4|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://raw.githubusercontent.com/nicshub/sakila-sqlite3/main/sakila_master.db [following]
--2024-02-16 08:55:30--  https://raw.githubusercontent.com/nicshub/sakila-sqlite3/main/sakila_master.db
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.109.133, 185.199.108.133, 185.199.111.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.109.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 5791744 (5.5M) [application/octet-stream]
Saving to: ‘datasets/sakila.db’


2024-02-16 08:55:31 (56.0 MB/s) - ‘datasets/sakila.db’ saved [5791744/5791744]



Let's check that the file has been created, using two Linux commands:
1. file, to inspect the file header
2. the sqlite3 command line, piping the command in with echo to avoid the interactive prompt

In [4]:
!file datasets/sakila.db

datasets/sakila.db: SQLite 3.x database, last written using SQLite version 3011000, page size 1024, file counter 46364, database pages 5656, cookie 0x4b, schema 4, UTF-8, version-valid-for 46364


In [5]:
!echo ".tables" | sqlite3 datasets/sakila.db

actor                   film                    payment               
address                 film_actor              rental                
category                film_category           sales_by_film_category
city                    film_list               sales_by_store        
country                 film_text               staff                 
customer                inventory               staff_list            
customer_list           language                store                 


Please note that sakila.db, the file created by the download command, is the **database** itself: a SQLite database is a single file.

To check the entire schema (physical model), you can use the .schema command of the sqlite3 shell.
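The .schema dot-command only exists in the sqlite3 shell. From Python, the same CREATE statements can be read from the sqlite_master catalog table. A minimal sketch, where get_schema is a hypothetical helper name and the demo uses an in-memory database with one toy table, so it runs without datasets/sakila.db:

```python
import sqlite3


def get_schema(conn: sqlite3.Connection) -> list:
    """Return the CREATE statements of all tables and views, read from sqlite_master."""
    rows = conn.execute(
        "SELECT sql FROM sqlite_master "
        "WHERE type IN ('table', 'view') AND sql IS NOT NULL "
        "ORDER BY name"
    ).fetchall()
    return [sql for (sql,) in rows]


# In-memory demo; in the notebook you would use sqlite3.connect("datasets/sakila.db").
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE language (language_id INTEGER PRIMARY KEY, name TEXT)")
statements = get_schema(conn)
for sql in statements:
    print(sql)
```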

# Query the database in a notebook

- IPython (and therefore Jupyter with a Python kernel) offers *magic commands*, a convenient way to run commands without using the standard Python syntax [ref](https://ipython.readthedocs.io/en/stable/interactive/magics.html)

There are two kinds of magics:

- line-oriented, prefixed by %
- cell-oriented, prefixed by %%

In [6]:
# List of built in magic commands
%lsmagic

Available line magics:
%alias  %alias_magic  %autoawait  %autocall  %automagic  %autosave  %bookmark  %cat  %cd  %clear  %colors  %conda  %config  %connect_info  %cp  %debug  %dhist  %dirs  %doctest_mode  %ed  %edit  %env  %gui  %hist  %history  %killbgscripts  %ldir  %less  %lf  %lk  %ll  %load  %load_ext  %loadpy  %logoff  %logon  %logstart  %logstate  %logstop  %ls  %lsmagic  %lx  %macro  %magic  %man  %matplotlib  %mkdir  %more  %mv  %notebook  %page  %pastebin  %pdb  %pdef  %pdoc  %pfile  %pinfo  %pinfo2  %pip  %popd  %pprint  %precision  %prun  %psearch  %psource  %pushd  %pwd  %pycat  %pylab  %qtconsole  %quickref  %recall  %rehashx  %reload_ext  %rep  %rerun  %reset  %reset_selective  %rm  %rmdir  %run  %save  %sc  %set_env  %shell  %store  %sx  %system  %tb  %tensorflow_version  %time  %timeit  %unalias  %unload_ext  %who  %who_ls  %whos  %xdel  %xmode

Available cell magics:
%%!  %%HTML  %%SVG  %%bash  %%bigquery  %%capture  %%debug  %%file  %%html  %%javascript  %%js  %%late

It's also possible to add new magic functions through an extension module and then load it (with %load_ext)

## SQL Magic

[ipython-sql](https://pypi.org/project/ipython-sql/) introduces the %sql magic function that can be used both for
- single line queries (line magic %sql)
- multiple lines (cell magic %%sql)

We install it with pip, the package installer for Python, directly from Jupyter using the built-in %pip magic

In [7]:
%pip install ipython-sql

Collecting jedi>=0.16 (from ipython->ipython-sql)
  Downloading jedi-0.19.1-py2.py3-none-any.whl (1.6 MB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m1.6/1.6 MB[0m [31m15.7 MB/s[0m eta [36m0:00:00[0m
Installing collected packages: jedi
Successfully installed jedi-0.19.1


## Loading the SQL Magic extension

In [8]:
%load_ext sql

## Connecting to the database

In [9]:
%sql sqlite:///datasets/sakila.db

Let's check that we are connected, just running the %sql magic with no query: it prints the active connection

In [10]:
%sql

 * sqlite:///datasets/sakila.db


# Let's start learning by doing

For the sake of readability, most queries end with LIMIT 10 (or another small number) to limit the size of the result set

## Select from a single table

*Exercise*: Get the names of the film categories

In [12]:
%%sql
SELECT
name
FROM
category
limit 5


 * sqlite:///datasets/sakila.db
Done.


name
Action
Animation
Children
Classics
Comedy


*Exercise*: Get the first name and last name of the actors

In [14]:
%%sql
SELECT ALL
first_name, last_name
FROM
actor
limit 10

 * sqlite:///datasets/sakila.db
Done.


first_name,last_name
PENELOPE,GUINESS
NICK,WAHLBERG
ED,CHASE
JENNIFER,DAVIS
JOHNNY,LOLLOBRIGIDA
BETTE,NICHOLSON
GRACE,MOSTEL
MATTHEW,JOHANSSON
JOE,SWANK
CHRISTIAN,GABLE


### Distinct values

*Exercise*: Get the distinct ratings of the films

In [15]:
%%sql
SELECT
DISTINCT rating
FROM
film

 * sqlite:///datasets/sakila.db
Done.


rating
PG
G
NC-17
PG-13
R


### All attributes (*)

*Exercise*: Get all the attributes of the language relation

In [16]:
%%sql
SELECT
*
FROM
language

 * sqlite:///datasets/sakila.db
Done.


language_id,name,last_update
1,English,2020-12-23 07:12:12
2,Italian,2020-12-23 07:12:12
3,Japanese,2020-12-23 07:12:12
4,Mandarin,2020-12-23 07:12:12
5,French,2020-12-23 07:12:12
6,German,2020-12-23 07:12:12


### Column aliases
- In SQL it's possible to define an alias for any attribute name
- In Relational Algebra this is the rename operation ρ
- The syntax is simple: just add "as xyz" after the attribute name, where xyz is the new name
- Better avoid spaces and use only letters, digits and underscores; otherwise, enclose the alias in double quotes
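The alias becomes the column name seen by client programs. A minimal sketch with Python's sqlite3 module, assuming a toy in-memory copy of the language table: the DB-API attribute cursor.description reports the aliases, including the one with a space, which must be written as a double-quoted identifier.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE language (language_id INTEGER, name TEXT)")
conn.executemany("INSERT INTO language VALUES (?, ?)", [(1, "English"), (2, "Italian")])

# "language name" is a quoted identifier (alias); 'English' would be a string literal.
cur = conn.execute('SELECT name AS "language name", name AS lang_name FROM language')
column_names = [d[0] for d in cur.description]
rows = cur.fetchall()
print(column_names)  # ['language name', 'lang_name']
```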

*Exercise*: In the same relation, rename the "name" attribute to "language name"

In [18]:
%%sql
SELECT
name as "language name"
FROM
language

 * sqlite:///datasets/sakila.db
Done.


language name
English
Italian
Japanese
Mandarin
French
German


### SQL scalar functions

https://www.sqlite.org/lang_corefunc.html

- SQL provides scalar functions, i.e. functions returning a single value for each row, computed from the attributes selected in the query
- Operations need to be consistent with the data type of the attribute


## Simple expressions

- Arithmetic operations on numbers
- Concatenation of strings

*Exercise*: Get the total rental cost of a film, computed as rental_duration * rental_rate; include the title and the original values.

In [20]:
%%sql
SELECT
title,
rental_duration*rental_rate,
rental_duration,
rental_rate
FROM
film
limit 10

 * sqlite:///datasets/sakila.db
Done.


title,rental_duration*rental_rate,rental_duration,rental_rate
ACADEMY DINOSAUR,5.94,6,0.99
ACE GOLDFINGER,14.97,3,4.99
ADAPTATION HOLES,20.93,7,2.99
AFFAIR PREJUDICE,14.95,5,2.99
AFRICAN EGG,17.94,6,2.99
AGENT TRUMAN,8.97,3,2.99
AIRPLANE SIERRA,29.94,6,4.99
AIRPORT POLLOCK,29.94,6,4.99
ALABAMA DEVIL,8.97,3,2.99
ALADDIN CALENDAR,29.94,6,4.99


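*Exercise*: Get the full name of each staff member in a single attribute named full_name, containing first name and last name

(Use the || operator to concatenate two fields; the concat() function is only available from SQLite 3.44.0, so the cell below fails on the older SQLite version used here)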

In [24]:
%%sql
select
first_name,
last_name,
first_name || ' ' || last_name as fullname,
concat(first_name,last_name) as fullnameconc
from staff

 * sqlite:///datasets/sakila.db
(sqlite3.OperationalError) no such function: concat
[SQL: select 
first_name, 
last_name,
first_name || ' ' || last_name as fullname,
concat(first_name,last_name) as fullnameconc
from staff]
(Background on this error at: https://sqlalche.me/e/20/e3q8)
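The error comes only from concat(): the || part of the same query is fine. A minimal sketch with Python's sqlite3 module, assuming a toy staff table holding the two staff members of Sakila, that keeps the portable || form, shows its NULL caveat, and calls concat() only when the linked SQLite library is recent enough:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staff (staff_id INTEGER, first_name TEXT, last_name TEXT)")
conn.executemany("INSERT INTO staff VALUES (?, ?, ?)",
                 [(1, "Mike", "Hillyer"), (2, "Jon", "Stephens")])

# || works in every SQLite version
full_names = [r[0] for r in conn.execute(
    "SELECT first_name || ' ' || last_name AS full_name FROM staff ORDER BY staff_id")]
print(full_names)  # ['Mike Hillyer', 'Jon Stephens']

# Caveat: || returns NULL when any operand is NULL
null_concat = conn.execute("SELECT 'Mike' || NULL").fetchone()[0]
print(null_concat)  # None

# concat() exists only if the SQLite library linked by Python is 3.44.0 or newer
version = tuple(int(part) for part in sqlite3.sqlite_version.split("."))
if version >= (3, 44, 0):
    print(conn.execute("SELECT concat('Mike', ' ', 'Hillyer')").fetchone()[0])
```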


### Date time functions

https://www.sqlite.org/lang_datefunc.html

*Exercise*: Get the creation date of each customer in the yyyy-mm-dd format

- Use strftime(format, attribute) to format the datetime field in the desired format


In [28]:
%%sql
select
create_date,
strftime('%Y-%m-%d', create_date) as data
from customer limit 10


 * sqlite:///datasets/sakila.db
Done.


create_date,data
2006-02-14 22:04:36.000,2006-02-14
2006-02-14 22:04:36.000,2006-02-14
2006-02-14 22:04:36.000,2006-02-14
2006-02-14 22:04:36.000,2006-02-14
2006-02-14 22:04:36.000,2006-02-14
2006-02-14 22:04:36.000,2006-02-14
2006-02-14 22:04:36.000,2006-02-14
2006-02-14 22:04:36.000,2006-02-14
2006-02-14 22:04:36.000,2006-02-14
2006-02-14 22:04:36.000,2006-02-14


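*Exercise*: Compute the difference between your birthdate and today

- The timediff() function would be ideal, but it is only available from SQLite 3.43.0
- To get a difference in years we can subtract two dates in 'yyyy-mm-dd' format: in arithmetic, SQLite converts each string to the number at its start, i.e. the year, so the result is the difference between the two years, not the exact age
- In SQLite the FROM clause is optional: a SELECT without FROM evaluates its expressions once and returns a single row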

In [30]:
%%sql
SELECT date()-date("1975-04-09")

 * sqlite:///datasets/sakila.db
Done.


"date()-date(""1975-04-09"")"
49
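The result above is just the difference between the years. A minimal sketch of an exact age computation, run through Python's sqlite3 module with two fixed dates instead of 'now' so the result is reproducible (the choice of dates is an assumption for the example): one year is subtracted when the birthday has not yet occurred in the current year.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
birth, today = "1975-04-09", "2024-02-16"  # fixed dates for a reproducible result

# Naive: each date string is converted to its leading number, i.e. the year
naive = conn.execute("SELECT date(?) - date(?)", (today, birth)).fetchone()[0]

# Exact age: a comparison yields 1 (true) or 0 (false), so subtract 1 before the birthday
exact = conn.execute(
    """SELECT strftime('%Y', :t) - strftime('%Y', :b)
              - (strftime('%m-%d', :t) < strftime('%m-%d', :b))""",
    {"t": today, "b": birth},
).fetchone()[0]
print(naive, exact)  # 49 48
```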


### CASE
https://www.sqlite.org/lang_expr.html#case

CASE is SQL's conditional expression, the equivalent of IF-THEN-ELSE

#### CASE with a base expression

The base expression is evaluated once and compared with each WHEN value in order: the result is the THEN value of the first match, otherwise the ELSE value (or NULL when ELSE is omitted).

```sql
CASE expr
WHEN value THEN result
WHEN value THEN result
ELSE result
END
```
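A minimal sketch of the base form through Python's sqlite3 module, using a few literal rating codes instead of the film table (the descriptions are toy labels). It also shows what happens without ELSE: values that match no WHEN branch produce NULL (None in Python).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
query = """
SELECT rating,
       CASE rating
           WHEN 'G' THEN 'General audiences'
           WHEN 'R' THEN 'Restricted'
       END AS description  -- no ELSE: unmatched values give NULL
FROM (SELECT 'G' AS rating UNION ALL SELECT 'R' UNION ALL SELECT 'PG')
"""
result = dict(conn.execute(query).fetchall())
print(result)
```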

*Exercise*: Categorize the films by duration: a film shorter than 75 minutes is a short, otherwise it's full-length

In [33]:
%%sql
SELECT
title,
length,
CASE length < 75
WHEN true then 'Short'
ELSE 'Full'
END as "Duration Classification"
FROM
film
LIMIT 10


 * sqlite:///datasets/sakila.db
Done.


title,length,Duration Classification
ACADEMY DINOSAUR,86,Full
ACE GOLDFINGER,48,Short
ADAPTATION HOLES,50,Short
AFFAIR PREJUDICE,117,Full
AFRICAN EGG,130,Full
AGENT TRUMAN,169,Full
AIRPLANE SIERRA,62,Short
AIRPORT POLLOCK,54,Short
ALABAMA DEVIL,114,Full
ALADDIN CALENDAR,63,Short


#### CASE without a base expression

In this form the WHEN conditions are evaluated in order, and the result is the THEN value of the first condition that is true

```sql
CASE
WHEN condition THEN value
WHEN condition THEN value
ELSE value
END
```

*Exercise*: Categorize the payment amounts by range:
 - [0, 1]: 'cheap'
 - (1, 5]: 'medium'
 - greater than 5: 'expensive'

In [35]:
%%sql
SELECT
amount,
CASE
WHEN amount <=1 then 'Cheap'
WHEN amount <=5 then 'Medium'
ELSE 'Expensive'
END as "Cost evaluation"
FROM
payment
LIMIT 10

 * sqlite:///datasets/sakila.db
Done.


amount,Cost evaluation
2.99,Medium
0.99,Cheap
5.99,Expensive
0.99,Cheap
9.99,Expensive
4.99,Medium
4.99,Medium
0.99,Cheap
3.99,Medium
5.99,Expensive


### CAST
https://www.sqlite.org/lang_expr.html#cast

- In computer science, casting is the conversion of a value from one data type to another
- In SQL it's used to change the domain of an attribute in the result relation
- Casting is useful in two cases:
  - when the source data type is too generic, for example a string containing a number (this can be a design issue)
  - when we need to treat an attribute as a different type, for example to count the number of digits in a number

The syntax is CAST(expr AS type-name)

*Exercise*: Convert the release_year of the film relation into a number

In [40]:
%%sql
SELECT
cast(release_year as number) * 2,
release_year*2
FROM
film
limit 10

 * sqlite:///datasets/sakila.db
Done.


cast(release_year as number) * 2,release_year*2
4012,4012
4012,4012
4012,4012
4012,4012
4012,4012
4012,4012
4012,4012
4012,4012
4012,4012
4012,4012


In [43]:
!echo ".schema film" | sqlite3 datasets/sakila.db

CREATE TABLE film (
  film_id int NOT NULL,
  title VARCHAR(255) NOT NULL,
  description BLOB SUB_TYPE TEXT DEFAULT NULL,
  release_year VARCHAR(4) DEFAULT NULL,
  language_id SMALLINT NOT NULL,
  original_language_id SMALLINT DEFAULT NULL,
  rental_duration SMALLINT  DEFAULT 3 NOT NULL,
  rental_rate DECIMAL(4,2) DEFAULT 4.99 NOT NULL,
  length SMALLINT DEFAULT NULL,
  replacement_cost DECIMAL(5,2) DEFAULT 19.99 NOT NULL,
  rating VARCHAR(10) DEFAULT 'G',
  special_features VARCHAR(100) DEFAULT NULL,
  last_update TIMESTAMP NOT NULL,
  PRIMARY KEY  (film_id),
  CONSTRAINT CHECK_special_features CHECK(special_features is null or
                                                           special_features like '%Trailers%' or
                                                           special_features like '%Commentaries%' or
                                                           special_features like '%Deleted Scenes%' or
                                                           spe

*Exercise*: Compute the number of characters of the length attribute (the film duration) in the film relation

Use the scalar function length(X) to count the number of characters in a value; for a number it counts the characters of its text representation

In [47]:
%%sql
select
title,
length,
length(cast(length as text)) as numchars,
length(length) as numchars
from film limit 10

 * sqlite:///datasets/sakila.db
Done.


title,length,numchars,numchars_1
ACADEMY DINOSAUR,86,2,2
ACE GOLDFINGER,48,2,2
ADAPTATION HOLES,50,2,2
AFFAIR PREJUDICE,117,3,3
AFRICAN EGG,130,3,3
AGENT TRUMAN,169,3,3
AIRPLANE SIERRA,62,2,2
AIRPORT POLLOCK,54,2,2
ALABAMA DEVIL,114,3,3
ALADDIN CALENDAR,63,2,2
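The typeof() function reports the storage class of a value, which makes it easy to see what CAST produces. A minimal sketch through Python's sqlite3 module, using literals instead of the film table. It also shows a SQLite quirk: a type name that contains none of the recognized keywords (INT, CHAR, CLOB, TEXT, BLOB, REAL, FLOA, DOUB) gets NUMERIC affinity, so CAST(... AS STRING) does not produce text, while CAST(... AS TEXT) does. The same rule is why cast(release_year as number) above returns a number.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
row = conn.execute("""
SELECT typeof('2006'),                   -- text, like release_year VARCHAR(4)
       typeof(CAST('2006' AS INTEGER)),  -- integer
       '2006' * 2,                       -- implicit conversion in arithmetic
       typeof(CAST(86 AS TEXT)),         -- text
       typeof(CAST(86 AS STRING))        -- STRING has NUMERIC affinity
""").fetchone()
print(row)  # ('text', 'integer', 4012, 'text', 'integer')
```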


### Aggregate functions (part 1)
https://www.sqlite.org/lang_aggfunc.html

- At this stage, without WHERE and GROUP BY, aggregate functions apply to the entire table, treated as a single group
- Functions take an attribute as parameter; "\*" is accepted only in the special case count(\*)

#### COUNT
- count(\*): counts the number of tuples in the group
- count(X): counts the number of tuples in the group where X is not NULL
- count(distinct(X)): as above, but counting each distinct value only once
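The three variants differ only in how they treat NULLs and duplicates. A minimal sketch through Python's sqlite3 module, assuming a toy rental table with four rows, one of them not yet returned (NULL return_date):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rental (rental_id INTEGER, return_date TEXT)")
conn.executemany(
    "INSERT INTO rental VALUES (?, ?)",
    [(1, "2005-05-26"), (2, "2005-05-26"), (3, "2005-05-28"), (4, None)],  # toy data
)
counts = conn.execute(
    "SELECT count(*), count(return_date), count(DISTINCT return_date) FROM rental"
).fetchone()
print(counts)  # (4, 3, 2)
```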

*Exercise*: Count the number of rentals in the rental table

In [53]:
%%sql
SELECT
count(*)
FROM
rental

 * sqlite:///datasets/sakila.db
Done.


count(*)
16044


*Exercise*: Count the number of non-NULL return_date values in the rental table

In [49]:
%%sql
SELECT
count(return_date)
FROM
rental

 * sqlite:///datasets/sakila.db
Done.


count(return_date)
15861


*Exercise*: Count the number of distinct return_date values in the rental table

In [51]:
%%sql
SELECT
count(distinct(return_date))
FROM
rental

 * sqlite:///datasets/sakila.db
Done.


count(distinct(return_date))
15836


#### MIN-MAX-AVG-SUM
- min(X): minimum of the non-NULL values of X in the group
- max(X): maximum of the non-NULL values of X in the group
- avg(X): average of the non-NULL values of X in the group
- sum(X): sum of the non-NULL values of X in the group

Yes, they are meant for numbers (min and max also accept strings, compared in text order)
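Amounts such as 0.99 are stored as floating-point numbers, so a sum may show tiny rounding noise, like the long decimal tail of the sum in the query below. A minimal sketch through Python's sqlite3 module, assuming a toy payment table: NULLs are ignored by all four functions, and round(X, 2) keeps the results readable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payment (amount REAL)")
conn.executemany("INSERT INTO payment VALUES (?)", [(2.99,), (0.99,), (5.99,), (None,)])
stats = conn.execute(
    "SELECT min(amount), max(amount), round(avg(amount), 2), round(sum(amount), 2) "
    "FROM payment"
).fetchone()
print(stats)  # (0.99, 5.99, 3.32, 9.97)
```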

*Exercise*: Get the min, max, avg and sum of the amount in the payment table

In [54]:
%%sql
SELECT
min(amount),max(amount),avg(amount),sum(amount)
FROM
payment

 * sqlite:///datasets/sakila.db
Done.


min(amount),max(amount),avg(amount),sum(amount)
0,11.99,4.200667331297407,67416.50999999208


## Select from multiple tables

### Cartesian product (CROSS JOIN)

- In Relational Algebra it's the binary operator ×, written A × B

It creates a new relation where
- the number of attributes is the sum of the arities of the source relations
- the number of tuples is the product of the cardinalities of the source relations

In SQL it can be written in two equivalent ways:
- FROM A,B
- FROM A CROSS JOIN B
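Arity and cardinality of the result can be checked on small tables. A minimal sketch through Python's sqlite3 module, assuming toy customer (3 rows, 2 attributes) and film (2 rows, 3 attributes) tables:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (customer_id INTEGER, first_name TEXT)")
conn.execute("CREATE TABLE film (film_id INTEGER, title TEXT, length INTEGER)")
conn.executemany("INSERT INTO customer VALUES (?, ?)",
                 [(1, "MARY"), (2, "PATRICIA"), (3, "LINDA")])
conn.executemany("INSERT INTO film VALUES (?, ?, ?)",
                 [(1, "ACADEMY DINOSAUR", 86), (2, "ACE GOLDFINGER", 48)])

cur = conn.execute("SELECT * FROM customer CROSS JOIN film")
rows = cur.fetchall()
arity = len(cur.description)  # 2 + 3 attributes
cardinality = len(rows)       # 3 * 2 tuples

# The comma syntax gives the same product
comma = conn.execute("SELECT count(*) FROM customer, film").fetchone()[0]
print(arity, cardinality, comma)  # 5 6 6
```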

*Exercise*: Get the potential number of rentals by combining every customer with every film

In [57]:
%%sql
select count(*) from customer, film limit 10

 * sqlite:///datasets/sakila.db
Done.


count(*)
599000


In [58]:
%%sql
select count(*)
from
customer
cross join film
limit 10

 * sqlite:///datasets/sakila.db
Done.


count(*)
599000


In [59]:
%%sql
select count(*) from film;

 * sqlite:///datasets/sakila.db
Done.


count(*)
1000


In [60]:
%%sql
select count(*) from customer;

 * sqlite:///datasets/sakila.db
Done.


count(*)
599


### JOINS
Join clauses consist of three parts:
- join operator: indicates which kind of join to use (natural, inner, outer, cross)
- table (or subquery): the relation to be joined
- join constraint: the expression defining the join condition, using "on" ("using" is an alternative when the columns have the same name; SQLite supports it, but not every DBMS does)

The generic syntax is
```sql
FROM A kindofjoin JOIN B on A.col=B.col
```

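#### Natural JOIN
- A natural join needs no "on": it matches rows on all the columns with the same name in both tables
- Beware: city and country share both country_id and last_update, so the natural join below compares both columns and returns a single row (the following queries show why)

*Exercise*: Get the list of cities and the corresponding country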

In [65]:
%%sql
select * from city limit 10

 * sqlite:///datasets/sakila.db
Done.


city_id,city,country_id,last_update
1,A Corua (La Corua),87,2020-12-23 07:12:14
2,Abha,82,2020-12-23 07:12:14
3,Abu Dhabi,101,2020-12-23 07:12:14
4,Acua,60,2020-12-23 07:12:14
5,Adana,97,2020-12-23 07:12:14
6,Addis Abeba,31,2020-12-23 07:12:14
7,Aden,107,2020-12-23 07:12:14
8,Adoni,44,2020-12-23 07:12:14
9,Ahmadnagar,44,2020-12-23 07:12:14
10,Akishima,50,2020-12-23 07:12:14


In [66]:
%%sql
select * from country limit 10

 * sqlite:///datasets/sakila.db
Done.


country_id,country,last_update
1,Afghanistan,2020-12-23 07:12:12
2,Algeria,2020-12-23 07:12:12
3,American Samoa,2020-12-23 07:12:12
4,Angola,2020-12-23 07:12:12
5,Anguilla,2020-12-23 07:12:12
6,Argentina,2020-12-23 07:12:12
7,Armenia,2020-12-23 07:12:12
8,Australia,2020-12-23 07:12:12
9,Austria,2020-12-23 07:12:12
10,Azerbaijan,2020-12-23 07:12:12


In [61]:
%%sql
select
*
from
city
natural join country

 * sqlite:///datasets/sakila.db
Done.


city_id,city,country_id,last_update,country
7,Aden,107,2020-12-23 07:12:14,Yemen


In [63]:
%%sql
select country_id from country where last_update='2020-12-23 07:12:14'

 * sqlite:///datasets/sakila.db
Done.


country_id
106
107
108
109


In [64]:
%%sql
select * from city where last_update='2020-12-23 07:12:14'
and country_id in (select country_id from country where last_update='2020-12-23 07:12:14')

 * sqlite:///datasets/sakila.db
Done.


city_id,city,country_id,last_update
7,Aden,107,2020-12-23 07:12:14
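The pitfall can be reproduced on two rows. A minimal sketch through Python's sqlite3 module, assuming toy copies of city and country that keep both country_id and last_update: the natural join silently adds last_update to the join condition, while an explicit ON, or USING on the key only, joins on country_id alone.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE country (country_id INTEGER, country TEXT, last_update TEXT)")
conn.execute("CREATE TABLE city (city_id INTEGER, city TEXT, country_id INTEGER, last_update TEXT)")
conn.executemany("INSERT INTO country VALUES (?, ?, ?)", [
    (87, "Spain", "2020-12-23 07:12:12"),
    (107, "Yemen", "2020-12-23 07:12:14"),
])
conn.executemany("INSERT INTO city VALUES (?, ?, ?, ?)", [
    (1, "A Corua (La Corua)", 87, "2020-12-23 07:12:14"),
    (7, "Aden", 107, "2020-12-23 07:12:14"),
])

natural = conn.execute("SELECT city, country FROM city NATURAL JOIN country").fetchall()
explicit = conn.execute(
    "SELECT city, country FROM city JOIN country ON city.country_id = country.country_id"
).fetchall()
using = conn.execute("SELECT city, country FROM city JOIN country USING (country_id)").fetchall()
print(natural)                     # [('Aden', 'Yemen')]
print(len(explicit), len(using))   # 2 2
```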


#### Inner JOIN
- It's the default join, so INNER can be omitted

*Exercise*: Get the list of cities and the corresponding country

In [68]:
%%sql
SELECT
city, country
FROM
city
join country on city.country_id=country.country_id
limit 10

 * sqlite:///datasets/sakila.db
Done.


city,country
A Corua (La Corua),Spain
Abha,Saudi Arabia
Abu Dhabi,United Arab Emirates
Acua,Mexico
Adana,Turkey
Addis Abeba,Ethiopia
Aden,Yemen
Adoni,India
Ahmadnagar,India
Akishima,Japan


List of film titles with the names of their associated categories

In [71]:
%%sql
select
title, name
from
film
join film_category on film.film_id=film_category.film_id
join category on film_category.category_id = category.category_id
limit 10


 * sqlite:///datasets/sakila.db
Done.


title,name
ACADEMY DINOSAUR,Documentary
ACE GOLDFINGER,Horror
ADAPTATION HOLES,Documentary
AFFAIR PREJUDICE,Horror
AFRICAN EGG,Family
AGENT TRUMAN,Foreign
AIRPLANE SIERRA,Comedy
AIRPORT POLLOCK,Horror
ALABAMA DEVIL,Horror
ALADDIN CALENDAR,Sports


List of film titles with the first and last names of the actors

In [74]:
%%sql
select
f.title, a.first_name,a.last_name
from
film f
join film_actor fa on f.film_id=fa.film_id
join actor a on a.actor_id=fa.actor_id
limit 20

 * sqlite:///datasets/sakila.db
Done.


title,first_name,last_name
ACADEMY DINOSAUR,PENELOPE,GUINESS
ANACONDA CONFESSIONS,PENELOPE,GUINESS
ANGELS LIFE,PENELOPE,GUINESS
BULWORTH COMMANDMENTS,PENELOPE,GUINESS
CHEAPER CLYDE,PENELOPE,GUINESS
COLOR PHILADELPHIA,PENELOPE,GUINESS
ELEPHANT TROJAN,PENELOPE,GUINESS
GLEAMING JAWBREAKER,PENELOPE,GUINESS
HUMAN GRAFFITI,PENELOPE,GUINESS
KING EVOLUTION,PENELOPE,GUINESS


List the titles of the rented films and the customers who rented them, with first and last name. Add the amount paid for the rental and the name of the staff member who served it

In [85]:
%%sql
SELECT c.first_name, c.last_name, f.title,r.rental_date,r.return_date, p.amount,s.first_name as "Staff First Name",s.last_name as "Staff Last Name"
FROM
film f
join inventory i on f.film_id=i.film_id
join rental r on r.inventory_id=i.inventory_id
join customer c on c.customer_id=r.customer_id
join payment p on p.rental_id=r.rental_id
join staff s on s.staff_id=p.staff_id
limit 10

 * sqlite:///datasets/sakila.db
Done.


first_name,last_name,title,rental_date,return_date,amount,Staff First Name,Staff Last Name
MARY,SMITH,PATIENT SISTER,2005-05-25 11:30:37.000,2005-06-03 12:00:37.000,2.99,Mike,Hillyer
MARY,SMITH,TALENTED HOMICIDE,2005-05-28 10:35:23.000,2005-06-03 06:32:23.000,0.99,Mike,Hillyer
MARY,SMITH,MUSKETEERS WAIT,2005-06-15 00:54:12.000,2005-06-23 02:42:12.000,5.99,Mike,Hillyer
MARY,SMITH,DETECTIVE VISION,2005-06-15 18:02:53.000,2005-06-19 15:54:53.000,0.99,Jon,Stephens
MARY,SMITH,FERRIS MOTHER,2005-06-15 21:08:46.000,2005-06-25 02:26:46.000,9.99,Jon,Stephens
MARY,SMITH,CLOSER BANG,2005-06-16 15:18:57.000,2005-06-17 21:05:57.000,4.99,Mike,Hillyer
MARY,SMITH,ATTACKS HATE,2005-06-18 08:41:48.000,2005-06-22 03:36:48.000,4.99,Mike,Hillyer
MARY,SMITH,SAVANNAH TOWN,2005-06-18 13:33:59.000,2005-06-19 17:40:59.000,0.99,Jon,Stephens
MARY,SMITH,YOUTH KICK,2005-06-21 06:24:45.000,2005-06-28 03:28:45.000,3.99,Mike,Hillyer
MARY,SMITH,FIRE WOLVES,2005-07-08 03:17:05.000,2005-07-14 01:19:05.000,5.99,Jon,Stephens


#### Left/Right Outer JOIN
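- Outer joins also keep the rows without a match: LEFT keeps the unmatched rows of the left table, RIGHT those of the right table, FULL those of both, filling the missing attributes with NULL (SQLite supports RIGHT and FULL OUTER JOIN only from version 3.39.0)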
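The difference from an inner join is easiest to see with a language that has no films. A minimal sketch through Python's sqlite3 module, assuming toy language and film tables where no film is in Italian: the inner join drops Italian, the left join keeps it with a NULL title.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE language (language_id INTEGER, name TEXT)")
conn.execute("CREATE TABLE film (film_id INTEGER, title TEXT, language_id INTEGER)")
conn.executemany("INSERT INTO language VALUES (?, ?)", [(1, "English"), (2, "Italian")])
conn.executemany("INSERT INTO film VALUES (?, ?, ?)",
                 [(1, "ACADEMY DINOSAUR", 1), (2, "ACE GOLDFINGER", 1)])

inner = conn.execute(
    "SELECT l.name, f.title FROM language l JOIN film f ON f.language_id = l.language_id"
).fetchall()
left = conn.execute(
    "SELECT l.name, f.title FROM language l LEFT JOIN film f ON f.language_id = l.language_id "
    "ORDER BY l.language_id, f.film_id"
).fetchall()
print(len(inner))  # 2
print(left)  # [('English', 'ACADEMY DINOSAUR'), ('English', 'ACE GOLDFINGER'), ('Italian', None)]
```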

*Exercise*: Get the distinct language names and release years, including languages without films

 Hint: start from language l and use left join

In [None]:
%%sql

### WHERE
- WHERE expressions are evaluated for each row produced by the FROM clause
- if a row satisfies the WHERE condition, it's added to the result set
- in an inner or cross join, constraints can be expressed either in "on" or in "where" (IMHO, "on" is more readable for join conditions)
- in a left or right outer join they are not equivalent: a condition in "on" only decides which rows match, and unmatched rows are still kept with NULLs, while a condition in "where" filters the rows after the join

The generic syntax is
```sql
WHERE
(x = y or z < w) and ...
```
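The last point is a common source of bugs. A minimal sketch through Python's sqlite3 module, assuming toy customer and payment tables: the same condition (amount > 5) keeps every customer when placed in ON, but drops the customer without such payments when placed in WHERE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (customer_id INTEGER, first_name TEXT)")
conn.execute("CREATE TABLE payment (customer_id INTEGER, amount REAL)")
conn.executemany("INSERT INTO customer VALUES (?, ?)", [(1, "MARY"), (2, "PATRICIA")])
conn.executemany("INSERT INTO payment VALUES (?, ?)", [(1, 9.99), (2, 0.99)])

# Condition in ON: decides which payments match; every customer is kept
in_on = conn.execute("""
    SELECT c.first_name, p.amount FROM customer c
    LEFT JOIN payment p ON p.customer_id = c.customer_id AND p.amount > 5
    ORDER BY c.customer_id""").fetchall()

# Same condition in WHERE: filters the joined rows, so PATRICIA disappears
in_where = conn.execute("""
    SELECT c.first_name, p.amount FROM customer c
    LEFT JOIN payment p ON p.customer_id = c.customer_id
    WHERE p.amount > 5
    ORDER BY c.customer_id""").fetchall()
print(in_on)     # [('MARY', 9.99), ('PATRICIA', None)]
print(in_where)  # [('MARY', 9.99)]
```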

*Exercise*: Select the title, description, rating and length columns from the film table, for the movies that last 3 hours (180 minutes) or longer.

In [None]:
%%sql

 ### LIKE
 https://www.sqlite.org/lang_expr.html#the_like_glob_regexp_match_and_extract_operators

```sql
WHERE
x LIKE pattern
```
 - it's a pattern-matching comparison between a value and a pattern
 - the pattern can contain literal characters, % to match any sequence of zero or more characters, and _ to match exactly one character
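A minimal sketch of both wildcards through Python's sqlite3 module, assuming a toy actor table. Note that, by default, SQLite's LIKE ignores case for ASCII letters, so 'a%' also matches names starting with 'A'.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE actor (first_name TEXT, last_name TEXT)")
conn.executemany("INSERT INTO actor VALUES (?, ?)",
                 [("ADAM", "HOPPER"), ("NICK", "WAHLBERG"), ("ED", "TRACY"), ("AL", "GARLAND")])

starts_with_a = [r[0] for r in conn.execute(
    "SELECT first_name FROM actor WHERE first_name LIKE 'A%' ORDER BY first_name")]
two_chars = [r[0] for r in conn.execute(
    "SELECT first_name FROM actor WHERE first_name LIKE '__' ORDER BY first_name")]
# Case-insensitive for ASCII letters by default
case_insensitive = conn.execute("SELECT 'ADAM' LIKE 'a%'").fetchone()[0]
print(starts_with_a, two_chars, case_insensitive)  # ['ADAM', 'AL'] ['AL', 'ED'] 1
```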

*Exercise*: Select the actors whose first name starts with A

In [None]:
%%sql

*Exercise*: Select the actors whose last name has 3 characters

In [None]:
%%sql

 ### BETWEEN
The BETWEEN operator is logically equivalent to a pair of comparisons. "x BETWEEN y AND z" is equivalent to "x>=y AND x<=z" except that with BETWEEN, the x expression is only evaluated once.
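BETWEEN is inclusive on both ends, but Sakila stores datetimes as text, so the comparison is between strings: '2005-09-30 18:30:00.000' is greater than '2005-09-30', and the last day is partly lost. A minimal sketch through Python's sqlite3 module, assuming a toy rental table, where comparing only the date part fixes it:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rental (rental_id INTEGER, return_date TEXT)")
conn.executemany("INSERT INTO rental VALUES (?, ?)", [
    (1, "2005-08-31 23:10:00.000"),
    (2, "2005-09-01 10:00:00.000"),
    (3, "2005-09-30 18:30:00.000"),
])

# Text comparison: rental 3 falls after the upper bound '2005-09-30'
naive = [r[0] for r in conn.execute(
    "SELECT rental_id FROM rental "
    "WHERE return_date BETWEEN '2005-09-01' AND '2005-09-30' ORDER BY rental_id")]
# date() keeps only 'yyyy-mm-dd', so the whole last day is included
fixed = [r[0] for r in conn.execute(
    "SELECT rental_id FROM rental "
    "WHERE date(return_date) BETWEEN '2005-09-01' AND '2005-09-30' ORDER BY rental_id")]
print(naive, fixed)  # [2] [2, 3]
```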


*Exercise*: Select the rentals returned in September 2005

In [None]:
%%sql

 ### IN / NOT IN

 - The IN and NOT IN operators take an expression on the left and a list of values or a subquery on the right
 - When the right operand is a subquery, it must return as many columns as the left operand (usually one)

 ```sql
WHERE
x in (a,b,c)
...
x in (select x from table)
```
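A minimal sketch of both forms through Python's sqlite3 module, assuming a toy category table. It also shows a classic trap: when the list (or the subquery result) contains a NULL, NOT IN is never true, because x NOT IN (1, NULL) evaluates to NULL rather than true.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE category (category_id INTEGER, name TEXT)")
conn.executemany("INSERT INTO category VALUES (?, ?)",
                 [(1, "Action"), (2, "Animation"), (3, "Comedy")])

in_list = [r[0] for r in conn.execute(
    "SELECT name FROM category WHERE name IN ('Action', 'Animation') ORDER BY name")]
in_subquery = [r[0] for r in conn.execute(
    "SELECT name FROM category WHERE category_id IN "
    "(SELECT category_id FROM category WHERE name LIKE 'A%') ORDER BY name")]
# A NULL in the list makes NOT IN return no rows at all
not_in_null = conn.execute(
    "SELECT name FROM category WHERE category_id NOT IN (1, NULL)").fetchall()
print(in_list, in_subquery, not_in_null)  # ['Action', 'Animation'] ['Action', 'Animation'] []
```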

*Exercise*: Select the movies in the Action or Animation category

Hint: Use joins to navigate the many-to-many relationship between FILM and CATEGORY

In [None]:
%%sql

*Exercise*: Get the customers with the same first name as an actor

In [None]:
%%sql

*Exercise*: Get the customers with the same first name and last name as an actor

In [None]:
%%sql

### GROUP BY
- Partitions the result set into groups of rows having the same values of the grouping attributes
- Grouping is typically coupled with aggregate functions, computed once per group
- The attributes in GROUP BY are usually also present in the SELECT list

The generic syntax is
```sql
SELECT x, y, f(z)
FROM t
GROUP BY x, y
```
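A minimal sketch through Python's sqlite3 module, assuming a toy payment table: each customer becomes one group, and the aggregates are computed once per group.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payment (customer_id INTEGER, amount REAL)")
conn.executemany("INSERT INTO payment VALUES (?, ?)",
                 [(1, 2.99), (1, 0.99), (2, 5.99), (2, 4.99), (2, 1.99)])

totals = conn.execute("""
    SELECT customer_id, count(*) AS n_payments, round(sum(amount), 2) AS total
    FROM payment
    GROUP BY customer_id
    ORDER BY customer_id""").fetchall()
print(totals)  # [(1, 2, 3.98), (2, 3, 12.97)]
```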

*Exercise*: Sum the payment amounts of the rentals made by each customer

In [None]:
%%sql

*Exercise*:

### HAVING
- HAVING filters the groups, typically on the value of aggregate functions
- It's similar to WHERE, but it's evaluated at a later stage: WHERE filters rows before grouping, HAVING filters groups after they are created

```sql
GROUP BY
x, y
HAVING f(x)>z
```
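A minimal sketch of the two filtering stages through Python's sqlite3 module, assuming a toy payment table: WHERE first drops the single rows with amount <= 1, then HAVING keeps only the customers with at least 2 remaining payments.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payment (customer_id INTEGER, amount REAL)")
conn.executemany("INSERT INTO payment VALUES (?, ?)",
                 [(1, 2.99), (1, 0.99), (2, 5.99), (2, 4.99), (2, 1.99)])

frequent = conn.execute("""
    SELECT customer_id, count(*) AS n_payments
    FROM payment
    WHERE amount > 1        -- row filter, applied before grouping
    GROUP BY customer_id
    HAVING count(*) >= 2    -- group filter, applied after grouping
    ORDER BY customer_id""").fetchall()
print(frequent)  # [(2, 3)]
```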

*Exercise*: Sum the payment amounts of the customers that have made at least 30 rentals

In [None]:
%%sql

### ORDER BY
- Sorts the result set according to one or more criteria
- Multiple criteria are separated by commas; each one breaks the ties left by the previous ones
- Criteria can include calculated attributes, such as aggregations

The generic syntax is
```sql
ORDER BY
x (asc/desc)
```
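A minimal sketch through Python's sqlite3 module, using a few film lengths from the outputs above in a toy table: the primary criterion sorts by length in descending order, the secondary one (title, ascending) would break ties, and LIMIT keeps the top rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE film (title TEXT, length INTEGER)")
conn.executemany("INSERT INTO film VALUES (?, ?)", [
    ("ACADEMY DINOSAUR", 86), ("ACE GOLDFINGER", 48), ("AFFAIR PREJUDICE", 117),
    ("AFRICAN EGG", 130), ("AGENT TRUMAN", 169), ("ALABAMA DEVIL", 114),
])
longest = conn.execute(
    "SELECT title, length FROM film ORDER BY length DESC, title ASC LIMIT 3").fetchall()
print(longest)  # [('AGENT TRUMAN', 169), ('AFRICAN EGG', 130), ('AFFAIR PREJUDICE', 117)]
```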

*Exercise*: Get the longest movies

In [None]:
%%sql

### UNION
The generic syntax is
```sql
select a from x
union
select b from y
```
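UNION removes duplicate rows, while UNION ALL keeps them. A minimal sketch through Python's sqlite3 module, assuming toy actor and customer tables that share one (hypothetical) name:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE actor (first_name TEXT, last_name TEXT)")
conn.execute("CREATE TABLE customer (first_name TEXT, last_name TEXT)")
conn.executemany("INSERT INTO actor VALUES (?, ?)", [("PENELOPE", "GUINESS"), ("ED", "CHASE")])
conn.executemany("INSERT INTO customer VALUES (?, ?)", [("MARY", "SMITH"), ("ED", "CHASE")])

union = conn.execute("""
    SELECT first_name, last_name FROM actor
    UNION
    SELECT first_name, last_name FROM customer
    ORDER BY last_name, first_name""").fetchall()
union_all = conn.execute("""
    SELECT first_name, last_name FROM actor
    UNION ALL
    SELECT first_name, last_name FROM customer""").fetchall()
print(union)           # [('ED', 'CHASE'), ('PENELOPE', 'GUINESS'), ('MARY', 'SMITH')]
print(len(union_all))  # 4
```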

*Exercise*: Get all the (first_name, last_name) pairs of actors and customers

In [None]:
%%sql

### EXCEPT
The generic syntax is
```sql
select a from x
EXCEPT
select b from y
```
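EXCEPT returns the rows of the first query that do not appear in the second one (without duplicates). A minimal sketch through Python's sqlite3 module, with the same kind of toy actor and customer tables sharing one hypothetical name:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE actor (first_name TEXT, last_name TEXT)")
conn.execute("CREATE TABLE customer (first_name TEXT, last_name TEXT)")
conn.executemany("INSERT INTO actor VALUES (?, ?)", [("PENELOPE", "GUINESS"), ("ED", "CHASE")])
conn.executemany("INSERT INTO customer VALUES (?, ?)", [("MARY", "SMITH"), ("ED", "CHASE")])

except_rows = conn.execute("""
    SELECT first_name, last_name FROM actor
    EXCEPT
    SELECT first_name, last_name FROM customer""").fetchall()
print(except_rows)  # [('PENELOPE', 'GUINESS')]
```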

*Exercise*: Get the (first_name, last_name) pairs of actors that do not appear among the customers

In [None]:
%%sql

### INTERSECT
The generic syntax is
```sql
select a from x
INTERSECT
select b from y
```
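INTERSECT returns only the rows produced by both queries. A minimal sketch through Python's sqlite3 module, again with toy actor and customer tables sharing one hypothetical name:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE actor (first_name TEXT, last_name TEXT)")
conn.execute("CREATE TABLE customer (first_name TEXT, last_name TEXT)")
conn.executemany("INSERT INTO actor VALUES (?, ?)", [("PENELOPE", "GUINESS"), ("ED", "CHASE")])
conn.executemany("INSERT INTO customer VALUES (?, ?)", [("MARY", "SMITH"), ("ED", "CHASE")])

common = conn.execute("""
    SELECT first_name, last_name FROM actor
    INTERSECT
    SELECT first_name, last_name FROM customer""").fetchall()
print(common)  # [('ED', 'CHASE')]
```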

*Exercise*: Get the (first_name, last_name) pairs shared by actors and customers

In [None]:
%%sql