# SQL II

Exploring advanced SQL syntax.

### Loading the Data
In this lecture, we'll continue our work with the `Dish` table. In the cells below, we connect to the database and query the table.

In [168]:
%load_ext sql

The sql extension is already loaded. To reload it, use:
  %reload_ext sql


In [169]:
%%sql
sqlite:///basic_examples.db

**Question**: Query the entire **Dish** table.

In [170]:
%%sql
select * from Dish;

 * sqlite:///basic_examples.db
   sqlite:///basic_examples2.db
   sqlite:///basicexamples2.db
   sqlite:///imdbmini.db
   sqlite:///imdbmini2.db
Done.


name,type,cost
ravioli,entree,10
ramen,entree,13
taco,entree,7
edamame,appetizer,4
fries,appetizer,4
potsticker,appetizer,4
ice cream,dessert,5


### Filtering Groups Using `HAVING`

**Question**: Query the number of dishes of each type, keeping only the types whose maximum cost is less than 8.

In [171]:
%%sql
select type,count(*) as Occurrence from dish
group by type
having max(cost)<8

 * sqlite:///basic_examples.db
   sqlite:///basic_examples2.db
   sqlite:///basicexamples2.db
   sqlite:///imdbmini.db
   sqlite:///imdbmini2.db
Done.


type,Occurrence
appetizer,3
dessert,1
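
To make the difference between `HAVING` and `WHERE` concrete, here is a minimal sketch that rebuilds the `Dish` table in memory with Python's standard `sqlite3` module (an assumption; the notebook itself uses the `%%sql` magic). `HAVING` filters whole groups after aggregation, while `WHERE` filters individual rows before grouping.

```python
import sqlite3

# In-memory copy of the Dish table shown earlier in these notes.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Dish (name TEXT, type TEXT, cost INTEGER)")
conn.executemany(
    "INSERT INTO Dish VALUES (?, ?, ?)",
    [
        ("ravioli", "entree", 10),
        ("ramen", "entree", 13),
        ("taco", "entree", 7),
        ("edamame", "appetizer", 4),
        ("fries", "appetizer", 4),
        ("potsticker", "appetizer", 4),
        ("ice cream", "dessert", 5),
    ],
)

# HAVING: drop every group whose maximum cost is 8 or more.
having_rows = conn.execute(
    "SELECT type, COUNT(*) FROM Dish "
    "GROUP BY type HAVING MAX(cost) < 8 ORDER BY type"
).fetchall()

# WHERE: drop individual rows costing 8 or more, then group what is left.
where_rows = conn.execute(
    "SELECT type, COUNT(*) FROM Dish "
    "WHERE cost < 8 GROUP BY type ORDER BY type"
).fetchall()

print(having_rows)  # [('appetizer', 3), ('dessert', 1)]
print(where_rows)   # [('appetizer', 3), ('dessert', 1), ('entree', 1)]
```

The `entree` group disappears under `HAVING` because its most expensive dish costs 13, but it survives under `WHERE` with a count of 1 (the taco).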


### EDA in SQL

Our typical workflow when working with "big data" is:
* Use SQL to query data from a database
* Use Python (with `pandas`) to analyze this data in detail

We can, however, still perform simple data cleaning and re-structuring using SQL directly. To do so, we'll consider the `Title` table from the IMDB dataset. We first query the table as is, then use random ordering to get a "snapshot" of representative rows sampled from throughout the table.

In [172]:
%%sql
sqlite:///imdbmini2.db

In [173]:
%%sql

select * from title

   sqlite:///basic_examples.db
   sqlite:///basic_examples2.db
   sqlite:///basicexamples2.db
   sqlite:///imdbmini.db
 * sqlite:///imdbmini2.db
Done.


tconst,titleType,primaryTitle,originalTitle,isAdult,startYear,endYear,runtimeMinutes,genres
417,short,A Trip to the Moon,Le voyage dans la lune,0,1902,,13.0,"Action,Adventure,Comedy"
4972,movie,The Birth of a Nation,The Birth of a Nation,0,1915,,195.0,"Drama,History,War"
10323,movie,The Cabinet of Dr. Caligari,Das Cabinet des Dr. Caligari,0,1920,,76.0,"Fantasy,Horror,Mystery"
12349,movie,The Kid,The Kid,0,1921,,68.0,"Comedy,Drama,Family"
13442,movie,Nosferatu,"Nosferatu, eine Symphonie des Grauens",0,1922,,94.0,"Fantasy,Horror"
15324,movie,Sherlock Jr.,Sherlock Jr.,0,1924,,45.0,"Action,Comedy,Romance"
15648,movie,Battleship Potemkin,Bronenosets Potemkin,0,1925,,75.0,"Drama,History,Thriller"
15864,movie,The Gold Rush,The Gold Rush,0,1925,,95.0,"Adventure,Comedy,Drama"
17136,movie,Metropolis,Metropolis,0,1927,,153.0,"Drama,Sci-Fi"
17925,movie,The General,The General,0,1926,,67.0,"Action,Adventure,Comedy"


In [175]:
%%sql
SELECT *
FROM Title
ORDER BY RANDOM()
LIMIT 10;

   sqlite:///basic_examples.db
   sqlite:///basic_examples2.db
   sqlite:///basicexamples2.db
   sqlite:///imdbmini.db
 * sqlite:///imdbmini2.db
Done.


tconst,titleType,primaryTitle,originalTitle,isAdult,startYear,endYear,runtimeMinutes,genres
115685,movie,The Birdcage,The Birdcage,0,1996,,117,Comedy
76363,movie,The Many Adventures of Winnie the Pooh,The Many Adventures of Winnie the Pooh,0,1977,,74,"Adventure,Animation,Comedy"
299172,movie,Home on the Range,Home on the Range,0,2004,,76,"Adventure,Animation,Comedy"
4320258,tvSeries,Dirilis: Ertugrul,Dirilis: Ertugrul,0,2014,2019.0,120,"Action,Adventure,Drama"
1334102,movie,The Resident,The Resident,0,2011,,91,"Drama,Horror,Mystery"
1288558,movie,Evil Dead,Evil Dead,0,2013,,91,Horror
77394,movie,Damien: Omen II,Damien: Omen II,0,1978,,107,Horror
99810,movie,The Hunt for Red October,The Hunt for Red October,0,1990,,135,"Action,Adventure,Thriller"
2431438,tvSeries,Sense8,Sense8,0,2015,2018.0,60,"Drama,Mystery,Sci-Fi"
5929776,movie,Before the Flood,Before the Flood,0,2016,,96,"Documentary,News"
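
The random-snapshot idea can be sketched outside the notebook as well. The example below assumes a stand-in `Title` table with 100 placeholder `tconst` values (not the real IMDB data) and uses Python's standard `sqlite3` module. `ORDER BY RANDOM()` shuffles the rows and `LIMIT 10` keeps the first ten of the shuffled result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Title (tconst INTEGER)")
# Placeholder ids 0..99; these are not real IMDB identifiers.
conn.executemany("INSERT INTO Title VALUES (?)", [(i,) for i in range(100)])

# Shuffle all rows, then keep ten of them.
sample = [
    row[0]
    for row in conn.execute("SELECT tconst FROM Title ORDER BY RANDOM() LIMIT 10")
]
print(sample)  # ten distinct ids, different on each run
```

Because every row is assigned a random sort key, this approach scans the whole table, which is fine for a table of this size.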


#### Matching Text Using `LIKE`

**Question**: Query the title type and primary title of every title whose primary title includes the phrase "Star Wars".

In [176]:
%%sql

select titleType,primaryTitle from title
where primaryTitle like '%Star Wars%'

   sqlite:///basic_examples.db
   sqlite:///basic_examples2.db
   sqlite:///basicexamples2.db
   sqlite:///imdbmini.db
 * sqlite:///imdbmini2.db
Done.


titleType,primaryTitle
movie,Star Wars: Episode IV - A New Hope
movie,Star Wars: Episode V - The Empire Strikes Back
movie,Star Wars: Episode VI - Return of the Jedi
movie,Star Wars: Episode I - The Phantom Menace
movie,Star Wars: Episode II - Attack of the Clones
movie,Star Wars: Episode III - Revenge of the Sith
tvSeries,Star Wars: Clone Wars
tvSeries,Star Wars: The Clone Wars
movie,Star Wars: The Clone Wars
movie,Star Wars: Episode VII - The Force Awakens


In a `LIKE` pattern, `%` matches any sequence of zero or more characters, while `_` matches exactly one character.

In [177]:
%%sql
SELECT titleType, primaryTitle
FROM Title
WHERE primaryTitle LIKE 'Harry Potter and the Deathly Hallows: Part _'

   sqlite:///basic_examples.db
   sqlite:///basic_examples2.db
   sqlite:///basicexamples2.db
   sqlite:///imdbmini.db
 * sqlite:///imdbmini2.db
Done.


titleType,primaryTitle
movie,Harry Potter and the Deathly Hallows: Part 1
movie,Harry Potter and the Deathly Hallows: Part 2
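
The two wildcards behave differently, which the following sketch tries to show. It assumes a tiny in-memory `Title` table built with Python's `sqlite3` module; the "Part 10" row is hypothetical and exists only to show that `_` refuses to match two characters.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Title (primaryTitle TEXT)")
conn.executemany(
    "INSERT INTO Title VALUES (?)",
    [
        ("Star Wars: Clone Wars",),
        ("Harry Potter and the Deathly Hallows: Part 1",),
        ("Harry Potter and the Deathly Hallows: Part 2",),
        ("Harry Potter and the Deathly Hallows: Part 10",),  # hypothetical row
    ],
)

def titles_like(pattern):
    """Return the primary titles matching a LIKE pattern, sorted by title."""
    rows = conn.execute(
        "SELECT primaryTitle FROM Title "
        "WHERE primaryTitle LIKE ? ORDER BY primaryTitle",
        (pattern,),
    )
    return [row[0] for row in rows]

# `_` matches exactly one character: "Part 10" is excluded.
one_char = titles_like("Harry Potter and the Deathly Hallows: Part _")

# `%` matches zero or more characters: "Part 10" is included.
any_chars = titles_like("Harry Potter and the Deathly Hallows: Part %")

print(len(one_char), len(any_chars))  # 2 3
```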


#### Converting Data Types Using `CAST`

**Question**: Query the primary title and runtime (cast as integer) of any 10 movies.

In [178]:
%%sql
select primaryTitle, cast(runtimeMinutes as int) as runtimeMinutes_as_INT
from title
limit 10

   sqlite:///basic_examples.db
   sqlite:///basic_examples2.db
   sqlite:///basicexamples2.db
   sqlite:///imdbmini.db
 * sqlite:///imdbmini2.db
Done.


primaryTitle,runtimeMinutes_as_INT
A Trip to the Moon,13.0
The Birth of a Nation,195.0
The Cabinet of Dr. Caligari,76.0
The Kid,68.0
Nosferatu,94.0
Sherlock Jr.,45.0
Battleship Potemkin,75.0
The Gold Rush,95.0
Metropolis,153.0
The General,67.0
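
As a small check of what `CAST(... AS INT)` does in SQLite, the sketch below casts a few literal values using Python's `sqlite3` module. The literals are illustrative and not taken from the IMDB table. In SQLite, casting a real number to an integer truncates toward zero, and casting text uses the longest leading prefix that reads as an integer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

row = conn.execute(
    "SELECT CAST(13.0 AS INT), "        # real -> integer
    "       CAST('195.0' AS INT), "     # text -> integer (leading prefix '195')
    "       CAST(76.9 AS INT), "        # truncation toward zero, not rounding
    "       typeof(CAST(13.0 AS INT))"  # storage class of the result
).fetchone()

print(row)  # (13, 195, 76, 'integer')
```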


### Applying Conditions With `CASE`

In a `CASE` expression, the `WHEN` branches are checked in order and the first one that matches determines the label. The query below has no random ordering, so the first rows displayed all happen to be old movies; adding `ORDER BY RANDOM()` would show a mix of movie ages.

**Question**: Classify each title as 'old' if it was released before 1950, 'mid-aged' if it was released before 2000, and 'new' otherwise; label this column "movie_age". Select "titleType", "startYear" and "movie_age" in your query.

In [179]:
%%sql
select titleType, startYear,
case
    when startYear < 1950 then 'old'
    when startYear < 2000 then 'mid-aged'
    else 'new'
    end as movie_age

from title


   sqlite:///basic_examples.db
   sqlite:///basic_examples2.db
   sqlite:///basicexamples2.db
   sqlite:///imdbmini.db
 * sqlite:///imdbmini2.db
Done.


titleType,startYear,movie_age
short,1902,old
movie,1915,old
movie,1920,old
movie,1921,old
movie,1922,old
movie,1924,old
movie,1925,old
movie,1925,old
movie,1927,old
movie,1926,old
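
To see all three labels at once, here is a minimal sketch using Python's `sqlite3` module and three titles whose years appear earlier in these notes (1902, 1977, 2014). The table is a reduced stand-in for `Title`, and the label spelling 'mid-aged' follows the question text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Title (primaryTitle TEXT, startYear INTEGER)")
conn.executemany(
    "INSERT INTO Title VALUES (?, ?)",
    [
        ("A Trip to the Moon", 1902),
        ("The Many Adventures of Winnie the Pooh", 1977),
        ("Dirilis: Ertugrul", 2014),
    ],
)

# Branches are tested top to bottom; the first true condition wins,
# so 1902 is labeled 'old' even though it is also below 2000.
labels = conn.execute(
    """
    SELECT startYear,
           CASE
               WHEN startYear < 1950 THEN 'old'
               WHEN startYear < 2000 THEN 'mid-aged'
               ELSE 'new'
           END AS movie_age
    FROM Title
    ORDER BY startYear
    """
).fetchall()

print(labels)  # [(1902, 'old'), (1977, 'mid-aged'), (2014, 'new')]
```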


### Joining Tables

We combine data from multiple tables by performing a **join**. We will explore joins using the cats database, which includes two tables: `s` and `t`.

In [195]:
%%sql
sqlite:///basic_examples2.db

In [196]:
%%sql
SELECT * FROM s;

   sqlite:///basic_examples.db
 * sqlite:///basic_examples2.db
   sqlite:///basicexamples2.db
   sqlite:///imdbmini.db
   sqlite:///imdbmini2.db
Done.


id,name
0,Apricot
1,Boots
2,Cally
4,Eugene


In [197]:
%%sql
SELECT * FROM t;

   sqlite:///basic_examples.db
 * sqlite:///basic_examples2.db
   sqlite:///basicexamples2.db
   sqlite:///imdbmini.db
   sqlite:///imdbmini2.db
Done.


id,breed
1,persian
2,ragdoll
4,bengal
5,persian


#### Inner Join

**Question**: Perform an inner join on tables **s** and **t**.

In [198]:
%%sql
select * 
from t inner join s
on t.id = s.id

   sqlite:///basic_examples.db
 * sqlite:///basic_examples2.db
   sqlite:///basicexamples2.db
   sqlite:///imdbmini.db
   sqlite:///imdbmini2.db
Done.


id,breed,id_1,name
1,persian,1,Boots
2,ragdoll,2,Cally
4,bengal,4,Eugene


If we write `JOIN` without specifying a join type, SQL performs an inner join by default.

**Question**: Perform an inner join on tables **s** and **t** without specifying a join type.

In [184]:
%%sql
select *
from s  join t
on t.id=s.id

   sqlite:///basic_examples.db
 * sqlite:///basic_examples2.db
   sqlite:///basicexamples2.db
   sqlite:///imdbmini.db
   sqlite:///imdbmini2.db
Done.


id,name,id_1,breed
1,Boots,1,persian
2,Cally,2,ragdoll
4,Eugene,4,bengal


#### Cross Join

**Question**: Query every possible combination of rows across tables **s** and **t**.

In [185]:
%%sql
select * 
from t cross join s

   sqlite:///basic_examples.db
 * sqlite:///basic_examples2.db
   sqlite:///basicexamples2.db
   sqlite:///imdbmini.db
   sqlite:///imdbmini2.db
Done.


id,breed,id_1,name
1,persian,0,Apricot
1,persian,1,Boots
1,persian,2,Cally
1,persian,4,Eugene
2,ragdoll,0,Apricot
2,ragdoll,1,Boots
2,ragdoll,2,Cally
2,ragdoll,4,Eugene
4,bengal,0,Apricot
4,bengal,1,Boots


Conceptually, an inner join is equivalent to a cross join from which the rows that fail the join condition are removed.

**Question**: Perform an inner join using a cross join on tables **s** and **t**.

In [None]:
%%sql
select * 
from t cross join s
where t.id = s.id
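
The equivalence between an inner join and a filtered cross join can be checked directly. The sketch below assumes in-memory copies of `s` and `t` with the rows shown above, built with Python's `sqlite3` module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE s (id INTEGER, name TEXT);
    CREATE TABLE t (id INTEGER, breed TEXT);
    INSERT INTO s VALUES (0, 'Apricot'), (1, 'Boots'), (2, 'Cally'), (4, 'Eugene');
    INSERT INTO t VALUES (1, 'persian'), (2, 'ragdoll'), (4, 'bengal'), (5, 'persian');
    """
)

# Every pairing of a row from t with a row from s: 4 x 4 = 16 rows.
cross = conn.execute("SELECT * FROM t CROSS JOIN s").fetchall()

# Inner join written explicitly.
inner = conn.execute(
    "SELECT * FROM t INNER JOIN s ON t.id = s.id ORDER BY t.id"
).fetchall()

# The same result, obtained by filtering the cross join.
filtered = conn.execute(
    "SELECT * FROM t CROSS JOIN s WHERE t.id = s.id ORDER BY t.id"
).fetchall()

print(len(cross))         # 16
print(inner == filtered)  # True
```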

#### Left Outer Join

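A left outer join keeps every row of the left table (here `t`); where the right table has no matching row, its columns are filled with null values.

**Question**: Perform a left outer join on tables **s** and **t**.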

In [199]:
%%sql

select *
from t left join  s
on t.id=s.id


   sqlite:///basic_examples.db
 * sqlite:///basic_examples2.db
   sqlite:///basicexamples2.db
   sqlite:///imdbmini.db
   sqlite:///imdbmini2.db
Done.


id,breed,id_1,name
1,persian,1.0,Boots
2,ragdoll,2.0,Cally
4,bengal,4.0,Eugene
5,persian,,


#### Right Outer Join

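**Question**: Perform a right outer join on tables **s** and **t**.

`RIGHT JOIN` and `RIGHT OUTER JOIN` are synonyms. The SQLite version used in this notebook does not support either, so the cell below raises an error; SQLite added support for `RIGHT` and `FULL OUTER JOIN` in version 3.39.0.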

In [200]:
%%sql

select * 
from s RIGHT OUTER JOIN t
ON s.id=t.id


SELECT *
FROM s RIGHT JOIN t
ON s.id = t.id

   sqlite:///basic_examples.db
 * sqlite:///basic_examples2.db
   sqlite:///basicexamples2.db
   sqlite:///imdbmini.db
   sqlite:///imdbmini2.db
(sqlite3.OperationalError) RIGHT and FULL OUTER JOINs are not currently supported
[SQL: select * 
from s RIGHT OUTER JOIN t
ON s.id=t.id


SELECT *
FROM s RIGHT JOIN t
ON s.id = t.id]
(Background on this error at: https://sqlalche.me/e/20/e3q8)
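
The error above reports that this SQLite build does not support `RIGHT JOIN`. A common workaround, sketched here with Python's `sqlite3` module and in-memory copies of `s` and `t`, is to swap the table order and use a `LEFT JOIN`: `s RIGHT JOIN t` keeps every row of `t`, which is what `t LEFT JOIN s` does. The explicit column list restores the column order a right join would produce.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE s (id INTEGER, name TEXT);
    CREATE TABLE t (id INTEGER, breed TEXT);
    INSERT INTO s VALUES (0, 'Apricot'), (1, 'Boots'), (2, 'Cally'), (4, 'Eugene');
    INSERT INTO t VALUES (1, 'persian'), (2, 'ragdoll'), (4, 'bengal'), (5, 'persian');
    """
)

# Emulates: SELECT * FROM s RIGHT JOIN t ON s.id = t.id
right_join = conn.execute(
    """
    SELECT s.id, s.name, t.id, t.breed
    FROM t LEFT JOIN s ON s.id = t.id
    ORDER BY t.id
    """
).fetchall()

for row in right_join:
    print(row)
# (1, 'Boots', 1, 'persian')
# (2, 'Cally', 2, 'ragdoll')
# (4, 'Eugene', 4, 'bengal')
# (None, None, 5, 'persian')
```

SQL nulls come back as `None` in Python, so the unmatched row with `t.id = 5` shows `None` for both `s` columns.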


#### Full Outer Join

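**Question**: Perform a full outer join on tables **s** and **t**.

As with the right outer join, the SQLite version used in this notebook does not support `FULL OUTER JOIN`, so the cell below raises an error.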

In [188]:
%%sql

select * 
from t as  T full outer join s as S
on T.id=S.id

   sqlite:///basic_examples.db
 * sqlite:///basic_examples2.db
   sqlite:///basicexamples2.db
   sqlite:///imdbmini.db
   sqlite:///imdbmini2.db
(sqlite3.OperationalError) RIGHT and FULL OUTER JOINs are not currently supported
[SQL: select * 
from t as  T full outer join s as S
on T.id=S.id]
(Background on this error at: https://sqlalche.me/e/20/e3q8)
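
Where `FULL OUTER JOIN` is unavailable, one possible emulation is the `UNION` of two left joins, one in each direction. The sketch below uses Python's `sqlite3` module with in-memory copies of `s` and `t`; it is an illustration under those assumptions, not the only way to do this.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE s (id INTEGER, name TEXT);
    CREATE TABLE t (id INTEGER, breed TEXT);
    INSERT INTO s VALUES (0, 'Apricot'), (1, 'Boots'), (2, 'Cally'), (4, 'Eugene');
    INSERT INTO t VALUES (1, 'persian'), (2, 'ragdoll'), (4, 'bengal'), (5, 'persian');
    """
)

# First branch keeps every row of s, second branch keeps every row of t.
# UNION removes the matched rows that both branches produce.
full_join = conn.execute(
    """
    SELECT s.id, s.name, t.id, t.breed FROM s LEFT JOIN t ON s.id = t.id
    UNION
    SELECT s.id, s.name, t.id, t.breed FROM t LEFT JOIN s ON s.id = t.id
    """
).fetchall()

print(len(full_join))  # 5
```

The result holds the three matched cats plus Apricot (no breed) and the unmatched persian with `id` 5. One caveat: `UNION` also collapses rows that were genuinely duplicated in the inputs, so tables with repeated rows need a different formulation.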


#### Aliasing in Joins

Let's return to the IMDB dataset. Now, we'll consider two tables: `Title` and `Rating`.

In [190]:
%%sql
sqlite:///imdbmini2.db

When working with tables that have long names, we often create an **alias** using the `AS` keyword (much like we did with columns in the previous lecture). This makes it easier to reference these tables when performing a join.

**Question**: Perform an inner join on tables **Title** (alias T) and **Rating** (alias R).

In [None]:
%%sql
select *
from Title as T inner join Rating as R
on T.tconst = R.tconst

Referencing columns using the full or aliased table name is important to avoid ambiguity. Suppose the tables we are trying to join both include a column with the same name, like the `tconst` columns present in both the `Title` and `Rating` tables of the IMDB database. If we do not specify which table's column we wish to reference, SQL will not be able to process our query.

For example, a bare `tconst` in the query would leave it unclear whether we mean the `tconst` column from the `Title` table or the one from the `Rating` table, and SQL would raise an error. In the cell below, we qualify each reference with its table alias (`T.tconst` and `R.tconst`), so the query runs.

In [191]:
%%sql
SELECT primaryTitle, averageRating
FROM Title AS T INNER JOIN Rating AS R
ON T.tconst = R.tconst;

   sqlite:///basic_examples.db
   sqlite:///basic_examples2.db
   sqlite:///basicexamples2.db
   sqlite:///imdbmini.db
 * sqlite:///imdbmini2.db
Done.


primaryTitle,averageRating
A Trip to the Moon,8.2
The Birth of a Nation,6.3
The Cabinet of Dr. Caligari,8.1
The Kid,8.3
Nosferatu,7.9
Sherlock Jr.,8.2
Battleship Potemkin,8.0
The Gold Rush,8.2
Metropolis,8.3
The General,8.1
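
The ambiguity error can be reproduced on a small scale. The sketch below assumes reduced `Title` and `Rating` tables holding two of the rows shown above (the pairing of ratings with `tconst` values is inferred from the output order) and uses Python's `sqlite3` module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Title (tconst INTEGER, primaryTitle TEXT);
    CREATE TABLE Rating (tconst INTEGER, averageRating REAL);
    INSERT INTO Title VALUES (417, 'A Trip to the Moon'), (4972, 'The Birth of a Nation');
    INSERT INTO Rating VALUES (417, 8.2), (4972, 6.3);
    """
)

# Unqualified tconst: SQLite cannot tell which table's column is meant.
try:
    conn.execute(
        "SELECT tconst FROM Title AS T INNER JOIN Rating AS R "
        "ON T.tconst = R.tconst"
    )
    error_message = None
except sqlite3.OperationalError as exc:
    error_message = str(exc)

# Qualified with the alias: the query runs.
rows = conn.execute(
    "SELECT T.tconst, primaryTitle, averageRating "
    "FROM Title AS T INNER JOIN Rating AS R ON T.tconst = R.tconst "
    "ORDER BY T.tconst"
).fetchall()

print(error_message)
print(rows)
```

Columns that exist in only one table, such as `primaryTitle` and `averageRating`, can stay unqualified, although qualifying them too often makes longer queries easier to read.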
