# SQL II

Exploring advanced SQL syntax.

### Loading the Data
In this lecture, we'll continue our work with the `Dish` table. In the cells below, we connect to the database and query the table.

In [None]:
%load_ext sql

In [None]:
%%sql
sqlite:///data/basic_examples.db

**Question**: Query the entire **Dish** table.

### Filtering Groups Using `HAVING`

**Question**: Query the total number of dishes of each type, keeping only the types whose maximum cost is less than 8.
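As a rough sketch of how `HAVING` differs from `WHERE`, the snippet below runs a grouped query from plain Python using the standard-library `sqlite3` module. The in-memory `Dish` table, its columns (`name`, `type`, `cost`), and its rows are assumptions made up for illustration; they are not the contents of `basic_examples.db`.

```python
import sqlite3

# Hypothetical stand-in for the Dish table; columns and rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Dish (name TEXT, type TEXT, cost INTEGER)")
conn.executemany(
    "INSERT INTO Dish VALUES (?, ?, ?)",
    [
        ("ravioli", "entree", 10),
        ("ramen", "entree", 7),
        ("taco", "entree", 7),
        ("edamame", "appetizer", 4),
        ("fries", "appetizer", 4),
        ("ice cream", "dessert", 5),
    ],
)

# WHERE filters individual rows before grouping;
# HAVING filters whole groups after the aggregates are computed.
having_rows = conn.execute(
    """
    SELECT type, COUNT(*) AS num_dishes
    FROM Dish
    GROUP BY type
    HAVING MAX(cost) < 8
    ORDER BY type
    """
).fetchall()
print(having_rows)
```

In this toy data the `entree` group is dropped as a whole because one of its dishes costs 10, even though its other dishes cost less than 8.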

### EDA in SQL

Our typical workflow when working with "big data" is:
* Use SQL to query data from a database
* Use Python (with `pandas`) to analyze this data in detail

We can, however, still perform simple data cleaning and re-structuring using SQL directly. To do so, we'll consider the `Title` table from the IMDB dataset. We use random ordering here to get a "snapshot" of representative rows sampled from throughout the table.

In [None]:
%%sql
sqlite:///data/imdbmini.db

In [None]:
%%sql
SELECT *
FROM Title
ORDER BY RANDOM()
LIMIT 10;

#### Matching Text Using `LIKE`

**Question**: Query the title types and primary title names with the primary title including the phrase "Star Wars".

The `_` wildcard means "look for exactly 1 character", while the `%` wildcard matches any sequence of zero or more characters.

In [None]:
%%sql
SELECT titleType, primaryTitle
FROM Title
WHERE primaryTitle LIKE 'Harry Potter and the Deathly Hallows: Part _';
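To compare the two `LIKE` wildcards side by side, here is a minimal sketch using Python's `sqlite3` module. The in-memory `Title` table and its five rows are hypothetical; only the column names `titleType` and `primaryTitle` come from the query above.

```python
import sqlite3

# Hypothetical miniature Title table; the rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Title (titleType TEXT, primaryTitle TEXT)")
conn.executemany(
    "INSERT INTO Title VALUES (?, ?)",
    [
        ("movie", "Star Wars: Episode IV - A New Hope"),
        ("movie", "Rogue One: A Star Wars Story"),
        ("movie", "Harry Potter and the Deathly Hallows: Part 1"),
        ("movie", "Harry Potter and the Deathly Hallows: Part 2"),
        ("movie", "Star Trek"),
    ],
)

# % matches any run of characters (including none),
# so the phrase may appear anywhere in the title.
percent_rows = conn.execute(
    "SELECT primaryTitle FROM Title "
    "WHERE primaryTitle LIKE '%Star Wars%' "
    "ORDER BY primaryTitle"
).fetchall()

# _ matches exactly one character, so 'Part _' matches 'Part 1' and 'Part 2'.
underscore_rows = conn.execute(
    "SELECT primaryTitle FROM Title "
    "WHERE primaryTitle LIKE 'Harry Potter and the Deathly Hallows: Part _' "
    "ORDER BY primaryTitle"
).fetchall()

print(percent_rows)
print(underscore_rows)
```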

#### Converting Data Types Using `CAST`

**Question**: Query the primary title and runtime (cast as integer) of any 10 movies.
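One reason to cast is that numbers stored as text compare character by character. The sketch below illustrates this on a hypothetical two-row table; the column name `runtimeMinutes` and the choice to store it as `TEXT` are assumptions for the example, not facts taken from `imdbmini.db`.

```python
import sqlite3

# Hypothetical table where the runtime is stored as text (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Title (primaryTitle TEXT, runtimeMinutes TEXT)")
conn.executemany(
    "INSERT INTO Title VALUES (?, ?)",
    [("Short Film", "9"), ("Feature Film", "120")],
)

# As text, '120' sorts before '9' because '1' comes before '9'.
text_order = conn.execute(
    "SELECT primaryTitle FROM Title ORDER BY runtimeMinutes"
).fetchall()

# After CAST, the values are integers and sort numerically.
cast_rows = conn.execute(
    "SELECT primaryTitle, CAST(runtimeMinutes AS INT) AS runtime "
    "FROM Title ORDER BY runtime"
).fetchall()

print(text_order)
print(cast_rows)
```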

#### Applying Conditions With `CASE`

Here, we return a random order so we can see the various movie ages (otherwise, the top few entries happen to all be old movies).

**Question**: Classify each movie title as 'old' if it was released before 1950, 'mid-aged' if it was released before 2000, and 'new' otherwise; label this column `movie_age`. Select `titleType`, `startYear`, and `movie_age` in your query.
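A `CASE` expression checks its `WHEN` branches from top to bottom and returns the first one that matches, falling through to `ELSE` otherwise. The sketch below assumes the labels 'old' (before 1950), 'mid-aged' (before 2000), and 'new' (everything else), and runs on a hypothetical three-row table with `startYear` stored as an integer.

```python
import sqlite3

# Hypothetical miniature Title table; rows and column types are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Title (titleType TEXT, startYear INTEGER)")
conn.executemany(
    "INSERT INTO Title VALUES (?, ?)",
    [("movie", 1939), ("movie", 1977), ("movie", 2014)],
)

# Branch order matters: 1939 is also < 2000, but the first WHEN wins.
case_rows = conn.execute(
    """
    SELECT titleType, startYear,
           CASE WHEN startYear < 1950 THEN 'old'
                WHEN startYear < 2000 THEN 'mid-aged'
                ELSE 'new'
           END AS movie_age
    FROM Title
    ORDER BY startYear
    """
).fetchall()
print(case_rows)
```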

### Joining Tables

We combine data from multiple tables by performing a **join**. We will explore joins using the cats database, which includes two tables: `s` and `t`.

In [None]:
%%sql
sqlite:///data/basic_examples.db

In [None]:
%%sql
SELECT * FROM s;

In [None]:
%%sql
SELECT * FROM t;

#### Inner Join

**Question**: Perform an inner join on tables **s** and **t**.

When no join type is specified, `JOIN` performs an inner join by default.

**Question**: Perform an inner join on tables **s** and **t** without specifying a join type.
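The following sketch checks that `INNER JOIN` and a bare `JOIN` return the same rows. The schemas `s(id, name)` and `t(id, breed)` and all of the rows are assumptions chosen for illustration; the real tables in `basic_examples.db` may differ.

```python
import sqlite3

# Hypothetical versions of tables s and t; schema and rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE s (id INTEGER, name TEXT);
    CREATE TABLE t (id INTEGER, breed TEXT);
    INSERT INTO s VALUES (0, 'Apricot'), (1, 'Boots'), (2, 'Cally'), (4, 'Eugene');
    INSERT INTO t VALUES (1, 'persian'), (2, 'ragdoll'), (4, 'bengal'), (5, 'persian');
    """
)

# Explicit inner join: keep only ids that appear in both tables.
inner_rows = conn.execute(
    "SELECT s.id, name, breed FROM s INNER JOIN t ON s.id = t.id ORDER BY s.id"
).fetchall()

# Bare JOIN: no join type given, so SQL treats it as an inner join.
plain_rows = conn.execute(
    "SELECT s.id, name, breed FROM s JOIN t ON s.id = t.id ORDER BY s.id"
).fetchall()

print(inner_rows)
print(inner_rows == plain_rows)
```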

#### Cross Join

**Question**: Query every possible combination of rows across tables **s** and **t**.

Conceptually, an inner join is equivalent to a cross join from which the rows that fail the join condition have been removed.

**Question**: Perform an inner join using a cross join on tables **s** and **t**.
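To make the relationship between the two joins concrete, the sketch below builds every row pairing with `CROSS JOIN`, then filters the pairings with `WHERE` and compares the result with an inner join. The schemas `s(id, name)` and `t(id, breed)` and their rows are hypothetical.

```python
import sqlite3

# Hypothetical versions of tables s and t; schema and rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE s (id INTEGER, name TEXT);
    CREATE TABLE t (id INTEGER, breed TEXT);
    INSERT INTO s VALUES (0, 'Apricot'), (1, 'Boots'), (2, 'Cally'), (4, 'Eugene');
    INSERT INTO t VALUES (1, 'persian'), (2, 'ragdoll'), (4, 'bengal'), (5, 'persian');
    """
)

# A cross join pairs every row of s with every row of t: 4 * 4 = 16 rows.
cross_rows = conn.execute("SELECT * FROM s CROSS JOIN t").fetchall()

# Keeping only the pairings with matching ids reproduces the inner join.
filtered_rows = conn.execute(
    "SELECT s.id, name, breed FROM s CROSS JOIN t WHERE s.id = t.id ORDER BY s.id"
).fetchall()
inner_rows = conn.execute(
    "SELECT s.id, name, breed FROM s INNER JOIN t ON s.id = t.id ORDER BY s.id"
).fetchall()

print(len(cross_rows))
print(filtered_rows == inner_rows)
```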

#### Left Outer Join

**Question**: Perform a left outer join on tables **s** and **t**.
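A left outer join keeps every row of the left table and fills in `NULL` where the right table has no match. The sketch below shows this on hypothetical tables `s(id, name)` and `t(id, breed)`; the schema and rows are assumptions for the example.

```python
import sqlite3

# Hypothetical versions of tables s and t; schema and rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE s (id INTEGER, name TEXT);
    CREATE TABLE t (id INTEGER, breed TEXT);
    INSERT INTO s VALUES (0, 'Apricot'), (1, 'Boots'), (2, 'Cally'), (4, 'Eugene');
    INSERT INTO t VALUES (1, 'persian'), (2, 'ragdoll'), (4, 'bengal'), (5, 'persian');
    """
)

# Every row of s survives; id 0 has no match in t, so its breed is NULL.
left_rows = conn.execute(
    "SELECT s.id, name, breed FROM s LEFT JOIN t ON s.id = t.id ORDER BY s.id"
).fetchall()
print(left_rows)
```

Python's `sqlite3` module returns SQL `NULL` as `None`, so the unmatched row appears with `None` in the `breed` position.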

#### Right Outer Join

**Question**: Perform a right outer join on tables **s** and **t**.
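SQLite only supports `RIGHT JOIN` from version 3.39.0 onward, so the sketch below tries the direct syntax first and, if the installed SQLite rejects it, falls back to an equivalent `LEFT JOIN` with the two tables swapped. The fallback is a substitute technique, and the tables `s(id, name)` and `t(id, breed)` with their rows are hypothetical.

```python
import sqlite3

# Hypothetical versions of tables s and t; schema and rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE s (id INTEGER, name TEXT);
    CREATE TABLE t (id INTEGER, breed TEXT);
    INSERT INTO s VALUES (0, 'Apricot'), (1, 'Boots'), (2, 'Cally'), (4, 'Eugene');
    INSERT INTO t VALUES (1, 'persian'), (2, 'ragdoll'), (4, 'bengal'), (5, 'persian');
    """
)

right_sql = "SELECT t.id, name, breed FROM s RIGHT JOIN t ON s.id = t.id ORDER BY t.id"
# Equivalent query for older SQLite: swap the tables and use LEFT JOIN.
swapped_sql = "SELECT t.id, name, breed FROM t LEFT JOIN s ON s.id = t.id ORDER BY t.id"

try:
    right_rows = conn.execute(right_sql).fetchall()
except sqlite3.OperationalError:
    # Raised by SQLite builds that predate RIGHT JOIN support.
    right_rows = conn.execute(swapped_sql).fetchall()

# Every row of t survives; id 5 has no match in s, so its name is NULL.
print(right_rows)
```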

#### Full Outer Join

**Question**: Perform a full outer join on tables **s** and **t**.
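A full outer join keeps the unmatched rows from both tables. Because SQLite only added `FULL OUTER JOIN` in version 3.39.0, this sketch falls back to a `UNION` of two left joins when the direct syntax is rejected; that fallback is a common workaround, not the only option. The tables `s(id, name)` and `t(id, breed)` and their rows are hypothetical.

```python
import sqlite3

# Hypothetical versions of tables s and t; schema and rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE s (id INTEGER, name TEXT);
    CREATE TABLE t (id INTEGER, breed TEXT);
    INSERT INTO s VALUES (0, 'Apricot'), (1, 'Boots'), (2, 'Cally'), (4, 'Eugene');
    INSERT INTO t VALUES (1, 'persian'), (2, 'ragdoll'), (4, 'bengal'), (5, 'persian');
    """
)

# COALESCE picks whichever id is not NULL for the unmatched rows.
full_sql = """
    SELECT COALESCE(s.id, t.id) AS id, name, breed
    FROM s FULL OUTER JOIN t ON s.id = t.id
    ORDER BY 1
"""
# Workaround for older SQLite: union the two one-sided left joins.
# UNION (unlike UNION ALL) removes the matched rows that appear twice.
union_sql = """
    SELECT s.id AS id, name, breed FROM s LEFT JOIN t ON s.id = t.id
    UNION
    SELECT t.id AS id, name, breed FROM t LEFT JOIN s ON s.id = t.id
    ORDER BY 1
"""

try:
    full_rows = conn.execute(full_sql).fetchall()
except sqlite3.OperationalError:
    # Raised by SQLite builds that predate FULL OUTER JOIN support.
    full_rows = conn.execute(union_sql).fetchall()

print(full_rows)
```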

#### Aliasing in Joins

Let's return to the IMDB dataset. Now, we'll consider two tables: `Title` and `Rating`.

In [None]:
%%sql
sqlite:///data/imdbmini.db

When working with tables that have long names, we often create an **alias** using the `AS` keyword (much like we did with columns in the previous lecture). This makes it easier to reference these tables when performing a join.

**Question**: Perform an inner join on tables **Title** (alias T) and **Rating** (alias R).

Referencing columns using the full or aliased table name is important to avoid ambiguity. Suppose the tables we are trying to join both include a column with the same name, like the `tconst` columns present in both the `Title` and `Rating` tables of the IMDB database. If we do not specify which table's column we wish to reference, SQL will not be able to process our query.

In the cell below, it is unclear whether we are referring to the `tconst` column from the `Title` table or the `tconst` column from the `Rating` table, so SQL raises an error.

In [None]:
%%sql
SELECT primaryTitle, averageRating
FROM Title AS T INNER JOIN Rating AS R
ON tconst = tconst;
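To see both the failure and the fix, the sketch below runs the ambiguous query and then a version that qualifies `tconst` with the table aliases. The miniature `Title` and `Rating` tables, their column types, and their rows are assumptions for illustration; only the column names come from the query above.

```python
import sqlite3

# Hypothetical miniature Title and Rating tables; the rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Title (tconst INTEGER, primaryTitle TEXT);
    CREATE TABLE Rating (tconst INTEGER, averageRating REAL);
    INSERT INTO Title VALUES (1, 'First Film'), (2, 'Second Film');
    INSERT INTO Rating VALUES (1, 8.5), (2, 6.5);
    """
)

# Unqualified tconst could mean either table's column, so SQLite refuses.
try:
    conn.execute(
        "SELECT primaryTitle, averageRating "
        "FROM Title AS T INNER JOIN Rating AS R "
        "ON tconst = tconst"
    )
    ambiguous_error = None
except sqlite3.OperationalError as err:
    ambiguous_error = str(err)

# Prefixing each column with its table alias removes the ambiguity.
qualified_rows = conn.execute(
    "SELECT primaryTitle, averageRating "
    "FROM Title AS T INNER JOIN Rating AS R "
    "ON T.tconst = R.tconst "
    "ORDER BY T.tconst"
).fetchall()

print(ambiguous_error)
print(qualified_rows)
```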