## Joins and Subqueries

Joins make data interesting. Joining multiple tables together gives you access to a broader range of data. However, joins are also among the most complicated and expensive operations in SQL, so keep your SQL coding style clean and readable!

In this section, we'll explore:

* Join Syntax
* Duplication from Joins
* Types of Joins
* Subqueries and Join Optimization

In [1]:
# Load the SQL magic extension
%load_ext sql
# Connect to the default database (using SQLAlchemy)
%sql postgresql://localhost/postgres
# Truncate output of your queries so that it's not blowing up the notebook
%config SqlMagic.displaylimit = 10

### Join Syntax

So far, we have explored how to query a single table and transform the values within it. When the information we need does not exist in the current table, a join is needed to bring the appropriate data into the data set.

Let's look at the syntax of a basic join. We have to specify the tables we are joining, the type of join, and the join key that determines how rows from the two tables are matched.

In [23]:
%%sql
-- Join query for film on to language
select 
  * 
from 
  film -- Left Table
join -- Type of Join
  language -- Right Table
on film.language_id = language.language_id -- Join Key
limit 1

 * postgresql://localhost/postgres
1 rows affected.


film_id,title,description,release_year,language_id,rental_duration,rental_rate,length,replacement_cost,rating,last_update,special_features,full_text,language_id_1,name,last_update_1
133,Chamber Italian,A Fateful Reflection of a Moose And a Husband who must Overcome a Monkey in Nigeria,1975,1,7,4.99,117,14.99,NC-17,2006-02-15 09:46:00,['Trailers'],'chamber':1 'fate':4 'husband':11 'italian':2 'monkey':16 'moos':8 'must':13 'nigeria':18 'overcom':14 'reflect':5,1,English,2006-02-15 10:02:00


#### Joins for Data Creation 

Different types of joins yield different result tables, so plan out the final result table before performing the join. Let's consider the case above, where we want to know each film's language so that we can give the customer a better description.

To join any two tables, use a join key so that the database knows which rows to match. The join key is typically a field in one table that is supposed to match a field in the second table.

**Join Types**
Let’s assume we have two tables, one on the left and one on the right.

* An **inner** join returns only the records that have matching values in both tables. This is the most basic type of join.
* A **left** join returns all records from the left table, plus the matched values from the right table (null where there is no match).
* A **right** join returns all records from the right table, plus the matched values from the left table. This is equivalent to a left join with the table order swapped.
* A **full** (or **full outer**) join returns all records from both tables, matching rows wherever the join keys are equal and filling in nulls elsewhere.
* An **anti** join returns only the records from one table that have no match in the other table.
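
The join types above can be compared on a tiny example. The sketch below is only an illustration: it uses Python's built-in `sqlite3` module with an in-memory database instead of the notebook's Postgres connection, and the `film` and `language` rows are made-up toy data, not the real tables. The right join is written as a left join with the tables swapped, and the anti join is written with `not exists`.

```python
import sqlite3

# Toy stand-ins for the film and language tables (hypothetical data)
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table language (language_id integer primary key, name text);
    create table film (film_id integer primary key, title text, language_id integer);
    insert into language values (1, 'English'), (2, 'Italian'), (3, 'French');
    insert into film values
        (10, 'Film A', 1), (11, 'Film B', 1), (12, 'Film C', 2), (13, 'Film D', null);
""")


def count_rows(query):
    """Run a query and return how many rows it produces."""
    return len(conn.execute(query).fetchall())


# Inner join: only films whose language_id matches a language row
inner_rows = count_rows("""
    select * from film f join language l on f.language_id = l.language_id
""")

# Left join: every film, with nulls where no language matches
left_rows = count_rows("""
    select * from film f left join language l on f.language_id = l.language_id
""")

# Right join, written as a left join with the table order swapped:
# every language, with nulls where no film uses it
right_rows = count_rows("""
    select * from language l left join film f on f.language_id = l.language_id
""")

# Anti join: films that have no matching language at all
anti_rows = count_rows("""
    select * from film f
    where not exists (
        select 1 from language l where l.language_id = f.language_id
    )
""")

print(inner_rows, left_rows, right_rows, anti_rows)
```

Comparing the four counts shows how the choice of join changes the granularity of the result: the inner join drops the film without a language, the left join keeps it, and the swapped join adds a row for the language that no film uses.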

Before performing a join, think about what you want the final data set to look like. Write out all the fields you need and the tables they come from before you join the tables together. This will help you stay organized and write cleaner code.

In [21]:
%%sql
-- Join the table film onto language
-- What type of join do you use and why? Think about what the granularity of the final table will be.
select 
  * 
from 
  film
left join
  language 
on film.language_id = language.language_id
limit 1

 * postgresql://localhost/postgres
1 rows affected.


film_id,title,description,release_year,language_id,rental_duration,rental_rate,length,replacement_cost,rating,last_update,special_features,full_text,language_id_1,name,last_update_1
133,Chamber Italian,A Fateful Reflection of a Moose And a Husband who must Overcome a Monkey in Nigeria,1975,1,7,4.99,117,14.99,NC-17,2006-02-15 09:46:00,['Trailers'],'chamber':1 'fate':4 'husband':11 'italian':2 'monkey':16 'moos':8 'must':13 'nigeria':18 'overcom':14 'reflect':5,1,English,2006-02-15 10:02:00


In [24]:
%%sql
-- How many rows does the query above return?
select 
  sum(1) as num_rows
from 
  film
left join
  language 
on film.language_id = language.language_id

 * postgresql://localhost/postgres
1 rows affected.


num_rows
1000


In [25]:
%%sql
-- Try using a different join than the one you used above for the same query
-- Do you get the same number of rows? If not, why?
select 
  sum(1)
from 
  film
right join
  language 
on film.language_id = language.language_id

 * postgresql://localhost/postgres
1 rows affected.


sum
1005


Lastly, check how many of the listed films are foreign-language films.

In [17]:
%%sql
-- How many films aren't in the English language?
select 
  * 
from 
  film
left join
  language 
on film.language_id = language.language_id
where
  trim(upper(language.name)) <> 'ENGLISH'

 * postgresql://localhost/postgres
0 rows affected.


film_id,title,description,release_year,language_id,rental_duration,rental_rate,length,replacement_cost,rating,last_update,special_features,full_text,language_id_1,name,last_update_1


### Join Keys

Join keys are the most essential part of a join. A typical join key, as noted in the previous section, is a condition that connects one field in the left table to one field in the right table. Join conditions typically use equality, but inequality conditions are also possible.
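
As a small illustration of an inequality join, the sketch below assigns each film to a length band by joining on a range condition. It runs on an in-memory SQLite database through Python's `sqlite3` module rather than on Postgres, and the `length_band` table and all rows are hypothetical, invented only for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table film (film_id integer primary key, title text, length integer);
    -- Hypothetical lookup table: each band covers min_length <= length < max_length
    create table length_band (band text, min_length integer, max_length integer);
    insert into film values (1, 'Film A', 45), (2, 'Film B', 95), (3, 'Film C', 150);
    insert into length_band values
        ('short', 0, 60), ('medium', 60, 120), ('long', 120, 1000);
""")

# The join key is a pair of inequalities instead of a single equality
banded = conn.execute("""
    select
      f.title,
      b.band
    from
      film f
    join
      length_band b
    on f.length >= b.min_length and f.length < b.max_length
    order by
      f.film_id
""").fetchall()

print(banded)
```

Because the bands do not overlap, each film matches exactly one row. With overlapping ranges, an inequality join would return one row per matching band, so it is worth checking the row count after a join like this.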

Extra caution is needed when joining on a string key. The strings may be capitalized or encoded differently in each table. It is best practice to check the join key in both tables before performing the join and, if needed, normalize the key values so that they match.
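
The effect of mismatched string keys can be sketched as follows. This is a toy example on an in-memory SQLite database via Python's `sqlite3` module, not the notebook's Postgres database, and the `film_lang` table and its rows are hypothetical. It compares a join on the raw strings with a join on keys normalized with `lower(trim(...))`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical tables that share a string key with inconsistent formatting
    create table film_lang (title text, language_name text);
    create table language (name text, language_code text);
    insert into film_lang values ('Film A', 'english'), ('Film B', ' Italian ');
    insert into language values ('English', 'en'), ('Italian', 'it');
""")

# Joining on the raw strings: case and stray spaces prevent any match
raw_matches = conn.execute("""
    select f.title, l.language_code
    from film_lang f
    join language l
    on f.language_name = l.name
""").fetchall()

# Joining on normalized keys: trim whitespace and lowercase both sides
normalized_matches = conn.execute("""
    select f.title, l.language_code
    from film_lang f
    join language l
    on lower(trim(f.language_name)) = lower(trim(l.name))
    order by f.title
""").fetchall()

print(raw_matches)
print(normalized_matches)
```

Normalizing inside the join condition is convenient for a quick check. On large tables it may be preferable to clean the key in a subquery first, since applying functions to the join key can keep the database from using an index.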

**Joins for Data Filtering**   
Joins are useful for more than appending new data from different sources. They can also speed up a slow query when rows need to be filtered out (as opposed to using a `where` clause with the `in` operator). Two joins can help you accomplish this: (1) inner joins and (2) anti joins.

* **Inner joins** are filtering joins because they discard the rows that don't have a match in both the left and right tables.
* **Anti joins** are filtering joins because they discard the rows that do have a match, the exact opposite of an inner join! Note that Postgres has no dedicated anti join keyword, so an anti join has to be replicated with other constructs.
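
The two filtering joins can be sketched side by side with their `in` and `not exists` counterparts. As before, this is only an illustration on an in-memory SQLite database through Python's `sqlite3` module, with made-up rows in toy `film` and `film_category` tables. Whether the join form is actually faster than `in` depends on the database and its query planner, so treat this as a demonstration of equivalent results rather than a benchmark.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table film (film_id integer primary key, title text);
    create table film_category (film_id integer, category_id integer);
    insert into film values (1, 'Film A'), (2, 'Film B'), (3, 'Film C'), (4, 'Film D');
    insert into film_category values (1, 16), (2, 16), (3, 5);
""")


def film_ids(query):
    """Run a query and return the first column of each row as a list."""
    return [row[0] for row in conn.execute(query)]


# Filtering with a where clause and the in operator
in_ids = film_ids("""
    select f.film_id from film f
    where f.film_id in (select film_id from film_category where category_id = 16)
    order by f.film_id
""")

# Same filter as an inner join; distinct guards against duplicated film rows
inner_ids = film_ids("""
    select f.film_id
    from film f
    join (select distinct film_id from film_category where category_id = 16) fc
    on f.film_id = fc.film_id
    order by f.film_id
""")

# Anti join replicated with a left join that keeps only unmatched rows
anti_ids = film_ids("""
    select f.film_id
    from film f
    left join (select distinct film_id from film_category where category_id = 16) fc
    on f.film_id = fc.film_id
    where fc.film_id is null
    order by f.film_id
""")

# The same anti join written with not exists
not_exists_ids = film_ids("""
    select f.film_id from film f
    where not exists (
        select 1 from film_category fc
        where fc.film_id = f.film_id and fc.category_id = 16
    )
    order by f.film_id
""")

print(in_ids, inner_ids, anti_ids, not_exists_ids)
```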

In [42]:
%%sql
-- Count the number of rows in the tables `film_category` and `film` respectively
-- Enter your query here!

 * postgresql://localhost/postgres
57 rows affected.


film_id,category_id,last_update,category_id_1,name,last_update_1
41.0,16.0,2006-02-15 10:07:09,,,
57.0,16.0,2006-02-15 10:07:09,,,
75.0,16.0,2006-02-15 10:07:09,,,
84.0,16.0,2006-02-15 10:07:09,,,
87.0,16.0,2006-02-15 10:07:09,,,
88.0,16.0,2006-02-15 10:07:09,,,
103.0,16.0,2006-02-15 10:07:09,,,
123.0,16.0,2006-02-15 10:07:09,,,
125.0,16.0,2006-02-15 10:07:09,,,
167.0,16.0,2006-02-15 10:07:09,,,


In [None]:
%%sql
-- Count the number of rows in the inner join between `film_category` and `film`. How many rows were lost?
-- Enter your query here!

In [None]:
%%sql
-- Replicate an anti join in Postgres
select
  *
from
  film_category fc
full join
  (select * from category limit 15) c
on fc.category_id = c.category_id
where
  fc.category_id is null
  or c.category_id is null

### Subqueries
Sometimes when we join two tables, one table is at a different granularity than the other, and joining them directly creates a lot of duplicate rows. This is one of the many use cases for a subquery before joining. Subqueries can bring tables to the same granularity, or simply filter the tables down before joining. This helps optimize the final query and reduce its runtime (and resource costs too).
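
A minimal sketch of the granularity problem, using Python's `sqlite3` module with an in-memory database in place of Postgres. The `film` and `inventory` tables here are toy tables with made-up rows: `film` has one row per film, while `inventory` has one row per physical copy. Joining them directly repeats each film once per copy, whereas aggregating `inventory` in a subquery first keeps the result at one row per film.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table film (film_id integer primary key, title text);
    create table inventory (inventory_id integer primary key, film_id integer);
    insert into film values (1, 'Film A'), (2, 'Film B');
    insert into inventory values (100, 1), (101, 1), (102, 1), (103, 2);
""")

# Direct join: film rows are duplicated, one per inventory copy
direct = conn.execute("""
    select
      f.film_id,
      f.title
    from
      film f
    left join
      inventory i
    on f.film_id = i.film_id
    order by
      f.film_id
""").fetchall()

# Subquery first: roll inventory up to one row per film, then join
pre_aggregated = conn.execute("""
    select
      f.film_id,
      f.title,
      i.num_copies
    from
      film f
    left join
      (select
         film_id,
         count(*) as num_copies
       from
         inventory
       group by
         film_id) i
    on f.film_id = i.film_id
    order by
      f.film_id
""").fetchall()

print(len(direct), direct)
print(len(pre_aggregated), pre_aggregated)
```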

The strength of subqueries (over just putting filter conditions after all the joins) lies in the fact that you can select only the columns you want to pass to the next level of nesting. This keeps the amount of data carried through the query small.

**Julie Tip**

Try to alias tables with a small identifier that is descriptive of the table you are querying from. Tag each field you select with the table identifier so that each field is pulled exactly from the table you want.

In [50]:
%%sql
-- Find the film categories of movies that include "moose" in the film description
select 
  f.*,
  c.name as category_name
from 
  (select
     film_id,
     title,
     description,
     release_year,
     length,
     special_features
   from
     film
   where 
     lower(description) like '%moose%') f
left join
  film_category fc
on cast(f.film_id as int) = cast(fc.film_id as int)
left join
  category c
on fc.category_id = c.category_id

 * postgresql://localhost/postgres
80 rows affected.


film_id,title,description,release_year,length,special_features,category_name
915,Truman Crazy,A Thrilling Epistle of a Moose And a Boy who must Meet a Database Administrator in A Monastery,1943,92,"['Trailers', 'Commentaries']",Action
105,Bull Shawshank,A Fanciful Drama of a Moose And a Squirrel who must Conquer a Pioneer in The Canadian Rockies,1920,125,['Deleted Scenes'],Action
986,Wonka Sea,A Brilliant Saga of a Boat And a Mad Scientist who must Meet a Moose in Ancient India,1984,85,"['Trailers', 'Commentaries']",Animation
805,Sleepless Monsoon,A Amazing Saga of a Moose And a Pastry Chef who must Escape a Butler in Australia,2003,64,"['Trailers', 'Deleted Scenes', 'Behind the Scenes']",Animation
761,Santa Paris,A Emotional Documentary of a Moose And a Car who must Redeem a Mad Cow in A Baloon Factory,1931,154,"['Commentaries', 'Behind the Scenes']",Children
553,Maker Gables,A Stunning Display of a Moose And a Database Administrator who must Pursue a Composer in A Jet Boat,1985,136,"['Deleted Scenes', 'Behind the Scenes']",Children
515,Legally Secretary,A Astounding Tale of a A Shark And a Moose who must Meet a Womanizer in The Sahara Desert,1920,113,"['Trailers', 'Commentaries', 'Behind the Scenes']",Children
343,Full Flatliners,A Beautiful Documentary of a Astronaut And a Moose who must Pursue a Monkey in A Shark Tank,1972,94,"['Trailers', 'Deleted Scenes']",Children
895,Tomorrow Hustler,A Thoughtful Story of a Moose And a Husband who must Face a Secret Agent in The Sahara Desert,1945,142,['Commentaries'],Classics
874,Tadpole Park,A Beautiful Tale of a Frisbee And a Moose who must Vanquish a Dog in An Abandoned Amusement Park,2001,155,"['Trailers', 'Commentaries']",Classics
