# __Merging Tables With Different Join Types__
# Outline
- [1 Left join](#lft-join)
- [&nbsp;&nbsp;1.1 Counting missing rows with left join](#cnt-mis-rows)
- [&nbsp;&nbsp;1.2 Enriching a dataset](#enrch-ds)
- [&nbsp;&nbsp;1.3 How many rows with a left join?](#rows-left-join)

<a id="lft-join"></a>
# 1 Left join
<!-- %%HTML -->
<div align="middle">
<video width="60%" controls>
      <source src="./../../res/videos/2.merging-tables-with-different-join-types/1.left_join.mp4" type="video/mp4">
</video></div>



<a id="cnt-mis-rows"></a>
## 1.1 Counting missing rows with left join
The Movie Database is supported by volunteers going out into the world, collecting data, and entering it into the database. This includes financial data, such as movie budget and revenue. If you wanted to know which movies are still missing data, you could use a left join to identify them. Practice using a left join by merging the movies table and the financials table.

The movies and financials tables have been loaded for you.

In [1]:
import pandas as pd
movies = pd.read_pickle("../../data/2/movies.p")
financials = pd.read_pickle("../../data/2/financials.p")

display(movies.head(3))
display(financials.head(3))

Unnamed: 0,id,title,popularity,release_date
0,257,Oliver Twist,20.415572,2005-09-23
1,14290,Better Luck Tomorrow,3.877036,2002-01-12
2,38365,Grown Ups,38.864027,2010-06-24


Unnamed: 0,id,budget,revenue
0,19995,237000000,2787965000.0
1,285,300000000,961000000.0
2,206647,245000000,880674600.0


__Question__<br>
Which column is likely the best one to merge the two tables on?
- Answer: `on='id'`

Merge the movies table, as the left table, with the financials table using a left join, and save the result to movies_financials.

In [2]:
movies_financials = movies.merge(financials, on="id", how="left")

Count the number of rows in movies_financials with a null value in the budget column.

In [3]:
# Count the rows whose budget value is missing after the left join
movies_financials['budget'].isnull().sum()

1574

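Counting nulls in `budget` works here on the assumption that every missing value comes from an unmatched movie. As a minimal sketch on made-up toy data (the `movies_toy` and `financials_toy` frames below are hypothetical, not the real tables), the `indicator=True` option of `.merge()` flags unmatched rows directly, which may be a useful cross-check:

```python
import pandas as pd

# Hypothetical toy tables, not the real movies/financials data
movies_toy = pd.DataFrame({"id": [1, 2, 3], "title": ["A", "B", "C"]})
financials_toy = pd.DataFrame({"id": [1, 3], "budget": [100, 300]})

# indicator=True adds a _merge column; in a left join each row is
# labelled 'both' (matched) or 'left_only' (no match in the right table)
merged = movies_toy.merge(financials_toy, on="id", how="left", indicator=True)

# Two ways to count the movies that have no financial data
missing_by_null = merged["budget"].isnull().sum()
missing_by_indicator = (merged["_merge"] == "left_only").sum()
print(missing_by_null, missing_by_indicator)
```

The two counts agree only when the right table itself has no nulls in `budget`; if it did, the indicator column would be the more reliable of the two.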
<a id="enrch-ds"></a>
## 1.2 Enriching a dataset
Setting `how='left'` in the `.merge()` method is a useful technique for enriching a dataset with additional information from a different table. In this exercise, you will start with a sample of movie data from the Toy Story series. Your goal is to enrich this data by adding the marketing tagline for each movie. You will compare the results of a left join with those of an inner join.

The toy_story DataFrame contains the Toy Story movies. The toy_story and taglines DataFrames have been loaded for you.

In [4]:
toy_story = movies[movies["title"].str.contains('toy story', case=False)]
taglines = pd.read_pickle("../../data/2/taglines.p")

display(taglines.head(3))
display(toy_story.head(3))

Unnamed: 0,id,tagline
0,19995,Enter the World of Pandora.
1,285,"At the end of the world, the adventure begins."
2,206647,A Plan No One Escapes


Unnamed: 0,id,title,popularity,release_date
103,10193,Toy Story 3,59.995418,2010-06-16
2637,863,Toy Story 2,73.575118,1999-10-30
3716,862,Toy Story,73.640445,1995-10-30


Merge toy_story and taglines on the id column with a left join, and save the result as toystory_tag.

In [5]:
toystory_tag = toy_story.merge(taglines, on="id", how="left")

display(toystory_tag.head(3))

Unnamed: 0,id,title,popularity,release_date,tagline
0,10193,Toy Story 3,59.995418,2010-06-16,No toy gets left behind.
1,863,Toy Story 2,73.575118,1999-10-30,The toys are back!
2,862,Toy Story,73.640445,1995-10-30,


Now merge toy_story (the left table) with taglines on the id column using an inner join, and save the result as toystory_tag.

In [6]:
toystory_tag = toy_story.merge(taglines, on="id", how="inner")

display(toystory_tag.head(3))


Unnamed: 0,id,title,popularity,release_date,tagline
0,10193,Toy Story 3,59.995418,2010-06-16,No toy gets left behind.
1,863,Toy Story 2,73.575118,1999-10-30,The toys are back!

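The two results differ by one row: the left join keeps Toy Story with an empty tagline, while the inner join drops it because it has no match in taglines. The same behaviour can be sketched on small made-up tables (the `films` and `tags` frames and their values are hypothetical, chosen only for illustration):

```python
import pandas as pd

# Hypothetical toy tables; the third film has no tagline entry
films = pd.DataFrame({"id": [1, 2, 3], "title": ["Film A", "Film B", "Film C"]})
tags = pd.DataFrame({"id": [1, 2], "tagline": ["First tagline", "Second tagline"]})

# Left join keeps every row of films; inner join keeps only matched rows
left_result = films.merge(tags, on="id", how="left")
inner_result = films.merge(tags, on="id", how="inner")

# Rows kept by the left join but dropped by the inner join
dropped = left_result[~left_result["id"].isin(inner_result["id"])]
print(len(left_result), len(inner_result))
print(dropped["title"].tolist())
```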

<a id="rows-left-join"></a>
## 1.3 How many rows with a left join?
Select the true statement about left joins.

Try running the following code statements.

`left_table.merge(one_to_one, on='id', how='left').shape`<br>
`left_table.merge(one_to_many, on='id', how='left').shape`

Note that left_table starts out with 4 rows.

In [11]:
# preloaded
one_to_one = taglines
one_to_many = pd.read_pickle("../../data/2/crews.p")
left_table = movies[movies["title"].str.contains("jurassic", case=False)]

display(f"left_table, shape: {left_table.shape}", left_table.head(3))
display(f"one_to_one, shape: {one_to_one.shape}", one_to_one.head(3))
display(f"one_to_many, shape: {one_to_many.shape}", one_to_many.head(3))

'left_table, shape: (4, 4)'

Unnamed: 0,id,title,popularity,release_date
160,329,Jurassic Park,40.413191,1993-06-11
682,330,The Lost World: Jurassic Park,2.502487,1997-05-23
1867,135397,Jurassic World,418.708552,2015-06-09


'one_to_one, shape: (3955, 2)'

Unnamed: 0,id,tagline
0,19995,Enter the World of Pandora.
1,285,"At the end of the world, the adventure begins."
2,206647,A Plan No One Escapes


'one_to_many, shape: (42502, 4)'

Unnamed: 0,id,department,job,name
0,19995,Editing,Editor,Stephen E. Rivkin
2,19995,Sound,Sound Designer,Christopher Boyes
4,19995,Production,Casting,Mali Finn


In [13]:
print(left_table.merge(one_to_one, on='id', how='left').shape)
print(left_table.merge(one_to_many, on='id', how='left').shape)

(4, 5)
(232, 7)

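The shapes above suggest the rule: a left join returns every row of the left table at least once, and more rows when the right table holds several matches per key. A minimal sketch on made-up data (all `*_toy` frames are hypothetical stand-ins, not the course tables) reproduces the effect and shows how the `validate` argument of `.merge()` can guard against unexpected row growth:

```python
import pandas as pd

# Hypothetical toy tables, not the course data
left_toy = pd.DataFrame({"id": [1, 2, 3, 4], "title": ["A", "B", "C", "D"]})
one_to_one_toy = pd.DataFrame({"id": [1, 2, 3], "tagline": ["x", "y", "z"]})
one_to_many_toy = pd.DataFrame({
    "id": [1, 1, 1, 2, 2],
    "job": ["Editor", "Director", "Producer", "Editor", "Director"],
})

# One-to-one: row count equals the left table's row count
rows_1to1 = len(left_toy.merge(one_to_one_toy, on="id", how="left"))
# One-to-many: each left row repeats once per match (unmatched rows appear once)
rows_1tomany = len(left_toy.merge(one_to_many_toy, on="id", how="left"))
print(rows_1to1, rows_1tomany)

# validate="one_to_one" raises MergeError when a key repeats on either side
try:
    left_toy.merge(one_to_many_toy, on="id", how="left", validate="one_to_one")
    duplicate_keys_detected = False
except pd.errors.MergeError:
    duplicate_keys_detected = True
print(duplicate_keys_detected)
```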