# Left Join

In this lesson, we focus on the concept of a **left join**, another method of combining two tables. Before diving in, let’s recall what we already know.

Previously, we learned about the `merge` method in pandas, which combines two tables using one or more key columns. By default, this method performs an **inner join**, meaning it only returns rows where the key values are present in both tables.

A **left join** works differently. It returns **all the rows from the left table**, and only the matching rows from the right table based on the key columns. If a value in the left table does not have a corresponding match in the right table, the left table’s data is still included, but the columns that come from the right table are filled with missing values (typically shown as `NaN` in pandas output).

This is an important distinction because a left join ensures that no data from the left table is excluded, even when no match exists in the right table. This feature makes left joins particularly useful when the left dataset is the main source of information and you want to enrich it with details from another table without losing any original rows.

To perform a left join in pandas, pass `how='left'` to the `.merge()` method. The default is `how='inner'`, which is why we did not need to specify `how` earlier when we worked only with inner joins.
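A minimal sketch of the difference between the two settings follows. The `left` and `right` tables here are made-up toy data for illustration, not the lesson's datasets.

```python
import pandas as pd

# Hypothetical toy tables (assumed for illustration only)
left = pd.DataFrame({"id": [1, 2, 3], "title": ["A", "B", "C"]})
right = pd.DataFrame({"id": [1, 3], "budget": [100, 300]})

# Left join: every row of `left` is kept; id 2 has no match,
# so its `budget` value is missing (NaN)
left_joined = left.merge(right, on="id", how="left")

# Inner join (the default): only ids present in both tables survive
inner_joined = left.merge(right, on="id")

print(len(left_joined))   # 3
print(len(inner_joined))  # 2
print(left_joined["budget"].isna().sum())  # 1
```

Note that the `budget` column becomes a float column in the left-joined result, because `NaN` cannot be stored in a plain integer column.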

After a left join, the result has exactly as many rows as the left table only when each left key matches at most one row in the right table, that is, when the key is unique in the right table (a one-to-one or many-to-one relationship). If a key appears more than once in the right table, the matching left row is repeated once per match, so the result can have more rows than the left table. It never has fewer: every left row appears at least once.
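The row-count guarantee depends on the keys in the right table being unique. The sketch below uses made-up toy tables (an assumption, not the lesson's data) to show the row count growing when a key is repeated on the right, and uses the `validate` argument of `.merge()` as one possible way to catch that situation.

```python
import pandas as pd

# Hypothetical toy tables; id 1 appears twice in `right`
left = pd.DataFrame({"id": [1, 2], "title": ["A", "B"]})
right = pd.DataFrame({"id": [1, 1], "tagline": ["x", "y"]})

# The left row with id 1 is repeated once per matching right row
joined = left.merge(right, on="id", how="left")
print(len(left), len(joined))  # prints: 2 3

# validate="many_to_one" asks pandas to check that keys are unique
# in the right table, and raises MergeError when they are not
try:
    left.merge(right, on="id", how="left", validate="many_to_one")
    validation_failed = False
except pd.errors.MergeError:
    validation_failed = True

print(validation_failed)  # prints: True
```

Checking the key relationship up front is optional, but it turns a silent change in row count into an explicit error.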


## Prepare Data

In [3]:
# Import pandas library
import pandas as pd

# Load the datasets
movies = pd.read_pickle("datasets/movies.p")
financials = pd.read_pickle("datasets/financials.p")
toy_story = pd.read_csv("datasets/toy_story.csv")
taglines = pd.read_pickle("datasets/taglines.p")

## Exercise: Counting missing rows with left join

The Movie Database is a collaborative project where volunteers gather and input various types of information, including details about movie budgets and revenues. To check which movies lack financial data, we can apply a **left join** between the movies table and the financials table. This allows us to preserve all movie records while spotting the ones with missing financial details.

The `movies` and `financials` DataFrames are already available for use.

### Instructions

1. Perform a left join between the `movies` table and the `financials` table, keeping `movies` as the left table, and store the result in `movies_financials`.
2. Find out how many rows in `movies_financials` do not have a value in the `budget` column.


In [3]:
# Perform a left join between movies and financials
movies_financials = movies.merge(financials, how='left', on='id')

# Calculate how many rows have missing budget values
missing_budget_count = movies_financials['budget'].isna().sum()

# Display the result
print(missing_budget_count)

1574
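Counting `NaN` values in `budget` includes both movies with no row in the financials table and any movies whose financial row exists but has an empty budget. If the two cases need to be told apart, one option is the `indicator=True` argument of `.merge()`, which adds a `_merge` column. The sketch below uses small made-up tables (`movies_demo` and `financials_demo` are hypothetical names, not the lesson's data).

```python
import pandas as pd

# Hypothetical toy tables (assumed for illustration only).
# id 2 has no financial record; id 3 has one, but its budget is missing.
movies_demo = pd.DataFrame({"id": [1, 2, 3], "title": ["A", "B", "C"]})
financials_demo = pd.DataFrame({"id": [1, 3], "budget": [100.0, None]})

# indicator=True adds a `_merge` column with the values
# 'left_only', 'right_only', or 'both'
merged = movies_demo.merge(
    financials_demo, on="id", how="left", indicator=True
)

# Rows with no match in the right table
unmatched = int((merged["_merge"] == "left_only").sum())

# Rows with a missing budget for any reason
missing_budget = int(merged["budget"].isna().sum())

print(unmatched)       # 1
print(missing_budget)  # 2
```

Whether the two counts differ on the real `movies` and `financials` data depends on whether `financials` itself contains missing budgets, which this lesson does not state.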


## Exercise: Enriching a dataset

Using `how='left'` in the `.merge()` method is a practical way to add more context to an existing dataset. In this task, you will work with a small set of Toy Story movies and add their marketing taglines. The exercise also demonstrates the difference in results between a **left join** and an **inner join**.

The `toy_story` DataFrame contains Toy Story movie data, and the `taglines` DataFrame contains marketing taglines. Both have been preloaded.

### Instructions

1. Merge `toy_story` with `taglines` on the `id` column using a **left join** and store the result in `toystory_left`.
2. Merge `toy_story` with `taglines` on the `id` column using an **inner join** and store the result in `toystory_inner`.

In [4]:
# Left join: keep all rows from toy_story and add taglines when available
toystory_left = toy_story.merge(taglines, on='id', how='left')

# Display the merged table and its dimensions
print(toystory_left)
print(toystory_left.shape)

      id        title  popularity release_date                   tagline
0  10193  Toy Story 3      59.995   2010-06-16  No toy gets left behind.
1    863  Toy Story 2      73.575   1999-10-30        The toys are back!
2    862    Toy Story      73.640   1995-10-30                       NaN
(3, 5)


In [5]:
# Inner join: keep only rows that have matches in both tables
toystory_inner = toy_story.merge(taglines, on='id', how='inner')

# Display the merged table and its dimensions
print(toystory_inner)
print(toystory_inner.shape)

      id        title  popularity release_date                   tagline
0  10193  Toy Story 3      59.995   2010-06-16  No toy gets left behind.
1    863  Toy Story 2      73.575   1999-10-30        The toys are back!
(2, 5)
