# Merging Guest Lists: In class

This notebook contains guided exercises on merging data frames. Imagine the following situation: two friends decide to host a party together. Each has their own guest list, and the two lists need to be merged in order to organize the party effectively.

## Go through the notebook

1. Understanding: Go through all cells and try to understand what each one does by answering the questions. Annotate the cells either in a markdown cell or with code comments.
2. Run the code and inspect results.

In [None]:
# Getting ready

import pandas as pd

## Create and Check Guest List for Host A: Ally

Take a moment to take a good look at both guest lists. What do you notice? How long are they? Where do they overlap?

In [None]:
# What object type is useful for storing information on guests? What info do these variables store? What steps are taken here to prepare the data?

guest_list_a = {
    "GuestName": ["Alice", "Bob", "Charlie", "Denise", "Lola", "Gus", "Frankie", "Gene", "Corrie"],
    "RSVP_A": ["Yes", "No", "Yes", "Yes", "Maybe", "Maybe", "Yes", "No", "Maybe"]
}
df_a = pd.DataFrame(guest_list_a)
df_a

## Create and Check Guest List for Host B: Bert

In [None]:
# What object type is useful for storing information on guests? What info do these variables store? What steps are taken here to prepare the data?

guest_list_b = {
    "GuestName": ["Bob", "Charlie", "Elena", "Frank", "Denise", "Rob"],
    "RSVP_B": ["Yes", "Yes", "No", "Yes", "Maybe", "Maybe"]
}
df_b = pd.DataFrame(guest_list_b)
df_b
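
One possible way to answer the questions above (length and overlap) in code is sketched below. The guest lists are repeated so the cell runs on its own; the helper name `on_both` is made up for this illustration and is not used elsewhere in the notebook.

```python
import pandas as pd

# Same guest lists as in the cells above, repeated so this cell is self-contained.
df_a = pd.DataFrame({
    "GuestName": ["Alice", "Bob", "Charlie", "Denise", "Lola", "Gus", "Frankie", "Gene", "Corrie"],
    "RSVP_A": ["Yes", "No", "Yes", "Yes", "Maybe", "Maybe", "Yes", "No", "Maybe"],
})
df_b = pd.DataFrame({
    "GuestName": ["Bob", "Charlie", "Elena", "Frank", "Denise", "Rob"],
    "RSVP_B": ["Yes", "Yes", "No", "Yes", "Maybe", "Maybe"],
})

# How long is each list?
print("Guests on Ally's list:", len(df_a))
print("Guests on Bert's list:", len(df_b))

# Are names unique within each list? Duplicate names would change merge results.
print("Names unique in A:", df_a["GuestName"].is_unique)
print("Names unique in B:", df_b["GuestName"].is_unique)

# Which of Ally's guests also appear on Bert's list (exact string match)?
on_both = df_a.loc[df_a["GuestName"].isin(df_b["GuestName"]), "GuestName"].tolist()
print("On both lists:", on_both)
```

Keep the overlap in mind: it is a useful reference point when guessing the size of each join below.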

## Inner Join

The inner join includes only guests who are on both lists. This means that only guests who know both hosts will be invited to the party.

Before running the code, take a guess: How many guests will be invited in this scenario?

In [None]:
# Who will be invited to the party? What do you notice in the output? How can you explain missing values (if any)?
# Anything that might become problematic at some point? 

inner_merge = pd.merge(df_a, df_b, on="GuestName", how="inner")

print("Number of observations in the dataframe: ", len(inner_merge))
print("Number of missing values in variables: ", inner_merge.isna().sum())

inner_merge
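
A point that may become problematic: `pd.merge` compares key values for exact equality, so names that differ slightly (spelling, capitalization, stray spaces) are treated as different guests. The sketch below uses small, made-up lists (`left`, `right`) to illustrate this; the cleaning step shown is just one possible approach, not a general recipe.

```python
import pandas as pd

# Hypothetical mini-lists, invented only to show how keys are compared.
left = pd.DataFrame({
    "GuestName": ["Frankie", "bob", "Denise "],  # lower case "bob", trailing space after "Denise"
    "RSVP_A": ["Yes", "No", "Yes"],
})
right = pd.DataFrame({
    "GuestName": ["Frank", "Bob", "Denise"],
    "RSVP_B": ["Yes", "Yes", "Maybe"],
})

# No name matches exactly, so the inner join is empty.
strict = pd.merge(left, right, on="GuestName", how="inner")
print("Matches before cleaning:", len(strict))

# Trimming whitespace and harmonizing capitalization recovers two matches.
left_clean = left.assign(GuestName=left["GuestName"].str.strip().str.title())
cleaned = pd.merge(left_clean, right, on="GuestName", how="inner")
print("Matches after cleaning:", cleaned["GuestName"].tolist())
```

"Frankie" and "Frank" still do not match after cleaning. Whether they are the same person cannot be decided by code alone and would need to be checked with the hosts.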

## Outer Join

The outer join includes every observation that appears in at least one of the two dataframes. This means that every guest on either list will be invited, even if they only know one of the hosts.

Before running the code, take a guess: How many guests will be invited in this scenario?

In [None]:
# Who will be invited to the party? What do you notice in the output? How can you explain missing values (if any)?
# Anything that might become problematic at some point? 

outer_merge = pd.merge(df_a, df_b, on="GuestName", how="outer")

print("Number of observations in the dataframe: ", len(outer_merge))
print("Number of missing values in variables: ", outer_merge.isna().sum())

outer_merge
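
To see where the missing values in an outer join come from, it can help to ask pandas to flag the origin of each row. The sketch below uses the `indicator=True` option of `pd.merge`, which adds a `_merge` column with the values `left_only`, `right_only`, or `both`. The name `outer_flagged` is made up for this illustration.

```python
import pandas as pd

# Same guest lists as above, repeated so this cell is self-contained.
df_a = pd.DataFrame({
    "GuestName": ["Alice", "Bob", "Charlie", "Denise", "Lola", "Gus", "Frankie", "Gene", "Corrie"],
    "RSVP_A": ["Yes", "No", "Yes", "Yes", "Maybe", "Maybe", "Yes", "No", "Maybe"],
})
df_b = pd.DataFrame({
    "GuestName": ["Bob", "Charlie", "Elena", "Frank", "Denise", "Rob"],
    "RSVP_B": ["Yes", "Yes", "No", "Yes", "Maybe", "Maybe"],
})

# indicator=True adds a "_merge" column describing where each row came from.
outer_flagged = pd.merge(df_a, df_b, on="GuestName", how="outer", indicator=True)

# How many guests are only on Ally's list, only on Bert's list, or on both?
print(outer_flagged["_merge"].value_counts())

# Rows flagged "left_only" have no RSVP_B; rows flagged "right_only" have no RSVP_A.
print(outer_flagged[["RSVP_A", "RSVP_B"]].isna().sum())
```

Comparing the two printed summaries shows that each missing RSVP corresponds to a guest who is on only one of the lists.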

## Left Join

In a left join, the dataframe listed first (on the left side, i.e., Ally's guest list) is the reference dataframe: all of its rows are kept, and information from the second dataframe is added wherever a name matches.

Before running the code, take a guess: How many guests will be invited in this scenario? What happens to guests listed in the other dataframe?

In [None]:
# Who will be invited to the party? What do you notice in the output? How can you explain missing values (if any)?
# Anything that might become problematic at some point? 

left_merge = pd.merge(df_a, df_b, on="GuestName", how="left")

print("Number of observations in the dataframe: ", len(left_merge))
print("Number of missing values in variables: ", left_merge.isna().sum())

left_merge
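
The sketch below checks two things about the left join: that it has one row per guest on Ally's list, and what happens if a name appears more than once in the other list. The duplicate entry for Bob (`df_b_dup`) is hypothetical and added only for illustration; it is not part of Bert's actual list.

```python
import pandas as pd

# Same guest lists as above, repeated so this cell is self-contained.
df_a = pd.DataFrame({
    "GuestName": ["Alice", "Bob", "Charlie", "Denise", "Lola", "Gus", "Frankie", "Gene", "Corrie"],
    "RSVP_A": ["Yes", "No", "Yes", "Yes", "Maybe", "Maybe", "Yes", "No", "Maybe"],
})
df_b = pd.DataFrame({
    "GuestName": ["Bob", "Charlie", "Elena", "Frank", "Denise", "Rob"],
    "RSVP_B": ["Yes", "Yes", "No", "Yes", "Maybe", "Maybe"],
})

left_merge = pd.merge(df_a, df_b, on="GuestName", how="left")

# With unique names on both lists, the left join has one row per guest in df_a.
print("Same length as Ally's list:", len(left_merge) == len(df_a))

# Guests with a missing RSVP_B are on Ally's list but not on Bert's.
not_on_b = left_merge.loc[left_merge["RSVP_B"].isna(), "GuestName"].tolist()
print("Only on Ally's list:", not_on_b)

# Hypothetical scenario: Bert accidentally lists Bob a second time.
extra_row = pd.DataFrame({"GuestName": ["Bob"], "RSVP_B": ["No"]})
df_b_dup = pd.concat([df_b, extra_row], ignore_index=True)
left_dup = pd.merge(df_a, df_b_dup, on="GuestName", how="left")

# Bob now matches two rows, so the result has one more row than Ally's list.
print("Rows with duplicated Bob:", len(left_dup))
```

If duplicates like this are a concern, `pd.merge` also accepts a `validate` argument (for example `validate="one_to_one"`) that raises an error instead of silently repeating rows.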

## Right Join

In a right join, the dataframe listed second (on the right side, i.e., Bert's guest list) is the reference dataframe. As long as each name appears only once in both lists, the number of guests stored in the 'right' dataframe thus defines the number of guests invited.

Before running the code, take a guess: How many guests will be invited in this scenario? What happens to guests listed in the other dataframe?

In [None]:
# Who will be invited to the party? What do you notice in the output? How can you explain missing values (if any)?
# Anything that might become problematic at some point? 

right_merge = pd.merge(df_a, df_b, on="GuestName", how="right")

print("Number of observations in the dataframe: ", len(right_merge))
print("Number of missing values in variables: ", right_merge.isna().sum())

right_merge
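
A right join can also be understood as a left join with the two dataframes swapped. The sketch below compares both versions after bringing rows and columns into the same order; the names `swapped_left` and `tidy` are made up for this illustration.

```python
import pandas as pd

# Same guest lists as above, repeated so this cell is self-contained.
df_a = pd.DataFrame({
    "GuestName": ["Alice", "Bob", "Charlie", "Denise", "Lola", "Gus", "Frankie", "Gene", "Corrie"],
    "RSVP_A": ["Yes", "No", "Yes", "Yes", "Maybe", "Maybe", "Yes", "No", "Maybe"],
})
df_b = pd.DataFrame({
    "GuestName": ["Bob", "Charlie", "Elena", "Frank", "Denise", "Rob"],
    "RSVP_B": ["Yes", "Yes", "No", "Yes", "Maybe", "Maybe"],
})

# Right join: Bert's list (df_b) is the reference.
right_merge = pd.merge(df_a, df_b, on="GuestName", how="right")

# Left join with the dataframes swapped: df_b is again the reference.
swapped_left = pd.merge(df_b, df_a, on="GuestName", how="left")


def tidy(df):
    """Put columns and rows into a fixed order so two results can be compared."""
    columns = ["GuestName", "RSVP_A", "RSVP_B"]
    return df[columns].sort_values("GuestName").reset_index(drop=True)


# Apart from row and column order, both joins hold the same information.
same = tidy(right_merge).equals(tidy(swapped_left))
print("Right join equals swapped left join:", same)
```

In practice this means either form can be used; many people stick to left joins and simply choose which dataframe to list first.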