# Different Joins

By default, pandas `merge` performs an inner join. Use the `how` parameter to request a different join type, such as `"left"`, `"right"`, or `"outer"`.

In [1]:
import pandas as pd
from pyprojroot import here

In [7]:
users = pd.read_csv(here("data/kaggle-data/user_table.csv"))
reactions = pd.read_csv(here("data/kaggle-data/reactions_table.csv"))
users["User"] = range(1, len(users)+1)

In [34]:
print(f"Number of rows on an inner join: {len(users.merge(reactions, on='User')):,}")
print(f"Number of rows on a right join: {len(users.merge(reactions, on='User', how='right')):,}")


Number of rows on an inner join: 14,042
Number of rows on a right join: 26,365
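
To see how the choice of `how` changes the row count, here is a minimal sketch on two small tables. The frames `toy_users` and `toy_reactions` are hypothetical stand-ins, not the Kaggle data loaded above.

```python
import pandas as pd

# Hypothetical toy tables, not the Kaggle data used in this notebook.
toy_users = pd.DataFrame({"User": [1, 2, 3], "Name": ["Sarah", "Francine", "Hans"]})
toy_reactions = pd.DataFrame(
    {
        "User": [2, 2, 3, 9, 10],
        "Reaction Type": ["Like", "Emoticon", "Like", "Like", "Like"],
    }
)

# Row count for each join type accepted by the `how` parameter.
join_sizes = {
    how: len(toy_users.merge(toy_reactions, on="User", how=how))
    for how in ("inner", "left", "right", "outer")
}
print(join_sizes)
# → {'inner': 3, 'left': 4, 'right': 5, 'outer': 6}
```

The inner join keeps only the three matched rows, the left join adds the one user with no reactions, the right join adds the two reactions with no user, and the outer join keeps all of them.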


## Check Join Integrity with an Outer Join
***

After an outer join, a row with no match on one side has nulls in that side's columns. Counting those nulls gives an estimate of how many rows went unmatched.

In [36]:
user_react = users.merge(reactions, on="User", how="outer")
user_react

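|       | Surname  | Name     | Age  | Subscription Date | User | Reaction Type | Reaction Date |
|-------|----------|----------|------|-------------------|------|---------------|---------------|
| 0     | Smith    | Sarah    | 30.0 | 1.588157e+09      | 1    | NaN           | NaN           |
| 1     | Picard   | Francine | 32.0 | 1.588162e+09      | 2    | NaN           | NaN           |
| 2     | Roth     | Hans     | 40.0 | 1.588157e+09      | 3    | NaN           | NaN           |
| 3     | Pomme    | Ali      | 28.0 | 1.588166e+09      | 4    | NaN           | NaN           |
| 4     | Di Lillo | Jordi    | 42.0 | 1.588156e+09      | 5    | NaN           | NaN           |
| ...   | ...      | ...      | ...  | ...               | ...  | ...           | ...           |
| 27235 | NaN      | NaN      | NaN  | NaN               | 7816 | Like          | 1.588166e+09  |
| 27236 | NaN      | NaN      | NaN  | NaN               | 7816 | Like          | 1.588168e+09  |
| 27237 | NaN      | NaN      | NaN  | NaN               | 7816 | Emoticon      | 1.588165e+09  |
| 27238 | NaN      | NaN      | NaN  | NaN               | 7816 | Like          | 1.588167e+09  |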


In [41]:
# A row is unmatched if the columns from either side of the outer join are null.
mismatches = user_react["Name"].isnull() | user_react["Reaction Type"].isnull()
print(f"{round(mismatches.sum() / len(user_react) * 100, 1)} % of rows were unmatched in this join.")

48.5 % of rows were unmatched in this join.
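
The null check above can miscount if `Name` or `Reaction Type` is genuinely missing in the source data. As an alternative sketch, `merge` accepts `indicator=True`, which adds a `_merge` column recording where each row came from. The toy frames below are hypothetical and stand in for `users` and `reactions`.

```python
import pandas as pd

# Hypothetical toy tables standing in for `users` and `reactions`.
toy_users = pd.DataFrame({"User": [1, 2, 3], "Name": ["Sarah", "Francine", "Hans"]})
toy_reactions = pd.DataFrame(
    {
        "User": [2, 2, 3, 9, 10],
        "Reaction Type": ["Like", "Emoticon", "Like", "Like", "Like"],
    }
)

# indicator=True adds a `_merge` column: "both", "left_only", or "right_only".
checked = toy_users.merge(toy_reactions, on="User", how="outer", indicator=True)

# Break the rows down by match status, then compute the unmatched share.
status_counts = checked["_merge"].value_counts()
unmatched_share = (checked["_merge"] != "both").mean()

print(status_counts)
print(f"{unmatched_share:.1%} of rows were unmatched in this join.")
```

Splitting the count into `left_only` and `right_only` also shows which table the unmatched IDs come from, which a single null count does not.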
