# Advanced Merging

In [29]:
import pandas as pd
from pyprojroot import here

In [30]:
users = pd.read_csv(here("data/kaggle-data/user_table.csv"))
friends = pd.read_csv(here("data/kaggle-data/friends_table.csv"))
users["id"] = range(1, len(users) + 1)
# Drop every friendship whose "Friend 1" ID is 10 or lower, so some users have no match
friends = friends[friends["Friend 1"] > 10]

## Semi Join
***

Pandas does not directly support semi (or anti) joins, but they can be emulated. A semi join takes three steps:

1. Merge the two tables with an inner join.
2. Use `.isin()` to check which keys of the left table appear in the key column of the merged table.
3. Subset the rows of the left table with the boolean Series that `.isin()` returns.

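The result keeps only the columns of the left table, and only the rows whose ID has a match in the right table. Unlike a plain inner join, it never repeats a left row when the right table matches it more than once.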

In [31]:
users_friends = users.merge(friends, left_on="id", right_on="Friend 1", how="inner")
key_search = users["id"].isin(users_friends["id"])
users[key_search]

Unnamed: 0,Surname,Name,Age,Subscription Date,id
10,Kirk,Josie,31,1588166811,11
11,Wellington,Sarah,40,1588160408,12
12,Meier,Francine,32,1588161431,13
13,Pomme,Anna,41,1588168125,14
14,Smith,Zoe,26,1588164495,15
...,...,...,...,...,...
995,Kirk,Lee,19,1588160246,996
996,Pomme,Franz,40,1588159625,997
997,Gwahsi,Thomas,40,1588165504,998
998,Beierlorzer,Jean-Luc,32,1588151074,999
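
Because the final filter only needs to know which keys exist on the right side, the merge step can be skipped when the join uses a single key column. The sketch below compares both routes on small made-up tables (`toy_users` and `toy_friends` are hypothetical stand-ins, not the Kaggle data) and is meant as an illustration rather than a definitive implementation.

```python
import pandas as pd

# Hypothetical toy tables standing in for `users` and `friends`.
toy_users = pd.DataFrame({"id": [1, 2, 3, 4], "Name": ["Sarah", "Hans", "Ali", "Zoe"]})
toy_friends = pd.DataFrame({"Friend 1": [2, 2, 4], "Friend 2": [3, 4, 1]})

# Route 1: inner merge, then filter the left table with .isin().
merged = toy_users.merge(toy_friends, left_on="id", right_on="Friend 1", how="inner")
semi_via_merge = toy_users[toy_users["id"].isin(merged["id"])]

# Route 2: test the left key against the right key directly.
semi_direct = toy_users[toy_users["id"].isin(toy_friends["Friend 1"])]

# The inner merge repeats user 2 (two friendships); the semi join does not.
print(len(merged), len(semi_direct))
print(semi_via_merge.equals(semi_direct))
```

The direct route avoids building the merged table at all, which can matter when the right table is large. The merge-based route generalises more easily to joins on several key columns.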


## Anti Join
***
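An anti join returns the rows of the left table that have no match in the right table. It relies on the `indicator` argument of pandas `merge`, which adds a `_merge` column recording whether each key was found in the left table only, the right table only, or both.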

1. Perform a left join, specifying `indicator=True`.
2. Select the key column of the merged table for the rows where the `_merge` column equals `"left_only"`.
3. Subset the left table to the rows whose `id` `.isin()` those `"left_only"` IDs.
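
To see what `indicator=True` adds before applying it to the real data, here is a minimal sketch on made-up tables (`toy_users` and `toy_friends` are hypothetical names). It only prints the key next to the `_merge` flag that pandas attaches to each merged row.

```python
import pandas as pd

# Hypothetical toy tables: user 1 has no friendship, user 3 has two.
toy_users = pd.DataFrame({"id": [1, 2, 3]})
toy_friends = pd.DataFrame({"Friend 1": [2, 3, 3]})

# A left join keeps every user; `_merge` says where each key was found.
flagged = toy_users.merge(
    toy_friends, left_on="id", right_on="Friend 1", how="left", indicator=True
)
print(flagged[["id", "_merge"]])
```

In a left join the flag is either `"left_only"` (no match on the right) or `"both"`. The third value, `"right_only"`, can only occur in right or outer joins.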

In [36]:
users_friends = users.merge(friends, left_on="id", right_on="Friend 1", how="left", indicator=True)
null_matches = users_friends.loc[users_friends["_merge"] == "left_only", "id"]
users[users["id"].isin(null_matches)]

Unnamed: 0,Surname,Name,Age,Subscription Date,id
0,Smith,Sarah,30,1588157373,1
1,Picard,Francine,32,1588161732,2
2,Roth,Hans,40,1588157337,3
3,Pomme,Ali,28,1588165636,4
4,Di Lillo,Jordi,42,1588156042,5
5,Roth,Anna,26,1588162689,6
6,Kirk,Jordi,56,1588153009,7
7,Beierlorzer,Josie,20,1588166376,8
8,Picard,Robert,39,1588158173,9
9,Meier,Jean-Luc,37,1588156009,10
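
As with the semi join, a single-key anti join can also be written by negating `.isin()` on the right table's key. The sketch below checks on made-up tables (`toy_users` and `toy_friends` are hypothetical) that this shortcut agrees with the `indicator` approach, and that the semi and anti joins together split the left table into two disjoint parts.

```python
import pandas as pd

# Hypothetical toy tables standing in for `users` and `friends`.
toy_users = pd.DataFrame({"id": [1, 2, 3, 4], "Name": ["Sarah", "Hans", "Ali", "Zoe"]})
toy_friends = pd.DataFrame({"Friend 1": [2, 2, 4]})

# Route 1: left merge with indicator, then keep the "left_only" IDs.
merged = toy_users.merge(
    toy_friends, left_on="id", right_on="Friend 1", how="left", indicator=True
)
left_only_ids = merged.loc[merged["_merge"] == "left_only", "id"]
anti_via_merge = toy_users[toy_users["id"].isin(left_only_ids)]

# Route 2: negate the membership test against the right key.
has_match = toy_users["id"].isin(toy_friends["Friend 1"])
anti_direct = toy_users[~has_match]
semi_direct = toy_users[has_match]

print(anti_via_merge.equals(anti_direct))
# Every left row lands in exactly one of the two results.
print(len(semi_direct) + len(anti_direct) == len(toy_users))
```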
