* [pd.merge(how='left')](#left) 
* [pd.merge(how='right')](#right)
* [pd.merge(how='outer')](#outer)
* [Merge on an Index](#mergeOnIndex)
* [Merge on different column names](#merge_on_different_column_names)
* [Merge when having duplicate columns](#merge_duplicate)

<a id='left'></a>
__`pd.merge(how='left')`__

In [11]:
import pandas as pd

__!!! Note__ that the order in which the tables are passed as arguments matters here.
<br>The first table passed in is the left one and the second table passed in is the right one.

In [13]:
registrations = pd.DataFrame({'reg_id':[1,2,3,4],'name':['Andrew','Bobo','Claire','David']})
logins = pd.DataFrame({'log_id':[1,2,3,4],'name':['Xavier','Andrew','Yolanda','Bobo']})

In [14]:
registrations

Unnamed: 0,reg_id,name
0,1,Andrew
1,2,Bobo
2,3,Claire
3,4,David


In [15]:
logins

Unnamed: 0,log_id,name
0,1,Xavier
1,2,Andrew
2,3,Yolanda
3,4,Bobo


With how='left', every name in the left-hand table (registrations) is kept in the result. Names with no match in logins get NaN in log_id, which is also why that column is shown as a float.

In [6]:
pd.merge(registrations, logins, how='left', on='name')

Unnamed: 0,reg_id,name,log_id
0,1,Andrew,2.0
1,2,Bobo,4.0
2,3,Claire,
3,4,David,
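One way to use this result is to pick out the registrations that have no login at all, by filtering on the missing log_id. This is a minimal sketch that assumes the same two tables as above; the names `merged` and `never_logged_in` are illustrative only.

```python
import pandas as pd

registrations = pd.DataFrame({'reg_id': [1, 2, 3, 4],
                              'name': ['Andrew', 'Bobo', 'Claire', 'David']})
logins = pd.DataFrame({'log_id': [1, 2, 3, 4],
                       'name': ['Xavier', 'Andrew', 'Yolanda', 'Bobo']})

# Keep every registration; unmatched rows get NaN in log_id
merged = pd.merge(registrations, logins, how='left', on='name')

# Rows where log_id is missing had no match in the logins table
never_logged_in = merged[merged['log_id'].isna()]
print(never_logged_in['name'].tolist())
```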


___

<a id='right'></a>

__`pd.merge(how='right')`__

In [9]:
pd.merge(registrations, logins, how='right', on='name')

Unnamed: 0,reg_id,name,log_id
0,,Xavier,1
1,1.0,Andrew,2
2,,Yolanda,3
3,2.0,Bobo,4


With how='right', every name in the right-hand table (logins) is present in the result, and names with no match in registrations get NaN in reg_id.
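Because left and right are decided purely by argument order, a right merge should give the same rows as a left merge with the two tables swapped; only the column order differs. The sketch below checks this for the two example tables (the names `right` and `swapped` are illustrative).

```python
import pandas as pd

registrations = pd.DataFrame({'reg_id': [1, 2, 3, 4],
                              'name': ['Andrew', 'Bobo', 'Claire', 'David']})
logins = pd.DataFrame({'log_id': [1, 2, 3, 4],
                       'name': ['Xavier', 'Andrew', 'Yolanda', 'Bobo']})

# Right merge: keep every row of logins
right = pd.merge(registrations, logins, how='right', on='name')

# Left merge with the arguments swapped: logins is now the left table
swapped = pd.merge(logins, registrations, how='left', on='name')

# Reorder the columns of the swapped result before comparing
same = right.equals(swapped[right.columns])
print(same)
```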

___

<a id='outer'></a>
__`pd.merge(how='outer')`__

An outer merge keeps every row from both tables, whether or not it has a match in the other one.

Some names appear in only one table. With how='outer' we make sure we grab all names from both tables; the missing side is filled with NaN.

In [16]:
pd.merge(registrations, logins, how='outer', on='name')

Unnamed: 0,reg_id,name,log_id
0,1.0,Andrew,2.0
1,2.0,Bobo,4.0
2,3.0,Claire,
3,4.0,David,
4,,Xavier,1.0
5,,Yolanda,3.0
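To see which table each row of an outer merge came from, `pd.merge` accepts `indicator=True`, which adds a `_merge` column with the values `left_only`, `right_only`, or `both`. A small sketch, assuming the same two tables as above:

```python
import pandas as pd

registrations = pd.DataFrame({'reg_id': [1, 2, 3, 4],
                              'name': ['Andrew', 'Bobo', 'Claire', 'David']})
logins = pd.DataFrame({'log_id': [1, 2, 3, 4],
                       'name': ['Xavier', 'Andrew', 'Yolanda', 'Bobo']})

# indicator=True adds a categorical '_merge' column describing each row's origin
outer = pd.merge(registrations, logins, how='outer', on='name', indicator=True)
print(outer[['name', '_merge']])

# Names found only in registrations, only in logins, or in both tables
only_registered = set(outer.loc[outer['_merge'] == 'left_only', 'name'])
only_logged_in = set(outer.loc[outer['_merge'] == 'right_only', 'name'])
in_both = set(outer.loc[outer['_merge'] == 'both', 'name'])
```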


____

<a id='mergeOnIndex'></a>
__`Merge on an Index`__

In [17]:
registrations = registrations.set_index('name')
registrations

name,reg_id
Andrew,1
Bobo,2
Claire,3
David,4


In [18]:
logins

Unnamed: 0,log_id,name
0,1,Xavier
1,2,Andrew
2,3,Yolanda
3,4,Bobo


Because name is now the index of registrations rather than a column, we have to specify that we want to join __on the index of registrations__ and __on the name column of logins__.

In [21]:
pd.merge(registrations, logins, left_index=True, right_on='name', how='inner')

Unnamed: 0,reg_id,log_id,name
1,1,2,Andrew
3,2,4,Bobo
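If both tables are indexed by name, `DataFrame.join` is a possible alternative, since it joins on the index by default. The following is only a sketch of that variant, not the approach used in this notebook; `reg_by_name`, `log_by_name`, and `joined` are illustrative names.

```python
import pandas as pd

registrations = pd.DataFrame({'reg_id': [1, 2, 3, 4],
                              'name': ['Andrew', 'Bobo', 'Claire', 'David']})
logins = pd.DataFrame({'log_id': [1, 2, 3, 4],
                       'name': ['Xavier', 'Andrew', 'Yolanda', 'Bobo']})

# Move the key into the index on both sides
reg_by_name = registrations.set_index('name')
log_by_name = logins.set_index('name')

# join() matches on the index; how='inner' keeps only names present in both
joined = reg_by_name.join(log_by_name, how='inner')
print(joined)
```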


___

<a id='merge_on_different_column_names'></a>
__`Merge on different column names`__

In [23]:
registrations = registrations.reset_index()

In [24]:
registrations

Unnamed: 0,name,reg_id
0,Andrew,1
1,Bobo,2
2,Claire,3
3,David,4


In [25]:
registrations.columns = ['reg_name', 'reg_id']

In [29]:
registrations  # reg_name and the name column of logins hold the same kind of value, but no longer share a column name.

Unnamed: 0,reg_name,reg_id
0,Andrew,1
1,Bobo,2
2,Claire,3
3,David,4


In [30]:
logins

Unnamed: 0,log_id,name
0,1,Xavier
1,2,Andrew
2,3,Yolanda
3,4,Bobo


In [31]:
pd.merge(registrations, logins, how='inner', left_on='reg_name', right_on='name')

Unnamed: 0,reg_name,reg_id,log_id,name
0,Andrew,1,2,Andrew
1,Bobo,2,4,Bobo


__!!! Something to keep in mind__ here is that the result contains both reg_name and name, even though they hold the same values. Since one of them is redundant, you can drop it afterwards.

In [32]:
results = pd.merge(registrations, logins, how='inner', left_on='reg_name', right_on='name')
results.drop('reg_name', axis=1)

Unnamed: 0,reg_id,log_id,name
0,1,2,Andrew
1,2,4,Bobo
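Another option is to rename the key column before merging, so that both tables share the name and the result carries a single key column. A minimal sketch, assuming the renamed registrations table from this section; `renamed` and `results` are illustrative names.

```python
import pandas as pd

registrations = pd.DataFrame({'reg_name': ['Andrew', 'Bobo', 'Claire', 'David'],
                              'reg_id': [1, 2, 3, 4]})
logins = pd.DataFrame({'log_id': [1, 2, 3, 4],
                       'name': ['Xavier', 'Andrew', 'Yolanda', 'Bobo']})

# Align the key column name first; rename() returns a copy and leaves registrations unchanged
renamed = registrations.rename(columns={'reg_name': 'name'})

# With a shared key name, on='name' yields a single 'name' column in the result
results = pd.merge(renamed, logins, how='inner', on='name')
print(results)
```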


___

<a id='merge_duplicate'></a>
__`Merge when having duplicate columns`__

Sometimes both tables have columns with the exact same name that you are not joining on.

In [34]:
registrations

Unnamed: 0,reg_name,reg_id
0,Andrew,1
1,Bobo,2
2,Claire,3
3,David,4


In [35]:
registrations.columns = ['name', 'id']

In [36]:
logins

Unnamed: 0,log_id,name
0,1,Xavier
1,2,Andrew
2,3,Yolanda
3,4,Bobo


In [37]:
logins.columns = ['id', 'name']

In [38]:
registrations

Unnamed: 0,name,id
0,Andrew,1
1,Bobo,2
2,Claire,3
3,David,4


In [40]:
logins

Unnamed: 0,id,name
0,1,Xavier
1,2,Andrew
2,3,Yolanda
3,4,Bobo


If we merge these tables on name, the result would otherwise contain __two columns with the exact same name__, 'id'. To keep them apart, pandas __automatically relabels__ them.

In [41]:
pd.merge(registrations, logins, how='inner', on='name')

Unnamed: 0,name,id_x,id_y
0,Andrew,1,2
1,Bobo,2,4


When you run this, you'll notice pandas __automatically__ tags these duplicate columns with a suffix: by default __`_x` for the left table__ and __`_y` for the right table__.

You can also provide your __own tuple of suffixes__, one for each table.



In [42]:
pd.merge(registrations, logins, how='inner', on='name', suffixes=('_reg', '_log'))

Unnamed: 0,name,id_reg,id_log
0,Andrew,1,2
1,Bobo,2,4
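The pandas documentation also describes passing `None` for one side of `suffixes`, which leaves that table's column name unchanged. The sketch below assumes a pandas version that supports this, and the tables are rebuilt with the renamed columns from this section.

```python
import pandas as pd

registrations = pd.DataFrame({'name': ['Andrew', 'Bobo', 'Claire', 'David'],
                              'id': [1, 2, 3, 4]})
logins = pd.DataFrame({'id': [1, 2, 3, 4],
                       'name': ['Xavier', 'Andrew', 'Yolanda', 'Bobo']})

# None keeps the left 'id' as-is; only the right 'id' gets a suffix
merged = pd.merge(registrations, logins, how='inner', on='name',
                  suffixes=(None, '_log'))
print(merged.columns.tolist())
```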
