# Merging
This script demonstrates the four types of joins (inner, outer, left, and right) to make joining (i.e., merging) dataframes easier to understand.

In [1]:
import pandas as pd

Load the data (fictional).

In [2]:
df1 = pd.read_csv("https://raw.githubusercontent.com/nmbrodnax/ppol-565/master/misc_resources/merging/df1.csv")
df2 = pd.read_csv("https://raw.githubusercontent.com/nmbrodnax/ppol-565/master/misc_resources/merging/df2.csv")

In [3]:
df1.head()

Unnamed: 0,Name,Age,Num1
0,Person1,25,0.5
1,Person2,37,2.1
2,Person3,51,9.5
3,Person5,64,7.3
4,Person7,15,6.4


In [4]:
df2.head()

Unnamed: 0,Name,Fruit
0,Person1,apple
1,Person2,orange
2,Person3,strawberry
3,Person4,grape
4,Person6,kiwi


Across the two dataframes there are seven unique people: three appear in both df1 and df2, two appear only in df1, and two appear only in df2.
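
The overlap between the two dataframes can be checked with set operations on the Name column. This is a minimal sketch that assumes each dataframe holds only the five rows shown by `head()` above; the frames are rebuilt by hand here so the example runs without downloading the CSV files, and the names `in_both`, `only_df1`, and `only_df2` are illustrative.

```python
import pandas as pd

# Assumption: df1 and df2 contain only the rows displayed by head() above.
df1 = pd.DataFrame({
    "Name": ["Person1", "Person2", "Person3", "Person5", "Person7"],
    "Age": [25, 37, 51, 64, 15],
    "Num1": [0.5, 2.1, 9.5, 7.3, 6.4],
})
df2 = pd.DataFrame({
    "Name": ["Person1", "Person2", "Person3", "Person4", "Person6"],
    "Fruit": ["apple", "orange", "strawberry", "grape", "kiwi"],
})

names1 = set(df1["Name"])
names2 = set(df2["Name"])

all_names = names1 | names2          # everyone who appears in either dataframe
in_both = sorted(names1 & names2)    # Names an inner join would keep
only_df1 = sorted(names1 - names2)   # Names with no match in df2
only_df2 = sorted(names2 - names1)   # Names with no match in df1

print(len(all_names), in_both, only_df1, only_df2)
```

These three groups are what distinguish the join types below: each join keeps a different combination of them.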

## Pandas merge documentation

In [5]:
?pd.merge

Pandas merge user guide: https://pandas.pydata.org/pandas-docs/stable/user_guide/merging.html#database-style-dataframe-or-named-series-joining-merging 

## Inner Join
Keeps only the Names that appear in both df1 and df2, and combines the variables from both dataframes for those Names.

In [6]:
inner = pd.merge(df1, df2, how="inner", on="Name")

In [7]:
inner.head()

Unnamed: 0,Name,Age,Num1,Fruit
0,Person1,25,0.5,apple
1,Person2,37,2.1,orange
2,Person3,51,9.5,strawberry


## Outer Join
Keeps all of the Names and variables from both dataframes. Notice the missing data: a Name found in only one dataframe has no values for the other dataframe's variables.

In [8]:
outer = pd.merge(df1, df2, how="outer", on="Name")

In [9]:
outer.head(7)

Unnamed: 0,Name,Age,Num1,Fruit
0,Person1,25.0,0.5,apple
1,Person2,37.0,2.1,orange
2,Person3,51.0,9.5,strawberry
3,Person5,64.0,7.3,
4,Person7,15.0,6.4,
5,Person4,,,grape
6,Person6,,,kiwi
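
To see why particular rows of an outer join have missing values, `pd.merge` accepts `indicator=True`, which adds a `_merge` column recording whether each row came from the left dataframe only, the right dataframe only, or both. The sketch below again assumes the dataframes hold only the rows shown earlier and rebuilds them by hand; `outer_flagged` and `counts` are illustrative names.

```python
import pandas as pd

# Assumption: df1 and df2 contain only the rows displayed by head() earlier.
df1 = pd.DataFrame({
    "Name": ["Person1", "Person2", "Person3", "Person5", "Person7"],
    "Age": [25, 37, 51, 64, 15],
    "Num1": [0.5, 2.1, 9.5, 7.3, 6.4],
})
df2 = pd.DataFrame({
    "Name": ["Person1", "Person2", "Person3", "Person4", "Person6"],
    "Fruit": ["apple", "orange", "strawberry", "grape", "kiwi"],
})

# indicator=True adds a "_merge" column: "left_only", "right_only", or "both"
outer_flagged = pd.merge(df1, df2, how="outer", on="Name", indicator=True)

# Count how many rows fall into each group
counts = outer_flagged["_merge"].astype(str).value_counts().to_dict()

print(outer_flagged[["Name", "_merge"]])
print(counts)
```

Rows flagged `left_only` are the ones missing Fruit, and rows flagged `right_only` are the ones missing Age and Num1. The row order of an outer join can differ between pandas versions, so the sketch counts rows rather than relying on their position.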


## Left Join
Joins data from df2 to df1, keeping only the Names that are in df1. Again, take note of the missing data and the order in which the variables appear in the dataframe.

Note: left and right contain the same data; only the order of the columns differs. Take note of the order in which the dataframes are specified and which type of join is used.

In [10]:
left = pd.merge(df1, df2, how="left", on="Name")
right = pd.merge(df2, df1, how="right", on="Name")

In [11]:
left.head()

Unnamed: 0,Name,Age,Num1,Fruit
0,Person1,25,0.5,apple
1,Person2,37,2.1,orange
2,Person3,51,9.5,strawberry
3,Person5,64,7.3,
4,Person7,15,6.4,


In [12]:
right.head()

Unnamed: 0,Name,Fruit,Age,Num1
0,Person1,apple,25,0.5
1,Person2,orange,37,2.1
2,Person3,strawberry,51,9.5
3,Person5,,64,7.3
4,Person7,,15,6.4
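
The claim that the two results hold the same data can be checked directly. The following is a rough sketch, assuming the dataframes contain only the rows shown earlier (rebuilt by hand here): it reorders the columns of `right` to match `left`, sorts both by Name so that row order does not matter, and compares them with `DataFrame.equals`. The names `left_aligned` and `right_aligned` are illustrative.

```python
import pandas as pd

# Assumption: df1 and df2 contain only the rows displayed by head() earlier.
df1 = pd.DataFrame({
    "Name": ["Person1", "Person2", "Person3", "Person5", "Person7"],
    "Age": [25, 37, 51, 64, 15],
    "Num1": [0.5, 2.1, 9.5, 7.3, 6.4],
})
df2 = pd.DataFrame({
    "Name": ["Person1", "Person2", "Person3", "Person4", "Person6"],
    "Fruit": ["apple", "orange", "strawberry", "grape", "kiwi"],
})

left = pd.merge(df1, df2, how="left", on="Name")
right = pd.merge(df2, df1, how="right", on="Name")

# The column order follows the order in which the dataframes were passed
same_column_order = list(left.columns) == list(right.columns)

# Put right's columns in left's order, then align rows by Name
left_aligned = left.sort_values("Name").reset_index(drop=True)
right_aligned = right[left.columns].sort_values("Name").reset_index(drop=True)

# equals() treats missing values in the same positions as equal
same_data = left_aligned.equals(right_aligned)

print(same_column_order, same_data)
```

The same check applies to any pair of joins in which the dataframes and the join direction are both swapped.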


## Right Join
Joins data from df1 to df2, keeping only the Names that are in df2. Again, take note of the missing data and the order in which the variables appear in the dataframe.

Note: right2 and left2 contain the same data; only the order of the columns differs. Take note of the order in which the dataframes are specified and which type of join is used.

In [13]:
right2 = pd.merge(df1, df2, how="right", on="Name")
left2 = pd.merge(df2, df1, how="left", on="Name")

In [14]:
right2.head()

Unnamed: 0,Name,Age,Num1,Fruit
0,Person1,25.0,0.5,apple
1,Person2,37.0,2.1,orange
2,Person3,51.0,9.5,strawberry
3,Person4,,,grape
4,Person6,,,kiwi


In [15]:
left2.head()

Unnamed: 0,Name,Fruit,Age,Num1
0,Person1,apple,25.0,0.5
1,Person2,orange,37.0,2.1
2,Person3,strawberry,51.0,9.5
3,Person4,grape,,
4,Person6,kiwi,,


<b>Note</b>: if the variable you are joining on has a different name in each dataset, specify both left_on="name1" and right_on="name2" instead of on.
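
As an illustration of joining on differently named variables, the sketch below renames the key column of df2 to a hypothetical `Person` and merges with `left_on` and `right_on`. The dataframes are rebuilt by hand on the assumption that they hold only the rows shown earlier; `df2_renamed`, `merged_raw`, and `merged` are illustrative names.

```python
import pandas as pd

# Assumption: df1 and df2 contain only the rows displayed by head() earlier.
df1 = pd.DataFrame({
    "Name": ["Person1", "Person2", "Person3", "Person5", "Person7"],
    "Age": [25, 37, 51, 64, 15],
    "Num1": [0.5, 2.1, 9.5, 7.3, 6.4],
})
df2 = pd.DataFrame({
    "Name": ["Person1", "Person2", "Person3", "Person4", "Person6"],
    "Fruit": ["apple", "orange", "strawberry", "grape", "kiwi"],
})

# Hypothetical scenario: the key column in the second dataframe is called "Person"
df2_renamed = df2.rename(columns={"Name": "Person"})

# left_on names the key in the first dataframe, right_on the key in the second
merged_raw = pd.merge(
    df1, df2_renamed, how="inner", left_on="Name", right_on="Person"
)

# Both key columns are kept in the result, so drop the redundant one
merged = merged_raw.drop(columns="Person")

print(merged)
```

Because the key columns have different names, pandas keeps both of them in the result, which is why the redundant one is dropped afterwards.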