## Merging data frames

Data frames can be merged in two ways: *horizontally* (also called a join) and *vertically* (also called concatenation).

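There are 4 types of joins (merges) that can be performed:
1. Outer Join (union): keeps the rows of both data frames
2. Inner Join (intersection): keeps only the rows present in both data frames
3. Left Join: keeps all the rows of the first (left) data frame
4. Right Join: keeps all the rows of the second (right) data frame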

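For two data frames A and B, with A passed first (the left data frame), a left join keeps every row of A and adds the columns of B only for rows whose key also appears in B. Where there is no match, the columns coming from B are filled with `NaN`. A right join does the same with the roles of A and B swapped.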
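To make the difference between the four join types concrete, here is a minimal sketch on two small, made-up frames (`A`, `B`, and the columns `key`, `a_val`, `b_val` are illustrative and not part of the notebook's data). It uses the `indicator=True` option of `pd.merge`, which adds a `_merge` column saying whether each row came from the left frame only, the right frame only, or both.

```python
import pandas as pd

# Hypothetical frames sharing a 'key' column (illustration only)
A = pd.DataFrame({'key': ['a', 'b', 'c'], 'a_val': [1, 2, 3]})
B = pd.DataFrame({'key': ['b', 'c', 'd'], 'b_val': [20, 30, 40]})

# Left join: every row of A is kept; B contributes values only where 'key' matches
left = pd.merge(A, B, how='left', on='key', indicator=True)
print(left)

# Number of rows produced by each join type on the same inputs
sizes = {how: len(pd.merge(A, B, how=how, on='key'))
         for how in ['outer', 'inner', 'left', 'right']}
print(sizes)
```

The row for key `a` has no partner in `B`, so its `b_val` is `NaN` and its `_merge` value is `left_only`.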

In [7]:
import pandas as pd

staffs = pd.DataFrame([
    {'Name':'Kelly', 'Role':'Director of HR'},
    {'Name':'Sally', 'Role':'Course liason'},
    {'Name':'James', 'Role':'Grader'},
])

students = pd.DataFrame([
    {'Name':'James', 'School':'Business'},
    {'Name':'Sally', 'School':'Law'},
    {'Name':'Mike', 'School':'Engineering'},
])

staffs = staffs.set_index('Name')
students = students.set_index('Name')
# Note that both data frames are now indexed by 'Name'

print(staffs.head())
print(students.head())


                 Role
Name                 
Kelly  Director of HR
Sally   Course liason
James          Grader
            School
Name              
James     Business
Sally          Law
Mike   Engineering


In [8]:
# Outer join
pd.merge(staffs, students, how='outer', left_index=True, right_index=True)

Name,Role,School
James,Grader,Business
Kelly,Director of HR,
Mike,,Engineering
Sally,Course liason,Law


In [9]:
# Inner join
pd.merge(staffs, students, how='inner', left_index=True, right_index=True)

Name,Role,School
Sally,Course liason,Law
James,Grader,Business


In [10]:
# Left join
pd.merge(staffs, students, how='left', left_index=True, right_index=True)

Name,Role,School
Kelly,Director of HR,
Sally,Course liason,Law
James,Grader,Business


In [11]:
# Right join
pd.merge(staffs, students, how='right', left_index=True, right_index=True)

Name,Role,School
James,Grader,Business
Sally,Course liason,Law
Mike,,Engineering


We don't need **indices** to join data frames; we can also join on **columns** by passing the **on** parameter.

In [12]:
staffs = staffs.reset_index()
students = students.reset_index()


pd.merge(staffs, students, how='right', on='Name')

,Name,Role,School
0,Sally,Course liason,Law
1,James,Grader,Business
2,Mike,,Engineering
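The `on` parameter assumes the key column has the same name in both frames. When the names differ, `pd.merge` also accepts `left_on` and `right_on`. The following is a minimal sketch; the column names `StaffName` and `StudentName` and the variable names are hypothetical, chosen only to show the case where the key columns are named differently.

```python
import pandas as pd

# Hypothetical frames whose key columns have different names
staff_df = pd.DataFrame({
    'StaffName': ['Kelly', 'Sally', 'James'],
    'Role': ['Director of HR', 'Course liaison', 'Grader'],
})
student_df = pd.DataFrame({
    'StudentName': ['James', 'Sally', 'Mike'],
    'School': ['Business', 'Law', 'Engineering'],
})

# Match staff_df['StaffName'] against student_df['StudentName']
merged = pd.merge(staff_df, student_df, how='inner',
                  left_on='StaffName', right_on='StudentName')
print(merged)
```

Both key columns are kept in the result, so one of them can be dropped afterwards if it is redundant.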


In [13]:
# conflicting columns
staffs = pd.DataFrame([
    {'Name':'Kelly', 'Role':'Director of HR', 'Location':'Jane Street'},
    {'Name':'Sally', 'Role':'Course liason', 'Location':'Western Ave'},
    {'Name':'James', 'Role':'Grader','Location':'Wilson Ave'},
])

students = pd.DataFrame([
    {'Name':'James', 'School':'Business','Location':'House #22 Woodward Street'},
    {'Name':'Sally', 'School':'Law','Location':'House #2 Jane Street'},
    {'Name':'Mike', 'School':'Engineering','Location':'House #12 183rd Street'},
])


# Here 'Location' in the staffs df is the office location, whereas 'Location' in the students df is the home address.
# pandas resolves this conflict by appending the suffixes _x (left) and _y (right) to the conflicting columns.


pd.merge(staffs, students, how='left', on='Name')

,Name,Role,Location_x,School,Location_y
0,Kelly,Director of HR,Jane Street,,
1,Sally,Course liason,Western Ave,Law,House #2 Jane Street
2,James,Grader,Wilson Ave,Business,House #22 Woodward Street
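The default `_x` and `_y` suffixes do not say much about what each column holds. `pd.merge` takes a `suffixes` parameter to replace them. A minimal sketch on a cut-down version of the data above follows; the suffix names `_office` and `_home` are only a suggestion.

```python
import pandas as pd

staffs = pd.DataFrame([
    {'Name': 'Kelly', 'Role': 'Director of HR', 'Location': 'Jane Street'},
    {'Name': 'Sally', 'Role': 'Course liaison', 'Location': 'Western Ave'},
])
students = pd.DataFrame([
    {'Name': 'Sally', 'School': 'Law', 'Location': 'House #2 Jane Street'},
])

# Replace the default _x/_y suffixes with descriptive ones
merged = pd.merge(staffs, students, how='left', on='Name',
                  suffixes=('_office', '_home'))
print(merged.columns.tolist())
```

The suffixes are applied only to column names that clash; `Role` and `School` are left unchanged.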


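We can also pass a list to the **on** parameter to join on several columns at once, for example ```on=['FirstName','LastName']```. Rows then match only when the values in all the listed columns are equal.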
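A minimal sketch of a multi-column join, using made-up names (the people and the last names below are invented for illustration). Two staff members share the first name James, so joining on `FirstName` alone would be ambiguous; joining on both columns matches only the right person.

```python
import pandas as pd

# Hypothetical data: two different people named James
staffs = pd.DataFrame({
    'FirstName': ['James', 'James', 'Sally'],
    'LastName': ['Hammond', 'Wilde', 'Brooks'],
    'Role': ['Grader', 'Director of HR', 'Course liaison'],
})
students = pd.DataFrame({
    'FirstName': ['James', 'Sally', 'Mike'],
    'LastName': ['Hammond', 'Brooks', 'Smith'],
    'School': ['Business', 'Law', 'Engineering'],
})

# A row matches only if FirstName AND LastName are both equal
merged = pd.merge(staffs, students, how='inner', on=['FirstName', 'LastName'])
print(merged)
```

James Wilde is dropped from the inner join because no student has that exact first and last name.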


#### Merging vertically (concatenating)

We can concatenate data frames vertically with ```pd.concat(frames)```,
where *frames* is the list of data frames to be concatenated. We can also label the rows that came from each frame by passing keys: ```pd.concat(frames, keys=['2001','2002','2003'])```. The keys become an extra outer level of the index.
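A minimal sketch of both forms of `pd.concat`. The three yearly frames and their `Name` and `Score` columns are hypothetical and stand in for whatever *frames* holds.

```python
import pandas as pd

# Hypothetical per-year frames with identical columns
df_2001 = pd.DataFrame({'Name': ['Kelly', 'Sally'], 'Score': [81, 92]})
df_2002 = pd.DataFrame({'Name': ['James'], 'Score': [77]})
df_2003 = pd.DataFrame({'Name': ['Mike', 'Sally'], 'Score': [68, 95]})
frames = [df_2001, df_2002, df_2003]

# Plain vertical stack; ignore_index=True renumbers the rows 0..n-1
stacked = pd.concat(frames, ignore_index=True)
print(stacked)

# With keys, each source frame gets its own label in an outer index level
keyed = pd.concat(frames, keys=['2001', '2002', '2003'])
print(keyed.loc['2002'])  # only the rows that came from df_2002
```

Without `ignore_index=True`, the original row labels are kept, so the stacked frame would contain repeated index values such as 0 and 1.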