# Combining DataFrames

Methods:
1. Concatenation
2. Merge
   1. Inner Merge
   2. Left Merge
   3. Right Merge
   4. Outer Merge

In [1]:
import pandas as pd
import numpy as np
import warnings

warnings.filterwarnings('ignore')

## Concatenation

Concatenation simply pastes two DataFrames together side by side, by columns (`axis=1`).

We can also concatenate by rows (`axis=0`), stacking one DataFrame on top of the other.

In [2]:
data_one = {'A': ['A0', 'A1', 'A2', 'A3'], 'B': ['B0', 'B1', 'B2', 'B3']}

In [3]:
data_two = {'C': ['C0', 'C1', 'C2', 'C3'], 'D': ['D0', 'D1', 'D2', 'D3']}

In [4]:
dataframe_one = pd.DataFrame(data_one)

In [5]:
dataframe_two = pd.DataFrame(data_two)

In [6]:
dataframe_one

Unnamed: 0,A,B
0,A0,B0
1,A1,B1
2,A2,B2
3,A3,B3


In [7]:
dataframe_two

Unnamed: 0,C,D
0,C0,D0
1,C1,D1
2,C2,D2
3,C3,D3


### Concatenating by Columns

In [8]:
pd.concat([dataframe_one, dataframe_two], axis=1)

Unnamed: 0,A,B,C,D
0,A0,B0,C0,D0
1,A1,B1,C1,D1
2,A2,B2,C2,D2
3,A3,B3,C3,D3


### Concatenating by Rows

In [9]:
pd.concat([dataframe_one, dataframe_two], axis=0)

Unnamed: 0,A,B,C,D
0,A0,B0,,
1,A1,B1,,
2,A2,B2,,
3,A3,B3,,
0,,,C0,D0
1,,,C1,D1
2,,,C2,D2
3,,,C3,D3


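**Renaming columns so they match in both DataFrames**

Row-wise concatenation aligns on column names, so the mismatched columns above were filled with NaN. Giving both DataFrames the same column names lets the rows stack cleanly.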

In [10]:
dataframe_two.columns = dataframe_one.columns

In [11]:
dataframe_two

Unnamed: 0,A,B
0,C0,D0
1,C1,D1
2,C2,D2
3,C3,D3


In [12]:
pd.concat([dataframe_one, dataframe_two], axis=0)

Unnamed: 0,A,B
0,A0,B0
1,A1,B1
2,A2,B2
3,A3,B3
0,C0,D0
1,C1,D1
2,C2,D2
3,C3,D3


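**Correcting the index**

The concatenated DataFrame keeps each input's original index, so the labels 0 to 3 appear twice. Reassigning the index gives every row a unique label.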

In [13]:
concat_dataframe = pd.concat([dataframe_one, dataframe_two], axis=0)
concat_dataframe

Unnamed: 0,A,B
0,A0,B0
1,A1,B1
2,A2,B2
3,A3,B3
0,C0,D0
1,C1,D1
2,C2,D2
3,C3,D3


In [14]:
concat_dataframe.index = range(len(concat_dataframe))
concat_dataframe

Unnamed: 0,A,B
0,A0,B0
1,A1,B1
2,A2,B2
3,A3,B3
4,C0,D0
5,C1,D1
6,C2,D2
7,C3,D3
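
As an alternative to reassigning the index by hand, `pd.concat` accepts `ignore_index=True`, which discards the input indexes and numbers the result from 0. The following is a minimal sketch that rebuilds the two DataFrames from this section with matching column names; the variable names `stacked` and `renumbered` are only illustrative.

```python
import pandas as pd

# Same data as above, with matching column names in both frames.
dataframe_one = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
                              'B': ['B0', 'B1', 'B2', 'B3']})
dataframe_two = pd.DataFrame({'A': ['C0', 'C1', 'C2', 'C3'],
                              'B': ['D0', 'D1', 'D2', 'D3']})

# ignore_index=True drops the original 0-3 labels and assigns 0..7.
stacked = pd.concat([dataframe_one, dataframe_two], axis=0, ignore_index=True)

# reset_index(drop=True) renumbers a frame that was already concatenated.
renumbered = pd.concat([dataframe_one, dataframe_two], axis=0).reset_index(drop=True)

print(stacked.index.tolist())
print(stacked.equals(renumbered))
```

Either form avoids mutating `concat_dataframe.index` after the fact, which can be convenient when the concatenation sits inside a longer chain of operations.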


## Merge

Often DataFrames are not in exactly the same order or format, meaning we cannot simply concatenate them together.

In this case, we need to merge the DataFrames on a shared key column.

There are three main ways of merging tables together, selected with the **how** parameter:
1. Inner
2. Outer
3. Left or Right

In [15]:
registrations = pd.DataFrame({'reg_id': [1, 2, 3, 4], 'name': ['Andrew', 'Bobo', 'Claire', 'David']})

In [16]:
logins = pd.DataFrame({'log_id': [1, 2, 3, 4], 'name': ['Xavier', 'Andrew', 'Yolanda', 'Bobo']})

In [17]:
registrations

Unnamed: 0,reg_id,name
0,1,Andrew
1,2,Bobo
2,3,Claire
3,4,David


In [18]:
logins

Unnamed: 0,log_id,name
0,1,Xavier
1,2,Andrew
2,3,Yolanda
3,4,Bobo
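
Before looking at each option in turn, it may help to see how many rows each value of `how` produces for these two tables. The sketch below recreates `registrations` and `logins` so that it runs on its own; `row_counts` is a hypothetical name used only here.

```python
import pandas as pd

registrations = pd.DataFrame({'reg_id': [1, 2, 3, 4],
                              'name': ['Andrew', 'Bobo', 'Claire', 'David']})
logins = pd.DataFrame({'log_id': [1, 2, 3, 4],
                       'name': ['Xavier', 'Andrew', 'Yolanda', 'Bobo']})

# Number of rows produced by each join type when merging on 'name'.
row_counts = {
    how: len(pd.merge(registrations, logins, how=how, on='name'))
    for how in ['inner', 'left', 'right', 'outer']
}
print(row_counts)  # {'inner': 2, 'left': 4, 'right': 4, 'outer': 6}
```

Two names appear in both tables, four in each table individually, and six across both, which is where the counts come from.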


### Inner Merge

Keep only the rows whose key is present in BOTH tables. The join itself introduces no NaNs, because every row in the result has matching data in both tables.

Only Andrew and Bobo both registered and logged in.

In [19]:
pd.merge(registrations, logins, how='inner', on='name')

Unnamed: 0,reg_id,name,log_id
0,1,Andrew,2
1,2,Bobo,4


In [20]:
pd.merge(registrations, logins, how='inner')

Unnamed: 0,reg_id,name,log_id
0,1,Andrew,2
1,2,Bobo,4
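
The two cells above give the same result because, when `on` is omitted, `pd.merge` joins on the column names the two DataFrames have in common. A small sketch of that behaviour, with the tables recreated so it runs standalone (`shared`, `explicit` and `implicit` are illustrative names):

```python
import pandas as pd

registrations = pd.DataFrame({'reg_id': [1, 2, 3, 4],
                              'name': ['Andrew', 'Bobo', 'Claire', 'David']})
logins = pd.DataFrame({'log_id': [1, 2, 3, 4],
                       'name': ['Xavier', 'Andrew', 'Yolanda', 'Bobo']})

# Columns present in both frames; these become the default join keys.
shared = registrations.columns.intersection(logins.columns).tolist()

explicit = pd.merge(registrations, logins, how='inner', on='name')
implicit = pd.merge(registrations, logins, how='inner')

print(shared)
print(explicit.equals(implicit))
```

Passing `on` explicitly is usually the safer habit, since a second shared column added later would silently change the join keys.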


### Left Merge

Match up AND include all rows from the left table.

Show everyone who registered (the left table); anyone without login info gets NaN in the login columns.

In [21]:
pd.merge(registrations, logins, how='left')

Unnamed: 0,reg_id,name,log_id
0,1,Andrew,2.0
1,2,Bobo,4.0
2,3,Claire,
3,4,David,
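
Note that `log_id` is shown as `2.0` and `4.0` rather than `2` and `4`: NaN is a floating-point value, so an integer column that receives NaN is converted to `float64`. If integer IDs are preferred, one option is pandas' nullable `Int64` dtype. This is a sketch of that conversion under the same data as above, not the only way to handle it; `left` is an illustrative name.

```python
import pandas as pd

registrations = pd.DataFrame({'reg_id': [1, 2, 3, 4],
                              'name': ['Andrew', 'Bobo', 'Claire', 'David']})
logins = pd.DataFrame({'log_id': [1, 2, 3, 4],
                       'name': ['Xavier', 'Andrew', 'Yolanda', 'Bobo']})

left = pd.merge(registrations, logins, how='left', on='name')

# The unmatched rows hold NaN, which forces the column to float64.
print(left['log_id'].dtype)

# The nullable integer dtype keeps whole numbers and marks gaps as <NA>.
left['log_id'] = left['log_id'].astype('Int64')
print(left)
```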


### Right Merge

Match up AND include all rows from the right table.

Show everyone who logged in (the right table); anyone without registration info gets NaN in the registration columns.

In [22]:
pd.merge(registrations, logins, how='right')

Unnamed: 0,reg_id,name,log_id
0,,Xavier,1
1,1.0,Andrew,2
2,,Yolanda,3
3,2.0,Bobo,4
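
A right merge is the mirror image of a left merge: swapping the two DataFrames and using `how='left'` should give the same rows, with only the column order differing. A minimal sketch of that equivalence, with the tables recreated so it runs standalone (`right`, `swapped` and `same` are illustrative names):

```python
import pandas as pd

registrations = pd.DataFrame({'reg_id': [1, 2, 3, 4],
                              'name': ['Andrew', 'Bobo', 'Claire', 'David']})
logins = pd.DataFrame({'log_id': [1, 2, 3, 4],
                       'name': ['Xavier', 'Andrew', 'Yolanda', 'Bobo']})

right = pd.merge(registrations, logins, how='right', on='name')

# Same join with the tables swapped; logins is now the left table.
swapped = pd.merge(logins, registrations, how='left', on='name')

# Reorder the columns of the swapped result before comparing.
same = right.equals(swapped[right.columns])
print(same)
```

Because of this symmetry, many codebases use only left merges and decide which table goes first.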


### Outer Merge

Match up on all keys found in either the left or the right table.

Show everyone who appears in either the logins table or the registrations table. Fill any missing info with NaN.

In [23]:
pd.merge(registrations, logins, how='outer')

Unnamed: 0,reg_id,name,log_id
0,1.0,Andrew,2.0
1,2.0,Bobo,4.0
2,3.0,Claire,
3,4.0,David,
4,,Xavier,1.0
5,,Yolanda,3.0
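
With an outer merge it is often useful to know which table each row came from. `pd.merge` has an `indicator` parameter that adds a `_merge` column for this purpose. The sketch below applies it to the same two tables, recreated so it runs standalone; `outer` and `counts` are illustrative names.

```python
import pandas as pd

registrations = pd.DataFrame({'reg_id': [1, 2, 3, 4],
                              'name': ['Andrew', 'Bobo', 'Claire', 'David']})
logins = pd.DataFrame({'log_id': [1, 2, 3, 4],
                       'name': ['Xavier', 'Andrew', 'Yolanda', 'Bobo']})

# indicator=True adds a '_merge' column: 'both', 'left_only' or 'right_only'.
outer = pd.merge(registrations, logins, how='outer', on='name', indicator=True)
print(outer[['name', '_merge']])

# How many rows fall into each category.
counts = outer['_merge'].value_counts().to_dict()
print(counts)
```

Filtering on `_merge == 'left_only'` would then list the people who registered but never logged in (Claire and David in this data).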
