# Combining Dataframes

- Often the data you need lives in two separate sources. Fortunately, pandas makes it easy to combine them.
- The simplest case is when both sources have the same format. Then a concatenation with `pd.concat()` is all that is needed.
<br><br>
- Concatenation simply "pastes" the two DataFrames together, either side by side (by columns) or one on top of the other (by rows):

##### By Columns:

<img src="./assets/9/1-concat_1.png" width="600px" />
<img src="./assets/9/1-concat_2.png" width="600px" />

<br><br>
##### By Rows:
<img src="./assets/9/1-concat_3.png" width="600px" />
<img src="./assets/9/1-concat_4.png" width="600px" />

<br><br>
##### Note:
- pandas automatically fills in `NaN` wherever one DataFrame has a row or column label that the other one lacks.
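The NaN filling is easiest to see when concatenating by columns with row labels that only partly overlap. A minimal sketch; `left` and `right` are hypothetical frames, not part of the walkthrough below:

```python
import pandas as pd

# Hypothetical frames whose row labels only partly overlap (0-2 vs 1-3)
left = pd.DataFrame({"A": ["A0", "A1", "A2"]}, index=[0, 1, 2])
right = pd.DataFrame({"C": ["C1", "C2", "C3"]}, index=[1, 2, 3])

# axis=1 aligns rows by index label, not by position;
# a label missing from one side gets NaN in that side's columns
combined = pd.concat([left, right], axis=1)
print(combined)
```

Row `0` has no `C` value and row `3` has no `A` value, so both cells come out as `NaN`.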

```python
import pandas as pd
import numpy as np
```

```python
# Each dict key becomes a column; each list holds that column's values
data_one = {'A': ['A0', 'A1', 'A2', 'A3'], 'B': ['B0', 'B1', 'B2', 'B3']}
data_two = {'C': ['C0', 'C1', 'C2', 'C3'], 'D': ['D0', 'D1', 'D2', 'D3']}

one = pd.DataFrame(data_one)
two = pd.DataFrame(data_two)
```

```python
one
```

|   | A  | B  |
|---|----|----|
| 0 | A0 | B0 |
| 1 | A1 | B1 |
| 2 | A2 | B2 |
| 3 | A3 | B3 |


```python
two
```

|   | C  | D  |
|---|----|----|
| 0 | C0 | D0 |
| 1 | C1 | D1 |
| 2 | C2 | D2 |
| 3 | C3 | D3 |


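```python
# Concatenate by columns (side by side); rows are matched by index label
pd.concat([one, two], axis=1)
```

|   | A  | B  | C  | D  |
|---|----|----|----|----|
| 0 | A0 | B0 | C0 | D0 |
| 1 | A1 | B1 | C1 | D1 |
| 2 | A2 | B2 | C2 | D2 |
| 3 | A3 | B3 | C3 | D3 |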


```python
# Concatenate by rows (stacked); columns found in only one frame are filled with NaN
pd.concat([one, two], axis=0)
```

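Older pandas versions also print a `FutureWarning` for this call: because the columns of the two frames are not aligned, a future version of pandas will change to not sort them by default, and passing `sort=False` accepts that future behavior. Current pandas defaults to `sort=False`, and here the column order is `A`, `B`, `C`, `D` either way.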

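|   | A   | B   | C   | D   |
|---|-----|-----|-----|-----|
| 0 | A0  | B0  | NaN | NaN |
| 1 | A1  | B1  | NaN | NaN |
| 2 | A2  | B2  | NaN | NaN |
| 3 | A3  | B3  | NaN | NaN |
| 0 | NaN | NaN | C0  | D0  |
| 1 | NaN | NaN | C1  | D1  |
| 2 | NaN | NaN | C2  | D2  |
| 3 | NaN | NaN | C3  | D3  |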
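By default `pd.concat` keeps the union of the columns (`join="outer"`), which is what produces the NaN blocks above. If only the columns both frames share are wanted, the `join="inner"` option keeps the intersection instead. A small sketch with hypothetical frames `first` and `second` that share only column `B`:

```python
import pandas as pd

# Hypothetical frames that share only column "B"
first = pd.DataFrame({"A": ["A0", "A1"], "B": ["B0", "B1"]})
second = pd.DataFrame({"B": ["B2", "B3"], "C": ["C2", "C3"]})

# join="outer" (the default): union of columns, NaN where a frame lacks one
outer = pd.concat([first, second], axis=0, sort=False)

# join="inner": keep only the columns present in every frame
inner = pd.concat([first, second], axis=0, join="inner")

print(outer)
print(inner)
```

With `one` and `two` from this notebook, which share no columns at all, `join="inner"` would leave no columns, so it mainly helps when the frames overlap partially.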


```python
one.columns
```

```
Index(['A', 'B'], dtype='object')
```

```python
two.columns
```

```
Index(['C', 'D'], dtype='object')
```
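Before stacking by rows, it can help to check programmatically whether the column labels agree. A minimal sketch using `Index.equals` and `Index.symmetric_difference`, rebuilt on two-row versions of `one` and `two` so that it runs on its own:

```python
import pandas as pd

# Two-row stand-ins for `one` and `two`
one = pd.DataFrame({"A": ["A0", "A1"], "B": ["B0", "B1"]})
two = pd.DataFrame({"C": ["C0", "C1"], "D": ["D0", "D1"]})

# True only if both frames have the same labels in the same order
same_columns = one.columns.equals(two.columns)

# Labels present in one frame but not the other; each of these
# becomes a NaN-padded column if the frames are stacked by rows
mismatched = one.columns.symmetric_difference(two.columns)

print(same_columns)  # False
print(list(mismatched))
```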

```python
# Rename two's columns to match one's, so the rows line up when concatenated
two.columns = one.columns
```
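Assigning to `two.columns` changes `two` in place. If the original `C`/`D` labels are still needed later, `DataFrame.rename` returns a relabelled copy instead. A sketch of that alternative; the mapping `C → A`, `D → B` is written out by hand here, as an assumption about which columns should line up:

```python
import pandas as pd

one = pd.DataFrame({"A": ["A0", "A1", "A2", "A3"], "B": ["B0", "B1", "B2", "B3"]})
two = pd.DataFrame({"C": ["C0", "C1", "C2", "C3"], "D": ["D0", "D1", "D2", "D3"]})

# rename() returns a new DataFrame; `two` keeps its original C/D labels
two_renamed = two.rename(columns={"C": "A", "D": "B"})

# The columns now match, so stacking by rows produces no NaN padding
stacked = pd.concat([one, two_renamed], axis=0)
print(stacked)
```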

```python
pd.concat([one, two], axis=0)
```

|   | A  | B  |
|---|----|----|
| 0 | A0 | B0 |
| 1 | A1 | B1 |
| 2 | A2 | B2 |
| 3 | A3 | B3 |
| 0 | C0 | D0 |
| 1 | C1 | D1 |
| 2 | C2 | D2 |
| 3 | C3 | D3 |


```python
myDf = pd.concat([one, two], axis=0)
```

```python
myDf
```

|   | A  | B  |
|---|----|----|
| 0 | A0 | B0 |
| 1 | A1 | B1 |
| 2 | A2 | B2 |
| 3 | A3 | B3 |
| 0 | C0 | D0 |
| 1 | C1 | D1 |
| 2 | C2 | D2 |
| 3 | C3 | D3 |
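The index labels `0` to `3` now each appear twice, so label-based lookups such as `.loc[0]` return more than one row. A sketch showing how to detect this, using two-row stand-ins for `one` and the renamed `two`; the `verify_integrity=True` option makes `pd.concat` raise a `ValueError` instead of silently creating duplicates:

```python
import pandas as pd

# Two-row stand-ins for `one` and the renamed `two`
one = pd.DataFrame({"A": ["A0", "A1"], "B": ["B0", "B1"]})
two = pd.DataFrame({"A": ["C0", "C1"], "B": ["D0", "D1"]})

stacked = pd.concat([one, two], axis=0)
print(stacked.index.is_unique)  # False: labels 0 and 1 each appear twice
print(len(stacked.loc[0]))      # 2: .loc[0] matches both rows labelled 0

# verify_integrity=True checks the new index for duplicates and raises if any exist
try:
    pd.concat([one, two], axis=0, verify_integrity=True)
    raised = False
except ValueError as err:
    raised = True
    print("concat refused:", err)
```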


```python
# Fix the duplicate index labels (0-3 appear twice) by renumbering the rows 0..n-1
myDf.index = range(len(myDf))
```

```python
myDf
```

|   | A  | B  |
|---|----|----|
| 0 | A0 | B0 |
| 1 | A1 | B1 |
| 2 | A2 | B2 |
| 3 | A3 | B3 |
| 4 | C0 | D0 |
| 5 | C1 | D1 |
| 6 | C2 | D2 |
| 7 | C3 | D3 |
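Overwriting `myDf.index` works, but pandas can also produce the fresh `0..n-1` index directly. A sketch of two equivalent options: `ignore_index=True` on `pd.concat`, or `reset_index(drop=True)` after the fact (`drop=True` discards the old labels instead of keeping them as a new column):

```python
import pandas as pd

one = pd.DataFrame({"A": ["A0", "A1", "A2", "A3"], "B": ["B0", "B1", "B2", "B3"]})
two = pd.DataFrame({"A": ["C0", "C1", "C2", "C3"], "B": ["D0", "D1", "D2", "D3"]})

# Option 1: let concat ignore the original labels and number rows 0..n-1
myDf = pd.concat([one, two], axis=0, ignore_index=True)

# Option 2: concatenate first, then renumber; drop=True throws away the old index
myDf2 = pd.concat([one, two], axis=0).reset_index(drop=True)

print(myDf)
```

Both give the same result as the manual `range(len(myDf))` assignment; `ignore_index=True` simply avoids the intermediate frame with duplicate labels.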
