***
# 4.05 Pandas Merging, Joining, and Concatenating
- Pandas Documentation: https://pandas.pydata.org/
***
### Python 4.01. Series
### Python 4.02. Pandas DataFrame, Selection, and Indexing
### Python 4.03. Configuring Options, Data Type Conversion, Working with strings and dates, Missing Data
### Python 4.04. Groupby, Categorizing, and Labeling Data
### Python 4.05. Merging, Joining, and Concatenating
### Python 4.06. Pipe, Apply, Applymap, Map, Pivot Table, and Contingency Table
### Python 4.07. Data Input and Output
### Python 4.08. Data Visualization
### Python 4.09. Exploratory Data Analysis and Beyond
### Python 4.10. Breakout Group Exercise and Solution
***

## Merging, Joining, and Concatenating Contents:

There are three main ways of combining DataFrames: concatenating, merging, and joining. Let us discuss each of them with examples.
### 1. Concatenation: `concat()` function

### 2. Merging: `merge()` function

### 3. Joining: `join()` function

### 4. Appending New Rows to a DataFrame

***



## 1. Concatenation: `concat()` function 

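Concatenation stacks DataFrames along an axis: rows by default (`axis=0`), or columns with `axis=1`. Labels on the other axis are aligned, and any position that one frame lacks is filled with `NaN`, so the frames do not need identical shapes.

We can use **pd.concat** and pass in a list of DataFrames to concatenate: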

### Example 1

```python
import pandas as pd
import numpy as np
```

```python
df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
                    'B': ['B0', 'B1', 'B2', 'B3'],
                    'C': ['C0', 'C1', 'C2', 'C3'],
                    'D': ['D0', 'D1', 'D2', 'D3']}, index=[0, 1, 2, 3])
df1
```

|   | A  | B  | C  | D  |
|---|----|----|----|----|
| 0 | A0 | B0 | C0 | D0 |
| 1 | A1 | B1 | C1 | D1 |
| 2 | A2 | B2 | C2 | D2 |
| 3 | A3 | B3 | C3 | D3 |


```python
df2 = pd.DataFrame({'A': ['A4', 'A5', 'A6', 'A7'],
                    'B': ['B4', 'B5', 'B6', 'B7'],
                    'C': ['C4', 'C5', 'C6', 'C7'],
                    'D': ['D4', 'D5', 'D6', 'D7']}, index=[4, 5, 6, 7])
df2
```

|   | A  | B  | C  | D  |
|---|----|----|----|----|
| 4 | A4 | B4 | C4 | D4 |
| 5 | A5 | B5 | C5 | D5 |
| 6 | A6 | B6 | C6 | D6 |
| 7 | A7 | B7 | C7 | D7 |


```python
df3 = pd.DataFrame({'A': ['A8', 'A9', 'A10', 'A11'],
                    'B': ['B8', 'B9', 'B10', 'B11'],
                    'C': ['C8', 'C9', 'C10', 'C11'],
                    'D': ['D8', 'D9', 'D10', 'D11']}, index=[8, 9, 10, 11])
df3
```

|    | A   | B   | C   | D   |
|----|-----|-----|-----|-----|
| 8  | A8  | B8  | C8  | D8  |
| 9  | A9  | B9  | C9  | D9  |
| 10 | A10 | B10 | C10 | D10 |
| 11 | A11 | B11 | C11 | D11 |


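```python
pd.concat([df1, df2, df3])
```

|    | A   | B   | C   | D   |
|----|-----|-----|-----|-----|
| 0  | A0  | B0  | C0  | D0  |
| 1  | A1  | B1  | C1  | D1  |
| 2  | A2  | B2  | C2  | D2  |
| 3  | A3  | B3  | C3  | D3  |
| 4  | A4  | B4  | C4  | D4  |
| 5  | A5  | B5  | C5  | D5  |
| 6  | A6  | B6  | C6  | D6  |
| 7  | A7  | B7  | C7  | D7  |
| 8  | A8  | B8  | C8  | D8  |
| 9  | A9  | B9  | C9  | D9  |
| 10 | A10 | B10 | C10 | D10 |
| 11 | A11 | B11 | C11 | D11 |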


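```python
# drop_duplicates() removes repeated rows (none here);
# reset_index(drop=True) renumbers the index 0..n-1
pd.concat([df1, df2, df3]).drop_duplicates().reset_index(drop=True)
```

|    | A   | B   | C   | D   |
|----|-----|-----|-----|-----|
| 0  | A0  | B0  | C0  | D0  |
| 1  | A1  | B1  | C1  | D1  |
| 2  | A2  | B2  | C2  | D2  |
| 3  | A3  | B3  | C3  | D3  |
| 4  | A4  | B4  | C4  | D4  |
| 5  | A5  | B5  | C5  | D5  |
| 6  | A6  | B6  | C6  | D6  |
| 7  | A7  | B7  | C7  | D7  |
| 8  | A8  | B8  | C8  | D8  |
| 9  | A9  | B9  | C9  | D9  |
| 10 | A10 | B10 | C10 | D10 |
| 11 | A11 | B11 | C11 | D11 |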
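In the cell above, `drop_duplicates()` has nothing to remove, and `reset_index(drop=True)` only renumbers the rows. When the only goal is a fresh 0..n-1 index, `pd.concat` can do it directly with `ignore_index=True`. The sketch below uses two small hypothetical frames, `a` and `b`, whose default indexes collide:

```python
import pandas as pd

# Hypothetical frames: both carry the default index labels 0 and 1
a = pd.DataFrame({'x': [1, 2]})
b = pd.DataFrame({'x': [3, 4]})

stacked = pd.concat([a, b])                        # original labels kept
renumbered = pd.concat([a, b], ignore_index=True)  # labels replaced by 0..n-1

print(stacked.index.tolist())     # [0, 1, 0, 1]
print(renumbered.index.tolist())  # [0, 1, 2, 3]
```

Keeping the original labels can leave duplicates in the index, as `stacked` shows. `ignore_index=True` avoids that without a separate `reset_index` call.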


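```python
# axis=1 places the frames side by side and aligns rows on the index
pd.concat([df1, df2, df3], axis=1)
```

|    | A   | B   | C   | D   | A   | B   | C   | D   | A   | B   | C   | D   |
|----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|
| 0  | A0  | B0  | C0  | D0  | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 1  | A1  | B1  | C1  | D1  | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2  | A2  | B2  | C2  | D2  | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 3  | A3  | B3  | C3  | D3  | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 4  | NaN | NaN | NaN | NaN | A4  | B4  | C4  | D4  | NaN | NaN | NaN | NaN |
| 5  | NaN | NaN | NaN | NaN | A5  | B5  | C5  | D5  | NaN | NaN | NaN | NaN |
| 6  | NaN | NaN | NaN | NaN | A6  | B6  | C6  | D6  | NaN | NaN | NaN | NaN |
| 7  | NaN | NaN | NaN | NaN | A7  | B7  | C7  | D7  | NaN | NaN | NaN | NaN |
| 8  | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | A8  | B8  | C8  | D8  |
| 9  | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | A9  | B9  | C9  | D9  |
| 10 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | A10 | B10 | C10 | D10 |
| 11 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | A11 | B11 | C11 | D11 |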
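The frames above all share the columns A to D, so stacking them on rows needs no alignment. When the columns only partly overlap, the `join` parameter of `pd.concat` decides what happens on the other axis. The minimal sketch below uses two hypothetical frames, `top` and `bottom`:

```python
import pandas as pd

# Hypothetical frames whose columns only partly overlap
top = pd.DataFrame({'A': ['A0', 'A1'], 'B': ['B0', 'B1']})
bottom = pd.DataFrame({'B': ['B2', 'B3'], 'C': ['C2', 'C3']})

# join='outer' (the default): keep every column and fill the gaps with NaN
outer = pd.concat([top, bottom], ignore_index=True)
print(outer)

# join='inner': keep only the columns that every input shares
inner = pd.concat([top, bottom], join='inner', ignore_index=True)
print(inner)
```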


## 2. Merging: `merge()` function

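The `merge()` function combines DataFrames on one or more key columns, using the same logic as SQL joins. The `how` parameter selects the join type, such as `'inner'` (the default), `'left'`, `'right'`, or `'outer'`, and `on` names the key column(s).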

### Example 2

![LeftJoin.png](attachment:LeftJoin.png)

```python
left = pd.DataFrame({'key': ['K0', 'K1', 'K2', 'K3'],
                     'A': ['A0', 'A1', 'A2', 'A3'],
                     'B': ['B0', 'B1', 'B2', 'B3']})
left
```

|   | key | A  | B  |
|---|-----|----|----|
| 0 | K0  | A0 | B0 |
| 1 | K1  | A1 | B1 |
| 2 | K2  | A2 | B2 |
| 3 | K3  | A3 | B3 |


```python
right = pd.DataFrame({'C': ['C0', 'C1', 'C2', 'C3'],
                      'D': ['D0', 'D1', 'D2', 'D3'],
                      'key': ['K0', 'K1', 'K2', 'K3']})
right
```

|   | C  | D  | key |
|---|----|----|-----|
| 0 | C0 | D0 | K0  |
| 1 | C1 | D1 | K1  |
| 2 | C2 | D2 | K2  |
| 3 | C3 | D3 | K3  |


```python
pd.merge(left, right, how='left', on='key')
```

|   | key | A  | B  | C  | D  |
|---|-----|----|----|----|----|
| 0 | K0  | A0 | B0 | C0 | D0 |
| 1 | K1  | A1 | B1 | C1 | D1 |
| 2 | K2  | A2 | B2 | C2 | D2 |
| 3 | K3  | A3 | B3 | C3 | D3 |


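```python
# left_on / right_on name the key column in each frame separately; they are
# needed when the key names differ (here both happen to be 'key').
# indicator='key_merge' adds a column recording where each row was found:
# 'both', 'left_only', or 'right_only'.
pd.merge(left, right, how='left', left_on='key', right_on='key', indicator='key_merge')
```

|   | key | A  | B  | C  | D  | key_merge |
|---|-----|----|----|----|----|-----------|
| 0 | K0  | A0 | B0 | C0 | D0 | both      |
| 1 | K1  | A1 | B1 | C1 | D1 | both      |
| 2 | K2  | A2 | B2 | C2 | D2 | both      |
| 3 | K3  | A3 | B3 | C3 | D3 | both      |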
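Since both frames above name their key `key`, the `left_on`/`right_on` form makes no visible difference there. The sketch below uses two hypothetical frames, `employees` and `salaries`, whose keys are named differently, so `left_on`/`right_on` are actually required. Note that both key columns are kept in the result, and `indicator=True` adds the default `_merge` column:

```python
import pandas as pd

# Hypothetical frames: the key column is named differently on each side
employees = pd.DataFrame({'emp_id': [1, 2, 3], 'name': ['Ann', 'Bob', 'Cy']})
salaries = pd.DataFrame({'id': [1, 2, 4], 'salary': [100, 200, 400]})

result = pd.merge(employees, salaries, how='left',
                  left_on='emp_id', right_on='id', indicator=True)
print(result)
# emp_id 3 has no match, so its 'id' and 'salary' are NaN and '_merge' is 'left_only'
```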


![InnerJoin.png](attachment:InnerJoin.png)			

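```python
pd.merge(left, right, how='inner', on='key')  # how='inner' is the default, so it can be omitted
```

|   | key | A  | B  | C  | D  |
|---|-----|----|----|----|----|
| 0 | K0  | A0 | B0 | C0 | D0 |
| 1 | K1  | A1 | B1 | C1 | D1 |
| 2 | K2  | A2 | B2 | C2 | D2 |
| 3 | K3  | A3 | B3 | C3 | D3 |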


A more complex example merges on two key columns at once:

```python
left = pd.DataFrame({'key1': ['K0', 'K0', 'K1', 'K2'],
                     'key2': ['K0', 'K1', 'K0', 'K1'],
                     'A': ['A0', 'A1', 'A2', 'A3'],
                     'B': ['B0', 'B1', 'B2', 'B3']})
left
```

|   | key1 | key2 | A  | B  |
|---|------|------|----|----|
| 0 | K0   | K0   | A0 | B0 |
| 1 | K0   | K1   | A1 | B1 |
| 2 | K1   | K0   | A2 | B2 |
| 3 | K2   | K1   | A3 | B3 |


![RightJoin.png](attachment:RightJoin.png)

```python
right = pd.DataFrame({'key1': ['K0', 'K1', 'K1', 'K2'],
                      'key2': ['K0', 'K0', 'K0', 'K0'],
                      'C': ['C0', 'C1', 'C2', 'C3'],
                      'D': ['D0', 'D1', 'D2', 'D3']})
right
```

|   | key1 | key2 | C  | D  |
|---|------|------|----|----|
| 0 | K0   | K0   | C0 | D0 |
| 1 | K1   | K0   | C1 | D1 |
| 2 | K1   | K0   | C2 | D2 |
| 3 | K2   | K0   | C3 | D3 |


```python
pd.merge(left, right, on=['key1', 'key2'])
```

|   | key1 | key2 | A  | B  | C  | D  |
|---|------|------|----|----|----|----|
| 0 | K0   | K0   | A0 | B0 | C0 | D0 |
| 1 | K1   | K0   | A2 | B2 | C1 | D1 |
| 2 | K1   | K0   | A2 | B2 | C2 | D2 |


```python
pd.merge(left, right, how='inner', on=['key1', 'key2'])
```

|   | key1 | key2 | A  | B  | C  | D  |
|---|------|------|----|----|----|----|
| 0 | K0   | K0   | A0 | B0 | C0 | D0 |
| 1 | K1   | K0   | A2 | B2 | C1 | D1 |
| 2 | K1   | K0   | A2 | B2 | C2 | D2 |
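The row `('K1', 'K0', A2, B2)` appears twice in these results because that key pair occurs twice in `right`, and a merge pairs every matching row. If duplicates like that would be a data error, `merge` can check key uniqueness via its `validate` parameter and raise `pandas.errors.MergeError`. A minimal sketch, rebuilding the same two frames:

```python
import pandas as pd
from pandas.errors import MergeError

left = pd.DataFrame({'key1': ['K0', 'K0', 'K1', 'K2'],
                     'key2': ['K0', 'K1', 'K0', 'K1'],
                     'A': ['A0', 'A1', 'A2', 'A3'],
                     'B': ['B0', 'B1', 'B2', 'B3']})
right = pd.DataFrame({'key1': ['K0', 'K1', 'K1', 'K2'],
                      'key2': ['K0', 'K0', 'K0', 'K0'],
                      'C': ['C0', 'C1', 'C2', 'C3'],
                      'D': ['D0', 'D1', 'D2', 'D3']})

# Every (key1, key2) pair is unique in `left`, but ('K1', 'K0') repeats in `right`
try:
    pd.merge(left, right, on=['key1', 'key2'], validate='one_to_one')
    one_to_one_ok = True
except MergeError as err:
    one_to_one_ok = False
    print('MergeError:', err)

# 'one_to_many' only requires unique keys on the left side, so this check passes
checked = pd.merge(left, right, on=['key1', 'key2'], validate='one_to_many')
print(len(checked))  # same 3 rows as the inner merge
```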


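```python
pd.merge(left, right, how='right', on=['key1', 'key2'])
```

|   | key1 | key2 | A   | B   | C  | D  |
|---|------|------|-----|-----|----|----|
| 0 | K0   | K0   | A0  | B0  | C0 | D0 |
| 1 | K1   | K0   | A2  | B2  | C1 | D1 |
| 2 | K1   | K0   | A2  | B2  | C2 | D2 |
| 3 | K2   | K0   | NaN | NaN | C3 | D3 |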


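```python
pd.merge(left, right, how='left', on=['key1', 'key2'])
```

|   | key1 | key2 | A  | B  | C   | D   |
|---|------|------|----|----|-----|-----|
| 0 | K0   | K0   | A0 | B0 | C0  | D0  |
| 1 | K0   | K1   | A1 | B1 | NaN | NaN |
| 2 | K1   | K0   | A2 | B2 | C1  | D1  |
| 3 | K1   | K0   | A2 | B2 | C2  | D2  |
| 4 | K2   | K1   | A3 | B3 | NaN | NaN |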


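```python
# Method form: left.merge(right, ...) is equivalent to pd.merge(left, right, ...)
left.merge(right, how='left', on=['key1', 'key2'])
```

|   | key1 | key2 | A  | B  | C   | D   |
|---|------|------|----|----|-----|-----|
| 0 | K0   | K0   | A0 | B0 | C0  | D0  |
| 1 | K0   | K1   | A1 | B1 | NaN | NaN |
| 2 | K1   | K0   | A2 | B2 | C1  | D1  |
| 3 | K1   | K0   | A2 | B2 | C2  | D2  |
| 4 | K2   | K1   | A3 | B3 | NaN | NaN |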
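The examples above cover `'inner'`, `'left'`, and `'right'`. For completeness, the sketch below runs the fourth common type, `how='outer'`, on the same two frames, with `indicator=True` so that each row's origin is visible. An outer merge keeps every key pair from either side:

```python
import pandas as pd

left = pd.DataFrame({'key1': ['K0', 'K0', 'K1', 'K2'],
                     'key2': ['K0', 'K1', 'K0', 'K1'],
                     'A': ['A0', 'A1', 'A2', 'A3'],
                     'B': ['B0', 'B1', 'B2', 'B3']})
right = pd.DataFrame({'key1': ['K0', 'K1', 'K1', 'K2'],
                      'key2': ['K0', 'K0', 'K0', 'K0'],
                      'C': ['C0', 'C1', 'C2', 'C3'],
                      'D': ['D0', 'D1', 'D2', 'D3']})

# Outer merge: union of the key pairs from both sides; '_merge' records the origin
outer = pd.merge(left, right, how='outer', on=['key1', 'key2'], indicator=True)
print(outer)
print(outer['_merge'].value_counts())
```

Unmatched rows get `NaN` in the other frame's columns: `('K0', 'K1')` and `('K2', 'K1')` exist only in `left`, and `('K2', 'K0')` exists only in `right`.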


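## 3. Joining: `join()` function
`join()` is a convenient method for combining the columns of two potentially differently-indexed DataFrames into a single result DataFrame. By default it matches rows on the index and performs a left join, keeping every row of the calling frame; the `how` parameter selects another join type.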

```python
left1 = pd.DataFrame({'A': ['A0', 'A1', 'A2'],
                      'B': ['B0', 'B1', 'B2']},
                     index=['K0', 'K1', 'K2'])
left1
```

|    | A  | B  |
|----|----|----|
| K0 | A0 | B0 |
| K1 | A1 | B1 |
| K2 | A2 | B2 |


```python
right2 = pd.DataFrame({'C': ['C0', 'C2', 'C3'],
                       'D': ['D0', 'D2', 'D3']},
                      index=['K0', 'K2', 'K3'])
right2
```

|    | C  | D  |
|----|----|----|
| K0 | C0 | D0 |
| K2 | C2 | D2 |
| K3 | C3 | D3 |


```python
left1.join(right2)
```

|    | A  | B  | C   | D   |
|----|----|----|-----|-----|
| K0 | A0 | B0 | C0  | D0  |
| K1 | A1 | B1 | NaN | NaN |
| K2 | A2 | B2 | C2  | D2  |
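An index-on-index `join()` like this one can also be written as a `merge()` that uses the indexes as keys. The sketch below rebuilds `left1` and `right2` and compares the two forms. It suggests that `join()` is mostly a shorthand for this common case, not a different kind of operation:

```python
import pandas as pd

left1 = pd.DataFrame({'A': ['A0', 'A1', 'A2'], 'B': ['B0', 'B1', 'B2']},
                     index=['K0', 'K1', 'K2'])
right2 = pd.DataFrame({'C': ['C0', 'C2', 'C3'], 'D': ['D0', 'D2', 'D3']},
                      index=['K0', 'K2', 'K3'])

via_join = left1.join(right2)  # how='left' is join()'s default
via_merge = pd.merge(left1, right2, how='left',
                     left_index=True, right_index=True)

print(via_join.equals(via_merge))
```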


```python
left1.join(right2, how='right')
```

|    | A   | B   | C  | D  |
|----|-----|-----|----|----|
| K0 | A0  | B0  | C0 | D0 |
| K2 | A2  | B2  | C2 | D2 |
| K3 | NaN | NaN | C3 | D3 |


```python
left1.join(right2, how='outer')
```

|    | A   | B   | C   | D   |
|----|-----|-----|-----|-----|
| K0 | A0  | B0  | C0  | D0  |
| K1 | A1  | B1  | NaN | NaN |
| K2 | A2  | B2  | C2  | D2  |
| K3 | NaN | NaN | C3  | D3  |
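In these examples `left1` and `right2` have no column names in common. When both frames contain a column with the same name, `join()` raises a `ValueError` unless `lsuffix`/`rsuffix` are given to tell the two apart. A minimal sketch with two hypothetical frames, `q1` and `q2`:

```python
import pandas as pd

# Hypothetical frames that both contain a column named 'value'
q1 = pd.DataFrame({'value': [10, 20]}, index=['K0', 'K1'])
q2 = pd.DataFrame({'value': [30, 40]}, index=['K0', 'K1'])

try:
    q1.join(q2)  # overlapping column name and no suffixes given
    raised = False
except ValueError as err:
    raised = True
    print('ValueError:', err)

# Suffixes rename the overlapping column on each side
both = q1.join(q2, lsuffix='_q1', rsuffix='_q2')
print(both)
```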


## 4. Appending New Rows to a DataFrame

```python
# Two frames that share the rows ('C', 3) and ('D', 4)
df1 = pd.DataFrame({'letter': ['A', 'B', 'C', 'D'],
                    'number': [1, 2, 3, 4]})
df2 = pd.DataFrame({'letter': ['C', 'D', 'E', 'F'],
                    'number': [3, 4, 5, 6]})
```

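```python
# Union with pd.concat(): stack both frames, then drop the repeated rows
df3 = pd.concat([df1, df2]).drop_duplicates().reset_index(drop=True)
df3
```

|   | letter | number |
|---|--------|--------|
| 0 | A      | 1      |
| 1 | B      | 2      |
| 2 | C      | 3      |
| 3 | D      | 4      |
| 4 | E      | 5      |
| 5 | F      | 6      |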
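The union above has set-style counterparts that use `merge()`. The sketch below gets the intersection (rows present in both frames) and the difference (rows of `df1` that are not in `df2`). It assumes that neither frame contains duplicate rows on its own. With no `on=` argument, `merge` joins on all shared columns, here `letter` and `number`:

```python
import pandas as pd

df1 = pd.DataFrame({'letter': ['A', 'B', 'C', 'D'], 'number': [1, 2, 3, 4]})
df2 = pd.DataFrame({'letter': ['C', 'D', 'E', 'F'], 'number': [3, 4, 5, 6]})

# Intersection: inner merge on every shared column
common = pd.merge(df1, df2, how='inner')

# Difference: left merge with an indicator, then keep the rows found only in df1
flagged = pd.merge(df1, df2, how='left', indicator=True)
only_df1 = flagged[flagged['_merge'] == 'left_only'].drop(columns='_merge')

print(common)
print(only_df1)
```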


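```python
# Concatenate DataFrames horizontally (column names are repeated as-is)
df4 = pd.concat([df1, df2], axis=1)
df4
```

|   | letter | number | letter | number |
|---|--------|--------|--------|--------|
| 0 | A      | 1      | C      | 3      |
| 1 | B      | 2      | D      | 4      |
| 2 | C      | 3      | E      | 5      |
| 3 | D      | 4      | F      | 6      |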


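```python
# DataFrame.append() was deprecated in pandas 1.4 and removed in pandas 2.0.
# Wrap the new row in a one-row DataFrame and concatenate it instead.
new_row = pd.DataFrame([['Z', 26]], columns=df3.columns)
pd.concat([df3, new_row], ignore_index=True)
```

|   | letter | number |
|---|--------|--------|
| 0 | A      | 1      |
| 1 | B      | 2      |
| 2 | C      | 3      |
| 3 | D      | 4      |
| 4 | E      | 5      |
| 5 | F      | 6      |
| 6 | Z      | 26     |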
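Each `pd.concat` call copies the data. So when many rows need to be added, a common pattern is to collect them first and concatenate once. For a single row on a default integer index, setting with enlargement via `.loc` is a lighter alternative that modifies the frame in place. A sketch under those assumptions; the `new_rows` values are hypothetical:

```python
import pandas as pd

df3 = pd.DataFrame({'letter': ['A', 'B', 'C', 'D', 'E', 'F'],
                    'number': [1, 2, 3, 4, 5, 6]})

# Hypothetical new rows, collected as dicts before touching the DataFrame
new_rows = [{'letter': 'Y', 'number': 25}, {'letter': 'Z', 'number': 26}]

# One concat call for the whole batch instead of one call per row
df5 = pd.concat([df3, pd.DataFrame(new_rows)], ignore_index=True)
print(df5)

# Single row, in place: with a 0..n-1 index, len(df3) is the next free label
df3.loc[len(df3)] = ['Z', 26]
print(df3)
```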


#### Note: The course materials are developed mainly based on personal experience and contributions from the Python learning community
Referred Books:
- *Learning Python*, 5th Edition, by Mark Lutz
- *Python Data Science Handbook* by Jake VanderPlas
- *Python for Data Analysis* by Wes McKinney

Copyright ©2023 Mei Najim. All rights reserved. 