# Merging DataFrames in Pandas

Pandas provides **merge**, a function that combines two DataFrames based on one or more key columns. This notebook covers the four merge types (inner, left, right, and full outer) and then shows how to merge on more than one key.

Merging DataFrames is a fundamental operation in data analysis and manipulation, and understanding how to use the merge function effectively is crucial for any data scientist or analyst.

In [2]:
import pandas as pd

## Creating Sample DataFrames

Let's create two sample DataFrames to demonstrate the different types of merges.

In [3]:
# Sample DataFrame 1
df1 = pd.DataFrame({
    'key1': ['A', 'B', 'C', 'D'],
    'value1': [1, 2, 3, 4]
})

# Sample DataFrame 2
df2 = pd.DataFrame({
    'key1': ['B', 'D', 'E', 'F'],
    'value2': [5, 6, 7, 8]
})

In [4]:
df1

|   | key1 | value1 |
|---|------|--------|
| 0 | A    | 1      |
| 1 | B    | 2      |
| 2 | C    | 3      |
| 3 | D    | 4      |


In [5]:
df2

|   | key1 | value2 |
|---|------|--------|
| 0 | B    | 5      |
| 1 | D    | 6      |
| 2 | E    | 7      |
| 3 | F    | 8      |


## Inner Merge

An inner merge returns only the rows with keys that are present in both DataFrames.

In [6]:
# Inner Merge
inner_merge = pd.merge(df1, df2, on='key1', how='inner')
inner_merge

|   | key1 | value1 | value2 |
|---|------|--------|--------|
| 0 | B    | 2      | 5      |
| 1 | D    | 4      | 6      |
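
As a sanity check on the description above, the keys of an inner merge can be compared with the plain set intersection of the two key columns. This is a minimal sketch that rebuilds the sample frames so it runs on its own; the names `shared_keys` and `inner_keys` are illustrative and not part of the original notebook. It assumes keys are unique in each frame, since duplicate keys would produce more than one row per key.

```python
import pandas as pd

# Rebuild the sample frames so this cell is self-contained
df1 = pd.DataFrame({'key1': ['A', 'B', 'C', 'D'], 'value1': [1, 2, 3, 4]})
df2 = pd.DataFrame({'key1': ['B', 'D', 'E', 'F'], 'value2': [5, 6, 7, 8]})

inner = pd.merge(df1, df2, on='key1', how='inner')

# Keys present in both frames, computed independently of merge
shared_keys = set(df1['key1']) & set(df2['key1'])

# Keys that actually survived the inner merge
inner_keys = set(inner['key1'])

print(sorted(shared_keys))
print(inner_keys == shared_keys)
```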


## Left Merge

A left merge returns all the rows from the left DataFrame and the matching rows from the right DataFrame. If there is no match, the result will contain `NaN` for columns from the right DataFrame.

In [7]:
# Left Merge
left_merge = pd.merge(df1, df2, on='key1', how='left')
left_merge

|   | key1 | value1 | value2 |
|---|------|--------|--------|
| 0 | A    | 1      | NaN    |
| 1 | B    | 2      | 5.0    |
| 2 | C    | 3      | NaN    |
| 3 | D    | 4      | 6.0    |
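
The unmatched rows of a left merge can be picked out by testing the right-hand column for `NaN`. The sketch below also inspects the dtype of `value2`: an integer column that gains missing values is converted to `float64`, which is why the matched values display as `5.0` and `6.0`. The variable names are illustrative.

```python
import pandas as pd

# Rebuild the sample frames so this cell is self-contained
df1 = pd.DataFrame({'key1': ['A', 'B', 'C', 'D'], 'value1': [1, 2, 3, 4]})
df2 = pd.DataFrame({'key1': ['B', 'D', 'E', 'F'], 'value2': [5, 6, 7, 8]})

left = pd.merge(df1, df2, on='key1', how='left')

# Rows of df1 that found no partner in df2 have NaN in value2
unmatched = left[left['value2'].isna()]
unmatched_keys = unmatched['key1'].tolist()

print(unmatched_keys)
print(left['value2'].dtype)  # value2 was int64 in df2 before the merge
```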


## Right Merge

A right merge returns all the rows from the right DataFrame and the matching rows from the left DataFrame. If there is no match, the result will contain `NaN` for columns from the left DataFrame.

In [8]:
# Right Merge
right_merge = pd.merge(df1, df2, on='key1', how='right')
right_merge

|   | key1 | value1 | value2 |
|---|------|--------|--------|
| 0 | B    | 2.0    | 5      |
| 1 | D    | 4.0    | 6      |
| 2 | E    | NaN    | 7      |
| 3 | F    | NaN    | 8      |
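
A right merge is the mirror image of a left merge. As a rough illustration for these sample frames, swapping the two arguments and using `how='left'` gives the same rows, with only the column order differing. The name `swapped` is illustrative.

```python
import pandas as pd

# Rebuild the sample frames so this cell is self-contained
df1 = pd.DataFrame({'key1': ['A', 'B', 'C', 'D'], 'value1': [1, 2, 3, 4]})
df2 = pd.DataFrame({'key1': ['B', 'D', 'E', 'F'], 'value2': [5, 6, 7, 8]})

right = pd.merge(df1, df2, on='key1', how='right')

# Same operation expressed as a left merge with the arguments swapped
swapped = pd.merge(df2, df1, on='key1', how='left')

# Only the column order differs, so align it before comparing
swapped = swapped[right.columns]

# Raises AssertionError if the two frames differ
pd.testing.assert_frame_equal(right, swapped)
print(right['key1'].tolist())
```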


## Full (Outer) Merge

A full (outer) merge returns every row from both DataFrames. Rows whose keys appear in both are combined into one row; rows whose keys appear in only one DataFrame are kept, with `NaN` in the columns that come from the other side.

In [9]:
# Full (Outer) Merge
outer_merge = pd.merge(df1, df2, on='key1', how='outer')
outer_merge

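|   | key1 | value1 | value2 |
|---|------|--------|--------|
| 0 | A    | 1.0    | NaN    |
| 1 | B    | 2.0    | 5.0    |
| 2 | C    | 3.0    | NaN    |
| 3 | D    | 4.0    | 6.0    |
| 4 | E    | NaN    | 7.0    |
| 5 | F    | NaN    | 8.0    |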
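
To see which side each row of an outer merge came from, `pd.merge` accepts `indicator=True`, which adds a `_merge` column with the values `left_only`, `right_only`, or `both`. The sketch below collects that column into a dictionary; the name `origin` is illustrative.

```python
import pandas as pd

# Rebuild the sample frames so this cell is self-contained
df1 = pd.DataFrame({'key1': ['A', 'B', 'C', 'D'], 'value1': [1, 2, 3, 4]})
df2 = pd.DataFrame({'key1': ['B', 'D', 'E', 'F'], 'value2': [5, 6, 7, 8]})

# indicator=True adds a categorical '_merge' column
outer = pd.merge(df1, df2, on='key1', how='outer', indicator=True)

# Map each key to the side(s) it was found on
origin = dict(zip(outer['key1'], outer['_merge'].astype(str)))

for key in sorted(origin):
    print(key, origin[key])
```

Filtering on `_merge` is a convenient way to pull out, for example, only the rows that exist in one frame but not the other.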


## Merging with Different Keys

You can also merge on more than one key column. Let's create two more sample DataFrames that share the columns `key1` and `key2` for this demonstration.

In [10]:
# Sample DataFrame 3
df3 = pd.DataFrame({
    'key1': ['A', 'B', 'C', 'D'],
    'key2': [1, 2, 3, 4],
    'value1': [10, 20, 30, 40]
})

# Sample DataFrame 4
df4 = pd.DataFrame({
    'key1': ['A', 'B', 'C', 'D'],
    'key2': [3, 4, 5, 6],
    'value2': [50, 60, 70, 80]
})

In [11]:
df3

|   | key1 | key2 | value1 |
|---|------|------|--------|
| 0 | A    | 1    | 10     |
| 1 | B    | 2    | 20     |
| 2 | C    | 3    | 30     |
| 3 | D    | 4    | 40     |


In [12]:
df4

|   | key1 | key2 | value2 |
|---|------|------|--------|
| 0 | A    | 3    | 50     |
| 1 | B    | 4    | 60     |
| 2 | C    | 5    | 70     |
| 3 | D    | 6    | 80     |


## Inner Merge on Multiple Keys

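You can merge DataFrames on multiple keys by passing a list of column names to the `on` parameter. Two rows match only when every listed key is equal. In `df3` and `df4` the `key1` values agree but no (`key1`, `key2`) pair occurs in both, so the inner merge here returns an empty DataFrame.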

In [13]:
# Inner Merge on Multiple Keys
inner_merge_multi = pd.merge(df3, df4, on=['key1', 'key2'], how='inner')
inner_merge_multi

|   | key1 | key2 | value1 | value2 |
|---|------|------|--------|--------|
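
The empty result follows from how multi-key matching works: a row is kept only if the whole (`key1`, `key2`) pair appears in both frames. The sketch below checks that directly and, for contrast, merges on `key1` alone. In that case the overlapping `key2` columns are kept separately under pandas' default suffixes `_x` and `_y`. The names `shared_pairs` and `on_key1` are illustrative.

```python
import pandas as pd

# Rebuild the sample frames so this cell is self-contained
df3 = pd.DataFrame({'key1': ['A', 'B', 'C', 'D'],
                    'key2': [1, 2, 3, 4],
                    'value1': [10, 20, 30, 40]})
df4 = pd.DataFrame({'key1': ['A', 'B', 'C', 'D'],
                    'key2': [3, 4, 5, 6],
                    'value2': [50, 60, 70, 80]})

# Composite keys as (key1, key2) tuples
pairs3 = set(zip(df3['key1'], df3['key2']))
pairs4 = set(zip(df4['key1'], df4['key2']))
shared_pairs = pairs3 & pairs4
print(len(shared_pairs))  # number of composite keys present in both frames

# Merging on key1 alone matches every row; key2 from each side is
# kept as key2_x (left) and key2_y (right)
on_key1 = pd.merge(df3, df4, on='key1', how='inner')
print(list(on_key1.columns))
print(len(on_key1))
```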


## Outer Merge on Multiple Keys

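You can also perform an outer merge on multiple keys. Because no (`key1`, `key2`) pair matches between `df3` and `df4`, every row from both DataFrames appears in the result, with `NaN` in the columns from the other side.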

In [14]:
# Outer Merge on Multiple Keys
outer_merge_multi = pd.merge(df3, df4, on=['key1', 'key2'], how='outer')
outer_merge_multi

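|   | key1 | key2 | value1 | value2 |
|---|------|------|--------|--------|
| 0 | A    | 1    | 10.0   | NaN    |
| 1 | B    | 2    | 20.0   | NaN    |
| 2 | C    | 3    | 30.0   | NaN    |
| 3 | D    | 4    | 40.0   | NaN    |
| 4 | A    | 3    | NaN    | 50.0   |
| 5 | B    | 4    | NaN    | 60.0   |
| 6 | C    | 5    | NaN    | 70.0   |
| 7 | D    | 6    | NaN    | 80.0   |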
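
The size of an outer merge can be cross-checked against the inner merge: when keys are unique within each frame, the outer result has `len(left) + len(right) - matches` rows. The sketch below does that check and also sorts the result by its keys, since the row order of an outer merge may differ between pandas versions (an assumption worth verifying against your installed version). The names `ordered` and `expected_rows` are illustrative.

```python
import pandas as pd

# Rebuild the sample frames so this cell is self-contained
df3 = pd.DataFrame({'key1': ['A', 'B', 'C', 'D'],
                    'key2': [1, 2, 3, 4],
                    'value1': [10, 20, 30, 40]})
df4 = pd.DataFrame({'key1': ['A', 'B', 'C', 'D'],
                    'key2': [3, 4, 5, 6],
                    'value2': [50, 60, 70, 80]})

keys = ['key1', 'key2']
outer_multi = pd.merge(df3, df4, on=keys, how='outer')

# Number of composite keys that matched on both sides
n_matched = len(pd.merge(df3, df4, on=keys, how='inner'))

# Valid when each (key1, key2) pair is unique within its frame
expected_rows = len(df3) + len(df4) - n_matched
print(len(outer_multi) == expected_rows)

# Sort explicitly instead of relying on merge's output order
ordered = outer_multi.sort_values(keys).reset_index(drop=True)
print(ordered)
```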
