## Merging

DataFrames that share a column can be combined with the pandas merge() function, which matches rows on the values of that common column (the key).

## Step 1: Import the Required Library

- Import the pandas library:


In [None]:
import pandas as pd

## Step 2: Load the Datasets

- Load the __Merge1.csv__ and __Merge2.csv__ datasets using pandas:


In [None]:
df1 = pd.read_csv('Merge1.csv')
df2 = pd.read_csv('Merge2.csv')

Let's see the DataFrame __df1__.

In [None]:
df1

Unnamed: 0,date,a,b,c
0,2022-01-01,9000,35,140
1,2022-01-02,8000,12,920
2,2022-01-03,69000,42,370
3,2022-01-04,81000,29,900
4,2022-01-05,91000,35,770
...,...,...,...,...
85,2022-03-27,37000,90,260
86,2022-03-28,24000,24,950
87,2022-03-29,44000,7,400
88,2022-03-30,31000,10,600


**Observation**

The DataFrame __df1__ contains four columns __date__, __a__, __b__ and __c__.

Let's see the DataFrame __df2__.

In [None]:
df2

Unnamed: 0,date,a,b,c
0,2022-03-01,12000,50,60
1,2022-03-02,27000,90,990
2,2022-03-03,19000,17,750
3,2022-03-04,70000,60,900
4,2022-03-05,6000,7,810
5,2022-03-06,10000,98,730
6,2022-03-07,41000,21,710
7,2022-03-08,80000,76,200
8,2022-03-09,78000,46,200
9,2022-03-10,74000,67,50


**Observation**

- __df2__ has the same columns as __df1__.

- The dates in __df1__ run from January 1st, 2022 to March 31st, whereas __df2__ only covers March 1st to March 31st.
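Before merging, it can help to check how much the key columns of two DataFrames overlap. The sketch below is one way to do that. The frames `left` and `right` and their values are made-up toy data, not the contents of Merge1.csv or Merge2.csv.

```python
import pandas as pd

# Toy data (assumed for illustration only)
left = pd.DataFrame({"date": ["2022-01-01", "2022-01-02", "2022-01-03"],
                     "a": [1, 2, 3]})
right = pd.DataFrame({"date": ["2022-01-02", "2022-01-03", "2022-01-04"],
                      "a": [20, 30, 40]})

# First and last date in each frame (ISO-formatted strings sort chronologically)
print(left["date"].min(), left["date"].max())
print(right["date"].min(), right["date"].max())

# Number of rows in `left` whose date also appears in `right`
shared_dates = left["date"].isin(right["date"]).sum()
print(shared_dates)
```

When the dates are unique in both frames, this count is also the number of rows an inner join on the date column would return.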



## Step 3: Merge DataFrames with Inner Join


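Let's join the two DataFrames. If the **how** parameter is not given, merge() performs an inner join, which keeps only the rows whose key appears in both DataFrames.

There are two ways to call it: the function pd.merge(df1, df2, ...) and the DataFrame method df1.merge(df2, ...). Both produce the same result.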

- Merge the DataFrames using the **date** column as the key:


In [None]:
pd.merge(df1,df2,on=['date'])

Unnamed: 0,date,a_x,b_x,c_x,a_y,b_y,c_y
0,2022-03-01,16000,29,360,12000,50,60
1,2022-03-02,56000,73,990,27000,90,990
2,2022-03-03,3000,99,650,19000,17,750
3,2022-03-04,36000,94,310,70000,60,900
4,2022-03-05,59000,17,600,6000,7,810
5,2022-03-06,35000,57,450,10000,98,730
6,2022-03-07,86000,98,130,41000,21,710
7,2022-03-08,69000,44,60,80000,76,200
8,2022-03-09,21000,64,930,78000,46,200
9,2022-03-10,45000,83,670,74000,67,50



**Observation**

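Both DataFrames are now merged, and the result has 31 rows: one for each date that appears in both __df1__ and __df2__. Because the columns __a__, __b__ and __c__ exist in both DataFrames, pandas adds the suffix **_x** to the columns from __df1__ and **_y** to the columns from __df2__.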
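The function form and the method form can be compared on a small example. This is a minimal sketch with made-up toy frames (`left`, `right` and their values are assumptions, not the tutorial's CSV data).

```python
import pandas as pd

# Toy data (assumed for illustration only)
left = pd.DataFrame({"date": ["2022-01-01", "2022-01-02", "2022-01-03"],
                     "a": [1, 2, 3]})
right = pd.DataFrame({"date": ["2022-01-02", "2022-01-03", "2022-01-04"],
                      "a": [20, 30, 40]})

# Function form and method form; both default to how="inner"
via_function = pd.merge(left, right, on="date")
via_method = left.merge(right, on="date")

print(via_function)                      # only the two shared dates remain
print(via_function.equals(via_method))   # the two forms give the same result
```

Only the dates present in both frames survive the inner join, and the shared column `a` is split into `a_x` and `a_y`.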



## Step 4: Merge the DataFrames with Different Types of Joins

Now let's perform an outer join on the DataFrames. For this, we pass the parameter **how='outer'** to merge().


### Outer join

In [None]:
pd.merge(df1,df2,on=['date'],how='outer')

Unnamed: 0,date,a_x,b_x,c_x,a_y,b_y,c_y
0,2022-01-01,9000,35,140,,,
1,2022-01-02,8000,12,920,,,
2,2022-01-03,69000,42,370,,,
3,2022-01-04,81000,29,900,,,
4,2022-01-05,91000,35,770,,,
...,...,...,...,...,...,...,...
85,2022-03-27,37000,90,260,17000.0,89.0,230.0
86,2022-03-28,24000,24,950,97000.0,69.0,790.0
87,2022-03-29,44000,7,400,47000.0,54.0,970.0
88,2022-03-30,31000,10,600,72000.0,27.0,600.0


**Observation**

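The outer join keeps every date from both DataFrames.

Where one DataFrame has no row for a date, its columns are filled with **NaN** (shown as empty cells in the output above). The columns that contain NaN are displayed as floats, which is why values such as 17000.0 appear.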
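To see which side each row of an outer join came from, merge() accepts the `indicator=True` parameter, which adds a `_merge` column. The sketch below uses made-up toy frames (assumed data, not the tutorial's CSV files).

```python
import pandas as pd

# Toy data (assumed for illustration only)
left = pd.DataFrame({"date": ["2022-01-01", "2022-01-02"], "a": [1, 2]})
right = pd.DataFrame({"date": ["2022-01-02", "2022-01-03"], "a": [20, 30]})

# Keep all dates from both frames and record where each row came from
outer = pd.merge(left, right, on="date", how="outer", indicator=True)
print(outer)

# Count rows per origin: left_only, right_only or both
print(outer["_merge"].value_counts())
```

Rows marked `left_only` have NaN in the `a_y` column, and rows marked `right_only` have NaN in `a_x`.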

### Left join

Now, let's perform a left join on the DataFrames.

In [None]:
pd.merge(df1,df2,on=['date'],how='left')

Unnamed: 0,date,a_x,b_x,c_x,a_y,b_y,c_y
0,2022-01-01,9000,35,140,,,
1,2022-01-02,8000,12,920,,,
2,2022-01-03,69000,42,370,,,
3,2022-01-04,81000,29,900,,,
4,2022-01-05,91000,35,770,,,
...,...,...,...,...,...,...,...
85,2022-03-27,37000,90,260,17000.0,89.0,230.0
86,2022-03-28,24000,24,950,97000.0,69.0,790.0
87,2022-03-29,44000,7,400,47000.0,54.0,970.0
88,2022-03-30,31000,10,600,72000.0,27.0,600.0


**Observation**

We can see that there are 90 rows, because a left join keeps every row of the left DataFrame __df1__. The rows with __NaN__ values are dates that have no matching row in the right DataFrame __df2__.

### Right join

Let's now try a right join.

In [None]:
pd.merge(df1,df2,on=['date'],how='right')

Unnamed: 0,date,a_x,b_x,c_x,a_y,b_y,c_y
0,2022-03-01,16000,29,360,12000,50,60
1,2022-03-02,56000,73,990,27000,90,990
2,2022-03-03,3000,99,650,19000,17,750
3,2022-03-04,36000,94,310,70000,60,900
4,2022-03-05,59000,17,600,6000,7,810
5,2022-03-06,35000,57,450,10000,98,730
6,2022-03-07,86000,98,130,41000,21,710
7,2022-03-08,69000,44,60,80000,76,200
8,2022-03-09,21000,64,930,78000,46,200
9,2022-03-10,45000,83,670,74000,67,50


**Observation**

Here, the DataFrames are merged using a right join, which keeps every row of the right DataFrame __df2__.

There are no __NaN__ values, because every date in __df2__ also appears in __df1__.
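The four join types can be compared side by side by counting the rows each one returns. This is a minimal sketch on made-up toy frames (the names and values are assumptions, chosen so that only one date is shared).

```python
import pandas as pd

# Toy data (assumed for illustration only): one shared date, 2022-01-03
left = pd.DataFrame({"date": ["2022-01-01", "2022-01-02", "2022-01-03"],
                     "a": [1, 2, 3]})
right = pd.DataFrame({"date": ["2022-01-03", "2022-01-04"],
                      "a": [30, 40]})

# Number of rows produced by each value of the `how` parameter
row_counts = {
    how: len(pd.merge(left, right, on="date", how=how))
    for how in ("inner", "outer", "left", "right")
}
print(row_counts)
```

With unique keys, the inner join returns the shared dates, the left and right joins return as many rows as their respective DataFrame, and the outer join returns every distinct date.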

## Step 5: Rename a Column in the DataFrame

We were able to merge using the **on** keyword because the __date__ column is present in both DataFrames. Let's rename that column in __df1__ and see what happens.

- Rename the __date__ column in __df1__ to __Date__:


In [None]:
df1.rename(columns={'date':'Date'},inplace=True)

In [None]:
df1

Unnamed: 0,Date,a,b,c
0,2022-01-01,9000,35,140
1,2022-01-02,8000,12,920
2,2022-01-03,69000,42,370
3,2022-01-04,81000,29,900
4,2022-01-05,91000,35,770
...,...,...,...,...
85,2022-03-27,37000,90,260
86,2022-03-28,24000,24,950
87,2022-03-29,44000,7,400
88,2022-03-30,31000,10,600


**Observation**

The __date__ column of __df1__ has been renamed to __Date__.
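As an alternative to `inplace=True`, rename() can return a new DataFrame and leave the original untouched. A minimal sketch, using a made-up one-row frame (`df` and `renamed` are illustrative names):

```python
import pandas as pd

# Toy data (assumed for illustration only)
df = pd.DataFrame({"date": ["2022-01-01"], "a": [1]})

# Without inplace=True, rename() returns a renamed copy
renamed = df.rename(columns={"date": "Date"})

print(list(df.columns))       # the original keeps its column names
print(list(renamed.columns))  # the copy has the new name
```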

In [None]:
pd.merge(df1,df2,on=['date'])

KeyError: ignored

**Observation**

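The merge raised a KeyError because __df1__ no longer has a column named __date__; the **on** parameter requires the key column to exist under the same name in both DataFrames.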
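One way to guard against this error is to check that the key exists in both DataFrames before merging. The sketch below reproduces the failure on made-up toy frames (assumed data) and shows such a check.

```python
import pandas as pd

# Toy data (assumed for illustration only); note the different capitalization
left = pd.DataFrame({"Date": ["2022-01-01"], "a": [1]})
right = pd.DataFrame({"date": ["2022-01-01"], "a": [10]})

# Check whether the key is a column in both frames
key = "date"
key_in_both = key in left.columns and key in right.columns
print(key_in_both)

# Merging on a key that is missing from one frame raises KeyError
raised_key_error = False
try:
    pd.merge(left, right, on=key)
except KeyError as err:
    raised_key_error = True
    print("KeyError:", err)
```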

## Step 6: Merge DataFrames Using left_on and right_on

- Merge the DataFrames using the __Date__ column from **df1** and the __date__ column from **df2**.


In [None]:
pd.merge(df1,df2,left_on=['Date'],right_on=['date'])

Unnamed: 0,Date,a_x,b_x,c_x,date,a_y,b_y,c_y
0,2022-03-01,16000,29,360,2022-03-01,12000,50,60
1,2022-03-02,56000,73,990,2022-03-02,27000,90,990
2,2022-03-03,3000,99,650,2022-03-03,19000,17,750
3,2022-03-04,36000,94,310,2022-03-04,70000,60,900
4,2022-03-05,59000,17,600,2022-03-05,6000,7,810
5,2022-03-06,35000,57,450,2022-03-06,10000,98,730
6,2022-03-07,86000,98,130,2022-03-07,41000,21,710
7,2022-03-08,69000,44,60,2022-03-08,80000,76,200
8,2022-03-09,21000,64,930,2022-03-09,78000,46,200
9,2022-03-10,45000,83,670,2022-03-10,74000,67,50


**Observation**

Here, the DataFrames have been merged by matching the __Date__ column of __df1__ against the __date__ column of __df2__. Both key columns are kept in the result.
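Because both key columns are kept and hold the same values after an inner join, one of them can be dropped afterwards. A minimal sketch with made-up toy frames (assumed data, not the tutorial's CSV files):

```python
import pandas as pd

# Toy data (assumed for illustration only)
left = pd.DataFrame({"Date": ["2022-01-01", "2022-01-02"], "a": [1, 2]})
right = pd.DataFrame({"date": ["2022-01-02", "2022-01-03"], "a": [20, 30]})

# Match left["Date"] against right["date"]
merged = pd.merge(left, right, left_on="Date", right_on="date")
print(list(merged.columns))

# Drop the redundant key column from the right frame
tidy = merged.drop(columns="date")
print(tidy)
```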

## Step 7: Set Indexes for Both DataFrames

We can merge two DataFrames using indexes.

- Set the __Date__ column as the index for **df1** and the __date__ column as index for **df2**:


In [None]:
df1.set_index(['Date'],inplace=True)
df2.set_index(['date'],inplace=True)

## Step 8: Merge the DataFrames Using left_index and right_index

- Merge the DataFrames using the indexes

- Set **left_index = True** and **right_index = True** 


In [None]:
pd.merge(df1,df2,left_index=True,right_index=True)

**Observation**

Even though the index of __df1__ is named __Date__ and the index of __df2__ is named __date__, the DataFrames were still merged. With **left_index=True** and **right_index=True**, merge() matches rows on the index values, not on the index names.
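The index-based merge can be tried on a small example. This sketch uses made-up toy frames whose indexes have different names (the data and names are assumptions for illustration).

```python
import pandas as pd

# Toy data (assumed for illustration only); the index names differ
left = pd.DataFrame(
    {"a": [1, 2, 3]},
    index=pd.Index(["2022-01-01", "2022-01-02", "2022-01-03"], name="Date"),
)
right = pd.DataFrame(
    {"a": [20, 30]},
    index=pd.Index(["2022-01-02", "2022-01-03"], name="date"),
)

# Match rows on index values; the default join type is still inner
merged = pd.merge(left, right, left_index=True, right_index=True)
print(merged)
```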


## Step 9: Merge DataFrames with Suffixes

By default, columns that appear in both DataFrames get the suffixes **_x** (left) and **_y** (right). To use different suffixes, pass the **suffixes** parameter.
It takes a tuple in which the first value is the suffix for the left DataFrame and the second is the suffix for the right DataFrame.

- Merge the DataFrames using the indexes, and add suffixes to the columns:


In [None]:
pd.merge(df1,df2,left_index=True,right_index=True,suffixes=('_from_left','_from_right'))

**Observation**

The overlapping columns now carry the suffixes **_from_left** and **_from_right** instead of the default **_x** and **_y**.
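Suffixes are only applied to column names that occur in both DataFrames. The sketch below illustrates this with made-up toy frames (assumed data), where `a` overlaps and `b` and `c` do not.

```python
import pandas as pd

# Toy data (assumed for illustration only): "a" is shared, "b" and "c" are not
left = pd.DataFrame({"date": ["2022-01-01"], "a": [1], "b": [2]})
right = pd.DataFrame({"date": ["2022-01-01"], "a": [10], "c": [30]})

merged = pd.merge(left, right, on="date",
                  suffixes=("_from_left", "_from_right"))

# Only the overlapping column "a" receives a suffix
print(list(merged.columns))
```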