# Pandas Concatenation and Merging Tutorial

In this tutorial, we cover concatenating and merging DataFrames with the pandas library for Python. We will go through the following topics:

1. What is Inner Join, Outer Join, Left Join, Right Join?
2. Pandas Merge Command
3. Suffixes Attribute within Merge Command
4. Concatenation of DataFrames
5. Verifying Integrity

Let's get started!


## 1. What is Inner Join, Outer Join, Left Join, Right Join?

In database terminology, different types of joins are used to combine data from two or more tables based on a related column between them.

- **Inner Join**: Returns only the rows where there is a match in both tables.
- **Outer Join (Full Outer Join)**: Returns all rows from both tables. Where a row has no match in the other table, the columns from that table are filled with NaN.
- **Left Join**: Returns all rows from the left table, and the matched rows from the right table. Columns from the right table are NaN where there is no match.
- **Right Join**: Returns all rows from the right table, and the matched rows from the left table. Columns from the left table are NaN where there is no match.

We will illustrate these with examples.
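Before moving to DataFrames, the four definitions can be sketched with plain Python sets of key values. This is only a rough model: it assumes every key appears at most once per table (with duplicate keys, a join produces one row per matching pair), and the key values are hypothetical, chosen for illustration.

```python
# Key values only; these IDs are hypothetical and chosen for illustration.
left_keys = {1, 2, 3, 4}
right_keys = {3, 4, 5, 6}

inner_keys = left_keys & right_keys   # inner join: keys present in both tables
outer_keys = left_keys | right_keys   # outer join: keys present in either table
left_only = left_keys - right_keys    # kept by a left join, right columns become NaN
right_only = right_keys - left_keys   # kept by a right join, left columns become NaN

print(sorted(inner_keys))
print(sorted(outer_keys))
print(sorted(left_only))
print(sorted(right_only))
```

A left join keeps exactly `left_keys`, and a right join keeps exactly `right_keys`.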


## 2. Pandas Merge Command

Pandas provides a `merge()` function to perform database-style joins on DataFrame objects.

Syntax (abbreviated to the most commonly used parameters): `pd.merge(left, right, how='inner', on=None, left_on=None, right_on=None, left_index=False, right_index=False, suffixes=('_x', '_y'), validate=None)`

We'll demonstrate the usage of this command with examples.


## 3. Suffixes Attribute within Merge Command

The `suffixes` parameter of `merge()` lets us specify the suffixes appended to column names that appear in both DataFrames but are not merge keys. The default is `('_x', '_y')`.
This helps to distinguish columns with the same name in the resulting DataFrame.

We'll see how to use this attribute with examples.


## 4. Concatenation of DataFrames

Concatenation is the process of combining two or more dataframes either along rows (vertically) or columns (horizontally).

We'll illustrate both types of concatenation with examples.


## 5. Verifying Integrity

Pandas offers built-in checks for the result of a merge or concatenation: the `validate` parameter of `merge()` and the `verify_integrity` parameter of `concat()`.
We'll see how to do this with examples.


## Dataset

Let's create a sample dataset that we'll use throughout this tutorial.
We'll create two dataframes `df1` and `df2`.

In [45]:
import pandas as pd

# Creating the sample datasets
data1 = {
    'ID': [1, 2, 3, 4],
    'Name': ['Charlie', 'David', 'Alice', 'Bob'],
    'City': ['Berlin', 'Toronto', 'Duckburg', 'Paperopoli'],
}
data2 = {
    'ID': [3, 4, 6, 5],
    'Age': [25, 30, 35, 40],
    'Name': ['Alice', 'Bob', 'Daviddd', 'Charlie'],
}

# Creating DataFrames
df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)

# Displaying the datasets
df1

,ID,Name,City
0,1,Charlie,Berlin
1,2,David,Toronto
2,3,Alice,Duckburg
3,4,Bob,Paperopoli


In [46]:
df2

,ID,Age,Name
0,3,25,Alice
1,4,30,Bob
2,6,35,Daviddd
3,5,40,Charlie


## 1. What is Inner Join, Outer Join, Left Join, Right Join?

Let's illustrate each type of join using the datasets `df1` and `df2`.


In [47]:
pd.merge(df1, df2, on='ID', how='inner')

,ID,Name_x,City,Age,Name_y
0,3,Alice,Duckburg,25,Alice
1,4,Bob,Paperopoli,30,Bob


In [48]:
pd.merge(df1, df2, on='ID', how='outer')

,ID,Name_x,City,Age,Name_y
0,1,Charlie,Berlin,,
1,2,David,Toronto,,
2,3,Alice,Duckburg,25.0,Alice
3,4,Bob,Paperopoli,30.0,Bob
4,5,,,40.0,Charlie
5,6,,,35.0,Daviddd


In [49]:
pd.merge(df1, df2, on='ID', how='left')

,ID,Name_x,City,Age,Name_y
0,1,Charlie,Berlin,,
1,2,David,Toronto,,
2,3,Alice,Duckburg,25.0,Alice
3,4,Bob,Paperopoli,30.0,Bob


In [50]:
pd.merge(df1, df2, on='ID', how='right')

,ID,Name_x,City,Age,Name_y
0,3,Alice,Duckburg,25,Alice
1,4,Bob,Paperopoli,30,Bob
2,6,,,35,Daviddd
3,5,,,40,Charlie
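To see which table each row of a join came from, one option is the `indicator=True` parameter of `merge()`, which this tutorial does not otherwise use. The sketch below rebuilds the sample data so it runs on its own, then counts the rows each join type produces.

```python
import pandas as pd

# Same sample data as in the Dataset section.
df1 = pd.DataFrame({
    'ID': [1, 2, 3, 4],
    'Name': ['Charlie', 'David', 'Alice', 'Bob'],
    'City': ['Berlin', 'Toronto', 'Duckburg', 'Paperopoli'],
})
df2 = pd.DataFrame({
    'ID': [3, 4, 6, 5],
    'Age': [25, 30, 35, 40],
    'Name': ['Alice', 'Bob', 'Daviddd', 'Charlie'],
})

# indicator=True adds a `_merge` column with the values
# 'left_only', 'right_only', or 'both' for each row.
outer = pd.merge(df1, df2, on='ID', how='outer', indicator=True)
print(outer[['ID', '_merge']])

# Number of rows produced by each join type on this data.
row_counts = {
    how: len(pd.merge(df1, df2, on='ID', how=how))
    for how in ['inner', 'outer', 'left', 'right']
}
print(row_counts)
```

On this data the inner join keeps only IDs 3 and 4, while the outer join keeps all six distinct IDs.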


## 2. Pandas Merge Command

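Let's demonstrate the usage of the merge command with an example. Here the key is passed through `left_on` and `right_on`. Because both DataFrames name the key column `ID`, the result is identical to using `on='ID'`; these two parameters matter when the key columns have different names.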


In [51]:
merged_df = pd.merge(df1, df2, left_on='ID', right_on='ID', how='outer')
merged_df

,ID,Name_x,City,Age,Name_y
0,1,Charlie,Berlin,,
1,2,David,Toronto,,
2,3,Alice,Duckburg,25.0,Alice
3,4,Bob,Paperopoli,30.0,Bob
4,5,,,40.0,Charlie
5,6,,,35.0,Daviddd
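`left_on` and `right_on` become necessary when the key columns are named differently. A minimal sketch, assuming a hypothetical `person_id` column in the right DataFrame (that name does not appear in the tutorial's dataset):

```python
import pandas as pd

people = pd.DataFrame({
    'ID': [1, 2, 3, 4],
    'Name': ['Charlie', 'David', 'Alice', 'Bob'],
})
# `person_id` is a hypothetical column name used only for this illustration.
ages = pd.DataFrame({
    'person_id': [3, 4, 6, 5],
    'Age': [25, 30, 35, 40],
})

# Match people['ID'] against ages['person_id'].
merged_keys = pd.merge(people, ages, left_on='ID', right_on='person_id', how='inner')
print(merged_keys)

# Both key columns are kept when their names differ; drop the redundant one.
merged_clean = merged_keys.drop(columns='person_id')
print(merged_clean)
```

Unlike the same-name case above, the result carries both `ID` and `person_id` until one of them is dropped.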


## 3. Suffixes Attribute within Merge Command

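Let's see how to use the `suffixes` parameter with an example. Both DataFrames have a `Name` column that is not a merge key, so pandas needs a suffix to tell the two apart.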


In [52]:
# Merging with custom suffixes
custom_suffix_merge = pd.merge(df1, df2, on='ID', how='outer', suffixes=('_left', '_right'))
custom_suffix_merge


,ID,Name_left,City,Age,Name_right
0,1,Charlie,Berlin,,
1,2,David,Toronto,,
2,3,Alice,Duckburg,25.0,Alice
3,4,Bob,Paperopoli,30.0,Bob
4,5,,,40.0,Charlie
5,6,,,35.0,Daviddd
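Two variations may be useful here, shown as a sketch on the same sample data. An empty string as the left suffix keeps the left column name unchanged, and merging on both `ID` and `Name` removes the overlap entirely, so no suffix is applied. The `_df2` suffix is an arbitrary choice for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({
    'ID': [1, 2, 3, 4],
    'Name': ['Charlie', 'David', 'Alice', 'Bob'],
    'City': ['Berlin', 'Toronto', 'Duckburg', 'Paperopoli'],
})
df2 = pd.DataFrame({
    'ID': [3, 4, 6, 5],
    'Age': [25, 30, 35, 40],
    'Name': ['Alice', 'Bob', 'Daviddd', 'Charlie'],
})

# Keep the left `Name` as-is and mark only the right-hand copy.
kept_left = pd.merge(df1, df2, on='ID', how='inner', suffixes=('', '_df2'))
print(kept_left.columns.tolist())

# Treat `Name` as part of the key: no overlapping non-key columns remain.
on_both = pd.merge(df1, df2, on=['ID', 'Name'], how='inner')
print(on_both.columns.tolist())
```

Note that merging on both columns also changes which rows match: a row is kept only when `ID` and `Name` agree in both DataFrames.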


## 4. Concatenation of DataFrames

Let's illustrate concatenation of dataframes, both vertically and horizontally.


In [53]:
# Vertical Concatenation
concatenated_vertical = pd.concat([df1, df2], ignore_index=True)
concatenated_vertical

,ID,Name,City,Age
0,1,Charlie,Berlin,
1,2,David,Toronto,
2,3,Alice,Duckburg,
3,4,Bob,Paperopoli,
4,3,Alice,,25.0
5,4,Bob,,30.0
6,6,Daviddd,,35.0
7,5,Charlie,,40.0
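Vertical concatenation can also record where each row came from, or keep only the columns the inputs share. The sketch below uses the `keys` and `join` parameters of `pd.concat`; the labels `'df1'` and `'df2'` are arbitrary names chosen for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({
    'ID': [1, 2, 3, 4],
    'Name': ['Charlie', 'David', 'Alice', 'Bob'],
    'City': ['Berlin', 'Toronto', 'Duckburg', 'Paperopoli'],
})
df2 = pd.DataFrame({
    'ID': [3, 4, 6, 5],
    'Age': [25, 30, 35, 40],
    'Name': ['Alice', 'Bob', 'Daviddd', 'Charlie'],
})

# keys= builds a two-level index whose outer level names the source frame.
stacked = pd.concat([df1, df2], keys=['df1', 'df2'])
print(stacked.loc['df2'])  # only the rows that came from df2

# join='inner' keeps only the columns present in every input (ID and Name),
# so no NaN padding is introduced.
common = pd.concat([df1, df2], join='inner', ignore_index=True)
print(common)
```

The default `join='outer'` used earlier keeps the union of columns instead, which is why `City` and `Age` contain NaN there.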


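In [54]:
# Horizontal Concatenation
concatenated_horizontal = pd.concat([df1, df2], axis=1)
concatenated_horizontal

,ID,Name,City,ID,Age,Name
0,1,Charlie,Berlin,3,25,Alice
1,2,David,Toronto,4,30,Bob
2,3,Alice,Duckburg,6,35,Daviddd
3,4,Bob,Paperopoli,5,40,Charlie

Horizontal concatenation aligns rows by index label, not by `ID`, so each row here pairs unrelated records, and the result contains duplicate `ID` and `Name` column labels.


## 5. Verifying Integrity

After performing concatenation or merging, it's important to verify the integrity of the resulting DataFrame. `pd.concat` accepts `verify_integrity=True`, which raises a `ValueError` when the concatenation axis would contain duplicate labels, and `pd.merge` accepts `validate` (for example `'one_to_one'`), which raises a `MergeError` when the merge keys are not unique as declared.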
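Integrity checks can be sketched with two existing parameters: `verify_integrity` on `pd.concat` and `validate` on `pd.merge`. The `dup_ages` DataFrame is hypothetical, added only to trigger the merge check.

```python
import pandas as pd

df1 = pd.DataFrame({
    'ID': [1, 2, 3, 4],
    'Name': ['Charlie', 'David', 'Alice', 'Bob'],
    'City': ['Berlin', 'Toronto', 'Duckburg', 'Paperopoli'],
})
df2 = pd.DataFrame({
    'ID': [3, 4, 6, 5],
    'Age': [25, 30, 35, 40],
    'Name': ['Alice', 'Bob', 'Daviddd', 'Charlie'],
})

# Both frames use index labels 0..3, so stacking them without
# ignore_index=True produces duplicate labels and the check fails.
concat_failed = False
try:
    pd.concat([df1, df2], verify_integrity=True)
except ValueError as exc:
    concat_failed = True
    print('concat check failed:', exc)

# ID is unique in both frames, so a one-to-one merge passes the check.
checked = pd.merge(df1, df2, on='ID', validate='one_to_one')

# Hypothetical frame in which ID 3 appears twice on the right side.
dup_ages = pd.DataFrame({'ID': [3, 3], 'Age': [25, 26]})
merge_failed = False
try:
    pd.merge(df1, dup_ages, on='ID', validate='one_to_one')
except pd.errors.MergeError as exc:
    merge_failed = True
    print('merge check failed:', exc)
```

Without these parameters, both operations would succeed silently: the concatenation would return repeated index labels and the merge would return one row per matching pair.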
