# **`Data Science Learners Hub`**

**Module : Python**

**email** : [datasciencelearnershub@gmail.com](mailto:datasciencelearnershub@gmail.com)

## **`#4: Advanced Data Manipulation`**
10. **Merging and Concatenating DataFrames**
    - Combining DataFrames
    - Concatenation and merging operations

11. **Reshaping Data**
    - Pivoting and melting
    - Stacking and unstacking

12. **Time Series Data**
    - Handling time and date data
    - Resampling and frequency conversion

### **`10. Merging and Concatenating DataFrames`**

#### **`Combining DataFrames in Pandas`**

#### Concept of Combining DataFrames:

Combining or merging DataFrames in Pandas involves bringing together information from two or more DataFrames based on a common key or index. This is particularly useful when dealing with related datasets or when you want to integrate information from multiple sources.

#### Scenarios for DataFrame Combination:

1. **Data Integration:**
   - Combine datasets with shared information to create a unified view.

2. **Relational Databases:**
   - Mimic relational database joins for complex data relationships.

3. **Time Series Alignment:**
   - Align datasets based on time indices for time series analysis.

4. **Handling Missing Data:**
   - Fill gaps in one dataset with values from another dataset that covers them.

#### Types of Merges:

##### 1. Inner Merge:

```python
import pandas as pd

# Sample DataFrames
df1 = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['laxman', 'harshita', 'naina']})
df2 = pd.DataFrame({'ID': [2, 3, 4], 'Salary': [60000, 45000, 70000]})

# Inner merge on 'ID'
merged_inner = pd.merge(df1, df2, on='ID', how='inner')

# Display the merged DataFrame
print("Inner Merge Result:")
print(merged_inner)

# Result: only rows whose 'ID' appears in both DataFrames are retained.
```

```
Inner Merge Result:
   ID      Name  Salary
0   2  harshita   60000
1   3     naina   45000
```


##### 2. Outer Merge:

```python
# Outer merge on 'ID'
merged_outer = pd.merge(df1, df2, on='ID', how='outer')

# Display the merged DataFrame
print("\nOuter Merge Result:")
print(merged_outer)

# Result: all rows from both DataFrames are included; NaN marks values with no match.
```

```
Outer Merge Result:
   ID      Name   Salary
0   1    laxman      NaN
1   2  harshita  60000.0
2   3     naina  45000.0
3   4       NaN  70000.0
```


##### 3. Left Merge:

```python
# Left merge on 'ID'
merged_left = pd.merge(df1, df2, on='ID', how='left')

# Display the merged DataFrame
print("\nLeft Merge Result:")
print(merged_left)

# Result: all rows from the left DataFrame (df1) are retained; NaN where df2 has no matching 'ID'.
```

```
Left Merge Result:
   ID      Name   Salary
0   1    laxman      NaN
1   2  harshita  60000.0
2   3     naina  45000.0
```


##### 4. Right Merge:

```python
# Right merge on 'ID'
merged_right = pd.merge(df1, df2, on='ID', how='right')

# Display the merged DataFrame
print("\nRight Merge Result:")
print(merged_right)

# Result: all rows from the right DataFrame (df2) are retained; NaN where df1 has no matching 'ID'.
```

```
Right Merge Result:
   ID      Name  Salary
0   2  harshita   60000
1   3     naina   45000
2   4       NaN   70000
```


#### Implications of Merge Types:

- **Inner Merge:**
  - Retains only rows whose keys appear in both DataFrames.

- **Outer Merge:**
  - Retains all rows from both DataFrames; columns from the side without a matching key are filled with NaN.

- **Left Merge:**
  - Retains all rows from the left DataFrame; right-hand columns are NaN where the key has no match.

- **Right Merge:**
  - Retains all rows from the right DataFrame; left-hand columns are NaN where the key has no match.
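To see these implications side by side, the sketch below reruns the four merges on the same `df1` and `df2` and counts the resulting rows. It also uses `indicator=True`, a `pd.merge` parameter not covered above, which adds a `_merge` column recording whether each row matched in the left DataFrame only, the right only, or both.

```python
import pandas as pd

# Same sample data as the merge examples above
df1 = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['laxman', 'harshita', 'naina']})
df2 = pd.DataFrame({'ID': [2, 3, 4], 'Salary': [60000, 45000, 70000]})

# Number of rows produced by each merge type
row_counts = {how: len(pd.merge(df1, df2, on='ID', how=how))
              for how in ['inner', 'outer', 'left', 'right']}
print(row_counts)  # {'inner': 2, 'outer': 4, 'left': 3, 'right': 3}

# indicator=True adds a '_merge' column: 'left_only', 'right_only' or 'both'
tagged = pd.merge(df1, df2, on='ID', how='outer', indicator=True)
print(tagged[['ID', '_merge']])
```

The `_merge` column is handy for auditing a join: filtering on `'left_only'` lists the keys that found no partner.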

#### Considerations:

- **Key Column(s):**
  - Specify the key column(s) on which to merge the DataFrames.

- **Duplicate Keys:**
  - If a key value repeats, every matching pair of rows becomes its own output row, so the result can contain more rows than either input.

- **Multiple Key Columns:**
  - Merge on multiple columns for more complex relationships.
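A small sketch of the duplicate-key effect, using hypothetical `employees` and `orders` frames made up for illustration. Because ID 2 repeats in `orders`, the single matching employee row is duplicated. The `validate` parameter of `pd.merge` can catch such surprises early by raising `pandas.errors.MergeError` when the keys do not fit the stated relationship.

```python
import pandas as pd
from pandas.errors import MergeError

# Hypothetical data: ID 2 appears twice in 'orders'
employees = pd.DataFrame({'ID': [1, 2], 'Name': ['laxman', 'harshita']})
orders = pd.DataFrame({'ID': [2, 2, 3], 'Amount': [100, 250, 75]})

# The single ID 2 employee row is paired with both matching order rows
merged = pd.merge(employees, orders, on='ID', how='inner')
print(merged)  # two rows, both for ID 2

# validate='one_to_one' requires unique keys on both sides, so this raises
try:
    pd.merge(employees, orders, on='ID', validate='one_to_one')
    one_to_one_ok = True
except MergeError as err:
    one_to_one_ok = False
    print("Validation failed:", err)

# 'one_to_many' (unique keys on the left only) describes this data correctly
checked = pd.merge(employees, orders, on='ID', how='inner', validate='one_to_many')
print(len(checked))
```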

#### Tips:

- **Suffixes:**
  - Use the `suffixes` parameter (default `('_x', '_y')`) to distinguish non-key columns that have the same name in both DataFrames.

- **Index-Based Merge:**
  - Merge on the index instead of a column by setting `left_index=True` and/or `right_index=True`.

Merging DataFrames is a crucial aspect of data manipulation in Pandas, enabling the combination of information from diverse sources. Understanding the types of merges and their implications empowers efficient data integration and analysis.

#### **`Concatenation and Merging Operations in Pandas`**

#### Concatenation with `concat()`:

Concatenation in Pandas combines DataFrames with `pd.concat()`, either vertically (stacking rows, `axis=0`) or horizontally (placing columns side by side, `axis=1`).

##### 1. Vertical Concatenation:

```python
import pandas as pd

# Sample DataFrames
df1 = pd.DataFrame({'A': ['A0', 'A1'], 'B': ['B0', 'B1']})
df2 = pd.DataFrame({'A': ['A2', 'A3'], 'B': ['B2', 'B3']})

# Vertical concatenation
concatenated_vertical = pd.concat([df1, df2])

# Display the concatenated DataFrame
print("Vertical Concatenation Result:")
print(concatenated_vertical)

# Result: rows from both DataFrames are stacked vertically.
```

```
Vertical Concatenation Result:
    A   B
0  A0  B0
1  A1  B1
0  A2  B2
1  A3  B3
```


***Explanation:***

- The code above demonstrates vertical concatenation with the `pd.concat` function. Vertical concatenation stacks DataFrames on top of each other along the rows (axis 0), which is the default for `concat()`.

- Here's a breakdown of the code:

***Vertical Concatenation:***
```python
concatenated_vertical = pd.concat([df1, df2])
```
- The `pd.concat` function is used to concatenate the DataFrames `df1` and `df2` vertically. The argument `[df1, df2]` specifies a list of DataFrames to be concatenated. The result (`concatenated_vertical`) is a new DataFrame where the rows from `df2` are stacked below the rows from `df1`.


- The final DataFrame (`concatenated_vertical`) has four rows, with the rows from `df2` appearing below the rows from `df1`. Because `concat()` keeps the original index labels, 0 and 1 each appear twice.
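If repeated index labels are unwanted, `pd.concat` offers two options, sketched below with the same `df1` and `df2`: `ignore_index=True` renumbers the rows, while `keys=` keeps the original labels and adds an outer index level naming the source of each row.

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['A0', 'A1'], 'B': ['B0', 'B1']})
df2 = pd.DataFrame({'A': ['A2', 'A3'], 'B': ['B2', 'B3']})

# ignore_index=True discards the original labels and builds a fresh 0..n-1 index
renumbered = pd.concat([df1, df2], ignore_index=True)
print(renumbered.index.tolist())  # [0, 1, 2, 3]

# keys= keeps the original labels under an outer level naming the source frame
labelled = pd.concat([df1, df2], keys=['df1', 'df2'])
print(labelled.loc['df2'])  # just the rows that came from df2
```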



##### 2. Horizontal Concatenation:

```python
# Sample DataFrames
df3 = pd.DataFrame({'C': ['C0', 'C1'], 'D': ['D0', 'D1']})

# Horizontal concatenation
concatenated_horizontal = pd.concat([df1, df3], axis=1)

# Display the concatenated DataFrame
print("\nHorizontal Concatenation Result:")
print(concatenated_horizontal)

# Result: columns from both DataFrames are placed side by side, aligned on the row index.
```

```
Horizontal Concatenation Result:
    A   B   C   D
0  A0  B0  C0  D0
1  A1  B1  C1  D1
```
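Horizontal concatenation matches rows by index label, not by position; the example above works cleanly because `df1` and `df3` share the labels 0 and 1. A sketch with a hypothetical `df_shifted` whose index only partly overlaps shows what happens otherwise, and how `join='inner'` keeps only the shared labels.

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['A0', 'A1'], 'B': ['B0', 'B1']})
# Hypothetical frame whose index labels (1, 2) only partly overlap df1's (0, 1)
df_shifted = pd.DataFrame({'C': ['C1', 'C2']}, index=[1, 2])

# axis=1 aligns rows by index label; unmatched labels get NaN (outer join by default)
outer_cols = pd.concat([df1, df_shifted], axis=1)
print(outer_cols)

# join='inner' keeps only the index labels present in every frame
inner_cols = pd.concat([df1, df_shifted], axis=1, join='inner')
print(inner_cols)
```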


#### Merging with `merge()`:

The `merge()` function combines DataFrames by matching values in one or more key columns (or in the index), much like a SQL join.

```python
# Sample DataFrames
df4 = pd.DataFrame({'ID': [1, 2], 'Name': ['Alice', 'Bob']})
df5 = pd.DataFrame({'ID': [2, 3], 'Salary': [60000, 45000]})

# Merging on 'ID'
merged_result = pd.merge(df4, df5, on='ID', how='inner')

# Display the merged DataFrame
print("\nMerge Result:")
print(merged_result)

# Result: inner merge on 'ID' retains only rows with common 'ID' values.
```

```
Merge Result:
   ID Name  Salary
0   2  Bob   60000
```


#### Merging Parameters:

- **`how`:**
  - Specifies the type of merge (`'inner'`, `'outer'`, `'left'`, `'right'`); the default is `'inner'`.

- **`on`:**
  - Specifies the key column(s) for merging.

- **`suffixes`:**
  - Suffixes appended to non-key column names that appear in both DataFrames (default `('_x', '_y')`).

#### Considerations:

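- **Common Key Columns:**
  - `on` requires the key column to have the same name in both DataFrames; if the names differ, use `left_on` and `right_on`. The key columns should also hold compatible data types and share at least some values, or the merge will find no matches.

- **Duplicate Columns:**
  - Non-key columns that share a name are disambiguated with suffixes; set `suffixes` to meaningful labels instead of relying on the defaults.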
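A sketch of merging when the key columns are named differently, using hypothetical `staff` and `pay` frames modeled on `df4` and `df5`. With `left_on`/`right_on`, both key columns survive in the result, so one of them is usually dropped afterwards.

```python
import pandas as pd

# Hypothetical frames whose key columns have different names
staff = pd.DataFrame({'EmpID': [1, 2], 'Name': ['Alice', 'Bob']})
pay = pd.DataFrame({'ID': [2, 3], 'Salary': [60000, 45000]})

# left_on / right_on name the key column on each side
joined = pd.merge(staff, pay, left_on='EmpID', right_on='ID', how='inner')
print(joined.columns.tolist())  # ['EmpID', 'Name', 'ID', 'Salary']

# Both key columns are kept; drop the redundant one
joined = joined.drop(columns='ID')
print(joined)
```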

#### Tips:

- **Multiple Key Columns:**
  - Merge on multiple columns for complex relationships using a list in the `on` parameter.

- **Index-Based Merge:**
  - Merge based on indices using `left_index` and `right_index` parameters.
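A sketch of a multi-column key, with hypothetical `salaries` and `bonuses` frames keyed by employee `ID` and `Year`. Neither column is unique on its own, so a row matches only when both values agree.

```python
import pandas as pd

# Hypothetical data keyed by the pair (ID, Year)
salaries = pd.DataFrame({'ID': [1, 1, 2], 'Year': [2022, 2023, 2023],
                         'Salary': [50000, 55000, 60000]})
bonuses = pd.DataFrame({'ID': [1, 2, 2], 'Year': [2023, 2022, 2023],
                        'Bonus': [5000, 3000, 4000]})

# Pass a list to 'on': rows match only when ID and Year are both equal
combined = pd.merge(salaries, bonuses, on=['ID', 'Year'], how='inner')
print(combined)  # (1, 2023) and (2, 2023) are the only shared pairs
```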

Combining DataFrames using `concat()` and `merge()` provides flexibility in managing and integrating data. Understanding these functions and their parameters allows for efficient data manipulation and analysis in various scenarios.
