# Day 15 - Merging and Joining DataFrames


### Why Merging and Joining DataFrames is Important

In data science, combining datasets from different sources is a common task. Whether you're merging customer data with sales transactions or integrating data from multiple experiments, the ability to merge and join DataFrames efficiently is vital. These operations allow you to create a unified dataset that can be analyzed holistically, revealing insights that might not be apparent when the data is siloed.


### Tutorial: Combining Multiple DataFrames

In [None]:
%pip install pandas

#### Merging DataFrames with `merge()`

In [None]:

import pandas as pd

# Example DataFrames
customers = pd.DataFrame({
    'CustomerID': [1, 2, 3, 4],
    'Name': ['Alice', 'Bob', 'Charlie', 'David']
})

transactions = pd.DataFrame({
    'TransactionID': [101, 102, 103, 104],
    'CustomerID': [1, 2, 2, 4],
    'Amount': [250, 150, 200, 300]
})

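# Merging on 'CustomerID'. merge() defaults to an inner join (how='inner'),
# so Charlie (CustomerID 3), who has no transactions, is dropped from the result.
merged_df = pd.merge(customers, transactions, on='CustomerID')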

print("Merged DataFrame:")
print(merged_df)
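Because the default inner join silently drops customers with no transactions, it can help to see the alternative. The sketch below reuses the same example data and assumes you want to keep every customer: `how='left'` keeps all rows from the left DataFrame, and `indicator=True` adds a `_merge` column showing where each row came from. The variable names `left_merged` and `customers_without_transactions` are illustrative only.

```python
import pandas as pd

customers = pd.DataFrame({
    'CustomerID': [1, 2, 3, 4],
    'Name': ['Alice', 'Bob', 'Charlie', 'David']
})

transactions = pd.DataFrame({
    'TransactionID': [101, 102, 103, 104],
    'CustomerID': [1, 2, 2, 4],
    'Amount': [250, 150, 200, 300]
})

# Left join: keep every customer, even those without a matching transaction.
# indicator=True adds a '_merge' column with 'both', 'left_only', or 'right_only'.
left_merged = pd.merge(
    customers, transactions, on='CustomerID', how='left', indicator=True
)
print(left_merged)

# Customers that appear only on the left side have no transactions
customers_without_transactions = left_merged.loc[
    left_merged['_merge'] == 'left_only', 'Name'
].tolist()
print(customers_without_transactions)
```

Other values of `how` include `'right'` and `'outer'`; which one fits depends on which side's unmatched rows you need to keep.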


#### Joining DataFrames with `join()`

In [None]:

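# Setting 'CustomerID' as the index on new DataFrames, leaving the originals unchanged
customers_indexed = customers.set_index('CustomerID')
transactions_indexed = transactions.set_index('CustomerID')

# Joining on the index. join() defaults to a left join, so Charlie is kept
# with NaN in the transaction columns.
joined_df = customers_indexed.join(transactions_indexed)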

print("\nJoined DataFrame:")
print(joined_df)
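To make the relationship between `join()` and `merge()` more concrete, here is a minimal sketch that builds its own indexed copies of the example data (the names `customers_idx` and `transactions_idx` are illustrative). It compares the default left join, an inner join via `how='inner'`, and the same left join written with `pd.merge` on the indexes.

```python
import pandas as pd

customers_idx = pd.DataFrame({
    'CustomerID': [1, 2, 3, 4],
    'Name': ['Alice', 'Bob', 'Charlie', 'David']
}).set_index('CustomerID')

transactions_idx = pd.DataFrame({
    'TransactionID': [101, 102, 103, 104],
    'CustomerID': [1, 2, 2, 4],
    'Amount': [250, 150, 200, 300]
}).set_index('CustomerID')

# Default join: left join on the index, so every customer is kept
left_joined = customers_idx.join(transactions_idx)

# Inner join: only customers with at least one transaction
inner_joined = customers_idx.join(transactions_idx, how='inner')

# The same left join expressed with merge() on both indexes
via_merge = pd.merge(
    customers_idx, transactions_idx,
    left_index=True, right_index=True, how='left'
)

print(left_joined)
print(inner_joined)
print(via_merge)
```

As a rule of thumb, `join()` is convenient when both DataFrames already share a meaningful index, while `merge()` is more flexible when the keys live in ordinary columns.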



### Use Case: Merging Customer Data with Transaction Records

Let’s apply these concepts to a real-life scenario: merging customer data with transaction records. This is a common task in business analytics, where combining data from different departments (e.g., marketing and sales) provides a comprehensive view of customer behavior.


#### Step 1: Loading the Datasets

In [None]:

# Creating the customer data inline (in a real project it would typically be loaded from a file or database)
customers_df = pd.DataFrame({
    'CustomerID': [1, 2, 3, 4],
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Email': ['alice@example.com', 'bob@example.com', 'charlie@example.com', 'david@example.com']
})

# Creating the transaction data inline
transactions_df = pd.DataFrame({
    'TransactionID': [101, 102, 103, 104],
    'CustomerID': [1, 2, 2, 4],
    'Amount': [250, 150, 200, 300],
    'Date': ['2023-08-01', '2023-08-02', '2023-08-03', '2023-08-04']
})

# Display the first few rows of each DataFrame
print("Customers DataFrame:")
print(customers_df.head())

print("\nTransactions DataFrame:")
print(transactions_df.head())
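The DataFrames above are built inline, and the `Date` column holds plain strings. In practice the data would more likely come from a file. The sketch below assumes a CSV source and uses an in-memory `io.StringIO` buffer with made-up content that mirrors the transaction data, so it runs without any file on disk. `parse_dates` asks `pd.read_csv` to convert the `Date` column to datetimes while loading.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for a real transactions file
transactions_csv = io.StringIO(
    "TransactionID,CustomerID,Amount,Date\n"
    "101,1,250,2023-08-01\n"
    "102,2,150,2023-08-02\n"
    "103,2,200,2023-08-03\n"
    "104,4,300,2023-08-04\n"
)

# parse_dates converts the 'Date' column to datetime64 during loading
transactions_from_csv = pd.read_csv(transactions_csv, parse_dates=['Date'])

print(transactions_from_csv.dtypes)
print(transactions_from_csv.head())
```

With a real file, the buffer would be replaced by a path; the rest of the call stays the same.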


#### Step 2: Merging the Datasets

In [None]:

# Merging the DataFrames on 'CustomerID' (inner join by default:
# customers without transactions, such as Charlie, are not included)
merged_data = pd.merge(customers_df, transactions_df, on='CustomerID')

# Display the merged DataFrame
print("\nMerged Customer and Transaction Data:")
print(merged_data.head())
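When merging real data, a duplicated key on the "one" side can silently multiply rows. One way to guard against this is the `validate` argument of `pd.merge`. The sketch below rebuilds the use-case data, checks that each `CustomerID` appears at most once in the customer table (`validate='one_to_many'`), and then shows the check failing on a deliberately duplicated row. The names `checked` and `dup_customers` are illustrative.

```python
import pandas as pd

customers_df = pd.DataFrame({
    'CustomerID': [1, 2, 3, 4],
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Email': ['alice@example.com', 'bob@example.com',
              'charlie@example.com', 'david@example.com']
})

transactions_df = pd.DataFrame({
    'TransactionID': [101, 102, 103, 104],
    'CustomerID': [1, 2, 2, 4],
    'Amount': [250, 150, 200, 300],
    'Date': ['2023-08-01', '2023-08-02', '2023-08-03', '2023-08-04']
})

# Passes: each CustomerID is unique in customers_df
checked = pd.merge(
    customers_df, transactions_df, on='CustomerID', validate='one_to_many'
)
print(checked)

# Deliberately duplicate the first customer to trigger the validation error
dup_customers = pd.concat([customers_df, customers_df.iloc[[0]]], ignore_index=True)

validation_failed = False
try:
    pd.merge(dup_customers, transactions_df, on='CustomerID', validate='one_to_many')
except pd.errors.MergeError as exc:
    validation_failed = True
    print("Validation failed:", exc)
```

The check adds a small cost on large tables, but it turns a subtle row-duplication bug into an immediate error.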


#### Step 3: Analyzing the Merged Data

In [None]:

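# Calculating total spending per customer.
# Grouping by 'CustomerID' as well as 'Name' keeps two customers who share a name separate.
total_spending = merged_data.groupby(['CustomerID', 'Name'])['Amount'].sum().reset_index()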

print("\nTotal Spending per Customer:")
print(total_spending)
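Because the merge in Step 2 is an inner join, customers with no transactions do not appear in the totals. If the report should list every customer, one option is a left merge followed by a grouped aggregation. The sketch below rebuilds the data so it runs on its own; the column names `TotalSpent` and `NumTransactions` are chosen for illustration.

```python
import pandas as pd

customers_df = pd.DataFrame({
    'CustomerID': [1, 2, 3, 4],
    'Name': ['Alice', 'Bob', 'Charlie', 'David']
})

transactions_df = pd.DataFrame({
    'TransactionID': [101, 102, 103, 104],
    'CustomerID': [1, 2, 2, 4],
    'Amount': [250, 150, 200, 300]
})

# Left merge keeps customers without transactions (their Amount is NaN)
all_customers = pd.merge(customers_df, transactions_df, on='CustomerID', how='left')

# sum() skips NaN, so a customer with no transactions gets a total of 0;
# count() counts only non-missing TransactionID values
summary = (
    all_customers
    .groupby(['CustomerID', 'Name'], as_index=False)
    .agg(TotalSpent=('Amount', 'sum'),
         NumTransactions=('TransactionID', 'count'))
)

print(summary)
```

Note that `Amount` becomes a float column after the left merge, because missing values are represented as `NaN`.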
