## 5. Common Real-World Use Cases

In [1]:
import pandas as pd

### 1. 📋 **Combining User Demographics with Activity Logs**

* **Goal**: Enrich logs with user details.
* **Tools**: `pd.merge()`
* **Example**:


In [2]:
# user profile
df_users = pd.DataFrame({
    'user_id': [101, 102, 103],
    'location': ['Delhi', 'Mumbai', 'Chennai']
})

# login activity
df_logs = pd.DataFrame({
    'user_id': [101, 102, 101, 103],
    'login_time': ['2024-01-01', '2024-01-02', '2024-01-03', '2024-01-04']
})

display(df_users, df_logs)

Unnamed: 0,user_id,location
0,101,Delhi
1,102,Mumbai
2,103,Chennai


Unnamed: 0,user_id,login_time
0,101,2024-01-01
1,102,2024-01-02
2,101,2024-01-03
3,103,2024-01-04


In [3]:
pd.merge(df_logs, df_users, on='user_id', how='left')

Unnamed: 0,user_id,login_time,location
0,101,2024-01-01,Delhi
1,102,2024-01-02,Mumbai
2,101,2024-01-03,Delhi
3,103,2024-01-04,Chennai


📌 **Use case**: User behavior modeling, churn analysis, personalization in apps.
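
A left merge like the one above silently multiplies log rows if a user appears twice in the profile table. The sketch below rebuilds the same two frames and adds `validate='many_to_one'`, a real `pd.merge()` parameter that raises an error in that situation. The names `enriched` and `logins_per_location` are illustrative only and do not come from the original notebook.

```python
import pandas as pd

df_users = pd.DataFrame({
    'user_id': [101, 102, 103],
    'location': ['Delhi', 'Mumbai', 'Chennai']
})
df_logs = pd.DataFrame({
    'user_id': [101, 102, 101, 103],
    'login_time': ['2024-01-01', '2024-01-02', '2024-01-03', '2024-01-04']
})

# validate='many_to_one' raises pandas.errors.MergeError if user_id
# is duplicated in df_users, so log rows can never be multiplied.
enriched = pd.merge(df_logs, df_users, on='user_id', how='left',
                    validate='many_to_one')

# With the location attached, the logs can be summarised per location.
logins_per_location = enriched.groupby('location').size()
```

Because the merge is a validated left join, `enriched` always has exactly one row per log entry.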

### 2. 📊 **Appending Monthly Sales Data**

* **Goal**: Stack multiple monthly sales reports into a single DataFrame.
* **Tools**: `pd.concat()`
* **Example**:

In [4]:
jan_sales = pd.DataFrame({'product': ['A', 'B'], 'sales': [100, 150]})
feb_sales = pd.DataFrame({'product': ['A', 'B'], 'sales': [110, 170]})

display(jan_sales, feb_sales)

Unnamed: 0,product,sales
0,A,100
1,B,150


Unnamed: 0,product,sales
0,A,110
1,B,170


In [7]:
pd.concat([jan_sales, feb_sales], ignore_index=True)

Unnamed: 0,product,sales
0,A,100
1,B,150
2,A,110
3,B,170


In [10]:
# Alternative: merge on 'product' to compare the two months side by side
pd.merge(jan_sales, feb_sales, on='product', suffixes=('_jan', '_feb'))

Unnamed: 0,product,sales_jan,sales_feb
0,A,100,110
1,B,150,170


📌 **Use case**: Time-series forecasting, trend analysis.
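
Stacking with `ignore_index=True` loses track of which month each row came from. Two possible ways to keep that information are sketched below: adding a label column before concatenating, or passing `keys` to `pd.concat()`. The column name `month` and the labels `'jan'` and `'feb'` are assumptions made for this illustration.

```python
import pandas as pd

jan_sales = pd.DataFrame({'product': ['A', 'B'], 'sales': [100, 150]})
feb_sales = pd.DataFrame({'product': ['A', 'B'], 'sales': [110, 170]})

# Option 1: tag each report with a (hypothetical) 'month' column first.
stacked = pd.concat(
    [jan_sales.assign(month='jan'), feb_sales.assign(month='feb')],
    ignore_index=True,
)
monthly_totals = stacked.groupby('month')['sales'].sum()

# Option 2: let concat build a MultiIndex whose outer level is the month.
stacked_keys = pd.concat(
    [jan_sales, feb_sales],
    keys=['jan', 'feb'],
    names=['month', 'row'],
)
feb_only = stacked_keys.loc['feb']  # select one month by its key
```

The label column is usually easier to group on, while the `keys` form keeps the original frames recoverable by index lookup.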

### 3. 🔗 **Joining Reference Tables**

* **Goal**: Add categorical labels or descriptions from reference tables.
* **Tools**: `df.join()` when the reference table is keyed by its index, `pd.merge()` when the key is a regular column
* **Example**:

In [11]:
df_codes = pd.DataFrame({'code': ['E1', 'E2'], 'desc': ['Error 1', 'Error 2']})
df_logs = pd.DataFrame({'code': ['E1', 'E2', 'E1'], 'ts': ['2024-01-01', '2024-01-02', '2024-01-03']})

display(df_codes, df_logs)

Unnamed: 0,code,desc
0,E1,Error 1
1,E2,Error 2


Unnamed: 0,code,ts
0,E1,2024-01-01
1,E2,2024-01-02
2,E1,2024-01-03


In [None]:
# merge on 'code'
pd.merge(df_logs, df_codes, on='code')

Unnamed: 0,code,ts,desc
0,E1,2024-01-01,Error 1
1,E2,2024-01-02,Error 2
2,E1,2024-01-03,Error 1


📌 **Use case**: Mapping status codes, error labels, region codes to full descriptions.
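
The example above uses `pd.merge()` because `code` is a regular column in both frames. As a sketch of the `df.join()` route, the reference table can be indexed by `code` first; `join(..., on='code')` then matches the left frame's column against the right frame's index. The names `codes_indexed` and `labelled` are illustrative.

```python
import pandas as pd

df_codes = pd.DataFrame({'code': ['E1', 'E2'], 'desc': ['Error 1', 'Error 2']})
df_logs = pd.DataFrame({'code': ['E1', 'E2', 'E1'],
                        'ts': ['2024-01-01', '2024-01-02', '2024-01-03']})

# Turn the reference table into a lookup keyed by 'code'.
codes_indexed = df_codes.set_index('code')

# join() defaults to a left join, so every log row is kept.
labelled = df_logs.join(codes_indexed, on='code')
```

A code that is missing from the reference table would get `NaN` in `desc` rather than being dropped.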

### 4. 🧮 **Aligning Data from Multiple Sources**

* **Goal**: Match datasets with similar but non-identical indexes
* **Tools**: `df.align()`, `combine_first()`, arithmetic with alignment
* **Example**:

In [18]:
s1 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
s2 = pd.Series([10, 20], index=['b', 'c'])

display(s1, s2)

a    1
b    2
c    3
dtype: int64

b    10
c    20
dtype: int64

In [19]:
# align() defaults to join='outer': both results share the union of the two indexes
aligned1, aligned2 = s1.align(s2)

display(aligned1, aligned2)

a    1
b    2
c    3
dtype: int64

a     NaN
b    10.0
c    20.0
dtype: float64

📌 **Use case**: Financial ratios, multi-source time series, combining historical vs real-time feeds.
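
The tools list also mentions `combine_first()` and arithmetic with alignment, which the example does not show. A minimal sketch using the same two Series follows; the variable names are illustrative.

```python
import pandas as pd

s1 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
s2 = pd.Series([10, 20], index=['b', 'c'])

# Arithmetic aligns on the index automatically; labels missing
# from either side produce NaN ('a' here).
total = s1 + s2

# fill_value treats a missing label as 0 instead of NaN.
total_filled = s1.add(s2, fill_value=0)

# combine_first prefers values from s2 and falls back to s1
# wherever s2 has no entry: a=1, b=10, c=20.
preferred = s2.combine_first(s1)
```

`combine_first()` suits the "real-time feed with historical fallback" case, where one source is trusted more but is incomplete.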

### 5. 🧱 **Joining with Multi-Level (Hierarchical) Data**

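* **Goal**: Join across multiple levels like region, store, department
* **Tools**: `pd.merge()` on one or more key columns or on a `MultiIndex`, `keys` in `pd.concat()`
* **Example** (single key, `region`; further levels such as store or department can be passed as a list to `on`):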

In [20]:
df_a = pd.DataFrame({'region': ['East', 'West'], 'sales': [100, 200]})
df_b = pd.DataFrame({'region': ['East', 'West'], 'manager': ['John', 'Alice']})

display(df_a, df_b)

Unnamed: 0,region,sales
0,East,100
1,West,200


Unnamed: 0,region,manager
0,East,John
1,West,Alice


In [21]:
pd.merge(df_a, df_b, on='region')

Unnamed: 0,region,sales,manager
0,East,100,John
1,West,200,Alice


📌 **Use case**: Building regional dashboards, organizational KPIs.
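
The example above joins on a single level. To illustrate a genuinely hierarchical join, the sketch below uses two key levels, first as a list passed to `on` and then as a `MultiIndex`. The `store` column and all values in these frames are hypothetical and were invented for this illustration.

```python
import pandas as pd

# Hypothetical two-level data: region -> store
sales = pd.DataFrame({
    'region': ['East', 'East', 'West'],
    'store': ['S1', 'S2', 'S1'],
    'sales': [100, 120, 200],
})
managers = pd.DataFrame({
    'region': ['East', 'East', 'West'],
    'store': ['S1', 'S2', 'S1'],
    'manager': ['John', 'Priya', 'Alice'],
})

# Column-based: a row matches only if both region and store agree.
merged = pd.merge(sales, managers, on=['region', 'store'])

# Index-based: the same join expressed on a two-level MultiIndex.
joined = sales.set_index(['region', 'store']).join(
    managers.set_index(['region', 'store'])
)
east_s2_manager = joined.loc[('East', 'S2'), 'manager']
```

Joining on `region` alone would pair every East store with every East manager, which is why both levels are needed as keys.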

### 6. 🏗️ **Combining Feature Sets Before Modeling**

* **Goal**: Merge features produced by different feature-engineering pipelines
* **Tools**: `pd.merge()` or `df.join()`, `axis=1` in `pd.concat()`
* **Example**:

In [22]:
text_features = pd.DataFrame({'user_id': [1, 2], 'sentiment_score': [0.8, 0.5]})
activity_features = pd.DataFrame({'user_id': [1, 2], 'click_rate': [3.5, 4.1]})

display(text_features, activity_features)

Unnamed: 0,user_id,sentiment_score
0,1,0.8
1,2,0.5


Unnamed: 0,user_id,click_rate
0,1,3.5
1,2,4.1


In [23]:
pd.merge(text_features, activity_features, on='user_id')

Unnamed: 0,user_id,sentiment_score,click_rate
0,1,0.8,3.5
1,2,0.5,4.1

📌 **Use case**: Assembling the final dataset for model training pipelines.
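
The tools list also names `axis=1` in `concat()`. A sketch of that route on the same two feature frames: each frame is indexed by `user_id` first, because column-wise concatenation aligns rows on the index rather than on a key column. The name `features` is illustrative.

```python
import pandas as pd

text_features = pd.DataFrame({'user_id': [1, 2], 'sentiment_score': [0.8, 0.5]})
activity_features = pd.DataFrame({'user_id': [1, 2], 'click_rate': [3.5, 4.1]})

# axis=1 places the frames side by side, matching rows by index label.
features = pd.concat(
    [text_features.set_index('user_id'),
     activity_features.set_index('user_id')],
    axis=1,
)
```

By default this is an outer join, so a user present in only one pipeline gets `NaN` for the other pipeline's features; passing `join='inner'` keeps only users present in both.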


### 7. 📁 **Merging Data Across Files or APIs**

* **Goal**: Combine multiple sources into one dataset
* **Techniques**:

  * Read all files with `pd.read_csv()` in a loop
  * Append them with `pd.concat()`
  * Merge on unique identifiers

📌 **Use case**: ETL pipelines, analytics dashboards, customer 360° view.
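
The three steps above can be sketched as follows. To keep the example self-contained, in-memory strings stand in for CSV files; the file names, columns, and values are all hypothetical. With real files, the loop would iterate over paths (for example from `glob.glob()`) and pass each path to `pd.read_csv()` directly.

```python
import io
import pandas as pd

# Hypothetical stand-ins for CSV files on disk.
sources = {
    'orders_jan.csv': "customer_id,amount\n1,250\n2,100\n",
    'orders_feb.csv': "customer_id,amount\n1,300\n3,80\n",
}

# Step 1: read every source in a loop, remembering where each row came from.
frames = []
for name, text in sources.items():
    frame = pd.read_csv(io.StringIO(text))
    frame['source_file'] = name
    frames.append(frame)

# Step 2: append all frames into one.
orders = pd.concat(frames, ignore_index=True)

# Step 3: merge with another source on a unique identifier.
customers = pd.DataFrame({
    'customer_id': [1, 2, 3],
    'segment': ['gold', 'silver', 'bronze'],
})
combined = orders.merge(customers, on='customer_id', how='left',
                        validate='many_to_one')
```

Collecting frames in a list and calling `pd.concat()` once is generally cheaper than concatenating inside the loop.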


### 8. 📦 **Creating Audit Logs with Merge Indicators**

* **Goal**: Track what matched or didn’t during merges
* **Tools**: `pd.merge(..., indicator=True)`
* **Example**:

In [24]:
df1 = pd.DataFrame({'id': [1, 2]})
df2 = pd.DataFrame({'id': [2, 3]})
display(df1, df2)

Unnamed: 0,id
0,1
1,2


Unnamed: 0,id
0,2
1,3


In [25]:
pd.merge(df1, df2, on='id', how='outer', indicator=True)

Unnamed: 0,id,_merge
0,1,left_only
1,2,both
2,3,right_only

📌 **Use case**: Data reconciliation, consistency checks across systems.
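
Once the `_merge` column exists, it can be summarised and filtered like any other column. A brief sketch on the same two frames, with illustrative variable names:

```python
import pandas as pd

df1 = pd.DataFrame({'id': [1, 2]})
df2 = pd.DataFrame({'id': [2, 3]})

audit = pd.merge(df1, df2, on='id', how='outer', indicator=True)

# How many rows matched, and how many exist on one side only.
summary = audit['_merge'].value_counts()

# Rows that need attention during reconciliation.
unmatched = audit[audit['_merge'] != 'both']
```

Here `unmatched` holds the ids found in only one of the two systems, which is typically the list handed off for investigation.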


## 🧠 Summary

| Tool                            | What it does                           | Typical use                   |
| ------------------------------- | -------------------------------------- | ----------------------------- |
| `pd.concat()`                   | Appends data vertically/horizontally   | Time-series, reports          |
| `pd.merge()`                    | SQL-like joins on columns              | Enriching datasets            |
| `df.join()`                     | Index-based joins                      | Simpler syntax for index keys |
| `align()`                       | Aligns data with different indexes     | Finance, series ops           |
| `combine_first()`               | Fills gaps from a fallback source      | Data repair pipelines         |
| `pd.merge(..., indicator=True)` | Audits join results                    | Validation, reconciliation    |

