# **7. Merging, Joining & Concatenation**

## **3. `df.join()` – Index-based Joining**

In [2]:
import pandas as pd

**`df.join()`** performs **index-based joining**. It is a more **concise and readable** alternative to `pd.merge()`, and is especially useful when you are dealing with **DataFrames that are aligned on their indexes**.

## ✅ 1. What `df.join()` Does and When to Use It

### 🔹 What It Does:

`df.join()` **joins columns of another DataFrame to the calling DataFrame** using:

* Index by default
* Or, via `on`, key column(s) of the calling DataFrame matched against the index of the other DataFrame

It is **similar to `merge()`**, but is more streamlined when:

* You're working with **index-based joins**
* You don’t need complex joins involving multiple columns
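
To make the comparison with `merge()` concrete, here is a minimal sketch using two small, hypothetical frames (`left` and `right` are illustrative names, not part of the notebook). It shows the index-based left join written both ways:

```python
import pandas as pd

# Hypothetical example frames sharing the same index labels.
left = pd.DataFrame({"Name": ["Alice", "Bob"]}, index=["a", "b"])
right = pd.DataFrame({"Age": [25, 30]}, index=["a", "b"])

# Index-based left join written with join() ...
joined = left.join(right)

# ... and the longer merge() spelling of the same operation.
merged = pd.merge(left, right, left_index=True, right_index=True, how="left")

# Both spellings should produce the same DataFrame.
print(joined.equals(merged))
```

The `join()` call carries the same intent with fewer arguments, which is the main reason to reach for it in index-based joins.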


### 🔹 When to Use:

* When both DataFrames share the **same index** (or one can be set as index).
* When you want a **quick left join** on index with readable code.
* Common in feature enrichment or during data preparation.

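> ✅ Often preferred over `pd.merge()` for index-based operations because the call is shorter and easier to read. For a single DataFrame, `join()` relies on the same merge machinery internally, so the benefit is mainly simplicity rather than raw speed.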


## 🧾 2. Syntax and Core Parameters

```python
df.join(other, on=None, how='left', lsuffix='', rsuffix='', sort=False)
```

### 🔹 Core Parameters:

| Parameter            | Description                                                               |
| -------------------- | ------------------------------------------------------------------------- |
| `other`              | The DataFrame (or named Series, or list of DataFrames) to join with       |
| `on`                 | Column(s) in `df` to match against the index of `other` (optional)        |
| `how`                | Type of join: `'left'` (default), `'right'`, `'inner'`, `'outer'`         |
| `lsuffix`, `rsuffix` | Suffixes appended to overlapping column names from the left / right frame |
| `sort`               | Whether to sort the result DataFrame by the join keys                     |
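
Besides a single DataFrame, `other` can also be a named Series or a list of DataFrames. The sketch below illustrates both forms; all frame and column names are made up for the example:

```python
import pandas as pd

# Hypothetical frames and a Series that all share the index ['a', 'b'].
base = pd.DataFrame({"Name": ["Alice", "Bob"]}, index=["a", "b"])
ages = pd.DataFrame({"Age": [25, 30]}, index=["a", "b"])
cities = pd.DataFrame({"City": ["Oslo", "Lima"]}, index=["a", "b"])
scores = pd.Series([85, 90], index=["a", "b"], name="Score")

# A Series must have a name; it is joined as a single column.
with_score = base.join(scores)

# A list of DataFrames is joined in one call (index-on-index only, no `on`).
combined = base.join([ages, cities])

print(with_score)
print(combined)
```

Joining a list is a convenient shortcut when several feature tables share one index, though it does not support the `on` parameter.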


## 🧠 3. Different Methods & Techniques

### 🔸 A. Basic Left Join on Index (default behavior)

```python
df1.join(df2)
```

* Joins `df2` to `df1` using their index.
* This is a **left join** by default (keeps all rows of `df1`).

---

### 🔸 B. Inner/Outer/Right Join

```python
df1.join(df2, how='inner')
df1.join(df2, how='outer')
df1.join(df2, how='right')
```

* Supports full SQL-style join options, with index alignment.
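
One way to see the difference between the join types is to compare which index labels survive each one. This is a small sketch with hypothetical frames whose indexes only partly overlap:

```python
import pandas as pd

# Hypothetical frames: 'b' and 'c' appear in both, 'a' only on the left,
# 'd' only on the right.
left = pd.DataFrame({"L": [1, 2, 3]}, index=["a", "b", "c"])
right = pd.DataFrame({"R": [10, 20, 30]}, index=["b", "c", "d"])

rows = {}
for how in ["left", "right", "inner", "outer"]:
    result = left.join(right, how=how)
    rows[how] = list(result.index)  # index labels kept by this join type
    print(f"{how:>5}: {rows[how]}")
```

The left join keeps the labels of `left`, the right join keeps those of `right`, the inner join keeps only the shared labels, and the outer join keeps the union.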

---

### 🔸 C. Joining on a Key Column (instead of index)

```python
df1.join(df2, on='key_column')
```

* Joins using `df1['key_column']` and `df2.index`.
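
A common use of `on` is a many-to-one lookup, where several rows of the calling DataFrame point at the same index label in the other one. A minimal sketch, assuming hypothetical `orders` and `customers` frames:

```python
import pandas as pd

# Hypothetical data: two orders belong to customer 101.
orders = pd.DataFrame({"order_id": [1, 2, 3], "customer_id": [101, 102, 101]})
customers = pd.DataFrame({"name": ["Alice", "Bob"]}, index=[101, 102])

# Match orders['customer_id'] against customers.index.
enriched = orders.join(customers, on="customer_id")

print(enriched)
```

The result keeps the index and row order of `orders`, and the customer name is repeated for every matching order.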

---

### 🔸 D. Resolving Overlapping Columns

```python
df1.join(df2, lsuffix='_left', rsuffix='_right')
```

* Needed when both DataFrames have a column with the same name: without a suffix, `join()` raises a `ValueError` instead of guessing which column to keep.
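
The following sketch shows what happens when the suffixes are left out, and that a single suffix is enough to resolve the clash. The frames `df_a` and `df_b` are hypothetical:

```python
import pandas as pd

# Hypothetical frames that both contain a "Value" column.
df_a = pd.DataFrame({"Value": [10, 20]}, index=["x", "y"])
df_b = pd.DataFrame({"Value": [30, 40]}, index=["x", "y"])

try:
    df_a.join(df_b)  # overlapping column name and no suffix given
    raised = False
except ValueError as exc:
    raised = True
    print(f"join failed: {exc}")

# Supplying only rsuffix is enough; the left column keeps its original name.
fixed = df_a.join(df_b, rsuffix="_right")
print(fixed)
```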


## ⚠️ 4. Common Pitfalls and Best Practices

| Pitfall                                         | Best Practice                                                |
| ----------------------------------------------- | ------------------------------------------------------------ |
| ❌ Index mismatch causing many NaNs              | ✅ Set appropriate indexes before joining                     |
| ❌ Overlapping column names raising `ValueError` | ✅ Use `lsuffix`, `rsuffix` to disambiguate                   |
| ❌ Confusing `df.join()` with `pd.merge()`       | ✅ Use `join()` only when merging by index or a simple column |
| ❌ Joining on a column when you meant index      | ✅ Verify `df.index` before joining                           |
| ❌ Assuming inner join by default                | ✅ Default is `'left'`; `pd.merge()` defaults to `'inner'`    |
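
The index-related pitfalls above can be sketched as follows. The example assumes hypothetical `people` and `scores` frames in which the lookup key of `scores` is still an ordinary column:

```python
import pandas as pd

# Hypothetical data: the key of `scores` lives in a column, not in the index.
people = pd.DataFrame({"ID": [101, 102], "Name": ["Alice", "Bob"]})
scores = pd.DataFrame({"PersonID": [101, 102], "Score": [85, 90]})

# Pitfall: `scores` still has its default 0..n-1 index, so the IDs match nothing
# and every joined value is NaN.
wrong = people.join(scores, on="ID")

# Fix: move the key into the index of the right-hand frame first.
fixed = people.join(scores.set_index("PersonID"), on="ID")

print(wrong)
print(fixed)
```

Checking `scores.index` before the join would have revealed the problem immediately.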


## 🧪 5. Examples on Real/Pseudo Data

### ✅ Example 1: Basic Join on Index

In [3]:
df1 = pd.DataFrame({'Name': ['Alice', 'Bob']}, index=['a', 'b'])
df2 = pd.DataFrame({'Age': [25, 30]}, index=['a', 'b'])

display(df1, df2)

|     | Name  |
| --- | ----- |
| a   | Alice |
| b   | Bob   |

|     | Age |
| --- | --- |
| a   | 25  |
| b   | 30  |


In [4]:
df1.join(df2)

|     | Name  | Age |
| --- | ----- | --- |
| a   | Alice | 25  |
| b   | Bob   | 30  |


### ✅ Example 2: Join with Mismatched Index (Outer Join)

In [5]:
df3 = pd.DataFrame({'Age': [28, 32]}, index=['a', 'c'])
df3

|     | Age |
| --- | --- |
| a   | 28  |
| c   | 32  |


In [6]:
df1.join(df3)

|     | Name  | Age  |
| --- | ----- | ---- |
| a   | Alice | 28.0 |
| b   | Bob   | NaN  |


In [7]:
df1.join(df3, how='outer')

|     | Name  | Age  |
| --- | ----- | ---- |
| a   | Alice | 28.0 |
| b   | Bob   | NaN  |
| c   | NaN   | 32.0 |


### ✅ Example 3: Join on a Key Column

In [9]:
df1 = pd.DataFrame({'ID': [101, 102], 'Name': ['Alice', 'Bob']})
df2 = pd.DataFrame({'Score': [85, 90]}, index=[101, 102])

display(df1, df2)

|     | ID  | Name  |
| --- | --- | ----- |
| 0   | 101 | Alice |
| 1   | 102 | Bob   |

|     | Score |
| --- | ----- |
| 101 | 85    |
| 102 | 90    |


In [11]:
df1.join(df2, on='ID')

|     | ID  | Name  | Score |
| --- | --- | ----- | ----- |
| 0   | 101 | Alice | 85    |
| 1   | 102 | Bob   | 90    |


### ✅ Example 4: Handling Overlapping Column Names

In [12]:
df1 = pd.DataFrame({'Value': [10, 20]}, index=['x', 'y'])
df2 = pd.DataFrame({'Value': [30, 40]}, index=['x', 'y'])

display(df1, df2)

|     | Value |
| --- | ----- |
| x   | 10    |
| y   | 20    |

|     | Value |
| --- | ----- |
| x   | 30    |
| y   | 40    |


In [13]:
df1.join(df2, lsuffix='_left', rsuffix='_right')

|     | Value_left | Value_right |
| --- | ---------- | ----------- |
| x   | 10         | 30          |
| y   | 20         | 40          |


## 🌍 6. Real World Use Cases

| Use Case                                            | Description                                                  |
| --------------------------------------------------- | ------------------------------------------------------------ |
| 🧾 **Add metadata to a dataset**                    | Join user demographics or product details using index        |
| 🔍 **Combine features from different computations** | Enrich main dataset with engineered features stored by index |
| 📊 **Join lookup tables or mappings**               | E.g., join regional info based on store codes                |
| 🧠 **Model pipeline inputs**                        | Join additional model features stored separately             |
| 📅 **Align time series from different sources**     | Combine two DataFrames with timestamp indexes                |
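
As a rough illustration of the time series use case, the sketch below aligns two hypothetical daily series (`prices` and `volume`) whose date ranges only partly overlap. The data is invented for the example:

```python
import pandas as pd

# Hypothetical daily data from two sources with shifted date ranges.
prices = pd.DataFrame(
    {"price": [10.0, 10.5, 11.0]},
    index=pd.date_range("2024-01-01", periods=3, freq="D"),
)
volume = pd.DataFrame(
    {"volume": [200, 250, 300]},
    index=pd.date_range("2024-01-02", periods=3, freq="D"),
)

# An outer join keeps every timestamp from either source; gaps become NaN.
aligned = prices.join(volume, how="outer")

print(aligned)
```

Dates that exist in only one source show up with `NaN` in the other column, which makes the gaps easy to spot or fill afterwards.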


## 📌 Summary

| Feature      | Summary                                             |
| ------------ | --------------------------------------------------- |
| Purpose      | Concise index-based join                            |
| Join Types   | `'left'` (default), `'right'`, `'inner'`, `'outer'` |
| Match On     | Index by default, or column via `on`                |
| Alternatives | Use `pd.merge()` for more complex joins             |
| Use When     | Indexes are aligned or easy to align                |

