# **7. Merging, Joining & Concatenation**

## **4. Aligning & Combining Data with Different Indexes**

```python
import pandas as pd
import numpy as np
```

**Aligning & combining data with different indexes** is a subtle but powerful aspect of pandas: it lets you combine datasets **cleanly and intelligently** even when their indexes don't exactly match.

This concept is especially important when:

* You want to **align data structures** for mathematical operations.
* You are **combining partially overlapping datasets**.
* You're filling missing values intelligently using other data sources.


## ✅ 1. What It Does and When to Use It

### 🔹 What It Does:

Pandas provides tools like **`.align()`** and **`.combine_first()`** to:

* **Align indexes** of two Series or DataFrames.
* Perform **element-wise operations** safely on mismatched data.
* Combine data **intelligently** while **preserving non-null values**.

### 🔹 When to Use:

* Two datasets have **different or partially overlapping indexes**.
* You're doing **mathematical operations** (add, subtract, etc.) across Series/DataFrames.
* You want to **fill missing values** in one dataset using values from another.
* You want to ensure **both DataFrames have the same shape** for safe comparison.


## 🧾 2. Syntax and Core Parameters


### 🔸 `.align()` – Align Two Objects on Index (and/or Columns)

```python
left_aligned, right_aligned = df1.align(df2, join='outer', axis=None, fill_value=None)
```

#### Parameters:

| Parameter    | Description                                         |
| ------------ | --------------------------------------------------- |
| `other`      | The other DataFrame or Series to align with         |
| `join`       | `'outer'` (default), `'inner'`, `'left'`, `'right'` |
| `axis`       | Align on the index (`0`), the columns (`1`), or both (`None`, the default) |
| `fill_value` | Value to use for entries that are missing after alignment (default `None` leaves them as `NaN`) |
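
As a minimal sketch (the two Series and their labels are made up for illustration), the loop below aligns the same pair of Series with each `join` option and records which index labels survive:

```python
import pandas as pd

# Illustrative Series whose indexes only partially overlap
left = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
right = pd.Series([10, 20, 30], index=['b', 'c', 'd'])

# For each join option, align the pair and keep the resulting index labels
results = {}
for how in ['outer', 'inner', 'left', 'right']:
    left_aligned, right_aligned = left.align(right, join=how)
    # Both aligned objects always share the same index
    assert left_aligned.index.equals(right_aligned.index)
    results[how] = list(left_aligned.index)

for how, labels in results.items():
    print(how, labels)
```

`outer` keeps every label, `inner` keeps only the shared ones, and `left`/`right` keep the labels of one side.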



### 🔸 `.combine_first()` – Combine Two DataFrames by Filling Missing Values from Another

```python
result = df1.combine_first(df2)
```

* Keeps the values of `df1` and fills **only its missing (`NaN`) entries** with values from `df2` at the same labels. The result's index and columns are the **union** of both inputs.
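
A small sketch with hypothetical `primary` and `backup` frames shows both effects: the gap in `primary` is filled from `backup`, and the extra row `z` and column `C` from `backup` appear in the result.

```python
import numpy as np
import pandas as pd

# Primary source (illustrative): has a gap and lacks row 'z' and column 'C'
primary = pd.DataFrame({'A': [1.0, np.nan], 'B': [5.0, 6.0]}, index=['x', 'y'])
# Backup source (illustrative): covers the gap, plus an extra row and column
backup = pd.DataFrame({'A': [100.0, 200.0, 300.0], 'C': [7.0, 8.0, 9.0]},
                      index=['x', 'y', 'z'])

# Non-null values of `primary` win; its NaNs are filled from `backup`
result = primary.combine_first(backup)
print(result)
```

Column `B`, which exists only in `primary`, gets `NaN` for the new row `z`.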


## 🧠 3. Different Methods & Techniques

### 🔸 A. Aligning Two Series or DataFrames

```python
a, b = df1.align(df2)
```

* Returns aligned versions of `df1` and `df2` with matching indexes/columns.
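
For DataFrames, the default `axis=None` aligns rows and columns together. A minimal sketch with two made-up frames that differ on both axes:

```python
import pandas as pd

# Illustrative frames with different rows AND different columns
df_a = pd.DataFrame({'A': [1, 2], 'B': [3, 4]}, index=['r1', 'r2'])
df_b = pd.DataFrame({'B': [30, 40], 'C': [50, 60]}, index=['r2', 'r3'])

# Outer join on both axes: every row and column label from either frame is kept
a, b = df_a.align(df_b)
print(a.shape, b.shape)
print(list(a.index), list(a.columns))
```

Cells that exist in only one of the original frames are `NaN` in the other aligned frame.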

---

### 🔸 B. Arithmetic Operations with Auto-alignment

```python
df1 + df2
```

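* Pandas **automatically aligns** the operands (the index, plus the columns for DataFrames) before element-wise addition, subtraction, multiplication, etc. Labels present in only one operand produce `NaN` in the result.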
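
The operator form and the method form differ in how they treat labels missing from one side. In this sketch (illustrative fruit data), `+` yields `NaN` for `apple` and `plum`, while `.add(..., fill_value=0)` treats the missing side as `0`:

```python
import pandas as pd

# Illustrative quarterly counts with partially overlapping row labels
q1 = pd.DataFrame({'units': [5, 3]}, index=['apple', 'pear'])
q2 = pd.DataFrame({'units': [2, 4]}, index=['pear', 'plum'])

# Operator form: labels present in only one frame become NaN
total_op = q1 + q2
# Method form: a label missing from one side is treated as 0 before adding
total_add = q1.add(q2, fill_value=0)

print(total_op)
print(total_add)
```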

---

### 🔸 C. Filling Gaps Using `combine_first()`

```python
df1.combine_first(df2)
```

* Fills `NaN` values in `df1` with the non-null values at the same labels in `df2`; non-null values already in `df1` always take priority.
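
Because the caller's non-null values always win, the order of the call decides which source takes priority. A minimal sketch with hypothetical `survey` and `registry` frames:

```python
import numpy as np
import pandas as pd

# Illustrative sources that disagree on p1 and where survey lacks p2
survey = pd.DataFrame({'age': [34.0, np.nan]}, index=['p1', 'p2'])
registry = pd.DataFrame({'age': [35.0, 41.0]}, index=['p1', 'p2'])

# survey takes priority; only its NaN (p2) is filled from registry
survey_first = survey.combine_first(registry)
# Swapping the order makes registry take priority instead
registry_first = registry.combine_first(survey)

print(survey_first)
print(registry_first)
```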

---

### 🔸 D. Aligning Only on Index or Columns

```python
df1.align(df2, axis=1, join='inner')
```

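* Aligns only the columns (with `join='inner'`, only the columns both frames share are kept); each frame keeps its own row index. Useful for side-by-side analysis.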
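
When the frames also have different row labels, `axis=1` leaves those rows alone. A minimal sketch with two illustrative frames that share only column `B`:

```python
import pandas as pd

# Illustrative frames: different rows, one shared column ('B')
left = pd.DataFrame({'A': [1, 2], 'B': [3, 4]}, index=['x', 'y'])
right = pd.DataFrame({'B': [5], 'C': [6]}, index=['z'])

# axis=1 intersects only the columns; each frame keeps its own row index
left_cols, right_cols = left.align(right, axis=1, join='inner')

print(left_cols)
print(right_cols)
```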


## ⚠️ 4. Common Pitfalls and Best Practices

| Pitfall                                                 | Best Practice                                                       |
| ------------------------------------------------------- | ------------------------------------------------------------------- |
| ❌ Misaligned indexes cause NaNs in arithmetic ops       | ✅ Use `.add()`, `.sub()`, etc. with `fill_value=0`, or `.align(..., fill_value=...)` before the operation |
| ❌ Unexpected shape after operations                     | ✅ Always check shape of both objects using `.shape` before aligning |
| ❌ Calling `.combine_first()` in the wrong order         | ✅ The caller's non-null values always win; call it on the DataFrame whose values should take priority |
| ❌ Wrong assumptions on index order                      | ✅ Use `.sort_index()` if needed to maintain consistency             |
| ❌ Forgetting fill values in `.align()`                  | ✅ Use `fill_value=0` or other appropriate values to avoid NaNs      |
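
As one concrete case of the shape pitfalls above: comparison operators such as `==` require identically-labeled operands, so comparing two differently-indexed Series raises a `ValueError`. Aligning first makes the comparison possible. A sketch with made-up `old` and `new` Series:

```python
import pandas as pd

# Illustrative snapshots of the same measurements; 'c' is missing in `new`
old = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
new = pd.Series([1, 5], index=['a', 'b'])

# Comparing differently-labeled Series raises a ValueError
try:
    _ = old == new
    raised = False
except ValueError:
    raised = True

# After aligning, both Series have identical labels, so comparison works
old_aligned, new_aligned = old.align(new)
changed = old_aligned != new_aligned
print(changed)
```

The label `c`, missing from `new`, compares as changed because `NaN` never equals a number.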


## 🧪 5. Examples on Real/Pseudo Data
### ✅ Example 1: Aligning Two Series

```python
s1 = pd.Series([1, 2], index=['a', 'b'])
s2 = pd.Series([3, 4], index=['b', 'c'])

display(s1, s2)
```

```
a    1
b    2
dtype: int64

b    3
c    4
dtype: int64
```

```python
aligned_s1, aligned_s2 = s1.align(s2)

display(aligned_s1, aligned_s2)
```

```
a    1.0
b    2.0
c    NaN
dtype: float64

a    NaN
b    3.0
c    4.0
dtype: float64
```

### ✅ Example 2: Aligning with Fill Value

```python
a1, a2 = s1.align(s2, fill_value=0)
display(a1, a2)
```

```
a    1.0
b    2.0
c    0.0
dtype: float64

a    0.0
b    3.0
c    4.0
dtype: float64
```

### ✅ Example 3: Arithmetic Operation (Auto-aligned)

```python
s1 + s2
```

```
a    NaN
b    5.0
c    NaN
dtype: float64
```
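
To keep labels that appear in only one Series instead of turning them into `NaN`, the method form `.add()` accepts a `fill_value`. A minimal sketch reusing the `s1` and `s2` values from Example 1:

```python
import pandas as pd

s1 = pd.Series([1, 2], index=['a', 'b'])
s2 = pd.Series([3, 4], index=['b', 'c'])

# A label missing from one side is treated as 0 before adding
total = s1.add(s2, fill_value=0)
print(total)  # a 1.0, b 5.0, c 4.0
```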

### ✅ Example 4: Using `combine_first()` to Fill Missing Data

```python
df1 = pd.DataFrame({'A': [1, np.nan]}, index=['a', 'b'])
df2 = pd.DataFrame({'A': [10, 20]}, index=['a', 'b'])

display(df1, df2)
```

|   | A   |
|---|-----|
| a | 1.0 |
| b | NaN |

|   | A  |
|---|----|
| a | 10 |
| b | 20 |


```python
df1.combine_first(df2)
```

|   | A    |
|---|------|
| a | 1.0  |
| b | 20.0 |


### ✅ Example 5: Aligning on Columns Only

```python
a1, a2 = df1.align(df2, axis=1, join='inner')
display(a1, a2)
```

|   | A   |
|---|-----|
| a | 1.0 |
| b | NaN |

|   | A  |
|---|----|
| a | 10 |
| b | 20 |


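* Here both frames already share column `A`, so nothing changes. With `axis=1`, only the columns are aligned and each frame's row index is left untouched, which is useful when columns overlap but indexes differ.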


## 🌍 6. Real World Use Cases

| Use Case                                       | Description                                                                     |
| ---------------------------------------------- | ------------------------------------------------------------------------------- |
| 🧮 **Merging logs from different systems**     | Align logs on timestamps (index), even if one system is missing records         |
| 🩺 **Medical data integration**                | Align patient data from different clinics—some IDs may be missing in one source |
| 📈 **Financial time series alignment**         | Align stock data from two sources on trading dates                              |
| 🧠 **Imputation from backup sources**          | Use `combine_first()` to fill missing features using alternative data sources   |
| 🔁 **Ensuring matching shape before modeling** | Use `.align()` to give training and test feature sets the same columns before model fitting |
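
The last use case can be sketched as follows, assuming a categorical feature that is one-hot encoded separately for each split (the `color` data is invented for illustration). A `left` join on the columns keeps exactly the training columns in the test set:

```python
import pandas as pd

# Illustrative splits whose categories differ
train = pd.DataFrame({'color': ['red', 'blue', 'green']})
test = pd.DataFrame({'color': ['red', 'yellow']})

# One-hot encode each split separately
X_train = pd.get_dummies(train, dtype=int)
X_test = pd.get_dummies(test, dtype=int)

# Keep exactly the training columns; fill categories absent from test with 0
X_train, X_test = X_train.align(X_test, join='left', axis=1, fill_value=0)

print(X_train)
print(X_test)
```

Categories seen only in the test set (`yellow` here) are dropped, and training categories missing from the test set become all-zero columns.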


## 📌 Summary

| Feature        | Summary                                                                   |
| -------------- | ------------------------------------------------------------------------- |
| Purpose        | Align and combine datasets with different indexes                         |
| Methods        | `.align()`, `.combine_first()`, arithmetic ops                            |
| Join Options   | `outer`, `inner`, `left`, `right`                                         |
| Use When       | Indexes or shapes don’t match but you still want to combine intelligently |
| Best Practices | Always inspect alignment results; use `fill_value` wisely                 |

