
# **KNN Imputer**

---

### 🧠 What is KNN Imputer?

**KNN Imputer** fills in missing values (like `NaN`s) using the **K-Nearest Neighbors** approach.

Instead of filling every gap with one column-wide statistic such as the mean, it:

* Looks at the **K most similar rows (neighbors)** in the dataset.
* Then fills in the missing value based on the **average of those neighbors' values**.
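
"Most similar" here means "smallest distance". As a hedged note on the default behaviour: scikit-learn's `KNNImputer` uses a NaN-aware Euclidean metric (`nan_euclidean`), which skips features that are missing in either row and rescales the result. The symbols below ($n$, $P$) are notation introduced here for illustration:

$$
d(x, y) = \sqrt{\frac{n}{|P|} \sum_{j \in P} (x_j - y_j)^2}
$$

where $n$ is the total number of features and $P$ is the set of features present in both rows $x$ and $y$.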

---

### 📦 When to use?

Use it when your data has **missing values** and you want to **impute** them from **other similar data points** rather than from a column-wise mean or median.

---

### 🪄 Simple Example:

Imagine this small dataset:

| Height (cm) | Weight (kg) | Age (years) |
| ----------- | ----------- | ----------- |
| 160         | 55          | 25          |
| 165         | 60          | 30          |
| 170         | **NaN**     | 28          |
| 175         | 75          | 35          |

We want to fill the **missing weight** for the third person (`NaN` in row 3).

---

### 🔍 Steps of KNN Imputer:

1. **Choose `k`** (number of neighbors). Let’s say `k = 2`.
2. **Find the 2 nearest rows** to the third row (based on Height and Age, the features it has values for).

   * Compare row 3 with the others using Euclidean distance:
     row 1: √(10² + 3²) ≈ 10.44, row 2: √(5² + 2²) ≈ 5.39, row 4: √(5² + 7²) ≈ 8.60.
   * The most similar rows to row 3 are therefore row 2 and row 4.
3. **Take the average weight** of row 2 and row 4:

   * (60 + 75) / 2 = **67.5**
4. **Impute `NaN`** with **67.5**

---

### ✅ After KNN Imputer:

| Height (cm) | Weight (kg) | Age (years) |
| ----------- | ----------- | ----------- |
| 160         | 55          | 25          |
| 165         | 60          | 30          |
| 170         | **67.5**    | 28          |
| 175         | 75          | 35          |
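
The distance-and-average steps can also be worked through by hand. The following is a minimal NumPy sketch for the four-row table, not scikit-learn's actual implementation: the variable names (`target`, `known_cols`, `donors`) are made up for illustration, and it uses plain Euclidean distance over Height and Age. scikit-learn's `nan_euclidean` metric additionally rescales by the share of present features, which does not change the ranking of neighbors here.

```python
import numpy as np

# Rows from the example table: Height (cm), Weight (kg), Age (years)
X = np.array([
    [160.0, 55.0, 25.0],
    [165.0, 60.0, 30.0],
    [170.0, np.nan, 28.0],
    [175.0, 75.0, 35.0],
])

k = 2
target = 2           # zero-based index of the row with the missing weight
known_cols = [0, 2]  # Height and Age, the features the target row has

# Candidate neighbors: every other row that has a known weight
donors = [i for i in range(len(X)) if i != target and not np.isnan(X[i, 1])]

# Euclidean distance from the target row to each candidate
dists = {
    i: float(np.linalg.norm(X[i, known_cols] - X[target, known_cols]))
    for i in donors
}

# Keep the k closest rows and average their weights
nearest = sorted(donors, key=dists.get)[:k]
imputed_weight = float(np.mean(X[nearest, 1]))

print("distances:", dists)
print("nearest rows (zero-based):", nearest)
print("imputed weight:", imputed_weight)
```

Note that the printed row indices are zero-based, so index `1` is row 2 of the table and index `3` is row 4.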

---

### 🔧 How to use in code (Python):

```python
from sklearn.impute import KNNImputer
import pandas as pd
import numpy as np

# Example dataset
data = pd.DataFrame({
    'Height': [160, 165, 170, 175],
    'Weight': [55, 60, np.nan, 75],
    'Age': [25, 30, 28, 35]
})

# Create KNN Imputer with k=2
imputer = KNNImputer(n_neighbors=2)

# Impute the data
imputed_data = imputer.fit_transform(data)

# Convert back to DataFrame
imputed_df = pd.DataFrame(imputed_data, columns=data.columns)
print(imputed_df)
```
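
To see how this differs from a column-wise mean, here is a small comparison sketch on the same data using `SimpleImputer(strategy="mean")` from `sklearn.impute`. The variable names (`mean_filled`, `knn_filled`, `weight_col`) are illustrative only.

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer, SimpleImputer

# Same example dataset as above
data = pd.DataFrame({
    "Height": [160, 165, 170, 175],
    "Weight": [55, 60, np.nan, 75],
    "Age": [25, 30, 28, 35],
})

# Column-wise mean: uses every known weight, regardless of similarity
mean_filled = SimpleImputer(strategy="mean").fit_transform(data)

# KNN: uses only the weights of the 2 most similar rows
knn_filled = KNNImputer(n_neighbors=2).fit_transform(data)

weight_col = data.columns.get_loc("Weight")
mean_weight = mean_filled[2, weight_col]
knn_weight = knn_filled[2, weight_col]
print(f"column mean: {mean_weight:.2f}, KNN (k=2): {knn_weight:.2f}")
```

The mean imputer averages all three known weights, while the KNN imputer averages only the two rows closest in Height and Age, so the two fills generally differ.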

---

