# K-Nearest Neighbors: Classifier and Regressor

---

## Learning Objectives

By the end of this notebook, you will be able to:

1. Explain how KNN works as an instance-based (lazy) learner
2. Understand distance metrics (Euclidean, Manhattan) and why feature scaling is critical
3. Train a `KNeighborsClassifier` and `KNeighborsRegressor` using scikit-learn
4. Visualize decision boundaries and analyze the bias-variance tradeoff with different `k` values
5. Choose an optimal `k` using accuracy-vs-k analysis

## Prerequisites

- Python fundamentals (loops, functions, basic data structures)
- NumPy and Pandas basics
- Familiarity with train/test splitting and basic classification concepts (ML100, ML300)
- Matplotlib for plotting

## Table of Contents

1. [Theory: Instance-Based Learning](#1-theory-instance-based-learning)
2. [Distance Metrics](#2-distance-metrics)
3. [Why Scaling Is Critical](#3-why-scaling-is-critical)
4. [KNN Classifier on the Iris Dataset](#4-knn-classifier-on-the-iris-dataset)
5. [Decision Boundary Visualization](#5-decision-boundary-visualization)
6. [Bias-Variance Tradeoff and Choosing k](#6-bias-variance-tradeoff-and-choosing-k)
7. [KNN Regressor](#7-knn-regressor)
8. [Common Mistakes](#8-common-mistakes)
9. [Exercise](#9-exercise)

---

## 1. Theory: Instance-Based Learning

K-Nearest Neighbors (KNN) is an **instance-based** (or **lazy**) learning algorithm. Unlike models such as logistic regression or decision trees, KNN does **not** learn an explicit mapping from features to labels during training. Instead, it stores the entire training set and defers computation until prediction time.

**How it works:**
1. Store all training examples.
2. For a new query point, compute the distance to every training example.
3. Select the `k` closest neighbors.
4. **Classification:** take a majority vote among the neighbors' labels.
5. **Regression:** take the mean (or weighted mean) of the neighbors' target values.

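Because fitting learns no parameters (at most, the library builds a search index such as a KD-tree or ball tree over the stored points), KNN is called a **lazy learner**: nearly all of the computation happens at prediction time.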
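
The five steps above can be sketched from scratch in NumPy. This is a minimal illustration, not how scikit-learn implements KNN: the function name `knn_predict`, its `task` argument, and the toy arrays are all made up for this example, and it uses brute-force Euclidean distance only.

```python
from collections import Counter

import numpy as np


def knn_predict(X_train, y_train, x_query, k=3, task="classification"):
    """Predict for one query point with brute-force Euclidean KNN (hypothetical helper)."""
    # Step 2: distance from the query to every stored training example
    distances = np.sqrt(np.sum((X_train - x_query) ** 2, axis=1))
    # Step 3: indices of the k smallest distances
    nearest = np.argsort(distances)[:k]
    neighbor_targets = y_train[nearest]
    if task == "classification":
        # Step 4: majority vote among the neighbors' labels
        return Counter(neighbor_targets.tolist()).most_common(1)[0][0]
    # Step 5: unweighted mean of the neighbors' target values
    return float(np.mean(neighbor_targets))


# Step 1: "training" is just keeping the data around (toy values, made up)
X_toy = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.0],
                  [6.0, 6.0], [7.0, 7.5], [6.5, 8.0]])
y_toy_labels = np.array([0, 0, 0, 1, 1, 1])
y_toy_values = np.array([1.0, 1.2, 0.8, 5.0, 5.5, 6.0])

query = np.array([1.8, 1.4])
label = knn_predict(X_toy, y_toy_labels, query, k=3, task="classification")
value = knn_predict(X_toy, y_toy_values, query, k=3, task="regression")
print(f"Predicted class: {label}")
print(f"Predicted value: {value:.2f}")
```

The query sits inside the first cluster, so its three nearest neighbors all carry label 0 and the regression output is the average of their three target values.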

---

## 2. Distance Metrics

KNN relies on a notion of "closeness". The two most common metrics are:

**Euclidean distance** (L2):

$$d(\mathbf{x}, \mathbf{y}) = \sqrt{\sum_{i=1}^{n}(x_i - y_i)^2}$$

**Manhattan distance** (L1):

$$d(\mathbf{x}, \mathbf{y}) = \sum_{i=1}^{n} |x_i - y_i|$$

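Euclidean distance is the default in scikit-learn (`metric='minkowski'` with `p=2`). Manhattan distance grows linearly rather than quadratically with each coordinate difference, so one large difference in a single feature influences it less. For that reason it is sometimes preferred for data with outliers or for high-dimensional data.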
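
Both metrics are special cases of the Minkowski distance $d_p(\mathbf{x}, \mathbf{y}) = \left(\sum_{i=1}^{n} |x_i - y_i|^p\right)^{1/p}$, with $p=1$ giving Manhattan and $p=2$ giving Euclidean. The sketch below checks this numerically; `minkowski_distance` is a hypothetical helper written for this example, not a scikit-learn function.

```python
import numpy as np


def minkowski_distance(x, y, p):
    """Minkowski distance of order p between two 1-D arrays (hypothetical helper)."""
    return float(np.sum(np.abs(x - y) ** p) ** (1.0 / p))


u = np.array([1.0, 2.0])
v = np.array([4.0, 6.0])

d_manhattan = minkowski_distance(u, v, p=1)  # |1-4| + |2-6| = 7
d_euclidean = minkowski_distance(u, v, p=2)  # sqrt(9 + 16) = 5
print(f"p=1 (Manhattan): {d_manhattan:.1f}")
print(f"p=2 (Euclidean): {d_euclidean:.1f}")
```

In scikit-learn the same choice is made by passing `p=1` or `p=2` to `KNeighborsClassifier` while leaving `metric` at its default.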

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_iris, make_regression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor
from sklearn.metrics import accuracy_score, classification_report, mean_squared_error

sns.set_style("whitegrid")
plt.rcParams["figure.figsize"] = (8, 5)

print("Imports complete.")

In [None]:
# Quick demonstration of distance metrics
a = np.array([1, 2])
b = np.array([4, 6])

euclidean = np.sqrt(np.sum((a - b) ** 2))
manhattan = np.sum(np.abs(a - b))

print(f"Point A: {a}")
print(f"Point B: {b}")
print(f"Euclidean distance: {euclidean:.4f}")
print(f"Manhattan distance: {manhattan:.4f}")

---

## 3. Why Scaling Is Critical

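KNN uses distances to determine neighbors. If one feature has a much larger numeric range than another, it will **dominate** the distance calculation regardless of how informative it is. Feature scaling puts all features on a comparable range so that each one can influence which points count as neighbors.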
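
To make the effect concrete, here is a small sketch on made-up data with two features in very different units (age in years, income in dollars). The arrays and the helper `nearest_index` are hypothetical and exist only for this illustration.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data: columns are [age in years, income in dollars]
people = np.array([
    [25, 30_000],
    [31, 51_000],
    [45, 80_000],
    [60, 50_100],
    [38, 65_000],
], dtype=float)
new_person = np.array([[30, 50_000]], dtype=float)


def nearest_index(X, q):
    """Index of the row of X closest to q in Euclidean distance."""
    return int(np.argmin(np.linalg.norm(X - q, axis=1)))


# Raw units: differences in income dwarf differences in age
idx_raw = nearest_index(people, new_person)

# Standardized: each column has zero mean and unit variance
toy_scaler = StandardScaler().fit(people)
idx_scaled = nearest_index(toy_scaler.transform(people),
                           toy_scaler.transform(new_person))

print(f"Nearest neighbor without scaling: {people[idx_raw]}")
print(f"Nearest neighbor with scaling:    {people[idx_scaled]}")
```

Without scaling, the 60-year-old is picked because the income gap is only 100 dollars and the 30-year age gap barely registers. After standardization, the 31-year-old with a 1,000-dollar income gap becomes the nearest neighbor.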

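Below we demonstrate KNN on the Iris dataset **with and without** scaling, using two features whose ranges differ: sepal length spans 4.3 to 7.9 cm and petal width spans 0.1 to 2.5 cm. Both are measured in centimeters, so the gap between the two results may be small here; the effect is typically much larger when features use different units.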

In [None]:
# Load Iris and pick two features with different scales
iris = load_iris()
X = iris.data[:, [0, 3]]  # sepal length (cm) vs petal width (cm)
y = iris.target

print(f"Feature 0 (sepal length) range: {X[:, 0].min():.1f} - {X[:, 0].max():.1f}")
print(f"Feature 1 (petal width)  range: {X[:, 1].min():.1f} - {X[:, 1].max():.1f}")

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

In [None]:
# Without scaling
knn_unscaled = KNeighborsClassifier(n_neighbors=5)
knn_unscaled.fit(X_train, y_train)
acc_unscaled = accuracy_score(y_test, knn_unscaled.predict(X_test))

# With scaling
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

knn_scaled = KNeighborsClassifier(n_neighbors=5)
knn_scaled.fit(X_train_scaled, y_train)
acc_scaled = accuracy_score(y_test, knn_scaled.predict(X_test_scaled))

print(f"Accuracy WITHOUT scaling: {acc_unscaled:.4f}")
print(f"Accuracy WITH scaling:    {acc_scaled:.4f}")
print("\nScaling can improve results by preventing features with larger ranges from dominating.")

---

## 4. KNN Classifier on the Iris Dataset

Key parameters of `KNeighborsClassifier`:
- **`n_neighbors`** (default=5): number of neighbors to consider
- **`metric`** (default='minkowski' with p=2, i.e., Euclidean): distance function
- **`weights`** (default='uniform'): 'uniform' gives equal weight to all neighbors; 'distance' weights by inverse distance
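
The `weights` parameter can change a prediction, as this small sketch on made-up 1-D data suggests. The arrays `X_w`, `y_w`, and `x_new` are hypothetical; only `KNeighborsClassifier` and its documented parameters come from scikit-learn.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Toy 1-D data (made up): one class-0 point near the query, class-1 points farther away
X_w = np.array([[0.0], [2.0], [2.2], [5.0]])
y_w = np.array([0, 1, 1, 1])
x_new = np.array([[0.5]])

knn_uniform = KNeighborsClassifier(n_neighbors=3, weights="uniform").fit(X_w, y_w)
knn_distance = KNeighborsClassifier(n_neighbors=3, weights="distance").fit(X_w, y_w)

# The 3 nearest neighbors are at distances 0.5 (class 0), 1.5 and 1.7 (class 1).
# Uniform voting counts 1 vote vs. 2 votes; inverse-distance weighting
# compares 1/0.5 against 1/1.5 + 1/1.7.
pred_uniform = int(knn_uniform.predict(x_new)[0])
pred_distance = int(knn_distance.predict(x_new)[0])

print(f"weights='uniform'  -> class {pred_uniform}")
print(f"weights='distance' -> class {pred_distance}")
```

With uniform weights the two farther class-1 neighbors outvote the single close class-0 neighbor; with distance weights the close neighbor wins.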

In [None]:
# Full Iris dataset with all 4 features
X_full = iris.data
y_full = iris.target

X_train_f, X_test_f, y_train_f, y_test_f = train_test_split(
    X_full, y_full, test_size=0.3, random_state=42, stratify=y_full
)

scaler_f = StandardScaler()
X_train_f_s = scaler_f.fit_transform(X_train_f)
X_test_f_s = scaler_f.transform(X_test_f)

knn = KNeighborsClassifier(n_neighbors=5, metric="euclidean", weights="uniform")
knn.fit(X_train_f_s, y_train_f)

y_pred = knn.predict(X_test_f_s)
print(f"Test accuracy: {accuracy_score(y_test_f, y_pred):.4f}")
print("\nClassification Report:")
print(classification_report(y_test_f, y_pred, target_names=iris.target_names))

---

## 5. Decision Boundary Visualization

To visualize the decision boundary, we use only **2 features** so we can plot in 2D.

In [None]:
# Use petal length and petal width (features 2, 3) for cleaner separation
X_2d = iris.data[:, 2:4]
y_2d = iris.target

scaler_2d = StandardScaler()
X_2d_s = scaler_2d.fit_transform(X_2d)

def plot_decision_boundary(X, y, model, title, ax):
    """Plot the decision boundary for a 2D feature space."""
    h = 0.05
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                         np.arange(y_min, y_max, h))
    Z = model.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)
    ax.contourf(xx, yy, Z, alpha=0.3, cmap=plt.cm.RdYlBu)
    scatter = ax.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.RdYlBu,
                         edgecolors="k", s=30)
    ax.set_title(title)
    ax.set_xlabel("Petal length (scaled)")
    ax.set_ylabel("Petal width (scaled)")

fig, axes = plt.subplots(1, 3, figsize=(18, 5))
for i, k in enumerate([1, 5, 25]):
    knn_viz = KNeighborsClassifier(n_neighbors=k)
    knn_viz.fit(X_2d_s, y_2d)
    plot_decision_boundary(X_2d_s, y_2d, knn_viz, f"k = {k}", axes[i])

plt.tight_layout()
plt.show()
print("Notice: k=1 gives the most jagged boundary (high variance); k=25 gives the smoothest (higher bias).")

---

## 6. Bias-Variance Tradeoff and Choosing k

- **Small k** (e.g., k=1): low bias, high variance -- the model is sensitive to noise and overfits.
- **Large k** (e.g., k=50): high bias, low variance -- the model is too smooth and underfits.

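We plot accuracy vs. k on both training and test sets to see the tradeoff. Keep in mind that picking k from the test curve makes the reported test accuracy optimistic; in practice, select k on a validation set or with cross-validation and keep the test set for the final check.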
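
As an alternative to reading k off the test curve, the sketch below selects k with 5-fold cross-validation on the training split only and touches the test split once at the end. It is one reasonable setup, not the only one; the variable names (`X_cv`, `candidate_ks`, `best_k_cv`, and so on) are chosen for this example.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

iris_cv = load_iris()
X_cv = iris_cv.data[:, 2:4]  # petal length and petal width
y_cv = iris_cv.target

# Hold out a test set that plays no part in choosing k
X_tr, X_te, y_tr, y_te = train_test_split(
    X_cv, y_cv, test_size=0.3, random_state=42, stratify=y_cv
)

candidate_ks = list(range(1, 31))
mean_cv_scores = []
for k in candidate_ks:
    # The pipeline refits the scaler inside each fold, so the validation
    # fold never influences the scaling statistics.
    model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=k))
    scores = cross_val_score(model, X_tr, y_tr, cv=5)
    mean_cv_scores.append(float(scores.mean()))

# np.argmax returns the first maximum, so ties go to the smallest k
best_k_cv = candidate_ks[int(np.argmax(mean_cv_scores))]

# Refit on the full training split and evaluate once on the held-out test set
final_model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=best_k_cv))
final_model.fit(X_tr, y_tr)
test_accuracy = final_model.score(X_te, y_te)

print(f"Best k by 5-fold CV: {best_k_cv}")
print(f"Held-out test accuracy: {test_accuracy:.4f}")
```

Wrapping the scaler and the classifier in one pipeline is a design choice that keeps preprocessing inside the cross-validation loop, which avoids leaking validation data into the scaler.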

In [None]:
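# Split the raw 2D features first, then fit the scaler on the training split only,
# so that no test-set statistics leak into the preprocessing.
X_train_2d_raw, X_test_2d_raw, y_train_2d, y_test_2d = train_test_split(
    X_2d, y_2d, test_size=0.3, random_state=42, stratify=y_2d
)
scaler_k = StandardScaler()
X_train_2d = scaler_k.fit_transform(X_train_2d_raw)
X_test_2d = scaler_k.transform(X_test_2d_raw)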

k_values = range(1, 31)
train_accs = []
test_accs = []

for k in k_values:
    knn_k = KNeighborsClassifier(n_neighbors=k)
    knn_k.fit(X_train_2d, y_train_2d)
    train_accs.append(accuracy_score(y_train_2d, knn_k.predict(X_train_2d)))
    test_accs.append(accuracy_score(y_test_2d, knn_k.predict(X_test_2d)))

plt.figure(figsize=(10, 5))
plt.plot(k_values, train_accs, "o-", label="Train accuracy")
plt.plot(k_values, test_accs, "s-", label="Test accuracy")
plt.xlabel("k (number of neighbors)")
plt.ylabel("Accuracy")
plt.title("Accuracy vs. k -- Bias-Variance Tradeoff")
plt.xticks(list(k_values))
plt.legend()
plt.grid(True)
plt.tight_layout()
plt.show()

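# np.argmax returns the first maximum, so ties go to the smallest k.
# Choosing k on the test set is for illustration only; it biases the reported accuracy upward.
best_k = k_values[np.argmax(test_accs)]
print(f"Best k on test set: {best_k} with accuracy {max(test_accs):.4f}")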

---

## 7. KNN Regressor

`KNeighborsRegressor` predicts the **mean** of the k nearest neighbors' target values.
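
A quick way to convince yourself of this is to reproduce a prediction by hand. The sketch below uses made-up 1-D data (`X_small`, `y_small`, and `x_q` are hypothetical) and compares scikit-learn's output with a manual mean, then does the same for the inverse-distance-weighted mean used by `weights='distance'`.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Toy data (made up for illustration)
X_small = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y_small = np.array([1.0, 2.0, 3.0, 10.0, 11.0])
x_q = np.array([[2.1]])

# Manual neighbor search: the 3 nearest points to 2.1 are x = 2, 3, and 1
dist = np.abs(X_small.ravel() - x_q[0, 0])
nearest3 = np.argsort(dist)[:3]

# Uniform weights: plain mean of the neighbors' targets
reg_uniform = KNeighborsRegressor(n_neighbors=3).fit(X_small, y_small)
pred_sklearn = float(reg_uniform.predict(x_q)[0])
pred_manual = float(y_small[nearest3].mean())

# Distance weights: mean weighted by 1 / distance
reg_weighted = KNeighborsRegressor(n_neighbors=3, weights="distance").fit(X_small, y_small)
pred_sklearn_w = float(reg_weighted.predict(x_q)[0])
pred_manual_w = float(np.average(y_small[nearest3], weights=1.0 / dist[nearest3]))

print(f"Uniform:  sklearn={pred_sklearn:.4f}, manual={pred_manual:.4f}")
print(f"Weighted: sklearn={pred_sklearn_w:.4f}, manual={pred_manual_w:.4f}")
```

The distant targets 10.0 and 11.0 never enter the prediction because their points are not among the three nearest neighbors.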

In [None]:
# Generate synthetic regression data
np.random.seed(42)
X_reg = np.sort(5 * np.random.rand(100, 1), axis=0)
y_reg = np.sin(X_reg).ravel() + np.random.randn(100) * 0.2

fig, axes = plt.subplots(1, 3, figsize=(18, 5))
for i, k in enumerate([1, 5, 20]):
    knn_reg = KNeighborsRegressor(n_neighbors=k)
    knn_reg.fit(X_reg, y_reg)
    X_plot = np.linspace(0, 5, 300).reshape(-1, 1)
    y_plot = knn_reg.predict(X_plot)
    
    axes[i].scatter(X_reg, y_reg, color="darkorange", s=20, label="data")
    axes[i].plot(X_plot, y_plot, color="navy", linewidth=2, label="prediction")
    axes[i].set_title(f"KNN Regressor (k={k})")
    axes[i].set_xlabel("X")
    axes[i].set_ylabel("y")
    axes[i].legend()

plt.tight_layout()
plt.show()
print("k=1 overfits (jagged), k=20 underfits (too smooth), k=5 is a good middle ground.")

---

## 8. Common Mistakes

| Mistake | Why It Matters |
|---|---|
| **Not scaling features** | Features with larger ranges dominate the distance calculation. Always use `StandardScaler` or `MinMaxScaler`. |
| **Using even k for binary classification** | Even k can lead to ties in majority voting. Use odd k for binary problems. With three or more classes, ties are possible even when k is odd. |
| **Applying KNN to high-dimensional data** | In high dimensions, distances become less meaningful (the "curse of dimensionality"). Consider dimensionality reduction first. |
| **Choosing k without evaluation** | Always evaluate multiple k values on a validation set or with cross-validation. |
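| **Forgetting KNN is slow at prediction** | With brute-force search, KNN computes the distance to every training point for each query. Tree-based indexes (`algorithm='kd_tree'` or `'ball_tree'`) help mainly in low dimensions. For large datasets, consider approximate nearest-neighbor methods or other algorithms. |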
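
The curse of dimensionality mentioned in the table can be observed directly. The sketch below is an illustration on uniformly random points, not a statement about any particular dataset: it measures how the nearest and farthest distances from a random query compare as the number of dimensions grows. The helper `nearest_to_farthest_ratio` and the choice of 500 points are assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)


def nearest_to_farthest_ratio(n_dims, n_points=500):
    """Ratio of the nearest to the farthest Euclidean distance from one random query."""
    points = rng.random((n_points, n_dims))  # uniform in the unit hypercube
    q = rng.random(n_dims)
    d = np.linalg.norm(points - q, axis=1)
    return float(d.min() / d.max())


ratios = {dim: nearest_to_farthest_ratio(dim) for dim in [2, 10, 100, 1000]}
for dim, ratio in ratios.items():
    print(f"{dim:>5} dims: nearest/farthest = {ratio:.3f}")
```

A ratio close to 1 means the nearest and farthest points are almost equally far from the query, so "nearest" carries little information. That is the motivation for reducing dimensionality before applying KNN.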

---

## 9. Exercise

**Task:** Load the wine dataset (`sklearn.datasets.load_wine`). Split into train/test (70/30, `random_state=42`). Scale the features. Train a KNN classifier for k = 1, 3, 5, 7, 9, 11. Plot test accuracy vs. k and report the best k.

Bonus: Try `weights='distance'` and compare results.

In [None]:
# YOUR CODE HERE
from sklearn.datasets import load_wine

# 1. Load data
# 2. Split into train/test
# 3. Scale features
# 4. Loop over k values and record test accuracy
# 5. Plot accuracy vs. k
# 6. Print best k