<a href="https://colab.research.google.com/github/sreent/machine-learning/blob/main/K-Nearest%20Neighbours%20Regression/KNN%20Regression%20Hands-On%20Lab.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# K-Nearest Neighbors (KNN) Regression

In this lab, you will implement a K-Nearest Neighbors regressor from scratch, then compare it with scikit-learn's implementation. You'll learn about:
- The core KNN regression algorithm
- Distance metrics and feature scaling
- Bias-variance tradeoff when tuning K
- Evaluation metrics for regression tasks

Unlike classification, where KNN uses **voting** to select a class label, KNN regression **averages** the target values of the K nearest neighbors to produce a continuous prediction.

## Overview of KNN Regression

**Algorithm:**
1. Given a new data point $\mathbf{x}_{\text{new}}$, compute the distance from $\mathbf{x}_{\text{new}}$ to every point in the training set.
2. Select the K training points closest to $\mathbf{x}_{\text{new}}$ (the K nearest neighbors).
3. **Average** the target values of these K neighbors to produce the prediction:

$$\hat{y}_{\text{new}} = \frac{1}{K} \sum_{i=1}^{K} y_i$$

where $y_i$ are the target values of the K nearest neighbors.

**Key Difference from Classification:**
- **Classification**: Uses majority voting among K neighbors' class labels
- **Regression**: Averages the K neighbors' continuous target values

**Example:**
If K=3 and the 3 nearest neighbors have target values [2.5, 3.0, 2.8], the prediction would be:
$$\hat{y} = \frac{2.5 + 3.0 + 2.8}{3} \approx 2.77$$
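
The same arithmetic can be checked in a couple of lines of NumPy. This is only a sketch of the averaging step; the variable names are chosen here for illustration.

```python
import numpy as np

# Target values of the K=3 nearest neighbours from the example above
neighbor_targets = np.array([2.5, 3.0, 2.8])

# KNN regression prediction = unweighted mean of the neighbours' targets
y_hat = neighbor_targets.mean()
print(f"Prediction: {y_hat:.2f}")  # prints "Prediction: 2.77"
```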

> **Question**: In KNN regression, the parameter K refers to:
>
> A) The number of features in the dataset  
>
> B) The number of nearest neighbors whose target values are averaged for prediction  
>
> C) The number of clusters in the data  
>
> D) The number of iterations for training
>
> **Answer**: B. K is the number of nearest neighbors we average to make a prediction.

## Bias-Variance Trade-off in KNN Regression

The choice of K controls the model's complexity:

**Small K (e.g., K=1):**
- Low bias, high variance
- Very flexible, can fit complex patterns
- Sensitive to noise and outliers
- Risk of overfitting
- For K=1, prediction equals the nearest neighbor's value exactly

**Large K (e.g., K=100):**
- High bias, low variance
- Less flexible, smoother predictions
- More robust to noise
- Risk of underfitting
- Predictions become more similar to one another; as K approaches the training set size, every prediction approaches the mean of all training targets

**Optimal K:**
- Balances bias and variance
- Found through validation/cross-validation
- Depends on the dataset size and noise level
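
The two extremes described above can be illustrated with a small experiment. The sketch below uses a hypothetical helper, `knn_predict_1d`, and made-up noisy sine data (neither is part of the lab's code). It shows that K=1 reproduces the training targets exactly, while K=N predicts the global mean everywhere.

```python
import numpy as np

def knn_predict_1d(x_train, y_train, x_query, k):
    """Hypothetical helper: brute-force KNN regression on 1D inputs."""
    # |x_query_i - x_train_j| for every query/training pair
    dists = np.abs(x_query[:, None] - x_train[None, :])
    # Indices of the k closest training points for each query
    nearest = np.argsort(dists, axis=1)[:, :k]
    return y_train[nearest].mean(axis=1)

rng = np.random.default_rng(0)
x_train = np.sort(rng.uniform(0, 10, 50))
y_train = np.sin(x_train) + rng.normal(0, 0.2, 50)

# K=1: every training point is its own nearest neighbour, so training error is zero
pred_k1 = knn_predict_1d(x_train, y_train, x_train, k=1)
train_rmse_k1 = np.sqrt(np.mean((y_train - pred_k1) ** 2))

# K=N: every prediction is the mean of all training targets
pred_kN = knn_predict_1d(x_train, y_train, x_train, k=len(x_train))

print(f"K=1 training RMSE: {train_rmse_k1:.3f}")
print(f"K=N predictions equal the global mean: {np.allclose(pred_kN, y_train.mean())}")
```

A zero training error at K=1 says nothing about generalization, which is why K is chosen on held-out data.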

## Distance Metrics and Feature Scaling

**Common Distance Metrics:**
- **Euclidean (L2)**: $d(\mathbf{x}, \mathbf{x}') = \sqrt{\sum_{i=1}^{n} (x_i - x_i')^2}$
- **Manhattan (L1)**: $d(\mathbf{x}, \mathbf{x}') = \sum_{i=1}^{n} |x_i - x_i'|$
- **Minkowski**: Generalization of both (p=1 is Manhattan, p=2 is Euclidean)
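
The Minkowski distance of order $p$ is $d_p(\mathbf{x}, \mathbf{x}') = \left(\sum_{i=1}^{n} |x_i - x_i'|^p\right)^{1/p}$. A minimal sketch, with an illustrative helper name and made-up vectors, shows how $p=1$ and $p=2$ recover the two metrics listed above:

```python
import numpy as np

def minkowski_distance(x, x_prime, p):
    """Minkowski distance of order p between two 1D vectors (illustrative helper)."""
    return np.sum(np.abs(x - x_prime) ** p) ** (1.0 / p)

x = np.array([1.0, 2.0])
x_prime = np.array([4.0, 6.0])

manhattan = minkowski_distance(x, x_prime, p=1)  # |3| + |4| = 7
euclidean = minkowski_distance(x, x_prime, p=2)  # sqrt(9 + 16) = 5
print(manhattan, euclidean)
```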

**Feature Scaling is Critical:**
- Features with larger ranges dominate distance calculations
- Example: If feature1 ranges from 0-1 and feature2 from 0-1000, feature2 will dominate
- **Solution**: Standardize features to have mean=0 and std=1, or normalize to [0,1] range

$$z = \frac{x - \mu}{\sigma}$$

where $\mu$ is the mean and $\sigma$ is the standard deviation.
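
To see why this matters, the sketch below uses made-up data with one feature in [0, 1] and another in [0, 1000]. The points `query`, `a`, and `b` are hypothetical. Before scaling, the large-range feature decides which point is nearest; after standardization, the ordering flips.

```python
import numpy as np

rng = np.random.default_rng(0)
# Made-up training features: column 0 in [0, 1], column 1 in [0, 1000]
X_train = np.column_stack([rng.uniform(0, 1, 1000), rng.uniform(0, 1000, 1000)])
mu, sigma = X_train.mean(axis=0), X_train.std(axis=0)

query = np.array([0.1, 500.0])
a = np.array([0.9, 510.0])   # very different in feature 0, close in feature 1
b = np.array([0.1, 600.0])   # identical in feature 0, further away in feature 1

def standardize(x):
    """Apply z = (x - mu) / sigma using the training statistics."""
    return (x - mu) / sigma

raw_a, raw_b = np.linalg.norm(query - a), np.linalg.norm(query - b)
std_a = np.linalg.norm(standardize(query) - standardize(a))
std_b = np.linalg.norm(standardize(query) - standardize(b))

print(f"Raw:          d(query, a) = {raw_a:.2f}, d(query, b) = {raw_b:.2f}")
print(f"Standardized: d(query, a) = {std_a:.2f}, d(query, b) = {std_b:.2f}")
```

On the raw data `a` looks nearest because only feature 2 effectively counts; after standardization `b` is nearest, since it matches the query exactly on feature 1.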

> **Question**: When using KNN on a dataset with features measured in very different scales (e.g., age in years vs. income in dollars), what should you do?
>
> A) Nothing—KNN handles different scales automatically  
>
> B) Remove the feature with the largest scale  
>
> C) Standardize or normalize all features to comparable ranges  
>
> D) Use only categorical features
>
> **Answer**: C. Scaling ensures all features contribute fairly to distance calculations.

## Pseudocode for KNN Regressor

Before implementing the algorithm, let's understand the pseudocode structure (from lecture slides):

```
# Inputs
#   data      ← training set of N examples (x, y)
#   k         ← number of neighbours
#   metric    ← distance function (e.g., Euclidean, Manhattan)
#   X_query   ← set of examples to predict

# ----- "fit" (lazy) -----
X_train ← data.x
y_train ← data.y

# ----- predict -----
ŷ ← list of length |X_query|  # outputs align 1:1 with X_query

FOR i = 1 TO |X_query| DO
    x* ← X_query[i]
    d ← distances from x* to all X_train using metric
    J ← indices of the k smallest values in d
    # regression prediction = average of the targets of the k nearest neighbours
    ŷ[i] ← mean(y_train[J])
END FOR

RETURN ŷ
```

**Key Points:**
- KNN is a "lazy learner" - the `fit` method just stores the training data
- The `predict` method does all the work: compute distances, find K nearest, average their targets
- **Regression uses averaging** (not voting like classification)
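
The predict loop in the pseudocode can also be written without an explicit Python loop. The following is a minimal sketch, not the lab's reference implementation: `knn_predict_vectorized` is a hypothetical helper name, and it uses `np.argpartition` instead of a full sort, assuming `k` is no larger than the number of training points.

```python
import numpy as np
from sklearn.metrics import pairwise_distances

def knn_predict_vectorized(X_train, y_train, X_query, k, metric="euclidean"):
    """Sketch of the predict step for all query points at once (hypothetical helper)."""
    X_train, y_train, X_query = np.asarray(X_train), np.asarray(y_train), np.asarray(X_query)
    # Distance matrix of shape (n_query, n_train)
    D = pairwise_distances(X_query, X_train, metric=metric)
    # Indices of the k smallest distances per row; their order does not matter for a mean
    J = np.argpartition(D, k - 1, axis=1)[:, :k]
    return y_train[J].mean(axis=1)

# Tiny made-up example: each target equals its single feature value
X_train = np.array([[0.0], [1.0], [2.0], [10.0]])
y_train = np.array([0.0, 1.0, 2.0, 10.0])
X_query = np.array([[0.9], [9.0]])

print(knn_predict_vectorized(X_train, y_train, X_query, k=2))  # [0.5 6. ]
```

For the first query the two nearest points are x=1 and x=0 (mean 0.5); for the second they are x=10 and x=2 (mean 6.0).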

## Implementing a Custom KNN Regressor

Below is the `MyKNNRegressor` class. The TODO comments mark the parts you are expected to write yourself; a reference solution is filled in beneath each one.

In [None]:
import numpy as np
from sklearn.metrics import pairwise_distances

class MyKNNRegressor:
    """
    A simple K-Nearest Neighbors Regressor implementation.
    
    Parameters:
    -----------
    n_neighbors : int, default=5
        Number of neighbors to use for prediction
    metric : str, default='euclidean'
        Distance metric to use ('euclidean', 'manhattan', etc.)
    """
    
    def __init__(self, n_neighbors=5, metric='euclidean'):
        self.n_neighbors = n_neighbors
        self.metric = metric
        self.X_train = None
        self.y_train = None
    
    def fit(self, X, y):
        """
        Fit the KNN regressor by storing the training data.
        
        Parameters:
        -----------
        X : array-like of shape (n_samples, n_features)
            Training data
        y : array-like of shape (n_samples,)
            Target values
        """
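        # TODO: Store the training data
        # Hint: KNN is a "lazy learner" - it just stores the training data
        # Convert to NumPy arrays so integer-array indexing in predict() also
        # works when X or y are passed as lists or pandas objects
        self.X_train = np.asarray(X)
        self.y_train = np.asarray(y)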
        return self
    
    def predict(self, X):
        """
        Predict target values for test data.
        
        Parameters:
        -----------
        X : array-like of shape (n_samples, n_features)
            Test data
            
        Returns:
        --------
        predictions : array of shape (n_samples,)
            Predicted target values
        """
        predictions = []
        
        for x_test in np.asarray(X):  # iterate over rows, also for lists and DataFrames
            # TODO: For each test point:
            # 1. Compute distances from x_test to all training points
            # 2. Find indices of K nearest neighbors
            # 3. Get target values of those K neighbors
            # 4. Average them to get the prediction
            
            # Step 1: Compute distances
            distances = pairwise_distances(
                x_test.reshape(1, -1), 
                self.X_train, 
                metric=self.metric
            ).ravel()
            
            # Step 2: Find K nearest neighbor indices
            k_nearest_indices = np.argsort(distances)[:self.n_neighbors]
            
            # Step 3: Get target values of K nearest neighbors
            k_nearest_targets = self.y_train[k_nearest_indices]
            
            # Step 4: Average the target values
            prediction = np.mean(k_nearest_targets)
            
            predictions.append(prediction)
        
        return np.array(predictions)

print("MyKNNRegressor class defined successfully!")

Once you have filled in the implementation, let's test our custom regressor on a simple dataset.

## A Dataset for Visualization

We'll create a synthetic 1D regression dataset to visualize how KNN regression works.

In [None]:
import pandas as pd
import matplotlib.pyplot as plt

# Set random seed for reproducibility
np.random.seed(42)

# Generate synthetic 1D data
def f(x):
    """True underlying function: sinusoidal pattern"""
    return np.sin(x) + 0.1 * x

# Training data
X_train_1d = np.sort(np.random.uniform(0, 10, 100))
y_train_1d = f(X_train_1d) + np.random.normal(0, 0.2, 100)  # Add noise

# Reshape for sklearn compatibility
X_train_1d = X_train_1d.reshape(-1, 1)

# Test data (for smooth prediction curve)
X_test_1d = np.linspace(0, 10, 200).reshape(-1, 1)
y_test_1d_true = f(X_test_1d.ravel())

print(f"Training data shape: X={X_train_1d.shape}, y={y_train_1d.shape}")
print(f"Test data shape: X={X_test_1d.shape}")

Let's visualize the training data:

In [None]:
plt.figure(figsize=(12, 5))

# Plot training data
plt.scatter(X_train_1d, y_train_1d, c='blue', alpha=0.6, edgecolor='k', s=50, label='Training data')
plt.plot(X_test_1d, y_test_1d_true, 'g--', lw=2, label='True function')
plt.xlabel('X')
plt.ylabel('y')
plt.title('Training Data with True Underlying Function')
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

## Understanding the Prediction Process: Step-by-Step

Before we test different K values, let's manually walk through exactly what happens when KNN makes a single prediction. This matches the step-by-step process from the lecture slides (slides 11-13).

**The 4 Steps:**
1. **Pairwise Distance Calculation**: Compute distance from test point to all training points
2. **Finding K Nearest Neighbors**: Sort distances and select K smallest
3. **Get Neighbor Targets**: Retrieve the y-values of those K neighbors  
4. **Average**: Compute mean of the K target values → this is the prediction!

Let's demonstrate this process:

In [None]:
# Manual step-by-step prediction (matching lecture slides 11-13)
from sklearn.metrics import pairwise_distances

# Choose a test point (let's pick x=5.0 from our test range)
x_test_single = np.array([[5.0]])  # Must be 2D for sklearn
print("="*70)
print("STEP-BY-STEP: How KNN Predicts for x_test = 5.0")
print("="*70)

# Step 1: Calculate distances from test point to ALL training points
print("\nSTEP 1: Pairwise Distance Calculation")
print("-" * 70)
distances = pairwise_distances(x_test_single, X_train_1d, metric='euclidean').ravel()
print(f"Computed {len(distances)} distances from x_test=5.0 to all training points")
print(f"Distance array shape: {distances.shape}")
print(f"First 10 distances: {distances[:10].round(3)}")

# Step 2: Find K nearest neighbors
print("\nSTEP 2: Finding K Nearest Neighbors")
print("-" * 70)
K = 5
k_indices = np.argsort(distances)[:K]  # Indices of K smallest distances
k_distances = distances[k_indices]
print(f"K = {K}")
print(f"Indices of K nearest neighbors: {k_indices}")
print(f"Their distances: {k_distances.round(3)}")
print(f"Their x-coordinates: {X_train_1d[k_indices].ravel().round(3)}")

# Step 3: Get target values of K nearest neighbors
print("\nSTEP 3: Get Neighbor Targets")
print("-" * 70)
k_targets = y_train_1d[k_indices]
print(f"NN's Labels (target values): {k_targets.round(3)}")

# Step 4: Average to get prediction
print("\nSTEP 4: Average (THE PREDICTION!)")
print("-" * 70)
manual_prediction = np.mean(k_targets)
print(f"Prediction = mean({k_targets.round(3)})")
print(f"Prediction = {manual_prediction:.3f}")

# Verify with our custom KNN
knn_verify = MyKNNRegressor(n_neighbors=K)
knn_verify.fit(X_train_1d, y_train_1d)
verify_prediction = knn_verify.predict(x_test_single)[0]

print(f"\n" + "="*70)
print("VERIFICATION:")
print(f"  Manual calculation:  {manual_prediction:.3f}")
print(f"  MyKNNRegressor:      {verify_prediction:.3f}")
print(f"  Match: {np.isclose(manual_prediction, verify_prediction)} ✓")
print("="*70)

# Visualize this specific prediction
plt.figure(figsize=(12, 5))
plt.scatter(X_train_1d, y_train_1d, c='lightblue', alpha=0.6, edgecolor='k', s=50, label='All training data')
plt.scatter(X_train_1d[k_indices], k_targets, c='red', s=100, edgecolor='k', 
           label=f'K={K} nearest neighbors', zorder=5)
plt.scatter(x_test_single, manual_prediction, c='green', s=200, marker='*', 
           edgecolor='k', label='Prediction (average)', zorder=10)
plt.axhline(manual_prediction, color='green', linestyle='--', alpha=0.3)
plt.xlabel('X')
plt.ylabel('y')
plt.title(f'Step-by-Step KNN Prediction: x_test={x_test_single[0,0]}, K={K}, Prediction={manual_prediction:.3f}')
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

## Testing Our Custom KNN Regressor

Let's test our implementation with different K values to see how it affects the predictions.

In [None]:
# Test different K values
k_values = [1, 3, 10, 30]

plt.figure(figsize=(14, 10))

for idx, k in enumerate(k_values, 1):
    # Fit our custom KNN regressor
    knn = MyKNNRegressor(n_neighbors=k)
    knn.fit(X_train_1d, y_train_1d)
    
    # Make predictions
    y_pred = knn.predict(X_test_1d)
    
    # Plot
    plt.subplot(2, 2, idx)
    plt.scatter(X_train_1d, y_train_1d, c='blue', alpha=0.6, edgecolor='k', s=30, label='Training data')
    plt.plot(X_test_1d, y_test_1d_true, 'g--', lw=1.5, label='True function', alpha=0.7)
    plt.plot(X_test_1d, y_pred, 'r-', lw=2, label=f'KNN (K={k})')
    plt.xlabel('X')
    plt.ylabel('y')
    plt.title(f'KNN Regression with K={k}')
    plt.legend()
    plt.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("Observations:")
print("- K=1: Very wiggly, follows training points closely (high variance, low bias)")
print("- K=3-10: Smoother, captures general trend while adapting to local patterns")
print("- K=30: Very smooth, may miss local variations (low variance, higher bias)")

> **Question**: Looking at the plots above, which K value shows signs of overfitting?
>
> A) K=30 (too smooth, doesn't follow the data closely)  
>
> B) K=1 (too wiggly, follows every noise point)  
>
> C) K=10 (balanced smoothness)  
>
> D) All of them equally
>
> **Answer**: B. K=1 is overfitting—it perfectly fits training noise, creating excessive variation.

## Working with a 2D Dataset

Now let's work with a synthetic 2D regression dataset (generated with `make_regression`) and compare our implementation with scikit-learn's.

In [None]:
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Generate a 2D regression dataset
X, y = make_regression(
    n_samples=300,
    n_features=2,
    n_informative=2,
    noise=10,
    random_state=42
)

print(f"Dataset shape: X={X.shape}, y={y.shape}")
print(f"Target range: [{y.min():.2f}, {y.max():.2f}]")

Visualize the 2D dataset with target values as colors:

In [None]:
plt.figure(figsize=(10, 6))
scatter = plt.scatter(X[:, 0], X[:, 1], c=y, cmap='viridis', 
                     edgecolor='k', s=50, alpha=0.7)
plt.colorbar(scatter, label='Target Value')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.title('2D Regression Dataset (color represents target value)')
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

## Splitting into Train, Validation, and Test Sets

We split the data into:
- **Training set (60%)**: To fit the model
- **Validation set (20%)**: To tune hyperparameters (K)
- **Test set (20%)**: For final unbiased performance evaluation

In [None]:
# Split into train, validation, and test sets (60/20/20)
X_train, X_temp, y_train, y_temp = train_test_split(
    X, y, test_size=0.4, random_state=42
)
X_val, X_test, y_val, y_test = train_test_split(
    X_temp, y_temp, test_size=0.5, random_state=42
)

print(f"Train size: {X_train.shape[0]}")
print(f"Validation size: {X_val.shape[0]}")
print(f"Test size: {X_test.shape[0]}")

After this split, you should see: Train size 180, Validation size 60, Test size 60.

> **Question**: Why do we use a separate validation set instead of tuning the hyperparameters directly on the test set?
>
> A) The test set is too small  
>
> B) To prevent overfitting to the test set and get an unbiased final performance estimate  
>
> C) The validation set trains faster  
>
> D) It's just a convention with no real benefit
>
> **Answer**: B. If we tune on the test set, we'll overfit to it and get an overly optimistic performance estimate.

## Feature Scaling

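Let's compare validation performance with and without feature scaling. Because `make_regression` draws both features from a standard normal distribution, they are already on comparable scales, so expect only a small difference here; the effect is much larger when features have very different ranges.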

In [None]:
from sklearn.metrics import mean_squared_error, r2_score

# Without scaling
knn_raw = MyKNNRegressor(n_neighbors=5)
knn_raw.fit(X_train, y_train)
y_val_pred_raw = knn_raw.predict(X_val)
rmse_raw = np.sqrt(mean_squared_error(y_val, y_val_pred_raw))
r2_raw = r2_score(y_val, y_val_pred_raw)

# With scaling
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_val_scaled = scaler.transform(X_val)

knn_scaled = MyKNNRegressor(n_neighbors=5)
knn_scaled.fit(X_train_scaled, y_train)
y_val_pred_scaled = knn_scaled.predict(X_val_scaled)
rmse_scaled = np.sqrt(mean_squared_error(y_val, y_val_pred_scaled))
r2_scaled = r2_score(y_val, y_val_pred_scaled)

print("Performance Comparison (K=5):")
print("="*50)
print(f"Without scaling - RMSE: {rmse_raw:.2f}, R²: {r2_raw:.3f}")
print(f"With scaling    - RMSE: {rmse_scaled:.2f}, R²: {r2_scaled:.3f}")
print("="*50)
if rmse_scaled < rmse_raw:
    print(f"Improvement: {((rmse_raw - rmse_scaled) / rmse_raw * 100):.1f}% reduction in RMSE")
else:
    print("Note: Scaling impact may vary depending on feature scales")

## Tuning the Hyperparameter K

Let's find the optimal K by evaluating different values on the validation set.

In [None]:
# TODO: Complete this section
# 1. Create lists to store training and validation metrics
# 2. Loop through K values from 1 to 30
# 3. For each K, fit the model and compute RMSE on both train and validation sets
# 4. Store the results

train_rmse = []
val_rmse = []
k_range = range(1, 31)

for k in k_range:
    # Fit model
    model = MyKNNRegressor(n_neighbors=k)
    model.fit(X_train_scaled, y_train)
    
    # Predictions
    y_train_pred = model.predict(X_train_scaled)
    y_val_pred = model.predict(X_val_scaled)
    
    # Compute RMSE
    train_rmse.append(np.sqrt(mean_squared_error(y_train, y_train_pred)))
    val_rmse.append(np.sqrt(mean_squared_error(y_val, y_val_pred)))

# Find best K
best_k = list(k_range)[np.argmin(val_rmse)]
best_val_rmse = min(val_rmse)

print(f"Best K: {best_k}")
print(f"Best Validation RMSE: {best_val_rmse:.2f}")

Now, let's plot the RMSE vs. K to visualize the bias-variance tradeoff:

In [None]:
plt.figure(figsize=(10, 6))
plt.plot(list(k_range), train_rmse, 'o-', label='Training RMSE', linewidth=2, markersize=6)
plt.plot(list(k_range), val_rmse, 's-', label='Validation RMSE', linewidth=2, markersize=6)
plt.axvline(best_k, linestyle='--', color='red', linewidth=2, label=f'Best K={best_k}')
plt.xlabel('K (Number of Neighbors)', fontsize=12)
plt.ylabel('RMSE', fontsize=12)
plt.title('Bias-Variance Tradeoff: RMSE vs K', fontsize=14)
plt.legend(fontsize=11)
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

print("\nObservations:")
print("- Small K: Low training RMSE (fits training data closely) but higher validation RMSE (overfitting)")
print("- Large K: Training and validation RMSE converge (underfitting - too much smoothing)")
print(f"- Optimal K={best_k}: Best balance between bias and variance")

## Comparing with Scikit-Learn's Implementation

Let's verify our implementation matches scikit-learn's KNeighborsRegressor.

In [None]:
from sklearn.neighbors import KNeighborsRegressor

# Our implementation
our_knn = MyKNNRegressor(n_neighbors=best_k)
our_knn.fit(X_train_scaled, y_train)
our_pred = our_knn.predict(X_val_scaled)
our_rmse = np.sqrt(mean_squared_error(y_val, our_pred))
our_r2 = r2_score(y_val, our_pred)

# Scikit-learn's implementation
sklearn_knn = KNeighborsRegressor(n_neighbors=best_k)
sklearn_knn.fit(X_train_scaled, y_train)
sklearn_pred = sklearn_knn.predict(X_val_scaled)
sklearn_rmse = np.sqrt(mean_squared_error(y_val, sklearn_pred))
sklearn_r2 = r2_score(y_val, sklearn_pred)

print("Validation Set Performance Comparison:")
print("="*60)
print(f"Our Implementation  - RMSE: {our_rmse:.3f}, R²: {our_r2:.3f}")
print(f"Scikit-learn       - RMSE: {sklearn_rmse:.3f}, R²: {sklearn_r2:.3f}")
print("="*60)
print(f"Difference in RMSE: {abs(our_rmse - sklearn_rmse):.6f}")
print("\nThe implementations should match very closely!")

## Final Evaluation on Test Set

Now that we've chosen the best K using the validation set, let's evaluate on the held-out test set to get an unbiased performance estimate.

In [None]:
from sklearn.metrics import mean_absolute_error

# Scale test set
X_test_scaled = scaler.transform(X_test)

# Retrain on train + validation combined
X_train_all = np.vstack([X_train_scaled, X_val_scaled])
y_train_all = np.hstack([y_train, y_val])

final_knn = MyKNNRegressor(n_neighbors=best_k)
final_knn.fit(X_train_all, y_train_all)

# Predict on test set
y_test_pred = final_knn.predict(X_test_scaled)

# Compute metrics
test_rmse = np.sqrt(mean_squared_error(y_test, y_test_pred))
test_mae = mean_absolute_error(y_test, y_test_pred)
test_r2 = r2_score(y_test, y_test_pred)

print("="*60)
print("FINAL TEST SET PERFORMANCE")
print("="*60)
print(f"Best K: {best_k}")
print(f"RMSE:  {test_rmse:.3f}")
print(f"MAE:   {test_mae:.3f}")
print(f"R²:    {test_r2:.3f}")
print("="*60)

Visualize predictions vs. actual values:

In [None]:
plt.figure(figsize=(12, 5))

# Actual vs Predicted
plt.subplot(1, 2, 1)
plt.scatter(y_test, y_test_pred, alpha=0.6, edgecolor='k', s=50)
plt.plot([y_test.min(), y_test.max()], [y_test.min(), y_test.max()], 
         'r--', lw=2, label='Perfect prediction')
plt.xlabel('Actual Values')
plt.ylabel('Predicted Values')
plt.title(f'Test Set: Actual vs Predicted\n(R² = {test_r2:.3f}, RMSE = {test_rmse:.2f})')
plt.legend()
plt.grid(True, alpha=0.3)

# Residuals
plt.subplot(1, 2, 2)
residuals = y_test - y_test_pred
plt.scatter(y_test_pred, residuals, alpha=0.6, edgecolor='k', s=50)
plt.axhline(0, color='r', linestyle='--', lw=2)
plt.xlabel('Predicted Values')
plt.ylabel('Residuals (Actual - Predicted)')
plt.title('Residual Plot')
plt.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

## Distance-Weighted KNN (Advanced)

In standard KNN regression, all K neighbors contribute equally to the prediction. A more sophisticated approach is to **weight neighbors by their distance** - closer neighbors should have more influence.

**Distance-weighted average:**
$$\hat{y} = \frac{\sum_{i=1}^{K} w_i \cdot y_i}{\sum_{i=1}^{K} w_i}$$

where $w_i = \frac{1}{d_i + \epsilon}$ is the inverse-distance weight of neighbor $i$, $d_i$ is its distance to the query point, and $\epsilon$ is a small constant that prevents division by zero.
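
A small worked example makes the formula concrete. The distances and targets below are made up for illustration; the closest neighbor (distance 0.1) gets roughly ten times the weight of the farthest one (distance 1.0).

```python
import numpy as np

# Made-up distances and targets for K=3 neighbours
d = np.array([0.1, 0.5, 1.0])
y = np.array([2.5, 3.0, 2.8])
eps = 1e-10  # guards against division by zero when a neighbour coincides with the query

w = 1.0 / (d + eps)                         # inverse-distance weights, about [10, 2, 1]
y_hat_weighted = np.sum(w * y) / np.sum(w)  # (25 + 6 + 2.8) / 13
y_hat_uniform = y.mean()

print(f"Uniform:  {y_hat_uniform:.3f}")   # 2.767
print(f"Weighted: {y_hat_weighted:.3f}")  # 2.600
```

The weighted prediction is pulled toward 2.5, the target of the closest neighbor.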

In [None]:
# Compare uniform vs distance-weighted KNN using sklearn
knn_uniform = KNeighborsRegressor(n_neighbors=best_k, weights='uniform')
knn_distance = KNeighborsRegressor(n_neighbors=best_k, weights='distance')

knn_uniform.fit(X_train_all, y_train_all)
knn_distance.fit(X_train_all, y_train_all)

pred_uniform = knn_uniform.predict(X_test_scaled)
pred_distance = knn_distance.predict(X_test_scaled)

rmse_uniform = np.sqrt(mean_squared_error(y_test, pred_uniform))
rmse_distance = np.sqrt(mean_squared_error(y_test, pred_distance))

print("Comparison: Uniform vs Distance-Weighted KNN")
print("="*50)
print(f"Uniform weights:  RMSE = {rmse_uniform:.3f}")
print(f"Distance weights: RMSE = {rmse_distance:.3f}")
print("="*50)

if rmse_distance < rmse_uniform:
    improvement = (rmse_uniform - rmse_distance) / rmse_uniform * 100
    print(f"Distance weighting improves RMSE by {improvement:.1f}%")
else:
    print("Uniform weights perform better on this dataset")

## Exercise: Implement Distance-Weighted KNN

**Challenge**: Extend the `MyKNNRegressor` class to support distance-weighted predictions.

**Hints:**
1. Add a `weights` parameter to `__init__` (either 'uniform' or 'distance')
2. In the `predict` method, if weights='distance':
   - Get the distances to the K nearest neighbors
   - Compute weights as `w_i = 1 / (distance_i + 1e-10)`
   - Compute weighted average: `prediction = sum(w_i * y_i) / sum(w_i)`

In [None]:
# TODO: Implement distance-weighted KNN
class MyKNNRegressorWeighted:
    """
    KNN Regressor with support for distance-weighted predictions.
    """
    
    def __init__(self, n_neighbors=5, metric='euclidean', weights='uniform'):
        self.n_neighbors = n_neighbors
        self.metric = metric
        self.weights = weights  # 'uniform' or 'distance'
        self.X_train = None
        self.y_train = None
    
    def fit(self, X, y):
        self.X_train = X
        self.y_train = y
        return self
    
    def predict(self, X):
        predictions = []
        
        for x_test in X:
            # Compute distances
            distances = pairwise_distances(
                x_test.reshape(1, -1), 
                self.X_train, 
                metric=self.metric
            ).ravel()
            
            # Find K nearest neighbor indices
            k_nearest_indices = np.argsort(distances)[:self.n_neighbors]
            k_nearest_distances = distances[k_nearest_indices]
            k_nearest_targets = self.y_train[k_nearest_indices]
            
            # TODO: Implement distance weighting
            if self.weights == 'uniform':
                prediction = np.mean(k_nearest_targets)
            elif self.weights == 'distance':
                # Compute inverse distance weights
                weights = 1 / (k_nearest_distances + 1e-10)
                # Weighted average
                prediction = np.sum(weights * k_nearest_targets) / np.sum(weights)
            else:
                # Fail loudly instead of reusing a stale or undefined prediction
                raise ValueError(f"Unknown weights option: {self.weights!r}")
            
            predictions.append(prediction)
        
        return np.array(predictions)

# Test your implementation
my_knn_weighted = MyKNNRegressorWeighted(n_neighbors=best_k, weights='distance')
my_knn_weighted.fit(X_train_all, y_train_all)
my_pred_weighted = my_knn_weighted.predict(X_test_scaled)
my_rmse_weighted = np.sqrt(mean_squared_error(y_test, my_pred_weighted))

print(f"Your distance-weighted KNN RMSE: {my_rmse_weighted:.3f}")
print(f"Sklearn distance-weighted RMSE:  {rmse_distance:.3f}")
print(f"Difference: {abs(my_rmse_weighted - rmse_distance):.6f}")

## Summary and Key Takeaways

**What we learned:**

1. **KNN Regression Algorithm**: Predicts continuous values by averaging K nearest neighbors' targets

2. **Bias-Variance Tradeoff**:
   - Small K → High variance, low bias (overfitting)
   - Large K → Low variance, high bias (underfitting)
   - Optimal K balances both

3. **Feature Scaling**: Critical for KNN since it's distance-based

4. **Distance Metrics**: Euclidean (L2) and Manhattan (L1) are both special cases of Minkowski distance

5. **Distance Weighting**: Closer neighbors can have more influence

6. **Evaluation Metrics**:
   - RMSE: Root mean squared error; penalizes large errors more heavily than MAE
   - MAE: Mean absolute error
   - R²: Proportion of target variance explained by the model

7. **Train/Validation/Test Split**: Essential for proper hyperparameter tuning and unbiased evaluation
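
The three evaluation metrics from item 6 can be computed by hand, which makes their definitions explicit. This is a sketch on made-up numbers; in the lab itself the values come from `mean_squared_error`, `mean_absolute_error`, and `r2_score`.

```python
import numpy as np

# Made-up targets and predictions
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.0, 5.0, 8.0, 12.0])

errors = y_true - y_pred                        # [1, 0, -1, -3]
rmse = np.sqrt(np.mean(errors ** 2))            # sqrt((1 + 0 + 1 + 9) / 4)
mae = np.mean(np.abs(errors))                   # (1 + 0 + 1 + 3) / 4
ss_res = np.sum(errors ** 2)                    # residual sum of squares = 11
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares = 20
r2 = 1 - ss_res / ss_tot

print(f"RMSE = {rmse:.3f}, MAE = {mae:.3f}, R² = {r2:.3f}")
# prints "RMSE = 1.658, MAE = 1.250, R² = 0.450"
```

The single error of 3 dominates RMSE (1.658) far more than MAE (1.250), which is what "penalizes large errors" means in practice.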

**When to use KNN Regression:**
- Small to medium datasets
- Non-linear relationships
- Need interpretable predictions (can show similar examples)
- Quick prototyping and baseline

**Limitations:**
- Computationally expensive for large datasets (a brute-force search computes the distance to every training point for each prediction)
- Curse of dimensionality (performance degrades in high dimensions)
- Sensitive to irrelevant features
- Requires feature scaling
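
The curse of dimensionality can be illustrated with a small experiment. The sketch below uses uniform random points in the unit hypercube (an assumption made here, not data from the lab) and a hypothetical helper, `nearest_to_farthest_ratio`. As the number of dimensions grows, the nearest and farthest points end up at almost the same distance, so "nearest" carries less information.

```python
import numpy as np

rng = np.random.default_rng(0)

def nearest_to_farthest_ratio(n_dims, n_points=500):
    """Ratio of nearest to farthest Euclidean distance from one random query
    to n_points uniform random points in the unit hypercube."""
    points = rng.uniform(size=(n_points, n_dims))
    query = rng.uniform(size=n_dims)
    dists = np.linalg.norm(points - query, axis=1)
    return dists.min() / dists.max()

ratios = {d: nearest_to_farthest_ratio(d) for d in (2, 10, 100, 1000)}
for d, r in ratios.items():
    print(f"dims={d:>4}: nearest/farthest = {r:.3f}")
```

A ratio close to 1 means all training points are nearly equidistant from the query.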

## Additional Exercises

1. **Try different distance metrics**: Modify `MyKNNRegressor` to use Manhattan distance and compare performance

2. **Feature engineering**: Create polynomial features and see if KNN performance improves

3. **Cross-validation**: Instead of a single validation split, implement k-fold cross-validation to choose K

4. **Computational efficiency**: Measure prediction time for different dataset sizes. How does it scale?

5. **Real dataset**: Apply KNN regression to the California Housing dataset and tune all hyperparameters

6. **Comparison**: Compare KNN with Linear Regression, Decision Trees, and Random Forest on the same dataset
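
As a starting point for exercise 3, here is one possible sketch that selects K by 5-fold cross-validation. It relies on scikit-learn's `GridSearchCV`, `KFold`, and `make_pipeline` rather than a hand-written fold loop, and regenerates the same kind of synthetic data as the lab; the name `best_k_cv` is chosen here for illustration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Same kind of synthetic data as in the lab
X, y = make_regression(n_samples=300, n_features=2, n_informative=2,
                       noise=10, random_state=42)

# Scaling lives inside the pipeline, so each fold is scaled with
# statistics from its own training part only
pipeline = make_pipeline(StandardScaler(), KNeighborsRegressor())

search = GridSearchCV(
    pipeline,
    param_grid={"kneighborsregressor__n_neighbors": list(range(1, 31))},
    scoring="neg_root_mean_squared_error",  # scikit-learn maximizes scores, hence the sign
    cv=KFold(n_splits=5, shuffle=True, random_state=42),
)
search.fit(X, y)

best_k_cv = search.best_params_["kneighborsregressor__n_neighbors"]
print(f"Best K by 5-fold CV: {best_k_cv}")
print(f"Mean CV RMSE: {-search.best_score_:.2f}")
```

Averaging over five folds makes the choice of K less dependent on one particular validation split.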