```{contents}
```

# Hyperparameter Tuning

## **1. KNN Hyperparameters**

KNN has **no training phase in the usual sense** (fitting simply stores the training data), but it still has **hyperparameters** that control how predictions are made:

1. **`k` (number of neighbors)** – how many nearby points influence the prediction
2. **Distance metric** – how we measure “closeness” between points

   * Euclidean, Manhattan, Minkowski, Hamming, etc.
3. **Weights of neighbors** – whether all neighbors contribute equally or closer ones count more

   * `uniform`: all neighbors have equal weight
   * `distance`: each neighbor is weighted by the inverse of its distance, so closer neighbors have more influence
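The choice of metric matters because different metrics can disagree about which point is "nearest". The sketch below is a plain-Python illustration, not scikit-learn's implementation; the helper names (`euclidean`, `manhattan`, `minkowski`, `hamming`, `nearest`) and the toy points are hypothetical.

```python
import math

def euclidean(a, b):
    # Straight-line distance: square root of summed squared differences
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    # City-block distance: sum of absolute differences
    return sum(abs(x - y) for x, y in zip(a, b))

def minkowski(a, b, p=2):
    # Generalizes both: p=1 gives Manhattan, p=2 gives Euclidean
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)

def hamming(a, b):
    # Number of positions where the values differ (useful for categorical features)
    return sum(x != y for x, y in zip(a, b))

def nearest(query, points, dist):
    # Index of the point closest to `query` under the given distance function
    return min(range(len(points)), key=lambda i: dist(query, points[i]))

# Hypothetical toy points
query = (0, 0)
points = [(3, 0), (2, 2)]
print(nearest(query, points, euclidean))  # 1: (2, 2) is ~2.83 away vs 3.0
print(nearest(query, points, manhattan))  # 0: (3, 0) is 3 away vs 4
print(hamming(("red", "S", "cotton"), ("red", "M", "wool")))  # 2
```

The same query gets a different nearest neighbor under Euclidean and Manhattan distance, which is why `metric` is worth tuning alongside `k`.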

Hyperparameter tuning is the process of **finding the combination of these hyperparameters that minimizes error** (or maximizes accuracy) on data the model has not seen, typically measured on a validation set or with cross-validation.

---

## **2. Key Hyperparameters**

| Hyperparameter    | Description                            | Effect on model                                                                    |
| ----------------- | -------------------------------------- | ---------------------------------------------------------------------------------- |
| `n_neighbors (k)` | Number of nearest neighbors            | Small k → noisy predictions, overfit <br> Large k → smoother predictions, underfit |
| `metric`          | Distance metric to calculate closeness | Changes which points count as nearest, and therefore which neighbors vote          |
| `weights`         | Weighting of neighbors                 | Can improve performance by prioritizing closer points                              |

---

## **3. Methods for Hyperparameter Tuning**

### **A. Manual Search**

* Try different values of `k` (e.g., 1, 3, 5, …, 15); odd values avoid voting ties in binary classification
* Evaluate performance on a validation set
* Choose the `k` giving the **best accuracy / lowest error**
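A minimal sketch of a manual search over odd `k` values with a single held-out validation set. The Iris dataset stands in for your own data, and the 25% split and `random_state` are arbitrary choices for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Stand-in dataset; replace with your own X, y
X, y = load_iris(return_X_y=True)

# Hold out 25% of the data as a validation set
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

scores = {}
for k in range(1, 16, 2):  # odd k: 1, 3, ..., 15
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, y_train)
    scores[k] = knn.score(X_val, y_val)  # validation accuracy

# max() returns the first key with the highest score, so ties go to the smallest k
best_k = max(scores, key=scores.get)
print(scores)
print("Best k:", best_k)
```

Because the result depends on one particular split, the cross-validation approach described later gives a more stable estimate.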

---

### **B. Grid Search**

* Explore a **grid of hyperparameter combinations**:

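```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Stand-in dataset so the example runs on its own; use your own X, y
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Every combination is tried: 4 x 2 x 2 = 16 settings, each scored with 5-fold CV
param_grid = {
    'n_neighbors': [3, 5, 7, 9],
    'weights': ['uniform', 'distance'],
    'metric': ['euclidean', 'manhattan']
}

knn = KNeighborsClassifier()
grid = GridSearchCV(knn, param_grid, cv=5, scoring='accuracy')
grid.fit(X_train, y_train)

print("Best hyperparameters:", grid.best_params_)
print("Best CV accuracy:", grid.best_score_)
# By default the best model is refit on all of X_train, so it can be scored directly
print("Test accuracy:", grid.score(X_test, y_test))
```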

---

### **C. Randomized Search**

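* Instead of trying every combination, evaluate a fixed number (`n_iter`) of **randomly sampled combinations**
* `RandomizedSearchCV` has the same interface as `GridSearchCV`; its cost depends on `n_iter` rather than on the size of the search space, which makes it much faster for large spaces, at the risk of missing the single best combination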
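A sketch of randomized search over a wider range of `k`, assuming scikit-learn and SciPy are installed (SciPy is a scikit-learn dependency). Iris is a stand-in dataset, and `n_iter=10` and the 1 to 30 range are illustrative choices.

```python
from scipy.stats import randint
from sklearn.datasets import load_iris
from sklearn.model_selection import RandomizedSearchCV
from sklearn.neighbors import KNeighborsClassifier

# Stand-in dataset
X, y = load_iris(return_X_y=True)

param_distributions = {
    'n_neighbors': randint(1, 31),        # integers 1..30, sampled uniformly
    'weights': ['uniform', 'distance'],   # lists are sampled uniformly
    'metric': ['euclidean', 'manhattan'],
}

search = RandomizedSearchCV(
    KNeighborsClassifier(),
    param_distributions,
    n_iter=10,          # only 10 sampled settings are evaluated
    cv=5,               # each one scored with 5-fold CV
    scoring='accuracy',
    random_state=0,     # makes the sampling reproducible
)
search.fit(X, y)

print("Best hyperparameters:", search.best_params_)
print("Best CV accuracy:", search.best_score_)
```

The full space here has 30 x 2 x 2 = 120 combinations, but only 10 are fitted.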

---

### **D. Cross-Validation**

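* Always combine tuning with **cross-validation**, so the chosen hyperparameters do not overfit one particular train/validation split
* Use **k-fold CV** (e.g., 5 folds) to evaluate each hyperparameter setting; the number of folds is unrelated to KNN's `k`
* `GridSearchCV` and `RandomizedSearchCV` already perform k-fold CV internally through their `cv` argument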
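To see what the search classes do for each setting, the sketch below scores a few hand-picked `k` values with 5-fold cross-validation directly. Iris and the chosen `k` values are stand-ins; shuffled stratified folds are used because Iris is stored sorted by class.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Stand-in dataset
X, y = load_iris(return_X_y=True)

# 5 folds (the CV "k"), shuffled and stratified so each fold has all classes
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

results = {}
for k in (1, 5, 15):  # KNN's k: number of neighbors
    scores = cross_val_score(
        KNeighborsClassifier(n_neighbors=k), X, y, cv=cv, scoring='accuracy'
    )
    results[k] = scores.mean()
    print(f"k={k}: mean accuracy={scores.mean():.3f} (std {scores.std():.3f})")
```

The standard deviation across folds gives a rough sense of how much of the difference between `k` values could be noise.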

---

## **4. Intuition Behind Tuning `k`**

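* **Small `k`**:

  * Captures fine-grained local patterns
  * Sensitive to noise: a single mislabeled or outlying training point can flip the prediction for nearby queries (overfitting)
* **Large `k`**:

  * Smooths predictions and decision boundaries
  * Blurs local patterns; in the extreme, when `k` equals the training-set size (with uniform weights), every query gets the overall majority class (underfitting)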
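The effect of `k` can be reproduced on a tiny hand-made 1-D dataset. The sketch below uses hypothetical data and a hypothetical `knn_predict` helper with uniform majority voting: class `"A"` sits around 0 to 3, except for one mislabeled `"B"` point at 2.1, and class `"B"` sits around 8 to 11.

```python
from collections import Counter

# Hypothetical 1-D toy data; the "B" at 2.1 plays the role of a noisy label
X = [0.0, 1.0, 2.0, 3.0, 2.1, 8.0, 9.0, 10.0, 11.0]
y = ["A", "A", "A", "A", "B", "B", "B", "B", "B"]

def knn_predict(query, k):
    # Sort training indices by distance to the query and keep the k closest
    idx = sorted(range(len(X)), key=lambda i: abs(X[i] - query))[:k]
    # Uniform majority vote among those k neighbors
    return Counter(y[i] for i in idx).most_common(1)[0][0]

query = 2.2
for k in (1, 5, 9):
    print(k, knn_predict(query, k))
# 1 B  -> follows the single noisy point (overfitting)
# 5 A  -> reflects the local neighborhood
# 9 B  -> all 9 points vote, so the global majority class wins (underfitting)
```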

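The optimal `k` is found empirically, by comparing **validation or cross-validation scores** across candidate values. Silhouette scores do not apply here: they evaluate clustering methods such as k-means, where `k` is the number of clusters rather than the number of neighbors.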

---

## **5. Weighted vs Uniform Neighbors**

* **Uniform**: all neighbors contribute equally
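* **Distance**: closer neighbors contribute more, which can improve accuracy (especially with larger `k`) because far-away neighbors count less; whether it helps on a given dataset is worth checking during tuning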

**Intuition:** nearer neighbors are more likely to be similar, so weighting helps KNN “trust” the right points.
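A minimal sketch of the two voting schemes, using inverse-distance weights for `distance` (the weighting scikit-learn documents for `weights='distance'`). The `vote` helper and the neighborhood values are hypothetical.

```python
from collections import defaultdict

def vote(neighbors, weights="uniform"):
    # neighbors: list of (distance, label) pairs for the k nearest points
    totals = defaultdict(float)
    for dist, label in neighbors:
        # 'uniform': each neighbor adds 1; 'distance': each adds 1 / distance
        # (assumes dist > 0; an exact match would need special handling)
        totals[label] += 1.0 if weights == "uniform" else 1.0 / dist
    return max(totals, key=totals.get)

# Hypothetical k=3 neighborhood: one very close "A", two farther "B"s
neighbors = [(0.5, "A"), (2.0, "B"), (2.5, "B")]
print(vote(neighbors, "uniform"))   # B: 2 votes vs 1
print(vote(neighbors, "distance"))  # A: 1/0.5 = 2.0 vs 1/2.0 + 1/2.5 = 0.9
```

With the same three neighbors, the prediction flips depending on whether distance is taken into account.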

---

## **6. Summary Workflow**

1. Choose a **range of hyperparameters** (`k`, `metric`, `weights`)
2. Split the data: keep a test set aside, then use a validation split or cross-validation on the training set
3. Evaluate performance for each combination
4. Select the **best combination**
5. Retrain KNN on the full training set with these hyperparameters, then evaluate it once on the test set
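The five steps map onto scikit-learn roughly as follows. This is a sketch with Iris as a stand-in dataset; `refit=False` is set only so that step 5 appears explicitly (with the default `refit=True`, `GridSearchCV` performs that retraining itself and exposes it as `best_estimator_`).

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Step 2: keep a test set aside; it is never used during tuning
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Step 1: candidate hyperparameters
param_grid = {
    'n_neighbors': [1, 3, 5, 7, 9, 11],
    'weights': ['uniform', 'distance'],
    'metric': ['euclidean', 'manhattan'],
}

# Steps 3-4: 5-fold CV on the training set for every combination, keep the best
grid = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5, refit=False)
grid.fit(X_train, y_train)

# Step 5: retrain on the full training set, then evaluate once on the test set
final_model = KNeighborsClassifier(**grid.best_params_).fit(X_train, y_train)
print("Best hyperparameters:", grid.best_params_)
print("Test accuracy:", final_model.score(X_test, y_test))
```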

