```{contents}
```

# Workflows

## **1. Data Preparation**

Before applying KNN:

* **Collect data:** You need labeled training data for classification or regression.
* **Feature scaling:** KNN relies on distance metrics, so features should be on the same scale. Use:

  * Min-Max Scaling
  * Standardization (Z-score)

**Why scaling matters:** Without scaling, a feature with a larger range will dominate the distance calculation.
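The two scalers can be sketched in plain Python. This is a minimal illustration, not a library API: `min_max_scale` and `standardize` are hypothetical helper names, and the age/income values are made-up data. Before scaling, a 20-year age gap would be tiny next to a 40,000-dollar income gap. After scaling, both features span the same range.

```python
def min_max_scale(column):
    """Rescale values to [0, 1] with (x - min) / (max - min)."""
    lo, hi = min(column), max(column)
    span = hi - lo
    # A constant column carries no distance information; map it to 0.0
    # instead of dividing by zero.
    return [0.0 if span == 0 else (x - lo) / span for x in column]


def standardize(column):
    """Z-score: (x - mean) / std, using the population standard deviation."""
    n = len(column)
    mean = sum(column) / n
    std = (sum((x - mean) ** 2 for x in column) / n) ** 0.5
    return [0.0 if std == 0 else (x - mean) / std for x in column]


# Hypothetical features: age in years and income in dollars.
ages = [25, 35, 45]
incomes = [40_000, 60_000, 80_000]

print(min_max_scale(ages))     # [0.0, 0.5, 1.0]
print(min_max_scale(incomes))  # [0.0, 0.5, 1.0]
print(standardize(ages))       # roughly [-1.22, 0.0, 1.22]
```

In practice, compute the min/max (or mean/std) on the training set only and apply those same values to every new point. Recomputing them per query would put the query on a different scale from the training data.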

---

## **2. Choose the Distance Metric**

Decide how to measure "closeness" between points. Common choices:

| Metric    | Formula (2D example)                                                            | When to Use                                                        |
| --------- | ------------------------------------------------------------------------------- | ------------------------------------------------------------------ |
| Euclidean | $\sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2}$                                           | Most common, continuous features                                   |
| Manhattan | $\lvert x_1 - y_1 \rvert + \lvert x_2 - y_2 \rvert$                              | Grid-like distances, discrete features                             |
| Minkowski | $\left(\lvert x_1 - y_1 \rvert^p + \lvert x_2 - y_2 \rvert^p\right)^{1/p}$      | Generalizes Manhattan ($p = 1$) and Euclidean ($p = 2$) via `p`    |
| Hamming   | Number of features whose values differ                                           | Categorical variables                                              |
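Each row of the table maps to a few lines of Python. The function names below are illustrative, not taken from any particular library; every function assumes the two points have the same number of features.

```python
def minkowski(x, y, p):
    """Minkowski distance: (sum of |x_i - y_i|^p) ** (1/p)."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1 / p)


def euclidean(x, y):
    # Minkowski with p = 2
    return minkowski(x, y, 2)


def manhattan(x, y):
    # Minkowski with p = 1: sum of absolute coordinate differences
    return sum(abs(a - b) for a, b in zip(x, y))


def hamming(x, y):
    # Number of positions where the feature values differ
    return sum(a != b for a, b in zip(x, y))


print(euclidean((0, 0), (3, 4)))  # 5.0
print(manhattan((0, 0), (3, 4)))  # 7
print(hamming(("red", "S", "cotton"), ("red", "M", "wool")))  # 2
```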

---

## **3. Select `k`**

Decide how many neighbors to consider:

* **Small `k`** → sensitive to noise, high variance (overfitting)
* **Large `k`** → smooths boundaries, may underfit
* Common practice: compare several candidate `k` values using cross-validation. For binary classification, odd values of `k` avoid tied votes.
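Choosing `k` by cross-validation can be sketched with leave-one-out (LOO), one simple form of cross-validation where each point is predicted from all the others. This is a self-contained sketch assuming Euclidean distance and majority voting; `knn_classify`, `loo_accuracy`, and the toy data are hypothetical. The point at 1.5 is deliberately labeled `"b"` inside the `"a"` cluster to act as noise, which hurts `k = 1` but not larger `k`.

```python
from collections import Counter
import math


def knn_classify(train_X, train_y, x_new, k):
    """Majority-vote KNN with Euclidean distance (minimal sketch)."""
    nearest = sorted(
        ((math.dist(x, x_new), label) for x, label in zip(train_X, train_y)),
        key=lambda pair: pair[0],
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def loo_accuracy(X, y, k):
    """Leave-one-out cross-validation: predict each point from all the others."""
    correct = 0
    for i in range(len(X)):
        rest_X = X[:i] + X[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        correct += knn_classify(rest_X, rest_y, X[i], k) == y[i]
    return correct / len(X)


# Hypothetical 1-D toy data: two clusters, plus one "b" point (1.5) sitting
# inside the "a" cluster to act as label noise.
X = [(0.0,), (1.1,), (2.3,), (3.6,), (10.0,), (11.2,), (12.5,), (13.9,), (1.5,)]
y = ["a", "a", "a", "a", "b", "b", "b", "b", "b"]

scores = {k: loo_accuracy(X, y, k) for k in (1, 3, 5)}
best_k = max(scores, key=scores.get)  # first k with the highest score
print(scores)  # k=1: 6/9, k=3: 8/9, k=5: 8/9
print(best_k)  # 3
```

Here `k = 3` and `k = 5` tie, and `max` keeps the first one; that tie-breaking rule is an arbitrary choice. On larger datasets, k-fold cross-validation is usually cheaper than leave-one-out.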

---

## **4. Compute Distances**

For each new data point $x_{\text{new}}$:

1. Calculate the distance to every point in the training set.
2. Sort the training points by that distance (nearest first), keeping each point's label or target value alongside its distance.
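A minimal sketch of this step, assuming Euclidean distance via the standard library's `math.dist` (Python 3.8+). The name `sorted_distances` and the sample points are hypothetical. Each distance is paired with its label so the next step can read the labels off directly.

```python
import math


def sorted_distances(train_X, train_y, x_new):
    """Return (distance, label) pairs for every training point, nearest first."""
    # math.dist computes the Euclidean distance between two points.
    pairs = [(math.dist(x, x_new), label) for x, label in zip(train_X, train_y)]
    # Sort on the distance alone; Python's sort is stable, so tied points
    # keep their original training order.
    pairs.sort(key=lambda pair: pair[0])
    return pairs


train_X = [(0.0, 0.0), (3.0, 4.0), (1.0, 1.0)]
train_y = ["a", "b", "a"]
result = sorted_distances(train_X, train_y, (0.0, 0.0))
print([label for _, label in result])  # ['a', 'a', 'b']
```

This brute-force approach costs one distance computation per training point for every query.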

---

## **5. Identify Nearest Neighbors**

* Pick the top `k` closest points from the sorted distance list.
* These points “vote” for the label (classification) or contribute to the average (regression).
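Selecting the `k` closest points is equivalent to slicing the first `k` entries of the sorted list. The sketch below uses the standard library's `heapq.nsmallest` instead, which returns the same result without sorting every training point. As before, Euclidean distance is an assumption, and `k_nearest` and the data are hypothetical.

```python
import heapq
import math


def k_nearest(train_X, train_y, x_new, k):
    """Return the k (distance, label) pairs closest to x_new, nearest first."""
    pairs = ((math.dist(x, x_new), label) for x, label in zip(train_X, train_y))
    # Same result as sorted(pairs, key=...)[:k], but nsmallest only keeps
    # k candidates at a time instead of sorting all n distances.
    return heapq.nsmallest(k, pairs, key=lambda pair: pair[0])


train_X = [(0.0, 0.0), (5.0, 5.0), (1.0, 0.0), (0.0, 2.0)]
train_y = ["a", "b", "a", "b"]
print(k_nearest(train_X, train_y, (0.0, 0.0), k=3))
# [(0.0, 'a'), (1.0, 'a'), (2.0, 'b')]
```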

---

## **6. Aggregate Neighbor Information**

* **Classification:** Majority vote determines the predicted class.

  * Optional: weighted vote (closer neighbors count more).
* **Regression:** Take the mean (or weighted mean) of neighbors’ values.
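The four aggregation rules can be sketched as small functions over `(distance, value)` pairs. The weighting scheme here, weight = 1 / distance, is one common choice and an assumption of this sketch; the small `eps` guards against a neighbor at distance zero. All function names and data are hypothetical.

```python
from collections import Counter, defaultdict


def majority_vote(neighbors):
    """Most frequent label wins; ties go to the label seen first."""
    return Counter(label for _, label in neighbors).most_common(1)[0][0]


def weighted_vote(neighbors, eps=1e-9):
    """Each neighbor votes with weight 1 / distance."""
    scores = defaultdict(float)
    for dist, label in neighbors:
        scores[label] += 1.0 / (dist + eps)
    return max(scores, key=scores.get)


def mean_value(neighbors):
    """Plain average of the neighbors' target values."""
    return sum(v for _, v in neighbors) / len(neighbors)


def weighted_mean_value(neighbors, eps=1e-9):
    """Average of target values weighted by 1 / distance."""
    weights = [1.0 / (d + eps) for d, _ in neighbors]
    return sum(w * v for w, (_, v) in zip(weights, neighbors)) / sum(weights)


neighbors = [(1.0, "a"), (4.0, "b"), (5.0, "b")]
print(majority_vote(neighbors))  # b  (two votes to one)
print(weighted_vote(neighbors))  # a  (weight 1.0 vs 0.25 + 0.2 = 0.45)

targets = [(1.0, 10.0), (2.0, 20.0), (4.0, 40.0)]
print(mean_value(targets))           # 23.333...
print(weighted_mean_value(targets))  # about 17.14 (30 / 1.75)
```

The example shows why weighting matters: the two farther `"b"` neighbors outvote the single close `"a"` neighbor under a plain majority, but not once distance is taken into account.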

---

## **7. Assign the Label or Value**

* Output the predicted class or numerical value for $x_{\text{new}}$.

---

## **8. Evaluate the Model**

* Use performance metrics:

  * Classification: Accuracy, Precision, Recall, F1-score, Confusion Matrix
  * Regression: MSE, RMSE, MAE, R²

* Optionally, tune `k` and/or distance metric to improve results.
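The listed metrics are short enough to write out directly, which makes their definitions concrete. This is a plain-Python sketch with hypothetical function names and data; in practice these would be computed on held-out data that the model did not see during training or tuning.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision_recall_f1(y_true, y_pred, positive):
    """Binary precision, recall, and F1, treating `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def confusion_counts(y_true, y_pred):
    """Map each (true, predicted) pair to how often it occurs."""
    return Counter(zip(y_true, y_pred))


def regression_metrics(y_true, y_pred):
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    # R^2 = 1 - SS_res / SS_tot with SS_res = n * MSE
    # (undefined when all true values are equal, since SS_tot = 0).
    r2 = 1 - (n * mse) / ss_tot
    return {"MSE": mse, "RMSE": mse ** 0.5, "MAE": mae, "R2": r2}


y_true = ["a", "a", "b", "b"]
y_pred = ["a", "b", "b", "b"]
print(accuracy(y_true, y_pred))                           # 0.75
print(precision_recall_f1(y_true, y_pred, positive="b"))  # precision 2/3, recall 1.0, F1 0.8
print(confusion_counts(y_true, y_pred))
print(regression_metrics([3.0, 5.0, 7.0], [2.0, 5.0, 8.0]))  # MSE 2/3, MAE 2/3, R2 0.75
```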

---

**Summary**

1. **Prepare & scale data** → 2. **Choose distance metric** → 3. **Select k** → 4. **Compute distances** → 5. **Find nearest neighbors** → 6. **Aggregate results** → 7. **Predict output** → 8. **Evaluate & tune**

