```{contents}
```

# Cost Function

**K-Nearest Neighbors (KNN) has no explicit “cost function”** of the kind that linear regression or SVM optimizes, because it is a **non-parametric, instance-based learning algorithm**. We can still define related quantities that measure how well KNN is performing.

---

## **1. KNN is Lazy Learning**

* KNN **does not fit a model** during training; it only stores the training data.
* There are **no parameters like weights** to optimize via a cost function.
* All computation happens at prediction time: distances are measured, neighbors are selected, and votes/averages are computed.

So unlike linear regression (minimizing squared error) or logistic regression (maximizing likelihood), **KNN does not have a formal cost function during training**.
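The lazy behavior described above can be illustrated with a minimal sketch. It assumes Euclidean distance and a simple majority vote; the function name `knn_predict` and the toy data are hypothetical and not taken from any library.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # There is no fitting step: the "model" is just train_X and train_y.
    # All of the work (distances, neighbor selection, voting) happens here.
    distances = [(math.dist(x, query), label) for x, label in zip(train_X, train_y)]
    distances.sort(key=lambda pair: pair[0])
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Toy data (made up for illustration): two well-separated groups.
train_X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (4.0, 4.2), (4.1, 3.9), (3.8, 4.0)]
train_y = ["a", "a", "a", "b", "b", "b"]

prediction = knn_predict(train_X, train_y, (1.1, 1.0), k=3)
print(prediction)
```

Note that nothing is optimized anywhere in this code, which is why no training-time cost function appears.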

---

## **2. Implicit “Cost” at Prediction**

We can think about KNN performance in terms of **prediction error**:

### **A. Classification**

* The “cost” is related to **misclassification**:

$$
\text{Error rate} = \frac{\text{Number of misclassified points}}{\text{Total points}}
$$

* Alternatively, **weighted misclassification** can be used if neighbors contribute differently (e.g., closer neighbors count more).
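As a small sketch, the error rate above can be computed from true and predicted labels as follows. The helper name `error_rate` and the label lists are illustrative assumptions; the predictions could come from any KNN classifier.

```python
def error_rate(y_true, y_pred):
    """Fraction of points whose predicted label differs from the true label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    misclassified = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return misclassified / len(y_true)

# Made-up labels: 2 of the 5 predictions are wrong.
y_true = ["a", "a", "b", "b", "b"]
y_pred = ["a", "b", "b", "b", "a"]

rate = error_rate(y_true, y_pred)
print(rate)
```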

### **B. Regression**

* The “cost” is related to the **difference between predicted and true values**:

$$
\text{MSE} = \frac{1}{n} \sum_{i=1}^n (y_i - \hat{y}_i)^2
$$

* Here, $\hat{y}_i$ is the KNN prediction (mean of neighbors).
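* The neighbor mean is the constant that minimizes squared error over the selected neighbors (the median does the same for absolute error), so **MSE or MAE** serves as the “implicit cost” for evaluating KNN regression. KNN itself never optimizes this quantity directly.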
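The following sketch evaluates a KNN regressor with MSE and MAE on held-out points. It assumes Euclidean distance and an unweighted neighbor mean; `knn_regress`, `mse`, `mae`, and the data are hypothetical names and values chosen for illustration.

```python
import math

def knn_regress(train_X, train_y, query, k=2):
    """Predict the mean target of the k training points nearest to `query`."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], query))
    nearest = [train_y[i] for i in order[:k]]
    return sum(nearest) / len(nearest)

def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Toy one-dimensional data (made up): targets follow y = 2x.
train_X = [(1.0,), (2.0,), (3.0,), (4.0,)]
train_y = [2.0, 4.0, 6.0, 8.0]
test_X = [(1.5,), (3.5,)]
test_y = [3.0, 7.5]

preds = [knn_regress(train_X, train_y, x, k=2) for x in test_X]
print(preds, mse(test_y, preds), mae(test_y, preds))
```

The errors are measured after prediction, on data the neighbors were not drawn from; they score the method rather than drive any fitting step.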

---

## **3. Choosing k as Implicit Optimization**

* **k** is a hyperparameter, so selecting it can be seen as **tuning the model to minimize prediction error on held-out data**.
* Small $k$ → low bias, high variance → sensitive to noise.
* Large $k$ → high bias, low variance → smoother prediction.
* Cross-validation is used to find **the k that minimizes validation error**.
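One way to make this concrete is leave-one-out cross-validation, sketched below under the assumptions of Euclidean distance and majority voting. The function names, the candidate values of k, and the toy data (two clusters plus one deliberately mislabeled point) are all illustrative.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k):
    """Majority vote among the k training points nearest to `query`."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]

def loo_error(X, y, k):
    """Leave-one-out error rate: predict each point from all the others."""
    mistakes = 0
    for i in range(len(X)):
        rest_X = X[:i] + X[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        if knn_predict(rest_X, rest_y, X[i], k) != y[i]:
            mistakes += 1
    return mistakes / len(X)

# Made-up data: cluster "a", cluster "b", and one noisy "b" inside cluster "a".
X = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (5, 5), (5, 6), (6, 5), (6, 6)]
y = ["a", "a", "a", "a", "b", "b", "b", "b", "b"]

errors = {k: loo_error(X, y, k) for k in (1, 3, 5)}
best_k = min(errors, key=errors.get)  # ties go to the smallest candidate k
print(errors, best_k)
```

On this toy data the mislabeled point misleads every nearby prediction when k = 1, while larger k votes it down, which is the variance effect described in the list above.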

---

## **4. Optional Weighted Cost Function**

Some KNN variants use **distance-weighted predictions**:

* **Classification**: closer neighbors get more weight in majority vote.
* **Regression**: prediction is weighted average:

$$
\hat{y} = \frac{\sum_{i=1}^k w_i y_i}{\sum_{i=1}^k w_i}, \quad w_i = \frac{1}{d(x, x_i)}
$$

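* Here, $d(x, x_i)$ is the distance to neighbor $i$. The weight is undefined when $d(x, x_i) = 0$, so an exact match with a training point needs separate handling (for example, returning that point's target).
* This weighted average is the value $c$ that minimizes the **weighted squared error** $\sum_{i=1}^k w_i (y_i - c)^2$ over the selected neighbors.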
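A minimal sketch of the inverse-distance-weighted regression formula above is shown below. It assumes Euclidean distance, and the zero-distance rule (return the mean target of exact matches) is one possible convention, not something prescribed here; `weighted_knn_regress` and the data are hypothetical.

```python
import math

def weighted_knn_regress(train_X, train_y, query, k=3):
    """Inverse-distance-weighted mean of the k nearest targets."""
    neighbors = sorted(
        ((math.dist(x, query), target) for x, target in zip(train_X, train_y)),
        key=lambda pair: pair[0],
    )[:k]
    # Assumed convention: 1/d is undefined at d == 0, so if the query
    # coincides with training points, return the mean of their targets.
    exact = [target for d, target in neighbors if d == 0.0]
    if exact:
        return sum(exact) / len(exact)
    weights = [1.0 / d for d, _ in neighbors]
    weighted_sum = sum(w * target for w, (_, target) in zip(weights, neighbors))
    return weighted_sum / sum(weights)

# Made-up one-dimensional data.
train_X = [(0.0,), (1.0,), (3.0,)]
train_y = [0.0, 10.0, 30.0]

pred = weighted_knn_regress(train_X, train_y, (1.5,), k=3)
print(pred)
```

With the query at 1.5, the neighbor at distance 0.5 gets three times the weight of the two neighbors at distance 1.5, pulling the prediction toward its target compared with the plain mean of the three.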

---

**Summary**

| Aspect                    | KNN Behavior                                             |
| ------------------------- | -------------------------------------------------------- |
| Traditional cost function | None (lazy learner)                                      |
| Implicit “cost”           | Misclassification (classification), MSE/MAE (regression) |
| Optimization              | Choice of k and distance weighting                       |
| Goal                      | Minimize prediction error                                |

