### **1. What is KNN?**

K-Nearest Neighbors is a **simple, instance-based, supervised machine learning algorithm** used for **classification** and **regression**.

* **Instance-based** means it does not fit an explicit model; it stores the training data and defers all computation to prediction time.
* **Supervised** means it requires labeled data for training.

---

### **2. How KNN Works**

The core idea:

> To predict the label of a new point, look at the **K closest points** (neighbors) in the training set and take a **majority vote** (for classification) or **average** (for regression).

**Steps:**

1. Choose a value of **K** (number of neighbors to consider).
2. Compute the **distance** between the new point and all points in the training set.

   * Common distance metrics:

     * **Euclidean distance**: $\sqrt{\sum (x_i - y_i)^2}$
     * **Manhattan distance**: $\sum |x_i - y_i|$
3. Identify the **K nearest neighbors**.
4. Combine the neighbors' targets:

   * **Classification:** take the most common class among the neighbors.
   * **Regression:** take the average of the neighbors' values.
5. Assign this as the predicted label/value.
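
The steps above can be sketched in a few lines of plain Python. This is a minimal brute-force illustration, not a reference implementation: the function name `knn_predict`, its `task` parameter, and the toy data are all made up for this example, and it uses Euclidean distance only.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3, task="classification"):
    """Predict the label (or value) of `query` from its k nearest training points."""
    # Step 2: Euclidean distance from the query to every training point.
    distances = [(math.dist(x, query), y) for x, y in zip(train_X, train_y)]
    # Step 3: keep the k closest points.
    neighbors = sorted(distances, key=lambda pair: pair[0])[:k]
    neighbor_targets = [y for _, y in neighbors]
    # Step 4: majority vote for classification, mean for regression.
    if task == "classification":
        return Counter(neighbor_targets).most_common(1)[0][0]
    return sum(neighbor_targets) / len(neighbor_targets)


# Toy data (invented for illustration): two clusters in 2-D.
train_X = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (6.0, 6.0), (6.5, 7.0), (7.0, 6.5)]
train_labels = ["A", "A", "A", "B", "B", "B"]
train_values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]

label = knn_predict(train_X, train_labels, (2.0, 2.0), k=3)
value = knn_predict(train_X, train_values, (2.0, 2.0), k=3, task="regression")
print(label, value)
```

In this sketch a tied vote goes to whichever tied class appears first among the sorted neighbors; choosing an odd K for two-class problems avoids most ties.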

---

### **3. Example (Classification)**

Suppose you want to classify a fruit as **Apple** or **Orange** based on features like weight and color:

| Weight | Color  | Label  |
| ------ | ------ | ------ |
| 150    | Red    | Apple  |
| 170    | Red    | Apple  |
| 140    | Orange | Orange |
| 160    | Orange | Orange |

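* New fruit: weight = 155, color = Red.
* Encode color numerically so distances can be computed (for example, Red = 0, Orange = 1), then compute the Euclidean distance to every training point.
* Choose **K=3**. With that encoding, the nearest neighbors are the 150 Apple (distance 5), the 160 Orange (distance ≈ 5.1), and the 170 Apple (distance 15): 2 Apples, 1 Orange.
* Predicted label = **Apple** (majority vote).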
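
The fruit example can be reproduced with a short script. The table does not say how color is turned into a number, so the mapping `COLOR_CODE` (Red = 0, Orange = 1) is an assumption made here purely for illustration, as is the use of unscaled Euclidean distance.

```python
import math
from collections import Counter

# Assumed encoding for the categorical feature (not specified in the table).
COLOR_CODE = {"Red": 0, "Orange": 1}

training = [
    (150, "Red", "Apple"),
    (170, "Red", "Apple"),
    (140, "Orange", "Orange"),
    (160, "Orange", "Orange"),
]
new_fruit = (155, COLOR_CODE["Red"])

# Distance from the new fruit to every training row, paired with the row's label.
scored = sorted(
    (math.dist((weight, COLOR_CODE[color]), new_fruit), label)
    for weight, color, label in training
)
nearest = scored[:3]  # K = 3
votes = Counter(label for _, label in nearest)
prediction = votes.most_common(1)[0][0]

for distance, label in nearest:
    print(f"{label:<6} distance = {distance:.2f}")
print("Predicted:", prediction)
```

Because weight spans tens of units while the encoded color spans only 0 to 1, weight dominates these distances; the third neighbor (the 170 Apple at 15.00) only narrowly beats the 140 Orange (about 15.03).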

---

### **4. Pros of KNN**

* Simple to understand and implement.
* No explicit training phase (lazy learner): "training" only stores the data, and the work happens at prediction time.
* Naturally handles multi-class problems.

---

### **5. Cons of KNN**

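* Computationally expensive at prediction time for large datasets: a brute-force search computes the distance from the query to every training point.
* Sensitive to **feature scaling**: features with larger numeric ranges dominate the distance, so normalization is needed.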
* Choosing the right **K** is critical:

  * Small K → sensitive to noise (overfitting).
  * Large K → may smooth out patterns (underfitting).
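
The scaling issue is easy to see on a small example. The sketch below uses hypothetical (income, age) data and two illustrative helpers, `fit_scaler` and `transform`, that standardize each feature using the training set's mean and standard deviation; none of these names come from a library.

```python
import math
import statistics


def fit_scaler(rows):
    """Per-column mean and (population) standard deviation of the training rows."""
    columns = list(zip(*rows))
    means = [statistics.mean(col) for col in columns]
    stdevs = [statistics.pstdev(col) for col in columns]
    return means, stdevs


def transform(row, means, stdevs):
    """Convert one row to z-scores using the training statistics."""
    return tuple((v - m) / s for v, m, s in zip(row, means, stdevs))


def nearest_index(points, query):
    """Index of the point closest to `query` (Euclidean distance)."""
    return min(range(len(points)), key=lambda i: math.dist(points[i], query))


# Hypothetical data: (income in dollars, age in years).
people = [(50_000, 25), (51_000, 60), (80_000, 26)]
query = (50_900, 27)

# Raw features: income differences swamp age differences.
raw_nn = nearest_index(people, query)

# Standardized features: both columns contribute on a comparable scale.
means, stdevs = fit_scaler(people)
scaled_people = [transform(p, means, stdevs) for p in people]
scaled_nn = nearest_index(scaled_people, transform(query, means, stdevs))

print("nearest (raw):   ", people[raw_nn])
print("nearest (scaled):", people[scaled_nn])
```

On the raw data the 60-year-old is the nearest neighbor simply because the incomes differ by only 100; after standardization the 25-year-old with a similar income becomes the closest match.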

---

### **6. Tips**

* Always **normalize/standardize features** before using KNN.
* Use **cross-validation** to choose the best K.
* Consider **distance weighting**, so that closer neighbors have more influence on the prediction than distant ones.
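
As a rough sketch of the last two tips, the code below picks K by leave-one-out cross-validation and offers an optional inverse-distance vote. The helpers `knn_classify` and `loo_accuracy`, the `1 / distance` weighting scheme, and the toy dataset (which includes one deliberately mislabeled point) are all assumptions chosen for illustration; the features are assumed to be on comparable scales already.

```python
import math
from collections import defaultdict


def knn_classify(train_X, train_y, query, k, weighted=False):
    """k-NN vote; with weighted=True each neighbor counts 1 / distance."""
    neighbors = sorted(
        (math.dist(x, query), y) for x, y in zip(train_X, train_y)
    )[:k]
    votes = defaultdict(float)
    for distance, label in neighbors:
        # The small epsilon avoids division by zero on exact matches.
        votes[label] += 1.0 / (distance + 1e-9) if weighted else 1.0
    return max(votes, key=votes.get)


def loo_accuracy(X, y, k, weighted=False):
    """Leave-one-out cross-validation accuracy for a given k."""
    hits = 0
    for i in range(len(X)):
        # Hold out point i and predict it from all the others.
        rest_X = X[:i] + X[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        hits += knn_classify(rest_X, rest_y, X[i], k, weighted) == y[i]
    return hits / len(X)


# Toy data: two clusters plus one noisy "B" sitting inside the "A" cluster.
X = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.2), (1.1, 1.1), (1.0, 1.1),
     (3.0, 3.0), (3.2, 2.9), (2.9, 3.2), (3.1, 3.1)]
y = ["A", "A", "A", "A", "B", "B", "B", "B", "B"]

scores = {k: loo_accuracy(X, y, k) for k in (1, 3, 5)}
best_k = max(scores, key=scores.get)
weighted_score = loo_accuracy(X, y, best_k, weighted=True)

for k, acc in scores.items():
    print(f"K={k}: accuracy = {acc:.2f}")
print("best K:", best_k)
```

With K=1 the single noisy point misleads its neighbors, which is the overfitting behavior described earlier; a slightly larger K outvotes it. On real data a held-out test set should still be kept apart from whatever data is used to choose K.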

