# K-Nearest Neighbors (KNN) - Conceptual Overview

## 🧠 Classification: The Fundamental Problem
Classification refers to the task of predicting a **label** for a given **input data point**.
Labels can be of different types:
- **Binary Classification**: Two possible labels (e.g., yes/no).
- **Multiclass Classification**: More than two distinct categories.
- **Multilabel Classification**: A single point may belong to multiple categories simultaneously.

## 🧪 KNN Theory
- KNN = K-Nearest Neighbors
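- Predicts the label of a new point by examining its *K* nearest labeled neighbors, typically by majority vote.
- No explicit training phase: it is a *lazy learner* that stores the training data and defers the work to prediction time.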
- Requires distance calculation to determine closeness.
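
The prediction step described above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the function name `knn_predict`, the toy data, and the choice of Euclidean distance are all assumptions made for the example.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest neighbors."""
    # Distance from the query to every stored point (Euclidean is an assumption here).
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k closest neighbors.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Majority vote over the neighbors' labels.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: two small clusters labeled "a" and "b".
points = [(1.0, 1.0), (1.5, 2.0), (5.0, 5.0), (6.0, 5.5)]
labels = ["a", "a", "b", "b"]
prediction = knn_predict(points, labels, (1.2, 1.4), k=3)
```

Note that there is no `fit` step: the "model" is just the stored points and labels. With an even *k*, votes can tie; this sketch then returns whichever tied label was counted first, which a real implementation may want to handle more deliberately.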

## 📏 Distance Functions
For two points $(x_1, y_1)$ and $(x_2, y_2)$:
- **Euclidean (L2)**: $\sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}$
- **Manhattan (L1)**: $|x_2 - x_1| + |y_2 - y_1|$
- **Minkowski**: Generalized form with parameter *q*: $\left(|x_2 - x_1|^q + |y_2 - y_1|^q\right)^{1/q}$
  - For q=1 → Manhattan, q=2 → Euclidean.
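
As a quick check of how these distances relate, here is a small sketch of the Minkowski distance, assuming its usual definition (sum the absolute coordinate differences raised to the power *q*, then take the *q*-th root). The function name `minkowski` is chosen for this example only, and the code works for points of any dimension.

```python
def minkowski(a, b, q=2):
    """Minkowski distance between two equal-length points a and b."""
    return sum(abs(x - y) ** q for x, y in zip(a, b)) ** (1 / q)

# The same pair of points under two choices of q.
manhattan = minkowski((0, 0), (3, 4), q=1)  # |3| + |4| = 7
euclidean = minkowski((0, 0), (3, 4), q=2)  # sqrt(9 + 16) = 5
```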

## ✅ Evaluation
- **Accuracy**: Ratio of correct predictions to total predictions.
- Requires splitting the dataset into training and testing sets, so that accuracy is measured on points the model has not stored.
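
A minimal sketch of both ideas, using only the standard library. The helper names `train_test_split` and `accuracy`, the 25% test ratio, and the fixed seed are assumptions for illustration.

```python
import random

def train_test_split(data, test_ratio=0.25, seed=0):
    """Shuffle a copy of `data` and split it into (train, test) lists."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]

def accuracy(true_labels, predicted_labels):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)

# Three of the four hypothetical predictions below are correct.
score = accuracy(["a", "b", "a", "b"], ["a", "b", "b", "b"])
```

Shuffling before the split matters when the dataset is ordered by class; otherwise the test set could contain labels the training set never saw.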

## 🔧 Choosing the Right K
- **Small k** → Can overfit and be sensitive to noise.
- **Large k** → Can underfit and miss the true pattern.
- Use **cross-validation** to tune k.
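
One way to tune *k* with cross-validation is sketched below. It is a simplified illustration: the helper names, the strided fold assignment, and the toy dataset are assumptions, and the compact `knn_predict` here uses Euclidean distance with a plain majority vote.

```python
import math
from collections import Counter

def knn_predict(train, query, k):
    """Majority vote among the k stored (point, label) pairs nearest to `query`."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def cross_val_accuracy(data, k, n_folds=4):
    """Mean validation accuracy of KNN with k neighbors over n_folds folds."""
    scores = []
    for fold in range(n_folds):
        # Hold out every n_folds-th item (offset by `fold`); train on the rest.
        validation = data[fold::n_folds]
        training = [item for i, item in enumerate(data) if i % n_folds != fold]
        correct = sum(knn_predict(training, x, k) == y for x, y in validation)
        scores.append(correct / len(validation))
    return sum(scores) / len(scores)

def best_k(data, candidates, n_folds=4):
    """Return the candidate k with the highest mean validation accuracy."""
    return max(candidates, key=lambda k: cross_val_accuracy(data, k, n_folds))

# Hypothetical toy dataset: two well-separated clusters.
data = [
    ((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"), ((1, 1), "a"),
    ((5, 5), "b"), ((5, 6), "b"), ((6, 5), "b"), ((6, 6), "b"),
]
chosen = best_k(data, candidates=[1, 3, 5])
```

Odd candidate values avoid tied votes in binary problems. On data this cleanly separated every candidate scores the same, so `max` simply returns the first one; on noisy data the scores would differ and the choice becomes meaningful.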

## 🧮 Weighted KNN
- Assigns **more influence** to neighbors that are closer.
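- A common choice is weights = 1/distance, so each neighbor's vote shrinks as it gets farther away (a distance of zero needs special handling).
- Can be more accurate and less sensitive to distant neighbors, since far-away points among the *K* contribute little to the vote.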
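
The weighting idea can be sketched as follows. The function name `weighted_knn_predict` and the small `eps` term, added here to avoid division by zero when the query coincides with a stored point, are assumptions of this example rather than part of any standard definition.

```python
import math
from collections import defaultdict

def weighted_knn_predict(train_points, train_labels, query, k=3, eps=1e-9):
    """Predict a label where each of the k nearest neighbors votes with weight 1/distance."""
    neighbors = sorted(
        ((math.dist(p, query), label) for p, label in zip(train_points, train_labels)),
        key=lambda pair: pair[0],
    )[:k]
    scores = defaultdict(float)
    for distance, label in neighbors:
        # Closer neighbors contribute more; eps guards against a zero distance.
        scores[label] += 1.0 / (distance + eps)
    return max(scores, key=scores.get)

# Hypothetical example: one close "a" neighbor versus two farther "b" neighbors.
points = [(0.0, 0.0), (3.0, 0.0), (3.5, 0.0)]
labels = ["a", "b", "b"]
weighted = weighted_knn_predict(points, labels, (1.0, 0.0), k=3)
```

Here an unweighted vote with k=3 would pick "b" (two votes to one), while the weighted vote picks "a": the single neighbor at distance 1 scores about 1.0, and the two at distances 2 and 2.5 together score about 0.9.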

## 📊 Advantages
- Easy to understand and implement.
- Flexible: works for classification and regression.
- Negligible training time: the model only stores the data.

## ⚠️ Disadvantages
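- Slow for large datasets: a naive implementation computes the distance to every stored point for each prediction.
- Sensitive to feature scale and to irrelevant features.
- Features usually need to be normalized before use, so that features with large numeric ranges do not dominate the distance.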
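
To illustrate the normalization point, here is a sketch of min-max scaling, one common option (standardization to zero mean and unit variance is another). The function name `min_max_scale` and the sample rows are assumptions for this example.

```python
def min_max_scale(rows):
    """Rescale each feature (column) of `rows` to the range [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(column) for column in columns]
    highs = [max(column) for column in columns]
    return [
        tuple(
            # A constant column has no spread, so map it to 0.0.
            (value - lo) / (hi - lo) if hi > lo else 0.0
            for value, lo, hi in zip(row, lows, highs)
        )
        for row in rows
    ]

# Hypothetical rows: the second feature is about 1000x larger than the first.
rows = [(1.0, 1000.0), (2.0, 3000.0), (3.0, 2000.0)]
scaled = min_max_scale(rows)
```

Without scaling, the second feature would account for nearly all of the distance between these rows. In a train/test setup, the minimum and maximum would normally be computed on the training set only and then reused to scale the test points.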