# 🟦 **K-Nearest Neighbors (KNN)**

*KNN is a simple yet effective machine learning algorithm, used mainly for classification and also for regression.*

---

## 🔍 **What is KNN?**

**KNN (K-Nearest Neighbors)** is a **supervised learning algorithm** that makes predictions by looking at the **nearest (closest) points** in the training data.

Given a new data point, KNN:

1. Finds its **k nearest neighbors**
2. Looks at their labels
3. Decides the output by **majority vote**

---

## 🧠 **How KNN Works (Step-by-Step)**

### **1️⃣ Choose K**

* K = number of neighbors
* Example: K = 3 → the 3 nearest points are considered

### **2️⃣ Calculate Distance**

KNN finds the nearest points using a distance metric:

* **Euclidean Distance** (most common)
* Manhattan Distance
* Minkowski Distance
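
The three metrics listed above can be sketched in a few lines of NumPy. This is a minimal illustration only: the function names and the sample points are assumptions made for the example, not part of any library. Note that Minkowski distance with `p=2` reduces to Euclidean and with `p=1` to Manhattan.

```python
import numpy as np

def euclidean(a, b):
    """Straight-line distance: square root of the summed squared differences."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.sqrt(np.sum((a - b) ** 2)))

def manhattan(a, b):
    """Sum of the absolute differences along each feature."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.sum(np.abs(a - b)))

def minkowski(a, b, p=2):
    """Generalization of both: p=1 is Manhattan, p=2 is Euclidean."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.sum(np.abs(a - b) ** p) ** (1.0 / p))

# Illustrative points (made up for this example)
p1, p2 = [1.0, 2.0], [4.0, 6.0]

print(euclidean(p1, p2))       # 5.0
print(manhattan(p1, p2))       # 7.0
print(minkowski(p1, p2, p=1))  # 7.0
```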

### **3️⃣ Pick Top K Neighbors**

Sorts the points by distance in ascending order and selects the first K.

### **4️⃣ Majority Voting (Classification)**

The class with the highest count among the K neighbors becomes the final prediction.

### **5️⃣ Averaging (Regression)**

For regression, it takes the **average** of the nearest neighbors' target values.
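
The five steps can be put together in a small from-scratch sketch. It assumes Euclidean distance and a brute-force search; the function `knn_predict`, its `task` parameter, and the toy data are hypothetical names chosen for illustration, and scikit-learn's own implementation is more elaborate than this.

```python
from collections import Counter

import numpy as np

def knn_predict(X_train, y_train, x_new, k=3, task="classification"):
    """Predict for one point with brute-force KNN (illustrative sketch)."""
    X_train = np.asarray(X_train, dtype=float)
    x_new = np.asarray(x_new, dtype=float)

    # Step 2: Euclidean distance from x_new to every training point
    distances = np.sqrt(np.sum((X_train - x_new) ** 2, axis=1))

    # Step 3: indices of the k smallest distances (ascending order)
    nearest = np.argsort(distances)[:k]
    neighbor_targets = [y_train[i] for i in nearest]

    if task == "classification":
        # Step 4: the most frequent label among the neighbors wins
        return Counter(neighbor_targets).most_common(1)[0][0]

    # Step 5: regression averages the neighbors' target values
    return float(np.mean(neighbor_targets))

# Toy data (made up): two clusters of 2-D points
X_toy = [[1.0, 1.0], [1.5, 2.0], [2.0, 1.0], [6.0, 6.0], [7.0, 7.5]]
toy_labels = ["cat", "cat", "cat", "dog", "dog"]
toy_values = [10.0, 12.0, 14.0, 50.0, 60.0]

print(knn_predict(X_toy, toy_labels, [1.2, 1.4], k=3))                     # cat
print(knn_predict(X_toy, toy_values, [1.2, 1.4], k=3, task="regression"))  # 12.0
```

Sorting all distances is the simplest way to find the top K; for large datasets, tree-based or partial-sort approaches avoid the full sort.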

---

## 🎯 **Simple Example**

Assume K = 3 and the three nearest labels are:

* 🐱 Cat
* 🐱 Cat
* 🐶 Dog

**Predicted Class → Cat** (majority = Cat)
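
The same vote can be reproduced with Python's standard `collections.Counter`. The variable names here are illustrative only.

```python
from collections import Counter

# Labels of the K = 3 nearest neighbors from the example
neighbor_labels = ["Cat", "Cat", "Dog"]

# Count the votes and take the most frequent label
votes = Counter(neighbor_labels)
predicted_class, vote_count = votes.most_common(1)[0]

print(predicted_class, vote_count)  # Cat 2
```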

---

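## 🧩 **Important Concepts**

### **📌 1. Choosing K**

* **Small K** → sensitive to noise, unstable predictions (tends to overfit)
* **Large K** → smoother decision boundary, but can blur class boundaries (tends to underfit)
* An odd K (3, 5, 7) is a common choice because it avoids tied votes in binary classification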
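
One common way to pick K in practice is to compare a few candidates with cross-validation. The sketch below assumes scikit-learn's bundled Iris data, 5-fold cross-validation, and an arbitrary candidate list; all three are choices made for this example rather than recommendations from the text above.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Candidate values of K (an arbitrary list for illustration)
candidate_ks = [1, 3, 5, 7, 9, 11]

mean_scores = {}
for k in candidate_ks:
    # Scaling lives inside the pipeline so it is re-fit on each training fold
    model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=k))
    scores = cross_val_score(model, X, y, cv=5)
    mean_scores[k] = scores.mean()

# Pick the K with the highest mean cross-validated accuracy
best_k = max(mean_scores, key=mean_scores.get)

for k, score in mean_scores.items():
    print(f"K={k}: mean accuracy = {score:.3f}")
print("Best K:", best_k)
```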

### **📌 2. Distance Matters**

The smaller the distance between two points, the more similar they are.

### **📌 3. Feature Scaling**

Important!
KNN is distance-based, so features with larger numeric ranges dominate the distance unless the data is scaled. Common choices:

* **StandardScaler**
* **MinMaxScaler**
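
A small NumPy sketch shows why scaling matters. The data is hypothetical (age in years, income in dollars), and the standardization is done by hand with the training mean and standard deviation, which mirrors what `StandardScaler` computes.

```python
import numpy as np

# Hypothetical data: columns are [age in years, income in dollars]
X_people = np.array([[25.0, 50_000.0],
                     [45.0, 51_000.0],
                     [26.0, 58_000.0]])
query = np.array([24.0, 52_000.0])

def nearest_index(X, x):
    """Index of the row in X with the smallest Euclidean distance to x."""
    return int(np.argmin(np.sqrt(np.sum((X - x) ** 2, axis=1))))

# Without scaling, income differences (thousands) swamp age differences (tens)
raw_nearest = nearest_index(X_people, query)

# Standardize both the training rows and the query with the training statistics
mean, std = X_people.mean(axis=0), X_people.std(axis=0)
scaled_nearest = nearest_index((X_people - mean) / std, (query - mean) / std)

print(raw_nearest)     # 1 (the 45-year-old, closest on income alone)
print(scaled_nearest)  # 0 (the 25-year-old, close on both features)
```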

---

## 👍 **Advantages**

* ✔ Very simple
* ✔ Easy to understand
* ✔ No explicit training phase (a lazy learner: fitting only stores the data)
* ✔ Performs well with small datasets

---

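## 👎 **Disadvantages**

* ❌ Slow at prediction time on large datasets (every prediction searches the stored training points)
* ❌ Sensitive to noisy data and outliers
* ❌ Requires feature scaling
* ❌ High memory use (the whole training set is kept)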

---

## 🧪 **KNN in Python (Simple Example)**

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.datasets import load_iris

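# Load dataset
data = load_iris()
X, y = data.data, data.target

# Split first so the test set stays unseen during preprocessing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit the scaler on the training data only, then apply it to both sets
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)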

# KNN model
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)

# Score
print("Accuracy:", knn.score(X_test, y_test))
```

---

## 🏁 **Summary**

| Topic              | Explanation              |
| ------------------ | ------------------------ |
| **Algorithm Type** | Supervised               |
| **Main Use**       | Classification           |
| **Key Idea**       | Nearest neighbors decide |
| **Important Step** | Distance calculation     |
| **Needs Scaling**  | Yes                      |


In [43]:
# Example of a KNN classifier on the Iris data, loaded via Seaborn

# import libraries
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt


In [44]:
# load the dataset
df = sns.load_dataset('iris')
df.head()

   sepal_length  sepal_width  petal_length  petal_width species
0           5.1          3.5           1.4          0.2  setosa
1           4.9          3.0           1.4          0.2  setosa
2           4.7          3.2           1.3          0.2  setosa
3           4.6          3.1           1.5          0.2  setosa
4           5.0          3.6           1.4          0.2  setosa


In [45]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 5 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   sepal_length  150 non-null    float64
 1   sepal_width   150 non-null    float64
 2   petal_length  150 non-null    float64
 3   petal_width   150 non-null    float64
 4   species       150 non-null    object 
dtypes: float64(4), object(1)
memory usage: 6.0+ KB


In [46]:
# Split the data into features (X) and target (y)
X = df.drop('species', axis=1)
y = df['species']  # 1-D Series; a one-column DataFrame triggers a DataConversionWarning in fit()

In [47]:
# import the KNN classifier
from sklearn.neighbors import KNeighborsClassifier


In [48]:
# split the data into training and test sets
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)



In [49]:
model = KNeighborsClassifier(n_neighbors=5)
# fit the model on the training data
model.fit(X_train, y_train)



| Parameter     | Value       |
| ------------- | ----------- |
| n_neighbors   | 5           |
| weights       | 'uniform'   |
| algorithm     | 'auto'      |
| leaf_size     | 30          |
| p             | 2           |
| metric        | 'minkowski' |
| metric_params | None        |
| n_jobs        | None        |


In [50]:
# predict labels for the test set
y_pred = model.predict(X_test)
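
A natural next step is to compare the predictions with the true test labels. The sketch below is self-contained, so it rebuilds the same split and model using scikit-learn's bundled copy of Iris instead of the Seaborn loader (an assumption made so the block runs without a download); `accuracy_score` and `confusion_matrix` come from `sklearn.metrics`.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Same split settings as the cells above, on scikit-learn's bundled Iris data
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit the classifier and predict the held-out samples
model = KNeighborsClassifier(n_neighbors=5)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Fraction of correct predictions, and a per-class breakdown of hits and misses
accuracy = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)

print("Accuracy:", accuracy)
print(cm)
```

In the confusion matrix, rows are true classes and columns are predicted classes, so off-diagonal entries show which species get confused with each other.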