# Notebook 7: K-Nearest Neighbors (Instance-based)

**Dataset:** Iris (Kaggle / sklearn)

**Purpose:** Demonstrate KNN classification, parameter tuning (k), visualization of decision boundaries, and evaluation.

## Setup & Load (sklearn or CSV)
Iris is small (150 samples, 4 features, 3 classes) and ships with `sklearn.datasets`. If you prefer the Kaggle CSV, download it and place it in the working directory.

In [None]:
from sklearn.datasets import load_iris
import pandas as pd

data = load_iris()
df = pd.DataFrame(data.data, columns=data.feature_names)
df['target'] = data.target
print('Iris loaded:', df.shape)
display(df.head())
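
For the CSV route mentioned above, a minimal sketch follows. The column names (`SepalLengthCm`, ..., `Species`) and the helper `load_iris_csv` are assumptions about how the Kaggle file is laid out, not something this notebook defines; adjust them to match your copy.

```python
import pandas as pd

# Assumed column layout of the Kaggle file; adjust if your copy differs.
FEATURE_COLS = ["SepalLengthCm", "SepalWidthCm", "PetalLengthCm", "PetalWidthCm"]
LABEL_COL = "Species"


def load_iris_csv(path):
    """Hypothetical helper: read an Iris CSV and return features plus an integer target."""
    raw = pd.read_csv(path)
    out = raw[FEATURE_COLS].copy()
    # Encode species names as integer codes, assigned in sorted label order.
    out["target"] = pd.Categorical(raw[LABEL_COL]).codes
    return out


# Example call (the file name is an assumption):
# df = load_iris_csv("Iris.csv")
```

If you load the data this way, the later cells that index with `data.feature_names` would need `FEATURE_COLS` instead, since `data` only exists on the sklearn route.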


## KNN pipeline and evaluation

In [None]:
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

X = df[data.feature_names]
y = df['target']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Grid search for the best k (1 to 15), scored by cross-validation on the training set
param_grid = {'n_neighbors': list(range(1, 16))}
knn = KNeighborsClassifier()
gs = GridSearchCV(knn, param_grid, cv=5)
gs.fit(X_train, y_train)
print('Best params:', gs.best_params_)

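# best_estimator_ is the best model refit on the full training set (refit=True by default)
best_knn = gs.best_estimator_
y_pred = best_knn.predict(X_test)
print('Test accuracy:', accuracy_score(y_test, y_pred))
print('\nClassification report:\n', classification_report(y_test, y_pred))
# Rows are true classes, columns are predicted classes
print('Confusion matrix:\n', confusion_matrix(y_test, y_pred))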
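
`best_params_` reports only the winning k. To see how close the other values came, one option is to tabulate the cross-validation scores stored in `cv_results_`. The sketch below is self-contained and repeats the split and search with the same settings; the names `gs_demo` and `cv_table` are illustrative, not part of the notebook.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Same split settings as the cell above, under separate names to avoid clobbering X and y.
X_all, y_all = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X_all, y_all, test_size=0.2, random_state=42)

gs_demo = GridSearchCV(KNeighborsClassifier(), {"n_neighbors": list(range(1, 16))}, cv=5)
gs_demo.fit(X_tr, y_tr)

# One row per k: mean and standard deviation of accuracy across the 5 folds.
cv_table = (
    pd.DataFrame(gs_demo.cv_results_)[["param_n_neighbors", "mean_test_score", "std_test_score"]]
    .sort_values("mean_test_score", ascending=False)
    .reset_index(drop=True)
)
print(cv_table.head())
```

On a dataset this small, several values of k often score within one standard deviation of each other, so the table is a useful check on how much weight to give the single "best" k.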


## (Optional) Visualize decision boundaries for two selected features
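A decision boundary can only be drawn in 2D, so fit a separate KNN on two features, for example petal length and petal width, or sepal length and sepal width.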
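
A minimal sketch of such a plot, using petal length and petal width. It assumes `matplotlib` is installed, fits on all 150 samples rather than the training split, and uses k=5 as an illustrative value rather than the grid-search result. The names `knn2d`, `X2`, and `y2` are introduced here for the example.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
# Columns 2 and 3 are petal length and petal width in sklearn's Iris layout.
feat_idx = [2, 3]
X2 = iris.data[:, feat_idx]
y2 = iris.target

# k=5 is an illustrative choice, not the grid-search result.
knn2d = KNeighborsClassifier(n_neighbors=5).fit(X2, y2)

# Build a grid covering the feature range with a small margin.
x_min, x_max = X2[:, 0].min() - 0.5, X2[:, 0].max() + 0.5
y_min, y_max = X2[:, 1].min() - 0.5, X2[:, 1].max() + 0.5
xx, yy = np.meshgrid(np.linspace(x_min, x_max, 200), np.linspace(y_min, y_max, 200))

# Predict a class for every grid point, then reshape back to the grid.
zz = knn2d.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)

fig, ax = plt.subplots(figsize=(6, 4))
# Level edges at half-integers give one filled region per class label (0, 1, 2).
ax.contourf(xx, yy, zz, levels=[-0.5, 0.5, 1.5, 2.5], alpha=0.3)
ax.scatter(X2[:, 0], X2[:, 1], c=y2, edgecolor="k", s=25)
ax.set_xlabel(iris.feature_names[feat_idx[0]])
ax.set_ylabel(iris.feature_names[feat_idx[1]])
ax.set_title("KNN decision regions (k=5)")
plt.show()
```

To plot the sepal features instead, set `feat_idx = [0, 1]`. Small k tends to produce jagged regions, while larger k smooths them.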
