
**K-Nearest Neighbors (KNN)**

K-Nearest Neighbors (KNN) is one of the simplest and most intuitive machine learning algorithms. It's a supervised learning method that can be used for both classification and regression.

The core idea is simple: "You are defined by the company you keep." KNN assumes that similar data points exist in close proximity.

It's called a "lazy learner" (or instance-based learner) because it doesn't build a complex internal model during its "training" phase. Instead, it simply stores the entire training dataset. The real work (the computation) happens only when you ask it to make a prediction.
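
To make the "lazy" idea concrete, here is a minimal sketch in NumPy. The class name `LazyKNN` and its methods are hypothetical, written only for illustration and not taken from any library, and the four training points are made up. It assumes numeric features and Euclidean distance. Notice that `fit` only stores the arrays, while `predict` does all of the distance computation.

```python
import numpy as np


class LazyKNN:
    """Hypothetical, minimal KNN classifier used only to illustrate lazy learning."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, X, y):
        # "Training" just memorizes the data: no parameters are learned.
        self.X_train = np.asarray(X, dtype=float)
        self.y_train = np.asarray(y)
        return self

    def predict(self, X):
        # All of the real work happens here, at prediction time.
        X = np.asarray(X, dtype=float)
        predictions = []
        for x in X:
            # Euclidean distance from x to every stored training point
            distances = np.linalg.norm(self.X_train - x, axis=1)
            # Indices of the k closest training points
            nearest = np.argsort(distances)[: self.k]
            # Majority vote among the neighbors' labels
            labels, counts = np.unique(self.y_train[nearest], return_counts=True)
            predictions.append(labels[np.argmax(counts)])
        return np.array(predictions)


# Made-up toy data: two points near the origin ("a"), two far away ("b").
model = LazyKNN(k=3).fit([[0, 0], [0, 1], [5, 5], [6, 5]], ["a", "a", "b", "b"])
print(model.predict([[0.2, 0.5], [5.5, 5.0]]))
```

Because every prediction scans the whole stored dataset, prediction cost in this sketch grows with the size of the training set, which is the trade-off a lazy learner makes for its trivial training step.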

**How KNN Works: A Step-by-Step Guide**

Here is the step-by-step process for how KNN makes a prediction for a new, unseen data point:

1. **Choose a value for k:** Decide how many neighbors to consult (e.g., k=3, k=5, k=10). k is typically the most important hyperparameter you need to set.

2. **Calculate distances:** The algorithm measures the distance from the new data point to every point in the training dataset. The most common distance metric is Euclidean distance (the straight-line, "as the crow flies" distance).

3. **Find the k neighbors:** The algorithm identifies the k training points that are closest to the new point, i.e., those with the smallest distances.

4. **Make a prediction:** What happens next depends on the task:

   - **Classification (voting):** The new point is assigned the class that is most common among its k neighbors. This is a simple majority vote.

   - **Regression (averaging):** The new point is assigned the average value of its k neighbors. (Sometimes the median is used instead, because it is more robust to outliers.)
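
The steps above can be traced by hand on a tiny example. The sketch below uses NumPy only; the six training points, the labels, the regression targets, and the query point are all made-up values chosen for illustration, and the variable names are not from the original notebook.

```python
import numpy as np

# Made-up training set with 2 features per point (illustrative values only).
X_train = np.array([[1.0, 1.0], [2.0, 1.0], [1.0, 2.0],
                    [6.0, 6.0], [7.0, 6.0], [6.0, 7.0]])
y_class = np.array([0, 0, 0, 1, 1, 1])                     # labels for classification
y_value = np.array([10.0, 12.0, 11.0, 50.0, 52.0, 54.0])   # targets for regression
x_new = np.array([2.0, 2.0])                               # the new, unseen point

# Step 1: choose k.
k = 3

# Step 2: Euclidean distance from x_new to every training point.
distances = np.sqrt(((X_train - x_new) ** 2).sum(axis=1))

# Step 3: indices of the k smallest distances.
neighbor_idx = np.argsort(distances)[:k]

# Step 4a: classification by majority vote among the neighbors' labels.
# np.unique returns sorted labels, so ties go to the smallest label here.
labels, counts = np.unique(y_class[neighbor_idx], return_counts=True)
predicted_class = labels[np.argmax(counts)]

# Step 4b: regression by averaging (or taking the median of) the neighbors' targets.
predicted_mean = y_value[neighbor_idx].mean()
predicted_median = np.median(y_value[neighbor_idx])

print("distances:", np.round(distances, 3))
print("neighbor indices:", neighbor_idx)
print("predicted class:", predicted_class)
print("predicted mean:", predicted_mean, "| predicted median:", predicted_median)
```

In this toy data the three nearest neighbors are the first three training points, so the vote yields class 0 and both the mean and the median of their targets are 11.0. The tie-breaking rule shown (smallest label wins) is one arbitrary choice among several; libraries may resolve ties differently.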

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

# 1. Load the dataset
iris = load_iris()
X = iris.data    # Features (sepal length, sepal width, petal length, petal width)
y = iris.target  # Target (species of iris)

# 2. Split the data into training and testing sets
# 70% for training, 30% for testing. random_state makes the split reproducible.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# 3. Scale the features
# KNN is distance-based, so features on larger scales would dominate the distance.
# Standardizing (zero mean, unit variance) puts all features on an equal footing.
# The scaler is fitted on the training set only and then applied to the test set.
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# 4. Create and train the KNN classifier
# 'n_neighbors' is the 'k' in KNN; we start with k=5.
knn = KNeighborsClassifier(n_neighbors=5)

# Train the model (for KNN, this mostly just stores the data)
knn.fit(X_train, y_train)

# 5. Make predictions
y_pred = knn.predict(X_test)

# 6. Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print(f"--- Model Evaluation (k={knn.n_neighbors}) ---")
print(f"Accuracy: {accuracy:.4f}")
print("\nClassification Report:")
print(classification_report(y_test, y_pred, target_names=iris.target_names))
print("\nConfusion Matrix:")
print(confusion_matrix(y_test, y_pred))

# 7. (Optional) Predict on a single new sample
# Note: the new sample must be scaled using the *same scaler* fitted above.
# These measurements match the first sample in the iris dataset (a setosa).
new_flower = np.array([[5.1, 3.5, 1.4, 0.2]])
new_flower_scaled = scaler.transform(new_flower)

prediction = knn.predict(new_flower_scaled)
predicted_species = iris.target_names[prediction[0]]
print("\n--- New Prediction ---")
print(f"Prediction for {new_flower}: {predicted_species}")
```

```text
--- Model Evaluation (k=5) ---
Accuracy: 1.0000

Classification Report:
              precision    recall  f1-score   support

      setosa       1.00      1.00      1.00        19
  versicolor       1.00      1.00      1.00        13
   virginica       1.00      1.00      1.00        13

    accuracy                           1.00        45
   macro avg       1.00      1.00      1.00        45
weighted avg       1.00      1.00      1.00        45


Confusion Matrix:
[[19  0  0]
 [ 0 13  0]
 [ 0  0 13]]

--- New Prediction ---
Prediction for [[5.1 3.5 1.4 0.2]]: setosa
```
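
The run above fixes k=5 and is scored on a single 45-sample test split, so it does not show how sensitive the result is to the choice of k. One common way to compare candidate values is cross-validation. The sketch below is an illustration under stated assumptions, not part of the original notebook: the candidate range (odd values from 1 to 21) and the use of 5 folds are arbitrary choices, and the names `candidate_ks`, `mean_scores`, and `best_k` are introduced here. It uses scikit-learn's `make_pipeline` and `cross_val_score` so that the scaler is refitted inside each fold.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Candidate k values: an arbitrary illustrative range (1, 3, ..., 21).
candidate_ks = range(1, 22, 2)
mean_scores = {}

for k in candidate_ks:
    # The pipeline refits the scaler on the training part of each fold,
    # so no statistics from the held-out fold leak into the scaling step.
    model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=k))
    scores = cross_val_score(model, X, y, cv=5)  # accuracy by default for classifiers
    mean_scores[k] = scores.mean()

for k, score in mean_scores.items():
    print(f"k={k:2d}  mean CV accuracy={score:.4f}")

# Pick the k with the highest mean accuracy (the first one wins on ties).
best_k = max(mean_scores, key=mean_scores.get)
print(f"Best k by 5-fold CV: {best_k}")
```

On a dataset as small as iris, several values of k often score almost identically, so the "best" k from a sweep like this should be read as a rough guide and not a definitive answer.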
