
### Training the KNN Model

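Now we initialize the K-Nearest Neighbors classifier and fit it to the scaled training data. We start with `n_neighbors=5`, a common choice (and scikit-learn's default); 3 is another typical starting value. Because KNN is a lazy learner, `fit` essentially stores the training data (optionally indexing it in a tree structure), and the real work of finding neighbors happens at prediction time.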

In [5]:
# Initialize KNN classifier with k=5
knn = KNeighborsClassifier(n_neighbors=5)

# Train the model using the scaled training data
knn.fit(X_train_scaled, y_train)

print("KNN model trained successfully.")

KNN model trained successfully.
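Picking 3 or 5 by convention works, but `k` can also be chosen from the training data alone. The sketch below shows one option (not the notebook's method): 5-fold cross-validation over a few candidate values. The candidate list is an illustrative assumption, and the split mirrors the notebook's `train_test_split` call so the test set stays untouched until final evaluation.

```python
# A sketch of choosing k by cross-validation on the training split only.
# The split mirrors the notebook (test_size=0.3, random_state=42, stratify=y);
# the candidate k values are an illustrative assumption.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42, stratify=iris.target
)

candidate_ks = [1, 3, 5, 7, 9, 11]
mean_scores = {}
for k in candidate_ks:
    # The scaler sits inside the pipeline, so it is re-fit on each training fold
    # and the validation fold never influences the scaling statistics.
    model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=k))
    scores = cross_val_score(model, X_train, y_train, cv=5)
    mean_scores[k] = scores.mean()

# Pick the k with the highest mean cross-validated accuracy
best_k = max(mean_scores, key=mean_scores.get)
for k, score in mean_scores.items():
    print(f"k={k:2d}  mean CV accuracy={score:.4f}")
print("Best k on the training folds:", best_k)
```

Odd values of `k` are listed because, with an odd number of voters, a two-way tie between classes cannot occur (three-way ties are still possible with three classes).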


### Making Predictions and Evaluating the Model

After training, we use the model to predict labels for the scaled test data and evaluate its performance with the accuracy score, a per-class classification report, and a confusion matrix.

In [6]:
# Make predictions on the scaled test data
y_pred = knn.predict(X_test_scaled)

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
report = classification_report(y_test, y_pred, target_names=iris.target_names)
conf_matrix = confusion_matrix(y_test, y_pred)

print(f"Accuracy: {accuracy:.4f}")
print("\nClassification Report:\n", report)
print("\nConfusion Matrix:\n", conf_matrix)

Accuracy: 0.9111

Classification Report:
               precision    recall  f1-score   support

      setosa       1.00      1.00      1.00        15
  versicolor       0.79      1.00      0.88        15
   virginica       1.00      0.73      0.85        15

    accuracy                           0.91        45
   macro avg       0.93      0.91      0.91        45
weighted avg       0.93      0.91      0.91        45


Confusion Matrix:
 [[15  0  0]
 [ 0 15  0]
 [ 0  4 11]]
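The rows of the classification report follow directly from the confusion matrix above (rows are true classes, columns are predicted classes). The four virginica samples predicted as versicolor are what lower versicolor's precision and virginica's recall. A short NumPy check, using the matrix copied from the output:

```python
import numpy as np

# Confusion matrix copied from the output above
# (rows = true class, columns = predicted class)
cm = np.array([[15, 0, 0],
               [0, 15, 0],
               [0, 4, 11]])
labels = ["setosa", "versicolor", "virginica"]

tp = np.diag(cm)                  # correct predictions per class
recall = tp / cm.sum(axis=1)      # divide by row sums: all true members of the class
precision = tp / cm.sum(axis=0)   # divide by column sums: all predictions of the class
f1 = 2 * precision * recall / (precision + recall)
accuracy_manual = tp.sum() / cm.sum()

for name, p, r, f in zip(labels, precision, recall, f1):
    print(f"{name:>10}  precision={p:.2f}  recall={r:.2f}  f1={f:.2f}")
print(f"accuracy={accuracy_manual:.4f}")
```

Rounded to two decimals, these reproduce the report: versicolor precision 15/19 ≈ 0.79, virginica recall 11/15 ≈ 0.73, and overall accuracy 41/45 ≈ 0.9111.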


### K-Nearest Neighbors (KNN) Classification

KNN is a non-parametric, lazy learning algorithm that can be used for both classification and regression. To classify a new point, it finds the `k` training points closest to it in feature space (by default, using Euclidean distance) and assigns the majority class among those neighbors. Let's implement it with Scikit-learn.
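Because the algorithm is lazy, the whole method amounts to a distance computation and a vote at prediction time. The minimal NumPy sketch below assumes Euclidean distance and unweighted voting (the defaults of `KNeighborsClassifier`); the `knn_predict` helper and the toy data are hypothetical, for illustration only, and its tie-breaking may differ from scikit-learn's.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, X_query, k=5):
    """Hypothetical helper: predict labels by majority vote among the
    k nearest training points, using Euclidean distance."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train)
    X_query = np.asarray(X_query, dtype=float)
    predictions = []
    for x in X_query:
        # Euclidean distance from the query point to every training point
        distances = np.sqrt(((X_train - x) ** 2).sum(axis=1))
        # Indices of the k closest training points
        nearest = np.argsort(distances)[:k]
        # Majority vote; on a tie, Counter.most_common keeps the label
        # encountered first among the nearest points
        votes = Counter(y_train[i] for i in nearest)
        predictions.append(votes.most_common(1)[0][0])
    return np.array(predictions)

# Toy data: two well-separated clusters labeled 0 and 1
X_toy = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                  [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]])
y_toy = np.array([0, 0, 0, 1, 1, 1])
print(knn_predict(X_toy, y_toy, [[0.3, 0.3], [4.8, 5.1]], k=3))  # → [0 1]
```

This brute-force loop computes every distance for every query; scikit-learn can instead use tree-based indexes, which is one reason to prefer the library on larger datasets.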

In [2]:
# Import necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Load the Iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# Create a DataFrame for better visualization (optional)
df = pd.DataFrame(data=X, columns=iris.feature_names)
df['target'] = y

print("Dataset Head:")
display(df.head())
print("\nTarget Names:", iris.target_names)

Dataset Head:

|   | sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) | target |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | 0 |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | 0 |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | 0 |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | 0 |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | 0 |

Target Names: ['setosa' 'versicolor' 'virginica']


The dataset is now loaded. Next, we split the data into training (70%) and test (30%) sets, stratified by class so each species is equally represented in both, and then scale the features.

In [4]:
from sklearn.model_selection import train_test_split

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42, stratify=y)

print(f"X_train shape: {X_train.shape}")
print(f"X_test shape: {X_test.shape}")

# Scale the features
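# Scaling matters for distance-based algorithms like KNN: features with larger
# numeric ranges would otherwise dominate the distance computation.
# The scaler is fit on the training set only, then applied to the test set,
# so no test-set statistics leak into training.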
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

print("\nFeatures scaled successfully.")

X_train shape: (105, 4)
X_test shape: (45, 4)

Features scaled successfully.
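To see why scaling changes KNN's behavior, the sketch below standardizes a tiny two-feature dataset by hand, mirroring what `StandardScaler` does (subtract the training mean, divide by the training population standard deviation), and compares nearest neighbors before and after. The helper names and toy values are hypothetical, chosen so that the second feature's large range dominates the unscaled distance.

```python
import numpy as np

def fit_standardizer(X):
    """Hypothetical helper: learn per-feature mean and std from training data only."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)      # population std (ddof=0)
    std[std == 0] = 1.0      # guard against constant features
    return mean, std

def standardize(X, mean, std):
    return (X - mean) / std

def nearest_index(X, x):
    """Index of the row of X closest to x in Euclidean distance."""
    return int(np.argmin(np.linalg.norm(X - x, axis=1)))

# Toy data: feature 0 spans a few units, feature 1 spans thousands
X_tr_toy = np.array([[1.0, 2000.0], [4.0, 2055.0], [2.5, 1000.0], [2.5, 3000.0]])
X_te_toy = np.array([[1.2, 2060.0]])

mean, std = fit_standardizer(X_tr_toy)
X_tr_s = standardize(X_tr_toy, mean, std)
X_te_s = standardize(X_te_toy, mean, std)   # reuse training statistics, no refit

print("Nearest neighbor, unscaled:", nearest_index(X_tr_toy, X_te_toy[0]))
print("Nearest neighbor, scaled:  ", nearest_index(X_tr_s, X_te_s[0]))
```

Unscaled, the query's nearest neighbor is row 1 because it is only 5 units away on the large-range feature; after standardization, row 0, which is close on both features relative to their spread, becomes the nearest.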
