
In [1]:
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt 

K-nearest neighbors (KNN) is a simple and powerful machine learning algorithm used for classification and regression problems. It is an instance-based learning algorithm, which means it stores all the training instances in memory and makes a prediction for a new instance from its K nearest neighbors: the most common class label among them for classification, or the average of their target values for regression.

The idea behind the KNN algorithm is simple: the algorithm takes an input data point, finds the K nearest neighbors in the training data, and assigns the class label that is most common among the K nearest neighbors. The algorithm is flexible and can be used for both binary and multi-class classification problems.

In this article, we will discuss the basics of the KNN algorithm. We will also provide a detailed example of how to apply the KNN algorithm in Python, including code blocks and a sample dataset.

In [12]:
df = pd.read_csv("diabetics.csv")

Example of Applying the KNN Algorithm in Python
In this section, we walk through a detailed example of the KNN algorithm in Python. We use the Pima Indians Diabetes dataset, which contains health measurements for a group of patients along with an Outcome label indicating whether each patient has diabetes. The goal of this example is to predict that outcome from the health measures.

The required libraries are imported and the dataset is loaded in the cells above; displaying the DataFrame shows its contents.

In [13]:
df

Unnamed: 0,Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,DiabetesPedigreeFunction,Age,Outcome
0,6,148,72,35,0,33.6,0.627,50,1
1,1,85,66,29,0,26.6,0.351,31,0
2,8,183,64,0,0,23.3,0.672,32,1
3,1,89,66,23,94,28.1,0.167,21,0
4,0,137,40,35,168,43.1,2.288,33,1
...,...,...,...,...,...,...,...,...,...
763,10,101,76,48,180,32.9,0.171,63,0
764,2,122,70,27,0,36.8,0.340,27,0
765,5,121,72,23,112,26.2,0.245,30,0
766,1,126,60,0,0,30.1,0.349,47,1


How the KNN Algorithm Works
The KNN algorithm is based on the idea that similar instances are likely to belong to the same class. For example, if two houses are similar in location, size, and amenities, they are likely to fall in the same price range. The algorithm uses this principle to predict the class label for a new instance: it finds the K nearest neighbors in the training data and assigns the class label that is most common among them.

The KNN algorithm works in three steps:

1. Calculate the distance between the new instance and all instances in the training data.
2. Sort the distances in ascending order and select the K nearest neighbors.
3. Assign the class label that is most common among the K nearest neighbors to the new instance.
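
The three steps above can be sketched from scratch in a few lines of NumPy. This is a minimal illustration, not how scikit-learn implements KNN internally: the function name `knn_predict`, the toy arrays, and the choice of Euclidean distance are all assumptions made for the example.

```python
from collections import Counter

import numpy as np


def knn_predict(X_train, y_train, x_new, k=3):
    """Predict the label of x_new by majority vote among its k nearest training points."""
    # Step 1: Euclidean distance from x_new to every training instance
    distances = np.linalg.norm(X_train - x_new, axis=1)
    # Step 2: sort distances in ascending order and keep the k nearest indices
    nearest = np.argsort(distances)[:k]
    # Step 3: pick the most common label among those neighbors
    votes = Counter(y_train[nearest].tolist())
    return votes.most_common(1)[0][0]


# Toy data (made up for illustration): two well-separated clusters in 2-D
X_toy = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.0],
                  [6.0, 6.0], [7.0, 6.5], [6.5, 7.0]])
y_toy = np.array([0, 0, 0, 1, 1, 1])

print(knn_predict(X_toy, y_toy, np.array([1.2, 1.4]), k=3))  # point near the first cluster
print(knn_predict(X_toy, y_toy, np.array([6.4, 6.2]), k=3))  # point near the second cluster
```

With an even K or more than two classes, votes can tie; this sketch simply returns whichever tied label was counted first, whereas a real implementation may break ties differently.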

In [14]:
# Split the data into training and test sets
from sklearn.model_selection import train_test_split
X = df.iloc[:, :-1].values
y = df.iloc[:, -1].values
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

In [15]:
# Fit the KNN model on the training data
from sklearn.neighbors import KNeighborsClassifier
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)

In [16]:
# Use the trained KNN model to make predictions on the test data
y_pred = knn.predict(X_test)

In [17]:
# Evaluate the performance of the KNN model
from sklearn.metrics import accuracy_score
print("Accuracy:", accuracy_score(y_test, y_pred))

Accuracy: 0.6623376623376623




In conclusion, KNN is a simple and often effective algorithm for classification and regression problems. It is easy to implement and needs little or no training, although each prediction requires comparing the new instance against the stored training instances, so prediction cost grows with the size of the training set. With K = 5, the model in this example reached an accuracy of about 0.66 on the test set. Accuracy can often be improved by selecting a better value of K and by using an appropriate distance measure. In this article, we discussed the steps involved in the KNN algorithm and implemented a detailed example in Python using the Pima Indians Diabetes dataset.
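
One common way to choose K and the distance measure is a cross-validated grid search. The sketch below is only an illustration under stated assumptions: it uses synthetic data from `make_classification` as a stand-in for the diabetes features, and the candidate values in `param_grid` are arbitrary choices, not recommendations from this article.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in data (an assumption); substitute the real features and labels
X_demo, y_demo = make_classification(n_samples=300, n_features=8, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

# Candidate values of K and distance measures to compare (illustrative choices)
param_grid = {
    "n_neighbors": [1, 3, 5, 7, 9, 11, 15],
    "metric": ["euclidean", "manhattan"],
}

# 5-fold cross-validation on the training split only; the test split stays untouched
search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5, scoring="accuracy")
search.fit(X_tr, y_tr)

print("Best parameters:", search.best_params_)
print("Cross-validated accuracy:", search.best_score_)
print("Test accuracy:", search.score(X_te, y_te))
```

Selecting K on cross-validation folds rather than on the test set keeps the final test accuracy an honest estimate of performance on unseen data.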