### K-Nearest Neighbors (KNN)
KNN was born out of research done for the armed forces. Evelyn Fix and Joseph Hodges, two statisticians, wrote a technical report for the USAF School of Aviation Medicine in 1951 introducing the KNN algorithm.

KNN can be used for both regression and classification problems. However, it is mostly used for classification, since it fares well across the parameters typically evaluated when determining the usability of a technique, such as prediction power and calculation time.

It is used due to its ease of interpretation and low calculation time.

Companies like Amazon or Netflix use KNN when recommending books to buy or movies to watch.

How do these companies make recommendations?

Well, these companies gather data on the books you have read or movies you have watched on their website and apply KNN. They take your available customer data and compare it to that of other customers who have purchased similar books or watched similar movies.

The books and movies recommended depend on how the algorithm classifies that data point.

### How does KNN work?
The k-nearest neighbor algorithm stores all the available data and classifies a new data point based on a similarity measure (e.g., a distance function). This means that when new data appears, it can be easily classified into a well-suited category by using the K-NN algorithm.

Suppose there are two classes, Class A and Class B, and we have a new unknown data point “?”. Which of these classes does the data point belong to? To answer this, we can use the K-NN algorithm. The data point is classified by a majority vote of its neighbors: it is assigned to the class most common among its K nearest neighbors, as measured by a distance function.

Here, we can see that if k = 3, then, based on the distance function used, the three nearest neighbors of the data point are found, and the data point is assigned to the class that wins the majority vote among them.

In the case of k = 3, for the above diagram, it's Class B.

Similarly, when k = 7, for the above diagram, based on the majority vote of its neighbors, the data point is classified as Class A.
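To make the effect of K on the vote concrete, here is a minimal sketch in plain Python. The points and labels are hypothetical toy data chosen so that the outcome flips between k = 3 and k = 7; they are not taken from the diagram, and `knn_vote` is an illustrative helper name rather than a library function.

```python
from collections import Counter
import math

# Hypothetical toy data: (x, y, label). Two Class B points sit very close
# to the query, while several Class A points sit slightly further away.
points = [
    (0.5, 0.0, "B"), (0.0, 0.6, "B"),
    (1.0, 0.0, "A"), (0.0, 1.2, "A"), (1.0, 1.0, "A"),
    (-1.5, 0.0, "A"), (0.0, -1.6, "A"),
    (3.0, 3.0, "B"), (-3.0, 3.0, "B"),
]
query = (0.0, 0.0)  # the unknown data point "?"


def knn_vote(query, points, k):
    """Return the majority label among the k points closest to query."""
    # Sort all points by Euclidean distance to the query (requires Python 3.8+).
    by_distance = sorted(points, key=lambda p: math.dist(query, p[:2]))
    labels = [label for _, _, label in by_distance[:k]]
    return Counter(labels).most_common(1)[0][0]


print(knn_vote(query, points, k=3))  # B  (neighbors: B, B, A)
print(knn_vote(query, points, k=7))  # A  (neighbors: B, B, A, A, A, A, A)
```

With a small K the two very close Class B points dominate; with a larger K the more numerous Class A points outvote them.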

### K-Nearest Neighbors
The KNN algorithm applies the “birds of a feather flock together” principle. It assumes that similar things exist close to each other.

KNN captures this idea of similarity (sometimes called closeness, proximity, or distance) by calculating the distance between data points.

Euclidean distance, or straight-line distance, is a popular and familiar choice for calculating distance.
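As a small illustration, Euclidean distance is the square root of the sum of squared differences between two points' coordinates. The sketch below uses NumPy; the function name `euclidean_distance` is just an illustrative choice.

```python
import numpy as np


def euclidean_distance(a, b):
    """Straight-line distance between two points of equal dimension."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.sqrt(np.sum((a - b) ** 2)))


# A 3-4-5 right triangle: the distance from (0, 0) to (3, 4) is 5.
print(euclidean_distance([0, 0], [3, 4]))  # 5.0
```

The same function works for any number of features, for example the four measurements of an iris flower.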

### Choosing the right value for K
To get the right K, run the KNN algorithm several times with different values of K and select the one that produces the fewest errors.

- As K approaches 1, your prediction becomes less stable.

- As your value of K increases, your prediction becomes more stable due to majority voting, up to a point.

- When you start seeing an increasing number of errors, you know you are pushing K too far.

- When taking a majority vote among labels, K is usually chosen to be an odd number so that there is a tiebreaker (with two classes, an odd K guarantees no ties).
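One way to run KNN with several values of K is sketched below. It is an assumption of this sketch, not something stated above, that errors are estimated with 5-fold cross-validation and that the data is scikit-learn's bundled copy of the iris dataset; any labeled dataset and any reasonable range of K would do.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Assumption: use scikit-learn's built-in iris data as example input.
X, y = load_iris(return_X_y=True)

k_values = list(range(1, 30, 2))  # odd values of K only
mean_errors = []
for k in k_values:
    # Mean accuracy over 5 folds; error rate is 1 - accuracy.
    scores = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5)
    mean_errors.append(1 - scores.mean())

# Pick the K with the lowest average error (first one wins on ties).
best_k = k_values[mean_errors.index(min(mean_errors))]
print("best k:", best_k)
```

Averaging over folds makes the choice of K less dependent on one particular train/test split.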

### Working of the KNN Algorithm in Machine Learning
Step 1 – When implementing an algorithm, you will always need a data set. So, you start by loading the training and the test data.

Step 2 – Choose the value of K, that is, the number of nearest data points to consider. K can be any positive integer.

Step 3 – For each point in the test data:

3.1 – Use the Euclidean, Hamming, or Manhattan distance to calculate the distance between the test point and each row of the training data. Euclidean distance is the most commonly used.

3.2 – Sort the training rows in ascending order based on the distance value.

3.3 – From the sorted array, choose the top K rows.

3.4 – Assign the test point the class that appears most often among these rows.

Step 4 – End
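The steps above can be sketched from scratch with NumPy as follows. This is a minimal illustration under a few assumptions: the distance is Euclidean, ties are broken by whichever class `Counter` lists first, and `knn_predict` and the toy data are hypothetical names and values introduced for this example.

```python
from collections import Counter
import numpy as np


def knn_predict(X_train, y_train, X_test, k=3):
    """Classify each row of X_test by majority vote of its k nearest training rows."""
    X_train = np.asarray(X_train, dtype=float)
    X_test = np.asarray(X_test, dtype=float)
    y_train = np.asarray(y_train)
    predictions = []
    for row in X_test:
        # Step 3.1: Euclidean distance from this test row to every training row
        distances = np.sqrt(((X_train - row) ** 2).sum(axis=1))
        # Steps 3.2 and 3.3: sort ascending and keep the indices of the top K rows
        nearest = np.argsort(distances)[:k]
        # Step 3.4: assign the most frequent class among those K rows
        votes = Counter(y_train[nearest].tolist())
        predictions.append(votes.most_common(1)[0][0])
    return predictions


# Hypothetical toy data: two well-separated clusters.
X_train = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1], [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]]
y_train = ["A", "A", "A", "B", "B", "B"]

print(knn_predict(X_train, y_train, [[1.1, 1.0], [5.1, 5.0]], k=3))  # ['A', 'B']
```

Note that all the work happens at prediction time: there is no separate training step beyond storing the data.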

### Advantages of KNN
1. Quick calculation time – there is no separate training step
2. Simple algorithm – easy to interpret
3. Versatile – useful for regression and classification
4. Relatively high accuracy – though not always competitive with better supervised learning models
5. No assumptions about data – no need to make additional assumptions, tune several parameters, or build a model. This makes it especially useful for nonlinear data.
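To illustrate the versatility point, here is a minimal regression sketch using scikit-learn's `KNeighborsRegressor`, which predicts the average target value of the K nearest neighbors instead of taking a class vote. The one-feature data is hypothetical.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Hypothetical one-feature data where the target is 10 times the feature.
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([10.0, 20.0, 30.0, 40.0, 50.0])

reg = KNeighborsRegressor(n_neighbors=3)
reg.fit(X, y)

# The three nearest neighbors of 3.0 are 2.0, 3.0 and 4.0,
# so the prediction is the mean of their targets: 20, 30 and 40.
pred = reg.predict([[3.0]])
print(pred[0])
```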

### Disadvantages of KNN
1. Accuracy depends on the quality of the data
2. With large data, the prediction stage might be slow
3. Sensitive to the scale of the data and irrelevant features
4. Requires high memory – it needs to store all of the training data
5. Because it stores all of the training data and compares each new point against it, it can be computationally expensive
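Because distances are dominated by features with large numeric ranges, a common mitigation for the scale sensitivity is to standardize features before fitting. The sketch below shows one way to do that with a scikit-learn pipeline; the use of the bundled iris data, `random_state=0`, and K = 3 are assumptions made for the example. Iris features are already on similar scales, so the effect here may be small.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: scikit-learn's built-in iris data and a fixed split for repeatability.
X, y = load_iris(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=0
)

# The scaler is fitted on the training data only, then applied to the test data.
model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=3))
model.fit(x_train, y_train)

accuracy = model.score(x_test, y_test)
print("Accuracy with scaling:", accuracy)
```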

In [1]:
# import the libraries
import numpy as np
import pandas as pd

In [2]:
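iris=pd.read_csv("Iris.csv") # expects Iris.csv in the working directory
iris=iris.iloc[:,1:] # drop the first column, keeping the four measurements and the species
iris.head()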

|   | SepalLengthCm | SepalWidthCm | PetalLengthCm | PetalWidthCm | Species |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | Iris-setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | Iris-setosa |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | Iris-setosa |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | Iris-setosa |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | Iris-setosa |


In [3]:
X=iris.iloc[:,0:4] # features
Y=iris.iloc[:,4] # Labels

In [4]:
#label encoding
from sklearn.preprocessing import LabelEncoder
LE=LabelEncoder()
Y=LE.fit_transform(Y)

In [5]:
# Split the data into train and test parts
# Note: no random_state is set, so the split (and the accuracies below) will vary between runs
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size=0.33)

In [6]:
# create and train the model
from sklearn.neighbors import KNeighborsClassifier
classifier=KNeighborsClassifier(n_neighbors=3)
classifier.fit(x_train,y_train)

KNeighborsClassifier(n_neighbors=3)

In [7]:
y_pred=classifier.predict(x_test)

In [8]:
from sklearn.metrics import confusion_matrix, accuracy_score

cm=confusion_matrix(y_test,y_pred)
cm

print("Accuracy of the model is ",accuracy_score(y_test,y_pred)*100,"%")

Accuracy of the model is  96.0 %


In [9]:
# Evaluate alternative K values (odd values from 1 to 49) for better predictions
k_list=list(range(1,50,2))
acc_score=[]
err_rate=[]
for k in k_list:
    classifier=KNeighborsClassifier(n_neighbors=k)
    classifier.fit(x_train,y_train)
    y_pred=classifier.predict(x_test)
    acc=accuracy_score(y_test,y_pred) # compute once, reuse for both lists
    acc_score.append(acc)
    err_rate.append(1-acc)

In [10]:
# report the K with the lowest error rate and the accuracy it achieved
best_idx=err_rate.index(min(err_rate))
print("best k:",k_list[best_idx])

acc_score[best_idx]

best k: 3


0.96

In [11]:
# Refit with a different K (here K=9) to compare against the best K found above
classifier=KNeighborsClassifier(n_neighbors=9)
classifier.fit(x_train,y_train)
y_pred=classifier.predict(x_test)
# multiply by 100 so the values match the "%" label
print("Accuracy of the model is ",round(accuracy_score(y_test,y_pred)*100,2),"%")
print("Error rate of the model is ",round((1-accuracy_score(y_test,y_pred))*100,2),"%")

Accuracy of the model is  94.0 %
Error rate of the model is  6.0 %
