# K Nearest Neighbors Classification (KNN)

## What is K Nearest Neighbors Classification (KNN)?

The k-nearest neighbors algorithm (k-NN) is a non-parametric classification method: to label a new point, it looks at the k closest training examples in the feature space.

In k-NN classification, an object is classified by a majority vote of its neighbors, with the object being assigned to the class most common among its k nearest neighbors (k is a positive integer, typically small). If k = 1, then the object is simply assigned to the class of that single nearest neighbor.

k-NN is a type of instance-based learning, or lazy learning: the function is only approximated locally, and all computation is deferred until classification. (Credit: Wikipedia)
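
The majority vote described above can be sketched in a few lines of NumPy. This is a minimal illustration, not how scikit-learn implements it: the function name `knn_predict`, the toy arrays, and the choice of Euclidean distance are all assumptions made for the example. Note that there is no training step at all; the "model" is just the stored data, which is what lazy learning means here.

```python
from collections import Counter

import numpy as np


def knn_predict(X_train, y_train, x_query, k=3):
    """Classify one query point by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every stored training example
    distances = np.linalg.norm(X_train - x_query, axis=1)
    # Indices of the k smallest distances
    nearest = np.argsort(distances)[:k]
    # Majority vote over the neighbors' labels
    votes = Counter(y_train[nearest].tolist())
    return votes.most_common(1)[0][0]


# Toy data (made up for illustration): two well-separated clusters
X_toy = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
                  [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]])
y_toy = np.array([0, 0, 0, 1, 1, 1])

print(knn_predict(X_toy, y_toy, np.array([1.1, 1.0]), k=3))  # query near the first cluster
print(knn_predict(X_toy, y_toy, np.array([5.1, 5.0]), k=3))  # query near the second cluster
```

With an even k, or with more than two classes, votes can tie; this sketch simply returns whichever tied label was counted first, whereas a real implementation would define a tie-breaking rule.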

## How to Implement K Nearest Neighbors Classification (KNN)?

### Importing the libraries

In [5]:
#Data Processing Libraries
import numpy as np
import pandas as pd

# Machine Learning Library
from sklearn.preprocessing import LabelEncoder # Encode Categorical Variable to Numerical Variable
from sklearn.metrics import accuracy_score # Library for model evaluation
from sklearn.model_selection import train_test_split # Library to split dataset into test and train

from sklearn.neighbors import KNeighborsClassifier # K Nearest neighbors Classifier

### Getting the data

In [10]:
iris_dataset = pd.read_csv("C:\\Users\\jagan\\OneDrive\\Documents\\Machine Learning - Projects\\Iris\\iris_dataset.csv")

In [11]:
iris_dataset.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 5 columns):
sepal_length    150 non-null float64
sepal_width     150 non-null float64
petal_length    150 non-null float64
petal_width     150 non-null float64
class           150 non-null object
dtypes: float64(4), object(1)
memory usage: 5.9+ KB


In [12]:
iris_dataset.head()

| | sepal_length | sepal_width | petal_length | petal_width | class |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | Iris-setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | Iris-setosa |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | Iris-setosa |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | Iris-setosa |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | Iris-setosa |


### Now converting the class into numerical variables

In [13]:
labelencoder_species = LabelEncoder()
iris_dataset['class'] = labelencoder_species.fit_transform(iris_dataset['class'])

In [14]:
iris_dataset.head()  # Iris-setosa - 0 ; Iris-versicolor - 1 ; Iris-virginica - 2

| | sepal_length | sepal_width | petal_length | petal_width | class |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | 0 |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | 0 |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | 0 |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | 0 |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | 0 |
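
Rather than keeping the label-to-integer mapping in a comment, it can be read back from the fitted encoder. The sketch below uses a short stand-in list of labels (an assumption, so the example runs without the CSV file); `classes_` and `inverse_transform` are standard `LabelEncoder` members. `LabelEncoder` assigns codes in sorted label order.

```python
from sklearn.preprocessing import LabelEncoder

# Stand-in labels using the same class names as the Iris data
labels = ["Iris-setosa", "Iris-versicolor", "Iris-virginica", "Iris-setosa"]

encoder = LabelEncoder()
encoded = encoder.fit_transform(labels)

# classes_ holds the original labels; the position of each label is its integer code
mapping = {str(label): code for code, label in enumerate(encoder.classes_)}
print(mapping)

# inverse_transform converts integer codes back to the original labels
decoded = [str(label) for label in encoder.inverse_transform(encoded)]
print(decoded)
```

The same `inverse_transform` call can be applied to predictions later on to report class names instead of integer codes.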


### Getting columns into X (feature variable) and y (target variable)

In [16]:
X = iris_dataset.iloc[:,0:4].values
y = iris_dataset.iloc[:,4].values

In [17]:
X[1:5]

array([[4.9, 3. , 1.4, 0.2],
       [4.7, 3.2, 1.3, 0.2],
       [4.6, 3.1, 1.5, 0.2],
       [5. , 3.6, 1.4, 0.2]])

In [18]:
y[1:5]

array([0, 0, 0, 0], dtype=int64)

### Dividing the dataset into test & train

In [19]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30)
print('There are {} samples in the training set and {} samples in the test set'.format(X_train.shape[0], X_test.shape[0]))

There are 105 samples in the training set and 45 samples in the test set
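
The split above is random, so the samples in each set (and therefore the accuracy reported later) will change from run to run. A hedged variant is sketched below: `random_state` fixes the shuffle and `stratify` keeps the three classes equally represented in both sets. The seed value 42 is arbitrary, and scikit-learn's bundled copy of the Iris data is used here as a stand-in for the CSV file so the example is self-contained.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Assumption: scikit-learn's bundled Iris data stands in for the CSV file
X_demo, y_demo = load_iris(return_X_y=True)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo,
    test_size=0.30,
    random_state=42,   # arbitrary seed, fixed so the split is repeatable
    stratify=y_demo,   # keep class proportions equal in train and test
)

print(X_tr.shape[0], X_te.shape[0])  # sizes of the training and test sets
print(np.bincount(y_te))             # number of test samples per class
```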


## Run KNN Classification

In [24]:
# KNN classifier from the scikit-learn library; n_neighbors (the k in k-NN) can be varied to tune the model.
classifier = KNeighborsClassifier(n_neighbors = 5)

In [21]:
classifier.fit(X_train, y_train) # Fitting the training Set

KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
           metric_params=None, n_jobs=1, n_neighbors=5, p=2,
           weights='uniform')
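
The printed parameters show the defaults: `metric='minkowski'` with `p=2` is ordinary Euclidean distance, and `weights='uniform'` gives every neighbor an equal vote. To see what the fitted classifier actually consults for a prediction, the neighbors can be inspected directly with `kneighbors`. This is a small sketch that again assumes the bundled Iris data in place of the CSV file and, for brevity, fits on the full dataset.

```python
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

# Assumption: the bundled Iris data stands in for the CSV file
X_demo, y_demo = load_iris(return_X_y=True)

knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_demo, y_demo)

query = X_demo[:1]  # a single sample, kept 2-D as the API expects

# Distances to, and row indices of, the 5 nearest stored samples
distances, indices = knn.kneighbors(query)
neighbor_labels = y_demo[indices[0]]
print(distances[0])
print(neighbor_labels)

# With uniform weights, predict_proba is the fraction of neighbors in each class
proba = knn.predict_proba(query)
print(proba)
print(knn.predict(query))
```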

In [22]:
# Predicting the output on the test dataset
y_pred = classifier.predict(X_test)

In [23]:
# Accuracy on the test set
score_test = accuracy_score(y_test, y_pred)
print(score_test)

0.9111111111111111
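
A single train/test split gives a noisy accuracy estimate on a dataset this small, so one way to choose `n_neighbors` is to compare several values with cross-validation. The sketch below is one possible approach rather than the only one; the candidate values of k and the use of 5 folds are arbitrary choices, and the bundled Iris data is assumed in place of the CSV file.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Assumption: the bundled Iris data stands in for the CSV file
X_demo, y_demo = load_iris(return_X_y=True)

cv_scores = {}
for k in [1, 3, 5, 7, 9, 11]:
    model = KNeighborsClassifier(n_neighbors=k)
    # 5-fold cross-validated accuracy, averaged over the folds
    cv_scores[k] = cross_val_score(model, X_demo, y_demo, cv=5, scoring="accuracy").mean()

for k, score in cv_scores.items():
    print(f"k={k:2d}  mean accuracy={score:.3f}")

# Candidate with the highest mean accuracy (the first one wins on ties)
best_k = max(cv_scores, key=cv_scores.get)
print("best k:", best_k)
```

Differences between neighboring values of k are often small on this dataset, so the "best" k should be read as a reasonable choice rather than a decisive winner.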


### KNN Classifier Parameter details
http://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KNeighborsClassifier.html