Classification is a technique for sorting data into a fixed number of classes.
The main goal of a classification problem is to identify the category/class to which
a new data point belongs.

Seven of the most commonly used classification algorithms, all of which can be implemented in Python, are:
- Logistic Regression
- K-Nearest Neighbours
- Naive Bayes
- Decision Tree
- Random Forest
- Stochastic Gradient Descent
- Support Vector Machine

The K-nearest neighbours (KNN) algorithm is a supervised machine learning algorithm used for classification.
KNN is extremely easy to implement in its most basic form, and
yet it can handle quite complex classification tasks.

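The intuition: if you are similar to your neighbours, then you are one of them. A new data point is assigned the class that is most common among the training points closest to it.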
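To make the neighbour idea concrete, here is a minimal from-scratch sketch of KNN classification. It is not how scikit-learn implements it. The function name `knn_predict`, the toy points, and the labels "red"/"blue" are all made up for illustration, and Euclidean distance with a simple majority vote is assumed.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=5):
    """Predict the label of `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training point
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k closest points
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Majority vote among the labels of those neighbours
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy 2-D data, invented purely for illustration
train_points = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1], [5.0, 5.0], [5.2, 4.8], [4.9, 5.1]]
train_labels = ["red", "red", "red", "blue", "blue", "blue"]

print(knn_predict(train_points, train_labels, [1.1, 1.0], k=3))
```

The query point sits inside the "red" cluster, so its three nearest neighbours are all "red" and that is the predicted label. `math.dist` requires Python 3.8 or later.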


In [1]:
## importing the packages
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn import datasets


In [2]:
## loading the iris dataset
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"

names = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'class']

# Read dataset to pandas dataframe
dataset = pd.read_csv(url, names=names)

In [3]:
dataset.sample(5)

index,sepal_length,sepal_width,petal_length,petal_width,class
19,5.1,3.8,1.5,0.3,Iris-setosa
101,5.8,2.7,5.1,1.9,Iris-virginica
64,5.6,2.9,3.6,1.3,Iris-versicolor
59,5.2,2.7,3.9,1.4,Iris-versicolor
143,6.8,3.2,5.9,2.3,Iris-virginica


In [4]:
dataset.describe()

statistic,sepal_length,sepal_width,petal_length,petal_width
count,150.0,150.0,150.0,150.0
mean,5.843333,3.054,3.758667,1.198667
std,0.828066,0.433594,1.76442,0.763161
min,4.3,2.0,1.0,0.1
25%,5.1,2.8,1.6,0.3
50%,5.8,3.0,4.35,1.3
75%,6.4,3.3,5.1,1.8
max,7.9,4.4,6.9,2.5


In [4]:
# features: every column except the last
# (same as dataset[['sepal_length','sepal_width','petal_length','petal_width']])
x = dataset.iloc[:, :-1]
# target: the fifth column (same as dataset['class'])
y = dataset.iloc[:, 4]

In [5]:
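from sklearn.model_selection import train_test_split
# hold out 20% of the rows for testing; no random_state is set,
# so the split (and the scores below) will differ between runs
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.20)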

In [6]:
from sklearn.neighbors import KNeighborsClassifier

In [7]:
# classify each point by a vote among its 5 nearest training points
classifier = KNeighborsClassifier(n_neighbors=5)
classifier.fit(x_train, y_train)

KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
                     metric_params=None, n_jobs=None, n_neighbors=5, p=2,
                     weights='uniform')

In [8]:
y_pred = classifier.predict(x_test)

In [9]:
from sklearn.metrics import classification_report, confusion_matrix
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))

[[11  0  0]
 [ 0  7  1]
 [ 0  1 10]]
                 precision    recall  f1-score   support

    Iris-setosa       1.00      1.00      1.00        11
Iris-versicolor       0.88      0.88      0.88         8
 Iris-virginica       0.91      0.91      0.91        11

       accuracy                           0.93        30
      macro avg       0.93      0.93      0.93        30
   weighted avg       0.93      0.93      0.93        30
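The precision, recall and accuracy figures in the report can be recovered by hand from the confusion matrix, where rows are the true classes and columns are the predicted classes. The sketch below copies the matrix from the output above and recomputes the per-class values in plain Python. The helper name `per_class_metrics` is made up for illustration.

```python
# Confusion matrix copied from the output above:
# rows are true classes, columns are predicted classes
labels = ["Iris-setosa", "Iris-versicolor", "Iris-virginica"]
cm = [
    [11, 0, 0],
    [0, 7, 1],
    [0, 1, 10],
]


def per_class_metrics(cm):
    """Return a (precision, recall) pair for each class of a square confusion matrix."""
    n = len(cm)
    results = []
    for i in range(n):
        true_positive = cm[i][i]
        predicted_as_i = sum(cm[row][i] for row in range(n))  # column sum
        actually_i = sum(cm[i])  # row sum
        precision = true_positive / predicted_as_i if predicted_as_i else 0.0
        recall = true_positive / actually_i if actually_i else 0.0
        results.append((precision, recall))
    return results


metrics = per_class_metrics(cm)
# Accuracy: correctly classified samples (the diagonal) over all samples
accuracy = sum(cm[i][i] for i in range(len(cm))) / sum(map(sum, cm))

for name, (precision, recall) in zip(labels, metrics):
    print(f"{name}: precision={precision:.2f}, recall={recall:.2f}")
print(f"accuracy={accuracy:.2f}")
```

For Iris-versicolor, 7 of the 8 samples predicted as versicolor are correct (precision 7/8) and 7 of the 8 true versicolor samples are found (recall 7/8), which agrees with the 0.88 shown in the report. Accuracy is 28 correct predictions out of 30.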



In [10]:
d = confusion_matrix(y_test, y_pred)

In [11]:
import seaborn as sns

In [None]:
# https://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm