# Wine classification using the k-Nearest Neighbors algorithm

## Import of libraries

In [36]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, f1_score, confusion_matrix, classification_report
from sklearn.neighbors import KNeighborsClassifier
from sklearn.datasets import load_wine
from sklearn.preprocessing import StandardScaler

## Loading and preprocessing of dataset

In [37]:
wine = load_wine()
X = wine.data
y = wine.target

In [38]:
print(X[:173])

[[1.423e+01 1.710e+00 2.430e+00 ... 1.040e+00 3.920e+00 1.065e+03]
 [1.320e+01 1.780e+00 2.140e+00 ... 1.050e+00 3.400e+00 1.050e+03]
 [1.316e+01 2.360e+00 2.670e+00 ... 1.030e+00 3.170e+00 1.185e+03]
 ...
 [1.220e+01 3.030e+00 2.320e+00 ... 6.600e-01 1.830e+00 5.100e+02]
 [1.277e+01 2.390e+00 2.280e+00 ... 5.700e-01 1.630e+00 4.700e+02]
 [1.416e+01 2.510e+00 2.480e+00 ... 6.200e-01 1.710e+00 6.600e+02]]


In [39]:
print(y[:173])

[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2]


In [40]:
# Hold out 40% of the samples for testing; the fixed seed makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=100, test_size=0.4)

In [41]:
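scaler = StandardScaler()
# Learn the per-feature mean and standard deviation from the training set only
scaler.fit(X_train)

# Apply the same training-set statistics to both splits
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)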

`StandardScaler` standardizes each feature to zero mean and unit variance. This matters because kNN is sensitive to the scale of the features: without scaling, features with large numeric ranges would have a disproportionate effect on the distance metric. The scaler is fit on the training set only, and the same transformation is then applied to the test set.
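
To make the effect of standardization concrete, here is a minimal sketch that reproduces the computation by hand with NumPy. The arrays `X_toy_train` and `X_toy_test` are hypothetical toy data, not part of the wine dataset, and the manual formula is only meant to mirror what the scaler computes (per-feature mean and population standard deviation).

```python
import numpy as np

# Hypothetical toy data: feature 0 is in single digits, feature 1 in the thousands
X_toy_train = np.array([[1.0, 1000.0],
                        [2.0, 1500.0],
                        [3.0, 2000.0]])
X_toy_test = np.array([[2.0, 1250.0]])

# Statistics are learned from the training split only
toy_mean = X_toy_train.mean(axis=0)
toy_std = X_toy_train.std(axis=0)  # population standard deviation (ddof=0)

# z = (x - mean) / std, applied column by column
X_toy_train_scaled = (X_toy_train - toy_mean) / toy_std
X_toy_test_scaled = (X_toy_test - toy_mean) / toy_std  # reuse training statistics

print(X_toy_train_scaled.mean(axis=0))  # approximately 0 for each feature
print(X_toy_train_scaled.std(axis=0))   # approximately 1 for each feature
print(X_toy_test_scaled)
```

After this transformation both toy features contribute on a comparable scale to any distance computed between rows, which is the property kNN relies on.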

## k-Nearest Neighbors classification

The kNN algorithm classifies instances based on their similarity to the existing training data. To classify a new instance, it computes the distance from that instance to every instance in the training data using a distance metric, and selects the k instances with the smallest distances. The predicted class is then determined by a majority vote among these k nearest neighbors. This process is repeated for each new instance to be classified.
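
The steps described above can be sketched from scratch in a few lines of NumPy. This is an illustrative sketch only, assuming Euclidean distance and an unweighted vote. The function `knn_predict` and the toy arrays are hypothetical names introduced here, and the tie-breaking behavior may differ from scikit-learn's implementation.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, X_new, k=3):
    """Label each row of X_new by majority vote among its k nearest training rows."""
    predictions = []
    for x in X_new:
        # Euclidean distance from x to every training instance
        distances = np.sqrt(((X_train - x) ** 2).sum(axis=1))
        # Indices of the k smallest distances
        nearest = np.argsort(distances)[:k]
        # Majority vote among the labels of those neighbors
        votes = Counter(y_train[nearest].tolist())
        predictions.append(votes.most_common(1)[0][0])
    return np.array(predictions)

# Hypothetical toy data: two well-separated clusters
X_toy = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                  [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]])
y_toy = np.array([0, 0, 0, 1, 1, 1])

toy_pred = knn_predict(X_toy, y_toy, np.array([[0.05, 0.1], [5.0, 5.1]]), k=3)
print(toy_pred)  # [0 1]
```

The loop computes all distances for every query point, which is fine for a dataset of this size but scales poorly. Library implementations typically offer tree-based neighbor searches for larger data.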

In [42]:
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)

KNeighborsClassifier(n_neighbors=3)

In [43]:
y_pred = knn.predict(X_test)


In [44]:
accuracy = accuracy_score(y_test, y_pred)
print('Accuracy: ', round(accuracy * 100, 3), '%')

Accuracy:  95.833 %


In [45]:
f1 = f1_score(y_test, y_pred, average='weighted')
print('F1 score: ', round(f1 * 100, 3), '%')

F1 score:  95.783 %


In [46]:
cr = classification_report(y_test, y_pred)
print(cr)

              precision    recall  f1-score   support

           0       0.91      1.00      0.95        21
           1       1.00      0.88      0.94        26
           2       0.96      1.00      0.98        25

    accuracy                           0.96        72
   macro avg       0.96      0.96      0.96        72
weighted avg       0.96      0.96      0.96        72
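
As a sanity check, the summary rows of the report can be recomputed from its per-class rows. The sketch below uses only the rounded values printed above, so the results agree with the reported accuracy and weighted F1 only up to rounding. The variable names are illustrative.

```python
# Per-class values as printed in the report above (rounded to two decimals)
f1_per_class = [0.95, 0.94, 0.98]
recall_per_class = [1.00, 0.88, 1.00]
support = [21, 26, 25]
total = sum(support)  # 72 test samples

# Weighted average: each class's F1 counts in proportion to its support
weighted_f1 = sum(f * s for f, s in zip(f1_per_class, support)) / total
print(round(weighted_f1, 4))

# Accuracy: recall * support recovers the correctly classified count per class
correct = sum(round(r * s) for r, s in zip(recall_per_class, support))
report_accuracy = correct / total
print(correct, round(report_accuracy * 100, 3))
```

The recomputed weighted F1 comes out at roughly 0.957, and 69 of the 72 test samples are classified correctly. Both figures are consistent with the accuracy and F1 scores printed earlier. All three misclassified samples belong to class 1, the only class with recall below 1.00.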

