**SUPPORT VECTOR MACHINE vs K-NEAREST-NEIGHBOUR**

  *In this notebook, my goal is to compare a one-vs-rest SVM with KNN. Image-recognition tasks like this one are usually solved with deep learning models such as convolutional neural networks, but I wanted to try these two algorithms because they can also work well on such problems and typically require less computation than a CNN or neural networks in general. The dataset contains 27,455 training samples and 7,172 test samples. Each sample holds the pixel intensities of a 28 x 28 hand-sign image, and there are 24 classes (one hand sign per class).*



In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load in 

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the "../input/" directory.
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input/sign-language-mnist'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# Any results you write to the current directory are saved as output.

**SUPPORT VECTOR MACHINE OvR**

*We start with the Support Vector Machine, and since this is a multiclass problem, I chose the one-vs-rest decision function. The script is straightforward: we use all of the training data to fit the SVM model and then evaluate it on the test dataset. More training data generally helps accuracy, which is why the full training set is used. In this case, our classifier predicts the test images with 84% accuracy.*
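
To make the one-vs-rest idea concrete, here is a minimal pure-Python sketch. It is not scikit-learn's implementation, and the helper names `one_vs_rest_targets` and `ovr_predict`, as well as the labels and scores, are made up for illustration. Each class gets its own binary "this class vs everything else" target vector, and at prediction time the class whose binary model reports the highest score wins.

```python
def one_vs_rest_targets(labels):
    """Build one binary target vector per class: 1 for that class, 0 for the rest."""
    classes = sorted(set(labels))
    return {c: [1 if y == c else 0 for y in labels] for c in classes}


def ovr_predict(scores_per_class):
    """Pick the class whose binary model gives the highest score for one sample."""
    return max(scores_per_class, key=scores_per_class.get)


# Hypothetical labels for four samples spread over three classes.
labels = [0, 2, 1, 2]
targets = one_vs_rest_targets(labels)
for cls, binary_target in targets.items():
    print(cls, binary_target)

# Hypothetical decision scores from three binary models for a single sample.
scores = {0: -1.2, 1: 0.3, 2: 0.1}
print(ovr_predict(scores))
```

With 24 classes, this scheme would mean 24 binary problems, one per hand sign.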

In [None]:
import pandas as pd
from sklearn.metrics import confusion_matrix, f1_score, accuracy_score
from sklearn.preprocessing import LabelEncoder
from sklearn.svm import SVC

# Load the training and test splits.
df = pd.read_csv('/kaggle/input/sign-language-mnist/sign_mnist_train.csv')
df_test = pd.read_csv('/kaggle/input/sign-language-mnist/sign_mnist_test.csv')

# Column 0 holds the class label; columns 1-784 hold the 28 x 28 pixel intensities.
x_train = df.iloc[:, 1:785].values
y_train = df.iloc[:, 0].values

x_test = df_test.iloc[:, 1:785].values
y_test = df_test.iloc[:, 0].values

# Map the raw labels to consecutive integers. The encoder is fitted on the
# training labels only, and the same mapping is reused for the test labels.
label_enc = LabelEncoder()
y_train = label_enc.fit_transform(y_train)
y_test = label_enc.transform(y_test)

# Note: scikit-learn's SVC always trains one-vs-one internally;
# decision_function_shape='ovr' only sets the shape of the decision_function output.
classifier = SVC(decision_function_shape='ovr')

classifier.fit(x_train, y_train)
y_pred = classifier.predict(x_test)

acc = accuracy_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred, average='micro')
cm = confusion_matrix(y_test, y_pred)

print(cm)
print(f1)
print(acc)
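
A label encoder learns its mapping from whatever labels it is fitted on, so fitting it separately on two label sets can assign different codes to the same class. The sketch below imitates the sorted-unique mapping that `LabelEncoder` builds, using plain Python. The helper names and the label values are hypothetical and only meant to show why one mapping should be shared between training and test labels.

```python
def fit_label_mapping(labels):
    """Map each distinct label to a consecutive integer code, in sorted order."""
    return {label: code for code, label in enumerate(sorted(set(labels)))}


def encode(labels, mapping):
    """Translate raw labels to integer codes using an existing mapping."""
    return [mapping[label] for label in labels]


# Hypothetical raw labels; the test split happens to lack some classes.
train_labels = [0, 3, 10, 24, 3]
test_labels = [10, 24, 24]

# Reusing the mapping learned on the training labels keeps codes consistent.
train_mapping = fit_label_mapping(train_labels)
consistent = encode(test_labels, train_mapping)

# Refitting on the test labels alone silently changes what each code means.
refit_mapping = fit_label_mapping(test_labels)
inconsistent = encode(test_labels, refit_mapping)

print(consistent)
print(inconsistent)
```

When both splits contain every class, the two approaches give the same codes, but sharing one mapping avoids relying on that.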

**K-NEAREST-NEIGHBOUR**

*KNN is an instance-based, or lazy, learning method. Fitting mostly amounts to storing the training data, so it takes less time to fit than many other classification methods, although the distance computations are deferred to prediction time, which can make predicting slow on large datasets. In this case our KNN classifier reaches approximately 60% accuracy. For the number of neighbours k I chose 165, since that is the closest odd number to the square root of the training dataset size.*
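
The choice of k can be reproduced with a few lines of arithmetic. This is only a sketch of the square-root rule of thumb described above. The helper name `nearest_odd_to_sqrt` is my own, and ties between two odd candidates are resolved towards the smaller one by assumption.

```python
import math


def nearest_odd_to_sqrt(n_samples):
    """Return the odd integer closest to sqrt(n_samples); ties go to the smaller one."""
    root = math.sqrt(n_samples)
    lower = math.floor(root)
    if lower % 2 == 0:
        lower -= 1          # step down to the nearest odd number below the root
    upper = lower + 2       # next odd number above
    return lower if root - lower <= upper - root else upper


n_train = 27455
print(math.sqrt(n_train))            # roughly 165.7
print(nearest_odd_to_sqrt(n_train))  # 165
```

An odd k is a common convention because it reduces the chance of tied votes, although with 24 classes ties are still possible.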

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix, f1_score, accuracy_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import LabelEncoder

# Load the training and test splits.
df = pd.read_csv('/kaggle/input/sign-language-mnist/sign_mnist_train.csv')
df_test = pd.read_csv('/kaggle/input/sign-language-mnist/sign_mnist_test.csv')

# Column 0 holds the class label; columns 1-784 hold the 28 x 28 pixel intensities.
x_train = df.iloc[:, 1:785].values
y_train = df.iloc[:, 0].values

x_test = df_test.iloc[:, 1:785].values
y_test = df_test.iloc[:, 0].values

# Plot the pixel intensities (x axis) against the pixel index (y axis)
# for the first five training images, one colour per image.
pixel_number = np.arange(0, 784, 1)
for row, colour in zip(x_train[:5], ['r', 'b', 'g', 'y', 'm']):
    plt.scatter(row, pixel_number, s=0.4, c=colour)
plt.show()

# Map the raw labels to consecutive integers. The encoder is fitted on the
# training labels only, and the same mapping is reused for the test labels.
label_enc = LabelEncoder()
y_train = label_enc.fit_transform(y_train)
y_test = label_enc.transform(y_test)

# k = 165 is the odd number closest to the square root of the training set size.
classifier = KNeighborsClassifier(n_neighbors=165)
classifier.fit(x_train, y_train)

y_pred = classifier.predict(x_test)
acc = accuracy_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred, average='micro')
cm = confusion_matrix(y_test, y_pred)
print(cm)
print(f1)
print(acc)
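
Both cells print a micro-averaged F1 score next to the accuracy. For single-label multiclass predictions these two numbers coincide, because every misclassified sample counts once as a false positive (for the predicted class) and once as a false negative (for the true class). The pure-Python check below illustrates this on made-up labels. The function names are my own, and this is not how scikit-learn computes the metrics internally.

```python
import math


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def micro_f1(y_true, y_pred):
    """F1 computed from true/false positives and negatives pooled over all classes."""
    tp = fp = fn = 0
    for c in set(y_true) | set(y_pred):
        for t, p in zip(y_true, y_pred):
            if p == c and t == c:
                tp += 1
            elif p == c:
                fp += 1
            elif t == c:
                fn += 1
    if tp == 0:
        return 0.0  # no correct predictions at all
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels: four of the six predictions are correct.
y_true = [0, 1, 2, 2, 1, 0]
y_pred = [0, 2, 2, 2, 1, 1]

print(accuracy(y_true, y_pred))
print(micro_f1(y_true, y_pred))
```

A macro-averaged F1 (`average='macro'` in `f1_score`) would give a genuinely different view, since it weights every class equally.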

**CONCLUSION**

*In the end, SVM is considerably more accurate than KNN at predicting images from the given test dataset (84% versus roughly 60%). Either way, both are worth trying on this kind of task before jumping to deep learning algorithms, since they are usually much faster to train, even though they are less accurate most of the time.*