# Hand Gesture recognition app in python

Importing necessary libraries

In [1]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
import numpy as np
import pandas as pd

Loading the dataset into the DataFrame

In [2]:
df = pd.read_csv('sign_mnist.csv')
df.head()

Unnamed: 0,label,pixel1,pixel2,pixel3,pixel4,pixel5,pixel6,pixel7,pixel8,pixel9,...,pixel775,pixel776,pixel777,pixel778,pixel779,pixel780,pixel781,pixel782,pixel783,pixel784
0,3,107,118,127,134,139,143,146,150,153,...,207,207,207,207,206,206,206,204,203,202
1,6,155,157,156,156,156,157,156,158,158,...,69,149,128,87,94,163,175,103,135,149
2,2,187,188,188,187,187,186,187,188,187,...,202,201,200,199,198,199,198,195,194,195
3,2,211,211,212,212,211,210,211,210,210,...,235,234,233,231,230,226,225,222,229,163
4,13,164,167,170,172,176,179,180,184,185,...,92,105,105,108,133,163,157,163,164,179


Checking number of columns and rows

In [3]:
print(df.shape)

(10000, 785)


Assigning Independent variables to x and dependent variable to y

In [4]:
x = df.iloc[:, 1:]
y = df.iloc[:, 0]

Demensions of Independent and Dependent variables

In [5]:
print(x.shape)
print(y.shape)

(10000, 784)
(10000,)


Principle Component Analysis (PCA) is used to reduce the demensions of the independent variable (x), since we dont know how many number of components to reduce simply following the variance - covaraince rule of PCA algorithm which is ranging between 0.95 to 0.99, this automatically reduces to the best demensions possible.

In [6]:
pca = PCA(n_components=0.95)
pca.fit(x)
x_pca = pca.transform(x)
print(x_pca.shape)

(10000, 112)


Our columns has been reduced from 784 to 112 after applying PCA

Splitting the dataset into training 80% and test sets into 20%

In [7]:
x_train, x_test, y_train, y_test = train_test_split(
    x_pca, y, test_size=0.2, random_state=42)

Standardization of independent variable x - mean to 0 and standard deviation to 1

In [8]:
scaler = StandardScaler()
x_train_scaled = scaler.fit_transform(x_train)
x_test_scaled = scaler.fit_transform(x_test)

## Stochastic Gradient Descent Classifier 

Here, we are using loss function as log which is better in Classification algorithms.

In [9]:
sgd = SGDClassifier(loss='log', shuffle=True, random_state=42)

Fitting the data and predicting the data using test sets and training sets

In [10]:
sgd.fit(x_train_scaled, y_train)
y_pred = sgd.predict(x_test_scaled)
y_train_pred = sgd.predict(x_train_scaled)

using sklearn.metrics we use accuracy_score function to get the percentage accuracy of model, normally we use confusion matrix for classification algorithms since we already standardiized our data we are using accuracy as our metrics.

In [11]:
acc_score_test = accuracy_score(
    y_test, y_pred, normalize=True, sample_weight=None)
acc_score_train = accuracy_score(
    y_train, y_train_pred, normalize=True, sample_weight=None)

In [12]:
print('Stochastic gradient descent performance metrics :')
print('Accuracy score for test set:', acc_score_test*100)
print('Accuracy score for train set:', acc_score_train*100)

Stochastic gradient descent performance metrics :
Accuracy score for test set: 88.35
Accuracy score for train set: 94.875


## Decision Tree Classifier

Since our columns are 112 after dimensionality reduction, max_depth  is the depth of tree or the level of the tree = n/2 which is 112/2 = 51, max_depth = 51

In [13]:
dtc = DecisionTreeClassifier(max_depth=51, random_state=42)
dtc.fit(x_train_scaled, y_train)
y_pred_dtc = dtc.predict(x_test_scaled)
y_train_pred_dtc = dtc.predict(x_train_scaled)

print('Decision tree classifier performance metrics :')
print('Accuracy score for test set:', accuracy_score(y_test, y_pred_dtc)*100)
print('Accuracy score for train set:',
      accuracy_score(y_train, y_train_pred_dtc)*100)

Decision tree classifier performance metrics :
Accuracy score for test set: 88.44999999999999
Accuracy score for train set: 100.0


## Random Forest Classifier

Here, n_estimators is the number of different trees we want to create and train the model therefore n_estimators = 10 means we want 10 trees and max_depth = 51 is same as before which we used.

In [14]:
rfc = RandomForestClassifier(n_estimators=10, max_depth=51, random_state=42)
rfc.fit(x_train_scaled, y_train)
y_pred_rfc = rfc.predict(x_test_scaled)
y_train_pred_rfc = rfc.predict(x_train_scaled)

print('Random Forest Classifier performance metrics :')
print('Accuracy score for test set:', accuracy_score(y_test, y_pred_rfc)*100)
print('Accuracy score for train set:',
      accuracy_score(y_train, y_train_pred_rfc)*100)

Random Forest Classifier performance metrics :
Accuracy score for test set: 99.65
Accuracy score for train set: 99.9875


### Finally, comparing our performance metrics of 3 different Classifiers

Stochastic gradient descent has 88.35 %

Decision tree classifier has 88.44 %

Random Forest Classifier has 99.65 %



Since, Random Forest Classifier has highest accuracy of 99.65 %. Out of the 3 Classifiers, we could say Random Forest Classifiers is the best.