# Sign Language Classification
Welcome to this notebook on the Sign Language MNIST dataset. We are given a set of images depicting sign language hand gestures and use them to train a model which predicts the letter each gesture represents. This prediction is useful because it can help deaf people communicate with others who do not know these gestures.

In [None]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as plt
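from matplotlib.image import imread  # returns RGB; cv2.imread returns BGR, which imshow would display with swapped colours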
from sklearn.model_selection import train_test_split
from keras.models import Sequential
from keras.utils import to_categorical
from keras.preprocessing.image import ImageDataGenerator
from keras.layers import Conv2D, MaxPooling2D, Dropout, BatchNormalization, Dense, Flatten

# List every file available under the read-only /kaggle/input directory
import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

In [None]:
train = pd.read_csv('../input/sign-language-mnist/sign_mnist_train/sign_mnist_train.csv')
test = pd.read_csv('../input/sign-language-mnist/sign_mnist_test/sign_mnist_test.csv')

# Visualisation
Before we begin preparing the data, we will look at the different sign gestures and how they appear. Below are three images which show the gestures and their corresponding letters of the alphabet.
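
As a small aid for reading the labels against these charts, here is a minimal sketch of a label-to-letter mapping. It assumes, based on my understanding of the dataset description, that label 0 is A, label 1 is B and so on, and that labels 9 (J) and 25 (Z) never occur because those letters involve motion. `label_to_letter` is a hypothetical helper, not part of the dataset or of any library.

```python
def label_to_letter(label):
    """Map a numeric label to its letter, assuming 0 = A, 1 = B, and so on."""
    return chr(ord('A') + label)

# Assumption: J (9) and Z (25) are absent because they require motion
letters = [label_to_letter(i) for i in range(26) if i not in (9, 25)]
print(letters)
```

If the assumption holds, this leaves 24 letters, with the largest label being 24 (Y).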

In [None]:
fig, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize=(20, 20))

ax1.imshow(imread('../input/sign-language-mnist/amer_sign2.png'))
ax1.axis('off')

ax2.imshow(imread('../input/sign-language-mnist/amer_sign3.png'))
ax2.axis('off')

ax3.imshow(imread('../input/sign-language-mnist/american_sign_language.PNG'))
ax3.axis('off')

plt.show()

# Feature engineering
The first step towards creating a model for this dataset is preparing the features: each row of the X data holds 784 pixel values, which we reshape into a 28x28 single-channel image that can be fed into the neural network. We will also one-hot encode the y data using `to_categorical`.
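
To illustrate both steps on a tiny scale, here is a minimal NumPy sketch. The arrays `flat` and `labels` are made-up stand-ins for the CSV data, and the `np.eye` indexing trick is only an assumed equivalent of what one-hot encoding produces, not the Keras implementation.

```python
import numpy as np

# Two fake samples standing in for CSV rows of 784 pixel values each
flat = np.arange(2 * 784).reshape(2, 784)
labels = np.array([3, 24])

# -1 lets NumPy infer the number of samples; the last axis is the single grey channel
images = flat.reshape(-1, 28, 28, 1)

# One-hot encoding: row i has a 1 in column labels[i] and 0 everywhere else
one_hot = np.eye(25)[labels]

print(images.shape, one_hot.shape)
```

Each group of 28 consecutive pixel values becomes one image row, so pixel 28 of a sample ends up at row 1, column 0 of its image.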

In [None]:
train

In [None]:
test

In [None]:
X_train = train.drop('label', axis=1)
y_train = train['label']

X_test = test.drop('label', axis=1)
y_test = test['label']

In [None]:
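# Reshape each 784-pixel row into a 28x28 single-channel image; -1 infers the sample count
X_train = np.array(X_train).reshape(-1, 28, 28, 1)
X_test = np.array(X_test).reshape(-1, 28, 28, 1)
# One-hot encode the labels; fixing num_classes keeps both sets at 25 columns (labels run 0-24)
y_train = to_categorical(y_train, num_classes=25)
y_test = to_categorical(y_test, num_classes=25)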

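Dividing the train and test sets by 255 scales the pixel values from the range 0-255 down to the range 0-1. This normalisation is an important step because it generally helps the network train more stably, which can improve the accuracy of the model.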
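
As a quick check of what this division does, here is a minimal sketch on three made-up pixel values (`pixels` is illustrative data, not taken from the dataset):

```python
import numpy as np

# Illustrative 8-bit pixel values: black, mid-grey and white
pixels = np.array([[0, 128, 255]], dtype=np.uint8)

# Dividing by a float converts the result to floating point in the range [0, 1]
scaled = pixels / 255.0

print(scaled.dtype, scaled.min(), scaled.max())
```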

In [None]:
X_train = X_train / 255.0
X_test = X_test / 255.0

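Now we wrap the data in an `ImageDataGenerator`, which can create random augmentations of the images (for example rotations, shifts and zooms) as it batches them. Augmentation is useful as it can lessen the impact which non-relevant features of the images have upon our predictor and it provides more varied data for it to use. Note that the generator below is created with its default arguments, so it applies no transformations and simply yields batches of 64; options such as `rotation_range` or `width_shift_range` would have to be passed to enable augmentation, and the test data should then go through a separate, unaugmented generator.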
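
To make the idea of a random augmentation concrete, here is a minimal NumPy sketch of one such transformation, a small random shift with zero-filled borders. `random_shift` is a hypothetical helper written for illustration; it is not part of Keras and not how `ImageDataGenerator` is implemented.

```python
import numpy as np

def random_shift(images, max_shift=2, seed=0):
    """Shift each image by a random offset of up to max_shift pixels, zero-filling the border."""
    rng = np.random.default_rng(seed)
    n, h, w, _ = images.shape
    # Pad height and width only, leaving the batch and channel axes untouched
    pad = ((0, 0), (max_shift, max_shift), (max_shift, max_shift), (0, 0))
    padded = np.pad(images, pad)
    out = np.empty_like(images)
    for i in range(n):
        # Pick a random crop window inside the padded image
        dy, dx = rng.integers(0, 2 * max_shift + 1, size=2)
        out[i] = padded[i, dy:dy + h, dx:dx + w, :]
    return out

# Made-up batch of four 28x28 grey images
batch = np.random.default_rng(1).random((4, 28, 28, 1))
augmented = random_shift(batch)
print(augmented.shape)
```

The output keeps the input shape, so the shifted images could be fed to the same network unchanged.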

In [None]:
idg = ImageDataGenerator()
train_gen = idg.flow(X_train, y_train, batch_size=64)
test_gen = idg.flow(X_test, y_test, batch_size=64)

# Classification
Finally, we create the ConvNet model which is used to classify the sign gesture images.
#### It has:
* 2 convolutional layers with 32 filters each
* 2 convolutional layers with 64 filters each
* a batch normalisation layer after every convolutional layer
* 2 max pooling layers
* 2 dropout layers with a rate of 0.2
* a dense hidden layer with 128 units
* a relu activation function in every hidden layer
* a dense output layer with 25 units and softmax activation
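
To see how the image size changes through this stack, here is a minimal sketch that traces the shapes by hand. It assumes 3x3 kernels with stride 1 and no padding (the Keras `Conv2D` default of `padding='valid'`) and 2x2 pooling whose stride equals the pool size, as in the model code in this notebook. `conv_out` and `pool_out` are hypothetical helpers, not Keras functions.

```python
def conv_out(size, kernel=3, stride=1):
    """Output size along one axis of a convolution with no padding."""
    return (size - kernel) // stride + 1

def pool_out(size, pool=2):
    """Output size along one axis of max pooling with stride equal to the pool size."""
    return size // pool

size = 28
shapes = []
for filters in (32, 64):
    # Two convolutions followed by one pooling layer per block
    size = conv_out(size)
    shapes.append((size, size, filters))
    size = conv_out(size)
    shapes.append((size, size, filters))
    size = pool_out(size)
    shapes.append((size, size, filters))

# Number of values handed to the first dense layer after flattening
flattened = size * size * 64

for shape in shapes:
    print(shape)
print(flattened)
```

Batch normalisation and dropout are left out of the trace because they do not change the shape of their input.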

In [None]:
model = Sequential()

model.add(Conv2D(32, kernel_size=(3, 3), strides=(1, 1), activation='relu', input_shape=(28, 28, 1)))
model.add(BatchNormalization())
model.add(Conv2D(32, kernel_size=(3, 3), strides=(1, 1), activation='relu'))
model.add(BatchNormalization())

model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.2))

model.add(Conv2D(64, kernel_size=(3, 3), strides=(1, 1), activation='relu'))
model.add(BatchNormalization())
model.add(Conv2D(64, kernel_size=(3, 3), strides=(1, 1), activation='relu'))
model.add(BatchNormalization())

model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.2))

model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dense(25, activation='softmax'))

The model is compiled using a categorical crossentropy loss, a categorical accuracy metric and the Adam optimizer. It is then trained for 10 epochs, with the test set used as validation data.
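
For intuition about what the loss and the metric measure, here is a minimal NumPy sketch computed on two made-up predictions over three classes. The function bodies are an assumed plain-NumPy equivalent of the standard definitions, not the Keras implementations, and `eps` is an illustrative clipping constant.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean negative log-probability assigned to the true class."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)  # avoid log(0)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=1)))

def categorical_accuracy(y_true, y_pred):
    """Fraction of samples whose most probable class matches the true class."""
    return float(np.mean(np.argmax(y_true, axis=1) == np.argmax(y_pred, axis=1)))

# Made-up one-hot targets and softmax-style outputs
y_true = np.array([[1, 0, 0], [0, 1, 0]], dtype=float)
y_pred = np.array([[0.7, 0.2, 0.1], [0.6, 0.3, 0.1]])

loss = categorical_crossentropy(y_true, y_pred)
acc = categorical_accuracy(y_true, y_pred)
print(loss, acc)
```

The first sample is classified correctly and the second is not, so the accuracy is one half, while the loss penalises the second sample more heavily because only 0.3 of the probability went to its true class.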

In [None]:
model.compile(loss='categorical_crossentropy', metrics=['categorical_accuracy'], optimizer='adam')
# fit_generator is deprecated in recent Keras releases; fit accepts generators directly
history = model.fit(train_gen, validation_data=test_gen, epochs=10)

Here are the accuracies and losses from the ConvNet:

In [None]:
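results = history.history

# Plot each recorded metric (training and validation loss and accuracy) against the epoch
for metric in results:
    plt.plot(results[metric])
    plt.title(metric + ' over epochs')
    plt.ylabel(metric)
    plt.xlabel('epochs')
    plt.show()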

## Thank you for reading my notebook.