# Introduction

**This notebook uses the Sign Language MNIST dataset and implements an approach based on autoencoders.**

![](https://storage.googleapis.com/kagglesdsdata/datasets%2F3258%2F5337%2Famer_sign2.png?GoogleAccessId=databundle-worker-v2@kaggle-161607.iam.gserviceaccount.com&Expires=1596903587&Signature=IOtoOmYCiLXST2%2FX%2BrOp13lJNdRF%2FyEKXh8JDDUmIP%2FpM%2BpBzOs4SPAwqBdyoVDwIePM6UmiZzf6fhCRgOKYv2DZpkqTtyxRLRhS3saS3rEi%2BpnJH2Y%2F%2Bo6sfLZeV7yjiHhazWNlpq4UVxEHh11zLeHISfR93xWcba2dNRYoillLROPWpFs5fu8N1W6m9TvLfuO3dBkrMJRD%2Fj8j%2BLvduoCDmBAnDCSVadjdBpKVsrBRCsFctC5XDt79YmsGKxAX8lXQBN%2BLKZwZ0%2FlpP%2F%2BXSuEpqMp4cGartmwGBYLLVPfTJ0s6Pe9BHCp1EYmUJUOFZsRFd3Cy5yDLDmXqhLogMA%3D%3D)

In [None]:
import os

import numpy as np   # linear algebra
import pandas as pd  # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as plt
import seaborn as sns
import cv2
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import *  # provides Dense and the other Keras layers used below
from sklearn.model_selection import train_test_split

# Kaggle environment notes:
# - input data files are in the read-only "../input/" directory
# - up to 5GB written to /kaggle/working/ is preserved as output when you save a version with "Save & Run All"
# - /kaggle/temp/ can hold temporary files, but they are not saved outside the current session

# Loading Dataset

In [None]:
train_df = pd.read_csv('../input/sign-language-mnist/sign_mnist_train/sign_mnist_train.csv')
train_df.shape

In [None]:
train_df.head()

In [None]:
test_df = pd.read_csv('../input/sign-language-mnist/sign_mnist_test/sign_mnist_test.csv')
test_df.shape

In [None]:
test_df.head()

**Checking for missing values**

In [None]:
train_df.isnull().sum()

In [None]:
test_df.isnull().sum()

**Neither dataset has missing values.**

In [None]:
train_df.dtypes

**Every column holds integer values. Next, let's check the label column.**

In [None]:
train_df['label'].values

**The label column holds integer codes for categorical (nominal) classes, so we will binarize it later.**

**Let's check how often each label occurs.**

In [None]:
labels = train_df['label'].values

In [None]:
unique_set = np.unique(labels)  # sorted array of the distinct label values

In [None]:
plt.figure(figsize=(10,10))
sns.set(style="darkgrid")
sns.countplot(y=labels, data=train_df, palette='Set2')
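**The countplot can also be cross-checked numerically. Below is a minimal sketch using np.unique with return_counts=True. toy_labels is a hypothetical stand-in for train_df['label'].values, and the letter mapping assumes the dataset's convention of 0 = A through 25 = Z (the letters J = 9 and Z = 25 never occur because those signs involve motion).**

```python
import numpy as np

# Hypothetical stand-in for train_df['label'].values
toy_labels = np.array([3, 0, 3, 24, 10, 3, 0])

# Distinct labels (sorted ascending) and how many times each occurs
classes, counts = np.unique(toy_labels, return_counts=True)

# Assumed mapping: label 0 -> 'A', 1 -> 'B', ..., 25 -> 'Z'
letters = [chr(ord('A') + int(c)) for c in classes]

for letter, count in zip(letters, counts):
    print(letter, count)
```

**On the real data, a strongly uneven count would suggest stratifying the train/test split; the countplot above shows the same information graphically.**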

In [None]:
train_df.drop(['label'],axis=1,inplace=True)

# Displaying Images

In [None]:
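img = cv2.imread('../input/sign-language-mnist/amer_sign2.png')
# OpenCV loads images in BGR channel order; convert to RGB so matplotlib shows the true colors
plt.imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))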

In [None]:
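img = cv2.imread('../input/sign-language-mnist/american_sign_language.PNG')
# OpenCV loads images in BGR channel order; convert to RGB so matplotlib shows the true colors
plt.imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))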

**Let's display an image from the training data.**

In [None]:
# Each row of train_df is one 28x28 grayscale image flattened into 784 pixel values
images = train_df.values

In [None]:
plt.imshow(images[0].reshape(28, 28))

# LabelBinarizer

**Binarize labels in a one-vs-all fashion**

**Several regression and binary classification algorithms are available in scikit-learn. A simple way to extend these algorithms to the multi-class classification case is to use the so-called one-vs-all scheme.**

**At learning time, this simply consists of learning one regressor or binary classifier per class. To do so, the multi-class labels must be converted to binary labels (belongs or does not belong to the class). LabelBinarizer makes this easy with its transform method.**

**Here the labels are categorical (nominal), so we use LabelBinarizer to turn each one into a one-hot vector.**

In [None]:
from sklearn.preprocessing import LabelBinarizer

lb = LabelBinarizer()
labels = lb.fit_transform(labels)

**Let's see how the data looks like now**

In [None]:
labels[:5]
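**To see what LabelBinarizer does on a small scale, here is a sketch on a handful of made-up labels (toy_labels and lb_demo are illustrative names, not part of the notebook). Each row gets one column per class, ordered as in classes_, with a 1 in the column of its own class.**

```python
import numpy as np
from sklearn.preprocessing import LabelBinarizer

# Made-up labels standing in for the dataset's label column
toy_labels = np.array([3, 0, 3, 24, 10])

lb_demo = LabelBinarizer()
onehot = lb_demo.fit_transform(toy_labels)

# classes_ holds the sorted distinct labels: 0, 3, 10, 24
print(lb_demo.classes_)
# One row per sample, one column per class in classes_ order:
# [[0 1 0 0]
#  [1 0 0 0]
#  [0 1 0 0]
#  [0 0 0 1]
#  [0 0 1 0]]
print(onehot)

# inverse_transform maps one-hot rows back to the original labels
recovered = lb_demo.inverse_transform(onehot)
print(recovered)
```

**One caveat: with exactly two classes, LabelBinarizer returns a single 0/1 column rather than two one-hot columns.**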

# Model Development

In [None]:
x_train, x_test, y_train, y_test = train_test_split(images, labels, test_size=0.2, random_state=140)

print('Training Data shape : ',x_train.shape,  y_train.shape)
print('Testing Data shape : ',x_test.shape,  y_test.shape)

In [None]:
batch_size=256
EPOCHS = 50

**Casting, reshaping and normalizing x_train and x_test**

In [None]:
x_train, x_test = x_train.astype(np.float32), x_test.astype(np.float32)
# Ensure each image is a 1-D vector of 784 features (28*28); the rows are already flat, so this is a safety check.
x_train, x_test = x_train.reshape([-1, 784]), x_test.reshape([-1, 784])
# Normalize pixel values from [0, 255] to [0, 1].
x_train = x_train / 255.
x_test = x_test / 255.

In [None]:
plt.imshow(x_train[0].reshape(28, 28))  # each image has 784 = 28 x 28 pixels
plt.axis('off')

# Let's start with autoencoders!

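**An autoencoder is a type of artificial neural network used to learn efficient data codings in an unsupervised manner. It is trained to reconstruct its own input after squeezing it through a narrow bottleneck layer, so the bottleneck has to capture the most important structure in the data. The aim is to learn a compact representation of the data, typically for dimensionality reduction.**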

![](https://camo.githubusercontent.com/5017c87b396b2745f13a289f913b037ea8352d50/687474703a2f2f64726976652e676f6f676c652e636f6d2f75633f6578706f72743d766965772669643d3171546b5178373661424a4e68736b334954725542456644517a4451747a4d6330)

**Here are some links that give a quick introduction to autoencoders:**
1. https://www.youtube.com/watch?v=H1AllrJ-_30
2. https://www.youtube.com/watch?v=7mRfwaGGAPg
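**The model below is nonlinear (stacked ReLU layers), but the encode, bottleneck, decode idea is easiest to see in the linear special case. A linear autoencoder trained with MSE learns the same subspace as the top principal components, so this sketch stands in PCA computed by SVD for the trained weights. The synthetic data and the helper make_linear_autoencoder are illustrative assumptions, not part of the notebook.**

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: 500 points in 20-D that really live near a 2-D subspace
true_codes = rng.normal(size=(500, 2))
mixing = rng.normal(size=(2, 20))
X = true_codes @ mixing + 0.01 * rng.normal(size=(500, 20))

# Center the data; the principal directions are the top right singular vectors
mean = X.mean(axis=0)
_, _, Vt = np.linalg.svd(X - mean, full_matrices=False)


def make_linear_autoencoder(k):
    """Hypothetical helper: PCA-based linear encoder/decoder with a k-D bottleneck."""
    W = Vt[:k].T  # (20, k) encoder weights; the decoder uses the transpose

    def encode(x):
        return (x - mean) @ W  # 20-D input -> k-D code

    def decode(z):
        return z @ W.T + mean  # k-D code -> 20-D reconstruction

    return encode, decode


errors = {}
for k in (1, 2):
    encode, decode = make_linear_autoencoder(k)
    codes = encode(X)
    errors[k] = np.mean((X - decode(codes)) ** 2)
    print(f"k={k}: code shape {codes.shape}, reconstruction MSE {errors[k]:.6f}")
```

**A 1-D bottleneck discards real structure and reconstructs poorly, while a 2-D bottleneck matching the data's true dimensionality leaves only the small noise. The deep ReLU autoencoder below plays the same game with a 2-D latent vector, but can learn curved rather than flat low-dimensional structure.**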

In [None]:
input_img = tf.keras.layers.Input(shape=(784,), name = "input")

# this is the encoded representation of the input
encoded = Dense(1024, activation='relu', name="emb_0")(input_img)
encoded = Dense(512, activation='relu', name="emb_1")(encoded)
encoded = Dense(256, activation='relu', name="emb_2")(encoded)
encoded = Dense(128, activation='relu', name="emb_3")(encoded)
encoded = Dense(64, activation='relu', name="emb_4")(encoded)
encoded = Dense(16, activation='relu', name="emb_5")(encoded)
latent_vector = Dense(2, activation='relu', name="latent_vector")(encoded)

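**Stacking several layers lets the network compress the 784 pixels into the 2-D latent vector gradually and nonlinearly. Extra depth does not automatically improve performance, though: it adds parameters and can make training harder, so depth is a design choice to tune.**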

In [None]:
# decoder: reconstructs the 784-pixel input from the latent vector
decoded = Dense(16, activation='relu', name="dec_1")(latent_vector)
decoded = Dense(64, activation='relu', name="dec_3")(decoded)
decoded = Dense(128, activation='relu', name="dec_4")(decoded)
decoded = Dense(256, activation='relu', name="dec_5")(decoded)
decoded = Dense(512, activation='relu', name="dec_6")(decoded)
decoded = Dense(1024, activation='relu', name="dec_7")(decoded)

output_layer = Dense(784, activation = 'sigmoid', name="output")(decoded)

In [None]:
autoencoder = tf.keras.models.Model(input_img, output_layer)

**Let's see the model**

In [None]:
autoencoder.summary()
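**As a cross-check on the summary: a Dense layer with n_in inputs and n_out units has n_in * n_out weights plus n_out biases (Keras uses a bias by default). The layer widths below are copied from the encoder and decoder cells above; the total should match the "Total params" line of autoencoder.summary(), and the encoder part should match encoder.summary().**

```python
def dense_params(n_in, n_out):
    # weight matrix (n_in x n_out) plus one bias per output unit
    return n_in * n_out + n_out


def stack_params(sizes):
    # sum over consecutive (input width, output width) pairs
    return sum(dense_params(a, b) for a, b in zip(sizes[:-1], sizes[1:]))


encoder_sizes = [784, 1024, 512, 256, 128, 64, 16, 2]
decoder_sizes = [2, 16, 64, 128, 256, 512, 1024, 784]

enc = stack_params(encoder_sizes)
dec = stack_params(decoder_sizes)
print(enc, dec, enc + dec)  # 1502194 1502976 3005170
```

**Roughly half of the roughly 3 million parameters sit in the two outermost layers (784 to 1024 and 1024 to 784), which is typical for fully connected autoencoders on raw pixels.**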

**Let's create a separate Encoder Model as well**

In [None]:
encoder = tf.keras.models.Model(input_img, latent_vector)
encoder.summary()

# Autoencoder Model 

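**Unlike a classifier, the autoencoder is trained without labels: the input and the target are the same array (x_train, x_train), and x_test serves as both input and target for validation. The binarized labels (y_train, y_test) are not used here.**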

In [None]:
autoencoder.compile(optimizer='adam', loss='mse')
auto_history = autoencoder.fit(x_train, x_train, epochs=EPOCHS, batch_size=batch_size,validation_data=(x_test, x_test))
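**The loss='mse' setting measures per-pixel reconstruction error: the squared differences are averaged over the pixels of each image, and the per-image values are averaged over the batch. A small numpy sketch on two made-up 4-pixel "images" (x and x_hat are illustrative values, not model output):**

```python
import numpy as np

# Two toy "images" of 4 pixels each and their hypothetical reconstructions
x = np.array([[0.0, 0.5, 1.0, 1.0],
              [0.2, 0.2, 0.0, 0.6]])
x_hat = np.array([[0.1, 0.5, 0.8, 1.0],
                  [0.2, 0.0, 0.0, 0.6]])

# Mean squared error per image (averaged over the last axis, i.e. the pixels)
per_sample = np.mean((x - x_hat) ** 2, axis=-1)

# Averaged over the batch: the value reported as the training loss
loss = per_sample.mean()
print(per_sample, loss)
```

**Here the first image has errors 0.1 and 0.2 on two pixels (0.05 / 4 = 0.0125), the second an error of 0.2 on one pixel (0.04 / 4 = 0.01), giving a loss of 0.01125.**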

In [None]:
decoded_imgs = autoencoder.predict(x_test)

**Let's compare the original test images (top row) with their reconstructions from the autoencoder (bottom row).**

In [None]:
n = 10 
plt.figure(figsize=(20, 4))
for i in range(n):
    # display original
    ax = plt.subplot(2, n, i + 1)
    plt.imshow(x_test[i].reshape(28, 28))
    plt.gray()
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)
    
    # display reconstruction
    ax = plt.subplot(2, n, i + 1 + n)
    plt.imshow(decoded_imgs[i].reshape(28, 28))
    plt.gray()
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)

plt.show()
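**Eyeballing the first ten images gives a rough impression; ranking all test images by their own reconstruction error shows where the 2-D bottleneck struggles most. A minimal sketch, assuming random stand-in arrays x_test_demo and decoded_demo in place of x_test and decoded_imgs so that it runs on its own:**

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for x_test and decoded_imgs (n_images x 784, values in [0, 1])
x_test_demo = rng.random((100, 784))
decoded_demo = np.clip(x_test_demo + 0.05 * rng.normal(size=x_test_demo.shape), 0, 1)

# One MSE value per image, averaged over its 784 pixels
per_image_mse = np.mean((x_test_demo - decoded_demo) ** 2, axis=1)

order = np.argsort(per_image_mse)  # ascending error
best = order[:10]                  # indices of the 10 best reconstructions
worst = order[::-1][:10]           # indices of the 10 worst reconstructions

print(per_image_mse.mean(), worst)
```

**With the real arrays, the indices in worst can be fed into the plotting loop above (x_test[idx] and decoded_imgs[idx]) to display the hardest cases.**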