# Examples for Homework 5.2: Convolutional Neural Networks and ASL
Dartmouth College, LING48/CS72, Spring 2023<br>
Rolando Coto-Solano (Rolando.A.Coto.Solano@dartmouth.edu)

A convolutional neural network (ConvNet/CNN) is a type of neural network designed to work with visual data such as images. The code in this notebook comes from this URL:
https://github.com/samurainote/CNN_for_Sign_Language_Images/blob/master/CNN_for_Sign_Language_Images.ipynb

In this program, we use a CNN to learn 6 signs from ASL fingerspelling (a way to import words from other languages, such as English). The training set
has approximately 1100 different pictures for each sign.
Each picture is stored as the grayscale values of its 784
pixels (28×28). The training set also contains the gold labels for each
picture (a=0, b=1, c=2, d=3, e=4, f=5). The testing set has 2063 pictures
in total (331 'a', 432 'b', 310 'c', 245 'd',
498 'e' and 247 'f'), in the same format as the training set. The
original information (with pictures for all the ASL signs) comes from here:
https://www.kaggle.com/datamunge/sign-language-mnist
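Each row of the CSV files holds one picture: the `label` column followed by 784 pixel columns. The sketch below shows how one row turns back into a 28×28 picture, assuming the pixels are stored row by row (the same assumption the `reshape` calls in this notebook make). The row is made up from random values, and `LETTERS` is a hypothetical helper list, not something stored in the data files.

```python
import numpy as np

# Hypothetical label-to-letter list, following the a=0 ... f=5 convention above
LETTERS = ["a", "b", "c", "d", "e", "f"]

# A made-up row: one label (2, i.e. 'c') followed by 784 grayscale values (0-255)
rng = np.random.default_rng(0)
row = np.concatenate(([2], rng.integers(0, 256, size=784)))

label = int(row[0])
pixels = row[1:]

# Assuming row-by-row storage, reshaping gives back the 28x28 picture
image = pixels.reshape(28, 28)

print(LETTERS[label], image.shape)  # c (28, 28)
```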
 
There are many good sites where you can learn the intuitions behind convolutional networks. These are some examples:

(1) https://www.cs.ryerson.ca/~aharley/vis/conv/<br>
(2) https://ujjwalkarn.me/2016/08/11/intuitive-explanation-convnets/<br>
(3) https://www.youtube.com/watch?v=iaSUYvmCekI<br>
(4) https://adeshpande3.github.io/A-Beginner%27s-Guide-To-Understanding-Convolutional-Neural-Networks/<br>
(5) https://www.freecodecamp.org/news/an-intuitive-guide-to-convolutional-neural-networks-260c2de0a050/

You need to perform three tasks:

(1) Study the links above and explain the code in the section *Convolutional Neural Network Structure* below. What are the elements of this network? What kinds of layers does it have? What is a kernel? What is a filter? What is pooling? Explain all of these as simply and plainly as you can.

(2) Run the program. Right now it's set to perform one epoch of training. How is the network behaving after one epoch of training? (Report this based on the accuracy, the precision and the recall for each of the letters).

(3) Change the program so that it runs five epochs. How is the network behaving after five epochs of training? How have the values of accuracy, precision and recall changed for the fingerspelled ASL letters?
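For task (1), it may help to see a kernel and a pooling step on actual numbers. A kernel is a small grid of weights (3×3 here, as in `kernel_size=(3, 3)`) that slides across the picture; at each position it multiplies the pixels under it by its weights and adds up the results, producing a feature map. In Keras terms, each of the `filters=32` filters is one learned kernel (one 3×3 slice per input channel, plus a bias), so the first layer produces 32 feature maps. Max pooling shrinks a feature map by keeping only the largest value in each 2×2 window. The following is a toy, plain-NumPy sketch with a hand-made edge-detecting kernel; in the real network the kernel weights are learned during training, and biases and multiple channels are left out here. Like Keras, it slides the kernel without flipping it.

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1), like one Conv2D filter."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Multiply the patch under the kernel element by element and add it up
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(feature_map, size=2):
    """Keep the largest value in each non-overlapping size x size window."""
    h = feature_map.shape[0] // size
    w = feature_map.shape[1] // size
    trimmed = feature_map[:h * size, :w * size]
    return trimmed.reshape(h, size, w, size).max(axis=(1, 3))

# A toy 6x6 "picture": dark on the left, bright on the right
image = np.array([[0, 0, 0, 9, 9, 9]] * 6, dtype=float)

# A hand-made 3x3 kernel that responds to vertical dark-to-bright edges
kernel = np.array([[-1, 0, 1],
                   [-1, 0, 1],
                   [-1, 0, 1]], dtype=float)

feature_map = convolve2d(image, kernel)         # 4x4: strong values where the edge is
pooled = max_pool(np.maximum(feature_map, 0))   # ReLU, then 2x2 max pooling -> 2x2
print(feature_map)
print(pooled)
```

Every row of `feature_map` is `[0, 27, 27, 0]`: the kernel only "fires" where the dark and bright halves meet, which is the kind of local pattern a learned filter picks up.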

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import gdown

from tensorflow import keras
from keras.models import Sequential
from keras.layers import Dense, Conv2D, MaxPooling2D, Flatten, Dropout
from sklearn.metrics import confusion_matrix

In [None]:
# Download ASL data

url = "https://drive.google.com/uc?id=19AGBPZdufJbwB8JOs9Ej0MXiwy_m-j-Q"
output = 'sign-test-a-f.csv'
gdown.download(url, output, quiet=False)

url = "https://drive.google.com/uc?id=1BYZCq6JqHUxuXHA4udF_iZHwXBNqvNuA"
output = 'sign-train-a-f.csv'
gdown.download(url, output, quiet=False)

In [None]:
# Load ASL data
train = pd.read_csv("sign-train-a-f.csv")
test = pd.read_csv("sign-test-a-f.csv")

In [None]:
# Separate the gold labels from the pixel values

totalSamplesTraining = len(train)
totalSamplesTesting  = len(test)

train_T = train["label"]
train.drop("label", axis=1, inplace=True)

classes = len(train_T.unique())
print("Number of classes:", classes)

test_T = test["label"]
test.drop("label", axis=1, inplace=True)

# Convolutional Neural Network Structure
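Before reading the code, it can help to trace how the 28×28 picture shrinks as it passes through the layers. Each 3×3 convolution without padding (the Keras default, `padding="valid"`) removes one pixel from every border, and each 2×2 max pooling halves the size, rounding down. This small calculation assumes `classes == 6`; its layer shapes and parameter total should agree with what `model.summary()` prints for the model below.

```python
def conv_out(size, kernel=3):
    # "valid" convolution: the 3x3 kernel must fit entirely inside the input
    return size - kernel + 1

def pool_out(size, pool=2):
    # 2x2 max pooling with stride 2; a leftover odd row/column is dropped
    return size // pool

size, channels, params = 28, 1, 0
for filters in (32, 64, 64):
    # Each filter has 3*3*channels weights plus one bias
    params += (3 * 3 * channels + 1) * filters
    size = conv_out(size)
    print(f"Conv2D({filters}): {size}x{size}x{filters}")
    size = pool_out(size)
    print(f"MaxPooling2D: {size}x{size}x{filters}")
    channels = filters

flat = size * size * channels   # Flatten
params += (flat + 1) * 128      # Dense(units=128)
params += (128 + 1) * 6         # Dense(units=classes), assuming classes == 6
print(f"Flatten: {flat} values; trainable parameters: {params}")  # 64 values; 64838
```

By the last pooling layer the picture has been reduced to 1×1×64, so `Flatten` hands only 64 numbers to the dense layers.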

In [None]:
# Convolutional Neural Network Structure
model = Sequential()
model.add(Conv2D(filters=32, kernel_size=(3, 3), activation="relu", input_shape=(28,28,1)))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(filters=64, kernel_size=(3, 3), activation="relu"))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(filters=64, kernel_size=(3, 3), activation="relu"))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(units=128, activation="relu"))
model.add(Dropout(rate=0.2))
model.add(Dense(units=classes, activation="softmax"))

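The remaining pieces of the network are simple functions. `relu` sets negative values to zero; `Dropout(rate=0.2)` randomly switches off 20% of the values during training only (Keras scales the surviving values by 1 / (1 - rate) so their expected total stays the same); and the final `softmax` turns six scores into six probabilities, one per letter, that add up to 1. A plain-NumPy sketch of the three, with made-up inputs:

```python
import numpy as np

def relu(x):
    # Negative values become 0; positive values pass through unchanged
    return np.maximum(x, 0)

def softmax(z):
    # Subtract the max for numerical stability; the result sums to 1
    e = np.exp(z - np.max(z))
    return e / e.sum()

def dropout(x, rate, rng):
    # "Inverted" dropout, used only while training: zero each value with
    # probability `rate` and scale the survivors by 1 / (1 - rate)
    keep = rng.random(x.shape) >= rate
    return x * keep / (1.0 - rate)

rng = np.random.default_rng(0)
hidden = relu(np.array([-2.0, 0.5, 3.0, -0.1]))
print(hidden)
print(dropout(hidden, rate=0.2, rng=rng))

# Six made-up scores, one per letter a-f
probs = softmax(np.array([1.0, 2.0, 0.5, 0.1, 3.0, 0.2]))
print(probs.round(3), probs.sum())
print("predicted class:", int(np.argmax(probs)))  # 4, i.e. 'e'
```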
In [None]:
# Compile the model
model.compile(optimizer=keras.optimizers.Adam(),
              loss="categorical_crossentropy",
              metrics=['accuracy'])

# One-hot encode the gold labels, then reshape each row of 784 pixel
# values into a 28x28x1 image (height, width, one grayscale channel).

from sklearn.preprocessing import LabelBinarizer
label_binarizer = LabelBinarizer()
train_labels = label_binarizer.fit_transform(train_T)
print("=== train labels ===")
print(train_labels)

# Reuse the classes learned from the training labels (do not refit on the test set)
test_labels = label_binarizer.transform(test_T)
x_train = train.values.reshape(totalSamplesTraining,28,28,1)
y_train = train_labels.reshape(totalSamplesTraining, classes)
x_test = test.values.reshape(totalSamplesTesting,28,28,1)
y_test = test_labels.reshape(totalSamplesTesting, classes)

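`categorical_crossentropy` expects each gold label as a row of six 0/1 values (one-hot encoding) rather than a single number, and `LabelBinarizer` produces exactly that. A sketch on a few made-up labels; fitting on the training labels and only calling `transform` on the test labels keeps the column order identical for both sets:

```python
import numpy as np
from sklearn.preprocessing import LabelBinarizer

# A few made-up gold labels (a=0 ... f=5)
train_gold = np.array([0, 1, 2, 3, 4, 5, 2])
test_gold = np.array([4, 0])

lb = LabelBinarizer()
train_onehot = lb.fit_transform(train_gold)  # learn the six classes from training labels
test_onehot = lb.transform(test_gold)        # reuse the same column order for the test set

print(train_onehot[2])  # [0 0 1 0 0 0] -> letter 'c'
print(test_onehot)

# argmax undoes the encoding
print(np.argmax(test_onehot, axis=1))  # [4 0]
```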
In [None]:
# Hold out 20% of the training data for validation, then train the model

from sklearn.model_selection import train_test_split
x_train, x_val, y_train, y_val = train_test_split(x_train, y_train, test_size = 0.2)

BATCH_SIZE = 64
EPOCHS = 1

history = model.fit(x_train, y_train, validation_data=(x_val, y_val), epochs=EPOCHS, batch_size=BATCH_SIZE)
model.evaluate(x=x_test, y=y_test, verbose=1)

from sklearn.metrics import classification_report

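y_pred = model.predict(x_test)
# Score the most probable letter for each picture (argmax) against the gold letter
print(classification_report(np.argmax(y_test, axis=1), np.argmax(y_pred, axis=1)))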

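Because the last layer is a softmax, each row of `y_pred` holds six probabilities. Rounding those probabilities can leave a picture with no predicted letter at all whenever no letter reaches 0.5, whereas `np.argmax` always picks exactly one letter. A sketch with three made-up probability rows:

```python
import numpy as np

# Made-up softmax outputs for three pictures (six probabilities per row)
y_prob = np.array([
    [0.05, 0.80, 0.05, 0.04, 0.03, 0.03],  # confident: 'b'
    [0.30, 0.10, 0.40, 0.10, 0.05, 0.05],  # unsure: best guess 'c', but below 0.5
    [0.45, 0.05, 0.05, 0.40, 0.03, 0.02],  # unsure between 'a' and 'd'
])

rounded = y_prob.round()
print(rounded.sum(axis=1))  # rows 2 and 3 contain no 1 at all

predicted = np.argmax(y_prob, axis=1)
print(predicted)            # [1 2 0]: exactly one class per picture
```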
In [None]:
argMaxYTest = np.argmax(y_test, axis=1)
argMaxYPred = np.argmax(y_pred, axis=1)

print(confusion_matrix(argMaxYTest, argMaxYPred))
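Tasks (2) and (3) ask for precision and recall per letter. Both can be read off a confusion matrix like the one printed above, where (as in scikit-learn) rows are gold labels and columns are predictions: precision divides each diagonal entry by its column total, and recall divides it by its row total. A sketch with a made-up 3×3 matrix (the notebook's real matrix is 6×6):

```python
import numpy as np

# A made-up 3-class confusion matrix: rows = gold label, columns = prediction
cm = np.array([
    [8, 2, 0],
    [1, 6, 3],
    [0, 0, 10],
])

correct = np.diag(cm)
precision = correct / cm.sum(axis=0)  # of the pictures predicted as X, how many really were X
recall = correct / cm.sum(axis=1)     # of the pictures that really are X, how many were found
accuracy = correct.sum() / cm.sum()

print(precision.round(3))
print(recall.round(3))
print(accuracy)  # 0.8
```

If a letter is never predicted, its column total is 0 and this NumPy version produces `nan` with a warning; `classification_report` instead reports 0 for that precision and warns.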