# Homework 4.2: Convolutional Neural Networks and ASL
Dartmouth College, LING48/CS72, Winter 2024<br>
Kenneth Lai (Kenneth.Han.Lai@dartmouth.edu)

A convolutional neural network (ConvNet/CNN) is a neural network architecture designed to process visual data.
The code in this notebook is adapted from:
https://github.com/samurainote/CNN_for_Sign_Language_Images/blob/master/CNN_for_Sign_Language_Images.ipynb
The code was converted to PyTorch by Colin Kearns (Colin.R.Kearns.25@dartmouth.edu).

In this program, we use a CNN to learn 6 signs from ASL fingerspelling (a way to import words from other languages, such as English). The training set contains approximately 1100 pictures for each sign. Each picture is represented by the grayscale values of its 784 pixels (28*28). The training set also contains the gold label for each picture (a=0, b=1, c=2, d=3, e=4, f=5). The testing set contains 2063 pictures in total (331 'a', 432 'b', 310 'c', 245 'd', 498 'e' and 247 'f') and uses the same format as the training set. The original data (with pictures for all the ASL signs) comes from here: https://www.kaggle.com/datamunge/sign-language-mnist
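
To make the data layout concrete, here is a minimal sketch of how one flattened row of 784 pixel values maps back to a 28 by 28 image. The `row` array is a synthetic stand-in for one line of the CSV, and the sketch assumes the pixels are stored row by row. NumPy is used only for illustration; the notebook performs the equivalent reshape with `view` on a PyTorch tensor.

```python
import numpy as np

# Hypothetical stand-in for one CSV row: 784 grayscale values in 0-255.
row = np.arange(784) % 256

# Rebuild the 2D image, assuming row-major order:
# pixel i lands at position (i // 28, i % 28).
image = row.reshape(28, 28)

# Add the channel axis that a 2D convolution layer expects:
# (channels, height, width), with 1 channel for grayscale.
image_chw = image[np.newaxis, :, :]

print(image.shape)      # (28, 28)
print(image_chw.shape)  # (1, 28, 28)
```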

There are many good sites where you can learn the intuitions behind convolutional networks. These are some examples:

(1) https://ujjwalkarn.me/2016/08/11/intuitive-explanation-convnets/<br>
(2) https://www.youtube.com/watch?v=iaSUYvmCekI<br>
(3) https://adeshpande3.github.io/A-Beginner%27s-Guide-To-Understanding-Convolutional-Neural-Networks/<br>
(4) https://www.freecodecamp.org/news/an-intuitive-guide-to-convolutional-neural-networks-260c2de0a050/

As in part 1, you will build a basic CNN by filling in the `__init__` and `forward` methods of the `CNNModel` class, as well as the training loop. Please build the CNN as follows:

- Three convolution “blocks”. Each block should contain:
  - A 2D convolution layer
  - A ReLU activation function
  - A 2D max pooling layer

  Because the images are grayscale, the first convolution layer has 1 input channel (if they were color, we would have 3 input channels). The first convolution layer should have 32 output channels, and the second and third should have 64 output channels. Each convolution layer should use a kernel size of (3, 3) and a stride length of 1, while each max pooling layer should use a kernel size of (2, 2) and a stride length of 2.


- A “flatten” layer (to turn 2D images into vectors)
- A hidden Linear layer with 128 neurons
- A ReLU activation function
- An output Linear layer with `classes` neurons
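
The layer specification above determines how large the image is when it reaches the flatten layer, and therefore how many input features the hidden Linear layer needs. The following sketch traces the side length of a 28 by 28 image through the three blocks. It assumes no padding, which is the PyTorch default for both `nn.Conv2d` and `nn.MaxPool2d`; the helper `out_size` is a hypothetical name introduced for illustration.

```python
def out_size(n, kernel, stride, padding=0):
    """Side length after a convolution or pooling layer (floor division)."""
    return (n + 2 * padding - kernel) // stride + 1

size = 28   # input images are 28 x 28
trace = []  # side length after each layer

for block in range(3):
    size = out_size(size, kernel=3, stride=1)  # 3x3 convolution, stride 1
    trace.append(size)
    size = out_size(size, kernel=2, stride=2)  # 2x2 max pooling, stride 2
    trace.append(size)

# The third block outputs 64 channels, each of size `size` x `size`.
flat_features = 64 * size * size

print(trace)          # [26, 13, 11, 5, 3, 1]
print(flat_features)  # 64
```

Under these assumptions the flatten layer produces a vector of 64 values per image, so the hidden Linear layer takes 64 input features.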

Note that you should be able to reuse the training loop you wrote for part 1. As such, it will only be graded once (in part 1).

Then, you need to write answers to these two questions:

1. Study the links above and explain the structure of the CNN in your own words. What is a kernel? What is pooling? Explain all of these as simply and plainly as you can. Your answer should include how the size of the image changes as it passes through the convolution blocks.


2. Run the program. How is the network behaving after one epoch of training? (Report this based on the accuracy, the precision, and the recall for each of the letters). Include screenshots of your results.
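
For question 1, a small example that can be checked by hand may help build intuition. The sketch below applies one 3 by 3 kernel and one 2 by 2 max pooling step to a 4 by 4 input. The kernel of all ones is chosen purely for illustration; in a trained CNN the kernel values are learned parameters.

```python
import torch
import torch.nn.functional as F

# A 4x4 single-channel "image", shaped (batch, channels, height, width).
x = torch.arange(16, dtype=torch.float32).reshape(1, 1, 4, 4)

# A 3x3 kernel of ones, shaped (out_channels, in_channels, height, width).
# Sliding it over the image sums each 3x3 window.
kernel = torch.ones(1, 1, 3, 3)
conv_out = F.conv2d(x, kernel, stride=1)

# 2x2 max pooling with stride 2 keeps the largest value
# in each non-overlapping 2x2 window.
pool_out = F.max_pool2d(x, kernel_size=2, stride=2)

print(conv_out.squeeze())  # 2x2 result: [[45, 54], [81, 90]]
print(pool_out.squeeze())  # 2x2 result: [[5, 7], [13, 15]]
```

Both operations shrink the 4 by 4 input to 2 by 2, but for different reasons: the convolution loses a border of one pixel on each side, while pooling halves each dimension.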

In [1]:
# load packages

import pandas as pd
import torch
import torch.nn as nn
import torch.optim as optim
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from torch.utils.data import DataLoader, TensorDataset

In [2]:
# Load ASL data

train = pd.read_csv("sign-train-a-f.csv")
test = pd.read_csv("sign-test-a-f.csv")

In [3]:
# Separate the gold labels from the pixel features in the training and test sets

totalSamplesTraining = len(train)
totalSamplesTesting  = len(test)

train_T = train["label"]
train.drop("label", axis=1, inplace=True)

test_T = test["label"]
test.drop("label", axis=1, inplace=True)

In [7]:
# Define the PyTorch model
class CNNModel(nn.Module):
    def __init__(self, classes):
        super(CNNModel, self).__init__()
        # Add the layers and activation functions here
        self.layers = nn.Sequential(
            
            #? first convolutional layer
            nn.Conv2d(1, 32, kernel_size=3, stride=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
            
            #? second convolutional layer
            nn.Conv2d(32, 64, kernel_size=3, stride=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
            
            #? third convolutional layer
            nn.Conv2d(64, 64, kernel_size=3, stride=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
            
            #? flatten the output
            nn.Flatten(),
            
            #? 128-neurons linear layer
            nn.Linear(64, 128),
            
            #? relu activation
            nn.ReLU(),
            
            #? output layer with <classes> neurons
            nn.Linear(128, classes)
        )

    def forward(self, x):
        # Define the forward pass here
        y = self.layers(x)
        return y

In [17]:
# Initialize the PyTorch model
classes = len(train_T.unique()) # Specify the value of classes
pytorch_model = CNNModel(classes)

# Define loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(pytorch_model.parameters())

# Convert data to PyTorch tensors
x_train_tensor = torch.tensor(train.values, dtype=torch.float32).view(totalSamplesTraining, 1, 28, 28)
y_train_tensor = torch.tensor(train_T.values, dtype=torch.long)

x_test_tensor = torch.tensor(test.values, dtype=torch.float32).view(totalSamplesTesting, 1, 28, 28)
y_test_tensor = torch.tensor(test_T.values, dtype=torch.long)

# Create DataLoader for training and testing data
train_dataset = TensorDataset(x_train_tensor, y_train_tensor)
test_dataset = TensorDataset(x_test_tensor, y_test_tensor)

# Batch size for training and testing
batch_size = 32

train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=batch_size, shuffle=False)

In [18]:
# Training loop
num_epochs = 1
for epoch in range(num_epochs):
    for inputs, labels in train_loader:
        
        #? Forward pass
        outputs = pytorch_model(inputs)
        
        #? Compute the loss
        loss = criterion(outputs, labels)
        
        #? Reset the gradient to zero
        optimizer.zero_grad()
        
        #? Backward pass
        loss.backward()
        
        #? Update the parameters
        optimizer.step()
        
    print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {loss.item():.4f}')

# Evaluation on test set
pytorch_model.eval()
with torch.no_grad():
    test_outputs = pytorch_model(x_test_tensor)
    test_loss = criterion(test_outputs, y_test_tensor)
    _, test_preds = torch.max(test_outputs, 1)
    accuracy = (test_preds == y_test_tensor).float().mean()

    print(f"Test Loss: {test_loss.item()}, Test Accuracy: {accuracy.item()}")

# Convert predictions to numpy arrays for classification report
y_test_np = y_test_tensor.numpy()
test_preds_np = test_preds.numpy()

# Classification report
print(classification_report(y_test_np, test_preds_np))

Epoch [1/1], Loss: 0.0124
Test Loss: 0.07614024728536606, Test Accuracy: 0.979641318321228
              precision    recall  f1-score   support

           0       1.00      0.93      0.97       331
           1       1.00      1.00      1.00       432
           2       1.00      1.00      1.00       310
           3       1.00      0.92      0.96       245
           4       0.92      1.00      0.96       498
           5       1.00      1.00      1.00       247

    accuracy                           0.98      2063
   macro avg       0.99      0.98      0.98      2063
weighted avg       0.98      0.98      0.98      2063



In [19]:
print(confusion_matrix(y_test_np, test_preds_np))

[[309   0   0   0  22   0]
 [  0 431   0   0   0   1]
 [  0   0 310   0   0   0]
 [  0   0   0 226  19   0]
 [  0   0   0   0 498   0]
 [  0   0   0   0   0 247]]
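
The per-letter precision and recall in the classification report can be recomputed directly from the confusion matrix above. This is a minimal sketch that copies the printed matrix by hand and assumes the scikit-learn convention that rows are gold labels and columns are predictions.

```python
import numpy as np

# Confusion matrix copied from the output above
# (rows = gold labels, columns = predicted labels).
cm = np.array([
    [309,   0,   0,   0,  22,   0],
    [  0, 431,   0,   0,   0,   1],
    [  0,   0, 310,   0,   0,   0],
    [  0,   0,   0, 226,  19,   0],
    [  0,   0,   0,   0, 498,   0],
    [  0,   0,   0,   0,   0, 247],
])

true_pos = np.diag(cm)                 # correct predictions per letter
precision = true_pos / cm.sum(axis=0)  # divide by column totals (predicted)
recall = true_pos / cm.sum(axis=1)     # divide by row totals (gold)
accuracy = true_pos.sum() / cm.sum()

for letter, p, r in zip("abcdef", precision, recall):
    print(f"{letter}: precision={p:.2f} recall={r:.2f}")
print(f"accuracy={accuracy:.4f}")
```

Read this way, the matrix shows that almost all errors are 'a' (22 pictures) and 'd' (19 pictures) predicted as 'e', which explains the lower recall for 'a' and 'd' and the lower precision for 'e'.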
