# Method 1: Convolutional Neural Network

For the first method, we use a convolutional neural network (CNN), an architecture well suited to image input. Its convolutional filters (kernels), typically combined with pooling, extract local features while reducing the spatial dimensionality of our images.
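
To make the size reduction concrete, here is a minimal sketch that applies the standard output-size formula `floor((n + 2p - k) / s) + 1` to a 128x128 input, the image size used in this dataset. The layer settings (3x3 convolutions without padding, 2x2 max pooling) and the helper name `conv_output_size` are illustrative assumptions, not necessarily those of the network used below.

```python
def conv_output_size(n, kernel, stride=1, padding=0):
    """Spatial output size of a conv/pool layer for an input of side length n."""
    return (n + 2 * padding - kernel) // stride + 1

size = 128  # side length of the input images
# Hypothetical stack: two stages of (3x3 conv, no padding) + (2x2 max pool, stride 2)
for stage in range(2):
    size = conv_output_size(size, kernel=3)            # convolution
    size = conv_output_size(size, kernel=2, stride=2)  # max pooling
    print(f"after stage {stage + 1}: {size}x{size}")
```

Under these assumed settings the side length shrinks from 128 to 63 and then to 30, so each stage cuts the number of spatial positions to roughly a quarter.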

## 1. Getting Started

First, we import the libraries we need to train our convolutional neural network. We also define the environment's relative paths and some auxiliary functions.

### 1.1 Imports

In [2]:
import os
import sys
import csv
import cv2
import numpy as np
import torch.nn as nn

### 1.2 Environment Configuration

In [3]:
# Setting the path of the training dataset (that was already provided to you)

running_local = os.getenv('JUPYTERHUB_USER') is None
DATASET_PATH = "../data/sign_lang_train"

# Set the location of the dataset
if running_local:
    # If running on your local machine, the sign_lang_train folder's path should be specified here
    local_path = "../data/sign_lang_train"
    if os.path.exists(local_path):
        DATASET_PATH = local_path
else:
    # If running on the Jupyter hub, this data folder is already available
    # You DO NOT need to upload the data!
    DATASET_PATH = "/data/mlproject22/sign_lang_train"

src_path = os.path.abspath(os.path.join(os.getcwd(), '..', 'src'))
if src_path not in sys.path:
    sys.path.append(src_path)

### 1.3 Auxiliary Functions

In [4]:
# Auxiliary function: read a CSV file into a list of rows (each row is a list of strings)
def read_csv(csv_file):
    with open(csv_file, newline='') as f:
        reader = csv.reader(f)
        data = list(reader)
    return data
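
As a quick illustration of what `read_csv` returns, the sketch below writes a tiny annotation file and reads it back. The column order (label first, image file name second) matches how the dataset class below indexes each row; the file names and the two rows themselves are made up for the example.

```python
import csv
import os
import tempfile

def read_csv(csv_file):
    with open(csv_file, newline='') as f:
        return list(csv.reader(f))

# Hypothetical two-row annotation file: label first, image file name second
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "labels.csv")
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows([["a", "img_0001.jpg"], ["7", "img_0002.jpg"]])
    rows = read_csv(path)

print(rows)
```

Every value comes back as a string, including digit labels such as `"7"`, which is why the dataset class looks labels up in a list of string class names.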

### 1.4 Load the Dataset (Sign-Languages)

In [5]:
import torch
from torch.utils.data import Dataset, DataLoader

from string import ascii_lowercase

class SignLangDataset(Dataset):
    """Sign language dataset"""

    def __init__(self, csv_file, root_dir, class_index_map=None, transform=None):
        """
        Args:
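            csv_file (string): Path to the csv file with annotations, relative to root_dir.
            root_dir (string): Directory with all the images.
            class_index_map (optional): Mapping between class names and indices; stored but not used by this class.
            transform (callable, optional): Optional transform to be applied on a sample.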
        """
        self.data = read_csv(os.path.join(root_dir,csv_file))
        self.root_dir = root_dir
        self.class_index_map = class_index_map
        self.transform = transform
        # List of class names in order
        self.class_names = list(map(str, list(range(10)))) + list(ascii_lowercase)

    def __len__(self):
        """
        Returns the number of samples in the dataset.
        """
        return len(self.data)

    def __getitem__(self, idx):
        """
        Returns one sample (dict consisting of an image and its label)
        """
        if torch.is_tensor(idx):
            idx = idx.tolist()

        # Read the image and labels
        image_path = os.path.join(self.root_dir, self.data[idx][1])
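        image = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
        if image is None:
            # cv2.imread returns None instead of raising when a file cannot be read
            raise FileNotFoundError(f"Could not read image: {image_path}")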
        # Add a leading channel axis: the image shape becomes (C, H, W) with C=1
        image = np.expand_dims(image, 0)
        # The label is the index of the class name in the list ['0','1',...,'9','a','b',...'z']
        # because we should have integer labels in the range 0-35 (for 36 classes)
        label = self.class_names.index(self.data[idx][0])

        sample = {'image': image, 'label': label}
        if self.transform:
            sample = self.transform(sample)
        return sample
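
The label encoding used in `__getitem__` can be checked in isolation. The sketch below rebuilds the same class list and converts between class names and integer labels; the helper names `encode` and `decode` are hypothetical and only serve the illustration.

```python
from string import ascii_lowercase

# Same ordering as SignLangDataset.class_names: '0'..'9' followed by 'a'..'z'
class_names = [str(d) for d in range(10)] + list(ascii_lowercase)

def encode(name):
    """Class name -> integer label in the range 0-35."""
    return class_names.index(name)

def decode(index):
    """Integer label -> class name."""
    return class_names[index]

print(len(class_names))                       # number of classes
print(encode("0"), encode("a"), encode("z"))  # first digit, first and last letter
print(decode(30))                             # class name for label 30
```

Digits occupy labels 0 to 9 and letters occupy labels 10 to 35, so a label of 30 corresponds to the letter `u`.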

### 1.5 Import the Network

In [6]:
from src.conv_nn import ConvNN

#ConvNN?
#ConvNN??
#help(ConvNN)

cnn = ConvNN().float()
cnn.train()

filename = "../models/cnn_weights.pt"

if not os.path.exists(filename):
    cnn.init_weights()
    torch.save(cnn.state_dict(), filename)
    print(f"Initialized weights saved to '{filename}'")
else:
    print(f"File '{filename}' already exists, skipping save.")

cnn.load_state_dict(torch.load(filename))

print("ConvNN imported successfully!")

File '../models/cnn_weights.pt' already exists, skipping save.
ConvNN imported successfully!


## 2. Train the Model

### 2.1 Use a DataLoader to Create Batches

First, we wrap our dataset in a dataloader and check that everything works by printing the shape and label of every image in one random batch.

In [7]:
dataset = SignLangDataset(csv_file='labels.csv', root_dir=DATASET_PATH)
dataloader = DataLoader(dataset, batch_size=32, shuffle=True)

In [8]:
for batch in dataloader:
    images = batch['image']
    labels = batch['label']
    for image, label in zip(images, labels):
        print(f"Image-Dim: {image.shape} is {label}")
    break
print("Done!")

Image-Dim: torch.Size([1, 128, 128]) is 30
Image-Dim: torch.Size([1, 128, 128]) is 12
Image-Dim: torch.Size([1, 128, 128]) is 6
Image-Dim: torch.Size([1, 128, 128]) is 6
Image-Dim: torch.Size([1, 128, 128]) is 11
Image-Dim: torch.Size([1, 128, 128]) is 29
Image-Dim: torch.Size([1, 128, 128]) is 20
Image-Dim: torch.Size([1, 128, 128]) is 6
Image-Dim: torch.Size([1, 128, 128]) is 33
Image-Dim: torch.Size([1, 128, 128]) is 12
Image-Dim: torch.Size([1, 128, 128]) is 32
Image-Dim: torch.Size([1, 128, 128]) is 16
Image-Dim: torch.Size([1, 128, 128]) is 4
Image-Dim: torch.Size([1, 128, 128]) is 23
Image-Dim: torch.Size([1, 128, 128]) is 25
Image-Dim: torch.Size([1, 128, 128]) is 15
Image-Dim: torch.Size([1, 128, 128]) is 21
Image-Dim: torch.Size([1, 128, 128]) is 30
Image-Dim: torch.Size([1, 128, 128]) is 4
Image-Dim: torch.Size([1, 128, 128]) is 35
Image-Dim: torch.Size([1, 128, 128]) is 6
Image-Dim: torch.Size([1, 128, 128]) is 32
Image-Dim: torch.Size([1, 128, 128]) is 21
...

### 2.2 Hyperparameters

In [20]:
EPOCHS = 5
BATCH_SIZE = 32
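
As a quick sanity check on these settings, the number of optimizer steps follows directly from the dataset size. The sketch below assumes a hypothetical dataset of 10,000 images, since the real size is not stated here; only `EPOCHS` and `BATCH_SIZE` are taken from the cell above.

```python
import math

EPOCHS = 5
BATCH_SIZE = 32
num_samples = 10_000  # hypothetical; substitute len(dataset)

# With the DataLoader default drop_last=False, a smaller final batch is kept
batches_per_epoch = math.ceil(num_samples / BATCH_SIZE)
last_batch_size = num_samples - (batches_per_epoch - 1) * BATCH_SIZE
total_steps = EPOCHS * batches_per_epoch

print(batches_per_epoch, last_batch_size, total_steps)
```

Because the final batch can be smaller than `BATCH_SIZE`, accuracy should be computed from the actual number of images in each batch rather than from the nominal batch size.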

In [21]:
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(cnn.parameters(), lr=0.001)
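
For intuition about the loss, `nn.CrossEntropyLoss` takes raw, unnormalized scores (logits) and computes, per sample, the negative log of the softmax probability assigned to the true class, averaged over the batch by default. The plain-Python sketch below reproduces the per-sample value; the function name `cross_entropy` and the example scores are made up for illustration.

```python
import math

def cross_entropy(logits, target):
    """Cross-entropy for one sample: -log(softmax(logits)[target])."""
    m = max(logits)  # subtract the maximum for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]

logits = [2.0, 0.5, -1.0]  # made-up scores for 3 classes
print(cross_entropy(logits, 0))  # true class has the highest score: small loss
print(cross_entropy(logits, 2))  # true class has the lowest score: large loss

# Uniform scores over 36 classes give log(36), the loss of a random guess
print(cross_entropy([0.0] * 36, 5), math.log(36))
```

The value `log(36)`, roughly 3.58, is a useful baseline: a 36-class model whose loss sits near it is doing no better than guessing uniformly.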

In [23]:
trn_corr = 0
trn_total = 0
for epoch in range(EPOCHS):
    for i, batch in enumerate(dataloader):
        images = batch['image'].float()
        labels = batch['label']
        batch_size = images.size(0)

        y_pred = cnn(images)
        loss = criterion(y_pred, labels)

        predicted = torch.max(y_pred, 1)[1]
        batch_corr = (predicted == labels).sum()
        trn_corr += batch_corr
        trn_total += batch_size

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        if i % 50 == 0:
            batch_acc = batch_corr / batch_size * 100
            overall_acc = trn_corr / trn_total * 100
            print(f"Epoch {epoch},\tBatch {i},\tloss: {loss.item():.8f},\tBatch Accuracy: {batch_acc:.4f}%,\tOverall Accuracy: {overall_acc:.4f}%")

Epoch 0,	Batch 0,	loss: 0.22706616,	Batch Accuracy: 90.6250%,	Overall Accuracy: 90.6250%
Epoch 0,	Batch 50,	loss: 0.02616138,	Batch Accuracy: 100.0000%,	Overall Accuracy: 95.9559%
Epoch 0,	Batch 100,	loss: 0.17253613,	Batch Accuracy: 93.7500%,	Overall Accuracy: 95.0186%
Epoch 0,	Batch 150,	loss: 0.35761279,	Batch Accuracy: 93.7500%,	Overall Accuracy: 95.0331%
Epoch 0,	Batch 200,	loss: 0.09526198,	Batch Accuracy: 96.8750%,	Overall Accuracy: 95.1026%
Epoch 0,	Batch 250,	loss: 0.16261379,	Batch Accuracy: 93.7500%,	Overall Accuracy: 95.3561%
Epoch 0,	Batch 300,	loss: 0.16095831,	Batch Accuracy: 93.7500%,	Overall Accuracy: 95.2450%
Epoch 1,	Batch 0,	loss: 0.08544414,	Batch Accuracy: 96.8750%,	Overall Accuracy: 95.2430%
Epoch 1,	Batch 50,	loss: 0.20213689,	Batch Accuracy: 96.8750%,	Overall Accuracy: 95.3059%
Epoch 1,	Batch 100,	loss: 0.08905907,	Batch Accuracy: 96.