# Lab 3: Gesture Recognition using Convolutional Neural Networks

**Deadlines**: 

- Lab 3 Part A: February 6, 11:59pm
- Lab 3 Part B: February 13, 11:59pm

**Late Penalty**: There is a penalty-free grace period of one hour past the deadline. Any work that is submitted between 1 hour and 24 hours past the deadline will receive a 20% grade deduction. No other late work is accepted. Quercus submission time will be used, not your local computer time. You can submit your labs as many times as you want before the deadline, so please submit often and early.

**Grading TAs**: 
- Lab 3 Part A: Geoff Donoghue  
- Lab 3 Part B: Tianshi Cao

This lab is based on an assignment developed by Prof. Lisa Zhang.

This lab will be completed in two parts. In Part A you will you will gain experience gathering your own data set (specifically images of hand gestures), and understand the challenges involved in the data cleaning process. In Part B you will train a convolutional neural network to make classifications on different hand gestures. By the end of the lab, you should be able to:

1. Generate and preprocess your own data
2. Load and split data for training, validation and testing
3. Train a Convolutional Neural Network
4. Apply transfer learning to improve your model

Note that for this lab we will not be providing you with any starter code. You should be able to take the code used in previous labs, tutorials and lectures and modify it accordingly to complete the tasks outlined below.

### What to submit

**Submission for Part A:**  
Submit a zip file containing your images. Three images each of American Sign Language gestures for letters A - I (total of 27 images). You will be required to clean the images before submitting them. Details are provided under Part A of the handout.

Individual image file names should follow the convention of student-number_Alphabet_file-number.jpg
(e.g. 100343434_A_1.jpg).


**Submission for Part B:**  
Submit a PDF file containing all your code, outputs, and write-up
from parts 1-5. You can produce a PDF of your Google Colab file by
going to **File > Print** and then save as PDF. The Colab instructions
has more information. Make sure to review the PDF submission to ensure that your answers are easy to read. Make sure that your text is not cut off at the margins. 

**Do not submit any other files produced by your code.**

Include a link to your colab file in your submission.

Please use Google Colab to complete this assignment. If you want to use Jupyter Notebook, please complete the assignment and upload your Jupyter Notebook file to Google Colab for submission. 

## Colab Link

Include a link to your colab file here

Colab Link: https://colab.research.google.com/drive/1VBpsPpVqrz4mYDW-Xnb1zKeOcccyAT21

## Part A. Data Collection [10 pt]

So far, we have worked with data sets that have been collected, cleaned, and curated by machine learning
researchers and practitioners. Datasets like MNIST and CIFAR are often used as toy examples, both by
students and by researchers testing new machine learning models.

In the real world, getting a clean data set is never that easy. More than half the work in applying machine
learning is finding, gathering, cleaning, and formatting your data set.

The purpose of this lab is to help you gain experience gathering your own data set, and understand the
challenges involved in the data cleaning process.

### American Sign Language

American Sign Language (ASL) is a complete, complex language that employs signs made by moving the
hands combined with facial expressions and postures of the body. It is the primary language of many
North Americans who are deaf and is one of several communication options used by people who are deaf or
hard-of-hearing.

The hand gestures representing English alphabet are shown below. This lab focuses on classifying a subset
of these hand gesture images using convolutional neural networks. Specifically, given an image of a hand
showing one of the letters A-I, we want to detect which letter is being represented.

![alt text](https://www.disabled-world.com/pics/1/asl-alphabet.jpg)


### Generating Data
We will produce the images required for this lab by ourselves. Each student will collect, clean and submit
three images each of Americal Sign Language gestures for letters A - I (total of 27 images)
Steps involved in data collection

1. Familiarize yourself with American Sign Language gestures for letters from A - I (9 letters).
2. Ask your friend to take three pictures at slightly different orientation for each letter gesture using your
mobile phone.
 - Ensure adequate lighting while you are capturing the images.
 - Use a white wall as your background.
 - Use your right hand to create gestures (for consistency).
 - Keep your right hand fairly apart from your body and any other obstructions.
 - Avoid having shadows on parts of your hand.
3. Transfer the images to your laptop for cleaning.

### Cleaning Data
To simplify the machine learning the task, we will standardize the training images. We will make sure that
all our images are of the same size (224 x 224 pixels RGB), and have the hand in the center of the cropped
regions.

You may use the following applications to crop and resize your images:

**Mac**
- Use Preview:
– Holding down CMD + Shift will keep a square aspect ratio while selecting the hand area.
– Resize to 224x224 pixels.

**Windows 10**
- Use Photos app to edit and crop the image and keep the aspect ratio a square.
- Use Paint to resize the image to the final image size of 224x224 pixels.

**Linux**
- You can use GIMP, imagemagick, or other tools of your choosing.
You may also use online tools such as http://picresize.com
All the above steps are illustrative only. You need not follow these steps but following these will ensure that
you produce a good quality dataset. You will be judged based on the quality of the images alone.
Please do not edit your photos in any other way. You should not need to change the aspect ratio of your
image. You also should not digitally remove the background or shadows—instead, take photos with a white
background and minimal shadows.

### Accepted Images
Images will be accepted and graded based on the criteria below
1. The final image should be size 224x224 pixels (RGB).
2. The file format should be a .jpg file.
3. The hand should be approximately centered on the frame.
4. The hand should not be obscured or cut off.
5. The photos follows the ASL gestures posted earlier.
6. The photos were not edited in any other way (e.g. no electronic removal of shadows or background).

### Submission
Submit a zip file containing your images. There should be a total of 27 images (3 for each category)
1. Individual image file names should follow the convention of student-number_Alphabet_file-number.jpg
(e.g. 100343434_A_1.jpg)
2. Zip all the images together and name it with the following convention: last-name_student-number.zip
(e.g. last-name_100343434.zip).
3. Submit the zipped folder.
We will be anonymizing and combining the images that everyone submits. We will announce when the
combined data set will be available for download.

![alt text](https://github.com/UTNeural/APS360/blob/master/Gesture%20Images.PNG?raw=true)

## Part B. Building a CNN [50 pt]

For this lab, we are not going to give you any starter code. You will be writing a convolutional neural network
from scratch. You are welcome to use any code from previous labs, lectures and tutorials. You should also
write your own code.

You may use the PyTorch documentation freely. You might also find online tutorials helpful. However, all
code that you submit must be your own.

Make sure that your code is vectorized, and does not contain obvious inefficiencies (for example, unecessary
for loops, or unnecessary calls to unsqueeze()). Ensure enough comments are included in the code so that
your TA can understand what you are doing. It is your responsibility to show that you understand what you
write.

**This is much more challenging and time-consuming than the previous labs.** Make sure that you
give yourself plenty of time by starting early. In particular, the earlier questions can be completed even if you
do not yet have the full data set.

In [0]:
import numpy as np
import time
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torchvision
from torch.utils.data.sampler import SubsetRandomSampler
import torchvision.transforms as transforms

import os
from torchvision import datasets, models, transforms
import matplotlib.pyplot as plt

### 1. Data Loading and Splitting [10 pt]

Download the anonymized data provided from Quercus. Split the data into training, validation,
and test sets.

Note: Data splitting is not as trivial in this lab. We want our test set to closely resemble the setting in which
our model will be used. In particular, our test set should contain hands that are never seen in training!

Explain how you split the data, either by describing what you did, or by showing the code that you used.
Justify your choice of splitting strategy. How many training, validation, and test images do you have?

For loading the data, you can use plt.imread as in Lab 1, or any other method that you choose. You may find
torchvision.datasets.ImageFolder helpful. (see https://pytorch.org/docs/master/torchvision/datasets.html#imagefolder
) 

In [3]:
from google.colab import drive
drive.mount('/content/drive')

Go to this URL in a browser: https://accounts.google.com/o/oauth2/auth?client_id=947318989803-6bn6qk8qdgf4n4g3pfee6491hc0brc4i.apps.googleusercontent.com&redirect_uri=urn%3aietf%3awg%3aoauth%3a2.0%3aoob&response_type=code&scope=email%20https%3a%2f%2fwww.googleapis.com%2fauth%2fdocs.test%20https%3a%2f%2fwww.googleapis.com%2fauth%2fdrive%20https%3a%2f%2fwww.googleapis.com%2fauth%2fdrive.photos.readonly%20https%3a%2f%2fwww.googleapis.com%2fauth%2fpeopleapi.readonly

Enter your authorization code:
··········
Mounted at /content/drive


In [0]:
# define the training, validation, and testing directories
train_path = "/content/drive/My Drive/Colab Notebooks/APS360/Lab3/Lab_3b_Gesture_Dataset/train"
validation_path = "/content/drive/My Drive/Colab Notebooks/APS360/Lab3/Lab_3b_Gesture_Dataset/validation"
test_path = "/content/drive/My Drive/Colab Notebooks/APS360/Lab3/Lab_3b_Gesture_Dataset/test"

# the ASL letters produced in the dataset
classes = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I']

In [5]:
# convert all jpgs to tensors
data_transform = transforms.Compose([transforms.Resize((224,224)), 
                                transforms.ToTensor()])

# load training, validation, and testing data
train_data = torchvision.datasets.ImageFolder(root = train_path, 
                                           transform=data_transform)

validation_data = torchvision.datasets.ImageFolder(root = validation_path, 
                                           transform=data_transform)

test_data = torchvision.datasets.ImageFolder(root = test_path, 
                                           transform=data_transform)

# check that I'm actually importing things and not just writing bs
print('Number of training images:', len(train_data))
print('Number of validation images:', len(validation_data))
print('Number of testing images:', len(test_data))

Number of training images: 1398
Number of validation images: 526
Number of testing images: 507


In [0]:
# define dataloader parameters
batch_size  = 32
num_workers = 1

# prepare data loaders
train_loader = torch.utils.data.DataLoader(train_data, batch_size=batch_size, num_workers=num_workers, shuffle=True)
val_loader = torch.utils.data.DataLoader(validation_data, batch_size=batch_size, num_workers=num_workers, shuffle=True)
test_loader = torch.utils.data.DataLoader(test_data, batch_size=batch_size, num_workers=num_workers, shuffle=True)


In [7]:
"""
# check to see if I'm actually importing the images correctly or if stuff's wack asf

# obtain one batch of training images
dataiter = iter(train_loader)
images, labels = dataiter.next()
images = images.numpy() # convert images to numpy for display

# plot the images in the batch, along with the corresponding labels
fig = plt.figure(figsize=(25, 4))
for idx in np.arange(20):
    ax = fig.add_subplot(2, 20/2, idx+1, xticks=[], yticks=[])
    plt.imshow(np.transpose(images[idx], (1, 2, 0)))
    ax.set_title(classes[labels[idx]])
"""

"\n# check to see if I'm actually importing the images correctly or if stuff's wack asf\n\n# obtain one batch of training images\ndataiter = iter(train_loader)\nimages, labels = dataiter.next()\nimages = images.numpy() # convert images to numpy for display\n\n# plot the images in the batch, along with the corresponding labels\nfig = plt.figure(figsize=(25, 4))\nfor idx in np.arange(20):\n    ax = fig.add_subplot(2, 20/2, idx+1, xticks=[], yticks=[])\n    plt.imshow(np.transpose(images[idx], (1, 2, 0)))\n    ax.set_title(classes[labels[idx]])\n"

In [8]:
"""
- in Lab 2, all 12,000 images were divided into 2/3 training data, 1/6 validation data, and 1/6 testing data 
- in this Lab, all letters except for 'I', have either 272 or 273 samples
    - 'I' has 249 samples
- we need to split the data so that each letter has enough training, validation, and testing data
- use ~60% of the images for training, ~20% for validation, ~20% for testing

There are 156 training photos, 59 validation photos, and 57 testing photos for A, B, C, E
There are 156 training photos, 60 validation photos, and 56 testing photos for D, F, G, H
There are 147 training photos, 51 validation photos, and 51 testing photos for I

- when the images are split, keep images belonging to the same student together in each dataset
    - i.e.: put all three of student 1's 'A' images in the training dataset, put all three of student 2's 'A' images in the
    validation dataset, put all three of student 3's 'A' images in the testing dataset
- this allows us to test the model on unseen samples (unseen hands)
- this is also why the percentages aren't exactly 60%, 20%, and 20%
"""

"\n- in Lab 2, all 12,000 images were divided into 2/3 training data, 1/6 validation data, and 1/6 testing data \n- in this Lab, all letters except for 'I', have either 272 or 273 samples\n    - 'I' has 249 samples\n- we need to split the data so that each letter has enough training, validation, and testing data\n- use ~60% of the images for training, ~20% for validation, ~20% for testing\n\nThere are 156 training photos, 59 validation photos, and 57 testing photos for A, B, C, E\nThere are 156 training photos, 60 validation photos, and 56 testing photos for D, F, G, H\nThere are 147 training photos, 51 validation photos, and 51 testing photos for I\n\n- when the images are split, keep images belonging to the same student together in each dataset\n    - i.e.: put all three of student 1's 'A' images in the training dataset, put all three of student 2's 'A' images in the\n    validation dataset, put all three of student 3's 'A' images in the testing dataset\n- this allows us to test the 

In [0]:
# some helper functions from Lab 2
def get_model_name(name, batch_size, learning_rate, epoch):
    ''' Generate a name for the model consisting of all the hyperparameter values

    Args:
        config: Configuration object containing the hyperparameters
    Returns:
        path: A string with the hyperparameter name and value concatenated
    '''
    path = "model_{0}_bs{1}_lr{2}_epoch{3}".format(name,
                                                   batch_size,
                                                   learning_rate,
                                                   epoch)
    return path

def get_accuracy(model, train=False):
    if train:
        data_loader = train_loader
    else:
        data_loader = val_loader

    correct = 0
    total = 0
    for imgs, labels in data_loader:
        
        """
        imgs = alexNet.features(imgs) #SLOW
        #############################################
        #To Enable GPU Usage
        if use_cuda and torch.cuda.is_available():
          imgs = imgs.cuda()
          labels = labels.cuda()
        #############################################
        """
        
        output = model(imgs)
        
        #select index with maximum prediction score
        pred = output.max(1, keepdim=True)[1]
        correct += pred.eq(labels.view_as(pred)).sum().item()
        total += imgs.shape[0]
    return correct / total

### 2. Model Building and Sanity Checking [15 pt]

### Part (a) Convolutional Network - 5 pt

Build a convolutional neural network model that takes the (224x224 RGB) image as input, and predicts the gesture
letter. Your model should be a subclass of nn.Module. Explain your choice of neural network architecture: how
many layers did you choose? What types of layers did you use? Were they fully-connected or convolutional?
What about other decisions like pooling layers, activation functions, number of channels / hidden units?

In [0]:
# from Lab 2
"""class Lab3CNN(nn.Module):
    def __init__(self):
        super(Lab3CNN, self).__init__()
        self.name = "Lab3CNN"
        self.conv = nn.Conv2d(3, 5, 3)
        self.pool = nn.MaxPool2d(2, 2)
        # how is the input to the fc determined? use that formula n(i+1) = [n(i) + 2*padding - kernel] / stride 
        # ok but using that formula I get that the fully connected layer should be 111*111*5???
        # unless there are 2 feature maps being used but how would you know???
        self.fc = nn.Linear(220 * 220 * 10, 1) # uhh

    def forward(self, x):
        x = self.pool(F.relu(self.conv(x)))
        print("pool relu:", x.shape)
        x = self.pool(x)
        print("pool:", x.shape)
        x = x.view(-1, 220 * 220 * 10) # uhh why does this change every time
        print("view:", x.shape)
        x = self.fc(x)
        print("fc:", x.shape)
        #x = x.squeeze(1) # Flatten to [batch_size]
        print("squeeze:", x.shape)
        return x"""

class Lab3CNN(nn.Module):
    def __init__(self):
        super(Lab3CNN, self).__init__()
        self.name = "small"
        self.conv = nn.Conv2d(3, 5, 3)
        self.pool = nn.MaxPool2d(2, 2)
        self.fc = nn.Linear(136125, 9)

    def forward(self, x):
        x = self.pool(F.relu(self.conv(x)))
        x = self.pool(x)
        x = x.view(-1, 136125)
        x = self.fc(x)
        #x = x.squeeze(1) # Flatten to [batch_size]
        return x


In [11]:
"""
Number of layers: 
- 

Types of layers: 1 convolution layer, 1 pooling layer, 1 fully-connected layer
- 

Fully connected or convolutional: convolutional first, then fully-connected
- 

Pooling layers: 1
- 

Activation function: ReLU 
- 

Number of hidden units: 
- 

"""

'\nNumber of layers: \n- \n\nTypes of layers: 1 convolution layer, 1 pooling layer, 1 fully-connected layer\n- \n\nFully connected or convolutional: convolutional first, then fully-connected\n- \n\nPooling layers: 1\n- \n\nActivation function: ReLU \n- \n\nNumber of hidden units: \n- \n\n'

### Part (b) Training Code - 5 pt

Write code that trains your neural network given some training data. Your training code should make it easy
to tweak the usual hyperparameters, like batch size, learning rate, and the model object itself. Make sure
that you are checkpointing your models from time to time (the frequency is up to you). Explain your choice
of loss function and optimizer.

In [12]:
#@title 
"""def train(model, batch_size=9, learning_rate = 0.01, num_epochs=30):
    ########################################################################
    # Train a classifier on A-I
    target_classes = ["A", "B", "C", "D", "E", "F", "G", "H", "I"]
    ########################################################################
    # Fixed PyTorch random seed for reproducible result
    torch.manual_seed(1000)
    ########################################################################
    # Obtain the PyTorch data loader objects to load batches of the datasets
    train_loader = torch.utils.data.DataLoader(train_data, batch_size=batch_size)
    val_loader = torch.utils.data.DataLoader(validation_data, batch_size=batch_size)
    test_loader = torch.utils.data.DataLoader(test_data, batch_size=batch_size)
    ########################################################################
    # Define the Loss function and optimizer
    # The loss function will be Binary Cross Entropy (BCE). In this case we
    # will use the BCEWithLogitsLoss which takes unnormalized output from
    # the neural network and scalar label.
    # Optimizer will be SGD with Momentum.
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.SGD(model.parameters(), lr=learning_rate, momentum=0.9)    
    ########################################################################
    # Set up some numpy arrays to store the training/test loss/erruracy
    train_err = np.zeros(num_epochs)
    train_loss = np.zeros(num_epochs)
    val_err = np.zeros(num_epochs)
    val_loss = np.zeros(num_epochs)
    ########################################################################
    # Train the network
    # Loop over the data iterator and sample a new batch of training data
    # Get the output from the network, and optimize our loss function.
    start_time = time.time()
    for epoch in range(num_epochs):  # loop over the dataset multiple times
        total_train_loss = 0.0
        total_train_err = 0.0
        total_epoch = 0
        for i, data in enumerate(train_loader, 0):
            # Get the inputs
            inputs, labels = data
            #print("inputs:",inputs.shape)
            #print("labels:",labels.shape)

            # zero the parameter gradients
            optimizer.zero_grad()

            # forward + backward + optimize
            outputs = model(inputs).unsqueeze(dim=1)
            #print("outputs:",outputs.shape)
            #print("labels version 2:", labels.shape)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()

            # Calculate the statistics
            corr = (outputs > 0.0).squeeze().long() != labels
            total_train_err += int(corr.sum())
            total_train_loss += loss.item()
            total_epoch += len(labels)
        train_err[epoch] = float(total_train_err) / total_epoch
        train_loss[epoch] = float(total_train_loss) / (i+1)
        val_err[epoch], val_loss[epoch] = evaluate(net, val_loader, criterion)
        print(("Epoch {}: Train err: {}, Train loss: {} |"+
               "Validation err: {}, Validation loss: {}").format(
                   epoch + 1,
                   train_err[epoch],
                   train_loss[epoch],
                   val_err[epoch],
                   val_loss[epoch]))
        # Save the current model (checkpoint) to a file
        model_path = get_model_name(net.name, batch_size, learning_rate, epoch)
        torch.save(net.state_dict(), model_path)
    print('Finished Training')
    end_time = time.time()
    elapsed_time = end_time - start_time
    print("Total time elapsed: {:.2f} seconds".format(elapsed_time))
    # Write the train/test loss/err into CSV file for plotting later
    epochs = np.arange(1, num_epochs + 1)
    np.savetxt("{}_train_err.csv".format(model_path), train_err)
    np.savetxt("{}_train_loss.csv".format(model_path), train_loss)
    np.savetxt("{}_val_err.csv".format(model_path), val_err)
    np.savetxt("{}_val_loss.csv".format(model_path), val_loss)"""

'def train(model, batch_size=9, learning_rate = 0.01, num_epochs=30):\n    ########################################################################\n    # Train a classifier on A-I\n    target_classes = ["A", "B", "C", "D", "E", "F", "G", "H", "I"]\n    ########################################################################\n    # Fixed PyTorch random seed for reproducible result\n    torch.manual_seed(1000)\n    ########################################################################\n    # Obtain the PyTorch data loader objects to load batches of the datasets\n    train_loader = torch.utils.data.DataLoader(train_data, batch_size=batch_size)\n    val_loader = torch.utils.data.DataLoader(validation_data, batch_size=batch_size)\n    test_loader = torch.utils.data.DataLoader(test_data, batch_size=batch_size)\n    ########################################################################\n    # Define the Loss function and optimizer\n    # The loss function will be Binary Cross Entropy (BC

In [36]:
def train(model, data, batch_size=9, learning_rate = 0.01, num_epochs=30):
    train_loader = torch.utils.data.DataLoader(data, batch_size=batch_size)
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.SGD(model.parameters(), lr=learning_rate, momentum=0.9)

    iters, losses, train_acc, val_acc = [], [], [], []

    # training
    n = 0 # the number of iterations
    for epoch in range(num_epochs):
        total_train_loss = 0.0
        total_train_err = 0.0
        total_epoch = 0
        for i, data in enumerate(train_loader, 0):
            # get the inputs; data is a list of [inputs, labels]
            inputs, labels = data
            print("\ninputs:",inputs.shape)
            #print("labels:",labels.shape)

            # zero the parameter gradients
            optimizer.zero_grad()

            # forward + backward + optimize
            outputs = model(inputs)#.unsqueeze(dim=1)
        
            print("outputs:",outputs.shape)
            print("labels:", labels.shape)
            loss = criterion(outputs, labels.long())
            loss.backward()
            optimizer.step()
"""
            # Calculate the statistics
            corr = (outputs > 0.0).squeeze().long() != labels
            total_train_err += int(corr.sum())
            total_train_loss += loss.item()
            total_epoch += 1
        train_err[epoch] = float(total_train_err) / total_epoch
        train_loss[epoch] = float(total_train_loss) / (i+1)
        #val_err[epoch], val_loss[epoch] = evaluate(net, val_loader, criterion)
        print(("Epoch {}: Train err: {}, Train loss: {} |").format(
                   epoch + 1,
                   train_err[epoch],
                   train_loss[epoch]))

        model_path = get_model_name(model.name, batch_size, learning_rate, epoch)

        torch.save(net.state_dict(), model_path)
    print('Finished Training')
    end_time = time.time()
    elapsed_time = end_time - start_time
    print("Total time elapsed: {:.2f} seconds".format(elapsed_time))
    # Write the train/test loss/err into CSV file for plotting later
    epochs = np.arange(1, num_epochs + 1)
    np.savetxt("{}_train_err.csv".format(model_path), train_err)
    np.savetxt("{}_train_loss.csv".format(model_path), train_loss)
    np.savetxt("{}_val_err.csv".format(model_path), val_err)
    np.savetxt("{}_val_loss.csv".format(model_path), val_loss)"""

'\n            # Calculate the statistics\n            corr = (outputs > 0.0).squeeze().long() != labels\n            total_train_err += int(corr.sum())\n            total_train_loss += loss.item()\n            total_epoch += 1\n        train_err[epoch] = float(total_train_err) / total_epoch\n        train_loss[epoch] = float(total_train_loss) / (i+1)\n        #val_err[epoch], val_loss[epoch] = evaluate(net, val_loader, criterion)\n        print(("Epoch {}: Train err: {}, Train loss: {} |").format(\n                   epoch + 1,\n                   train_err[epoch],\n                   train_loss[epoch]))\n\n        model_path = get_model_name(model.name, batch_size, learning_rate, epoch)\n\n        torch.save(net.state_dict(), model_path)\n    print(\'Finished Training\')\n    end_time = time.time()\n    elapsed_time = end_time - start_time\n    print("Total time elapsed: {:.2f} seconds".format(elapsed_time))\n    # Write the train/test loss/err into CSV file for plotting later\n    e

In [51]:
model = Lab3CNN()
train(model, train_data)
# ERROR: Target 1 is out of bounds.
#model_path_small = get_model_name("small", 64, 0.01, 29)
#plot_training_curve(model_path_small)


inputs: torch.Size([9, 3, 224, 224])
outputs: torch.Size([1, 9])
labels: torch.Size([9])


ValueError: ignored

In [0]:
"""
Type of loss function chosen: cross-entropy
- 

Type of optimizer chosen: single gradient descent
- 

"""

### Part (c) “Overfit” to a Small Dataset - 5 pt

One way to sanity check our neural network model and training code is to check whether the model is capable
of “overfitting” or “memorizing” a small dataset. A properly constructed CNN with correct training code
should be able to memorize the answers to a small number of images quickly.

Construct a small dataset (e.g. just the images that you have collected). Then show that your model and
training code is capable of memorizing the labels of this small data set.

With a large batch size (e.g. the entire small dataset) and learning rate that is not too high, You should be
able to obtain a 100% training accuracy on that small dataset relatively quickly (within 200 iterations).

### 3. Hyperparameter Search [10 pt]

### Part (a) - 1 pt

List 3 hyperparameters that you think are most worth tuning. Choose at least one hyperparameter related to
the model architecture.

### Part (b) - 6 pt

Tune the hyperparameters you listed in Part (a), trying as many values as you need to until you feel satisfied
that you are getting a good model. Plot the training curve of at least 4 different hyperparameter settings.

### Part (c) - 1 pt
Choose the best model out of all the ones that you have trained. Justify your choice.

### Part (d) - 2 pt
Report the test accuracy of your best model. You should only do this step once and prior to this step you should have only used the training and validation data.

### 4. Transfer Learning [15 pt]
For many image classification tasks, it is generally not a good idea to train a very large deep neural network
model from scratch due to the enormous compute requirements and lack of sufficient amounts of training
data.

One of the better options is to try using an existing model that performs a similar task to the one you need
to solve. This method of utilizing a pre-trained network for other similar tasks is broadly termed **Transfer
Learning**. In this assignment, we will use Transfer Learning to extract features from the hand gesture
images. Then, train a smaller network to use these features as input and classify the hand gestures.

As you have learned from the CNN lecture, convolution layers extract various features from the images which
get utilized by the fully connected layers for correct classification. AlexNet architecture played a pivotal
role in establishing Deep Neural Nets as a go-to tool for image classification problems and we will use an
ImageNet pre-trained AlexNet model to extract features in this assignment.

### Part (a) - 5 pt
Here is the code to load the AlexNet network, with pretrained weights. When you first run the code, PyTorch
will download the pretrained weights from the internet.

In [0]:
import torchvision.models
alexnet = torchvision.models.alexnet(pretrained=True)

The alexnet model is split up into two components: *alexnet.features* and *alexnet.classifier*. The
first neural network component, *alexnet.features*, is used to compute convolutional features, which are
taken as input in *alexnet.classifier*.

The neural network alexnet.features expects an image tensor of shape Nx3x224x224 as input and it will
output a tensor of shape Nx256x6x6 . (N = batch size).

Compute the AlexNet features for each of your training, validation, and test data. Here is an example code
snippet showing how you can compute the AlexNet features for some images (your actual code might be
different):

In [0]:
# img = ... a PyTorch tensor with shape [N,3,224,224] containing hand images ...
data = train_data
batchSize = 64

train_loader = torch.utils.data.DataLoader(train_data, batch_size=batchSize)
for imgs, labels in iter(train_loader):
    imgs = features = alexnet.features(imgs) #SLOW
features = alexnet.features(img)

KeyboardInterrupt: ignored

**Save the computed features**. You will be using these features as input to your neural network in Part
(b), and you do not want to re-compute the features every time. Instead, run *alexnet.features* once for
each image, and save the result.

In [0]:
# Save Features to Folder (assumes code from 1. has been evaluated)

import os
import torchvision.models
alexnet = torchvision.models.alexnet(pretrained=True)

# location on Google Drive
master_path = '/content/gdrive/My Drive/Colab Notebooks/APS360/Lab3/Lab3_Data/Features'

# Prepare Dataloader (requires code from 1.)
batch_size = 1 # save 1 file at a time, hence batch_size = 1
num_workers = 1
data_loader = torch.utils.data.DataLoader(dataset, batch_size=batch_size, 
                                           num_workers=num_workers, shuffle=True)

classes = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I']

# save features to folder as tensors
n = 0
for img, label in data_loader:
  features = alexnet.features(img)
  features_tensor = torch.from_numpy(features.detach().numpy())

  folder_name = master_path + '/' + str(classes[label])
  if not os.path.isdir(folder_name):
    os.mkdir(folder_name)
  torch.save(features_tensor.squeeze(0), folder_name + '/' + str(n) + '.tensor')
  n += 1

### Part (b) - 3 pt
Build a convolutional neural network model that takes as input these AlexNet features, and makes a
prediction. Your model should be a subclass of nn.Module.

Explain your choice of neural network architecture: how many layers did you choose? What types of layers
did you use: fully-connected or convolutional? What about other decisions like pooling layers, activation
functions, number of channels / hidden units in each layer?

Here is an example of how your model may be called:

In [0]:
# features = ... load precomputed alexnet.features(img) ...
output = model(features)
prob = F.softmax(output)

### Part (c) - 5 pt
Train your new network, including any hyperparameter tuning. Plot and submit the training curve of your
best model only.

Note: Depending on how you are caching (saving) your AlexNet features, PyTorch might still be tracking
updates to the **AlexNet weights**, which we are not tuning. One workaround is to convert your AlexNet
feature tensor into a numpy array, and then back into a PyTorch tensor.

In [0]:
tensor = torch.from_numpy(tensor.detach().numpy())

### Part (d) - 2 pt
Report the test accuracy of your best model. How does the test accuracy compare to part 4(d)?