<a href="https://colab.research.google.com/github/swadhwa5/MLFinalProject/blob/main/ML_FinalProject_Submission.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Machine Learning Final Project
By: Shreya Wadhwa, Alan Zhang, Aidan Aug, Trisha Karani
JHED: swadhwa, azhang, aaug1, tkarani1

*Due: April 28th, 2022*

**Description:**

This is the IPython/Jupyter notebook for our Machine Learning Final Project. For this project, we developed a majority-vote classifier over three different CNNs to recognize sign language letters. The project is divided into the following sections:

1. Required Packages for Running the Notebook

2. Data Augmentation

3. Model Implementation

4. Model Training and Testing

5. Conclusions and Future Works

## Part 1: Python Packages

This section collects the packages required by every section of the notebook. Please run it before any of the other code sections.

In [None]:
## Data Processing, Augmentation, and Feature Engineering:
import numpy as np
import random
from PIL import Image, ImageEnhance
from os import listdir
import imghdr
import skimage
from skimage.transform import rotate, AffineTransform, warp

## Model Implementation
import sys
import csv
import os
import datetime
import torch
import torch.nn as nn
import torch.nn.functional as F

## Part 2: Data Augmentation and Feature Engineering

For data augmentation, we enlarge our dataset by applying the following transformations:
1. Blur
2. Brighten
3. Rotate
4. Translate
5. Zoom
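
The five augmentations listed above could be sketched with Pillow roughly as follows. This is a minimal illustration, not the notebook's actual pipeline: the function names, the default parameter values (blur radius, brightness factor, rotation angle, shift, zoom factor), and the random demo image are all assumptions made for the example.

```python
import numpy as np
from PIL import Image, ImageEnhance, ImageFilter


def blur(img, radius=2):
    # Gaussian blur; the radius is an illustrative default
    return img.filter(ImageFilter.GaussianBlur(radius=radius))


def brighten(img, factor=1.5):
    # factor > 1 brightens, factor < 1 darkens
    return ImageEnhance.Brightness(img).enhance(factor)


def rotate(img, degrees=15):
    # Keeps the original size; uncovered corners are filled with black
    return img.rotate(degrees)


def translate(img, dx=20, dy=10):
    # Pillow's affine matrix maps output coordinates to input coordinates,
    # so shifting the content by (+dx, +dy) needs negated offsets
    return img.transform(img.size, Image.AFFINE, (1, 0, -dx, 0, 1, -dy))


def zoom(img, factor=1.2):
    # Crop the centre region and resize it back to the original size
    w, h = img.size
    crop_w, crop_h = int(w / factor), int(h / factor)
    left, top = (w - crop_w) // 2, (h - crop_h) // 2
    return img.crop((left, top, left + crop_w, top + crop_h)).resize((w, h))


# Hypothetical stand-in for a dataset image
rng = np.random.default_rng(0)
demo = Image.fromarray(rng.integers(0, 200, size=(64, 64, 3), dtype=np.uint8))
augmented = {f.__name__: f(demo) for f in (blur, brighten, rotate, translate, zoom)}
```

Every function returns an image of the same size as its input, so the augmented copies can be added to the dataset next to the originals without further resizing.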

## Part 3: Model Implementation

For our model, we decided to implement a majority vote classifier based on three Convolutional Neural Networks, each with a different structure. Each architecture is based on an existing, well-known model.
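
As a rough sketch of the voting step, assume each of the three networks has already produced one predicted class label per sample. The function name `majority_vote`, the example predictions, and the tie-breaking rule (the lowest class index wins when all three models disagree) are assumptions for illustration; the notebook does not specify how ties are resolved.

```python
import numpy as np


def majority_vote(predictions, n_classes):
    """Combine label predictions of shape (n_models, n_samples) by majority vote."""
    predictions = np.asarray(predictions)
    n_samples = predictions.shape[1]
    votes = np.zeros((n_samples, n_classes), dtype=int)
    for model_preds in predictions:
        # Each model adds one vote to its predicted class for every sample
        votes[np.arange(n_samples), model_preds] += 1
    # argmax returns the first maximum, so ties go to the lowest class index
    return votes.argmax(axis=1)


# Hypothetical predictions from three models on four samples
preds = np.array([
    [0, 5, 2, 7],
    [0, 5, 3, 8],
    [1, 5, 2, 9],
])
final = majority_vote(preds, n_classes=27)
```

With three voters, a class needs only two votes to win; a tie can occur only when all three models predict different classes.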

### Model 1: AlexNet

AlexNet was one of the breakthrough CNN models; it competed in and won the ImageNet Large Scale Visual Recognition Challenge in 2012. The model achieved a top-5 error of 15.3%, more than 10 percentage points lower than the runner-up. The following is an implementation of this CNN.

In [None]:
class AlexNet(torch.nn.Module):
    # AlexNet-style CNN for 600x600 RGB inputs; n_classes sets the size of the output layer
    def __init__(self, input_height=600, input_width=600, n_classes=27):
        super().__init__()

        # Initialize the parameters of the model
        self.input_height = input_height
        self.input_width = input_width
        self.n_classes = n_classes

        # AlexNet implementation: same structure, with different output sizes due to the larger input
        self.model_convolution = nn.Sequential(
            # INPUT: 600x600x3
            nn.Conv2d(in_channels=3,out_channels=96, kernel_size=12, stride=4), # output: 148x148
            nn.ReLU(),
            nn.AvgPool2d(kernel_size=4, stride=2), #73x73
            nn.ReLU(),
            nn.Conv2d(in_channels=96,out_channels=256,kernel_size=5, padding=2), #73x73
            nn.ReLU(),
            nn.AvgPool2d(kernel_size=3, stride=2), #36x36
            nn.Conv2d(in_channels=256,out_channels=384,kernel_size=3, padding=1), #36x36
            nn.ReLU(),
            nn.Conv2d(in_channels=384,out_channels=384,kernel_size=3, padding=1), #36x36
            nn.ReLU(),
            nn.Conv2d(in_channels=384,out_channels=256,kernel_size=3, padding=1), #36x36
            nn.ReLU(),
            nn.AvgPool2d(kernel_size=6,stride=2), #16x16x256
            nn.Conv2d(256,256, 12, padding=1, stride=2) #4x4x256
        )

        # The dense network architecture. Assumes input has 4096 nodes, or 4x4x256
        self.model_dense = nn.Sequential(
            nn.Flatten(),
            nn.Linear(4096, 4096),
            nn.ReLU(),
            nn.Dropout(p=0.5),
            nn.Linear(4096, 1000),
            nn.ReLU(),
            nn.Dropout(p=0.5),
            nn.Linear(1000, n_classes)
        )

    def forward(self, x):
        # Reshape the input into (batch, channels, height, width); assumes 3-channel (RGB) images
        x = x.reshape(x.shape[0], 3, self.input_height, self.input_width)
        x = self.model_convolution(x)
        x = self.model_dense(x)
        return x
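
The spatial sizes noted in the comments of the convolutional stack can be checked with the standard output-size formula, floor((n + 2p - k) / s) + 1. The sketch below is a plain-Python check, not part of the model. The helper name `out_size` and the layer table are written for this example, and it assumes a stride of 2 for the second pooling layer, which is what the 36x36 comment implies (a stride of 3 would give 24x24 instead).

```python
def out_size(n, kernel, stride=1, padding=0):
    # Output length of a convolution or pooling layer along one spatial dimension
    return (n + 2 * padding - kernel) // stride + 1


# (name, kernel, stride, padding) for each layer of the convolutional stack
layers = [
    ("conv1", 12, 4, 0),
    ("pool1", 4, 2, 0),
    ("conv2", 5, 1, 2),
    ("pool2", 3, 2, 0),  # assumed stride of 2
    ("conv3", 3, 1, 1),
    ("conv4", 3, 1, 1),
    ("conv5", 3, 1, 1),
    ("pool3", 6, 2, 0),
    ("conv6", 12, 2, 1),
]

size = 600
sizes = []
for name, kernel, stride, padding in layers:
    size = out_size(size, kernel, stride, padding)
    sizes.append(size)

# The last layer has 256 channels, so this is the input width of the dense network
flattened = sizes[-1] * sizes[-1] * 256
```

The final 4x4x256 feature map flattens to 4096 values, which matches the input size of the first `nn.Linear` layer.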
    
        

### Model 2: ResNet Basis

In 2015, ResNet was the next major model proposed for image classification. A major obstacle to training models with many hidden layers is the vanishing gradient problem. ResNet mitigates this problem through skip connections, also called "identity shortcut connections," which add a block's input directly to its output.
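
To make the shortcut idea concrete, here is a minimal sketch of a single residual unit computing `F(x) + x`. The class name `TinyResidual` and its one-convolution body are hypothetical simplifications, not the block used in this project. Zeroing the weights shows that the unit falls back to the identity, and that the gradient still reaches the input through the shortcut even when the convolutional branch contributes nothing.

```python
import torch
import torch.nn as nn


class TinyResidual(nn.Module):
    """Hypothetical one-convolution residual unit: output = F(x) + x."""

    def __init__(self, channels):
        super().__init__()
        self.body = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)

    def forward(self, x):
        # The identity shortcut adds the input to the branch output
        return self.body(x) + x


block = TinyResidual(channels=8)
nn.init.zeros_(block.body.weight)  # make the residual branch output zero

x = torch.randn(2, 8, 16, 16, requires_grad=True)
y = block(x)
y.sum().backward()

# With a zero branch the unit is the identity, and the gradient arriving at x
# comes entirely from the shortcut path
print(torch.equal(y, x), x.grad.unique())
```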

In [None]:
# Bottleneck residual block: three stacked convolutions (1x1, 3x3, 1x1) plus a skip connection.
# The block is reused many times throughout the network.
class CNNblock(nn.Module):
    def __init__(self, in_chan, interm_chan, identity_downsample=None, stride=1):
        super(CNNblock, self).__init__()
        self.expansion = 4 # Output channels = interm_chan * 4 (fixed for ResNet 50,101,152)

        self.model_convolution = nn.Sequential(
            nn.Conv2d(in_chan, interm_chan, kernel_size=1),
            nn.BatchNorm2d(interm_chan),
            nn.ReLU(),
            nn.Conv2d(interm_chan, interm_chan, kernel_size=3, stride=stride, padding=1, bias=False),
            nn.BatchNorm2d(interm_chan),
            nn.ReLU(),
            nn.Conv2d(interm_chan, interm_chan * self.expansion, kernel_size=1),
            nn.BatchNorm2d(interm_chan * self.expansion)
            # No ReLU here: the activation is applied after the skip connection is added
        )
        self.relu = nn.ReLU()
        self.identity_downsample = identity_downsample

    def forward(self, x):
        identity = x.clone()
        x = self.model_convolution(x)

        # Skip connection: project the identity if its shape differs from the block output
        if self.identity_downsample is not None:
            identity = self.identity_downsample(identity)

        # Out-of-place add, so autograd never sees a modified intermediate tensor
        x = x + identity
        x = self.relu(x)
        return x


class ResNet(nn.Module):
    def __init__(self, block, layers, image_channels, num_classes):
        super(ResNet, self).__init__()
        self.in_channels = 64
        self.model_convolution = torch.nn.Sequential(
            nn.Conv2d(image_channels, 64, kernel_size=7, stride=2, padding=3, bias=False),
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        )

        self.layer1 = self._make_layer(
          block, layers[0], intermediate_channels=64, stride=1
        )
        self.layer2 = self._make_layer(
          block, layers[1], intermediate_channels=128, stride=2
        )
        self.layer3 = self._make_layer(
          block, layers[2], intermediate_channels=256, stride=2
        )
        self.layer4 = self._make_layer(
          block, layers[3], intermediate_channels=512, stride=2
        )

        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.fc = nn.Linear(512 * 4, num_classes)


    def forward(self, x):
      x = self.model_convolution(x)
      x = self.layer1(x)
      x = self.layer2(x)
      x = self.layer3(x)
      x = self.layer4(x)

      x = self.avgpool(x)
      x = x.reshape(x.shape[0], -1)
      x = self.fc(x)

      return x

    def _make_layer(self, block, num_residual_blocks, intermediate_channels, stride):
      identity_downsample = None
      layers = []

      # If we either halve the spatial size (e.g. 56x56 -> 28x28 with stride=2) or change the
      # number of channels, the identity (skip connection) must be projected so that it can
      # be added to the block's output.

      if stride != 1 or self.in_channels != intermediate_channels * 4:
        identity_downsample = nn.Sequential(
          nn.Conv2d(self.in_channels, intermediate_channels * 4, kernel_size=1, stride=stride),
          nn.BatchNorm2d(intermediate_channels * 4),
        )

      # The first block of each layer always exists; it handles the stride and channel change
      layers.append(
        block(self.in_channels, intermediate_channels, identity_downsample, stride)
      )

      # The expansion size is always 4 for ResNet 50,101,152
      self.in_channels = intermediate_channels * 4

      # For example for first resnet layer: 256 will be mapped to 64 as intermediate layer,
      # then finally back to 256. Hence no identity downsample is needed, since stride = 1,
      # and also same amount of channels.
      
      for i in range(num_residual_blocks - 1):
        layers.append(block(self.in_channels, intermediate_channels))
      return nn.Sequential(*layers)

def ResNet50(img_channel=3, num_classes=1000):
  return ResNet(CNNblock, [3, 4, 6, 3], img_channel, num_classes)

def test():
    # Smoke test: a batch of 4 RGB 224x224 images should map to 4 x num_classes logits
    device = "cuda" if torch.cuda.is_available() else "cpu"
    net = ResNet50(img_channel=3, num_classes=1000).to(device)
    y = net(torch.randn(4, 3, 224, 224, device=device))
    print(y.size())

test()


### Model 3: Inception



Citations:

ResNet:
https://www.analyticsvidhya.com/blog/2021/06/build-resnet-from-scratch-with-python/

