# Homework 2: Convolutional Neural Networks (100 points)

### Overview

With new knowledge of convolutional neural networks, we can accomplish a more difficult image recognition task. The CIFAR-10 classification dataset consists of 60,000 labelled images split between 10 classes: airplanes, cars, birds, cats, deer, dogs, frogs, horses, ships and trucks.

For this assignment, we will compare two models on the same dataset: a fully connected neural network (as in Homework 1) called ANN and a new convolutional architecture called CNN, outlined in the next section. To keep the comparison fair, we attempt to give the ANN roughly the same number of trainable parameters as the CNN, which means the ANN needs the same input transformation used in Homework 1: each image is reduced to grayscale and flattened into a vector. The CNN reaps the full benefit of the original 2D image in RGB.
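As a rough illustration of the ANN input transformation, the sketch below collapses the colour channels of a tiny image by summing them and then flattens the result into one vector. This mirrors what `torch.sum(inputs, axis=1).view(-1, 32*32)` does in the training loop, but it uses plain Python lists; the helper name `to_flat_gray` and the 2x2 image are made up for the example.

```python
def to_flat_gray(image):
    """Sum the channels of a (channels, height, width) nested list and flatten it."""
    channels = len(image)
    height = len(image[0])
    width = len(image[0][0])
    flat = []
    for row in range(height):
        for col in range(width):
            # Adding the channel values gives one grayscale-like number per pixel.
            flat.append(sum(image[ch][row][col] for ch in range(channels)))
    return flat


tiny_rgb = [
    [[1, 2], [3, 4]],          # red channel
    [[10, 20], [30, 40]],      # green channel
    [[100, 200], [300, 400]],  # blue channel
]
print(to_flat_gray(tiny_rgb))  # [111, 222, 333, 444]
```

For a 32x32 CIFAR-10 image the same operation yields a vector of 1024 values, which matches the input size of the first ANN layer.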

### CNN Architecture

Each image consists of 32x32 RGB pixel values between 0 and 255. Apart from the tensor conversion and normalization applied by the transforms below, we do not need to perform any preprocessing, as the convolutional model will use all three channels concurrently as input.

The architecture in use has 5 layers: a convolutional layer followed by a pooling layer, then another convolutional layer, then two fully connected dense layers. In the code below, the same pooling layer is also applied after the second convolution. The last dense layer has 10 neurons to provide the classification output.
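To see why the first dense layer in the code expects `outChannels * 5 * 5` inputs, it can help to trace the image side length through the forward pass. The sketch below assumes what the model cell shows: 5x5 convolutions with stride 1 and no padding, and a 2x2 max pool with stride 2 after each convolution. The helper name `conv_out` is hypothetical.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output side length of a convolution or pooling window."""
    return (size + 2 * padding - kernel) // stride + 1


side = 32                     # CIFAR-10 images are 32x32
side = conv_out(side, 5)      # conv1, 5x5 kernel       -> 28
side = conv_out(side, 2, 2)   # 2x2 max pool, stride 2  -> 14
side = conv_out(side, 5)      # conv2, 5x5 kernel       -> 10
side = conv_out(side, 2, 2)   # 2x2 max pool, stride 2  -> 5
print(side)                   # 5

out_channels = 25             # numFilters in the hyperparameter cell
print(out_channels * side * side)  # 625 inputs to the first dense layer
```

If the kernel size or the number of pooling steps changes, the flattened size changes too, so the `5 * 5` in the model would need to be updated to match.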

### Your Task

At the bottom of this notebook file, there are four short answer questions testing your understanding of this neural network architecture. As before, some questions will require you to experiment with model hyperparameters.

Below each question is a cell with the text “Type Markdown and LaTeX.” Double-click the cell and type your response to the question. Save your responses by clicking on the floppy disk icon or choosing File - Save and Checkpoint.

After responding to the questions, download your notebook as a `.html` file by choosing File - Download as - html (.html). You will be submitting this `.html` file to your instructor for grading.

In [1]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision
import torchvision.transforms as transforms

In [2]:
torch.manual_seed(0)
torch.set_num_threads(4)
torch.set_num_interop_threads(4)

In [3]:
# Original transform
# trainTransform = transforms.Compose([#add yours here!
#                                      transforms.ToTensor(),
#                                      transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
#                                     ])
# testTransform = transforms.Compose([transforms.ToTensor(),
#                                      transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
#                                     ])

In [4]:
trainTransform = transforms.Compose([transforms.RandomRotation(5),
                                     transforms.ToTensor(),
                                     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
                                    ])
testTransform = transforms.Compose([transforms.ToTensor(),
                                     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
                                    ])

In [5]:
root_dir = 'assets_week2'
trainDataset = torchvision.datasets.CIFAR10(root=root_dir, train=True, download=True, transform=trainTransform)
trainLoader = torch.utils.data.DataLoader(trainDataset, batch_size=4, shuffle=True, num_workers=2)
testDataset = torchvision.datasets.CIFAR10(root=root_dir, train=False, download=True, transform=testTransform)
testLoader = torch.utils.data.DataLoader(testDataset, batch_size=4, shuffle=False, num_workers=2)

Files already downloaded and verified
Files already downloaded and verified


In [6]:
class ANNModel(nn.Module):
    def __init__(self, hiddenSize, dropoutRate, activate):
        super().__init__()
        # Note that 'layer' and 'dense' differ only in name (to show similarity to CNN)
        self.activate = nn.Sigmoid() if activate == "Sigmoid" else nn.ReLU()
        self.layer1 = nn.Linear(1024, 100)
        self.layer2 = nn.Linear(100, 15 * 5 * 5)
        self.dense1 = nn.Linear(15 * 5 * 5, hiddenSize)
        self.dropout = nn.Dropout(dropoutRate)
        self.dense2 = nn.Linear(hiddenSize, 10)
        
    def forward(self, x):
        x = self.activate(self.layer1(x))
        x = self.activate(self.layer2(x))
        x = self.dropout(self.activate(self.dense1(x)))
        return self.dense2(x)

class CNNModel(nn.Module):
    def __init__(self, hiddenSize, outChannels, dropoutRate, activate):
        super().__init__()
        self.outChannels = outChannels
        self.activate = nn.Sigmoid() if activate == "Sigmoid" else nn.ReLU()
        self.conv1 = nn.Conv2d(3, 24, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(24, outChannels, 5)
        self.dense1 = nn.Linear(outChannels * 5 * 5, hiddenSize)
        self.dropout = nn.Dropout(dropoutRate)
        self.dense2 = nn.Linear(hiddenSize, 10)

    def forward(self, x):
        x = self.pool(self.activate(self.conv1(x)))
        x = self.pool(self.activate(self.conv2(x)))
        x = x.view(-1, self.outChannels * 5 * 5)
        x = self.dropout(self.activate(self.dense1(x)))
        return self.dense2(x)

In [7]:
# Number of neurons in the first fully-connected layer
hiddenSize = 100
# Number of feature filters in second convolutional layer
numFilters = 25
# Dropout rate
dropoutRate = 0
# Activation function
activation = "ReLU"
# Learning rate
learningRate = 0.001
# Momentum for SGD optimizer
momentum = 0.9
# Number of training epochs
numEpochs = 10

In [8]:
ann = ANNModel(hiddenSize, dropoutRate, activation)
cnn = CNNModel(hiddenSize, numFilters, dropoutRate, activation)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(list(ann.parameters()) + list(cnn.parameters()), lr=learningRate, momentum=momentum)

print('>>> Beginning training!') 
ann.train()
cnn.train()
for epoch in range(numEpochs):  # loop over the dataset multiple times
    annRunningLoss, cnnRunningLoss = 0, 0
    for i, (inputs, labels) in enumerate(trainLoader, 0):
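        # Sum the three colour channels into one grayscale-like channel,
        # then flatten each 32x32 image into a 1024-element vector for the ANN
        annInputs = torch.sum(inputs, axis=1).view(-1, 32*32)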
        
        optimizer.zero_grad()

        # Forward propagation
        annOutputs = ann(annInputs)
        cnnOutputs = cnn(inputs)
        
        # Backpropagation
        annLoss = criterion(annOutputs, labels)
        cnnLoss = criterion(cnnOutputs, labels)
        annLoss.backward()
        cnnLoss.backward()
        
        # Gradient update
        optimizer.step()

        annRunningLoss += annLoss.item()
        cnnRunningLoss += cnnLoss.item()
        if (i+1) % 2000 == 0:    # print every 2000 mini-batches
            print('Epoch [{}/{}], Step [{}/{}], ANN Loss: {}, CNN Loss: {}'.format(epoch + 1, numEpochs, i + 1, len(trainDataset)//4, annRunningLoss/2000, cnnRunningLoss/2000))
            annRunningLoss, cnnRunningLoss = 0, 0

print()
print('>>> Beginning validation!')
ann.eval()
cnn.eval()
annCorrect, cnnCorrect = 0, 0
total = 0
with torch.no_grad():  # gradients are not needed during evaluation
    for inputs, labels in testLoader:
        annInputs = torch.sum(inputs, axis=1).view(-1, 32*32)
        annOutputs = ann(annInputs)
        cnnOutputs = cnn(inputs)
        # The predicted class is the index of the largest output score
        _, annPredicted = torch.max(annOutputs, 1)
        _, cnnPredicted = torch.max(cnnOutputs, 1)
        total += labels.size(0)
        annCorrect += (annPredicted == labels).sum().item()
        cnnCorrect += (cnnPredicted == labels).sum().item()
print('ANN validation accuracy: {}%, CNN validation accuracy: {}%'.format(annCorrect / total * 100, cnnCorrect / total * 100))

>>> Beginning training!
Epoch [1/10], Step [2000/12500], ANN Loss: 2.0651431404650213, CNN Loss: 2.011094207406044
Epoch [1/10], Step [4000/12500], ANN Loss: 1.9222866840958595, CNN Loss: 1.651698141783476
Epoch [1/10], Step [6000/12500], ANN Loss: 1.866415710836649, CNN Loss: 1.501900658980012
Epoch [1/10], Step [8000/12500], ANN Loss: 1.8486922891438007, CNN Loss: 1.4304361519664526
Epoch [1/10], Step [10000/12500], ANN Loss: 1.819504773169756, CNN Loss: 1.3662658289670944
Epoch [1/10], Step [12000/12500], ANN Loss: 1.792697628468275, CNN Loss: 1.294509585440159
Epoch [2/10], Step [2000/12500], ANN Loss: 1.7398476891368628, CNN Loss: 1.2263958215303719
Epoch [2/10], Step [4000/12500], ANN Loss: 1.721903137549758, CNN Loss: 1.2103745007775724
Epoch [2/10], Step [6000/12500], ANN Loss: 1.7148346040248872, CNN Loss: 1.1875471148602665
Epoch [2/10], Step [8000/12500], ANN Loss: 1.7160489804148673, CNN Loss: 1.1464335404336452
Epoch [2/10], Step [10000/12500], ANN Loss: 1.7132752705663443

In [9]:
# Original performance result
# >>> Beginning training!
# Epoch [1/10], Step [2000/12500], ANN Loss: 2.062768382251263, CNN Loss: 2.0118962318003177
# Epoch [1/10], Step [4000/12500], ANN Loss: 1.922449570864439, CNN Loss: 1.6491626717150212
# Epoch [1/10], Step [6000/12500], ANN Loss: 1.8609392429888247, CNN Loss: 1.496796541929245
# Epoch [1/10], Step [8000/12500], ANN Loss: 1.8343419369757175, CNN Loss: 1.4247962787225843
# Epoch [1/10], Step [10000/12500], ANN Loss: 1.8108680999577045, CNN Loss: 1.3564418687969446
# Epoch [1/10], Step [12000/12500], ANN Loss: 1.7755021077692508, CNN Loss: 1.2840593818947672
# Epoch [2/10], Step [2000/12500], ANN Loss: 1.7118807710856199, CNN Loss: 1.217776041738689
# Epoch [2/10], Step [4000/12500], ANN Loss: 1.7023028466850518, CNN Loss: 1.1937822642866522
# Epoch [2/10], Step [6000/12500], ANN Loss: 1.6855913656800985, CNN Loss: 1.169634734245017
# Epoch [2/10], Step [8000/12500], ANN Loss: 1.6861593794375658, CNN Loss: 1.1278142586536706
# Epoch [2/10], Step [10000/12500], ANN Loss: 1.681460771769285, CNN Loss: 1.11159008590132
# Epoch [2/10], Step [12000/12500], ANN Loss: 1.6844521416574716, CNN Loss: 1.093757688254118
# Epoch [3/10], Step [2000/12500], ANN Loss: 1.5980612008571624, CNN Loss: 1.0147096219323575
# Epoch [3/10], Step [4000/12500], ANN Loss: 1.5933891667425633, CNN Loss: 1.0154092156793921
# Epoch [3/10], Step [6000/12500], ANN Loss: 1.6101581390574575, CNN Loss: 1.0260999397682027
# Epoch [3/10], Step [8000/12500], ANN Loss: 1.623055516809225, CNN Loss: 0.9893379398109391
# Epoch [3/10], Step [10000/12500], ANN Loss: 1.5694836803525687, CNN Loss: 0.9597428971934132
# Epoch [3/10], Step [12000/12500], ANN Loss: 1.602022365361452, CNN Loss: 0.986960273547098
# Epoch [4/10], Step [2000/12500], ANN Loss: 1.5354256336390972, CNN Loss: 0.8970438048327342
# Epoch [4/10], Step [4000/12500], ANN Loss: 1.5316589372605085, CNN Loss: 0.9051831461470574
# Epoch [4/10], Step [6000/12500], ANN Loss: 1.5507742081359028, CNN Loss: 0.9081797932619229
# Epoch [4/10], Step [8000/12500], ANN Loss: 1.5432515789866448, CNN Loss: 0.8950935913156718
# Epoch [4/10], Step [10000/12500], ANN Loss: 1.5296328798532486, CNN Loss: 0.9136955788973719
# Epoch [4/10], Step [12000/12500], ANN Loss: 1.5434830002486706, CNN Loss: 0.8828933099175338
# Epoch [5/10], Step [2000/12500], ANN Loss: 1.452810761027038, CNN Loss: 0.794904001011746
# Epoch [5/10], Step [4000/12500], ANN Loss: 1.4939219054579735, CNN Loss: 0.8356246943548322
# Epoch [5/10], Step [6000/12500], ANN Loss: 1.4754717657081782, CNN Loss: 0.842789939025417
# Epoch [5/10], Step [8000/12500], ANN Loss: 1.48746871900931, CNN Loss: 0.8491292101730942
# Epoch [5/10], Step [10000/12500], ANN Loss: 1.4935839128568769, CNN Loss: 0.8415487864455208
# Epoch [5/10], Step [12000/12500], ANN Loss: 1.489127569541335, CNN Loss: 0.8365766553091817
# Epoch [6/10], Step [2000/12500], ANN Loss: 1.4074806213155389, CNN Loss: 0.7510692907674238
# Epoch [6/10], Step [4000/12500], ANN Loss: 1.4143287734538317, CNN Loss: 0.7461342083658091
# Epoch [6/10], Step [6000/12500], ANN Loss: 1.4441396532729267, CNN Loss: 0.7792961364542134
# Epoch [6/10], Step [8000/12500], ANN Loss: 1.4471339211985468, CNN Loss: 0.8049173939772881
# Epoch [6/10], Step [10000/12500], ANN Loss: 1.436086897470057, CNN Loss: 0.789288026013528
# Epoch [6/10], Step [12000/12500], ANN Loss: 1.450567073944956, CNN Loss: 0.7796823141543427
# Epoch [7/10], Step [2000/12500], ANN Loss: 1.3581053456217052, CNN Loss: 0.6901736634639092
# Epoch [7/10], Step [4000/12500], ANN Loss: 1.368674589846283, CNN Loss: 0.7057648525964468
# Epoch [7/10], Step [6000/12500], ANN Loss: 1.369694784026593, CNN Loss: 0.7207892820711713
# Epoch [7/10], Step [8000/12500], ANN Loss: 1.4126161734834313, CNN Loss: 0.765363468083553
# Epoch [7/10], Step [10000/12500], ANN Loss: 1.4122630998045207, CNN Loss: 0.7621235234646593
# Epoch [7/10], Step [12000/12500], ANN Loss: 1.409927348356694, CNN Loss: 0.7461324564729875
# Epoch [8/10], Step [2000/12500], ANN Loss: 1.2936645967960358, CNN Loss: 0.6263135505313954
# Epoch [8/10], Step [4000/12500], ANN Loss: 1.320595586769283, CNN Loss: 0.6764677218067227
# Epoch [8/10], Step [6000/12500], ANN Loss: 1.346045205898583, CNN Loss: 0.6883500699620927
# Epoch [8/10], Step [8000/12500], ANN Loss: 1.3628277169056238, CNN Loss: 0.700319167233436
# Epoch [8/10], Step [10000/12500], ANN Loss: 1.366694432295859, CNN Loss: 0.7077743821138284
# Epoch [8/10], Step [12000/12500], ANN Loss: 1.3888688515126706, CNN Loss: 0.7156861224401364
# Epoch [9/10], Step [2000/12500], ANN Loss: 1.2629833322651685, CNN Loss: 0.5982729022389977
# Epoch [9/10], Step [4000/12500], ANN Loss: 1.3036618775427342, CNN Loss: 0.6152381861356844
# Epoch [9/10], Step [6000/12500], ANN Loss: 1.3122947664409876, CNN Loss: 0.66245829876489
# Epoch [9/10], Step [8000/12500], ANN Loss: 1.3103235375266522, CNN Loss: 0.654794091342861
# Epoch [9/10], Step [10000/12500], ANN Loss: 1.3175036150328816, CNN Loss: 0.6717786931779482
# Epoch [9/10], Step [12000/12500], ANN Loss: 1.3396734294369816, CNN Loss: 0.6851124844889855
# Epoch [10/10], Step [2000/12500], ANN Loss: 1.2198837846461683, CNN Loss: 0.5523294631745921
# Epoch [10/10], Step [4000/12500], ANN Loss: 1.259059353346005, CNN Loss: 0.5920002504468648
# Epoch [10/10], Step [6000/12500], ANN Loss: 1.2515505373105407, CNN Loss: 0.6000785271073276
# Epoch [10/10], Step [8000/12500], ANN Loss: 1.3098676465936006, CNN Loss: 0.6619040474448702
# Epoch [10/10], Step [10000/12500], ANN Loss: 1.2947476158775388, CNN Loss: 0.6574993534947862
# Epoch [10/10], Step [12000/12500], ANN Loss: 1.3021455377489328, CNN Loss: 0.6422082813492307

# >>> Beginning validation!
# ANN validation accuracy: 42.120000000000005%, CNN validation accuracy: 68.45%

## Homework Questions

**To make sure your code produces consistent results, it is advisable to click "Kernel -> Restart & Run All" every time you want to run your code.**

### Question 1: CNN Advantage (10 points)

Compute the accuracy of a simple dense neural network and a simple CNN on the dataset. Explain the results and briefly overview the advantages of a CNN over a standard neural network for image-related tasks.

The ANN reaches a validation accuracy of 42.12%, whereas the CNN reaches 68.45%. During training, both models started with a similar loss on the training set, but the CNN's loss fell faster and it ultimately achieved higher accuracy.

The ANN and CNN models are intended to have a comparable number of trainable parameters. However, the CNN has convolutional layers that extract features directly from the 2D RGB images, whereas the ANN only sees a flattened grayscale vector. For image-related tasks, a CNN can therefore learn spatial patterns with its convolutional filters while keeping the number of parameters under control through weight sharing and pooling.
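One way to check the parameter comparison is to count weights and biases layer by layer. The sketch below does this in plain Python for the layer sizes in the model cell, assuming the default `hiddenSize = 100` and `numFilters = 25`; the helper names `linear_params` and `conv_params` are made up for this example.

```python
def linear_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


def conv_params(c_in, c_out, k):
    """Weights plus biases of a k x k convolution layer."""
    return c_in * c_out * k * k + c_out


hidden_size, num_filters = 100, 25  # defaults from the hyperparameter cell

ann_total = (linear_params(1024, 100)
             + linear_params(100, 15 * 5 * 5)
             + linear_params(15 * 5 * 5, hidden_size)
             + linear_params(hidden_size, 10))

cnn_total = (conv_params(3, 24, 5)
             + conv_params(24, num_filters, 5)
             + linear_params(num_filters * 5 * 5, hidden_size)
             + linear_params(hidden_size, 10))

print(ann_total, cnn_total)  # 178985 80459
```

With these defaults the ANN has roughly twice as many parameters as the CNN, which suggests the CNN's higher accuracy comes from its architecture and richer input rather than from model size.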

### Question 2: Dropout Rate (25 points)

Explain the purpose of dropout in any neural network model. In doing so, note what can happen if the dropout rate is too high and what can happen if the dropout rate is too low.

Dropout is a regularization technique used in neural network models to reduce overfitting. Overfitting occurs when a model learns the noise of the training data, leading to poor generalization on unseen data. Dropout addresses this problem by randomly dropping out (i.e., setting to zero) a proportion of the neurons in the network during training. This makes each neuron more independent and forces the network to learn more robust features that do not rely on the presence of specific neurons.

If the dropout rate is too high, then too many neurons are dropped out during training, and the network will not be able to learn effectively. The model's performance will suffer, and it may even fail to converge.

If the dropout rate is too low, then not enough neurons are dropped out during training, and the network will continue to overfit the training data.
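The effect of the dropout rate can be sketched in a few lines. The example below is a minimal plain-Python illustration of inverted dropout as applied during training: each value is zeroed with probability `rate` and the survivors are scaled by `1 / (1 - rate)`. It is not the `nn.Dropout` implementation, and the function name `dropout` is only for illustration.

```python
import random


def dropout(values, rate, rng):
    """Zero each value with probability `rate` and rescale the survivors."""
    if rate <= 0:
        return list(values)            # rate 0 leaves the activations unchanged
    if rate >= 1:
        return [0.0 for _ in values]   # rate 1 removes all information
    keep = 1.0 - rate
    return [v / keep if rng.random() >= rate else 0.0 for v in values]


rng = random.Random(0)
activations = [1.0] * 10
print(dropout(activations, 0.0, rng))  # identical to the input
print(dropout(activations, 0.5, rng))  # a mix of 0.0 and 2.0
print(dropout(activations, 0.9, rng))  # mostly 0.0, survivors scaled to about 10
```

Note that the notebook's default `dropoutRate = 0` corresponds to the first case, so dropout has no effect until that hyperparameter is raised.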

### Question 3: Kernel Size (25 points)

Explain the purpose of spatial filters (kernels) in a CNN. Additionally, explain where they fit into the overall architecture of the CNN in this coding example. Finally, explain what can happen if the kernel size is too large and what can happen if the kernel size is too small.

Spatial filters (kernels) in a CNN are used to extract features from an image by performing a convolution operation. The purpose of these filters is to detect specific patterns or features such as edges, corners, and blobs within an image. These filters slide over the input image, element-wise multiplying the values in the filter with the corresponding values in the input image and then summing the resulting products to produce a single output value. By varying the parameters of the filters (such as size and stride), different types of features can be extracted from the image.
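The sliding multiply-and-sum operation described above can be written out directly. The sketch below is a plain-Python illustration for a single-channel image and a square kernel with stride 1 and no padding; the function name `conv2d_valid` and the hand-written edge kernel are assumptions for the example (in a CNN the kernel weights are learned).

```python
def conv2d_valid(image, kernel):
    """Slide a square kernel over a 2D image and sum the element-wise products."""
    k = len(kernel)
    out_h = len(image) - k + 1
    out_w = len(image[0]) - k + 1
    output = []
    for top in range(out_h):
        row = []
        for left in range(out_w):
            total = 0
            for i in range(k):
                for j in range(k):
                    total += image[top + i][left + j] * kernel[i][j]
            row.append(total)
        output.append(row)
    return output


# A 4x4 image with a vertical edge between columns 1 and 2.
image = [[0, 0, 1, 1]] * 4
# A kernel that responds where values increase from left to right.
kernel = [[-1, 1],
          [-1, 1]]
print(conv2d_valid(image, kernel))
# [[0, 2, 0], [0, 2, 0], [0, 2, 0]]
```

The output is non-zero only where the window straddles the edge, which is the sense in which a filter "detects" a feature.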

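In the given coding example, spatial filters are applied in the two convolutional layers: `conv1` applies 24 filters of size 5x5 to the 3 input channels, and `conv2` applies `outChannels` filters of size 5x5 to the resulting 24 channels. Each convolution is followed by the activation function and max pooling. The filters are initialized with random values and then optimized during training using backpropagation to improve the performance of the model.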

In a CNN architecture, spatial filters are usually placed after the input layer and before the output layer. These filters are typically applied in a series of convolutional layers, with each layer detecting increasingly complex and abstract features.

If the kernel size is too large, each filter covers a wide region and has many more weights, so important fine details in the input image can be lost, resulting in reduced accuracy. On the other hand, if the kernel size is too small, each filter sees only a very local neighborhood, so it can latch onto noisy or irrelevant features and miss larger structures, which can also result in reduced accuracy. Therefore, it is important to choose an appropriate kernel size based on the size and complexity of the input image and the specific features being detected.

### Question 4: Data Augmentation (40 points)

Use the code snippet provided in the next box to implement data augmentation by updating the contents of box 3 and re-running the model. Compare your accuracy without and with data augmentation and explain the results. In doing so, explain the purpose of data augmentation.

In [10]:
transforms.RandomRotation(5),

(RandomRotation(degrees=[-5.0, 5.0], interpolation=nearest, expand=False, fill=0),)

After applying data augmentation with the rotation transform, the CNN's training loss went up slightly, while its validation accuracy improved slightly.

Previous CNN train loss: 0.64

Previous CNN validation accuracy: 68.45%

Post data augmentation CNN train loss: 0.70

Post data augmentation CNN validation accuracy: 69.1%

The purpose of data augmentation is to increase the size and diversity of the training set, which can help to prevent overfitting and improve the generalization performance of the model. By applying random transformations to the input images, we can create new training examples that are similar to the original ones but differ in small ways, such as orientation, scale, or position. This can help the model learn to be more robust and invariant to these variations in the input data.
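To make the idea of "similar but slightly different" training examples concrete, the sketch below applies a random horizontal shift to a small single-channel image stored as nested lists. It is only an illustration of the principle and not how `torchvision.transforms` is implemented; the names `shift_image` and `random_shift` are hypothetical, and the maximum shift is assumed to be no larger than the image width.

```python
import random


def shift_image(image, shift, fill=0):
    """Shift each row right (positive) or left (negative), padding with `fill`."""
    width = len(image[0])
    shifted = []
    for row in image:
        if shift >= 0:
            shifted.append([fill] * shift + row[:width - shift])
        else:
            shifted.append(row[-shift:] + [fill] * (-shift))
    return shifted


def random_shift(image, max_shift, rng):
    """Pick a shift in [-max_shift, max_shift] and apply it."""
    return shift_image(image, rng.randint(-max_shift, max_shift))


rng = random.Random(0)
image = [[1, 2, 3],
         [4, 5, 6]]
# Each call can produce a different variant of the same labelled image,
# so the model rarely sees exactly the same input twice across epochs.
for _ in range(3):
    print(random_shift(image, 1, rng))
```

Because the transform is re-drawn every time an image is loaded, the augmented training loss is measured on harder, more varied inputs, which is consistent with a higher training loss alongside a small gain in validation accuracy.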

Data augmentation can increase generalization accuracy; however, the amount of improvement depends on the specific dataset, model architecture, and choice of augmentation techniques. It is therefore important to experiment with different data augmentation strategies and evaluate their impact on the model's performance.