
# Problem 3

Use this notebook to write your code for problem 3.

In [23]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

## Problem A: Zero Padding

A benefit of zero padding is that it preserves the spatial size of the image from input to output. 
A downside is that it is more computationally expensive, because each filter is evaluated at more positions.
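
To make the cost side of this trade-off concrete, here is a minimal sketch that counts multiply-accumulate operations for one convolutional layer with and without zero padding. The 32 x 32 input size and the stride of 1 are assumptions made for illustration, since the problem statement is not reproduced in this notebook. The eight 5 x 5 x 3 filters are the ones described in Problem B, and `conv_macs` is a hypothetical helper name.

```python
def conv_macs(in_size, kernel, in_channels, n_filters, padding=0, stride=1):
    """Multiply-accumulates for one conv layer applied to a square input."""
    out_size = (in_size + 2 * padding - kernel) // stride + 1
    macs_per_output = kernel * kernel * in_channels
    return out_size * out_size * n_filters * macs_per_output

# Assumed setup: 32 x 32 RGB input, eight 5 x 5 x 3 filters, stride 1.
no_padding = conv_macs(32, 5, 3, 8, padding=0)    # output is 28 x 28
same_padding = conv_macs(32, 5, 3, 8, padding=2)  # output stays 32 x 32

print(no_padding, same_padding)  # 470400 614400
print(f"padding costs {same_padding / no_padding:.2f}x the multiply-accumulates")
```

Under these assumptions, keeping the output at 32 x 32 costs roughly 30% more arithmetic than letting it shrink to 28 x 28.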

## Problem B: Number of parameters in Conv Layer

Each filter is 5 x 5 x 3, so it has 5 x 5 x 3 = 75 weights plus 1 bias, for 76 parameters per filter. 
With 8 filters, the total number of parameters is 76 * 8 = 608.
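
The same count can be checked with a few lines of arithmetic. This is a small sketch: `conv_layer_params` is a hypothetical helper, not part of the assignment code.

```python
def conv_layer_params(kernel_h, kernel_w, in_channels, n_filters, bias=True):
    """Number of learnable parameters in one convolutional layer."""
    weights_per_filter = kernel_h * kernel_w * in_channels
    params_per_filter = weights_per_filter + (1 if bias else 0)
    return params_per_filter * n_filters

total = conv_layer_params(5, 5, 3, 8)
print(total)  # 608
```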

## Problem C: Shape of output tensor


The output tensor will be 28 x 28 x 8. 
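
This shape follows from the usual output-size formula, floor((n + 2p - k) / s) + 1, applied to each spatial dimension. The sketch below assumes a 32 x 32 input, stride 1, and no padding. These values are consistent with the 28 x 28 x 8 answer but are not stated in this notebook, and `conv_output_shape` is a hypothetical helper.

```python
def conv_output_shape(height, width, kernel, n_filters, padding=0, stride=1):
    """Output (height, width, channels) of a conv layer with square filters."""
    out_h = (height + 2 * padding - kernel) // stride + 1
    out_w = (width + 2 * padding - kernel) // stride + 1
    return out_h, out_w, n_filters

# Assumed input: 32 x 32 image, eight 5 x 5 filters, stride 1, no padding.
shape = conv_output_shape(32, 32, kernel=5, n_filters=8)
print(shape)  # (28, 28, 8)
```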

In [12]:
%%latex
$\textbf{Problem D: Apply 2 x 2 average pooling with stride 2}$

First: 
$
M=
  \begin{bmatrix}
    1 & 0.5 \\
    0.5 & 0.25
  \end{bmatrix}
$, 
Second: 
$
M=
  \begin{bmatrix}
    0.5 & 1 \\
    0.25 & 0.5
  \end{bmatrix}
$, 
Third: 
$
M=
  \begin{bmatrix}
    0.25 & 0.5 \\
    0.5 & 1
  \end{bmatrix}
$, 
Fourth: 
$
M=
  \begin{bmatrix}
    0.5 & 0.25 \\
    1 & 0.5
  \end{bmatrix}
$



In [14]:
%%latex
$\textbf{Problem E: Apply 2 x 2 max pooling with stride 2}$

All of them will be: 
$
M=
  \begin{bmatrix}
    1 & 1 \\
    1 & 1
  \end{bmatrix}
$


## Problem F:

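Pooling can be advantageous because it reduces the spatial resolution, which cuts computation and the number of downstream parameters and so helps limit overfitting to noisy data. Taking the average or max over a patch also makes the output less sensitive to small translations of the input.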
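
The pooling results in Problems D and E, and the translation argument here, can be illustrated with a short sketch. The 4 x 4 inputs below are hypothetical: they were chosen so that average pooling reproduces the first two matrices from Problem D, since the original inputs are not shown in this notebook. `pool2x2` and `mean` are illustrative helper names.

```python
def pool2x2(image, reduce_fn):
    """Apply 2 x 2 pooling with stride 2 to a list-of-lists image."""
    pooled = []
    for r in range(0, len(image), 2):
        row = []
        for c in range(0, len(image[0]), 2):
            patch = [image[r][c], image[r][c + 1],
                     image[r + 1][c], image[r + 1][c + 1]]
            row.append(reduce_fn(patch))
        pooled.append(row)
    return pooled

def mean(values):
    return sum(values) / len(values)

# Hypothetical input: a 3 x 3 block of ones in the top-left corner.
image = [[1, 1, 1, 0],
         [1, 1, 1, 0],
         [1, 1, 1, 0],
         [0, 0, 0, 0]]
# The same block shifted one pixel to the right.
shifted = [[0, 1, 1, 1],
           [0, 1, 1, 1],
           [0, 1, 1, 1],
           [0, 0, 0, 0]]

print(pool2x2(image, mean))    # [[1.0, 0.5], [0.5, 0.25]]
print(pool2x2(shifted, mean))  # [[0.5, 1.0], [0.25, 0.5]]
print(pool2x2(image, max))     # [[1, 1], [1, 1]]
print(pool2x2(shifted, max))   # [[1, 1], [1, 1]]
```

For these inputs the max-pooled output is identical before and after the shift, while the average-pooled output changes, which is one way to see why max pooling is often described as tolerant of small translations.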

## 3D - Convolutional network

As in problem 2, we have provided code that loads and preprocesses the MNIST data.

In [24]:
# load MNIST data into PyTorch format
import torch
import torchvision
import torchvision.transforms as transforms

# set batch size
batch_size = 32

# load training data downloaded into data/ folder
mnist_training_data = torchvision.datasets.MNIST('data/', train=True, download=True,
                                                transform=transforms.ToTensor())
# transforms.ToTensor() converts each image to a 1 x 28 x 28 float tensor scaled from 0-255 to 0.0-1.0;
# the DataLoader then stacks these into 4-D batches
training_data_loader = torch.utils.data.DataLoader(mnist_training_data,
                                                  batch_size=batch_size,
                                                  shuffle=True)

# load test data
mnist_test_data = torchvision.datasets.MNIST('data/', train=False, download=True,
                                                transform=transforms.ToTensor())
test_data_loader = torch.utils.data.DataLoader(mnist_test_data,
                                                  batch_size=batch_size,
                                                  shuffle=False)

In [25]:
# look at the number of batches per epoch for training and validation
print(f'{len(training_data_loader)} training batches')
print(f'{len(mnist_training_data)} training samples')
print(f'{len(test_data_loader)} validation batches')

1875 training batches
60000 training samples
313 validation batches


In [174]:
# sample model
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=(3,3)),
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Dropout(p=0.5),
    
    nn.Conv2d(8, 8, kernel_size=(3,3)),
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Dropout(p=0.5),
    
    nn.Flatten(),
    nn.Linear(25*8, 64),
    nn.ReLU(),
    nn.Linear(64, 10)
    # PyTorch implementation of cross-entropy loss includes softmax layer
)
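
The `25*8` input size of the first linear layer comes from tracking the spatial size through the conv and pooling layers. The arithmetic below is a sketch of that bookkeeping, assuming PyTorch's default stride of 1 and no padding for `nn.Conv2d`, and floor division for `nn.MaxPool2d(2)`. `conv_out` and `pool_out` are hypothetical helpers.

```python
def conv_out(size, kernel, padding=0, stride=1):
    """Spatial size after a convolution."""
    return (size + 2 * padding - kernel) // stride + 1

def pool_out(size, window):
    """Spatial size after non-overlapping pooling (odd sizes are floored)."""
    return size // window

size = 28                  # MNIST images are 28 x 28
size = conv_out(size, 3)   # 26
size = pool_out(size, 2)   # 13
size = conv_out(size, 3)   # 11
size = pool_out(size, 2)   # 5

flat_features = size * size * 8   # 8 output channels from the last conv layer
print(flat_features)  # 200
```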

In [175]:
# why don't we take a look at the shape of the weights for each layer
for p in model.parameters():
    print(p.data.shape)

torch.Size([8, 1, 3, 3])
torch.Size([8])
torch.Size([8, 8, 3, 3])
torch.Size([8])
torch.Size([64, 200])
torch.Size([64])
torch.Size([10, 64])
torch.Size([10])


In [176]:
# our model has some # of parameters:
count = sum(p.numel() for p in model.parameters())
print(f'total params: {count}')

total params: 14178


In [177]:
# For a multi-class classification problem
import torch.optim as optim
criterion = nn.CrossEntropyLoss()
optimizer = optim.RMSprop(model.parameters())

In [178]:
# Train the model for 10 epochs, iterating on the data in batches
n_epochs = 10

# store metrics
training_accuracy_history = np.zeros([n_epochs, 1])
training_loss_history = np.zeros([n_epochs, 1])
validation_accuracy_history = np.zeros([n_epochs, 1])
validation_loss_history = np.zeros([n_epochs, 1])

for epoch in range(n_epochs):
    print(f'Epoch {epoch+1}/{n_epochs}:', end='')
    train_total = 0
    train_correct = 0
    # train
    model.train()
    for i, data in enumerate(training_data_loader):
        images, labels = data
        optimizer.zero_grad()
        # forward pass
        output = model(images)
        # calculate categorical cross entropy loss
        loss = criterion(output, labels)
        # backward pass
        loss.backward()
        optimizer.step()
        
        # track training accuracy
        _, predicted = torch.max(output.data, 1)
        train_total += labels.size(0)
        train_correct += (predicted == labels).sum().item()
        # track training loss
        training_loss_history[epoch] += loss.item()
        # progress update every 180 batches (~1/10 epoch for batch size 32)
        if i % 180 == 0: print('.',end='')
    training_loss_history[epoch] /= len(training_data_loader)
    training_accuracy_history[epoch] = train_correct / train_total
    print(f'\n\tloss: {training_loss_history[epoch,0]:0.4f}, acc: {training_accuracy_history[epoch,0]:0.4f}',end='')
        
    # validate
    test_total = 0
    test_correct = 0
    with torch.no_grad():
        model.eval()
        for i, data in enumerate(test_data_loader):
            images, labels = data
            # forward pass
            output = model(images)
            # find accuracy
            _, predicted = torch.max(output.data, 1)
            test_total += labels.size(0)
            test_correct += (predicted == labels).sum().item()
            # find loss
            loss = criterion(output, labels)
            validation_loss_history[epoch] += loss.item()
        validation_loss_history[epoch] /= len(test_data_loader)
        validation_accuracy_history[epoch] = test_correct / test_total
    print(f', val loss: {validation_loss_history[epoch,0]:0.4f}, val acc: {validation_accuracy_history[epoch,0]:0.4f}')

Epoch 1/10:...........
	loss: 0.7356, acc: 0.7599, val loss: 0.2735, val acc: 0.9252
Epoch 2/10:...........
	loss: 0.5097, acc: 0.8373, val loss: 0.2748, val acc: 0.9234
Epoch 3/10:...

KeyboardInterrupt: 

Above, we output the training loss/accuracy as well as the validation loss and accuracy. Not bad! Let's see if you can do better.

In [186]:
# improved model: deeper conv stack with batch normalization and lighter dropout
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=(10,10), padding=1),
    nn.BatchNorm2d(8),
    nn.ReLU(),
    nn.Dropout(0.1),
    nn.Conv2d(8, 8, kernel_size=(5,5), padding=1),
    nn.BatchNorm2d(8),
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(8, 8, kernel_size=(3,3), padding=1),
    nn.BatchNorm2d(8),
    nn.ReLU(),
    nn.Conv2d(8, 8, kernel_size=(3,3), padding=1),
    nn.BatchNorm2d(8),
    nn.ReLU(),
    nn.Conv2d(8, 8, kernel_size=(3,3), padding=1),
    nn.BatchNorm2d(8),
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(128, 64),
    nn.ReLU(),
    nn.Linear(64, 10)
    # PyTorch implementation of cross-entropy loss includes softmax layer
)
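
The 128 input features of `nn.Linear(128, 64)` can be checked by walking the spatial size through the layers that change it. This is a sketch under the assumption of stride 1 for every convolution (the PyTorch default); the `layers` list is a hand-written summary of the model above, not something read from the model object.

```python
# (layer type, kernel or window size, padding) for each layer that changes the spatial size
layers = [
    ("conv", 10, 1),
    ("conv", 5, 1),
    ("pool", 2, 0),
    ("conv", 3, 1),
    ("conv", 3, 1),
    ("conv", 3, 1),
    ("pool", 2, 0),
]

size = 28      # MNIST images are 28 x 28
channels = 8   # every conv layer above outputs 8 channels
for kind, k, padding in layers:
    if kind == "conv":
        size = size + 2 * padding - k + 1   # stride 1
    else:
        size = size // k                    # pooling floors odd sizes
    print(f"{kind:4s} k={k:<2d} -> {size} x {size}")

flat_features = size * size * channels
print(flat_features)  # 128
```

The 3 x 3 convolutions with `padding=1` leave the size unchanged, so only the two large-kernel convolutions and the two pooling layers shrink the feature map, from 28 down to 4.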

In [187]:
for p in model.parameters():
    print(p.data.shape)

torch.Size([8, 1, 10, 10])
torch.Size([8])
torch.Size([8])
torch.Size([8])
torch.Size([8, 8, 5, 5])
torch.Size([8])
torch.Size([8])
torch.Size([8])
torch.Size([8, 8, 3, 3])
torch.Size([8])
torch.Size([8])
torch.Size([8])
torch.Size([8, 8, 3, 3])
torch.Size([8])
torch.Size([8])
torch.Size([8])
torch.Size([8, 8, 3, 3])
torch.Size([8])
torch.Size([8])
torch.Size([8])
torch.Size([64, 128])
torch.Size([64])
torch.Size([10, 64])
torch.Size([10])


In [188]:
# our model has some # of parameters:
count = sum(p.numel() for p in model.parameters())
print(f'total params: {count}')

total params: 13154
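
The total can be reproduced by hand from the layer shapes printed above. In this sketch the helper names are hypothetical. Each `BatchNorm2d(8)` contributes a learnable scale and shift per channel; its running mean and variance are buffers rather than parameters, so `model.parameters()` does not count them.

```python
def conv_params(c_in, c_out, k):
    return c_out * (c_in * k * k) + c_out   # weights + biases

def batchnorm_params(channels):
    return 2 * channels                     # learnable scale and shift

def linear_params(n_in, n_out):
    return n_in * n_out + n_out             # weights + biases

total = (
    conv_params(1, 8, 10) + batchnorm_params(8)
    + conv_params(8, 8, 5) + batchnorm_params(8)
    + 3 * (conv_params(8, 8, 3) + batchnorm_params(8))
    + linear_params(128, 64)
    + linear_params(64, 10)
)
print(total)  # 13154
```

Most of the budget (8256 of 13154 parameters) sits in the first linear layer, so the extra convolutional depth is comparatively cheap.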


In [189]:
# For a multi-class classification problem
import torch.optim as optim
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

In [190]:
# Train the model for 10 epochs, iterating on the data in batches
n_epochs = 10

# store metrics
training_accuracy_history = np.zeros([n_epochs, 1])
training_loss_history = np.zeros([n_epochs, 1])
validation_accuracy_history = np.zeros([n_epochs, 1])
validation_loss_history = np.zeros([n_epochs, 1])

for epoch in range(n_epochs):
    print(f'Epoch {epoch+1}/{n_epochs}:', end='')
    train_total = 0
    train_correct = 0
    # train
    model.train()
    for i, data in enumerate(training_data_loader):
        images, labels = data
        optimizer.zero_grad()
        # forward pass
        output = model(images)
        # calculate categorical cross entropy loss
        loss = criterion(output, labels)
        # backward pass
        loss.backward()
        optimizer.step()
        
        # track training accuracy
        _, predicted = torch.max(output.data, 1)
        train_total += labels.size(0)
        train_correct += (predicted == labels).sum().item()
        # track training loss
        training_loss_history[epoch] += loss.item()
        # progress update every 180 batches (~1/10 epoch for batch size 32)
        if i % 180 == 0: print('.',end='')
    training_loss_history[epoch] /= len(training_data_loader)
    training_accuracy_history[epoch] = train_correct / train_total
    print(f'\n\tloss: {training_loss_history[epoch,0]:0.4f}, acc: {training_accuracy_history[epoch,0]:0.4f}',end='')
        
    # validate
    test_total = 0
    test_correct = 0
    with torch.no_grad():
        model.eval()
        for i, data in enumerate(test_data_loader):
            images, labels = data
            # forward pass
            output = model(images)
            # find accuracy
            _, predicted = torch.max(output.data, 1)
            test_total += labels.size(0)
            test_correct += (predicted == labels).sum().item()
            # find loss
            loss = criterion(output, labels)
            validation_loss_history[epoch] += loss.item()
        validation_loss_history[epoch] /= len(test_data_loader)
        validation_accuracy_history[epoch] = test_correct / test_total
    print(f', val loss: {validation_loss_history[epoch,0]:0.4f}, val acc: {validation_accuracy_history[epoch,0]:0.4f}')

Epoch 1/10:...........
	loss: 0.1963, acc: 0.9410, val loss: 0.0633, val acc: 0.9803
Epoch 2/10:...........
	loss: 0.0743, acc: 0.9771, val loss: 0.0504, val acc: 0.9848
Epoch 3/10:...........
	loss: 0.0593, acc: 0.9817, val loss: 0.0445, val acc: 0.9868
Epoch 4/10:...........
	loss: 0.0498, acc: 0.9842, val loss: 0.0440, val acc: 0.9863
Epoch 5/10:...........
	loss: 0.0453, acc: 0.9858, val loss: 0.0394, val acc: 0.9874
Epoch 6/10:...........
	loss: 0.0393, acc: 0.9874, val loss: 0.0355, val acc: 0.9877
Epoch 7/10:...........
	loss: 0.0383, acc: 0.9875, val loss: 0.0388, val acc: 0.9883
Epoch 8/10:...........
	loss: 0.0339, acc: 0.9890, val loss: 0.0415, val acc: 0.9872
Epoch 9/10:...........
	loss: 0.0312, acc: 0.9899, val loss: 0.0360, val acc: 0.9880
Epoch 10/10:...........
	loss: 0.0291, acc: 0.9902, val loss: 0.0314, val acc: 0.9899
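
As a quick summary of this run, the snippet below picks out the best epoch by validation accuracy and by validation loss. The two lists are copied by hand from the log above; in the notebook itself the same values live in `validation_accuracy_history` and `validation_loss_history`.

```python
# Validation metrics copied from the training log above (epochs 1-10).
val_acc = [0.9803, 0.9848, 0.9868, 0.9863, 0.9874,
           0.9877, 0.9883, 0.9872, 0.9880, 0.9899]
val_loss = [0.0633, 0.0504, 0.0445, 0.0440, 0.0394,
            0.0355, 0.0388, 0.0415, 0.0360, 0.0314]

# Epochs are 1-indexed in the log, hence the +1.
best_acc_epoch = max(range(len(val_acc)), key=val_acc.__getitem__) + 1
best_loss_epoch = min(range(len(val_loss)), key=val_loss.__getitem__) + 1

print(f"best val acc:  {max(val_acc):.4f} at epoch {best_acc_epoch}")
print(f"best val loss: {min(val_loss):.4f} at epoch {best_loss_epoch}")
```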
