# Phase 1 - (Convolutional) Neural Networks

Please follow the notebook in order. Make sure to add code where indicated by `''' TODO '''` or `# YOUR CODE HERE`. Ensure your notebook is easy to follow. All written report answers should be provided in the notebook itself.

You may find [this PyTorch tutorial](https://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html) helpful.

In [None]:
import torch
import torch.nn as nn
import torchvision.datasets as datasets
import torchvision.transforms as transforms

import matplotlib.pyplot as plt
import numpy as np

## 1. Loading & Visualizing Data

Please use MNIST for this phase. MNIST is a digit classification dataset with greyscale 28x28 images and 10 classes (digits 0-9). See the [documentation](https://pytorch.org/vision/0.15/generated/torchvision.datasets.MNIST.html). You can also refer to `MNIST_Tutorial.ipynb`.

Please complete loading the data.

Please also complete `show_imgs()`. The function should take the dataloader as input and show 10 random images, each with its label as the plot title. Arrange the 10 images in a grid using matplotlib axes.

In [None]:
train_data = ''' TODO '''

test_data = ''' TODO '''

batch_size = 32

train_loader = ''' TODO '''

test_loader = ''' TODO '''

In [None]:
# Takes a dataloader as input and shows 10 random images, each titled with its label
def show_imgs(dataloader):
    ''' TODO '''

## 2. Utility Functions

Please complete the `train()` and `plot_learning_curve()` functions. `test_accuracy()` has already been provided to you.

In [None]:
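def test_accuracy(model, test_loader, input_size, device):
    model.to(device)
    correct = 0
    total = 0
    # Gradients are not needed for evaluation
    with torch.no_grad():
        for test_data in test_loader:
            # Move the batch to the same device as the model
            images, labels = test_data[0].to(device), test_data[1].to(device)
            # Flatten each image into a vector of length input_size
            images = images.view(-1, input_size)
            outputs = model(images)
            # The predicted class is the index of the largest output score
            _, predicted = torch.max(outputs, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
    print('Accuracy: %d %%' % (100 * correct / total))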


### 2.1 Train Function
Please refer to the train function in `MNIST_Classifier.ipynb` to complete this function. You will need to store the loss and accuracy per iteration so that you can plot them later. Please print the loss, accuracy, and time taken for each training epoch.

In [None]:
def train(model, loss_fn, optimizer, train_loader, batch_size, num_epochs, device):
    ''' TODO '''

### 2.2 Plot Learning Curves Function
Plot the loss and accuracy values recorded during training.

In [None]:
def plot_learning_curve(''' TODO '''):
    ''' TODO '''

## 3) Define CNNs

A 1-Layer architecture (`Net`) is already defined for you as a reference. Please complete the definition of the 2-Layer CNN (`Net2`) here; the 5-Layer CNN (`Net5`) is defined in Section 5.7. The network architecture definitions are provided to you, but you must calculate some dimensions yourself.

**NOTE:** For now, please use ReLU activation. You will experiment with other activations in Section 5.

In [None]:
class Net(nn.Module):
    def __init__(self, input_size, num_classes):
        super(Net,self).__init__()
        self.fc1 = nn.Linear(input_size, 500)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(500, num_classes)

    def forward(self,x):
        out = self.fc1(x)
        out = self.relu(out)
        out = self.fc2(out)
        return out

### 3.1) 2-Layer CNN
Complete the `Net2` class.

Network Architecture:
* ***Layer 1 (Input)***: Convolutional; input channel = 1, output channel = 20, kernel size = 3, stride = 1.
* ***Layer 2 (Output)***: Fully connected; input dimension = < you find out >, output dimension = < you find out >.
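
The spatial size after a convolution follows the standard formula `floor((n + 2 * padding - kernel_size) / stride) + 1`. The sketch below applies it to a hypothetical 32x32 input with 8 output channels, deliberately not the values of this assignment. `conv_output_size` is an illustrative helper name, not a PyTorch function.

```python
def conv_output_size(n, kernel_size, stride=1, padding=0):
    """Spatial size after a convolution on an n x n input."""
    return (n + 2 * padding - kernel_size) // stride + 1

# Hypothetical example: 32x32 input, 8 output channels, kernel size 5, stride 1
side = conv_output_size(32, kernel_size=5, stride=1)

# Number of features a following fully connected layer would receive
flattened = 8 * side * side

print(side, flattened)  # 28 6272
```

The same arithmetic, with this section's own values, gives the input dimension of the fully connected layer.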

In [None]:
# Define 2-Layer Network
class Net2(nn.Module):
    ''' TODO '''

## 4) Train and Evaluate

### 4.1) Here, you must test your utility functions (`train()` and `plot_learning_curve()`) with the 1-Layer Model. At this stage, please ensure your utility functions are working correctly.

**NOTE**: You can reuse the code block below in Section 5 to run experiments, changing the hyperparameters as requested.

In [None]:
# Define Parameters
input_size = ''' TODO '''
num_classes = ''' TODO '''
lr = 0.01
num_epochs = 10

# Instantiate 1-Layer Model
net = Net(input_size, num_classes)
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
net.to(device)

# Define Loss func and Optimizer
loss_function = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(net.parameters(), lr=lr)

# Train Model
train(''' TODO ''')

# Plot Learning Curves
plot_learning_curve(''' TODO ''')

# Evaluate on Test Set
test_accuracy(model=net, test_loader=test_loader, input_size=input_size, device=device)

### 4.2) Please also train your 2-Layer network to ensure it is working properly.

In [None]:
# Define Parameters
input_size = ''' TODO '''
num_classes = ''' TODO '''
lr = 0.01
num_epochs = 10

# Instantiate 2-Layer Model
net = Net2(''' TODO ''')
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
net.to(device)

# Define Loss func and Optimizer
loss_function = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(net.parameters(), lr=lr)

# Train Model
train(''' TODO ''')

# Plot Learning Curves
plot_learning_curve(''' TODO ''')

# Evaluate on Test Set
test_accuracy(model=net, test_loader=test_loader, input_size=input_size, device=device)

## 5) Experiments and Reporting

In this section you will use your utility functions and model definitions from before to test different scenarios. Each subsection below has report questions you must answer. Please use the loss, accuracy, and learning curves to support your answers.

**NOTE:** In this section we only change individual parameters, so you can reuse most of your functions from previous sections. Where a question in Section 5 requires code changes, please add them in the code cells marked `# YOUR CODE HERE`.

### 5.1) Learning Curves

#### **Q 5.1) What is a Learning curve and why is it useful? You can refer to learning curves you have plotted in Section 4.**

---

### 5.2) Learning Rates

Please plot the curves for three separate training instances with learning rates of 2e-3, 3e-4, 5e-2. Use the 2-Layer CNN.
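
As a toy illustration of how the step size shapes optimization, the sketch below runs plain gradient descent on `f(x) = x**2`. This is a one-dimensional stand-in, not the CNN, and the three learning rates are chosen for the toy problem rather than taken from this assignment. `gradient_descent` is a hypothetical helper.

```python
def gradient_descent(lr, steps=20, x0=1.0):
    """Minimize f(x) = x**2 with plain gradient descent and return the final x."""
    x = x0
    for _ in range(steps):
        grad = 2 * x       # derivative of x**2
        x = x - lr * grad  # each step multiplies x by (1 - 2 * lr)
    return x

for lr in (0.01, 0.4, 1.1):
    print(f"lr={lr}: final x = {gradient_descent(lr):.3e}")
```

With the smallest rate, `x` is still far from the minimum at 0 after 20 steps; with the middle rate it is essentially at 0; with the largest rate each step overshoots and `|x|` grows.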

#### **Q 5.2) For each learning rate, explain whether the model is underfitting, overfitting, or training well. Explain the difference between high and low learning rates in terms of the optimization process.**

In [None]:
# YOUR CODE HERE

---

### 5.3) Optimizer
So far we have used the Adam optimizer. Change the optimizer to stochastic gradient descent (SGD), and then to SGD with momentum. Use the 2-Layer CNN.
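
A minimal sketch of how the two SGD variants can be constructed with `torch.optim.SGD`. The `nn.Linear` model is a hypothetical stand-in for your network, and the learning rate and the momentum value of 0.9 are assumed for illustration, not prescribed by this assignment.

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 2)  # hypothetical stand-in for the network being trained

# Plain SGD: momentum defaults to 0
sgd = torch.optim.SGD(model.parameters(), lr=0.01)

# SGD with momentum: 0.9 is an assumed, commonly used value
sgd_momentum = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

print(sgd.defaults["momentum"], sgd_momentum.defaults["momentum"])
```

When comparing optimizers, it helps to create a fresh model and a fresh optimizer for each run so that no run starts from weights trained by another.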

#### **Q 5.3) Explain the differences between Adam and the two SGD variants. Compare the results of each optimizer based on the learning curves.**

In [None]:
# YOUR CODE HERE

---

### 5.4) Initializing Weights

Initialize the weights in three different ways: all zeros, all ones, and random values drawn from a normal distribution. Use the 2-Layer CNN.
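
One possible way to apply an initialization scheme is to combine the functions in `torch.nn.init` with `nn.Module.apply`. In the sketch below, `init_weights`, the scheme names, the small `nn.Sequential` model, and the standard deviation of 0.01 are all assumptions made for illustration.

```python
import torch
import torch.nn as nn

def init_weights(module, scheme):
    """Overwrite the weights of conv/linear layers; biases keep their default init."""
    if isinstance(module, (nn.Conv2d, nn.Linear)):
        if scheme == "zeros":
            nn.init.zeros_(module.weight)
        elif scheme == "ones":
            nn.init.ones_(module.weight)
        elif scheme == "normal":
            # std=0.01 is an assumed value, not specified by the assignment
            nn.init.normal_(module.weight, mean=0.0, std=0.01)
        else:
            raise ValueError(f"unknown scheme: {scheme}")

# Hypothetical stand-in model; apply() visits every submodule
model = nn.Sequential(nn.Linear(4, 3), nn.ReLU(), nn.Linear(3, 2))
model.apply(lambda m: init_weights(m, "ones"))
print(model[0].weight)
```

Calling `model.apply(...)` before constructing the optimizer and starting training keeps each experiment's starting point explicit.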

#### **Q 5.4) How is the training process affected when we initialize our network weights differently? Based on what you observe, give a recommendation as to how weights should be initialized. Explain your reasoning.**

In [None]:
# YOUR CODE HERE

---

### 5.5) Activation Function

Please change the ReLU activation in the example code to tanh. Use the 2-Layer CNN.

#### **Q 5.5) How does changing the activation function to tanh affect performance? Is it better or worse? Explain why.**

In [None]:
# YOUR CODE HERE

---

### 5.6) Batch Size

Please use batch sizes of 128, 256, and 512. Use the 2-Layer CNN.
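
Batch size directly sets how many parameter updates happen per epoch. A quick sketch of that arithmetic, assuming the standard 60,000-image MNIST training split and a dataloader that keeps the last partial batch (the `DataLoader` default):

```python
import math

num_train_images = 60_000  # size of the MNIST training split

# Number of optimizer steps in one pass over the training data
steps_per_epoch = {
    batch_size: math.ceil(num_train_images / batch_size)
    for batch_size in (32, 128, 256, 512)
}

for batch_size, steps in steps_per_epoch.items():
    print(f"batch_size={batch_size}: {steps} updates per epoch")
```

Going from a batch size of 32 to 512 cuts the updates per epoch from 1875 to 118, which is worth keeping in mind when comparing learning curves plotted per iteration.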

#### **Q 5.6) How does changing batch size affect the training process?**

In [None]:
# YOUR CODE HERE

---

### 5.7) Different Network Architectures

Please complete the `Net5` and `FCN` definitions. The architectures are defined below.

Train `Net`, `Net2`, `Net5`, and `FCN` using a batch size of 64, the best learning rate from the learning rate experiment (Q 5.2), and the best optimizer from the optimizer experiment (Section 5.3).

#### **Q 5.7.1) Explain which model is better and why. Use your learning curves as well as what you know about model capacity to explain your reasoning.**

#### **Q 5.7.2) Which model converges to a minimum faster? Why? What hyperparameters would you tune in order to get a model to converge faster?**

#### **Q 5.7.3) Explain the purpose of the pooling layer**

#### **Q 5.7.4) Is it possible for a model to have a smaller final loss even if it has worse test accuracy?**

#### **Q 5.7.5) Explain the difference between the CNN models and FCN.**

In [None]:
# YOUR CODE HERE (NET)

In [None]:
# YOUR CODE HERE (NET2)

#### 5-Layer CNN
Similar to the 2-Layer model, make a new class `Net5`.

* ***Layer 1 (Input)***: Convolutional, input channel = 1, output channel = 32, kernel size = 5, stride = 1, padding = 2.
* ***Layer 2 (Hidden 1)***: Pooling, kernel size = 2, stride = 2.
* ***Layer 3 (Hidden 2)***: Convolutional, input channel = < you find out >, output channel = 64, kernel size = 5, stride = 1, padding = 2.
* ***Layer 4 (Hidden 3)***: Fully connected, input dimension = < you find out >, output dimension = 1024.
* ***Layer 5 (Output)***: Fully connected, input dimension = < you find out >, output dimension = < you find out >.
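
When convolution and pooling layers are stacked, it can help to track the channel count and spatial size layer by layer before writing the fully connected layer. The sketch below does this for a hypothetical stack on a 32x32 single-channel input. These are not this assignment's values, and `output_size` is an illustrative helper.

```python
def output_size(n, kernel_size, stride, padding=0):
    """Spatial size after a convolution or pooling layer on an n x n input."""
    return (n + 2 * padding - kernel_size) // stride + 1

# Hypothetical stack, chosen only to illustrate the bookkeeping
layers = [
    ("conv", dict(out_channels=16, kernel_size=3, stride=1, padding=1)),
    ("pool", dict(kernel_size=2, stride=2)),
    ("conv", dict(out_channels=32, kernel_size=3, stride=1, padding=1)),
]

channels, side = 1, 32
for kind, cfg in layers:
    side = output_size(side, cfg["kernel_size"], cfg["stride"], cfg.get("padding", 0))
    if kind == "conv":
        channels = cfg["out_channels"]  # pooling leaves the channel count unchanged
    print(f"{kind}: channels={channels}, side={side}")

# Features seen by a fully connected layer placed after the last layer
fc_in_features = channels * side * side
print(fc_in_features)  # 8192
```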

In [None]:
# YOUR CODE HERE (NET5)

# ...

# Define 5-Layer Network
class Net5(nn.Module):
    ''' TODO '''

# ...

#### FCN

Here we will define a Fully Connected Network `FCN` (Not a CNN).

- **Layer 1 (Input):** Size = < you find out >
- **Layer 2 (Hidden 1):** 256 neurons
- **Layer 3 (Hidden 2):** 256 neurons
- **Layer 4 (Output):** Size = < you find out >

In [None]:
# YOUR CODE HERE (FCN)

# ...

# Define FCN
class FCN(nn.Module):
    ''' TODO '''

# ...



---






### 5.8) Batch Normalization

Choose the model that performs the best (`Net`, `Net2`, `Net5`, `FCN`). Add batch normalization layers where you see fit. Retrain the model and plot the learning curves.

#### **Q 5.8) Explain the purpose of the batch normalization layers, and how they affect training.**
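
A minimal sketch of where a batch normalization layer can sit, assuming the common convolution, batch norm, activation ordering. The block below is hypothetical (4 output channels and a dummy input batch) and is not one of this notebook's models.

```python
import torch
import torch.nn as nn

# Hypothetical block: convolution -> batch norm -> ReLU
block = nn.Sequential(
    nn.Conv2d(in_channels=1, out_channels=4, kernel_size=3),
    nn.BatchNorm2d(num_features=4),  # num_features must equal the conv's out_channels
    nn.ReLU(),
)

x = torch.randn(8, 1, 28, 28)  # dummy batch of 8 greyscale 28x28 images
block.train()                  # training mode: normalize with the current batch's statistics
out = block(x)
print(out.shape)  # torch.Size([8, 4, 26, 26])
```

For fully connected layers the analogous layer is `nn.BatchNorm1d`. Because batch norm behaves differently in training and evaluation, call `model.train()` before training and `model.eval()` before measuring test accuracy.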



In [None]:
# YOUR CODE HERE

---