# Assignment 1
## [Section 2]
This assignment will make you familiar with:
1. loading and preprocessing data using built-in functions
2. constructing a simple CNN model
3. the training and testing pipeline


You may find some tutorials useful for this assignment, such as https://pytorch.org/tutorials/beginner/basics/intro.html.

In [1]:
# Import dependencies.
import random
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

In [2]:
# Set up your device 
cuda = torch.cuda.is_available()
device = torch.device("cuda:0" if cuda else "cpu")

In [3]:
# Set up random seed to 1008. Do not change the random seed.
# Yes, these are all necessary when you run experiments!
seed = 1008
random.seed(seed)
np.random.seed(seed)
torch.manual_seed(seed)
if cuda:
    torch.cuda.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    torch.backends.cudnn.benchmark = False
    torch.backends.cudnn.deterministic = True

## 1. Data: MNIST [2 pts]
#### Load the MNIST training and test datasets using $\texttt{torch.utils.data.DataLoader}$ and $\texttt{torchvision.datasets}$.

This dataset consists of images of handwritten digits (0-9), so the number of classes is 10. Each MNIST image has shape (28, 28, 1): 28x28 pixels with a single grayscale channel.

The normalization parameters we will use are mean 0.1307 and standard deviation 0.3081.

For more details, please refer to http://yann.lecun.com/exdb/mnist/.
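As a sanity check on these numbers, the sketch below applies to a synthetic uint8 image the same arithmetic that `transforms.ToTensor()` followed by `transforms.Normalize((0.1307,), (0.3081,))` performs: scale pixels to [0, 1], move the channel axis first, then subtract the mean and divide by the standard deviation (0.1307 and 0.3081 are the commonly used mean and standard deviation of the MNIST training pixels). It uses NumPy only; `to_tensor_like` and `normalize` are hypothetical stand-ins, not torchvision functions. Note the layout change: torchvision tensors are channels-first, so each sample becomes (1, 28, 28) and a batch from the loader is (128, 1, 28, 28).

```python
import numpy as np

# Normalization constants quoted in the text (MNIST pixel mean and std).
MEAN, STD = 0.1307, 0.3081

def to_tensor_like(img_hwc_uint8):
    """Hypothetical mimic of ToTensor for a uint8 HxWxC image:
    scale to [0, 1] and move channels first (C x H x W)."""
    return img_hwc_uint8.astype(np.float32).transpose(2, 0, 1) / 255.0

def normalize(x, mean=MEAN, std=STD):
    """Same per-channel arithmetic as transforms.Normalize (single channel)."""
    return (x - mean) / std

# Synthetic 28x28x1 "image": left half black (0), right half white (255).
img = np.zeros((28, 28, 1), dtype=np.uint8)
img[:, 14:, :] = 255

x = normalize(to_tensor_like(img))
print(x.shape)                   # (1, 28, 28)
print(round(float(x.min()), 4))  # black pixels, about -0.4242
print(round(float(x.max()), 4))  # white pixels, about 2.8215
```

After normalization, background pixels map to about -0.42 and full-intensity strokes to about 2.82, so inputs are roughly zero-mean with unit variance over the training set.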

### 1.1. Load Training Set [1 pt]

In [5]:
transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.1307,), (0.3081,))]) 
# Load the MNIST training set with batch size 128, apply data shuffling and normalization 
# train_loader = TODO
train_loader = DataLoader(datasets.MNIST('data', train=True, download=True, transform=transform), batch_size=128, shuffle=True)



### 1.2. Load Test Set [1 pt]

In [6]:
# Load the MNIST test set with batch size 128, apply normalization
# test_loader = TODO

test_loader = DataLoader(datasets.MNIST('data', train=False, download=True, transform=transform), batch_size=128, shuffle=False)
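With `batch_size=128`, neither split divides evenly. Assuming the standard MNIST split of 60,000 training and 10,000 test images, the plain-Python arithmetic below gives the number of batches per epoch (what `len(train_loader)` and `len(test_loader)` should report) and the size of the final, shorter batch. `loader_stats` is a hypothetical helper, not a DataLoader method. This matters for evaluation: the last test batch holds only 16 images, which is why the test loop sums per-sample losses and divides by the dataset size rather than averaging batch means.

```python
import math

def loader_stats(num_samples, batch_size, drop_last=False):
    """Hypothetical helper: (number of batches, size of the last batch)."""
    if drop_last:
        # The incomplete final batch is discarded.
        return num_samples // batch_size, batch_size
    num_batches = math.ceil(num_samples / batch_size)
    last = num_samples - (num_batches - 1) * batch_size
    return num_batches, last

print(loader_stats(60_000, 128))  # (469, 96)
print(loader_stats(10_000, 128))  # (79, 16)
```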

## 2. Models [3 pts]
#### You are going to define two convolutional neural networks that will be trained to classify MNIST digits.

### 2.1. CNN without Batch Norm [2 pts]

In [17]:
# Fill in the values below that make this network valid for MNIST data
# Hint: to verify these values, work out the shape of x after every line in forward().
conv1_in_ch = 1
conv2_in_ch = 20
fc1_in_features = 800
fc2_in_features = 500
n_classes = 10
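The values above can be checked by tracing the spatial size through `forward()`. With padding 0 and dilation 1, both `Conv2d` and `max_pool2d` (default floor rounding) produce `floor((n + 2p - k) / s) + 1` outputs per side. The plain-Python sketch below applies that formula; `conv2d_out` is a hypothetical helper name.

```python
def conv2d_out(size, kernel, stride=1, padding=0):
    # Output size per side for Conv2d / MaxPool2d with dilation 1 and floor rounding.
    return (size + 2 * padding - kernel) // stride + 1

size = 28                                    # MNIST images are 28x28
size = conv2d_out(size, kernel=5)            # conv1 (5x5)      -> 24
size = conv2d_out(size, kernel=2, stride=2)  # max_pool2d (2x2) -> 12
size = conv2d_out(size, kernel=5)            # conv2 (5x5)      -> 8
size = conv2d_out(size, kernel=2, stride=2)  # max_pool2d (2x2) -> 4

conv2_out_ch = 50                            # out_channels of conv2
flat = conv2_out_ch * size * size            # features fed into fc1
print(size, flat)  # 4 800
```

The same trace shows why `conv2_in_ch` is 20 (conv1's `out_channels`) and `fc2_in_features` is 500 (fc1's `out_features`).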

In [19]:
class NetWithoutBatchNorm(nn.Module):
    def __init__(self):
        super(NetWithoutBatchNorm, self).__init__()
        self.conv1 = nn.Conv2d(in_channels=conv1_in_ch, out_channels=20, kernel_size=5, stride=1)
        self.conv2 = nn.Conv2d(in_channels=conv2_in_ch, out_channels=50, kernel_size=5, stride=1)
        self.fc1 = nn.Linear(in_features=fc1_in_features, out_features=500)
        self.fc2 = nn.Linear(in_features=fc2_in_features, out_features=n_classes)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = F.max_pool2d(x, kernel_size=2, stride=2)
        x = F.relu(self.conv2(x))
        x = F.max_pool2d(x, kernel_size=2, stride=2)
        x = x.view(-1, fc1_in_features)  # flatten to (batch_size, fc1_in_features)
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        
        # Return the log_softmax of x.
        return F.log_softmax(x, dim=1)
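To get a feel for the model's size, the parameter count can be worked out by hand: a conv layer has `out_ch * in_ch * k * k` weights plus `out_ch` biases, and a linear layer has `in_f * out_f` weights plus `out_f` biases. The sketch below assumes the layer sizes defined above; the total should agree with `sum(p.numel() for p in model.parameters())` for `NetWithoutBatchNorm`. The helper names are hypothetical.

```python
def conv_params(in_ch, out_ch, k):
    return out_ch * in_ch * k * k + out_ch   # weights + biases

def linear_params(in_f, out_f):
    return in_f * out_f + out_f              # weights + biases

layers = {
    "conv1": conv_params(1, 20, 5),     # 520
    "conv2": conv_params(20, 50, 5),    # 25,050
    "fc1": linear_params(800, 500),     # 400,500
    "fc2": linear_params(500, 10),      # 5,010
}
total = sum(layers.values())
print(total)  # 431080
```

Roughly 93% of the parameters sit in `fc1`, so the convolutional layers are cheap by comparison.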

### 2.2. CNN with Batch Norm [1 pt]

In [25]:
# Fill in the values below that make this network valid for MNIST data
# Hint: to verify these values, work out the shape of x after every line in forward().
conv1_bn_size = 20
conv2_bn_size = 50
fc1_bn_size = 500

In [23]:
# Define the CNN with architecture explained in Part 2.2
class NetWithBatchNorm(nn.Module):
    def __init__(self):
        super(NetWithBatchNorm, self).__init__()
        self.conv1 = nn.Conv2d(in_channels=conv1_in_ch, out_channels=20, kernel_size=5, stride=1)
        self.conv1_bn = nn.BatchNorm2d(conv1_bn_size)
        self.conv2 = nn.Conv2d(in_channels=conv2_in_ch, out_channels=50, kernel_size=5, stride=1)
        self.conv2_bn = nn.BatchNorm2d(conv2_bn_size)
        self.fc1 = nn.Linear(in_features=fc1_in_features, out_features=500)
        self.fc1_bn = nn.BatchNorm1d(fc1_bn_size)
        self.fc2 = nn.Linear(in_features=fc2_in_features, out_features=n_classes)

    def forward(self, x):
        x = F.relu(self.conv1_bn(self.conv1(x)))
        x = F.max_pool2d(x, kernel_size=2, stride=2)
        x = F.relu(self.conv2_bn(self.conv2(x)))
        x = F.max_pool2d(x, kernel_size=2, stride=2)
        x = x.view(-1, fc1_in_features)  # flatten to (batch_size, fc1_in_features)
        x = F.relu(self.fc1_bn(self.fc1(x)))
        x = self.fc2(x)

        # Return the log_softmax of x.
        return F.log_softmax(x, dim=1)
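For intuition about what `nn.BatchNorm2d(20)` computes in training mode, here is a NumPy sketch: for an (N, C, H, W) input, each of the C channels is normalized with the mean and biased variance taken over the N, H and W axes, then scaled by gamma and shifted by beta (initialized to 1 and 0, with eps = 1e-5, matching PyTorch's defaults). This is why `num_features` must equal the channel count of the preceding conv layer (20 and 50); `nn.BatchNorm1d(500)` applies the same idea to each of the 500 features over the batch axis. The last lines show the running-mean update (PyTorch's default momentum is 0.1) whose accumulated statistics are used instead of batch statistics after `model.eval()`. `batch_norm_train` is a hypothetical helper, not the library implementation.

```python
import numpy as np

def batch_norm_train(x, gamma, beta, eps=1e-5):
    """Hypothetical training-mode batch norm for an (N, C, H, W) array.
    Returns the normalized output and the per-channel batch mean."""
    mean = x.mean(axis=(0, 2, 3), keepdims=True)
    var = x.var(axis=(0, 2, 3), keepdims=True)   # biased variance (ddof=0)
    x_hat = (x - mean) / np.sqrt(var + eps)
    y = gamma.reshape(1, -1, 1, 1) * x_hat + beta.reshape(1, -1, 1, 1)
    return y, mean.ravel()

rng = np.random.default_rng(0)
# Stand-in for a conv1 activation: batch of 8, 20 channels, 24x24 maps.
x = rng.normal(loc=3.0, scale=2.0, size=(8, 20, 24, 24))
gamma, beta = np.ones(20), np.zeros(20)      # default initial affine params
y, batch_mean = batch_norm_train(x, gamma, beta)

# Exponential running average of the batch mean, as kept for eval mode.
momentum = 0.1
running_mean = np.zeros(20)
running_mean = (1 - momentum) * running_mean + momentum * batch_mean

print(y.mean(axis=(0, 2, 3)).round(6)[:3])   # each channel now has mean ~0
print(y.var(axis=(0, 2, 3)).round(3)[:3])    # and variance ~1
```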


## 3. Training & Evaluation [4 pts]

### 3.1. Define training method [1 pt]

In [15]:
def train(model, device, train_loader, optimizer, epoch, log_interval=100):
    # Set model to training mode
    model.train()
    # Loop through data points
    for batch_idx, (data, target) in enumerate(train_loader):
    
        # Send data and target to device
        data, target = data.to(device), target.to(device)
        
        # Zero out the gradients left over from the previous step
        optimizer.zero_grad()
        
        # Pass data through model
        output = model(data)
        
        # Compute the negative log likelihood loss
        loss = F.nll_loss(output, target)
        
        # Backpropagate loss
        loss.backward()
        
        # Make a step with the optimizer
        optimizer.step()
    
        if batch_idx % log_interval == 0:
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, batch_idx * len(data), len(train_loader.dataset),
                100. * batch_idx / len(train_loader), loss.item()))
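Because both networks return `log_softmax` outputs, `F.nll_loss` simply picks out `-log p(target)` for each sample and, with its default `reduction='mean'`, averages over the batch; the combination equals cross-entropy on the raw logits (what `F.cross_entropy` computes). The plain-Python sketch below reproduces that arithmetic on made-up logits; the function names mirror the PyTorch ones but are hypothetical re-implementations.

```python
import math

def log_softmax(logits):
    # Subtract the max before exponentiating for numerical stability.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_sum for z in logits]

def nll_loss(log_probs_batch, targets):
    # Mean over the batch of -log p(target), like reduction='mean'.
    losses = [-lp[t] for lp, t in zip(log_probs_batch, targets)]
    return sum(losses) / len(losses)

# Made-up logits for a batch of two samples over three classes.
batch_logits = [[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]]
targets = [0, 2]
loss = nll_loss([log_softmax(z) for z in batch_logits], targets)
print(round(loss, 4))
```

With uniform logits over two classes the loss is `log 2`, the value a network would get by guessing.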

### 3.2. Define test method [1 pt]

In [13]:
# Define test method
def test(model, device, test_loader):
    # Set model to evaluation mode
    model.eval()
    # Variable for the total loss 
    test_loss = 0
    # Counter for the correct predictions
    num_correct = 0
    
    # don't need autograd for eval
    with torch.no_grad():
        # Loop through data points
        for data, target in test_loader:

            # Send data and target to device (Tensor.to returns a new tensor; it is not in-place)
            data, target = data.to(device), target.to(device)
            
            # Pass data through model
            output = model(data)
            
            # Compute the negative log likelihood loss with reduction='sum' and add to total test_loss
            test_loss += F.nll_loss(output, target, reduction='sum').item()
            
            # Get predictions from the model for each data point
            pred = output.argmax(dim=1, keepdim=True)
            
            # Add number of correct predictions to total num_correct
            num_correct += pred.eq(target.view_as(pred)).sum().item()
    
    # Compute the average test_loss
    avg_test_loss = test_loss / len(test_loader.dataset)
    
    print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
        avg_test_loss, num_correct, len(test_loader.dataset),
        100. * num_correct / len(test_loader.dataset)))
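The test loop's bookkeeping can be sketched in plain Python with two hypothetical batches of made-up numbers, a full one and a short last one (as with the 16-image final MNIST test batch). Summing per-sample losses and dividing by the total sample count gives the true per-sample average, whereas averaging per-batch means overweights the short batch. The prediction step mirrors `output.argmax(dim=1)` and the comparison mirrors `pred.eq(target.view_as(pred))`; `argmax` here is a hypothetical helper.

```python
def argmax(row):
    # Index of the largest score; mirrors output.argmax(dim=1) for one row.
    return max(range(len(row)), key=lambda i: row[i])

# Hypothetical per-sample losses, class scores, and targets.
batches = [
    {"losses": [0.2, 0.4, 0.6, 0.8],
     "scores": [[0.9, 0.1], [0.2, 0.8], [0.7, 0.3], [0.4, 0.6]],
     "targets": [0, 1, 1, 1]},
    {"losses": [2.0], "scores": [[0.3, 0.7]], "targets": [0]},  # short last batch
]

total_loss, num_correct, num_samples = 0.0, 0, 0
for b in batches:
    total_loss += sum(b["losses"])                  # like reduction='sum'
    preds = [argmax(s) for s in b["scores"]]
    num_correct += sum(p == t for p, t in zip(preds, b["targets"]))
    num_samples += len(b["targets"])

avg_loss = total_loss / num_samples                 # true per-sample average: 0.8
mean_of_batch_means = sum(
    sum(b["losses"]) / len(b["losses"]) for b in batches
) / len(batches)                                    # skewed by the short batch: 1.25
accuracy = num_correct / num_samples                # 3 / 5 = 0.6
print(avg_loss, mean_of_batch_means, accuracy)
```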

### 3.3 Train NetWithoutBatchNorm() [1 pt]

In [20]:
# Define the model and send it to the device
model = NetWithoutBatchNorm().to(device)

# Optimizer: SGD with learning rate of 1e-2 and momentum of 0.5
optimizer = optim.SGD(model.parameters(), lr=1e-2, momentum=0.5)

# Training loop with 10 epochs
for epoch in range(1, 10 + 1):

    # Train model
    train(model, device, train_loader, optimizer, epoch)

    # Test model
    test(model, device, test_loader)


Test set: Average loss: 0.1565, Accuracy: 9546/10000 (95%)


Test set: Average loss: 0.0929, Accuracy: 9726/10000 (97%)


Test set: Average loss: 0.0664, Accuracy: 9802/10000 (98%)


Test set: Average loss: 0.0547, Accuracy: 9833/10000 (98%)


Test set: Average loss: 0.0512, Accuracy: 9839/10000 (98%)


Test set: Average loss: 0.0480, Accuracy: 9849/10000 (98%)


Test set: Average loss: 0.0525, Accuracy: 9831/10000 (98%)


Test set: Average loss: 0.0452, Accuracy: 9865/10000 (99%)


Test set: Average loss: 0.0386, Accuracy: 9867/10000 (99%)


Test set: Average loss: 0.0350, Accuracy: 9889/10000 (99%)
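The optimizer above is `optim.SGD` with `momentum=0.5`. With the default `dampening=0` and `nesterov=False`, PyTorch keeps a velocity buffer per parameter and updates `buf = momentum * buf + grad`, then `param = param - lr * buf`. The scalar sketch below applies that rule to the toy objective `f(w) = w**2` (not MNIST) to contrast it with plain SGD at the same learning rate; `sgd_step` and `minimize_quadratic` are hypothetical helpers.

```python
def sgd_step(param, grad, buf, lr=1e-2, momentum=0.5):
    # PyTorch-style momentum (dampening=0, no Nesterov).
    buf = momentum * buf + grad
    return param - lr * buf, buf

def minimize_quadratic(steps, momentum):
    # Minimize f(w) = w**2, whose gradient is 2w, starting from w = 1.
    w, buf = 1.0, 0.0
    for _ in range(steps):
        w, buf = sgd_step(w, 2.0 * w, buf, momentum=momentum)
    return w

w_momentum = minimize_quadratic(500, momentum=0.5)
w_plain = minimize_quadratic(500, momentum=0.0)   # reduces to w <- 0.98 * w
print(w_momentum, w_plain)
```

On this toy problem a momentum of 0.5 roughly doubles the effective step size (`lr / (1 - momentum)`), so `w_momentum` ends up several orders of magnitude closer to zero than `w_plain`.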



### 3.4 Train NetWithBatchNorm() [1 pt]

In [26]:
# Define the model and send it to the device
model = NetWithBatchNorm().to(device)

# Optimizer: SGD with learning rate of 1e-2 and momentum of 0.5
optimizer = optim.SGD(model.parameters(), lr=1e-2, momentum=0.5)

# Training loop with 10 epochs
for epoch in range(1, 10 + 1):
    
    # Train model
    train(model, device, train_loader, optimizer, epoch)
    
    # Test model
    test(model, device, test_loader)


Test set: Average loss: 0.0765, Accuracy: 9811/10000 (98%)


Test set: Average loss: 0.0521, Accuracy: 9855/10000 (99%)


Test set: Average loss: 0.0426, Accuracy: 9873/10000 (99%)


Test set: Average loss: 0.0382, Accuracy: 9883/10000 (99%)


Test set: Average loss: 0.0334, Accuracy: 9899/10000 (99%)


Test set: Average loss: 0.0326, Accuracy: 9906/10000 (99%)


Test set: Average loss: 0.0293, Accuracy: 9910/10000 (99%)


Test set: Average loss: 0.0291, Accuracy: 9913/10000 (99%)


Test set: Average loss: 0.0281, Accuracy: 9911/10000 (99%)


Test set: Average loss: 0.0279, Accuracy: 9915/10000 (99%)
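One way to make "faster" concrete is to ask at which epoch each run first reaches a given number of correct test predictions. The sketch below uses the per-epoch counts copied from the two logs above; it reflects a single run per model with seed 1008, so the exact epochs may shift under other seeds. `first_epoch_reaching` is a hypothetical helper.

```python
# Correct test predictions (out of 10,000) after each epoch, from the logs above.
without_bn = [9546, 9726, 9802, 9833, 9839, 9849, 9831, 9865, 9867, 9889]
with_bn = [9811, 9855, 9873, 9883, 9899, 9906, 9910, 9913, 9911, 9915]

def first_epoch_reaching(correct_counts, threshold):
    """1-based epoch at which `threshold` correct is first reached, else None."""
    for epoch, correct in enumerate(correct_counts, start=1):
        if correct >= threshold:
            return epoch
    return None

for threshold in (9800, 9850, 9900):
    print(threshold,
          first_epoch_reaching(without_bn, threshold),
          first_epoch_reaching(with_bn, threshold))
# 9800 3 1
# 9850 8 2
# 9900 None 6
```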



## 4. Empirically, which of the models achieves higher accuracy faster? [1 pt]

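Answer: NetWithBatchNorm. After the first epoch it already reaches 98.11% test accuracy, versus 95.46% for NetWithoutBatchNorm. It passes 98.5% at epoch 2, a level NetWithoutBatchNorm only reaches at epoch 8, and it finishes higher after 10 epochs (99.15% vs. 98.89%, with test loss 0.0279 vs. 0.0350).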