# Lab (2) Training SimpleNN on CIFAR-10
In this project, you will use the SimpleNN model to perform image classification on CIFAR-10. CIFAR-10 originally contains 60K images from 10 categories. We split it into 45K/5K/10K images to serve as the train/validation/test sets. We release only the ground-truth labels of the training and validation sets to you.

## Step 0: Set up the SimpleNN model
Since you have already practiced implementing simple neural networks in Homework 1, we provide the implementation for you.

In [1]:
# import necessary dependencies
import argparse
import os, sys
import time
import datetime
from tqdm import tqdm_notebook as tqdm

import torch
import torch.nn as nn

import torch.nn.functional as F
import numpy as np

In [19]:
# define the SimpleNN without BN;
class SimpleNN_nobn(nn.Module):
    def __init__(self):
        super(SimpleNN_nobn, self).__init__()
        self.conv1 = nn.Conv2d(3, 8, 5)
        self.conv2 = nn.Conv2d(8, 16, 3)
        self.fc1   = nn.Linear(16*6*6, 120)
        self.fc2   = nn.Linear(120, 84)
        self.fc3   = nn.Linear(84, 10)

    def forward(self, x):
        out = F.relu(self.conv1(x))
        out = F.max_pool2d(out, 2)
        out = F.relu(self.conv2(out))
        out = F.max_pool2d(out, 2)
        out = out.view(out.size(0), -1)
        out = F.relu(self.fc1(out))
        out = F.relu(self.fc2(out))
        out = self.fc3(out)
        return out

In [2]:
# define the SimpleNN model with BN;
class SimpleNN(nn.Module):
    def __init__(self):
        super(SimpleNN, self).__init__()
        self.conv1 = nn.Conv2d(3, 8, 5)
        self.conv2 = nn.Conv2d(8, 16, 3)
        self.fc1   = nn.Linear(16*6*6, 120)
        self.fc2   = nn.Linear(120, 84)
        self.fc3   = nn.Linear(84, 10)
        self.conv1_bn   = nn.BatchNorm2d(8)
        self.conv2_bn   = nn.BatchNorm2d(16)

    def forward(self, x):
        out = F.relu(self.conv1(x))
        out = F.relu(self.conv1_bn(out))
        out = F.max_pool2d(out, 2)
        out = F.relu(self.conv2(out))
        out = F.relu(self.conv2_bn(out))
        out = F.max_pool2d(out, 2)
        out = out.view(out.size(0), -1)
        out = F.relu(self.fc1(out))
        out = F.relu(self.fc2(out))
        out = self.fc3(out)
        return out

In [3]:
# Question 3 Part b) Swish Activation 
    
# define the SwishNN model;
class SwishNN(nn.Module):
    def __init__(self):
        super(SwishNN, self).__init__()
        self.conv1 = nn.Conv2d(3, 8, 5)
        self.conv2 = nn.Conv2d(8, 16, 3)
        self.fc1   = nn.Linear(16*6*6, 120)
        self.fc2   = nn.Linear(120, 84)
        self.fc3   = nn.Linear(84, 10)
        self.conv1_bn   = nn.BatchNorm2d(8)
        self.conv2_bn   = nn.BatchNorm2d(16)
        
    def swish(self, x):
        # Swish: x * sigmoid(x), using the built-in sigmoid
        # (the argument is named x to avoid shadowing Python's input())
        return x * torch.sigmoid(x)

    def forward(self, x):
        out = self.swish(self.conv1(x))
        out = self.swish(self.conv1_bn(out))
        out = F.max_pool2d(out, 2)
        out = self.swish(self.conv2(out))
        out = self.swish(self.conv2_bn(out))
        out = F.max_pool2d(out, 2)
        out = out.view(out.size(0), -1)
        out = self.swish(self.fc1(out))
        out = self.swish(self.fc2(out))
        out = self.fc3(out)
        return out
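
As a quick check on the `swish` method above, the sketch below compares the hand-written formula with `torch.sigmoid` and with PyTorch's built-in `F.silu` (SiLU is the same function as Swish with β = 1, and is available in recent PyTorch releases). The helper name `swish_manual` and the test tensor are illustrative assumptions, not part of the lab code.

```python
import torch
import torch.nn.functional as F

def swish_manual(x):
    # x * sigmoid(x), with the sigmoid written out explicitly
    return x * (1.0 / (1.0 + torch.exp(-x)))

# Hypothetical test points covering negative, zero, and positive inputs
x = torch.linspace(-5.0, 5.0, steps=11)

manual = swish_manual(x)
via_sigmoid = x * torch.sigmoid(x)
builtin = F.silu(x)

# All three should agree up to floating-point error
print(torch.allclose(manual, via_sigmoid, atol=1e-6))
print(torch.allclose(manual, builtin, atol=1e-6))
```

If the built-in is available in your environment, calling `F.silu` directly inside `forward` would likely be the simplest option.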

### Question (a)
Here is a sanity check to verify the implementation of SimpleNN. 
You need to:
1. Write down your code.
2. **In the PDF report**, briefly describe how the code helps you confirm that SimpleNN is implemented correctly.

In [4]:
#############################################
# your code here
# sanity check for the correctness of SimpleNN
net = SimpleNN()
device = 'cuda' if torch.cuda.is_available() else 'cpu'
net.to(device)
#############################################

SimpleNN(
  (conv1): Conv2d(3, 8, kernel_size=(5, 5), stride=(1, 1))
  (conv2): Conv2d(8, 16, kernel_size=(3, 3), stride=(1, 1))
  (fc1): Linear(in_features=576, out_features=120, bias=True)
  (fc2): Linear(in_features=120, out_features=84, bias=True)
  (fc3): Linear(in_features=84, out_features=10, bias=True)
  (conv1_bn): BatchNorm2d(8, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (conv2_bn): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
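
One way to reason about the printed architecture is to trace the spatial size of a 32x32 CIFAR-10 image through the layers and confirm that it matches `in_features=576` of `fc1`. The sketch below does this with plain arithmetic; the helper names `conv_out` and `pool_out` are assumptions introduced for illustration.

```python
def conv_out(size, kernel, stride=1, padding=0):
    # Output side length of a convolution on a square input (dilation 1)
    return (size + 2 * padding - kernel) // stride + 1

def pool_out(size, kernel):
    # F.max_pool2d(x, k) uses stride = k and no padding by default
    return (size - kernel) // kernel + 1

side = 32                      # CIFAR-10 images are 32x32
side = conv_out(side, 5)       # conv1: 5x5 kernel -> 28
side = pool_out(side, 2)       # max pool 2x2      -> 14
side = conv_out(side, 3)       # conv2: 3x3 kernel -> 12
side = pool_out(side, 2)       # max pool 2x2      -> 6

flat_features = 16 * side * side   # conv2 produces 16 channels
print(side, flat_features)         # 6 576
```

A complementary check would be to pass a random batch of shape `(N, 3, 32, 32)` through the network and confirm that the output has shape `(N, 10)`.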

## Step 1: Set up preprocessing functions
Preprocessing is very important as discussed in the lecture.
You will need to write preprocessing functions with the help of *torchvision.transforms* in this step.
You can find a helpful tutorial and the API reference [here](https://pytorch.org/vision/stable/transforms.html).

### Question (b)
For the question, you need to:
1. Complete the preprocessing code below.
2. **In the PDF report**, briefly describe which preprocessing operations you used and what purpose each one serves.

Hint: 
1. Only two operations are necessary to complete the basic preprocessing here.
2. The raw input read from the dataset will be PIL images.
3. Data augmentation operations are not mandatory, but feel free to incorporate them if you want.
4. Reference value for mean/std of CIFAR-10 images (assuming the pixel values are within [0,1]): mean (RGB-format): (0.4914, 0.4822, 0.4465), std (RGB-format): (0.2023, 0.1994, 0.2010)

In [5]:
# useful libraries
import torchvision
import torchvision.transforms as transforms

#############################################
# your code here
# specify preprocessing function
# convert it to a tensor, and normalize
# add random crop with padding of 4 and random flips to image
transform_train = transforms.Compose([
    transforms.RandomCrop(size=(32, 32), padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994,0.2010))])

transform_val = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994,0.2010))])
#############################################
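
To make the effect of the two basic operations concrete, the sketch below reproduces their arithmetic on single RGB pixels in plain Python: `ToTensor` scales 8-bit values to [0, 1] (it also reorders the image to channel-first layout, which is not shown here), and `Normalize` subtracts the per-channel mean and divides by the per-channel standard deviation. The helper names and the sample pixels are illustrative assumptions.

```python
MEAN = (0.4914, 0.4822, 0.4465)   # reference CIFAR-10 channel means
STD = (0.2023, 0.1994, 0.2010)    # reference CIFAR-10 channel stds

def to_unit_range(pixel):
    # Value scaling done by ToTensor: [0, 255] -> [0.0, 1.0]
    return tuple(v / 255.0 for v in pixel)

def normalize(pixel01):
    # Per-channel operation done by Normalize: (value - mean) / std
    return tuple((v - m) / s for v, m, s in zip(pixel01, MEAN, STD))

black = normalize(to_unit_range((0, 0, 0)))
white = normalize(to_unit_range((255, 255, 255)))
average = normalize(MEAN)   # a pixel equal to the dataset mean maps to zero

print(black)
print(white)
print(average)
```

After normalization, each channel is roughly zero-centered with unit scale, which is the usual motivation for this step.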

## Step 2: Set up dataset and dataloader

### Question (c)
Set up the train/val datasets and dataloaders that are to be used during the training. Check out the [official API](https://pytorch.org/docs/stable/data.html) for more information about **torch.utils.data.DataLoader**.

Here, you need to:
1. Complete the code below.

In [6]:
# do NOT change these
from tools.dataset import CIFAR10
from torch.utils.data import DataLoader

torch.manual_seed(0)

# a few arguments, do NOT change these
DATA_ROOT = "./data"
TRAIN_BATCH_SIZE = 128
VAL_BATCH_SIZE = 100

#############################################
# your code here
# construct dataset
train_set = CIFAR10(
    root=DATA_ROOT, 
    mode='train', 
    download=True,
    transform=transform_train    # your code
)
val_set = CIFAR10(
    root=DATA_ROOT, 
    mode='val', 
    download=True,
    transform=transform_val   # your code
)

# construct dataloader
train_loader = DataLoader(
    train_set, 
    batch_size=TRAIN_BATCH_SIZE,  # your code
    shuffle=True,     # your code
    num_workers=4
)
val_loader = DataLoader(
    val_set, 
    batch_size=VAL_BATCH_SIZE,  # your code
    shuffle=False,     # your code
    num_workers=4
)
#############################################

Using downloaded and verified file: ./data/cifar10_trainval_F22.zip
Extracting ./data/cifar10_trainval_F22.zip to ./data
Files already downloaded and verified
Using downloaded and verified file: ./data/cifar10_trainval_F22.zip
Extracting ./data/cifar10_trainval_F22.zip to ./data
Files already downloaded and verified
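
With the split sizes stated in the introduction (45K training and 5K validation images) and the batch sizes fixed above, the number of batches per epoch can be worked out by hand. The sketch below is plain arithmetic and assumes the `DataLoader` default `drop_last=False`; the helper `num_batches` is a name introduced for illustration.

```python
import math

TRAIN_SIZE, VAL_SIZE = 45_000, 5_000          # split sizes from the lab intro
TRAIN_BATCH_SIZE, VAL_BATCH_SIZE = 128, 100

def num_batches(n_samples, batch_size, drop_last=False):
    # Mirrors how a DataLoader counts batches for a map-style dataset
    if drop_last:
        return n_samples // batch_size
    return math.ceil(n_samples / batch_size)

train_batches = num_batches(TRAIN_SIZE, TRAIN_BATCH_SIZE)
val_batches = num_batches(VAL_SIZE, VAL_BATCH_SIZE)
# Size of the final, partially filled training batch
last_train_batch = TRAIN_SIZE - (train_batches - 1) * TRAIN_BATCH_SIZE

print(train_batches, val_batches, last_train_batch)   # 352 50 72
```

Because the last training batch is smaller, averaging per-batch losses over `len(train_loader)` weights its samples slightly more than the others; the effect is small at these sizes.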


## Step 3: Instantiate your SimpleNN model and deploy it to GPU devices.
### Question (d)
You may want to deploy your model to a GPU device for efficient training. Assign your model to the GPU if one is available; otherwise, deploy it to the CPU.

Here, you need to:
1. Complete the code below.
2. **In the PDF report**, briefly describe how you verify that your model is indeed deployed on GPU. (Hint: check $\texttt{nvidia-smi}$.)

In [7]:
# specify the device for computation
#############################################
# your code here
if torch.cuda.is_available():
    # a CUDA device object
    device = torch.device("cuda")          
    # casting tensors to a cuda data type
    dtype = torch.cuda.FloatTensor  
    print(torch.cuda.is_available())
    print(torch.cuda.get_device_name(0))
else:
    device = torch.device("cpu")
    
#############################################

True
TITAN Xp
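
Besides `nvidia-smi`, the placement of a model can be checked from Python by inspecting the device of its parameters. The sketch below uses a small `nn.Linear` as a stand-in for SimpleNN (an assumption made to keep the example self-contained) and falls back to the CPU when no GPU is present.

```python
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Stand-in module; in the lab this would be the SimpleNN instance
model = nn.Linear(4, 2).to(device)

# Every parameter reports the device it lives on
param_devices = {p.device.type for p in model.parameters()}
print(param_devices)

if device.type == "cuda":
    # Bytes currently allocated by tensors on the current GPU
    print(torch.cuda.memory_allocated())
```

If the set contains only `'cuda'`, all parameters were moved; a mix of `'cpu'` and `'cuda'` would indicate that part of the model was left behind.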


## Step 4: Set up the loss function and optimizer
The loss function (also called the objective function) provides "feedback" to the neural network. Typically, we use multi-class cross-entropy as the loss function for classification models. As for the optimizer, we will use SGD with momentum.

### Question (e)
Here, you need to:
1. Set up the cross-entropy loss as the criterion. (Hint: there are implemented functions in **torch.nn**)
2. Specify an SGD optimizer with momentum. (Hint: there are implemented functions in **torch.optim**)

In [8]:
import torch.nn as nn
import torch.optim as optim

# hyperparameters, do NOT change right now
# initial learning rate
INITIAL_LR = 0.01

# momentum for optimizer
MOMENTUM = 0.9

# L2 regularization strength
REG = 1e-4

#############################################
# your code here
# create loss function
criterion = nn.CrossEntropyLoss()

# Add optimizer
optimizer = torch.optim.SGD(params = net.parameters(), lr=INITIAL_LR, momentum = MOMENTUM, weight_decay = REG)
#############################################
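
To illustrate what the optimizer settings above do, the sketch below applies the SGD-with-momentum update to a single scalar weight in plain Python. It follows the update rule documented for `torch.optim.SGD` with dampening 0 and Nesterov momentum off; the toy loss `w**2` and the helper `sgd_step` are assumptions made for illustration.

```python
INITIAL_LR, MOMENTUM, REG = 0.01, 0.9, 1e-4

def sgd_step(w, grad, buf):
    # weight_decay adds REG * w to the gradient (L2 regularization)
    g = grad + REG * w
    # Momentum buffer: the first step stores g, later steps accumulate
    buf = g if buf is None else MOMENTUM * buf + g
    # Move the weight against the buffered direction
    return w - INITIAL_LR * buf, buf

w, buf = 1.0, None
history = []
for _ in range(3):
    grad = 2.0 * w              # gradient of the toy loss w**2
    w, buf = sgd_step(w, grad, buf)
    history.append(w)

print(history)
```

The step size grows over the first few iterations because the momentum buffer accumulates gradients that point in the same direction.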

## Step 5: Start the training process.

### Question (f)/(g)
Congratulations! You have completed all of the previous steps and it is time to train our neural network.

Here you need to:
1. Complete the training code.
2. Actually perform the training.

Hint: Training a neural network usually repeats the following 4 steps: 

**i) Get a batch of data from the dataloader and copy it to your device (GPU).**

**ii) Do a forward pass to get the outputs from the neural network and compute the loss. Be careful about your inputs to the loss function: are they required to be logits or softmax probabilities?**

**iii) Do a backward pass (back-propagation) to compute gradients of all weights with respect to the loss.**

**iv) Update the model weights with the optimizer.**

You will also need to compute the accuracy of training/validation samples to track your model's performance over each epoch (the accuracy should be increasing as you train for more and more epochs).
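
On the question raised in step ii: `nn.CrossEntropyLoss` expects raw logits and applies log-softmax internally. The sketch below checks this on random logits (the tensor values and labels are arbitrary, chosen only for illustration) and shows that passing softmax probabilities instead produces a different quantity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 10)            # raw network outputs for 4 samples
targets = torch.tensor([1, 0, 9, 3])   # integer class labels

criterion = nn.CrossEntropyLoss()
loss_from_logits = criterion(logits, targets)

# Equivalent computation: log-softmax followed by negative log-likelihood
loss_manual = F.nll_loss(F.log_softmax(logits, dim=1), targets)

# Feeding probabilities applies softmax a second time inside the criterion
loss_from_probs = criterion(F.softmax(logits, dim=1), targets)

print(loss_from_logits.item(), loss_manual.item(), loss_from_probs.item())
```

When probabilities are passed, the loss stays within 1 of ln(10) for 10 classes regardless of the predictions, so training would receive a much weaker signal.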


In [20]:
# no batch norm Simple NN
net_nobn = SimpleNN_nobn()
device = 'cuda' if torch.cuda.is_available() else 'cpu'
net_nobn.to(device)
optimizer = torch.optim.SGD(params = net_nobn.parameters(), lr=INITIAL_LR, momentum = MOMENTUM, weight_decay = REG)

In [21]:
# no batch norm
EPOCHS = 30

DECAY_EPOCHS = INITIAL_LR / EPOCHS  # NOTE: unused in this cell; no learning-rate decay is applied

# the folder where the trained model is saved
CHECKPOINT_FOLDER = "./saved_model"

# start the training/validation process
# the process should take about 5 minutes on a GTX 1070-Ti
# if the code is written efficiently.
best_val_acc = 0
current_learning_rate = INITIAL_LR

print("==> Training starts!")
print("="*50)

for i in range(0, EPOCHS):
    #######################
    # your code here
    # switch to train mode
    net_nobn.train()
    #######################
    print("Epoch %d:" %i)
    # this helps you compute the training accuracy
    total_examples = 0
    correct_examples = 0

    train_loss = 0 # track training loss if you want
    
    # Train the model for 1 epoch.
    for batch_idx, (inputs, targets) in enumerate(train_loader):
        ####################################
        # your code here
        # copy inputs to device
        inputs, targets = inputs.to(device), targets.to(device) 
        
        # compute the output and loss
        y_pred = net_nobn(inputs)
        loss = criterion(y_pred, targets)
        
        # zero the gradient
        optimizer.zero_grad()
        
        # backpropagation
        loss.backward()
        train_loss += loss.item()
                         
        # apply gradient and update the weights
        optimizer.step()

        # count the number of correctly predicted samples in the current batch
        _, predicted = torch.max(y_pred.data, 1)
        correct_examples += (predicted == targets).sum().item()
        total_examples += targets.size(0)
        ####################################            
    avg_loss = train_loss / len(train_loader)
    avg_acc = correct_examples / total_examples
    print("Training loss: %.4f, Training accuracy: %.4f" %(avg_loss, avg_acc))

    # Validate on the validation dataset
    #######################
    # your code here
    # switch to eval mode
    net_nobn.eval()
    #######################

    # this helps you compute the validation accuracy
    total_examples = 0
    correct_examples = 0
    
    val_loss = 0 # again, track the validation loss if you want

    # disable gradient during validation, which can save GPU memory
    with torch.no_grad():
        for batch_idx, (inputs, targets) in enumerate(val_loader):
            ####################################
            # your code here
            # copy inputs to device
            inputs, targets = inputs.to(device), targets.to(device)
            
            # compute the output and loss
            y_pred = net_nobn(inputs)
            loss = criterion(y_pred, targets)
            val_loss += loss.item()
            
            # count the number of correctly predicted samples in the current batch
            _, predicted = torch.max(y_pred.data, 1)
            correct_examples += (predicted == targets).sum().item()
            total_examples += targets.size(0)
            ####################################

    avg_loss = val_loss / len(val_loader)
    avg_acc = correct_examples / total_examples
    print("Validation loss: %.4f, Validation accuracy: %.4f" % (avg_loss, avg_acc))
    
    # save the model checkpoint
    if avg_acc > best_val_acc:
        best_val_acc = avg_acc
        if not os.path.exists(CHECKPOINT_FOLDER):
            os.makedirs(CHECKPOINT_FOLDER)
        print("Saving ...")
        state = {'state_dict': net_nobn.state_dict(),
                 'epoch': i,
                 'lr': current_learning_rate}
        torch.save(state, os.path.join(CHECKPOINT_FOLDER, 'simplenn_nobn.pth'))
        
    print('')

print("="*50)
print(f"==> Optimization finished! Best validation accuracy: {best_val_acc:.4f}")

==> Training starts!
Epoch 0:
Training loss: 1.9658, Training accuracy: 0.2665
Validation loss: 1.6299, Validation accuracy: 0.4060
Saving ...

Epoch 1:
Training loss: 1.5814, Training accuracy: 0.4172
Validation loss: 1.4513, Validation accuracy: 0.4798
Saving ...

Epoch 2:
Training loss: 1.4272, Training accuracy: 0.4835
Validation loss: 1.3145, Validation accuracy: 0.5272
Saving ...

Epoch 3:
Training loss: 1.3336, Training accuracy: 0.5178
Validation loss: 1.1993, Validation accuracy: 0.5690
Saving ...

Epoch 4:
Training loss: 1.2640, Training accuracy: 0.5484
Validation loss: 1.1563, Validation accuracy: 0.5960
Saving ...

Epoch 5:
Training loss: 1.2143, Training accuracy: 0.5679
Validation loss: 1.1313, Validation accuracy: 0.6072
Saving ...

Epoch 6:
Training loss: 1.1752, Training accuracy: 0.5817
Validation loss: 1.1375, Validation accuracy: 0.6008

Epoch 7:
Training loss: 1.1478, Training accuracy: 0.5922
Validation loss: 1.0677, Validation accuracy: 0.6304
Saving ...

Epoch 
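
The checkpoint written in the loop above is a dictionary with the keys `state_dict`, `epoch`, and `lr`. A minimal sketch of saving and restoring such a checkpoint follows; it uses a small `nn.Linear` as a stand-in for the network, a temporary directory, and a hypothetical file name and epoch value so that it runs on its own.

```python
import os
import tempfile

import torch
import torch.nn as nn

model = nn.Linear(4, 2)    # stand-in for SimpleNN_nobn

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "demo.pth")   # hypothetical file name
    # Same dictionary layout as the checkpoint saved in the training loop
    torch.save({"state_dict": model.state_dict(), "epoch": 7, "lr": 0.01}, path)

    # map_location lets a GPU-trained checkpoint load on a CPU-only machine
    checkpoint = torch.load(path, map_location="cpu")

restored = nn.Linear(4, 2)
restored.load_state_dict(checkpoint["state_dict"])
restored.eval()    # switch to inference mode before evaluating

print(checkpoint["epoch"], torch.equal(model.weight, restored.weight))
```

The restored module must have the same architecture as the one that was saved; otherwise `load_state_dict` raises an error about missing or unexpected keys.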

In [9]:
# some hyperparameters
# total number of training epochs
EPOCHS = 30

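# NOTE: the original expression INITIAL_LR / EPOCHS is a tiny float, so the
# `i % DECAY_EPOCHS == 0` test below never fired, and DECAY was never defined.
# Assumed values that keep the same behavior (no decay within this run):
DECAY_EPOCHS = EPOCHS   # number of epochs between learning-rate decays
DECAY = 1.0             # multiplicative decay factor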

# the folder where the trained model is saved
CHECKPOINT_FOLDER = "./saved_model"

# start the training/validation process
# the process should take about 5 minutes on a GTX 1070-Ti
# if the code is written efficiently.
best_val_acc = 0
current_learning_rate = INITIAL_LR

print("==> Training starts!")
print("="*50)

for i in range(0, EPOCHS):
    # handle the learning rate scheduler.
    if i % DECAY_EPOCHS == 0 and i != 0:
        current_learning_rate = current_learning_rate * DECAY
        for param_group in optimizer.param_groups:
            param_group['lr'] = current_learning_rate
        print("Current learning rate has decayed to %f" %current_learning_rate)
    #######################
    # your code here
    # switch to train mode
    net.train()
    #######################
    print("Epoch %d:" %i)
    # this helps you compute the training accuracy
    total_examples = 0
    correct_examples = 0

    train_loss = 0 # track training loss if you want
    
    # Train the model for 1 epoch.
    for batch_idx, (inputs, targets) in enumerate(train_loader):
        ####################################
        # your code here
        # copy inputs to device
        inputs, targets = inputs.to(device), targets.to(device) 
        
        # compute the output and loss
        y_pred = net(inputs)
        loss = criterion(y_pred, targets)
        
        # zero the gradient
        optimizer.zero_grad()
        
        # backpropagation
        loss.backward()
        train_loss += loss.item()
                         
        # apply gradient and update the weights
        optimizer.step()

        # count the number of correctly predicted samples in the current batch
        _, predicted = torch.max(y_pred.data, 1)
        correct_examples += (predicted == targets).sum().item()
        total_examples += targets.size(0)
        ####################################            
    avg_loss = train_loss / len(train_loader)
    avg_acc = correct_examples / total_examples
    print("Training loss: %.4f, Training accuracy: %.4f" %(avg_loss, avg_acc))

    # Validate on the validation dataset
    #######################
    # your code here
    # switch to eval mode
    net.eval()
    #######################

    # this helps you compute the validation accuracy
    total_examples = 0
    correct_examples = 0
    
    val_loss = 0 # again, track the validation loss if you want

    # disable gradient during validation, which can save GPU memory
    with torch.no_grad():
        for batch_idx, (inputs, targets) in enumerate(val_loader):
            ####################################
            # your code here
            # copy inputs to device
            inputs, targets = inputs.to(device), targets.to(device)
            
            # compute the output and loss
            y_pred = net(inputs)
            loss = criterion(y_pred, targets)
            val_loss += loss.item()
            
            # count the number of correctly predicted samples in the current batch
            _, predicted = torch.max(y_pred.data, 1)
            correct_examples += (predicted == targets).sum().item()
            total_examples += targets.size(0)
            ####################################

    avg_loss = val_loss / len(val_loader)
    avg_acc = correct_examples / total_examples
    print("Validation loss: %.4f, Validation accuracy: %.4f" % (avg_loss, avg_acc))
    
    # save the model checkpoint
    if avg_acc > best_val_acc:
        best_val_acc = avg_acc
        if not os.path.exists(CHECKPOINT_FOLDER):
            os.makedirs(CHECKPOINT_FOLDER)
        print("Saving ...")
        state = {'state_dict': net.state_dict(),
                 'epoch': i,
                 'lr': current_learning_rate}
        torch.save(state, os.path.join(CHECKPOINT_FOLDER, 'simplenn.pth'))
        
    print('')

print("="*50)
print(f"==> Optimization finished! Best validation accuracy: {best_val_acc:.4f}")

==> Training starts!
Epoch 0:
Training loss: 1.7201, Training accuracy: 0.3657
Validation loss: 1.3892, Validation accuracy: 0.4924
Saving ...

Epoch 1:
Training loss: 1.4077, Training accuracy: 0.4885
Validation loss: 1.3052, Validation accuracy: 0.5198
Saving ...

Epoch 2:
Training loss: 1.3000, Training accuracy: 0.5326
Validation loss: 1.2319, Validation accuracy: 0.5648
Saving ...

Epoch 3:
Training loss: 1.2269, Training accuracy: 0.5643
Validation loss: 1.1558, Validation accuracy: 0.5942
Saving ...

Epoch 4:
Training loss: 1.1745, Training accuracy: 0.5795
Validation loss: 1.0751, Validation accuracy: 0.6242
Saving ...

Epoch 5:
Training loss: 1.1351, Training accuracy: 0.5944
Validation loss: 1.0413, Validation accuracy: 0.6326
Saving ...

Epoch 6:
Training loss: 1.0991, Training accuracy: 0.6102
Validation loss: 1.0401, Validation accuracy: 0.6376
Saving ...

Epoch 7:
Training loss: 1.0763, Training accuracy: 0.6212
Validation loss: 1.0496, Validation accuracy: 0.6366

Epoch 
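
The scheduler branch at the top of the loop above implements step decay: every `DECAY_EPOCHS` epochs the learning rate is multiplied by `DECAY`. The sketch below computes the resulting schedule in plain Python. The values `DECAY_EPOCHS = 10` and `DECAY = 0.1` are assumptions chosen for illustration; the notebook does not specify them.

```python
INITIAL_LR = 0.01
EPOCHS = 30
DECAY_EPOCHS = 10   # assumed: decay every 10 epochs
DECAY = 0.1         # assumed: multiply the learning rate by 0.1 each time

def lr_at_epoch(epoch):
    # epoch // DECAY_EPOCHS counts the decay events applied so far
    return INITIAL_LR * DECAY ** (epoch // DECAY_EPOCHS)

schedule = [lr_at_epoch(i) for i in range(EPOCHS)]
print(schedule[0], schedule[10], schedule[20])
```

PyTorch also ships `torch.optim.lr_scheduler.StepLR`, which applies the same kind of schedule when `scheduler.step()` is called once per epoch.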

3b)  Use empirical results to show that batch normalization allows a larger learning rate.

In [23]:
net_lr = SimpleNN()
net_lr.to(device)
# change LR to 0.05
optimizer = torch.optim.SGD(params = net_lr.parameters(), lr=0.05, momentum = MOMENTUM, weight_decay = REG)

In [24]:
EPOCHS = 30
DECAY_EPOCHS = INITIAL_LR / EPOCHS  # NOTE: unused in this cell; no learning-rate decay is applied

# the folder where the trained model is saved
CHECKPOINT_FOLDER = "./saved_model"

best_val_acc = 0

print("==> Training starts!")
print("="*50)

for i in range(0, EPOCHS):
    #######################
    # your code here
    # switch to train mode
    net_lr.train()
    #######################
    print("Epoch %d:" %i)
    # this helps you compute the training accuracy
    total_examples = 0
    correct_examples = 0

    train_loss = 0 # track training loss if you want
    
    # Train the model for 1 epoch.
    for batch_idx, (inputs, targets) in enumerate(train_loader):
        ####################################
        # your code here
        # copy inputs to device
        inputs, targets = inputs.to(device), targets.to(device) 
        
        # compute the output and loss
        y_pred = net_lr(inputs)
        loss = criterion(y_pred, targets)
        
        # zero the gradient
        optimizer.zero_grad()
        
        # backpropagation
        loss.backward()
        train_loss += loss.item()
                         
        # apply gradient and update the weights
        optimizer.step()

        # count the number of correctly predicted samples in the current batch
        _, predicted = torch.max(y_pred.data, 1)
        correct_examples += (predicted == targets).sum().item()
        total_examples += targets.size(0)
        ####################################            
    avg_loss = train_loss / len(train_loader)
    avg_acc = correct_examples / total_examples
    print("Training loss: %.4f, Training accuracy: %.4f" %(avg_loss, avg_acc))

    # Validate on the validation dataset
    #######################
    # your code here
    # switch to eval mode
    net_lr.eval()
    #######################

    # this helps you compute the validation accuracy
    total_examples = 0
    correct_examples = 0
    
    val_loss = 0 # again, track the validation loss if you want

    # disable gradient during validation, which can save GPU memory
    with torch.no_grad():
        for batch_idx, (inputs, targets) in enumerate(val_loader):
            ####################################
            # your code here
            # copy inputs to device
            inputs, targets = inputs.to(device), targets.to(device)
            
            # compute the output and loss
            y_pred = net_lr(inputs)
            loss = criterion(y_pred, targets)
            val_loss += loss.item()
            
            # count the number of correctly predicted samples in the current batch
            _, predicted = torch.max(y_pred.data, 1)
            correct_examples += (predicted == targets).sum().item()
            total_examples += targets.size(0)
            ####################################

    avg_loss = val_loss / len(val_loader)
    avg_acc = correct_examples / total_examples
    print("Validation loss: %.4f, Validation accuracy: %.4f" % (avg_loss, avg_acc))
    
    # save the model checkpoint
    if avg_acc > best_val_acc:
        best_val_acc = avg_acc
        if not os.path.exists(CHECKPOINT_FOLDER):
            os.makedirs(CHECKPOINT_FOLDER)
        print("Saving ...")
        state = {'state_dict': net_lr.state_dict(),
                 'epoch': i}
        torch.save(state, os.path.join(CHECKPOINT_FOLDER, 'simplenn_lr.pth'))
        
    print('')

print("="*50)
print(f"==> Optimization finished! Best validation accuracy: {best_val_acc:.4f}")

==> Training starts!
Epoch 0:
Training loss: 1.6703, Training accuracy: 0.3821
Validation loss: 1.3839, Validation accuracy: 0.5026
Saving ...

Epoch 1:
Training loss: 1.3644, Training accuracy: 0.5111
Validation loss: 1.2831, Validation accuracy: 0.5528
Saving ...

Epoch 2:
Training loss: 1.2575, Training accuracy: 0.5568
Validation loss: 1.1473, Validation accuracy: 0.5942
Saving ...

Epoch 3:
Training loss: 1.1855, Training accuracy: 0.5820
Validation loss: 1.0838, Validation accuracy: 0.6264
Saving ...

Epoch 4:
Training loss: 1.1438, Training accuracy: 0.5929
Validation loss: 1.0657, Validation accuracy: 0.6306
Saving ...

Epoch 5:
Training loss: 1.1098, Training accuracy: 0.6093
Validation loss: 1.0298, Validation accuracy: 0.6308
Saving ...

Epoch 6:
Training loss: 1.0775, Training accuracy: 0.6233
Validation loss: 0.9718, Validation accuracy: 0.6634
Saving ...

Epoch 7:
Training loss: 1.0612, Training accuracy: 0.6273
Validation loss: 0.9959, Validation accuracy: 0.6462

Epoch 

## Swish activation

In [35]:
net_swish = SwishNN()
net_swish.to(device)
optimizer = torch.optim.SGD(params = net_swish.parameters(), lr=0.01, momentum = MOMENTUM, weight_decay = REG)

In [36]:
EPOCHS = 30
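# NOTE: the original expression INITIAL_LR / EPOCHS is a tiny float, so the
# `i % DECAY_EPOCHS == 0` test below never fired, and DECAY was never defined.
# Assumed values that keep the same behavior (no decay within this run):
DECAY_EPOCHS = EPOCHS   # number of epochs between learning-rate decays
DECAY = 1.0             # multiplicative decay factor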

# the folder where the trained model is saved
CHECKPOINT_FOLDER = "./saved_model"

best_val_acc = 0

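# NOTE: this value is only stored in the checkpoint; the optimizer above was
# created with lr=0.01, and this variable reaches the optimizer only if the
# decay branch below runs
current_learning_rate = 0.1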

print("==> Training starts!")
print("="*50)

for i in range(0, EPOCHS):
    # handle the learning rate scheduler.
    if i % DECAY_EPOCHS == 0 and i != 0:
        current_learning_rate = current_learning_rate * DECAY
        for param_group in optimizer.param_groups:
            param_group['lr'] = current_learning_rate
        print("Current learning rate has decayed to %f" %current_learning_rate)
    #######################
    # your code here
    # switch to train mode
    net_swish.train()
    #######################
    print("Epoch %d:" %i)
    # this helps you compute the training accuracy
    total_examples = 0
    correct_examples = 0

    train_loss = 0 # track training loss if you want
    
    # Train the model for 1 epoch.
    for batch_idx, (inputs, targets) in enumerate(train_loader):
        ####################################
        # your code here
        # copy inputs to device
        inputs, targets = inputs.to(device), targets.to(device) 
        
        # compute the output and loss
        y_pred = net_swish(inputs)
        loss = criterion(y_pred, targets)
        
        # zero the gradient
        optimizer.zero_grad()
        
        # backpropagation
        loss.backward()
        train_loss += loss.item()
                         
        # apply gradient and update the weights
        optimizer.step()

        # count the number of correctly predicted samples in the current batch
        _, predicted = torch.max(y_pred.data, 1)
        correct_examples += (predicted == targets).sum().item()
        total_examples += targets.size(0)
        ####################################            
    avg_loss = train_loss / len(train_loader)
    avg_acc = correct_examples / total_examples
    print("Training loss: %.4f, Training accuracy: %.4f" %(avg_loss, avg_acc))

    # Validate on the validation dataset
    #######################
    # your code here
    # switch to eval mode
    net_swish.eval()
    #######################

    # this helps you compute the validation accuracy
    total_examples = 0
    correct_examples = 0
    
    val_loss = 0 # again, track the validation loss if you want

    # disable gradient during validation, which can save GPU memory
    with torch.no_grad():
        for batch_idx, (inputs, targets) in enumerate(val_loader):
            ####################################
            # your code here
            # copy inputs to device
            inputs, targets = inputs.to(device), targets.to(device)
            
            # compute the output and loss
            y_pred = net_swish(inputs)
            loss = criterion(y_pred, targets)
            val_loss += loss.item()
            
            # count the number of correctly predicted samples in the current batch
            _, predicted = torch.max(y_pred.data, 1)
            correct_examples += (predicted == targets).sum().item()
            total_examples += targets.size(0)
            ####################################

    avg_loss = val_loss / len(val_loader)
    avg_acc = correct_examples / total_examples
    print("Validation loss: %.4f, Validation accuracy: %.4f" % (avg_loss, avg_acc))
    
    # save the model checkpoint
    if avg_acc > best_val_acc:
        best_val_acc = avg_acc
        if not os.path.exists(CHECKPOINT_FOLDER):
            os.makedirs(CHECKPOINT_FOLDER)
        print("Saving ...")
        state = {'state_dict': net_swish.state_dict(),
                 'epoch': i,
                 'lr': current_learning_rate}
        torch.save(state, os.path.join(CHECKPOINT_FOLDER, 'swishnn.pth'))
        
    print('')

print("="*50)
print(f"==> Optimization finished! Best validation accuracy: {best_val_acc:.4f}")

==> Training starts!
Epoch 0:
Training loss: 1.6538, Training accuracy: 0.3906
Validation loss: 1.3505, Validation accuracy: 0.5140
Saving ...

Epoch 1:
Training loss: 1.3444, Training accuracy: 0.5180
Validation loss: 1.1985, Validation accuracy: 0.5704
Saving ...

Epoch 2:
Training loss: 1.2366, Training accuracy: 0.5593
Validation loss: 1.1169, Validation accuracy: 0.6048
Saving ...

Epoch 3:
Training loss: 1.1640, Training accuracy: 0.5848
Validation loss: 1.0773, Validation accuracy: 0.6132
Saving ...

Epoch 4:
Training loss: 1.1085, Training accuracy: 0.6062
Validation loss: 1.0539, Validation accuracy: 0.6228
Saving ...

Epoch 5:
Training loss: 1.0665, Training accuracy: 0.6231
Validation loss: 0.9755, Validation accuracy: 0.6528
Saving ...

Epoch 6:
Training loss: 1.0303, Training accuracy: 0.6358
Validation loss: 0.9678, Validation accuracy: 0.6524

Epoch 7:
Training loss: 0.9947, Training accuracy: 0.6499
Validation loss: 0.9406, Validation accuracy: 0.6690
Saving ...

Epoch 
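
The training cells above repeat the same train/validate loop for each model. One possible way to reduce the duplication is a single helper that runs one epoch in either mode. The sketch below is an assumption about how such a helper could look (the name `run_epoch` and the tiny synthetic dataset are not part of the lab code); it trains when an optimizer is passed and evaluates otherwise.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def run_epoch(model, loader, criterion, device, optimizer=None):
    """Run one pass over `loader`; train if an optimizer is given, else evaluate."""
    training = optimizer is not None
    model.train(training)            # train mode updates BatchNorm statistics
    total_loss, correct, total = 0.0, 0, 0
    with torch.set_grad_enabled(training):
        for inputs, targets in loader:
            inputs, targets = inputs.to(device), targets.to(device)
            outputs = model(inputs)
            loss = criterion(outputs, targets)
            if training:
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
            total_loss += loss.item()
            correct += (outputs.argmax(dim=1) == targets).sum().item()
            total += targets.size(0)
    # Mean of per-batch losses and overall accuracy, as in the cells above
    return total_loss / len(loader), correct / total

# Tiny synthetic stand-in for the CIFAR-10 loaders (random data, 10 classes)
torch.manual_seed(0)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
data = TensorDataset(torch.randn(64, 8), torch.randint(0, 10, (64,)))
loader = DataLoader(data, batch_size=16)
model = nn.Linear(8, 10).to(device)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

train_loss, train_acc = run_epoch(model, loader, criterion, device, optimizer)
val_loss, val_acc = run_epoch(model, loader, criterion, device)
print(train_loss, train_acc, val_loss, val_acc)
```

With a helper like this, each experiment (no BN, larger learning rate, Swish) would only need to construct its model and optimizer and then call the helper inside the epoch loop.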

C Part 1) Apply different learning rate values: 1.0, 0.1, 0.05, 0.01, 0.005, 0.001, to see how the learning rate affects the model performance, and report results for each.

In [12]:
lr_lst = [1.0, 0.1, 0.05, 0.01, 0.005, 0.001]
for lr in lr_lst:
    EPOCHS = 30
    DECAY_EPOCHS = INITIAL_LR / EPOCHS  # NOTE: unused in this cell; no learning-rate decay is applied

    # the folder where the trained model is saved
    CHECKPOINT_FOLDER = "./saved_model"

    best_val_acc = 0
    
    current_learning_rate = lr
    print("Learning rate %f" %lr)
    print("==> Training starts!")
    print("="*50)
    net = SimpleNN()
    net.to(device)
    optimizer = torch.optim.SGD(params = net.parameters(), lr=lr, momentum = MOMENTUM, weight_decay = REG)
    for i in range(0, EPOCHS):
        #######################
        # your code here
        # switch to train mode
        net.train()
        #######################
        print("Epoch %d:" %i)
        # this helps you compute the training accuracy
        total_examples = 0
        correct_examples = 0

        train_loss = 0 # track training loss if you want

        # Train the model for 1 epoch.
        for batch_idx, (inputs, targets) in enumerate(train_loader):
            ####################################
            # your code here
            # copy inputs to device
            inputs, targets = inputs.to(device), targets.to(device) 

            # compute the output and loss
            y_pred = net(inputs)
            loss = criterion(y_pred, targets)

            # zero the gradient
            optimizer.zero_grad()

            # backpropagation
            loss.backward()
            train_loss += loss.item()

            # apply gradient and update the weights
            optimizer.step()

            # count the number of correctly predicted samples in the current batch
            _, predicted = torch.max(y_pred.data, 1)
            correct_examples += (predicted == targets).sum().item()
            total_examples += targets.size(0)
            ####################################            
        avg_loss = train_loss / len(train_loader)
        avg_acc = correct_examples / total_examples
        #print("Training loss: %.4f, Training accuracy: %.4f" %(avg_loss, avg_acc))

        # Validate on the validation dataset
        #######################
        # your code here
        # switch to eval mode
        net.eval()
        #######################

        # this helps you compute the validation accuracy
        total_examples = 0
        correct_examples = 0

        val_loss = 0 # again, track the validation loss if you want

        # disable gradient during validation, which can save GPU memory
        with torch.no_grad():
            for batch_idx, (inputs, targets) in enumerate(val_loader):
                ####################################
                # your code here
                # copy inputs to device
                inputs, targets = inputs.to(device), targets.to(device)

                # compute the output and loss
                y_pred = net(inputs)
                loss = criterion(y_pred, targets)
                val_loss += loss.item()

                # count the number of correctly predicted samples in the current batch
                _, predicted = torch.max(y_pred.data, 1)
                correct_examples += (predicted == targets).sum().item()
                total_examples += targets.size(0)
                ####################################

        avg_loss = val_loss / len(val_loader)
        avg_acc = correct_examples / total_examples
        #print("Validation loss: %.4f, Validation accuracy: %.4f" % (avg_loss, avg_acc))

        # save the model checkpoint
        if avg_acc > best_val_acc:
            best_val_acc = avg_acc
            if not os.path.exists(CHECKPOINT_FOLDER):
                os.makedirs(CHECKPOINT_FOLDER)
            print("Saving ...")
            state = {'state_dict': net.state_dict(),
                     'epoch': i}
            torch.save(state, os.path.join(CHECKPOINT_FOLDER, 'nn_lr.pth'))

        #print('')

    print("="*50)
    print(f"==> Optimization finished! Best validation accuracy: {best_val_acc:.4f}")

Learning rate 1.000000
==> Training starts!
Epoch 0:
Saving ...
Epoch 1:
Saving ...
Epoch 2:
Saving ...
Epoch 3:
Saving ...
Epoch 4:
Epoch 5:
Epoch 6:
Saving ...
Epoch 7:
Epoch 8:
Epoch 9:
Epoch 10:
Epoch 11:
Epoch 12:
Saving ...
Epoch 13:
Epoch 14:
Epoch 15:
Epoch 16:
Epoch 17:
Epoch 18:
Epoch 19:
Epoch 20:
Saving ...
Epoch 21:
Epoch 22:
Epoch 23:
Epoch 24:
Epoch 25:
Epoch 26:
Epoch 27:
Epoch 28:
Epoch 29:
==> Optimization finished! Best validation accuracy: 0.1054
Learning rate 0.100000
==> Training starts!
Epoch 0:
Saving ...
Epoch 1:
Saving ...
Epoch 2:
Epoch 3:
Epoch 4:
Epoch 5:
Saving ...
Epoch 6:
Epoch 7:
Epoch 8:
Epoch 9:
Saving ...
Epoch 10:
Epoch 11:
Epoch 12:
Epoch 13:
Epoch 14:
Epoch 15:
Saving ...
Epoch 16:
Saving ...
Epoch 17:
Epoch 18:
Saving ...
Epoch 19:
Epoch 20:
Epoch 21:
Epoch 22:
Epoch 23:
Epoch 24:
Epoch 25:
Saving ...
Epoch 26:
Epoch 27:
Epoch 28:
Epoch 29:
==> Optimization finished! Best validation accuracy: 0.5718
Learning rate 0.050000
==> Training starts!
Epo

C Part 2) Use different L2 regularization strengths of 1e-2, 1e-3, 1e-4, 1e-5, and 0.0 to see
how the L2 regularization strength affects the model performance. In this problem use a
learning rate of 0.01.
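
As a sanity check on what the `weight_decay` argument does, the sketch below compares one plain SGD step that uses `weight_decay` against one step that instead adds an explicit penalty of `0.5 * weight_decay * ||w||^2` to the loss. The helper name `one_sgd_step`, the toy linear loss, and the tensor values are illustrative assumptions, not part of the lab code. Note that in the lab's loop the decay applies to every tensor in `net.parameters()`, including biases and BatchNorm parameters.

```python
import torch

def one_sgd_step(w0, x, lr, weight_decay, explicit_penalty):
    """Take one SGD step on the toy loss sum(w * x) and return the updated weights.

    Hypothetical helper: if explicit_penalty is True, the L2 term is added to the
    loss and the optimizer's weight_decay is turned off; otherwise the optimizer's
    weight_decay argument is used.
    """
    w = torch.nn.Parameter(w0.clone())
    opt = torch.optim.SGD([w], lr=lr,
                          weight_decay=0.0 if explicit_penalty else weight_decay)
    loss = (w * x).sum()
    if explicit_penalty:
        # gradient of 0.5 * wd * ||w||^2 is wd * w, the term weight_decay adds
        loss = loss + 0.5 * weight_decay * (w ** 2).sum()
    opt.zero_grad()
    loss.backward()
    opt.step()
    return w.detach()

w0 = torch.tensor([1.0, -2.0, 3.0])   # illustrative starting weights
x = torch.tensor([0.5, 0.5, 0.5])     # illustrative fixed input
via_optimizer = one_sgd_step(w0, x, lr=0.01, weight_decay=1e-2, explicit_penalty=False)
via_loss = one_sgd_step(w0, x, lr=0.01, weight_decay=1e-2, explicit_penalty=True)
print(torch.allclose(via_optimizer, via_loss))
```

Both paths should produce the same update, which is why sweeping `weight_decay` in the cells below amounts to sweeping the L2 regularization strength.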

In [13]:
# L2 regularization strength
reg_lst = [1e-2, 1e-3, 1e-4, 1e-5, 0.0]

In [14]:
for strength in reg_lst:
    net = SimpleNN()
    net.to(device)
    optimizer = torch.optim.SGD(params = net.parameters(), lr=0.01, momentum = MOMENTUM, weight_decay = strength)
    EPOCHS = 30

    # the folder where the trained model is saved
    CHECKPOINT_FOLDER = "./saved_model"

    best_val_acc = 0
    
    print("L2 regularization strength %f" %strength)
    print("==> Training starts!")
    print("="*50)

    for i in range(0, EPOCHS):
        #######################
        # your code here
        # switch to train mode
        net.train()
        #######################
        print("Epoch %d:" %i)
        # this helps you compute the training accuracy
        total_examples = 0
        correct_examples = 0

        train_loss = 0 # track training loss if you want

        # Train the model for 1 epoch.
        for batch_idx, (inputs, targets) in enumerate(train_loader):
            ####################################
            # your code here
            # copy inputs to device
            inputs, targets = inputs.to(device), targets.to(device) 

            # compute the output and loss
            y_pred = net(inputs)
            loss = criterion(y_pred, targets)

            # zero the gradient
            optimizer.zero_grad()

            # backpropagation
            loss.backward()
            train_loss += loss.item()

            # apply gradient and update the weights
            optimizer.step()

            # count the number of correctly predicted samples in the current batch
            _, predicted = torch.max(y_pred.data, 1)
            correct_examples += (predicted == targets).sum().item()
            total_examples += targets.size(0)
            ####################################            
        avg_loss = train_loss / len(train_loader)
        avg_acc = correct_examples / total_examples
        #print("Training loss: %.4f, Training accuracy: %.4f" %(avg_loss, avg_acc))

        # Validate on the validation dataset
        #######################
        # your code here
        # switch to eval mode
        net.eval()
        #######################

        # this helps you compute the validation accuracy
        total_examples = 0
        correct_examples = 0

        val_loss = 0 # again, track the validation loss if you want

        # disable gradient during validation, which can save GPU memory
        with torch.no_grad():
            for batch_idx, (inputs, targets) in enumerate(val_loader):
                ####################################
                # your code here
                # copy inputs to device
                inputs, targets = inputs.to(device), targets.to(device)

                # compute the output and loss
                y_pred = net(inputs)
                loss = criterion(y_pred, targets)
                val_loss += loss.item()

                # count the number of correctly predicted samples in the current batch
                _, predicted = torch.max(y_pred.data, 1)
                correct_examples += (predicted == targets).sum().item()
                total_examples += targets.size(0)
                ####################################

        avg_loss = val_loss / len(val_loader)
        avg_acc = correct_examples / total_examples
        #print("Validation loss: %.4f, Validation accuracy: %.4f" % (avg_loss, avg_acc))

        # save the model checkpoint
        if avg_acc > best_val_acc:
            best_val_acc = avg_acc
            if not os.path.exists(CHECKPOINT_FOLDER):
                os.makedirs(CHECKPOINT_FOLDER)
            print("Saving ...")
            state = {'state_dict': net.state_dict(),
                     'epoch': i,
                     # read the rate from the optimizer; current_learning_rate is not set in this cell
                     'lr': optimizer.param_groups[0]['lr']}
            torch.save(state, os.path.join(CHECKPOINT_FOLDER, 'nn_reg.pth'))

        #print('')

    print("="*50)
    print(f"==> Optimization finished! Best validation accuracy: {best_val_acc:.4f}")

L2 regularization strength 0.010000
==> Training starts!
Epoch 0:
Saving ...
Epoch 1:
Saving ...
Epoch 2:
Saving ...
Epoch 3:
Saving ...
Epoch 4:
Epoch 5:
Epoch 6:
Saving ...
Epoch 7:
Saving ...
Epoch 8:
Epoch 9:
Epoch 10:
Epoch 11:
Epoch 12:
Epoch 13:
Epoch 14:
Epoch 15:
Epoch 16:
Saving ...
Epoch 17:
Epoch 18:
Epoch 19:
Epoch 20:
Saving ...
Epoch 21:
Epoch 22:
Epoch 23:
Epoch 24:
Epoch 25:
Epoch 26:
Epoch 27:
Epoch 28:
Saving ...
Epoch 29:
==> Optimization finished! Best validation accuracy: 0.5958
L2 regularization strength 0.001000
==> Training starts!
Epoch 0:
Saving ...
Epoch 1:
Saving ...
Epoch 2:
Saving ...
Epoch 3:
Saving ...
Epoch 4:
Saving ...
Epoch 5:
Saving ...
Epoch 6:
Saving ...
Epoch 7:
Epoch 8:
Saving ...
Epoch 9:
Saving ...
Epoch 10:
Saving ...
Epoch 11:
Epoch 12:
Epoch 13:
Saving ...
Epoch 14:
Epoch 15:
Epoch 16:
Saving ...
Epoch 17:
Saving ...
Epoch 18:
Saving ...
Epoch 19:
Epoch 20:
Epoch 21:
Epoch 22:
Epoch 23:
Epoch 24:
Epoch 25:
Epoch 26:
Saving ...
Epoch 27:
Ep

# Bonus: L1 penalty

Switch the regularization penalty from an L2 penalty to an L1 penalty. This means
you may not use the weight_decay parameter of PyTorch's built-in optimizers, as it does not
support L1 regularization. Instead, you need to add the L1 penalty as part of the loss function.
Compare the distributions of the weight parameters after L1 and L2 regularization. Describe your
observations.
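
One possible way to set this up is sketched below on a toy model. The names `l1_penalty`, `near_zero_fraction`, and `L1_LAMBDA`, the value `1e-4`, the tolerance `1e-3`, and the choice to penalize only weight tensors with more than one dimension (skipping biases and BatchNorm parameters) are all assumptions made for illustration, not requirements of the assignment.

```python
import torch
import torch.nn as nn

def l1_penalty(model):
    """Sum of absolute values over weight tensors with more than one dimension.

    Assumption: biases and BatchNorm parameters (1-D tensors) are left unpenalized.
    """
    return sum(p.abs().sum() for p in model.parameters() if p.dim() > 1)

def near_zero_fraction(model, tol=1e-3):
    """Fraction of penalized weights whose magnitude is below tol (a rough sparsity measure)."""
    weights = torch.cat([p.detach().flatten()
                         for p in model.parameters() if p.dim() > 1])
    return (weights.abs() < tol).float().mean().item()

torch.manual_seed(0)
model = nn.Linear(4, 2)                  # toy stand-in for SimpleNN
criterion = nn.CrossEntropyLoss()
# weight_decay is left at its default of 0 so that only the L1 term regularizes
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
L1_LAMBDA = 1e-4                         # hypothetical regularization strength

inputs = torch.randn(8, 4)               # toy batch
targets = torch.randint(0, 2, (8,))

optimizer.zero_grad()
data_loss = criterion(model(inputs), targets)
loss = data_loss + L1_LAMBDA * l1_penalty(model)   # L1 term added to the loss
loss.backward()
optimizer.step()

print(near_zero_fraction(model))
```

In the lab's training loop, the same idea would mean replacing `loss = criterion(y_pred, targets)` with the two-term loss above and constructing the optimizer with `weight_decay=0`. Calling `near_zero_fraction` (or plotting a histogram of the flattened weights) on an L1-trained and an L2-trained network gives one way to compare the two weight distributions.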