# Homework Assignment #1

In this homework, we will implement a few core CNN blocks and practice training neural networks step by step. The model to implement is similar to [LeNet](http://yann.lecun.com/exdb/publis/pdf/lecun-01a.pdf). Skeleton code is provided throughout; fill in your implementation where indicated.

![LeNet.png](attachment:LeNet.png)
Image Source: https://github.com/dsgiitr/d2l-pytorch/blob/master/img/lenet.png

In the following, we first import the basic packages. Feel free to add other packages if necessary.
**Note**: The only allowed deep learning framework is PyTorch. Please use Python 3.6 or newer and PyTorch 1.3 or newer for this homework.

In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader, Dataset
import matplotlib.pyplot as plt
import torchvision.utils as vutils
import numpy as np

The following is a sketch of the LeNet class which you will be filling in step-by-step. For now, you don't need to do anything with the following code.

In [None]:
# LeNet sketch code
class LeNet(nn.Module):
  def __init__(self):
    super(LeNet, self).__init__()
    """
    self.c1 = Conv2D()
    self.p2 = nn.AvgPool2d(2, stride=2)
    self.c3 = Conv2D()
    self.p4 = nn.AvgPool2d(2, stride=2)
    self.c5 = Linear()
    self.f6 = Linear()
    """
    
    
  def forward(self, imgs, labels):
    """
    scores = self.net(imgs)
    o = softmax(scores)
    loss = objective(o, labels)
    """
    return loss

## Task 1: Implement Convolutional Layer

The first task is to implement a convolutional layer by completing the following Conv2D class. The class takes the number of input channels, the number of output channels, the kernel size, the stride, the padding values, and the device as inputs.

**To-do**:
    - (10 points) Implement the code between the markers `### Start your code here ###` and `### End of the code ###`.
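
As a sanity check on shapes (not a substitute for the task), the spatial output size of a convolution can be worked out before writing any loops. The sketch below assumes that the four padding values pad each side of the height and width by the stated amount. The helper name `conv_output_size` is hypothetical, and `F.conv2d` is used only as a reference for the expected shape.

```python
import torch
import torch.nn.functional as F


def conv_output_size(size, kernel, stride, pad_before, pad_after):
    """Output length along one spatial dimension (hypothetical helper)."""
    return (size + pad_before + pad_after - kernel) // stride + 1


# Same input shape and hyper-parameters as the correctness check below:
# 5x5 images, 3x3 kernel, stride 3, one pixel of zero-padding on every side.
x = torch.arange(50).view(2, 1, 5, 5).float()
w = torch.randn(2, 1, 3, 3)  # (dim_out, dim_in, kernel_h, kernel_w)

x_padded = F.pad(x, [1, 1, 1, 1])            # zero-pad width and height
y_ref = F.conv2d(x_padded, w, stride=[3, 3])  # reference, for shape only

h_out = conv_output_size(5, 3, 3, 1, 1)
w_out = conv_output_size(5, 3, 3, 1, 1)
print(tuple(y_ref.shape), (h_out, w_out))
```

The result has shape `(N, dim_out, h_out, w_out)`, which is the shape your `conv2d_forward` should produce for the same arguments.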

In [None]:
class Conv2D(nn.Module):
  def __init__(self, dim_in, dim_out, kernel_size, stride, padding, device):
    super(Conv2D, self).__init__()
    """
    inputs:
      dim_in: integer, number of channels in the input
      dim_out: integer, number of channels produced by the convolution
      kernel_size: list of 2 integers, spatial size of the convolving kernel
      stride: list of 2 integers, stride of the convolution along the height dimension and width dimension
      padding: list of 4 integers, zero-padding added to both sides of the height dimension and width dimension
      device: torch.device on which the kernel and bias are allocated
      
    """
    # initialize kernel and bias
    self.kernel = nn.Parameter(torch.randn([dim_out, dim_in]+kernel_size, dtype=torch.float32, device=device)*0.1, requires_grad=True)
    self.bias = nn.Parameter(torch.zeros([dim_out], dtype=torch.float32, device=device), requires_grad=True)
    
    self.dim_in = dim_in
    self.dim_out = dim_out
    
    self.kernel_size = kernel_size
    self.stride = stride
    self.padding = padding
    self.device = device
    
  def conv2d_forward(self, X):
    """
    inputs:
      X: input images
    outputs:
      Y: output produced by the convolution
    """

    ###  Start your code here ###
    
        
      
      
      
      
    ###  End of the code ###
    return Y

        
  def forward(self, x):
    return self.conv2d_forward(x)

## Conv2D Correctness Check

Run the correctness checking code. If your implementation is correct, you should see the following output:
```python
tensor([[[[ 1.0519,  1.3811],
          [ 2.5701,  2.0508]],

         [[ 1.3159,  2.4203],
          [ 8.4296, 10.1662]]],


        [[[ 4.4640,  5.9235],
          [ 4.5179,  2.3144]],

         [[13.1246, 15.5932],
          [20.2635, 22.9700]]]], grad_fn=<AddBackward0>)
```

In [None]:
# correctness checking
torch.random.manual_seed(0)
x = torch.arange(50).view(2,1,5,5).float()
my_conv = Conv2D(1,2,[3,3],[3,3],[1,1,1,1], torch.device('cpu'))
y = my_conv(x)
print(y)

## Guide: Pooling Layer

We will not implement the pooling layer. Instead, we will use the PyTorch API (`torch.nn.AvgPool2d`). Feel free to implement it yourself if you want. For details about the pooling API, see the [documentation](https://pytorch.org/docs/1.3.1/nn.html#avgpool2d). Note that the LeNet class provided in Task 4 instantiates `nn.MaxPool2d` for its pooling layers.

## Task 2: Implementing Linear Layer

Complete the following linear layer module. To specify a linear layer, input dimension and output dimension are provided. The linear layer performs the following computation: $y = xW^T+ b$.

**To-do**:
    - (5 points) Implement the code between the markers `### Start your code here ###` and `### End of the code ###`.
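
To make the formula $y = xW^T + b$ concrete, here is a minimal sketch of the shapes involved, using randomly generated stand-in tensors (the names `W` and `b` are illustrative and not the class attributes). `F.linear` appears only as a reference to compare against.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(5, 7, 10)  # (batch_size, *, dim_in) with dim_in = 10
W = torch.randn(3, 10)     # (dim_out, dim_in)
b = torch.randn(3)         # (dim_out,)

# matmul acts on the last dimension of x; the bias broadcasts over the rest
y_manual = torch.matmul(x, W.t()) + b
y_ref = F.linear(x, W, b)  # reference implementation of x W^T + b

print(tuple(y_manual.shape))
print(torch.allclose(y_manual, y_ref, atol=1e-5))
```

Only the last dimension changes, from `dim_in` to `dim_out`; all leading dimensions are preserved.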

In [None]:
class Linear(nn.Module):
  def __init__(self, dim_in, dim_out, device):
    super(Linear, self).__init__()
    
    self.weights = nn.Parameter(torch.randn([dim_out, dim_in], dtype=torch.float32, device=device)*0.1, requires_grad=True)
    
    self.bias = nn.Parameter(torch.zeros([dim_out], dtype=torch.float32, device=device), requires_grad=True)
    self.dim_out = dim_out
    self.device = device
    
  def linear_forward(self, X):
    """
    inputs:
     X: tensor of shape (batch_size, *, dim_in)
    outputs:
     Y: tensor of shape (batch_size, *, dim_out)
    """
    
    ###  Start your code here ###
    
    ###  End of the code ###
    return Y
  
  def forward(self, X):
    return self.linear_forward(X)

## Linear Correctness Check

Run the following correctness checking code. If your implementation is correct, you should see the following output:
```python
tensor([[ -2.1595,  -0.2037,   1.8567],
        [ -6.9537,   0.7306,   2.5298],
        [-11.7479,   1.6648,   3.2028],
        [-16.5422,   2.5991,   3.8759],
        [-21.3364,   3.5334,   4.5489]], grad_fn=<AddBackward0>)
```

In [None]:
# correctness checking
torch.random.manual_seed(0)
x = torch.arange(50).view(5, 10).float()
my_linear = Linear(10, 3, torch.device('cpu'))
y = my_linear(x)
print(y)

## Task 3: Loss Functions and SGD

The loss function for this classification task is the cross-entropy loss. To compute it, we first implement the softmax output layer and then the cross-entropy loss.

The softmax function normalizes the scores so that every output is nonnegative and the outputs sum to 1:

$$
\hat{\mathbf{y}} = \text{softmax}(\mathbf{o}),\text{ where } \hat{y}_i = \frac{\text{exp}(o_i)}{\sum_j \text{exp}(o_j)}
$$

The cross-entropy loss is the objective function for this classification task. Minimizing it is equivalent to maximizing the likelihood of the correct labels:

$$
l = -\log{P(y \mid x)} = -\sum_j y_j\log{\hat{y}_j}
$$
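
When $\mathbf{y}$ is a one-hot vector, the sum above keeps only the term of the true class, so the loss reduces to $-\log \hat{y}_c$ for the correct class $c$. The sketch below illustrates this on hand-picked probabilities; it assumes the labels are integer class indices, and `F.one_hot` is used only to build the one-hot form for comparison.

```python
import torch
import torch.nn.functional as F

# Hand-picked probabilities for 2 samples and 3 classes (each row sums to 1)
p = torch.tensor([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1]])
labels = torch.tensor([0, 1])  # integer class index for each sample

# Form 1: explicit sum over classes with one-hot targets
one_hot = F.one_hot(labels, num_classes=3).float()
loss_sum = -(one_hot * torch.log(p)).sum(dim=1)

# Form 2: pick the log-probability of the true class directly
loss_pick = -torch.log(p[torch.arange(2), labels])

print(loss_sum)
print(loss_pick)
```

Both forms give the same per-sample loss.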

**To-do**:
    - (10 points) Complete the function **softmax1d()**.
    - (10 points) Complete the function **cross_entropy_loss()**.

Note that you should implement these functions using primitive PyTorch operations such as `exp()` and `matmul()`, rather than calling the built-in PyTorch APIs for softmax and cross-entropy loss.
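
The numerical issue to watch for is overflow in `exp()`. Softmax is unchanged when the same constant is subtracted from every score in a row, so subtracting the row maximum keeps every exponent at or below zero. The following is a minimal sketch of that effect on one made-up row of large scores; `torch.softmax` appears only as a reference for comparison.

```python
import torch

scores = torch.tensor([[1000.0, 1001.0, 1002.0]])  # made-up large scores

# Naive evaluation: exp(1000) overflows to inf, and inf / inf is nan
naive = torch.exp(scores) / torch.exp(scores).sum(dim=1, keepdim=True)

# Shifted evaluation: subtract the row-wise maximum before exponentiating
shifted = scores - scores.max(dim=1, keepdim=True).values
stable = torch.exp(shifted) / torch.exp(shifted).sum(dim=1, keepdim=True)

print(naive)
print(stable)
print(torch.allclose(stable, torch.softmax(scores, dim=1)))
```

The same concern applies to `log()` in the cross-entropy loss: taking the logarithm of a probability that has underflowed to zero yields `-inf`.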

In [None]:
def softmax1d(scores):
  """
  inputs:
    scores: (N, C), predicted scores for each input, where N is the number of samples and C is the number of
            classes.
  outputs:
    p: (N, C), probability distribution over classes. Converted from input (scores) with a softmax operation.
    
  Note: Be careful about numerical errors!
  """
  ###  Start your code here ###

  ###  End of the code ###
  
  return p

def cross_entropy_loss(pred_score, labels):
  """
  inputs:
    pred_score: (N, C), probability distribution or predicted scores over classes, where N is the number of samples and C is the number of
      classes.
    labels: ground-truth labels for the N samples.
  outputs:
    loss: (N,), cross entropy loss for each sample; the function returns its mean over the batch.

  Note: Be careful about numerical errors!
  """
  ###  Start your code here ###

  
  
  ###  End of the code ###
  
  return loss.mean()

**To-do**:
    - (10 points) Implement the update rule of stochastic gradient descent by completing the following function.

In [None]:
def step(weights, grad, lr):
  """
  inputs:
    weights: list of learnable parameters
    grad: list of gradients of the loss w.r.t. the learnable parameters
    lr: learning rate for gradient descent
  outputs:
    None. Update the weights with an in-place operation, e.g. tensor.add_(). Nothing needs to be returned.
  """
  ###  Start your code here ###

  ###  End of the code ###
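
For intuition, the update rule $w \leftarrow w - \text{lr} \cdot \nabla_w l$ can be tried on a toy problem. The following is a minimal sketch on a made-up quadratic loss, not the required implementation. It assumes the update is wrapped in `torch.no_grad()`, because PyTorch does not allow in-place changes to a leaf tensor that requires grad while autograd is recording.

```python
import torch

# Toy parameter and loss: l(w) = sum(w^2), so the gradient is 2 * w
w = torch.tensor([1.0, 2.0], requires_grad=True)
loss = (w ** 2).sum()
(g,) = torch.autograd.grad(loss, [w])

lr = 0.1
with torch.no_grad():
    # in-place update: w <- w + (-lr) * g
    w.add_(g, alpha=-lr)

print(w)
```

Because the update is in place, `w` remains the same tensor object, so any list that holds the model parameters stays valid after the step.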

## Task 4: LeNet Forward Pass

With the components above in place, we can complete the LeNet class as follows.

**To-do**:
    - (20 points) Complete the function **forward()** which takes the input images and labels and outputs the cross-entropy loss (for the batch) and predicted distribution. For more details, refer to the comments below.

In [None]:
# LeNet sketch code
class LeNet(nn.Module):
  def __init__(self, img_c, device):
    super(LeNet, self).__init__()
    self.c1 = Conv2D(img_c, 6, [5,5], [1,1], [2,2,2,2], device)
    self.p2 = nn.MaxPool2d(2, stride=2)
    self.c3 = Conv2D(6, 16, [5,5], [1,1], [0,0,0,0], device)
    self.p4 = nn.MaxPool2d(2, stride=2)
    self.f5 = Linear(400, 120, device)
    self.f6 = Linear(120, 84, device)
    self.f7 = Linear(84, 10, device)
    self.device = device
    
    
  def forward(self, imgs, labels):
    """
    inputs:
      imgs: (N, C, H, W), training samples from the MNIST training set, where N is the number of samples (batch_size),
          C is the number of image color channels, H and W are the spatial size of the input images.
      labels: (N,), ground-truth class indices for the input images (as returned by MNISTDataset), where N is the
          number of samples (batch_size).
    outputs:
      loss: (1,), mean loss value over this batch of inputs.
      p: (N, 10), predicted probability distribution over the 10 classes for each input.
    
    """
    N = imgs.shape[0]
    
    o_c1 = F.relu(self.c1(imgs))
    o_p2 = self.p2(o_c1)
    o_c3 = F.relu(self.c3(o_p2))
    o_p4 = self.p4(o_c3)
    
    ### Start the code here  ###
    # 1. Complete the rest of LeNet to get the scores predicted by LeNet for each input image. #
    
    
    
    
    # 2. Use the implemented objective function to obtain the loss of each input. #
    
    
    
    
    
    # 3. We will return the mean value of the losses. #
    
    
    
    ###  End of the code ###
    return loss.mean(), p
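
The input size of 400 for `f5` follows from the feature-map shapes: a 28x28 image stays 28x28 after `c1` (5x5 kernel, padding 2), becomes 14x14 after `p2`, 10x10 after `c3`, and 5x5 after `p4`, with 16 channels, so 16 * 5 * 5 = 400. The sketch below traces these shapes, assuming the built-in `nn.Conv2d` as a stand-in for the homework's `Conv2D` class with the same hyper-parameters.

```python
import torch
import torch.nn as nn

imgs = torch.zeros(4, 1, 28, 28)  # dummy batch with MNIST-sized images

# Built-in layers used as stand-ins with the same hyper-parameters as LeNet
c1 = nn.Conv2d(1, 6, kernel_size=5, stride=1, padding=2)
p2 = nn.MaxPool2d(2, stride=2)
c3 = nn.Conv2d(6, 16, kernel_size=5, stride=1, padding=0)
p4 = nn.MaxPool2d(2, stride=2)

o_c1 = c1(imgs)
o_p2 = p2(o_c1)
o_c3 = c3(o_p2)
o_p4 = p4(o_c3)

# Flatten everything except the batch dimension before the linear layers
flat = o_p4.view(imgs.shape[0], -1)

for name, t in [("c1", o_c1), ("p2", o_p2), ("c3", o_c3), ("p4", o_p4), ("flat", flat)]:
    print(name, tuple(t.shape))
```

The output of `p4` therefore needs to be reshaped to `(N, 400)` before it is passed to the first linear layer.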

## Guide:  Dataset Preparation

We use the MNIST dataset to train LeNet. Run the following cell to prepare the dataset for training. Change `data_path` to a suitable location.

In [None]:
import os
import urllib.request

data_path = './CS536_MNIST/'
os.makedirs(data_path, exist_ok=True)

dataset_dict = {
      'train_images': "http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz",
      'train_labels': "http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz",
      'test_images': "http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz",
      'test_labels': "http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz",}

# download each file only if it is not already on disk,
# so files missing after a failed run are fetched when the cell is re-run
for f, url in dataset_dict.items():
  if not os.path.exists(data_path + f):
    print("Downloading {} to {}".format(url, data_path + f))
    urllib.request.urlretrieve(url, data_path + f)

The dataset class is provided below. More about the PyTorch Dataset class can be found [here](https://pytorch.org/tutorials/beginner/data_loading_tutorial.html#dataset-class). Please run the following Jupyter cell to make sure the dataset is ready for training.

In [None]:
train_img_file = data_path + 'train_images'
train_lb_file = data_path + 'train_labels'
test_img_file = data_path + 'test_images'
test_lb_file = data_path + 'test_labels'

class MNISTDataset(Dataset):
  def __init__(self, ds_size=10000, split='training'):
    self.split = split
    if self.split == 'training':
      img_file = train_img_file
      lb_file = train_lb_file
      n_samples = 60000
    else:
      img_file = test_img_file
      lb_file = test_lb_file
      n_samples = 10000
    self.ds_size = ds_size
      
    import gzip
    with gzip.open(img_file, 'rb') as f:
      imgs = f.read()
    imgs = np.frombuffer(imgs[16:], dtype=np.uint8).astype(np.float32)
    with gzip.open(lb_file, 'rb') as f:
      lb = f.read()
    lbs = np.frombuffer(lb[8:], dtype=np.uint8).astype(np.float32)
    
    imgs = torch.tensor(imgs).view(n_samples, 1, 28, 28) - 125.
    lbs = torch.tensor(lbs).long()
    
    self.imgs = imgs[:ds_size]
    self.lbs= lbs[:ds_size]
    
  def __len__(self):
    return self.ds_size
  
  def __getitem__(self, idx):
    return self.imgs[idx], self.lbs[idx]
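
The offsets `imgs[16:]` and `lb[8:]` in the class above skip the IDX file headers: an image file starts with four big-endian 32-bit integers (magic number, number of images, rows, columns), and a label file starts with two (magic number, number of labels). The sketch below illustrates this on a small synthetic buffer built in memory, so no download is needed; the pixel values are made up.

```python
import struct
import numpy as np

# Build a synthetic IDX image buffer: 16-byte header followed by raw pixels.
# 2051 is the magic number of IDX image files (2049 for label files).
n_images, rows, cols = 2, 28, 28
header = struct.pack(">IIII", 2051, n_images, rows, cols)
pixels = (np.arange(n_images * rows * cols) % 256).astype(np.uint8)
raw = header + pixels.tobytes()

# Parse the header, then interpret the remaining bytes as uint8 pixels
magic, n, h, w = struct.unpack(">IIII", raw[:16])
imgs = np.frombuffer(raw[16:], dtype=np.uint8).reshape(n, h, w)

print(magic, n, h, w)
print(imgs.shape)
```

Reading the counts from the header like this is one way to avoid hard-coding `n_samples`.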
  

## Guide: Training for Overfitting

First, we will make our model overfit. It is good practice to check that a model can overfit: in a proper setting it should do so easily. If it cannot, the model may have a bug, the model may be too simple, or some training hyper-parameters such as the learning rate may be poorly chosen.

For this, we will first use only 1000 data points instead of the full dataset (a smaller dataset makes a model of fixed complexity overfit more easily).

We also hold out 20% of these data points as the validation set. The dataset loading script is provided below.

In [None]:
dataset_size = 1000
validation_size = int(0.2 * dataset_size)

ds = MNISTDataset(ds_size=dataset_size)

# split the dataset into training set and validation set
train_ds, val_ds = torch.utils.data.random_split(ds, [dataset_size - validation_size, validation_size])

# training batch size, hyper-parameter
batch_size = 24

# dataset loader
train_dl = DataLoader(train_ds, batch_size=batch_size, shuffle=True, drop_last=True)
val_dl = DataLoader(val_ds, batch_size=batch_size, shuffle=True, drop_last=True)
img_channel = 1

The training script is provided below. After $100$ epochs, you will see that LeNet is overfitted: the validation error is much larger than the training error. Also, the classification accuracy is almost $100\%$ on the training set, but only around $70\%$ on the validation set.

In [None]:
# learning rate, hyper-parameter
lr = 1e-4

# use the GPU if it is available
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

net = LeNet(img_channel, device)
# keep a list of model parameters
params = [p for (n, p) in net.named_parameters()]

# training epochs, hyper-parameter
epochs = 100

# track how the loss and the prediction accuracy change during training
train_loss_list = []
val_loss_list = []
train_acc_list = []
val_acc_list = []

# the printing frequency, feel free to change this
print_interval = 50
for e in range(epochs):
  net.train()
  for i, (imgs, lbs) in enumerate(train_dl):
    imgs = imgs.to(device)
    lbs = lbs.to(device)
    loss, prob = net(imgs, lbs)
    
    net.zero_grad()
    grad = torch.autograd.grad(loss, params)
    
    # update weights
    step(params, grad, lr)
    
    # obtain the predictions
    pred = prob.argmax(dim=-1).view(batch_size)
    acc = (pred == lbs).float().mean()
    
    if i % print_interval == 0:
      print("step {}, loss {}".format(i + e*len(train_dl), loss))
      print("Target:\t {}\nPred:\t {}".format(lbs[:8], pred[:8]))
      # visualize some samples
      imgs_to_vis = vutils.make_grid(imgs[:8].cpu()+125., nrow=8, pad_value=1)
      plt.imshow(imgs_to_vis.permute(1,2,0).numpy().astype(np.uint8))
      plt.axis("off")
      plt.show()
      
  # store plain Python floats so the logs can be plotted even when training on the GPU
  train_loss_list.append(loss.detach().mean().item())
  train_acc_list.append(acc.detach().mean().item())
  
  net.eval()
  for i, (imgs, lbs) in enumerate(val_dl):
    imgs = imgs.to(device)
    lbs = lbs.to(device)
    loss, prob = net(imgs, lbs)
    
    pred = prob.argmax(dim=-1).view(batch_size)
    acc = (pred == lbs).float().mean()
    
  # store plain Python floats so the logs can be plotted even when training on the GPU
  val_loss_list.append(loss.detach().mean().item())
  val_acc_list.append(acc.detach().mean().item())
  

# plotting logs
plt.plot(np.arange(epochs), train_loss_list, '-r',
         np.arange(epochs), val_loss_list, '-g')
plt.legend(('training error', 'validation error'))
plt.show()
plt.plot(np.arange(epochs), train_acc_list, '-r',
         np.arange(epochs), val_acc_list, '-g')
plt.legend(('training acc', 'validation acc'))
plt.show()

## Guide: Model Saving and Loading

Once the training is completed, we can save the model on disk for evaluation or future use. It is also helpful to save the model regularly in case of unexpected situations. Below is a snippet showing how to save and load a model. More information can be found [here](https://pytorch.org/tutorials/beginner/saving_loading_models.html). You will be asked to evaluate your LeNet in the end.
```python
# Saving a model on disk:
torch.save(net.state_dict(), PATH_to_save)

# Loading a model from disk:
net = LeNet(img_c, device)
net.load_state_dict(torch.load(PATH_to_save))
```
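
The same pattern can be tried end to end on a small stand-in model. The following is a minimal sketch that uses `nn.Linear` in place of LeNet and a temporary directory in place of `PATH_to_save` (both are illustrative choices). Passing `map_location` to `torch.load` lets a checkpoint saved on a GPU be loaded on a machine that has only a CPU.

```python
import os
import tempfile

import torch
import torch.nn as nn

model = nn.Linear(4, 2)  # small stand-in for the trained network

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "model.pt")

    # Save only the parameters (the state dict), not the whole object
    torch.save(model.state_dict(), path)

    # Rebuild a model with the same architecture, then load the parameters
    restored = nn.Linear(4, 2)
    restored.load_state_dict(torch.load(path, map_location="cpu"))

# Every restored tensor should match the original exactly
same = all(
    torch.equal(a, b)
    for a, b in zip(model.state_dict().values(), restored.state_dict().values())
)
print(same)
```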

## Task 5: Weight Decay

As we can see, LeNet is now overfitted on this 1k dataset. Instead of providing more data, we can use weight decay to improve its generalization ability.

**To-dos**: 
    - (5 points) Write the dataset loading script and the training script.
    - (5 points) Add a **weight penalty** to the loss function.
    - (10 points) Train LeNet from scratch.
    - (10 points) Plot the training loss curve, validation loss curve, training accuracy curve, and validation accuracy curve, as in the plots in the section above.
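
One common form of weight penalty is the squared L2 norm of the parameters, scaled by a coefficient and added to the data loss. The following is a minimal sketch of that form on a toy parameter; the coefficient name `weight_decay`, its value, and the stand-in data loss are assumptions made for illustration.

```python
import torch

weight_decay = 0.1  # hypothetical coefficient, a hyper-parameter to tune

w = torch.tensor([1.0, 2.0], requires_grad=True)
params = [w]

# Stand-in for the cross-entropy loss of the network
data_loss = (w.sum() - 1.0) ** 2

# L2 penalty: (weight_decay / 2) * sum of squared parameter values
penalty = 0.5 * weight_decay * sum(p.pow(2).sum() for p in params)
total_loss = data_loss + penalty

(g_total,) = torch.autograd.grad(total_loss, params)
print(penalty.item())
print(g_total)
```

With this form, the penalty contributes `weight_decay * w` to the gradient of each parameter, which shrinks the weights toward zero at every SGD step.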

## Task 6: Dropout

[Dropout](https://www.cs.toronto.edu/~hinton/absps/JMLRdropout.pdf) can also help with generalization. It randomly drops units from the neural network during training. Thus, training a neural network with dropout can be seen as training a collection of many sub-networks.

**To-dos**:
    - (5 points) Complete the **dropout_forward()** function for class Dropout.
    - (5 points) Modify the class LeNet to use dropout layers and build a new class named LeNetDrop: add a dropout layer after layer c1, layer c3, and layer f5.
    - Write the dataset loading script and the training script.
    - Train LeNetDrop on the 1k dataset.
    - (10 points) Plot the training loss curve, validation loss curve, training accuracy curve, and validation accuracy curve.
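
To see what a dropout layer is expected to do, the sketch below inspects PyTorch's built-in `F.dropout`, used here only as a reference and not as the required implementation. During training it zeroes each element with probability `p` and scales the survivors by `1 / (1 - p)`, so the expected value of each element is unchanged. During evaluation it returns the input as is.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
p = 0.5
x = torch.ones(10000)

y_train = F.dropout(x, p=p, training=True)   # random mask plus rescaling
y_eval = F.dropout(x, p=p, training=False)   # identity at evaluation time

kept_value = 1.0 / (1.0 - p)  # value of the surviving elements
print(torch.unique(y_train))
print(y_train.mean().item())
print(torch.equal(y_eval, x))
```

Your `dropout_forward()` may use a different convention, as long as training and evaluation outputs stay consistent in expectation.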

In [None]:
class Dropout(nn.Module):
  def __init__(self, p, device):
    super(Dropout, self).__init__()
    """
    inputs:
     p: scalar, the probability of an element to be zeroed.
    """
    self.p = p
    self.device = device
    
    
  def dropout_forward(self, X, training=True):
    """
    inputs:
      X: (N, *), input of dropout layer
      training: boolean, apply dropout if true. Note: We do not apply dropout during testing.
    outputs:
      Y: (N, *)
    """
    ### Start the code here  ###
    
    
    
    
    ###  End of the code ###
    return Y
  
  def forward(self, X):
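    # self.training is toggled by net.train() / net.eval(), so dropout is disabled at evaluation time
    return self.dropout_forward(X, self.training)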

In [None]:
# dataset loading, training script, and other code start here:

## Task 7: Train on Full Dataset

Now, train the LeNetDrop on the full MNIST dataset. 

**To-do**:
    - Write the dataset loading script, training script for LeNetDrop.
    - Train LeNetDrop on the full MNIST dataset below.
    - (10 points) Plot the training loss curve, validation loss curve, training accuracy curve, and validation accuracy curve.

In [None]:
# dataset loading, training script, and other code start here:

## Task 8: Batch Normalization

[Batch Normalization](https://arxiv.org/pdf/1502.03167.pdf) can accelerate the training procedure by shifting and scaling the input for each layer. 

**To-do**:
    - (10 points) Complete the **batch_norm_forward()** function for class BatchNorm.
    - (5 points) Modify the class LeNetDrop into a new class LeNetDropNorm: add a batch normalization layer after each dropout layer of LeNetDrop.
    - Write the dataset loading script and the training script for LeNetDropNorm.
    - Train LeNetDropNorm on the full MNIST dataset.
    - (10 points) Plot the training loss curve, validation loss curve, training accuracy curve, and validation accuracy curve.
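
In training mode, batch normalization standardizes each feature with the mean and (biased) variance of the current batch, then applies a learnable scale and shift: $y = \gamma \frac{x - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}} + \beta$. The sketch below illustrates this for a 2D input of shape (N, features) on made-up data, with `F.batch_norm` as a reference. It does not cover convolutional inputs or the running statistics that are typically used at evaluation time.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
eps = 1e-5
x = torch.randn(8, 4) * 3.0 + 2.0  # made-up batch: 8 samples, 4 features
gamma = torch.ones(4)              # learnable scale
beta = torch.zeros(4)              # learnable shift

# Per-feature statistics over the batch dimension
mean = x.mean(dim=0)
var = x.var(dim=0, unbiased=False)  # biased variance, as used for normalization

x_hat = (x - mean) / torch.sqrt(var + eps)
y = gamma * x_hat + beta

# Reference: built-in batch norm in training mode without running statistics
y_ref = F.batch_norm(x, None, None, weight=gamma, bias=beta, training=True, eps=eps)

print(torch.allclose(y, y_ref, atol=1e-5))
print(y.mean(dim=0))
```

With `gamma` set to one and `beta` to zero, every feature of `y` has approximately zero mean and unit variance over the batch.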

In [None]:
class BatchNorm(nn.Module):
  def __init__(self, num_features, device, eps=1e-5):
    super(BatchNorm, self).__init__()
    self.gamma = nn.Parameter(torch.ones(num_features, dtype=torch.float32, device=device), requires_grad=True)
    self.beta = nn.Parameter(torch.zeros(num_features, dtype=torch.float32, device=device), requires_grad=True)
    self.moving_mean_gamma = 0.
    self.moving_mean_beta = 0.
    self.eps = eps
    self.num_features = num_features
    self.device = device
    
  def batch_norm_forward(self, x):
    """
    input:
      x: (N, *), input of the batch normalization layer, where N is the batch size
    output:
      Y: (N, *), where N is the batch size
    """
    ### Start the code here  ###
    
    
    
    
    ###  End of the code ###
    
    return Y
    
  def forward(self, inputs):
    return self.batch_norm_forward(inputs)

## Submission

- Make sure you have finished all the required implementation tasks.
- Check your code and make sure the results in each section are reproducible.
- Upload the Jupyter file with all required figures plotted.
- Upload a PDF version of this Jupyter notebook. You can first download an HTML file from the Jupyter menu bar: File --> Download as --> HTML (.html). Then open the HTML file and convert it into a PDF file with your browser.