<a href="https://colab.research.google.com/github/emmad225/BIACoursework/blob/main/duffyep_lab7_MLP.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# [CSCI 3397/PSYC 3317] Lab 7: MLP

**Posted:** Wednesday, March 20, 2024

**Due:** Wednesday, March 27, 2024

__Total Points__: 8 pts

__Submission__: Please rename the .ipynb file to __\<your_username\>\_lab7.ipynb__ before you submit it to Canvas. Example: weidf_lab7.ipynb.

# <b>1. Model</b>



## Two-layer MLP (1 hidden layer)
- PyTorch basics: To build a deep learning model in PyTorch, we define the needed layers in `__init__()` and specify the model computation in `forward()`. The gradients are computed automatically by autograd when `backward()` is called on the loss, so the model does not need its own backward pass (custom behavior can be defined if needed).
- Example: a 2-layer MLP model with 10-dim input, 5-dim output, and 20 neurons for the hidden layer.

In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F

class MLP_oneHiddenLayer(nn.Module):
  def __init__(self, input_dim, output_dim, num_neuron, nonlinear=F.relu):
    super(MLP_oneHiddenLayer, self).__init__()

    self.fc1 = nn.Linear(input_dim, num_neuron)
    self.fc2 = nn.Linear(num_neuron, output_dim)
    self.nonlinear = nonlinear

  def forward(self, x):
    x = torch.flatten(x, 1) # flatten all dimensions except batch
    x = self.nonlinear(self.fc1(x)) # use the nonlinearity passed to __init__ (default: F.relu)
    x = self.fc2(x)
    return F.softmax(x, dim=1)


In [None]:
num_input, num_output, num_neuron = 10, 5, 20
model = MLP_oneHiddenLayer(num_input, num_output, num_neuron)

batch_size = 32
result = model(torch.zeros([batch_size, num_input]))
print('input size:', [batch_size, num_input])
print('output size:', result.shape)

input size: [32, 10]
output size: torch.Size([32, 5])
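
To make the remark about automatic gradient computation concrete, here is a minimal sketch of what autograd does when `backward()` is called on a scalar loss. The toy `nn.Linear` layer, the random input, and the squared-sum loss are assumptions made only for this illustration; they are not part of the lab.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# hypothetical toy layer, used only to illustrate autograd
layer = nn.Linear(3, 2)
x = torch.rand(4, 3)

# parameters have no gradient before the backward pass
grad_before = layer.weight.grad

# forward pass and a scalar loss
loss = layer(x).pow(2).sum()

# autograd fills in .grad for every parameter that requires it
loss.backward()

print('grad before backward:', grad_before)
print('weight grad shape:', layer.weight.grad.shape)
print('bias grad shape:', layer.bias.grad.shape)
```

Each parameter's `.grad` has the same shape as the parameter itself, which is what an optimizer relies on when it updates the weights.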


## [3 pts] Exercise 1: N-layer MLP
- Let's build an MLP model whose number of hidden layers and number of neurons per layer are given as input.
- In PyTorch, we can first create a list of layers `layers=[..]` and then use `nn.Sequential(*layers)` to chain them in the computation, which is equivalent to applying the layers one by one in a for-loop.
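
The equivalence between `nn.Sequential` and an explicit for-loop can be checked with a small sketch. The layer sizes and variable names here are hypothetical and chosen only for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# hypothetical layer list, chosen only for illustration
layers = [nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 3)]
chained = nn.Sequential(*layers)

x = torch.rand(2, 4)

# option 1: let nn.Sequential apply the layers in order
out_sequential = chained(x)

# option 2: apply the same layers one by one in a for-loop
out_loop = x
for layer in layers:
  out_loop = layer(out_loop)

print('same result:', torch.allclose(out_sequential, out_loop))
```

Both options share the same layer objects, so they also share the same weights. `nn.Sequential` additionally registers the layers as submodules, so their parameters show up in `model.parameters()`.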

In [None]:
class MLP(nn.Module):
  def __init__(self, input_dim, output_dim, num_neuron=[], nonlinear=F.relu):
    super(MLP, self).__init__()
    layers = []
    if len(num_neuron) == 0:
      layers += [nn.Linear(input_dim, output_dim)]
    else:
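      # First layer: input -> first hidden layer, followed by a ReLU
      # (nn.ReLU is the module form of the default nonlinear=F.relu)
      layers.append(nn.Linear(input_dim, num_neuron[0]))
      layers.append(nn.ReLU())
      # Middle layers: hidden -> hidden, each followed by a ReLU
      for i in range(len(num_neuron) - 1):
        layers.append(nn.Linear(num_neuron[i], num_neuron[i + 1]))
        layers.append(nn.ReLU())
      # Last layer: last hidden layer -> output (softmax is applied in forward)
      layers.append(nn.Linear(num_neuron[-1], output_dim))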

    self.layers = nn.Sequential(*layers)

  def forward(self, x):
    x = torch.flatten(x, 1) # flatten all dimensions except batch
    x = self.layers(x)
    return F.softmax(x, dim=1)

# test case
num_input, num_output = 10,20
num_neuron = [128, 128, 128]
model_mlp = MLP(num_input, num_output, num_neuron)

batch_size = 10
result = model_mlp(torch.zeros([batch_size, num_input]))
print('input size:', [batch_size, num_input])
print('output size:', result.shape)

input size: [10, 10]
output size: torch.Size([10, 20])


# <b>2. Optimization: Backpropagation (BP)</b>

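- The take-home message: BP is dynamic programming (DP). The gradient at each layer is computed once and reused when computing the gradients of the layers before it.

- We'll compute the gradient for each variable by hand and compare it with the result of PyTorch's autograd for a 2-layer MLP model. (Lec 23, pages 22-26)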
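
As a rough illustration of the dynamic-programming view, the sketch below runs a manual backward pass on a scalar chain and compares it with autograd. The chain, the values, and all variable names are hypothetical and kept deliberately small; this is not the lab's model.

```python
import torch

# hypothetical scalar chain: a = w1*x, h = relu(a), y = w2*h, loss = (y - t)^2
x, t = torch.tensor(1.5), torch.tensor(0.5)
w1 = torch.tensor(2.0, requires_grad=True)
w2 = torch.tensor(-3.0, requires_grad=True)

a = w1 * x
h = torch.relu(a)
y = w2 * h
loss = (y - t) ** 2
loss.backward()

# manual backward pass: each step reuses the gradient from the previous step
with torch.no_grad():
  grad_y = 2 * (y - t)                # dL/dy
  grad_w2 = grad_y * h                # dL/dw2 reuses dL/dy
  grad_h = grad_y * w2                # dL/dh reuses dL/dy
  grad_a = grad_h * (a > 0).float()   # dL/da reuses dL/dh
  grad_w1 = grad_a * x                # dL/dw1 reuses dL/da

print('w1 matches autograd:', torch.allclose(grad_w1, w1.grad))
print('w2 matches autograd:', torch.allclose(grad_w2, w2.grad))
```

Note that `grad_y` is computed once and feeds both `grad_w2` and `grad_h`. Storing and reusing such intermediate results is the sense in which BP is DP.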


In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F

# create an MLP model that keeps all intermediate variables
class MLP_oneHiddenLayer_var(nn.Module):
  def __init__(self, input_dim, output_dim, num_neuron, nonlinear=F.relu):
    super(MLP_oneHiddenLayer_var, self).__init__()

    self.fc0 = nn.Linear(input_dim, num_neuron)
    self.fc1 = nn.Linear(num_neuron, output_dim)
    self.nonlinear = nonlinear
    self.x1, self.x2, self.x3, self.x4 = None, None, None, None

  def forward(self, x):
    x = torch.flatten(x, 1) # flatten all dimensions except batch
    self.x1 = self.fc0(x)
    self.x2 = F.relu(self.x1)
    self.x3 = self.fc1(self.x2)
    self.x4 = F.relu(self.x3)

    # by default/to save memory, the gradient for non-leaf nodes
    # (intermediate variables in the computation graph) won't be saved
    self.x1.retain_grad()
    self.x2.retain_grad()
    self.x3.retain_grad()
    self.x4.retain_grad()

    return self.x4

model = MLP_oneHiddenLayer_var(input_dim=10, output_dim=20, num_neuron=5)
# input size: batch size x input dimension
input = torch.rand([1,10])

# forward pass
output = model(input)
target = torch.rand([1,20])
# reduction='sum': squared L2 norm of the difference (sum of squared errors)
loss = F.mse_loss(output, target, reduction = 'sum')

# backward pass (autograd)
loss.backward()
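
The role of `retain_grad()` in the model above can be seen in isolation with a small sketch. The tensors `w`, `y`, and `z` are hypothetical and only serve to contrast a leaf tensor with intermediate tensors.

```python
import torch

# hypothetical tiny graph: w is a leaf, y and z are intermediate (non-leaf) tensors
w = torch.tensor([1.0, 2.0], requires_grad=True)
y = w * 3
z = y + 1
z.retain_grad()  # ask autograd to keep the gradient of z only

loss = z.pow(2).sum()
loss.backward()

print('w is leaf:', w.is_leaf, '| grad:', w.grad)
print('y is leaf:', y.is_leaf, '| retains grad:', y.retains_grad)
print('z is leaf:', z.is_leaf, '| retains grad:', z.retains_grad, '| grad:', z.grad)
```

Only `w` (a leaf) and `z` (explicitly retained) keep their gradients after the backward pass. The gradient of `y` is used during backpropagation but not stored.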

## [1 pt] Exercise 2.1: Gradient of the loss layer

**Course material: Lec 23, page 23**
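
For reference, here is a short derivation of this gradient. It assumes the sum-reduced squared error used in the code above, where $x_4$ is the network output. The symbol $t$ for the target is introduced here for illustration and may differ from the notation in the slides.

$$
L = \sum_i (x_{4,i} - t_i)^2
\quad\Longrightarrow\quad
\frac{\partial L}{\partial x_{4,i}} = 2\,(x_{4,i} - t_i)
$$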

In [None]:
# gradient computed by PyTorch's autograd
grad_x4_pt = model.x4.grad
# manual gradient of the sum-reduced squared error: dL/dx4 = 2 * (x4 - target)
grad_x4_manual = 2 * (model.x4 - target).detach()

# the difference should be (close to) zero when the manual gradient is correct
print('x4: max difference between gt and yours:', (grad_x4_pt - grad_x4_manual.reshape(-1)).abs().max())


## [1 pt] Exercise 2.2: Gradient of the ReLU layer

**Course material: Lec 23, pages 24-25**
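
As a reminder, the ReLU gradient follows from the chain rule applied elementwise. The sketch below assumes $x_4 = \mathrm{ReLU}(x_3)$ as in the model above and uses the convention, also followed by PyTorch, that the derivative at $x_{3,i} = 0$ is taken to be 0.

$$
x_{4,i} = \max(0,\, x_{3,i})
\quad\Longrightarrow\quad
\frac{\partial L}{\partial x_{3,i}} = \frac{\partial L}{\partial x_{4,i}} \cdot \mathbb{1}[x_{3,i} > 0]
$$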

In [None]:
grad_x3_pt = model.x3.grad
# ReLU passes the upstream gradient through only where its input was positive
grad_x3_manual = grad_x4_manual * (model.x3 > 0).float()

# the difference should be (close to) zero when the manual gradient is correct
print('x3: max difference between gt and yours:', (grad_x3_pt - grad_x3_manual.reshape(-1)).abs().max())


## [3 pts] Exercise 2.3: Gradient of the Linear layer.
In the slides, $W$ denotes the concatenation of the weight matrix and the bias: `W = [fc.weight, fc.bias.reshape(-1,1)]`. Here we compute the gradients of the weight and the bias separately.


**Course material: Lec 23, page 26**
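
For reference, the linear-layer gradients can be written out under PyTorch's row-vector convention. This sketch assumes $x_3 = x_2 W^\top + b$, where $W$ is `fc1.weight` (shape: output dimension by input dimension), each row of $x_2$ is one sample, and $n$ indexes the batch. It treats $W$ and $b$ separately, so the notation may differ from the concatenated form used in the slides.

$$
\frac{\partial L}{\partial x_2} = \frac{\partial L}{\partial x_3}\, W,
\qquad
\frac{\partial L}{\partial W} = \left(\frac{\partial L}{\partial x_3}\right)^{\top} x_2,
\qquad
\frac{\partial L}{\partial b} = \sum_{n} \left(\frac{\partial L}{\partial x_3}\right)_{n,:}
$$

A quick shape check helps catch transposition mistakes: $\partial L / \partial W$ must have the same shape as $W$, and $\partial L / \partial x_2$ the same shape as $x_2$.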

In [None]:
grad_x2_pt = model.x2.grad
grad_W1_W_pt = model.fc1.weight.grad
grad_W1_b_pt = model.fc1.bias.grad

# x3 = x2 @ W^T + b, with W = fc1.weight of shape [output_dim, num_neuron]
# dL/dx2 = dL/dx3 @ W
grad_x2_manual = torch.mm(grad_x3_manual, model.fc1.weight).detach()
# dL/dW = (dL/dx3)^T @ x2, which has the same shape as fc1.weight
grad_W1_W_manual = torch.mm(grad_x3_manual.t(), model.x2).detach()
# dL/db = dL/dx3 summed over the batch dimension
grad_W1_b_manual = grad_x3_manual.sum(dim=0)

# the differences should be (close to) zero when the manual gradients are correct
print('x2: max difference between gt and yours:', (grad_x2_pt - grad_x2_manual.reshape(-1)).abs().max())
print('W1_W: max difference between gt and yours:', (grad_W1_W_pt.reshape(-1) - grad_W1_W_manual.reshape(-1)).abs().max())
print('W1_b: max difference between gt and yours:', (grad_W1_b_pt.reshape(-1) - grad_W1_b_manual.reshape(-1)).abs().max())
