# Deep Learning Course: Lab Exercises

In this lab exercise you will become familiar with the PyTorch library in order to solve deep learning problems. The goals of this assignment are as follows:

- become familiar with PyTorch Tensors

- understand how the forward pass and backpropagation work in neural networks


First time using a Jupyter Notebook or Google Colab? Check [this Jupyter Notebook 101](https://www.kaggle.com/code/jhoward/jupyter-notebook-101).
Throughout the course you will be asked to do more than just apply the lectures: check the documentation, look up what you want to do with your favorite search engine, or ask the TAs. The Deep Learning community is very open to new practitioners.

# Setup

For this exercise the only thing you need is this notebook.

You may use your own Python environment or use Google Colab. If you choose to use Google Colab, you can upload this notebook to your Google Drive and open it with Google Colab (right click on the file and choose "Open with" -> "Google Colab").

To set up the environment on your own machine, you need to install PyTorch. You can find the instructions [here](https://pytorch.org/get-started/locally/).

For more information about Python environments, you may take a look at [conda](https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html) or [virtualenv](https://virtualenv.pypa.io/en/latest/).


# Note

Apart from the Questions, there are instruction comments throughout the notebook, as well as comments inside the code cells beginning with two hash signs (##). In addition, `# *****START CODE` / `# *****END CODE` comments mark the start and end of your code sections. Take care not to delete these comments.

# Questions

# Q1 - PyTorch Tensors

a) Get familiar with PyTorch Tensors and construct different types of them. You may take a look at the [documentation](https://pytorch.org/docs/stable/tensors.html).

In [7]:
import torch

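##Construct a 5x3 matrix, uninitialized
# *****START CODE
## torch.empty allocates memory without initializing it, so the stored values are arbitrary
x = torch.empty((5, 3))
# *****END CODE
## Print the shape rather than the values, since the values differ between runs
print(x.shape)

torch.Size([5, 3])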


In [3]:
##Construct a randomly initialized matrix from a normal distribution
# *****START CODE
torch.manual_seed(0)
x = torch.randn((5, 3))
# *****END CODE
print(x)

tensor([[ 1.5410, -0.2934, -2.1788],
        [ 0.5684, -1.0845, -1.3986],
        [ 0.4033,  0.8380, -0.7193],
        [-0.4033, -0.5966,  0.1820],
        [-0.8567,  1.1006, -1.0712]])


In [9]:
##Construct a matrix filled with zeros and of dtype int64
# *****START CODE
x = torch.zeros((5, 3), dtype=torch.int64)
# *****END CODE
print(x)
print(x.dtype)

tensor([[0, 0, 0],
        [0, 0, 0],
        [0, 0, 0],
        [0, 0, 0],
        [0, 0, 0]])
torch.int64


In [10]:
##Construct a tensor directly from data
# *****START CODE
data = [[1, 2], [3, 4]]
x = torch.tensor(data)
# *****END CODE
print(x)

tensor([[1, 2],
        [3, 4]])
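
Beyond the constructors above, a tensor's `shape`, `dtype`, and `device` attributes are usually the first things worth inspecting. The sketch below is an illustrative addition rather than part of the exercise; the variable names `a`, `z`, `b`, and `c` are hypothetical.

```python
import torch

# Build a tensor from nested Python lists; integer data defaults to int64
a = torch.tensor([[1, 2], [3, 4]])

# zeros_like reuses the shape and dtype of an existing tensor
z = torch.zeros_like(a)

# The dtype can be overridden explicitly when a float tensor is needed
b = torch.ones_like(a, dtype=torch.float32)

# arange + reshape is a quick way to build a matrix with known entries
c = torch.arange(6).reshape(2, 3)

print(a.shape, a.dtype, a.device)
print(z)
print(b.dtype)
print(c)
```

Checking `shape` and `dtype` early tends to catch most mistakes before they reach a matrix multiplication.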


# Q2 - Backpropagation from scratch

- Create random input and output PyTorch tensors and train a simple network from scratch.

  Warning: You should NOT use any forward/backward commands from the PyTorch library.

In [15]:
import torch.nn.functional as F

## N is batch size; D_in is input dimension
## H is hidden dimension; D_out is output dimension

torch.manual_seed(0)
N, D_in, H, D_out = 64, 1000, 100, 10

## Create random input (x) and output (y) data
# *****START CODE
x = torch.randn((N, D_in))
y = torch.randn((N, D_out))
print('x', x.shape)
print('y', y.shape)
# *****END CODE
print(y)
    

x torch.Size([64, 1000])
y torch.Size([64, 10])
tensor([[ 0.4395, -0.7543, -1.1628,  2.0784, -0.1075, -0.6689,  2.8065,  0.2962,
         -1.5878, -0.8216],
        [-1.2372,  1.1736, -0.0251,  1.2981, -0.4053,  0.6879,  0.5738,  1.1403,
          1.1085, -0.2007],
        [-0.0950,  0.5540, -0.2009,  0.5140, -0.9023,  0.4620,  1.1448,  0.5863,
          0.0288,  0.9081],
        [ 1.3919, -0.6782, -0.5078,  1.3325, -1.4151, -0.9260, -2.0968, -1.4919,
          0.5533, -0.5315],
        [-0.4725,  0.5125,  1.3638, -0.4927, -0.1316,  0.6849,  0.8089,  0.1475,
         -1.3047, -1.2748],
        [ 0.0359, -0.6505,  1.1541,  1.7395,  1.7768,  0.7478,  0.1213,  0.7259,
          2.0841,  0.3146],
        [-0.2737, -0.3709,  0.7325, -0.2763,  1.8841,  1.4673,  1.3834, -0.3437,
          1.0297, -0.0540],
        [-0.6766,  0.6351,  0.0474, -1.3022, -0.1161,  1.1456, -1.6812, -0.7351,
          0.5064,  0.4201],
        [ 0.8409, -1.6253, -1.3315,  1.0351,  0.5222, -0.2645, -1.0191, -1.0789,

In [16]:
##Randomly initialize weights from a normal distribution, skipping bias
##Hint: You need 2 weight tensors; one for the raw input tensor (x) and one for the hidden dimension
# *****START CODE
w1 = torch.randn((D_in, H))
w2 = torch.randn((H, D_out))
# *****END CODE
print('w1', w1.shape)
print('w2', w2.shape)


w1 torch.Size([1000, 100])
w2 torch.Size([100, 10])


In [17]:
##define the learning rate
learning_rate = 1e-6

First, implement the forward pass. Try to compute the predicted y_pred value. You can take a look [here](https://pytorch.org/docs/stable/nn.functional.html#non-linear-activations-weighted-sum-nonlinearity) for more information about activation functions in PyTorch.

In [22]:
## Calculate the output of the hidden dimension
## Hint: make use of torch.matmul()
# *****START CODE
# x : N x D_in
# w1 : D_in x H
# h : N x H
h = torch.matmul(x, w1)   # equivalent to x @ w1; output of the hidden layer
# *****END CODE
print('x', x.shape)
print('h', h.shape)

x torch.Size([64, 1000])
h torch.Size([64, 100])


In [20]:
## Pass the output of the hidden dimension to the ReLU activation function
# *****START CODE
h_relu = F.relu(h)           # output of the ReLU function
# *****END CODE  
print(h_relu.shape)

torch.Size([64, 100])
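
ReLU simply replaces every negative entry with zero and leaves the rest untouched. As a small sanity check (an illustrative sketch with a hypothetical tensor `t`, not part of the exercise), the library call can be compared with a hand-written clamp:

```python
import torch
import torch.nn.functional as F

# Hypothetical input with negative, zero, and positive entries
t = torch.tensor([[-2.0, -0.5, 0.0], [0.5, 1.0, 3.0]])

relu_builtin = F.relu(t)       # library implementation
relu_manual = t.clamp(min=0)   # the same operation written by hand

print(relu_manual)
print(torch.equal(relu_builtin, relu_manual))
```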


In [24]:
## Calculate the final output of the network
# *****START CODE
# h_relu : N x H
# w2 : H x D_out
# y_pred : N x D_out
y_pred = h_relu @ w2           # final output of the network
# *****END CODE
print('y_pred', y_pred.shape)
print('y', y.shape)

y_pred torch.Size([64, 10])
y torch.Size([64, 10])


Calculate the loss.

In [25]:
## Compute loss
loss = ((y_pred - y) ** 2).mean()
print(loss)

tensor(45982.2891)
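
The loss above is the mean squared error over all `N * D_out` entries. The following sketch, using small hypothetical tensors `pred` and `target`, illustrates how the mean relates to the summed squared error and to `F.mse_loss` with its default reduction:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
pred = torch.randn(4, 3)     # hypothetical predictions
target = torch.randn(4, 3)   # hypothetical targets

sq_err = (pred - target) ** 2
loss_mean = sq_err.mean()    # same form as the loss used in this notebook
loss_sum = sq_err.sum()      # summed squared error

# The mean is the sum divided by the number of elements
print(torch.allclose(loss_mean, loss_sum / target.numel()))
# F.mse_loss uses reduction='mean' by default
print(torch.allclose(loss_mean, F.mse_loss(pred, target)))
```

The two forms differ only by a constant factor, which matters when choosing a learning rate.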


Now, implement the backward pass.
You need to minimize the loss with respect to each weight using the chain rule of differentiation.
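
For reference, the chain rule for this two-layer network can be written out as follows. This is a sketch under the assumptions that the loss is the mean squared error defined above and that $a$ denotes the ReLU output (`h_relu` in the code); the symbols $W_1$, $W_2$, $\hat{y}$ correspond to `w1`, `w2`, `y_pred`.

$$
h = x W_1, \qquad a = \mathrm{ReLU}(h), \qquad \hat{y} = a W_2, \qquad L = \frac{1}{N D_{out}} \sum_{n,k} (\hat{y}_{nk} - y_{nk})^2
$$

$$
\frac{\partial L}{\partial \hat{y}} = \frac{2}{N D_{out}} (\hat{y} - y), \qquad
\frac{\partial L}{\partial W_2} = a^\top \frac{\partial L}{\partial \hat{y}}, \qquad
\frac{\partial L}{\partial a} = \frac{\partial L}{\partial \hat{y}} W_2^\top
$$

$$
\frac{\partial L}{\partial h} = \frac{\partial L}{\partial a} \odot \mathbb{1}[h > 0], \qquad
\frac{\partial L}{\partial W_1} = x^\top \frac{\partial L}{\partial h}
$$

The code in this notebook drops the constant factor $1/(N D_{out})$ and, as the exercise requests, treats the indicator $\mathbb{1}[h > 0]$ as 1.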

In [31]:
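## Compute the gradient of the loss with respect to w2
# *****START CODE
## Note: 2 * (y_pred - y) is the gradient of the summed squared error; the gradient of
## the mean loss computed earlier is smaller by the constant factor y.numel(), which is
## effectively absorbed into the learning rate here
d_loss_d_y_pred = 2.0 * (y_pred - y)
d_y_pred_d_w2 = h_relu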

print('d_loss_d_y_pred', d_loss_d_y_pred.shape)
print('d_y_pred_d_w2', d_y_pred_d_w2.shape)

d_loss_d_w2 = d_y_pred_d_w2.T @ d_loss_d_y_pred

print('w2', w2.shape)
print('d_loss_d_w2', d_loss_d_w2.shape)
# *****END CODE

d_loss_d_y_pred torch.Size([64, 10])
d_y_pred_d_w2 torch.Size([64, 100])
w2 torch.Size([100, 10])
d_loss_d_w2 torch.Size([100, 10])


In [36]:
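## Compute the gradient of the loss with respect to w1 (consider the derivative of ReLU equal to 1)
# *****START CODE
## Note: treating the ReLU derivative as 1 is the simplification requested by this exercise;
## the exact gradient would mask the first step below with (h > 0)
d_loss_d_y_pred = 2.0 * (y_pred - y)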
d_y_pred_d_h = w2
d_h_d_w1 = x

print('d_loss_d_y_pred', d_loss_d_y_pred.shape)
print('d_y_pred_d_h', d_y_pred_d_h.shape)
print('d_h_d_w1', d_h_d_w1.shape)

d_loss_d_w1 = d_loss_d_y_pred @ d_y_pred_d_h.t()
print('d_loss_d_w1 first step', d_loss_d_w1.shape)
d_loss_d_w1 = d_h_d_w1.t() @ d_loss_d_w1
print('d_loss_d_w1', d_loss_d_w1.shape)
print('w1', w1.shape)
# *****END CODE

d_loss_d_y_pred torch.Size([64, 10])
d_y_pred_d_h torch.Size([100, 10])
d_h_d_w1 torch.Size([64, 1000])
d_loss_d_w1 first step torch.Size([64, 100])
d_loss_d_w1 torch.Size([1000, 100])
w1 torch.Size([1000, 100])
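
One way to gain confidence in hand-derived gradients is to compare them against autograd on a small problem. The sketch below is only a verification aid and deliberately uses `backward()`, which the exercise solution itself must not do. It assumes the mean squared error loss and, unlike the simplified cell above, includes the exact ReLU mask `(h > 0)`; the small sizes and `float64` dtype are hypothetical choices made to keep the comparison tight.

```python
import torch

torch.manual_seed(0)
N, D_in, H, D_out = 8, 20, 5, 3   # small hypothetical sizes
x = torch.randn(N, D_in, dtype=torch.float64)
y = torch.randn(N, D_out, dtype=torch.float64)
w1 = torch.randn(D_in, H, dtype=torch.float64, requires_grad=True)
w2 = torch.randn(H, D_out, dtype=torch.float64, requires_grad=True)

# Forward pass, same structure as in the notebook
h = x @ w1
h_relu = h.clamp(min=0)
y_pred = h_relu @ w2
loss = ((y_pred - y) ** 2).mean()

# Reference gradients from autograd (used here only as a check)
loss.backward()

# Manual gradients of the mean loss, including the ReLU mask
with torch.no_grad():
    grad_y_pred = 2.0 * (y_pred - y) / y.numel()
    grad_w2 = h_relu.T @ grad_y_pred
    grad_h_relu = grad_y_pred @ w2.T
    grad_h = grad_h_relu * (h > 0)   # derivative of ReLU is 1 where h > 0, else 0
    grad_w1 = x.T @ grad_h

print(torch.allclose(grad_w1, w1.grad), torch.allclose(grad_w2, w2.grad))
```

If either comparison fails, checking the intermediate shapes against the comments in the cells above is usually the quickest way to find the mistake.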


In [37]:
## Update weights
# *****START CODE
w1 = w1 - learning_rate * d_loss_d_w1
w2 = w2 - learning_rate * d_loss_d_w2
# *****END CODE

Repeat the above process for a number of epochs and notice how the value of the loss changes.

In [43]:
##specify the number of epochs
# *****START CODE
epochs = 100
# *****END CODE

for t in range(epochs):
  # *****START CODE

  # forward pass
  h = x @ w1
  h_relu = F.relu(h)
  y_pred = h_relu @ w2
  
  # compute the loss
  loss = ((y_pred - y) ** 2).mean()

  # compute the gradient wrt w2
  d_loss_d_y_pred = 2.0 * (y_pred - y)
  d_loss_d_w2 = h_relu.T @ d_loss_d_y_pred

  # compute the gradient wrt w1 (ReLU derivative taken as 1, as in the cell above)
  d_loss_d_h = d_loss_d_y_pred @ w2.T
  d_loss_d_w1 = x.T @ d_loss_d_h

  # update the weights
  w1 = w1 - learning_rate * d_loss_d_w1
  w2 = w2 - learning_rate * d_loss_d_w2

  print('Epoch', t, 'loss', loss.item())

  # *****END CODE




Epoch 0 loss 0.5558471083641052
Epoch 1 loss 0.5234987735748291
Epoch 2 loss 0.4931202828884125
Epoch 3 loss 0.46459174156188965
Epoch 4 loss 0.4377848207950592
Epoch 5 loss 0.41260281205177307
Epoch 6 loss 0.3889264464378357
Epoch 7 loss 0.36666855216026306
Epoch 8 loss 0.34573671221733093
Epoch 9 loss 0.3260573148727417
Epoch 10 loss 0.3075433373451233
Epoch 11 loss 0.2901175618171692
Epoch 12 loss 0.27373018860816956
Epoch 13 loss 0.2582905888557434
Epoch 14 loss 0.24375469982624054
Epoch 15 loss 0.23007865250110626
Epoch 16 loss 0.21718856692314148
Epoch 17 loss 0.20505526661872864
Epoch 18 loss 0.19361460208892822
Epoch 19 loss 0.18283094465732574
Epoch 20 loss 0.17267245054244995
Epoch 21 loss 0.16309267282485962
Epoch 22 loss 0.1540582925081253
Epoch 23 loss 0.14554187655448914
Epoch 24 loss 0.13750937581062317
Epoch 25 loss 0.12993453443050385
Epoch 26 loss 0.12278727442026138
Epoch 27 loss 0.11604902893304825
Epoch 28 loss 0.10969187319278717
Epoch 29 loss 0.10368959605693817


In [44]:
print(y[0, :])

tensor([ 0.4395, -0.7543, -1.1628,  2.0784, -0.1075, -0.6689,  2.8065,  0.2962,
        -1.5878, -0.8216])


In [45]:
print(y_pred[0, :])

tensor([ 0.4243, -0.7883, -1.1590,  2.0786, -0.1146, -0.6894,  2.8130,  0.3624,
        -1.6275, -0.8242])
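
Comparing the two printed rows by eye works, but a single number is easier to track. The sketch below copies the first row of `y` and `y_pred` from the outputs above into hypothetical variables and reports the largest per-element deviation:

```python
import torch

# First row of y and y_pred, copied from the printed outputs above
y_row = torch.tensor([0.4395, -0.7543, -1.1628, 2.0784, -0.1075,
                      -0.6689, 2.8065, 0.2962, -1.5878, -0.8216])
y_pred_row = torch.tensor([0.4243, -0.7883, -1.1590, 2.0786, -0.1146,
                           -0.6894, 2.8130, 0.3624, -1.6275, -0.8242])

abs_err = (y_pred_row - y_row).abs()
print(abs_err.max().item())      # largest per-element deviation
print(abs_err.argmax().item())   # index of that element
```

In the notebook itself the same check can be applied directly to `y` and `y_pred` instead of copied values.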
