# Lab 1. PyTorch and ANNs

This lab is a warm-up to get you used to the PyTorch programming environment used
in the course, and also to help you review and renew your knowledge
of Python and relevant Python libraries.
The lab must be done individually. Please recall that the
University of Toronto plagiarism rules apply.

By the end of this lab, you should be able to:

1. Perform basic PyTorch tensor operations.
2. Load data into PyTorch.
3. Configure an Artificial Neural Network (ANN) using PyTorch.
4. Train ANNs using PyTorch.
5. Evaluate different ANN configurations.

## Part 1. Python Basics [3 pt]

The purpose of this section is to get you used to the
basics of Python, including working with functions, numbers,
lists, and strings.

Note that we **will** be checking your code for clarity and efficiency.

If you have trouble with this part of the assignment, please review http://cs231n.github.io/python-numpy-tutorial/

### Part (a) -- 1pt

Write a function `sum_of_cubes` that computes the sum of cubes up to `n`. If the input to `sum_of_cubes` is invalid (e.g. negative or non-integer `n`), the function should print out `"Invalid input"` and return `-1`.

In [None]:
def sum_of_cubes(n):

  # Reject non-integer input (floats, strings, ...) and negative integers
  if not isinstance(n, int) or n < 0:
    print("Invalid input")
    return -1

  # Compute 1**3 + 2**3 + ... + n**3 (avoid shadowing the built-in `sum`)
  total = 0
  for i in range(1, n + 1):
    total += i**3

  return total

In [None]:
n = 3
result = sum_of_cubes(n)
if result != -1:
  print(f"The sum of cubes up to {n} is: {result}")

The sum of cubes up to 3 is: 36


In [None]:
n = -1
result = sum_of_cubes(n)
if result != -1:
  print(f"The sum of cubes up to {n} is: {result}")

Invalid input
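
As a cross-check on the loop above, the sum of the first `n` cubes has a well-known closed form, `(n(n+1)/2)**2`. The sketch below compares the two on a few small inputs. The helper names `sum_of_cubes_closed_form` and `sum_of_cubes_loop` are hypothetical and are not part of the lab, and input validation is left out to keep the comparison short.

```python
def sum_of_cubes_closed_form(n):
    """Hypothetical helper: sum of cubes 1..n using (n(n+1)/2)**2."""
    return (n * (n + 1) // 2) ** 2


def sum_of_cubes_loop(n):
    """Reference version: add the cubes one at a time."""
    return sum(i**3 for i in range(1, n + 1))


# The two versions should agree for every non-negative integer
for n in range(0, 6):
    assert sum_of_cubes_closed_form(n) == sum_of_cubes_loop(n)

print(sum_of_cubes_closed_form(3))  # 36
```

The closed form runs in constant time, whereas the loop takes time proportional to `n`.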


### Part (b) -- 1pt

Write a function `word_lengths` that takes a sentence (string), computes the length of each word in that sentence, and returns the length of each word in a list. You can
assume that words are always separated by a space character `" "`.

Hint: recall the `str.split` function in Python.
If you are not sure how this function works, try
typing `help(str.split)` into a Python shell, or check out https://docs.python.org/3.6/library/stdtypes.html#str.split
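
A small illustration of `str.split` may help here. Calling it with no argument splits on any run of whitespace, while passing `" "` splits on every single space and can therefore produce empty strings. The example sentence with a double space is made up for this sketch.

```python
sentence = "welcome  to APS360!"  # note the double space after "welcome"

# No argument: runs of whitespace count as one separator
print(sentence.split())      # ['welcome', 'to', 'APS360!']

# Explicit " ": every single space is a separator, so an empty string appears
print(sentence.split(" "))   # ['welcome', '', 'to', 'APS360!']

default_lengths = [len(w) for w in sentence.split()]
space_lengths = [len(w) for w in sentence.split(" ")]
print(default_lengths)       # [7, 2, 7]
print(space_lengths)         # [7, 0, 2, 7]
```

When words are guaranteed to be separated by exactly one space, both calls give the same result.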

In [None]:
help(str.split)

In [None]:
def word_lengths(sentence):

  if isinstance(sentence, str):  # Check if the input sentence is a string
    words = sentence.split()
    lengths = []
    for word in words:
      length = len(word)
      lengths.append(length)
    return lengths

  else:                         # Invalid input if the input is not a string
    print("Invalid input")
    return -1

In [None]:
sentence = "welcome to APS360!"
result = word_lengths(sentence)
if result != -1:
  print(result)

[7, 2, 7]


In [None]:
sentence = "machine learning is so cool"
result = word_lengths(sentence)
if result != -1:
  print(result)

[7, 8, 2, 2, 4]


### Part (c) -- 1pt

Write a function `all_same_length` that takes a sentence (string),
and checks whether every word in the string is the same length.
You should call the function `word_lengths` in the body
of this new function.


In [None]:
def all_same_length(sentence):

  lengths = word_lengths(sentence) # get the lengths of each word

  if lengths != -1:                # check if the input is valid
    for length in lengths[1:]:
      if length != lengths[0]:
        return False               # a word differs in length from the first word

    return True

In [None]:
sentence = "all same length"
all_same_length(sentence)

False

In [None]:
sentence = "hello world"
all_same_length(sentence)

True
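
An alternative way to express the same check, shown here only as a sketch, is to collect the word lengths into a set: all words share one length exactly when the set has at most one element. The name `all_same_length_set` is hypothetical, and a compact `word_lengths` is redefined so the block runs on its own.

```python
def word_lengths(sentence):
    """Compact stand-in for the function from Part (b)."""
    return [len(word) for word in sentence.split()]


def all_same_length_set(sentence):
    """Hypothetical variant: True when every word has the same length."""
    # A set keeps one copy of each distinct length
    return len(set(word_lengths(sentence))) <= 1


print(all_same_length_set("all same length"))  # False
print(all_same_length_set("hello world"))      # True
```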

## Part 2. NumPy Exercises [5 pt]

In this part of the assignment, you'll be manipulating arrays
using NumPy. Normally, we use the shorter name `np` to represent
the package `numpy`.

In [None]:
import numpy as np

### Part (a) -- 1pt

The below variables `matrix` and `vector` are numpy arrays. Explain what you think `<NumpyArray>.size` and `<NumpyArray>.shape` represent.

In [None]:
matrix = np.array([[1., 2., 3., 0.5],
                   [4., 5., 0., 0.],
                   [-1., -2., 1., 1.]])
vector = np.array([2., 0., 1., -2.])

In [None]:
matrix.size

12

In [None]:
# I think matrix.size represents how many elements are in the matrix/numpy array. In the example, there are 12 elements in the matrix

In [None]:
matrix.shape

(3, 4)

In [None]:
# I think matrix.shape represents the dimension of the matrix/numpy array. The first element of the tuple represents the number of rows in the matrix/numpy array
# The second element of the tuple represents the number of columns in the matrix/numpy array. In the example, there are 3 rows and 4 columns in the matrix.

In [None]:
vector.size

4

In [None]:
# I think vector.size represents how many elements are in the vector/numpy array. In the example, there are 4 elements in the vector.

In [None]:
vector.shape

(4,)

In [None]:
# I think vector.shape represents the dimensions of the vector/numpy array. This vector is a 1D array, not a 2D array like the matrix,
# so the shape is a tuple with a single element: the number of elements along the only axis (4 in the example).
# The trailing comma in (4,) is just how Python writes a one-element tuple; a 1D array has no separate row or column axis.
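
The relationship between `size`, `shape`, and the number of axes can be checked directly. The sketch below uses zero-filled arrays with the same shapes as `matrix` and `vector`, and the names `row` and `col` are introduced only for illustration.

```python
import numpy as np

matrix = np.zeros((3, 4))   # same shape as the lab's matrix
vector = np.zeros(4)        # same shape as the lab's vector

# ndim is the number of axes, which equals len(shape)
print(matrix.ndim, matrix.shape, matrix.size)   # 2 (3, 4) 12
print(vector.ndim, vector.shape, vector.size)   # 1 (4,) 4

# size is the product of the entries of shape
print(matrix.size == np.prod(matrix.shape))     # True

# A 1D array becomes an explicit row or column only after reshaping
row = vector.reshape(1, 4)
col = vector.reshape(4, 1)
print(row.shape, col.shape)                     # (1, 4) (4, 1)
```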

### Part (b) -- 1pt

Perform matrix multiplication `output = matrix x vector` by using
for loops to iterate through the columns and rows.
Do not use any built-in NumPy functions.
Cast your output into a NumPy array, if it isn't one already.

Hint: be mindful of the dimension of output

In [None]:
if matrix.shape[1] != vector.shape[0]:
  print("Incompatible dimensions for matrix multiplication.")
  output = None

else:
  rows = matrix.shape[0]
  columns = matrix.shape[1]

  output_array = []

  for i in range(rows):
    row_sum = 0   # accumulator for row i (a separate name, so `rows` is not overwritten)

    for j in range(columns):
      row_sum += matrix[i, j] * vector[j]

    output_array.append(row_sum)

  output = np.array(output_array)

In [None]:
output

array([ 4.,  8., -3.])

### Part (c) -- 1pt

Perform matrix multiplication `output2 = matrix x vector` by using
the function `numpy.dot`.

We will never actually write code as in
part (b), not only because `numpy.dot` is more concise and easier to read and write, but also because `numpy.dot` is much faster (it is implemented in C and highly optimized).
In general, we will avoid for loops in our code.

In [None]:
output2 = np.dot(matrix, vector)

In [None]:
output2

array([ 4.,  8., -3.])

### Part (d) -- 1pt

As a way to test for consistency, show that the two outputs match.

In [None]:
if np.array_equal(output, output2):
    print("The two outputs match.")
else:
    print("The outputs do not match.")

The two outputs match.
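
`np.array_equal` requires the values to match exactly. When two computations accumulate floating-point sums in a different order, they can differ in the last few bits, so a tolerance-based comparison such as `np.allclose` is often the safer check. The arrays `a` and `b` below are made up to show the difference.

```python
import numpy as np

a = np.array([0.1 + 0.2, 1.0])   # 0.1 + 0.2 is not exactly 0.3 in floating point
b = np.array([0.3, 1.0])

print(np.array_equal(a, b))  # False: exact comparison
print(np.allclose(a, b))     # True: equal within a small tolerance
```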


### Part (e) -- 1pt

Show that using `np.dot` is faster than using your code from part (b).

You may find the below code snippet helpful:

In [None]:
import time

# record the time before running code 1
start_time1 = time.time()

rows = matrix.shape[0]
columns = matrix.shape[1]

output_array = []

for i in range(rows):
  row_sum = 0   # accumulator for row i (a separate name, so `rows` is not overwritten)

  for j in range(columns):
    row_sum += matrix[i, j] * vector[j]

  output_array.append(row_sum)

output1 = np.array(output_array)

# record the time after code 1 is run
end_time1 = time.time()

# record the time before running code 2
start_time2 = time.time()

output2 = np.dot(matrix, vector)

# record the time after code 2 is run
end_time2 = time.time()

# compute the difference
diff1 = end_time1 - start_time1
diff2 = end_time2 - start_time2

print(f"Using a for loop took {diff1} seconds.\n")
print(f"Using np.dot took {diff2} seconds.\n")

# check if np.dot took more or less time than using a for loop
if diff2 < diff1:
  print("np.dot took less time than using a for loop")
else:
  print("np.dot took more time than using a for loop")

Using a for loop took 0.0003948211669921875 seconds.

Using np.dot took 0.00010704994201660156 seconds.

np.dot took less time than using a for loop
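
A single `time.time()` measurement on a 3 x 4 matrix is very short and can be noisy. One way to get a steadier comparison, sketched here under assumed sizes, is to use a larger input and repeat each version several times with the standard-library `timeit` module. The names `big_matrix`, `big_vector`, and `matvec_loop`, the 200 x 300 size, and the repeat count are all choices made for this sketch.

```python
import timeit

import numpy as np

rng = np.random.default_rng(0)
big_matrix = rng.random((200, 300))   # assumed size, larger than the lab's 3 x 4
big_vector = rng.random(300)


def matvec_loop(m, v):
    """Matrix-vector product with explicit Python loops."""
    out = []
    for i in range(m.shape[0]):
        total = 0.0
        for j in range(m.shape[1]):
            total += m[i, j] * v[j]
        out.append(total)
    return np.array(out)


# Run each version 5 times and report the total elapsed time
loop_time = timeit.timeit(lambda: matvec_loop(big_matrix, big_vector), number=5)
dot_time = timeit.timeit(lambda: np.dot(big_matrix, big_vector), number=5)

print(f"for loop: {loop_time:.4f} s, np.dot: {dot_time:.6f} s")
```

The exact timings depend on the machine, so no expected output is shown.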


## Part 3. Images [6 pt]

A picture or image can be represented as a NumPy array of “pixels”,
with dimensions H × W × C, where H is the height of the image, W is the width of the image,
and C is the number of colour channels. Typically we will use an image with channels that give the Red, Green, and Blue “level” of each pixel, which is referred to with the short form RGB.

You will write Python code to load an image, and perform several array manipulations to the image and visualize their effects.
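
To make the H × W × C layout concrete, here is a minimal sketch that builds a tiny synthetic image and indexes it. The 4 × 6 size and the name `toy_img` are made up for illustration.

```python
import numpy as np

H, W, C = 4, 6, 3                  # height, width, colour channels (RGB)
toy_img = np.zeros((H, W, C))      # an all-black image with values in [0, 1]

toy_img[:, :, 2] = 0.5             # set the blue channel of every pixel
toy_img[0, 0] = [1.0, 0.0, 0.0]    # make the top-left pixel pure red

print(toy_img.shape)               # (4, 6, 3)
print(toy_img[0, 0].tolist())      # [1.0, 0.0, 0.0]
print(toy_img[:, :, 2].shape)      # (4, 6): one channel is a 2D array
```

The first index selects the row (Y), the second the column (X), and the third the channel.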

In [None]:
import matplotlib.pyplot as plt

### Part (a) -- 1 pt

This is a photograph of a dog whose name is Mochi.

Load the image from its url into the variable `img` using the `plt.imread` function.

Hint: You can enter the URL directly into the `plt.imread` function as a Python string.

In [None]:
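import urllib.request        # `import urllib` alone does not guarantee that `urllib.request` is loaded
from PIL import Image        # likewise, `PIL.Image` must be imported explicitly
# Download the image, decode it with PIL, and scale the pixel values to [0, 1]
img = np.array(Image.open(urllib.request.urlopen("https://drive.google.com/uc?export=view&id=1oaLVR2hr1_qzpKQ47i9rVUIklwbDcews"))) / 255.0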

### Part (b) -- 1pt

Use the function `plt.imshow` to visualize `img`.

This function will also show the coordinate system used to identify pixels.
The origin is at the top left corner, and the first dimension indicates the Y (row) direction,
and the second dimension indicates the X (column) direction.

In [None]:
plt.imshow(img)

### Part (c) -- 2pt

Modify the image by adding a constant value of 0.25 to each pixel in `img` and
store the result in the variable `img_add`. Note that, since pixel values
must lie in the range [0, 1], you will also need to clip `img_add` to that range
using `numpy.clip`. Clipping sets any value that is outside of the desired range to the
closest endpoint. Display the image using `plt.imshow`.

In [None]:
img_add = np.clip(img+0.25, 0, 1)
plt.imshow(img_add)
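
The effect of clipping is easy to see on a handful of values. In this sketch the array `pixels` is made up, and `brightened` plays the role of `img_add`.

```python
import numpy as np

pixels = np.array([0.0, 0.5, 0.8, 1.0])     # made-up pixel intensities in [0, 1]
brightened = np.clip(pixels + 0.25, 0, 1)   # add 0.25, then clamp to [0, 1]

print(brightened.tolist())   # [0.25, 0.75, 1.0, 1.0]
```

Values that would exceed 1 after the addition (0.8 and 1.0 here) are set to 1, so bright regions of the image saturate.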

### Part (d) -- 2pt

Crop the **original** image (`img` variable) to a 130 x 150 image including Mochi's face. Discard the alpha colour channel (i.e. resulting `img_cropped` should **only have RGB channels**)

Display the image.

In [None]:
img_cropped = img[20:150, 20:170, 0:3]
plt.imshow(img_cropped)
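
The shape produced by a slice like the one above can be verified on a synthetic array. The 200 × 250 RGBA array below is an assumed stand-in, not the actual size of the dog photograph.

```python
import numpy as np

rgba = np.zeros((200, 250, 4))         # assumed H x W x 4 image with an alpha channel

# Rows 20..149, columns 20..169, and channels 0..2 (drop alpha at index 3)
cropped = rgba[20:150, 20:170, 0:3]

print(cropped.shape)                     # (130, 150, 3)
print(np.shares_memory(rgba, cropped))   # True: basic slicing returns a view
```

Because the crop is a view, writing into `cropped` would also change `rgba`; calling `.copy()` on the slice avoids that.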

## Part 4. Basics of PyTorch [6 pt]

PyTorch is a Python-based neural networks package. Along with TensorFlow, PyTorch is one of the most popular machine learning libraries.

PyTorch, at its core, is similar to NumPy in the sense that both
make it easier to write code for scientific computing and
achieve improved performance over vanilla Python by leveraging a highly optimized C back-end. However, compared to NumPy, PyTorch offers much better GPU support and provides many high-level features for machine learning. Technically, NumPy can be used to perform almost everything PyTorch does. However, NumPy would be a lot slower than PyTorch, especially when PyTorch runs on a CUDA GPU, and it would take more effort to write machine learning code than with PyTorch.

In [None]:
import torch

### Part (a) -- 1 pt

Use the function `torch.from_numpy` to convert the numpy array `img_cropped` into
a PyTorch tensor. Save the result in a variable called `img_torch`.

In [None]:
img_torch = torch.from_numpy(img_cropped)
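
Two properties of `torch.from_numpy` are worth knowing: the tensor shares memory with the source array, and it keeps the array's dtype. The sketch below uses a small made-up array `arr` rather than the image.

```python
import numpy as np
import torch

arr = np.zeros((2, 3))        # float64 by default, like an image divided by 255.0
t = torch.from_numpy(arr)

arr[0, 0] = 7.0               # modify the NumPy array ...
print(t[0, 0].item())         # 7.0 ... and the tensor sees the change

print(t.dtype)                # torch.float64
t_float = t.float()           # float32 copy, the default dtype of nn.Linear weights
print(t_float.dtype)          # torch.float32
```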

### Part (b) -- 1pt

Use the attribute `<Tensor>.shape` to find the shape (dimension and size) of `img_torch`.

In [None]:
img_torch.shape

torch.Size([130, 150, 3])

### Part (c) -- 1pt

How many floating-point numbers are stored in the tensor `img_torch`?

In [None]:
img_torch.numel()

58500

### Part (d) -- 1 pt

What does the code `img_torch.transpose(0,2)` do? What does the expression return?
Is the original variable `img_torch` updated? Explain.

In [None]:
img_torch.transpose(0,2)

In [None]:
print(img_torch.transpose(0,2).shape)
print(img_torch.shape)

torch.Size([3, 150, 130])
torch.Size([130, 150, 3])


`img_torch.transpose(0,2) swaps the 0th and 2nd dimensions of the tensor. It returns a new tensor object (a transposed view of img_torch) with shape [3, 150, 130]. The original variable img_torch is not updated because transpose is not an in-place operation and we did not assign the result back to img_torch; it is still a tensor with shape [130, 150, 3].`
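
A small sketch of this behaviour on a made-up tensor `x`: the shape of the original is unchanged, but the transposed result is a view onto the same storage, so writing through it is visible in the original.

```python
import torch

x = torch.arange(24).reshape(2, 3, 4)   # small stand-in for img_torch
y = x.transpose(0, 2)                   # swap dimension 0 and dimension 2

print(y.shape)   # torch.Size([4, 3, 2])
print(x.shape)   # torch.Size([2, 3, 4]): x itself keeps its shape

# y is a view: both tensors read and write the same underlying data
y[0, 0, 0] = 100
print(x[0, 0, 0].item())   # 100
```

If an independent copy is needed, `y.clone()` can be used.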

### Part (e) -- 1 pt

What does the code `img_torch.unsqueeze(0)` do? What does the expression return?
Is the original variable `img_torch` updated? Explain.

In [None]:
img_torch.unsqueeze(0)

In [None]:
print(img_torch.unsqueeze(0).shape)
print(img_torch.shape)

torch.Size([1, 130, 150, 3])
torch.Size([130, 150, 3])


`img_torch.unsqueeze(0) inserts a dimension of size 1 at index 0 of the tensor's shape. The expression returns a new tensor with shape [1, 130, 150, 3] that holds the same data as img_torch. The original variable img_torch is not updated because unsqueeze is not an in-place operation and we did not assign the result back to img_torch; img_torch still has shape [130, 150, 3].`
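
The following sketch shows `unsqueeze` next to two related operations, using a zero tensor `x` with the same shape as `img_torch`. Indexing with `None` is an equivalent way to add the axis, and `squeeze(0)` removes it again.

```python
import torch

x = torch.zeros(130, 150, 3)        # same shape as img_torch
batched = x.unsqueeze(0)            # add a leading axis of size 1

print(batched.shape)                # torch.Size([1, 130, 150, 3])
print(x[None].shape)                # torch.Size([1, 130, 150, 3])
print(batched.squeeze(0).shape)     # torch.Size([130, 150, 3])
print(x.shape)                      # torch.Size([130, 150, 3]): unchanged
```

A leading axis of size 1 is commonly used to turn a single example into a batch of one.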

### Part (f) -- 1 pt

Find the maximum value of `img_torch` along each colour channel. Your output should be a one-dimensional
PyTorch tensor with exactly three values.

Hint: look up the function `torch.max`.

In [None]:
max_img_torch = torch.max(torch.max(img_torch, dim=0)[0], dim=0)[0]
print(f"The maximum values of img_torch along each colour channel are {max_img_torch[0]}, {max_img_torch[1]}, and {max_img_torch[2]}.")

The maximum values of img_torch along each colour channel are 0.8941176470588236, 0.788235294117647, and 0.6745098039215687.
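
The nested `torch.max` calls reduce the height axis and then the width axis. As a sketch of an alternative, `torch.amax` accepts a tuple of dimensions and performs both reductions in one call. The random tensor `t` stands in for `img_torch`.

```python
import torch

torch.manual_seed(0)
t = torch.rand(130, 150, 3)     # random stand-in with the same shape as img_torch

# Two-step reduction: max over rows, then over columns
nested = torch.max(torch.max(t, dim=0)[0], dim=0)[0]

# One-step reduction over both spatial dimensions
single = torch.amax(t, dim=(0, 1))

print(single.shape)                  # torch.Size([3])
print(torch.equal(nested, single))   # True
```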


## Part 5. Training an ANN [10 pt]

The sample code provided below is a 2-layer ANN trained on the MNIST dataset to identify digits less than 3 or greater than or equal to 3. Modify the code by changing any of the following and observe how the accuracy and error are affected:

- number of training iterations
- number of hidden units
- number of layers
- types of activation functions
- learning rate

Select at least three different options from the list above. For each option, please select two or three different values and provide a table.
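
One possible way to vary the number of hidden units, the number of layers, and the activation function without editing the class each time is to build the network from arguments. The sketch below is only an illustration: the helper `make_mlp` and its parameters are hypothetical and are not part of the provided sample code.

```python
import torch
import torch.nn as nn


def make_mlp(hidden_sizes=(30,), activation=nn.ReLU):
    """Hypothetical helper: a fully connected net for 28 x 28 inputs with one output."""
    layers = [nn.Flatten()]            # (N, 28, 28) or (N, 1, 28, 28) -> (N, 784)
    in_features = 28 * 28
    for hidden in hidden_sizes:        # one Linear + activation per hidden layer
        layers.append(nn.Linear(in_features, hidden))
        layers.append(activation())
        in_features = hidden
    layers.append(nn.Linear(in_features, 1))   # single logit for the binary task
    return nn.Sequential(*layers)


torch.manual_seed(1)
model = make_mlp(hidden_sizes=(100, 30), activation=nn.Tanh)

dummy = torch.zeros(1, 28, 28)         # same shape as ToTensor() gives for one MNIST image
print(model(dummy).shape)              # torch.Size([1, 1])
```

With the default arguments, `make_mlp()` has the same layer sizes as the 2-layer network in the sample code.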


In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import datasets, transforms
import matplotlib.pyplot as plt # for plotting
import torch.optim as optim

torch.manual_seed(1) # set the random seed

# define a 2-layer artificial neural network
class Pigeon(nn.Module):
    def __init__(self):
        super(Pigeon, self).__init__()
        self.layer1 = nn.Linear(28 * 28, 30)
        self.layer2 = nn.Linear(30, 1)
    def forward(self, img):
        flattened = img.view(-1, 28 * 28)
        activation1 = self.layer1(flattened)
        activation1 = F.relu(activation1)
        activation2 = self.layer2(activation1)
        return activation2

pigeon = Pigeon()

# load the data
mnist_data = datasets.MNIST('data', train=True, download=True)
mnist_data = list(mnist_data)
mnist_train = mnist_data[:1000]
mnist_val   = mnist_data[1000:2000]
img_to_tensor = transforms.ToTensor()


# simplified training code to train `pigeon` on the "small digit recognition" task
criterion = nn.BCEWithLogitsLoss()
optimizer = optim.SGD(pigeon.parameters(), lr=0.005, momentum=0.9)

for (image, label) in mnist_train:
    # actual ground truth: is the digit less than 3?
    actual = torch.tensor(label < 3).reshape([1,1]).type(torch.FloatTensor)
    # pigeon prediction
    out = pigeon(img_to_tensor(image)) # step 1-2
    # update the parameters based on the loss
    loss = criterion(out, actual)      # step 3
    loss.backward()                    # step 4 (compute the updates for each parameter)
    optimizer.step()                   # step 4 (make the updates for each parameter)
    optimizer.zero_grad()              # a clean up step for PyTorch

# computing the error and accuracy on the training set
error = 0
for (image, label) in mnist_train:
    prob = torch.sigmoid(pigeon(img_to_tensor(image)))
    if (prob < 0.5 and label < 3) or (prob >= 0.5 and label >= 3):
        error += 1
print("Training Error Rate:", error/len(mnist_train))
print("Training Accuracy:", 1 - error/len(mnist_train))


# computing the error and accuracy on a test set
error = 0
for (image, label) in mnist_val:
    prob = torch.sigmoid(pigeon(img_to_tensor(image)))
    if (prob < 0.5 and label < 3) or (prob >= 0.5 and label >= 3):
        error += 1
print("Test Error Rate:", error/len(mnist_val))
print("Test Accuracy:", 1 - error/len(mnist_val))

Training Iterations | Training Error Rate | Training Accuracy | Test Error Rate | Test Accuracy
-------------------|------------------|------------------|------------------|------------------
1 (original) | 0.036 | 0.964 | 0.079 | 0.921
25 | 0.001 | 0.999 | 0.059 | 0.941
50 | 0.0 | 1.0 | 0.059 | 0.941


Hidden Units | Training Error Rate | Training Accuracy | Test Error Rate | Test Accuracy
-------------------|------------------|------------------|------------------|------------------
30 (original) | 0.036 | 0.964 | 0.079 | 0.921
100 | 0.03 | 0.97 | 0.077 | 0.923
500 | 0.025 | 0.975 | 0.074 | 0.926


Layers | Training Error Rate | Training Accuracy | Test Error Rate | Test Accuracy
-------------------|------------------|------------------|------------------|------------------
2 (original) | 0.036 | 0.964 | 0.079 | 0.921
3 | 0.045 | 0.955 | 0.079 | 0.921
4 | 0.041 | 0.959 | 0.093 | 0.907


Learning Rate | Training Error Rate | Training Accuracy | Test Error Rate | Test Accuracy
-------------------|------------------|------------------|------------------|------------------
0.005 (original) | 0.036 | 0.964 | 0.079 | 0.921
0.009 | 0.052 | 0.948 | 0.084 | 0.916
0.006 | 0.037 | 0.963 | 0.081 | 0.919
0.004 | 0.033 | 0.967 | 0.084 | 0.916
0.001 | 0.078 | 0.922 | 0.113 | 0.887


Activation Function | Training Error Rate | Training Accuracy | Test Error Rate | Test Accuracy
-------------------|------------------|------------------|------------------|------------------
relu (original) | 0.036 | 0.964 | 0.079 | 0.921
sigmoid | 0.073 | 0.927 | 0.117 | 0.883
tanh | 0.04 | 0.96 | 0.094 | 0.906

### Part (a) -- 3 pt
Comment on which of the above changes resulted in the best accuracy on training data? What accuracy were you able to achieve?

In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import datasets, transforms
import matplotlib.pyplot as plt # for plotting
import torch.optim as optim

torch.manual_seed(1) # set the random seed

# define a 2-layer artificial neural network
class Pigeon(nn.Module):
    def __init__(self):
        super(Pigeon, self).__init__()
        self.layer1 = nn.Linear(28 * 28, 30)
        self.layer2 = nn.Linear(30, 1)
    def forward(self, img):
        flattened = img.view(-1, 28 * 28)
        activation1 = self.layer1(flattened)
        activation1 = F.relu(activation1)
        activation2 = self.layer2(activation1)
        return activation2

pigeon = Pigeon()

# load the data
mnist_data = datasets.MNIST('data', train=True, download=True)
mnist_data = list(mnist_data)
mnist_train = mnist_data[:1000]
mnist_val   = mnist_data[1000:2000]
img_to_tensor = transforms.ToTensor()


# simplified training code to train `pigeon` on the "small digit recognition" task
criterion = nn.BCEWithLogitsLoss()
optimizer = optim.SGD(pigeon.parameters(), lr=0.005, momentum=0.9)

iterations = 50
for i in range(iterations):
  for (image, label) in mnist_train:
      # actual ground truth: is the digit less than 3?
      actual = torch.tensor(label < 3).reshape([1,1]).type(torch.FloatTensor)
      # pigeon prediction
      out = pigeon(img_to_tensor(image)) # step 1-2
      # update the parameters based on the loss
      loss = criterion(out, actual)      # step 3
      loss.backward()                    # step 4 (compute the updates for each parameter)
      optimizer.step()                   # step 4 (make the updates for each parameter)
      optimizer.zero_grad()              # a clean up step for PyTorch

# computing the error and accuracy on the training set
error = 0
for (image, label) in mnist_train:
    prob = torch.sigmoid(pigeon(img_to_tensor(image)))
    if (prob < 0.5 and label < 3) or (prob >= 0.5 and label >= 3):
        error += 1
print("Training Error Rate:", error/len(mnist_train))
print("Training Accuracy:", 1 - error/len(mnist_train))


# computing the error and accuracy on a test set
error = 0
for (image, label) in mnist_val:
    prob = torch.sigmoid(pigeon(img_to_tensor(image)))
    if (prob < 0.5 and label < 3) or (prob >= 0.5 and label >= 3):
        error += 1
print("Test Error Rate:", error/len(mnist_val))
print("Test Accuracy:", 1 - error/len(mnist_val))

`The change that resulted in the best accuracy on training data was increasing the number of training iterations from 1 to 50. I was able to achieve a training accuracy of 1.0 up from 0.964.`

### Part (b) -- 3 pt


Comment on which of the above changes resulted in the best accuracy on testing data? What accuracy were you able to achieve?

In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import datasets, transforms
import matplotlib.pyplot as plt # for plotting
import torch.optim as optim

torch.manual_seed(1) # set the random seed

# define a 2-layer artificial neural network
class Pigeon(nn.Module):
    def __init__(self):
        super(Pigeon, self).__init__()
        self.layer1 = nn.Linear(28 * 28, 350)
        self.layer2 = nn.Linear(350, 1)
    def forward(self, img):
        flattened = img.view(-1, 28 * 28)
        activation1 = self.layer1(flattened)
        activation1 = F.relu(activation1)
        activation2 = self.layer2(activation1)
        return activation2

pigeon = Pigeon()

# load the data
mnist_data = datasets.MNIST('data', train=True, download=True)
mnist_data = list(mnist_data)
mnist_train = mnist_data[:1000]
mnist_val   = mnist_data[1000:2000]
img_to_tensor = transforms.ToTensor()


# simplified training code to train `pigeon` on the "small digit recognition" task
criterion = nn.BCEWithLogitsLoss()
optimizer = optim.SGD(pigeon.parameters(), lr=0.004, momentum=0.9)

iterations = 30
for i in range(iterations):
  for (image, label) in mnist_train:
      # actual ground truth: is the digit less than 3?
      actual = torch.tensor(label < 3).reshape([1,1]).type(torch.FloatTensor)
      # pigeon prediction
      out = pigeon(img_to_tensor(image)) # step 1-2
      # update the parameters based on the loss
      loss = criterion(out, actual)      # step 3
      loss.backward()                    # step 4 (compute the updates for each parameter)
      optimizer.step()                   # step 4 (make the updates for each parameter)
      optimizer.zero_grad()              # a clean up step for PyTorch

# computing the error and accuracy on the training set
error = 0
for (image, label) in mnist_train:
    prob = torch.sigmoid(pigeon(img_to_tensor(image)))
    if (prob < 0.5 and label < 3) or (prob >= 0.5 and label >= 3):
        error += 1
print("Training Error Rate:", error/len(mnist_train))
print("Training Accuracy:", 1 - error/len(mnist_train))


# computing the error and accuracy on a test set
error = 0
for (image, label) in mnist_val:
    prob = torch.sigmoid(pigeon(img_to_tensor(image)))
    if (prob < 0.5 and label < 3) or (prob >= 0.5 and label >= 3):
        error += 1
print("Test Error Rate:", error/len(mnist_val))
print("Test Accuracy:", 1 - error/len(mnist_val))

`Based on the results from the table above, increasing the number of training iterations and increasing the number of hidden units both improved the test accuracy from the original code. There was no improvement in the test accuracy between 2 and 3 layers.`

`To get a better result, I changed the number of hidden units from 30 to 350, increased the number of iterations from 1 to 30, kept the activation function as relu, and after testing different values, I slightly decreased the learning rate from 0.005 to 0.004.`

`With these changes, I was able to achieve a test accuracy of 0.947.`

### Part (c) -- 4 pt
Which model hyperparameters should you use, the ones from (a) or (b)?

`The model hyperparameters from (a) should be used, since the ones from (b) were tuned to this specific test set. An artificial neural network is trained by finding the set of weights that minimizes the average loss over all training samples. Because the hyperparameters in (b) were chosen by looking at the test accuracy, we risk overfitting our model to the testing data and having it start memorizing.`
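
A common way to avoid tuning hyperparameters against the data used for the final report is to hold out a third split: train on one part, tune on a validation part, and report on a test part that is looked at only once. The sketch below illustrates the idea with a plain list. The helper `three_way_split` and the split sizes are hypothetical, and `data` is a stand-in for `list(mnist_data)`.

```python
def three_way_split(items, n_train, n_val, n_test):
    """Hypothetical helper: slice a list into disjoint train/val/test parts."""
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:n_train + n_val + n_test]
    return train, val, test


data = list(range(3000))   # stand-in for list(mnist_data)
train, val, test = three_way_split(data, 1000, 1000, 1000)

print(len(train), len(val), len(test))   # 1000 1000 1000

# The three parts do not overlap
print(set(train) & set(val), set(val) & set(test))   # set() set()
```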