# Tutorial 2: Introduction to PyTorch

## Tensors

A tensor is PyTorch's way of representing a multi-dimensional array. The data it contains can be located on the CPU or GPU. To use PyTorch, first we need to import the `torch` library.

In [1]:
import torch
import numpy as np

## Creating a Tensor

There are multiple ways to create a tensor. It can be created by specifying the data directly, similar to a NumPy array, or by converting an existing NumPy array into a tensor.

In [2]:
# TODO: create a tensor using the default pytorch way.
x = torch.tensor([[1, 2, 3], [4, 5, 6]])

# TODO: create a tensor by converting from numpy.
x2 = torch.from_numpy(np.array([[1, 2, 3], [4, 5, 6]]))

assert torch.all(x == x2)
print("x and x2 are equal")

x and x2 are equal
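
The two creation routes above differ in one way that is easy to miss: `torch.from_numpy` shares memory with the source array, while `torch.tensor` copies the data. The following is a minimal sketch of that difference; the names `arr`, `shared`, and `copied` are made up for illustration.

```python
import numpy as np
import torch

arr = np.array([[1, 2, 3], [4, 5, 6]])
shared = torch.from_numpy(arr)   # shares memory with arr
copied = torch.tensor(arr)       # copies the data

arr[0, 0] = 100                  # modify the NumPy array in place
print(shared[0, 0].item())       # 100: the shared tensor sees the change
print(copied[0, 0].item())       # 1: the copy does not
```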


The most important properties of a tensor are the following:
- The tensor's shape: `x.shape`. This is the number of dimensions of the tensor, and how many elements are in each dimension.
- The tensor's datatype: `x.dtype`.
- The tensor's device: `x.device`. Whether the tensor is currently stored on the CPU, a GPU, or another device (e.g. the `mps` backend on Apple Silicon Macs).
- Whether the tensor requires a gradient or not: `x.requires_grad`. Whether PyTorch will automatically calculate gradients with respect to this tensor.

In [3]:
print(x)
print(f"Shape: {x.shape}")
print(f"Datatype: {x.dtype}")
print(f"Device: {x.device}")
print(f"Requires Grad: {x.requires_grad}")

tensor([[1, 2, 3],
        [4, 5, 6]])
Shape: torch.Size([2, 3])
Datatype: torch.int64
Device: cpu
Requires Grad: False


There are many other ways of initializing tensors with specific elements. Here, we list some of them. In most cases, the tensor is created by specifying the desired shape.

In [4]:
# TODO: Create tensors that are: empty, all with zeros, uniform from [0,1], from standard normal dist., all ones, identity matrix, with arange, with linspace
x =  torch.empty((2,3)) # empty: A tensor whose memory is allocated but not initialized. Its values are arbitrary (whatever was in memory), so do not rely on them
print(x) 
x =  torch.zeros((2,3)) # zeros: A tensor with 0s for all elements
print(x)
x =  torch.rand((2,3))# rand: A tensor where every element is uniformly sampled from [0, 1]
print(x)
x =  torch.randn((2,3))# randn: A tensor where every element is sampled from a standard normal distribution
print(x)
x =  torch.ones((2,3))# ones: A tensor with 1s for all elements
print(x)
x =  torch.eye(5) # eye: A tensor with 1s on the diagonals and 0s everywhere else. If the shape is a square (ex. 5x5), this produces an identity matrix
print(x)
x =  torch.arange(0,10,2) # arange: Similar to the python range() function. arange(a, b, c) produces the values a, a+c, a+2c, ... up to but not including b.
print(x)
x =  torch.linspace(0,1,5) # linspace: Takes `steps` number of steps uniformly between the start and end values
print(x)

# TODO: "LIKE" functions

tensor([[0., 0., 0.],
        [0., 0., 0.]])
tensor([[0., 0., 0.],
        [0., 0., 0.]])
tensor([[0.0555, 0.1664, 0.1115],
        [0.0481, 0.9500, 0.1323]])
tensor([[-1.2932,  0.7978,  0.0331],
        [-0.5080, -1.0042, -0.4427]])
tensor([[1., 1., 1.],
        [1., 1., 1.]])
tensor([[1., 0., 0., 0., 0.],
        [0., 1., 0., 0., 0.],
        [0., 0., 1., 0., 0.],
        [0., 0., 0., 1., 0.],
        [0., 0., 0., 0., 1.]])
tensor([0, 2, 4, 6, 8])
tensor([0.0000, 0.2500, 0.5000, 0.7500, 1.0000])
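
The cell above leaves the "LIKE" functions as a TODO. As a sketch of what they do: each `*_like` function takes an existing tensor and creates a new one with the same shape, dtype, and device. The tensor `base` is an arbitrary example chosen here.

```python
import torch

base = torch.arange(6, dtype=torch.float64).reshape(2, 3)

z = torch.zeros_like(base)       # all zeros, same shape/dtype/device as base
o = torch.ones_like(base)        # all ones
r = torch.rand_like(base)        # uniform samples from [0, 1)
n = torch.randn_like(base)       # standard normal samples
f = torch.full_like(base, 7.0)   # every element set to 7
e = torch.empty_like(base)       # uninitialized memory, arbitrary values

print(z.shape, z.dtype)          # torch.Size([2, 3]) torch.float64
```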


In [5]:
x = torch.arange(4)
print(x.bool()) # boolean True/False
print(x.short()) # int16
print(x.long()) # int64
print(x.half()) # float16
print(x.float()) # float32
print(x.double()) # float64

tensor([False,  True,  True,  True])
tensor([0, 1, 2, 3], dtype=torch.int16)
tensor([0, 1, 2, 3])
tensor([0., 1., 2., 3.], dtype=torch.float16)
tensor([0., 1., 2., 3.])
tensor([0., 1., 2., 3.], dtype=torch.float64)


## Tensor Shape Manipulation

It is often very important to get tensors into a desired shape and number of dimensions. To do this, we need to use various functions.
- `x.reshape(new_shape)` returns a tensor with the same elements as `x` arranged in shape `new_shape`. The elements are read in row-major order (last dimension fastest), so the result depends on the order of the dimensions: reshaping a tensor of shape (10, 20, 30) and one of shape (30, 20, 10) to the same shape generally produces different results.
- `x.view(new_shape)` also returns a tensor with shape `new_shape`, but it never copies: the result always shares memory with `x`. This only works when the memory layout of `x` is compatible with the new shape (for example, when `x` is contiguous); otherwise `view` raises an error. `reshape` returns a view when it can and silently makes a copy when it cannot.
- `x.permute(i1, ..., ik)` reorders the dimensions of `x`: dimension `j` of the result is dimension `ij` of `x`. All `k` dimensions must be listed exactly once.
- Other important manipulations include `torch.cat`, `x.unsqueeze`, `x.squeeze`, and `x.t()`. You should look these up in the PyTorch documentation: https://pytorch.org/docs/stable/torch.html
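
The functions listed above can be tried on a small tensor. This is a minimal sketch using an arbitrary 2x3 example; it also shows a case where `view` fails on a non-contiguous tensor while `reshape` falls back to a copy.

```python
import torch

x = torch.arange(6).reshape(2, 3)       # [[0, 1, 2], [3, 4, 5]]

# reshape keeps the row-major element order; permute swaps the axes.
print(x.reshape(3, 2))                  # [[0, 1], [2, 3], [4, 5]]
print(x.permute(1, 0))                  # [[0, 3], [1, 4], [2, 5]]

# The permuted tensor is non-contiguous, so view() refuses to flatten it...
xt = x.permute(1, 0)
print(xt.is_contiguous())               # False
try:
    xt.view(6)
except RuntimeError as err:
    print("view failed:", type(err).__name__)
# ...while reshape() makes a copy when it has to.
flat = xt.reshape(6)                    # [0, 3, 1, 4, 2, 5]

# unsqueeze/squeeze add and remove size-1 dimensions; cat joins tensors.
print(x.unsqueeze(0).shape)             # torch.Size([1, 2, 3])
print(x.unsqueeze(0).squeeze(0).shape)  # torch.Size([2, 3])
print(torch.cat([x, x], dim=0).shape)   # torch.Size([4, 3])
print(x.t().shape)                      # torch.Size([3, 2])
```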

### Exercise 1

Using the above functions for creating tensors and manipulating tensors, perform the following tasks:
1) Create a tensor `x` of shape (5, 5, 3) with elements drawn from a normal distribution
2) Change its shape into (3, 25) where the 3 comes from the last dimension. (Note: you will need multiple functions to achieve this.)
3) Change the shape into (1, 3, 25, 1) using `unsqueeze`
4) Create another tensor `y` which is a 3x3 identity matrix.
5) Combine both `x` and `y` into a single tensor `z`.

In [6]:
# Write code here
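x = torch.randn(5, 5, 3)                # 1) normal samples, shape (5, 5, 3)
x = x.permute(2, 0, 1).reshape(3, 25)   # 2) move the last dimension first, then flatten the rest
x = x.unsqueeze(0).unsqueeze(-1)        # 3) shape (1, 3, 25, 1)
y = torch.eye(3)                        # 4) 3x3 identity matrix
y_exp = y.unsqueeze(0).unsqueeze(2)     # shape (1, 3, 1, 3), broadcastable against x
z = x * y_exp                           # 5) combine by broadcasting: shape (1, 3, 25, 3)
print(z)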


tensor([[[[ 1.0045,  0.0000,  0.0000],
          [-0.8090, -0.0000, -0.0000],
          [-1.4689, -0.0000, -0.0000],
          [ 0.1977,  0.0000,  0.0000],
          [-0.6926, -0.0000, -0.0000],
          [ 1.0584,  0.0000,  0.0000],
          [-1.4996, -0.0000, -0.0000],
          [-1.7456, -0.0000, -0.0000],
          [-0.8466, -0.0000, -0.0000],
          [-0.1559, -0.0000, -0.0000],
          [ 0.5697,  0.0000,  0.0000],
          [-0.3770, -0.0000, -0.0000],
          [-1.3671, -0.0000, -0.0000],
          [ 0.7583,  0.0000,  0.0000],
          [ 0.3837,  0.0000,  0.0000],
          [ 0.2422,  0.0000,  0.0000],
          [-0.2971, -0.0000, -0.0000],
          [-1.8993, -0.0000, -0.0000],
          [-1.2882, -0.0000, -0.0000],
          [ 1.6189,  0.0000,  0.0000],
          [ 0.9090,  0.0000,  0.0000],
          [-1.2587, -0.0000, -0.0000],
          [-1.4746, -0.0000, -0.0000],
          [-1.3276, -0.0000, -0.0000],
          [ 0.6553,  0.0000,  0.0000]],

         [[ 0.0000,  0.

## Tensor Mathematics

You can use the normal operators `+, -, *, /` on tensors, as long as their shapes are broadcastable. Shapes are compared dimension by dimension, starting from the last one; two dimensions are compatible if they are equal or if one of them is `1`, and a tensor with fewer dimensions is treated as if it had extra leading dimensions of size `1`. A dimension of size `1` is then (virtually) repeated to match the other tensor. Therefore, tensors of shape `(3, 3, 5)` and `(1, 3, 5)` are broadcastable since the second tensor can be repeated `3` times along the 0th dimension. However, tensors of shape `(3, 3, 5)` and `(1, 4, 6)` cannot be broadcast together, because `3` vs `4` and `5` vs `6` do not match. Using these operators will perform the operation elementwise.

Additionally, comparison operators such as `>, <, ==` are applied elementwise and follow the same broadcasting rules. They return a tensor of booleans with the broadcast shape.
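
The shapes mentioned above can be checked directly. This is a small sketch using those shapes; the variable names are chosen here for illustration.

```python
import torch

a = torch.ones(3, 3, 5)
b = torch.full((1, 3, 5), 2.0)

total = a + b                      # b is repeated 3 times along dimension 0
print(total.shape)                 # torch.Size([3, 3, 5])

mask = a < b                       # comparisons are elementwise and return booleans
print(mask.shape, mask.dtype)      # torch.Size([3, 3, 5]) torch.bool

# A column of shape (3, 1) compared with a row of shape (3,) gives a (3, 3) result.
eq = torch.arange(3).reshape(3, 1) == torch.arange(3)
print(eq.shape)                    # torch.Size([3, 3])

# Shapes (3, 3, 5) and (1, 4, 6) are not broadcastable.
c = torch.ones(1, 4, 6)
failed = False
try:
    a + c
except RuntimeError:
    failed = True
print("incompatible shapes raised an error:", failed)
```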

Some other useful math functions include:
- `torch.matmul(x, y)` performs matrix multiplication on tensors `x` and `y`.
- `torch.dot(x, y)` calculates the dot product of `x` and `y`, which must both be 1-dimensional tensors with the same number of elements.
- `torch.bmm(x, y)` performs batch matrix multiplication (given tensors of shape `(B, M, N)` and `(B, N, P)`, returns a tensor of shape `(B, M, P)` obtained by performing matrix multiplication on each matrix contained in dimensions `1` and `2`).
- `torch.sum(x, dim=N)` sums the elements of the tensor along a given dimension. By default, the dimension is collapsed. Therefore, summing a tensor of shape `(A, B, C)` along dimension 2 will result in a tensor of shape `(A, B)`. To keep the dimension, use the keyword `keepdim=True`. This will produce a tensor of shape `(A, B, 1)`.

Other math functions to check in the documentation include `torch.sin, torch.cos, torch.max, torch.min, torch.argmax, torch.argmin, torch.abs, torch.norm, torch.clamp, torch.any, and torch.all`.
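
As a quick illustration of the functions listed above, the following sketch applies them to small example tensors (the values are arbitrary) and prints the resulting shapes.

```python
import torch

x = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
y = torch.tensor([[0.0, 1.0], [1.0, 0.0]])

print(torch.matmul(x, y))              # [[2., 1.], [4., 3.]]
print(torch.dot(x[0], y[0]))           # 1*0 + 2*1 = 2 (inputs must be 1-D)

xb = torch.randn(4, 2, 3)              # batch of 4 matrices of shape (2, 3)
yb = torch.randn(4, 3, 5)              # batch of 4 matrices of shape (3, 5)
print(torch.bmm(xb, yb).shape)         # torch.Size([4, 2, 5])

t = torch.ones(2, 3, 4)
print(torch.sum(t, dim=2).shape)                # torch.Size([2, 3])
print(torch.sum(t, dim=2, keepdim=True).shape)  # torch.Size([2, 3, 1])
```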


### Exercise 2: Writing Loss Functions

Recall from tutorial `1` that the Mean-Squared Error (MSE) loss function is defined by ![](https://github.com/OsmanMutlu/rawtext/raw/master/img/Comp541-Lab1-Screenshot7.png). Write a function called `mse_loss` which takes `2` tensors as input, `x` and `y`, and calculates the MSE between them. Assume that they have the same shape.


In [7]:
# Write code here
def mse_loss(x, y):
  # Mean of the squared elementwise differences between x and y
  loss = torch.sum((x - y) ** 2) / x.numel()
  return loss
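
One way to sanity-check a hand-written loss is to compare it with PyTorch's built-in `torch.nn.functional.mse_loss` on a small example. The sketch below repeats the definition so that it runs on its own; the test values are arbitrary.

```python
import torch
import torch.nn.functional as F

def mse_loss(x, y):
    # Mean of the squared elementwise differences between x and y
    return torch.sum((x - y) ** 2) / x.numel()

x = torch.tensor([1.0, 2.0, 3.0])
y = torch.tensor([1.0, 4.0, 6.0])

# Squared errors are 0, 4 and 9, so the mean is 13/3.
print(mse_loss(x, y))
print(F.mse_loss(x, y))    # built-in version, default reduction is the mean
```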

## Tensor Indexing

Indexing of tensors follows the same patterns as with numpy. If `x` is a tensor, you can obtain an element of `x` by doing `x[a1, a2, ..., ak]`. To take all elements in a specific dimension, use `:` instead of a number.

To get elements satisfying some condition, you can use `x[some condition involving x]`. For example `x[x < 5]` will return only the elements of `x` which are less than `5`.

In [8]:
x = torch.arange(10)
# todo: show elements of x that are lower than 2 or greater than 6
mask = (x < 2) | (x > 6)
print(x[mask])

# todo: show elements of x that are even
mask = (x % 2 == 0)
print(x[mask])


tensor([0, 1, 7, 8, 9])
tensor([0, 2, 4, 6, 8])
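
The cell above uses boolean masks on a 1-D tensor. As a complementary sketch, integer and `:` indexing work on a tensor with several dimensions as follows; the 3x4 tensor is an arbitrary example.

```python
import torch

x = torch.arange(12).reshape(3, 4)   # rows: [0..3], [4..7], [8..11]

print(x[1, 2])       # tensor(6): single element at row 1, column 2
print(x[:, 0])       # tensor([0, 4, 8]): all rows, first column
print(x[0, :])       # tensor([0, 1, 2, 3]): first row, all columns
print(x[1:, :2])     # tensor([[4, 5], [8, 9]]): slices work as in NumPy

# Boolean masks can also be used for assignment.
x[x > 9] = 0
print(x[2])          # tensor([8, 9, 0, 0])
```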


## Neural Network Layers

Now that we know how to work with tensors, we can use layers and actually learn something.

For now, we will work with `Linear` layers. These are the layers of an MLP/Feedforward Network. To define a linear layer, use `layer = torch.nn.Linear(num_input_elements, num_output_elements, bias=True)`. Applying this to a tensor is equivalent to calculating `y = x A^T + b` where `A` is a matrix of shape `(num_output_elements, num_input_elements)` and `b` is a vector of shape `(num_output_elements)`. By setting `bias=False` we can disable `b` (the default is `bias=True`).

Another important layer is `torch.nn.ReLU()`. This layer applies the nonlinear activation function `ReLU` to each element of the input. Recall from linear algebra that the composition of multiple affine transformations is itself an affine transformation. Therefore, without using `ReLU`, applying many Linear layers is equivalent to just applying a single Linear layer.

If we want to combine multiple layers together into a pipeline, we can use `network = torch.nn.Sequential(layer1, layer2, layer3, ..., layerk)`. Then, running `network(x)` is equivalent to applying `layerk(... (layer3(layer2(layer1(x))))...)`.
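
The claims above can be verified numerically. The sketch below, with layer sizes chosen arbitrarily, checks that `nn.Linear` computes `x A^T + b`, and that two stacked Linear layers without a nonlinearity collapse into a single affine map with weight `A2 A1` and bias `A2 b1 + b2`.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
layer1 = nn.Linear(4, 8)
layer2 = nn.Linear(8, 3)
x = torch.randn(5, 4)                       # batch of 5 inputs

# nn.Linear computes x @ A.T + b
manual = x @ layer1.weight.T + layer1.bias
same_as_manual = torch.allclose(layer1(x), manual, atol=1e-6)
print(same_as_manual)

# Without a nonlinearity, two Linear layers equal one affine map.
stacked = nn.Sequential(layer1, layer2)
A = layer2.weight @ layer1.weight           # shape (3, 4)
b = layer2.weight @ layer1.bias + layer2.bias
collapses = torch.allclose(stacked(x), x @ A.T + b, atol=1e-5)
print(collapses)

# Inserting ReLU between the layers breaks this equivalence in general.
with_relu = nn.Sequential(layer1, nn.ReLU(), layer2)
print(with_relu(x).shape)                   # torch.Size([5, 3])
```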

Finally, if we want to create a class containing these layers, the class should extend from `torch.nn.Module`. Inside the constructor of this class, the first thing that MUST be done is calling `super().__init__()`. You will receive an error if you do not do this.

When creating a class extending from `torch.nn.Module`, in order to define what the class does, you should implement a function `forward(self, ...)` which takes as input the desired inputs of the network, and outputs the result of the network.

Finally, if `network` is an object of some class extending `torch.nn.Module`, we can place it on a GPU by using `network.to(device)` where `device` is either a string specifying the device, or a `torch.device` value.

### Exercise 3: Creating a MLP Network

Create a class called `NN` which will contain a simple network of layers, satisfying the following conditions.
- The constructor should take as input two integers: `input_size` and `num_classes`.
- There should be two Linear layers. The first layer will take as input `input_size` and output `64`, while the second layer will take as input `64` and output `num_classes`.
- In between the two Linear layers, there should be a `ReLU` layer.
- The input will have shape (B, M, N). Before providing it as input to the network, you should reshape it to have shape `(B, M * N)`.

In [9]:
import torch
import torch.nn as nn

In [10]:
# Write code here

class NN(nn.Module):
  #todo: add parameters here
  def __init__(self, input_size, num_classes):
    super().__init__()
    self.layer1=nn.Linear(input_size, 64)
    self.layer2=nn.Linear(64, num_classes)
    self.relu=nn.ReLU()



  def forward(self, x):

    B=x.shape[0]
    x=x.view(B,-1) # flatten
    x=self.layer1(x)
    x=self.relu(x)
    x=self.layer2(x)

    return x
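
For comparison, a network with the same layers can be sketched without a custom class by using `nn.Sequential` together with `nn.Flatten`, which flattens every dimension after the batch dimension. The helper name `make_mlp` is hypothetical and only used for this illustration.

```python
import torch
import torch.nn as nn

def make_mlp(input_size, num_classes):
    # Hypothetical helper: same layer stack as the exercise, built with Sequential
    return nn.Sequential(
        nn.Flatten(),                   # (B, M, N) -> (B, M * N)
        nn.Linear(input_size, 64),
        nn.ReLU(),
        nn.Linear(64, num_classes),
    )

mlp = make_mlp(28 * 28, 10)
batch = torch.randn(16, 28, 28)         # stand-in for a batch of 28x28 images
out = mlp(batch)
print(out.shape)                        # torch.Size([16, 10])
```

A custom `nn.Module` subclass is still the better choice once the forward pass needs logic that is not a plain chain of layers.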
  

  


## Learning MNIST

Now, we will combine everything we have learned so far in order to train a simple network. The goal of the network will be to take as input an image of a number (0-9), and output the number it thinks the image contains.

This is an example of a classification problem, where the input is the image, and the output is the class (value) that the image belongs to.

### Getting and Setting Up the Data
First, in order to train this model, we will need the data. We will use a dataset called MNIST, which contains grayscale images of handwritten digits of shape 28x28, along with an integer specifying the value contained in the image.

In [11]:
import torchvision.datasets as datasets
import torchvision.transforms as transforms
from torch.utils.data import DataLoader
from tqdm import tqdm
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# If you are using a Macbook with Apple Silicon, and have set up PyTorch for your Mac, comment out the above line and uncomment the below line
# device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")

BATCH_SIZE = 64

train_dataset = datasets.MNIST(root="dataset/", train=True, transform=transforms.ToTensor(), download=True)
test_dataset = datasets.MNIST(root="dataset/", train=False, transform=transforms.ToTensor(), download=True)
train_loader = DataLoader(dataset=train_dataset, batch_size=BATCH_SIZE, shuffle=True)
test_loader = DataLoader(dataset=test_dataset, batch_size=BATCH_SIZE, shuffle=False)

In [12]:
INPUT_SIZE = 784
NUM_CLASSES = 10

### Exercise 4

Create an NN object using the NN class defined above, and call it `model`. `input_size` should be `INPUT_SIZE` and `num_classes` should be `NUM_CLASSES`. After creating the model, send it to the device specified above.

In [13]:
# Write code here
model = NN(INPUT_SIZE, NUM_CLASSES).to(device)


## Training and Optimizing

In order to train the model, we need to use an optimizer. An optimizer is a class which implements Stochastic Gradient Descent (SGD) or one of its variants, and uses it to update the weights of a model. To use normal SGD, you can create an object of class `torch.optim.SGD`.

However, we will use an optimizer called `Adam`, which is one of the most popular optimizers used today. This is created with `torch.optim.Adam`.

In order to create an optimizer, you will need two things: the parameters of a model, and a learning rate. If `model` is a class extending `torch.nn.Module`, you can call `model.parameters()` to obtain all the parameters of the model.

Given an optimizer called `optimizer`, there are two important functions.
1) `optimizer.zero_grad()`: This resets all previously calculated gradients. PyTorch accumulates gradients across calls to `backward()`, so you need to call this on every iteration, BEFORE calculating the new gradients.
2) `optimizer.step()`: Once the gradients have been calculated, calling this will use the gradients to update all the relevant weights.

Finally, we need to calculate the gradients. In order to do this, we MUST have a tensor containing a single value, which represents the current loss. If `loss` is the tensor containing this value, all the gradients can be calculated using `loss.backward()`.
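
The three calls described above fit together as in the following sketch. It uses plain `torch.optim.SGD` on a toy one-parameter problem (fitting `w` to the value 3) rather than the MNIST model, purely to keep the example small; the toy problem and its names are assumptions made for illustration.

```python
import torch

# Gradients accumulate across backward() calls unless they are reset.
v = torch.tensor([1.0], requires_grad=True)
(v ** 2).sum().backward()
(v ** 2).sum().backward()
print(v.grad)                             # tensor([4.]): 2 + 2, not 2

# A minimal optimization loop: zero_grad -> backward -> step.
w = torch.zeros(1, requires_grad=True)    # the single parameter to learn
target = torch.tensor([3.0])
optimizer = torch.optim.SGD([w], lr=0.1)  # created once, before the loop

for step in range(50):
    optimizer.zero_grad()                 # clear gradients from the previous step
    loss = ((w - target) ** 2).sum()      # scalar loss
    loss.backward()                       # compute d(loss)/dw
    optimizer.step()                      # w = w - lr * grad

print(w.item())                           # close to 3.0
```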

### Exercise 5
Create a variable called `LR` and set it equal to `0.001`. Then, create an Adam optimizer called `optimizer`, provide it the parameters of the `model` object created above, and set its learning rate to be `LR`.

In [14]:
# Write code here
import torch.optim as optim

LR = 0.001
optimizer = optim.Adam(model.parameters(), lr=LR)


## Running Training

Now, we will train our model on the MNIST dataset.

In [15]:
EPOCHS = 3

for epoch in range(EPOCHS):
    pbar = tqdm(train_loader)
    for idx, (data, targets) in enumerate(pbar):
        source = data.to(device)
        targets = targets.to(device)

        # todo: obtain scores by passing the input data through the model
        scores = model(source)

        # todo 2: obtain the loss by using nn.CrossEntropyLoss
        loss = nn.CrossEntropyLoss()(scores, targets)

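        # todo 3: reset the grads of the optimizer, do the backward pass, and perform the optimizer step (w = w - lr * grad(loss))
        # NOTE: this line builds a fresh plain-SGD optimizer (lr=0.01) on every iteration, replacing
        # any optimizer created earlier (Exercise 5 asks for Adam); the output recorded below comes
        # from this SGD run. Plain SGD keeps no state, so rebuilding it is harmless here, but a
        # stateful optimizer such as Adam should be created once, before the loop.
        optimizer = optim.SGD(model.parameters(), lr=0.01)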
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        if idx % 100 == 0:
            pbar.set_description(f"Loss: {loss.item():.2f}")



Loss: 0.50: 100%|██████████| 938/938 [00:06<00:00, 143.28it/s]
Loss: 0.42: 100%|██████████| 938/938 [00:05<00:00, 164.81it/s]
Loss: 0.35: 100%|██████████| 938/938 [00:05<00:00, 163.98it/s]
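
The training loop passes the raw model outputs straight to `nn.CrossEntropyLoss`. That works because this loss expects unnormalized scores (logits) of shape `(B, C)` and integer class indices of shape `(B,)`, and applies log-softmax internally. The sketch below checks this against a manual computation on made-up scores.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

scores = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 0.2, 3.0]])   # raw logits, shape (B=2, C=3)
targets = torch.tensor([0, 2])             # class indices, dtype int64

loss = nn.CrossEntropyLoss()(scores, targets)

# Manual version: mean negative log-probability of the correct class.
log_probs = F.log_softmax(scores, dim=1)
manual = -log_probs[torch.arange(2), targets].mean()

print(loss.item(), manual.item())          # the two values agree
```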


## Evaluating Performance

Finally, once we have a trained model, we need to evaluate its performance. To do so, we will use the test set.

In [16]:
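def check_accuracy(loader, model):
    # Returns the fraction of samples in `loader` that `model` classifies correctly.
    num_correct = 0
    num_samples = 0
    model.eval()  # put layers such as dropout/batch norm into evaluation mode

    with torch.no_grad():  # gradients are not needed for evaluation
        for x, y in loader:
            x = x.to(device=device)
            y = y.to(device=device)

            scores = model(x)                # shape (B, NUM_CLASSES)
            _, predictions = scores.max(1)   # index of the highest score per sample
            num_correct += (predictions == y).sum().item()
            num_samples += predictions.size(0)

    return num_correct / num_samples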

print(f"Accuracy on training set: {check_accuracy(train_loader, model)*100:.2f}")
print(f"Accuracy on test set: {check_accuracy(test_loader, model)*100:.2f}")

Accuracy on training set: 89.66
Accuracy on test set: 90.30
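
The prediction step inside `check_accuracy` relies on `scores.max(1)`, which returns both the maximum values and their indices along dimension 1; the indices are the predicted classes. A small sketch with made-up scores:

```python
import torch

scores = torch.tensor([[0.1, 2.0, -1.0],
                       [3.0, 0.2, 0.5]])   # 2 samples, 3 classes
y = torch.tensor([1, 2])                   # true classes

values, predictions = scores.max(1)
print(predictions)                         # tensor([1, 0])
print(torch.equal(predictions, scores.argmax(dim=1)))  # True: argmax gives the same indices

accuracy = (predictions == y).float().mean().item()
print(accuracy)                            # 0.5: one of two predictions is correct
```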
