# PyTorch introduction

Credit goes to the official Introduction to PyTorch tutorial, from which some parts of this notebook were taken: https://pytorch.org/tutorials/beginner/introyt/introyt1_tutorial.html

If you run this notebook in Google Colab, set colab to True; otherwise set it to False.

In [None]:
colab = True

In [None]:
# setting data directory
import os
if colab:
    # define download url
    base = 'https://data.goettingen-research-online.de/api/access/datafile/:persistentId?persistentId=doi:10.25625/9AIY3V'
    folder = '3ZFUWQ'
    download_url = os.path.join(base, folder)

    # define save paths
    save_name_zip = '1_introduction.zip'
    raw_data_folder = 'data/raw_data'
    save_data_folder = 'data/output_data'

    # make data directories
    !mkdir -p $raw_data_folder
    !mkdir -p $save_data_folder

    # download and unzip data
    !wget -O $save_name_zip $download_url
    !unzip $save_name_zip -d $raw_data_folder
    !rm -rf $save_name_zip
    
    home_dir = '/content'
    raw_data_dir = os.path.join(home_dir, 'data/raw_data')
    output_data_dir = os.path.join(home_dir, 'data/output_data')
else:
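    # "__file__" is passed as a string: notebooks do not define __file__, so this resolves to the current working directory
    exercise_dir = os.path.dirname(os.path.abspath("__file__"))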
    danuma_dir = os.path.dirname(os.path.dirname(exercise_dir))
    raw_data_dir = os.path.join(danuma_dir, 'data/raw_data')

### Pytorch basics

Whenever you are looking for a PyTorch function that does something specific, or want to know what a certain function or class does, \
you can search for it and consult the official PyTorch documentation. \
For example, this is the documentation for the torch.Tensor object: https://pytorch.org/docs/stable/tensors.html \
Let's start this intro by importing torch!

In [1]:
import torch

Let’s see a few basic tensor manipulations. First, just a few of the ways to create tensors:

In [None]:
z = torch.zeros(5, 3)
print(z)
print(z.dtype)

Above, we create a 5x3 matrix filled with zeros, and query its datatype to find out that the zeros are 32-bit floating point numbers, which is the default in PyTorch. \
What if you wanted integers instead? You can always override the default:

In [None]:
i = torch.ones((5, 3), dtype=torch.int16)
print(i)
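
The data type of an existing tensor can also be changed after creation. The following is a minimal sketch using the `to` method; the variable names are made up for illustration:

```python
import torch

# start from an integer tensor like the one above
int_tensor = torch.ones((5, 3), dtype=torch.int16)

# convert to 32-bit floats; .to returns a new tensor and leaves the original unchanged
float_tensor = int_tensor.to(torch.float32)

print(int_tensor.dtype)    # torch.int16
print(float_tensor.dtype)  # torch.float32
```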

You can see that when we do change the default, the tensor helpfully reports this when printed. \
To create a tensor with numbers randomly drawn from a uniform distribution between 0 and 1, you can use the torch.rand function. \
For reproducibility, it is often useful to set a seed so that the same random numbers are drawn each time the code is run:

In [None]:
torch.manual_seed(1729)
r1 = torch.rand(2, 2) # returns a tensor filled with random numbers from a uniform distribution on the interval [0, 1)
print('A random tensor:')
print(r1)

r2 = torch.rand(2, 2)
print('\nA different random tensor:')
print(r2) # new values

torch.manual_seed(1729)
r3 = torch.rand(2, 2)
print('\nShould match r1:')
print(r3) # repeats values of r1 because of re-seed
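
torch.manual_seed sets the seed of the global random number generator. If only one part of the code should be reproducible, a separate generator object can be used instead. This is a small sketch of that idea; the names gen, g1 and g2 are chosen for illustration only:

```python
import torch

# a local generator keeps its own random state, separate from the global one
gen = torch.Generator().manual_seed(1729)
g1 = torch.rand(2, 2, generator=gen)

# re-seeding the same generator reproduces the same numbers
gen.manual_seed(1729)
g2 = torch.rand(2, 2, generator=gen)

print(torch.equal(g1, g2))  # True
```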

PyTorch tensors perform arithmetic operations intuitively. Tensors of similar shapes may be added, multiplied, etc. Operations with scalars are distributed over the tensor:

In [None]:
ones = torch.ones(2, 3)
print(ones)

twos = torch.ones(2, 3) * 2 # every element is multiplied by 2
print(twos)

threes = ones + twos       # addition allowed because shapes are similar
print(threes)              # tensors are added element-wise
print(threes.shape)        # this has the same dimensions as input tensors

r1 = torch.rand(2, 3)
r2 = torch.rand(3, 2)
# uncomment this line to get a runtime error because shapes do not fit together
# r3 = r1 + r2

Here’s a small sample of the mathematical operations available:

In [None]:
r = (torch.rand(2, 2) - 0.5) * 2 # values between -1 and 1
print('A random matrix, r:')
print(r)

# Common mathematical operations are supported:
print('\nAbsolute value of r:')
print(torch.abs(r))

# ...as are trigonometric functions:
print('\nInverse sine of r:')
print(torch.asin(r))

# ...and statistical and aggregate operations:
print('\nStandard deviation and average of r:')
print(torch.std_mean(r))
print('\nMaximum value of r:')
print(torch.max(r))

There are also many other useful commands. To give a few examples:

In [None]:
# Instead of uniformly distributed random numbers, there are also other random number generators available, for example:
ints = torch.randint(0, 10, (10,))
print('Random integers:')
print(ints)
gaussians = torch.randn(10)
print('Random numbers from standard normal distribution:')
print(gaussians)

# Instead of getting the maximum or minimum value, it is often useful to get the index of the maximum or minimum value:
r = torch.rand(10)
print('\nA random vector, r:')
print(r)
print('Index of the maximum value in r:')
print(torch.argmax(r))
print('Index of the minimum value in r:')
print(torch.argmin(r))

# creates regularly spaced numbers between two values
linspace = torch.linspace(1, 10, 10)
print('\nRegularly spaced numbers between 1 and 10:')
print(linspace)

# numpy arrays can be converted to tensors and vice versa
import numpy as np
a = np.array([1, 2, 3])
t = torch.from_numpy(a) # shares memory with numpy array, as opposed to using 'torch.tensor(a)'
print('\nNumpy array:')
print(a)
print('Tensor from numpy array:')
print(t)
print('Numpy array from tensor:')
print(t.numpy())
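
The remark about shared memory can be checked directly. In this short sketch (variable names are illustrative), the NumPy array is modified in place after both tensors have been created:

```python
import numpy as np
import torch

a = np.array([1, 2, 3])
shared = torch.from_numpy(a)  # shares memory with a
copied = torch.tensor(a)      # copies the data

a[0] = 100  # modify the numpy array in place

print(shared)  # the first element is now 100
print(copied)  # still contains 1, 2, 3
```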

Tensors in PyTorch are objects. As you know, objects have attributes and methods. These are often useful to obtain characteristics of a tensor. \
In fact, the functionality of many functions is also directly available as a tensor method:

In [None]:
r = torch.rand(10)
print('\nA random vector, r:')
print(r)
print('\nIndex of the maximum value in r:')
print(torch.argmax(r))
print('\nYou can also obtain the index of the maximum value via a method instead of an external function:')
print(r.argmax())
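
Besides methods, a few attributes are handy for inspecting a tensor. The sketch below lists some common ones on a small example tensor; the expected values in the comments assume PyTorch's default settings (float32 on the cpu):

```python
import torch

t = torch.zeros(2, 3)

print(t.shape)    # size along each dimension: torch.Size([2, 3])
print(t.ndim)     # number of dimensions: 2
print(t.dtype)    # data type: torch.float32
print(t.device)   # device the tensor lives on: cpu
print(t.numel())  # total number of elements: 6
```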

### Pytorch dimensions

Just like NumPy, PyTorch can easily represent multi-dimensional objects such as images:

In [None]:
from PIL import Image # PIL is a Python Imaging Library
import torchvision.transforms as transforms # torchvision offers useful tools for image processing
from IPython.display import display

# Open the image file
image_path = os.path.join(raw_data_dir, '1_introduction/test1.jpg')
image = Image.open(image_path)
display(image)

# Convert to tensor
transform = transforms.ToTensor()
image = transform(image)
print(f'Tensor shape: {image.shape}')
print(f'Tensor type: {image.dtype}')

As you can see, PyTorch represents images as three-dimensional tensors. \
The first dimension holds the color channels, i.e. the red, green and blue values of each pixel. \
The second and third dimensions represent the height and width of the image. \
Indexing a tensor works just as in NumPy, for example:

In [None]:
print('Get pixel values at the center of the image:')
print(image[:, 180, 320])

print('\nGet only the red-channel of the image')
print(image[0, :, :].shape)
print(image[0, :, :])
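
Hard-coded indices such as 180 and 320 only point at the center if the image has the matching size. As a sketch, the center can instead be derived from the tensor shape. A random pseudo-image stands in for the real one here, and its size of 360x640 is an assumption:

```python
import torch

# pseudo-image in (channels, height, width) layout; the size is an assumption
pseudo_image = torch.rand(3, 360, 640)

# unpack the shape and use integer division to find the center pixel
_, height, width = pseudo_image.shape
center_pixel = pseudo_image[:, height // 2, width // 2]

print(center_pixel.shape)  # one value per color channel: torch.Size([3])
```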

transforms.ToTensor() automatically scales the pixel values to the interval [0, 1]. \
The original pixel intensities are integers between 0 and 255. \
To get the original unnormalized values, you can simply multiply by 255:

In [None]:
print('Unnormalized pixel values:')
print(image[:, 180, 320] * 255)

We can also inspect how an individual color channel contributes to the image by setting the values of the other channels to zero:

In [None]:
image_copy = image.clone()
image_copy[[0, 2], ...] = 0

transform = transforms.ToPILImage()
image_copy = transform(image_copy)
display(image_copy)

You can also add dimensions to an existing tensor:

In [None]:
example_tensor = torch.randn(3, 224, 224)

example_tensor_with_added_dimension = example_tensor.unsqueeze(0)
print('\nAdded dimension at first position:')
print(example_tensor_with_added_dimension.shape)

example_tensor_with_added_dimension = example_tensor.unsqueeze(1)
print('\nAdded dimension at second position:')
print(example_tensor_with_added_dimension.shape)

example_tensor_with_added_dimension = example_tensor[None, None, :, :, :, None]
print('\nIf you want to add dimensions at the beginning or at the end, you can also just index the whole tensor and add None:')
print(example_tensor_with_added_dimension.shape)

Redundant dimensions that only have a size of one can also be removed again:

In [None]:
print('Removing redundant dimensions that only have a size of one can be done with squeeze:')
print(example_tensor_with_added_dimension.squeeze().shape)
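
Called without arguments, squeeze removes every dimension of size one. To remove only a specific one, a dimension index can be passed. A short sketch on a tensor with the same shape as above:

```python
import torch

t = torch.randn(1, 1, 3, 224, 224, 1)

print(t.squeeze().shape)    # all size-one dimensions removed: torch.Size([3, 224, 224])
print(t.squeeze(0).shape)   # only the first dimension removed: torch.Size([1, 3, 224, 224, 1])
print(t.squeeze(-1).shape)  # only the last dimension removed: torch.Size([1, 1, 3, 224, 224])
print(t.squeeze(2).shape)   # dimension 2 has size 3, so the shape stays the same
```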

Multiple tensors are often stacked together in a batch so that they can be processed together. \
This can be done by adding a new dimension to each tensor and then concatenating the tensors along this dimension:

In [None]:
# create 4 pseudo-images using list comprehension
images = [torch.randn(3, 224, 224) for _ in range(4)]
# add dimension at first position for each of these images, again using list comprehension
images = [image.unsqueeze(0) for image in images]
# stack these images along the first dimension to obtain a batch of images
images = torch.cat(images, dim=0)

print('The first dimension of the images tensor represents the batch:')
print(images.shape)
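
As an alternative sketch, torch.stack combines the two steps (adding a dimension and concatenating) into one call. The comparison below checks that both approaches give the same batch:

```python
import torch

images = [torch.randn(3, 224, 224) for _ in range(4)]

# torch.stack adds a new dimension and joins the tensors along it in one step
batch_stacked = torch.stack(images, dim=0)

# the two-step version with unsqueeze and torch.cat, for comparison
batch_cat = torch.cat([image.unsqueeze(0) for image in images], dim=0)

print(batch_stacked.shape)                    # torch.Size([4, 3, 224, 224])
print(torch.equal(batch_stacked, batch_cat))  # True
```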

When performing operations on high-dimensional tensors, it is often helpful to make use of so-called "broadcasting". \
In broadcasting, a tensor involved in the operation of interest is implicitly enlarged by repeating its values along dimensions where it has size one (or that it is missing). \
Then the operation of interest is performed on this enlarged version. Let's have a look at an example:

In [None]:
# we create a pseudo-image first
image = torch.randn(3, 224, 224)
print(f'The shape of the image tensor: {image.shape}')
# we want to multiply the red channel by 0, the green channel by 0, and the blue channel by 1
multiplier = torch.tensor([0, 0, 1])
# We add dimensions to the multiplier tensor to make it compatible with the image tensor
multiplier = multiplier[:, None, None]
print(f'The shape of the multiplier tensor: {multiplier.shape}')
# If we now compute the product of the image tensor and the multiplier tensor, 
# each red, green and blue value of the image will be multiplied by the corresponding value in the multiplier tensor
product = image * multiplier
# All red and green values will be zeros
print('\nThe red and green values are zero:')
print(product[[0, 1], :, :])
# All blue values will be the same as in the original image
print('\nThe blue values are the same as in the original image:')
print(product[2, :, :])
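
Whether two shapes can be broadcast together can be checked without creating any tensors. The following sketch uses torch.broadcast_shapes, which is available in recent PyTorch versions; the shapes are the ones used earlier in this notebook:

```python
import torch

# shapes are compared from the last dimension backwards;
# two sizes are compatible if they are equal or if one of them is 1
result_shape = torch.broadcast_shapes((3, 1, 1), (3, 224, 224))
print(result_shape)  # torch.Size([3, 224, 224])

# (2, 3) and (3, 2) are not compatible, which is why adding such tensors fails
compatible = True
try:
    torch.broadcast_shapes((2, 3), (3, 2))
except RuntimeError as error:
    compatible = False
    print(f'Not broadcastable: {error}')
```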

### Pytorch with gpu

Check if a gpu is available:

In [None]:
torch.cuda.is_available()

If cuda is available, you can put tensors on the gpu:

In [None]:
image = torch.randn(3, 224, 224)
image = image.cuda() 
# image = image.to('cuda') # does the same job
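
Calling .cuda() fails on a machine without a gpu. A common pattern, sketched here, is to pick the device once and then pass it to .to(), so that the same code runs with or without a gpu:

```python
import torch

# use the gpu if one is available, otherwise fall back to the cpu
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

image = torch.randn(3, 224, 224).to(device)
print(image.device)
```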

When performing computations, make sure that all tensors involved are on the same device (i.e. cuda or cpu):

In [7]:
test_tensor1 = torch.randn(64, 10).cuda()
test_tensor2 = torch.randn(64, 10).cuda()
elementwise_product = test_tensor1 * test_tensor2

Whereas a cpu processes arithmetic operations largely one after another, a gpu can perform many operations in parallel. \
In the previous example, the 640 multiplications required for the elementwise product can therefore be carried out at essentially the same time. \
This massively speeds up training and inference for neural networks.

If two tensors are not on the same device and you try to perform operations, this will result in an error:

In [None]:
test_tensor1 = torch.randn(64, 10).cuda()
test_tensor2 = torch.randn(64, 10)
elementwise_product = test_tensor1 * test_tensor2
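
One way to avoid this error is to move one tensor to the device of the other before computing. The sketch below does this via the device attribute; the names tensor_a and tensor_b are illustrative, and the code falls back to the cpu if no gpu is present:

```python
import torch

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

tensor_a = torch.randn(64, 10).to(device)
tensor_b = torch.randn(64, 10)  # created on the cpu

# move the second tensor to wherever the first one lives
tensor_b = tensor_b.to(tensor_a.device)
elementwise_product = tensor_a * tensor_b

# results can be moved back to the cpu, e.g. for conversion to numpy
result_on_cpu = elementwise_product.cpu()
print(elementwise_product.device, result_on_cpu.device)
```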

### Pytorch modules (model components)

Let's take a look at how tensors are being processed by pytorch modules. \
We start by importing the necessary libraries:

In [14]:
import torch.nn as nn # torch.nn contains the building blocks for neural networks (layers, activation functions, ...)

The simplest model component is a linear layer that simply computes a weighted sum of the input. \
For demonstration purposes, the layer is initialized without a bias and with a constant value for all parameters.

In [None]:
linear_layer = nn.Linear(in_features=10, out_features=1, bias=False)
torch.nn.init.constant_(linear_layer.weight, 1.0)
print(linear_layer.weight)
print(f'bias: {linear_layer.bias}')

We basically initialized a linear model where all parameters w1, ..., w10 have a value of 1 and there is no intercept (called bias in the context of machine learning). \
Since the parameter values are all 1, this linear layer just computes the sum of the input. \
We could also initialize multiple linear models when setting out_features > 1. Then multiple weighted sums of the input would be computed. \
We can apply the module to an input tensor by calling it like a function:

In [None]:
input = torch.ones(10)
result = linear_layer(input)
print(result)

If we apply the model to a tensor that also consists only of ones, we of course get 10 as a result. \
The attributes requires_grad of the weights and grad_fn of the result indicate that these tensors will be taken into account when a gradient is computed. \
We will talk about this tomorrow, so feel free to ignore this for now.\
\
We can also apply the linear layer to a batch of inputs. This will independently apply the module to each individual input in the batch:

In [None]:
# let's fill the first input of the batch with ones, the second with twos, the third with threes, and so on...
inputs = torch.ones(64, 10)
multiplier = torch.arange(1, 65, 1)
multiplier = multiplier[:, None]
inputs = inputs * multiplier
# we can now pass the inputs to the linear layer
output = linear_layer(inputs)
print(f'Shape of the output: {output.shape}')
print(f'As expected, the first output is 10, the second is 20, the third is 30, and so on: \n{output}')
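
The earlier remark about out_features > 1 can be illustrated in the same way. In this sketch, a layer with three outputs is again initialized with constant weights of one, so each output is simply the sum of the input:

```python
import torch
import torch.nn as nn

# three weighted sums of the same ten inputs
layer = nn.Linear(in_features=10, out_features=3, bias=False)
print(layer.weight.shape)  # one row of ten weights per output: torch.Size([3, 10])

# set all weights to one, as in the single-output example
nn.init.constant_(layer.weight, 1.0)

x = torch.ones(10)
result = layer(x)
print(result)  # each of the three outputs equals the sum of the input, i.e. 10
```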

### Exercises

1. Given a tensor of size 100 with random numbers, obtain a new tensor that contains the 10 largest elements

In [None]:
# tensor of random numbers
torch.manual_seed(1729)
r = torch.rand(100)
print(r)


######### YOUR CODE HERE:

2. Given a tensor of size 64x100, obtain for each row the index of the maximum value

In [None]:
# tensor of random numbers
torch.manual_seed(1729)
r = torch.rand(64, 100)
print(r)


######### YOUR CODE HERE:

3. Replicate the result from the broadcasting example in the Pytorch dimensions section without broadcasting. \
Hint: You can use the torch.tile function to obtain a tensor with the same size as the image and perform elementwise multiplication.

In [None]:
# we create a pseudo-image first
image = torch.randn(3, 224, 224)
print(f'The shape of the image tensor: {image.shape}')
# we want to multiply the red channel by 0, the green channel by 0, and the blue channel by 1
multiplier = torch.tensor([0, 0, 1])
# We add dimensions to the multiplier tensor to make it compatible with the image tensor
multiplier = multiplier[:, None, None]


######### YOUR CODE HERE:

4. Given a tensor of size 10, obtain a new tensor of size 100 where each element in the original tensor is repeated 10 times. \
This is different from repeating the full tensor 10 times which could be done with torch.tile.

In [None]:
r = torch.arange(10)


######### YOUR CODE HERE:

5. What does the torch.flatten function do? Apply the function to a batch of pseudo images. \
Only flatten the images and not the batch dimension!

In [20]:
images = torch.rand(64, 3, 224, 224)


######### YOUR CODE HERE:

6. Define a linear layer with 5 in_features and 2 out_features. Apply the layer to the inputs and inspect the output shape and values. \
Obtain the same output shape by manually defining a (random) weights matrix and multiplying the inputs with it using matrix multiplication. \
In case you are unfamiliar with the way matrix-vector or matrix-matrix multiplication works, have a look at this visualization: http://matrixmultiplication.xyz/

In [None]:
inputs = torch.rand(8, 5)
print(inputs)


######### YOUR CODE HERE:

7. What does the nn.ReLU module do? Apply the module to a random tensor and inspect the output.

In [None]:
r = torch.randn(100)


######### YOUR CODE HERE:

8. You already applied a linear layer with a specific number of input and output features to a batch of inputs in exercise 6. \
Now your task is to apply multiple linear layers and ReLU functions alternately to a batch of inputs. \
The three linear layers should have the following input/output feature sizes: 5/10, 10/20 and 20/1. \
Each linear layer except for the last one should be followed by a ReLU function.

In [None]:
inputs = torch.rand(8, 5)
print(inputs.shape)


######### YOUR CODE HERE:

9. Bonus: We have not yet talked about why it makes sense to stack multiple linear layers and ReLU functions one after another. \
So do not worry about this in too much detail now. However, to provide some food for thought: \
Do you have an idea why it does NOT make sense to simply stack multiple linear layers one after another? \
How is this counteracted by adding ReLU functions? \
Justify your answer in words (or sketch the idea of a small mathematical proof).

Hint: Consider the representational capacity of multiple linear layers stacked behind each other.

In [23]:
######### YOUR ANSWER HERE (code is not necessarily required): 

### Further learning resources

If you do not yet feel quite comfortable with tensors (i.e. matrices and vectors) and their operations, I recommend having a look at the 3Blue1Brown linear algebra series (and the channel in general). It offers amazing teaching videos. However, they are probably too extensive to watch and fully understand during this summer school: https://www.youtube.com/watch?v=fNk_zzaMoSs&list=PLZHQObOWTQDPD3MizzM2xVFitgF8hE_ab