[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/lorenzobasile/DeepLearning2022/blob/main/1_introduction.ipynb)



# Introduction to Labs


Teaching assistant: Lorenzo Basile (lore.basile@outlook.com), PhD student@UniTS

Please use this email address for any questions or clarifications about the labs and assignments!

Labs will be largely based on the notebooks by Marco Zullich (https://github.com/ansuini/DSSC_DL_2022/)


## Tentative structure of the course


*   Lectures
*   Laboratories (probably 7 sessions)
*   Lectures by experts on selected topics (sometimes on Fridays):
    
    *   Graph Neural Networks 
    *   Visual Cortex and DL
    *   Language Models and applications in Biology
    *   Diffusion Models

Three labs will end with the presentation of a homework assignment, due approximately two weeks later. Timely delivery of the homework is not compulsory to pass the exam, but it will result in a lighter final exam.

## Computational resources

We will not run particularly heavy experiments during the labs, so for the most part you should be able to reproduce the experiments on the CPU of your personal laptop or desktop. However, to avoid issues with library versions and to avoid installing any package (and to take advantage of some hardware acceleration from time to time), we will be running the labs on Google Colab, a service that provides free access to GPUs.

For your final projects, since you're likely to be doing something a bit heavier than what we see during the labs (and to gain some more experience with HPC facilities), it is a good idea to switch to the Orfeo cluster of AREA Science Park.


# Introduction to Colab

Colab is a free service provided by Google for ML research. It is based on Jupyter notebooks that run on a remote server, and it provides free (but time-limited) GPU acceleration.

To enable GPU or TPU acceleration just go to `Runtime>Change runtime type` and choose from the menu. Please note that GPU usage is limited in time, so avoid requesting one if you do not really need it.

Inside a code cell you can use `!` to run shell commands:

In [None]:
!nvidia-smi    # if you enable GPU acceleration, this command returns information on the GPU
!pip install torch==1.11.0    # just an example, torch is already installed in Colab
!sudo apt-get install gcc    # you can also run sudo commands
!wget https://roboti.us/download/mujoco200_linux.zip    # and download data to a temporary memory
!git clone https://github.com/lorenzobasile/DeepLearning2022.git    # just another example (HTTPS, since no SSH key is configured on Colab)

## Colab file system

By default, Colab accesses a volatile memory that is erased as soon as your process terminates, but you can interface it with your personal Google Drive to read and write data:

In [None]:
from google.colab import drive
drive.mount('/content/drive')

In [None]:
# note that if you want this command to be permanent you need to use the magic % instead of !
%cd drive/MyDrive/DeepLearning2022

In [None]:
!ls

# Brief guide to Orfeo cluster

For your projects, you may want to run slightly heavier experiments, which require a GPU for more time than Colab gives you for free. To do so, you will be provided access to the Orfeo cluster of AREA Science Park (more details on that later).

Orfeo is very straightforward to use, as it does not differ from a standard linux shell for most tasks. There are just a couple of commands to remember:

In [None]:
! ssh username@ct1-005.area.trieste.it    # to login
! qsub -q dssc_gpu -l nodes=N:ppn=M,walltime=HH:MM:SS -I    # to request access to a GPU computational node

Please note that the second command is asking for `M` cores per node on `N` computational nodes, for the time specified by the walltime argument. **Always remember to ask for a computational node before running any code**.

The flag `-I` means that we want to run interactively (i.e. we want to interact with the computational node through a linux shell). This is not the only possibility, and usually it is advisable to send a job by means of a bash script. To do so, just replace `-I` with the path to the script (for example `script.sh`).

Most libraries that you will need are preinstalled on Orfeo through conda, which can be loaded using:

In [None]:
! module load conda

If you send a bash script, at the end you will see two files in the same folder as your script, named `script.sh.o<JOB NUMBER>` and `script.sh.e<JOB NUMBER>`, containing respectively the `stdout` and `stderr` of your job.

You can track (and terminate if needed) your running jobs by using the commands:

In [None]:
! qstat -u username    # this command will return a list of the current processes you are running
! qdel <JOB NUMBER>    # to kill the selected process, the JOB NUMBER can be obtained reading the qstat information

# Getting started with PyTorch

In [None]:
import torch

## What is PyTorch?

PyTorch (or informally torch) is a Python library specifically built for Deep Learning. It comes with a series of very useful functionalities that make it one of the most popular tools for DL research and application.

Namely, it has many built-in features and modules useful for DL, tensor arithmetic and automatic differentiation features, and it allows for easy GPU acceleration through CUDA.

Another famous library for DL you may have heard of is TensorFlow, which also has a more user-friendly interface called Keras.
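
As a small illustration of the GPU support mentioned above, the sketch below selects a device depending on what the runtime offers. It relies on `torch.cuda.is_available()`; the variable names `device` and `x` are only illustrative choices.

```python
import torch

# Pick the GPU if the runtime exposes one, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("Running on:", device)

# Tensors are created on the CPU by default; .to(device) moves them.
x = torch.ones(2, 3).to(device)
print(x.device)
```

Writing code against a `device` variable like this lets the same notebook run unchanged with or without GPU acceleration.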

## Basic operations with Tensors

The main building block of PyTorch is the `Tensor` class. A torch `Tensor` is the equivalent of a NumPy `ndarray`, and most of its functionalities are the same as in NumPy.

In [None]:
import numpy as np

x=torch.tensor([[1,2,3],[4,5,6]])
y=np.array([[1,2,3],[4,5,6]])

print("X:", x)
print("Y:", y)



Basic NumPy array features exist for torch tensors:

In [None]:
x.shape, y.shape, x.size()

In [None]:
x.dtype, y.dtype

Note that you can build a tensor through the constructor `torch.Tensor`. In this case, since `torch.Tensor` is an alias for `torch.FloatTensor`, the tensor you create will have type `torch.float32`.

You can convert the dtype of a tensor by using the functions `float()`, `int()` etc.

More info on data types [here](https://pytorch.org/docs/stable/tensors.html).

In [None]:
x=torch.Tensor([[1,2,3], [4,5,6]])
print("Dtype of X:", x.dtype)
x=x.int()
print("Dtype of X:", x.dtype)
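
To make the difference between the constructor and the factory function concrete, here is a minimal sketch. It assumes the default floating-point dtype has not been changed; the names `a`, `b` and `c` are illustrative.

```python
import torch

a = torch.tensor([1, 2, 3])    # factory function: dtype is inferred from the data
b = torch.Tensor([1, 2, 3])    # constructor: uses the default floating-point dtype
c = a.to(torch.float64)        # explicit conversion to an arbitrary dtype

print(a.dtype, b.dtype, c.dtype)
```

The `.to(dtype)` method is a more general alternative to `float()`, `int()` and similar shortcuts.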

And you can create random tensors just like you create random arrays:

In [None]:
x=torch.rand(2,3,2)    # you can also use a list or a tuple for the dimensions
y=np.random.rand(2,3,2)
print("X:", x)
print("Y:", y)

You can easily compute statistics of tensors (such as the sum, mean, max, min, std... of the elements) by either using the methods of the `Tensor` class or using the basic torch functions and using your tensor as input:

In [None]:
x.sum(), torch.sum(x)

In [None]:
x.mean(), torch.mean(x)

In [None]:
x.argmin(), torch.argmin(x)

It is sometimes very useful to specify one or more dimensions to reduce (along which you want to perform your operations):

In [None]:
print(x)
x.mean(dim=0)

In [None]:
x.argmax(dim=1)

In [None]:
x.sum(dim=(0,1))
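
A related option that is often handy is `keepdim=True`, which keeps the reduced dimension with size 1 so that the result can be broadcast against the original tensor. The following is a small sketch; `x_norm` is an illustrative name.

```python
import torch

x = torch.rand(2, 3, 2)

# Reducing over dim=1 drops that dimension by default...
print(x.sum(dim=1).shape)
# ...while keepdim=True keeps it as a dimension of size 1.
print(x.sum(dim=1, keepdim=True).shape)

# Example: normalise along dim=1 so that the entries along it sum to 1.
x_norm = x / x.sum(dim=1, keepdim=True)
print(x_norm.sum(dim=1))
```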

Tensor slicing works exactly like in NumPy, by means of square brackets:

In [None]:
x[0,1,1]

In [None]:
x[0,1:,1]

In [None]:
x[:,::2,:]

## Linear algebra and tensor reshaping



An operation we will frequently perform in Deep Learning (though often under the hood) is matrix multiplication. In torch, it can be done in many equivalent ways:

In [None]:
x=torch.rand(4,5)
y=x.T    # matrix transposition

print(x@y)
print(x.matmul(y))
print(torch.matmul(x,y))

Please note that the operator for matrix multiplication is `@`, not `*`, which indicates the Hadamard (element-wise) product instead.

In [None]:
x*x

Multiplying a matrix element-wise by itself is equivalent to raising each of its elements to the second power (this is not the matrix power $XX$), and it can also be done by running one of the following commands:

In [None]:
torch.pow(x,2), x**2

As in NumPy, there exists a `dot` function to compute the scalar product between vectors. Note that, unlike in NumPy, in torch this is **not** equivalent to matrix multiplication, as it is intended to work only with 1D vectors.

In [None]:
v1=x[:,1]
v2=x[:,2]
print(v1.shape, v2.shape)

print(v1.dot(v2))    # in the case of 1D vectors, there is no difference between row and column vectors
print(v1.matmul(v2))
print(v1@v2)

If you want to do something fancier with two vectors, like multiplying a column by a row to obtain a matrix, you need to switch to 2D vectors by reshaping them.

When you reshape a tensor, you can leave one dimension unspecified (using -1), as it can be inferred automatically by torch.

In [None]:
v1=v1.reshape(-1,1)    # column vector
v2=v2.reshape(1,-1)    # row vector

print(v1.shape, v2.shape)
print(v1@v2)
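
As a quick check of both ideas (the inferred `-1` dimension and the column-by-row product), here is a self-contained sketch. It compares the result with `torch.outer`, which computes the same outer product directly; all variable names are illustrative.

```python
import torch

t = torch.arange(12)
print(t.reshape(3, -1).shape)    # the missing dimension is inferred as 12 / 3 = 4

v1 = torch.rand(4)
v2 = torch.rand(4)

# (4, 1) @ (1, 4) -> (4, 4) matrix
outer_manual = v1.reshape(-1, 1) @ v2.reshape(1, -1)
outer_builtin = torch.outer(v1, v2)

print(torch.allclose(outer_manual, outer_builtin))
```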

In [None]:
print(v1.dot(v2))    # this doesn't work! dot works only on 1D tensors

Changing the shape of a tensor is a crucial operation in DL. To have an idea of its application, just think of RGB images, commonly used in Computer Vision.

These are $3\times H\times W$ tensors, where $H$ and $W$ stand for the height and width of the image (in number of pixels). It is often necessary to regard an image as a linearized (flattened) array of pixels:

In [None]:
img=torch.rand(3,8,8)
img.reshape(3,64)    # note that reshaping is not in place, so this call does not change the actual shape of img
print(img.shape)

Very often (for instance when you have to pass an image to `matplotlib` for visualization), you need to change the shape of an image to $H\times W \times 3$. You may be tempted to do something like this:

In [None]:
new_img=img.reshape(8,8,3)

This piece of code runs seamlessly, since the dimensions are consistent with the original ones. However, it will not produce the expected behaviour.

In fact, `reshape` only modifies the shape of a tensor, without touching the way data are stored in memory, meaning that you would end up mixing data from different dimensions.

The right way to change the order of dimensions is to use `permute`, which accepts as argument the ordering of dimensions that you desire:

In [None]:
new_img=img.permute(1,2,0)
print(new_img.shape)
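
The difference between the two operations is easy to see on a tiny example. The sketch below builds a hypothetical $3\times 2\times 2$ "image" in which channel `c` is filled with the value `c`, so that it is obvious where each value ends up.

```python
import torch

# A tiny 3x2x2 "image" whose channel c is filled with the value c.
img = torch.stack([torch.full((2, 2), float(c)) for c in range(3)])

reshaped = img.reshape(2, 2, 3)    # same memory order, new shape
permuted = img.permute(1, 2, 0)    # axes reordered to (H, W, C)

# After permute, every pixel holds its three channel values.
print(permuted[0, 0])    # tensor([0., 1., 2.])
# After reshape, the first "pixel" holds three values from channel 0 only.
print(reshaped[0, 0])    # tensor([0., 0., 0.])
```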

# Building Machine Learning models with PyTorch

## Linear regression

By using all the pieces we've seen so far, we can build our first ML model using PyTorch: a linear regressor, whose model is:

$$
y = XW + b
$$

which can also be simplified as:

$$
y = XW
$$

if we incorporate the bias $b$ into $W$ as its last row and append a column of ones to the right of $X$.
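
The equivalence of the two formulations can be checked numerically. This is a minimal sketch on small random tensors; the sizes and the names `X_aug` and `W_aug` are illustrative.

```python
import torch

torch.manual_seed(0)
X = torch.rand(5, 3)    # 5 datapoints, 3 features
W = torch.rand(3, 1)
b = torch.rand(1)

X_aug = torch.cat([X, torch.ones(5, 1)], dim=1)    # append a column of ones
W_aug = torch.cat([W, b.reshape(1, 1)], dim=0)     # append b as the last row

# Both expressions give the same predictions.
print(torch.allclose(X @ W + b, X_aug @ W_aug))
```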


We start by creating our data. We randomly sample $X$ as an $N\times P$ tensor with $N=1000$ datapoints and $P=100$ features, and compute $y$ as:
$$
y=XM+\mathcal{N}(0,I)
$$
where $M$ is a randomly drawn projection vector (shape $P\times 1$, same as our weights).
We add some i.i.d. Gaussian noise to $y$ to avoid the interpolation regime, in which a linear model could fit our data perfectly.

In [None]:
import os

N=1000
P=100
X=torch.rand(N,P)
M=torch.rand(P,1)
y=X@M+torch.normal(torch.zeros(N,1),torch.ones(N,1))

if not os.path.exists("./data"):
    os.makedirs("./data")

torch.save(X, "./data/X_reg.pt")
torch.save(y, "./data/y_reg.pt")

We can add a column of ones to $X$ to include the bias:

In [None]:
X=torch.cat([X, torch.ones(N,1)], dim=1)

The regression can be fit with classical statistical methods such as Ordinary Least Squares, and the optimal $W$ has the form:

$$
W^*=(X^TX)^{-1}X^Ty
$$


In [None]:
W_star = ((X.T @ X).inverse()) @ X.T @ y

To assess the quality of this fit we can evaluate the Mean Squared Error (MSE) between the original $y$ and the prediction:

In [None]:
torch.nn.functional.mse_loss(X@W_star, y)
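
Explicitly inverting $X^TX$ can be numerically delicate when the matrix is badly conditioned. As an alternative that is not used in the rest of this notebook, the sketch below solves the same problem with `torch.linalg.lstsq` and compares the resulting MSE with the closed-form solution. The data are regenerated here so that the example is self-contained, and the names `W_inv` and `W_lstsq` are illustrative.

```python
import torch

torch.manual_seed(0)
N, P = 1000, 100
X = torch.cat([torch.rand(N, P), torch.ones(N, 1)], dim=1)    # features + bias column
y = X[:, :P] @ torch.rand(P, 1) + torch.randn(N, 1)           # noisy linear targets

W_inv = (X.T @ X).inverse() @ X.T @ y          # normal equations, as above
W_lstsq = torch.linalg.lstsq(X, y).solution    # dedicated least-squares solver

mse_inv = torch.nn.functional.mse_loss(X @ W_inv, y)
mse_lstsq = torch.nn.functional.mse_loss(X @ W_lstsq, y)
print(mse_inv.item(), mse_lstsq.item())
```

On a well-behaved problem like this one, the two errors should agree up to floating-point precision.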

## The same linear model, but in PyTorch style

A linear model like the one we saw before is nothing more than an artificial neuron with no activation function.

We will now be exploring the second chunk of PT functionalities, namely the built-in structures and routines supporting the creation of ML models.

We can create the same model as before using torch built-in structures, so let us look at them right away.

Usually, a torch model is a `class` inheriting from `torch.nn.Module`. Inside this class, we'll define two methods:
* the constructor (`__init__`) in which we define the building blocks of our model as class variables;
* the `forward` method, which specifies how the data fed into the model needs to be processed in order to produce the output.

Note for those who already know something about NNs: we don't need to define `backward` methods since we're constructing our model with built-in PT building blocks. PyTorch automatically creates a `backward` routine based upon the `forward` method.
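
To give an idea of what this automatic differentiation looks like, here is a minimal sketch on a single tensor, outside of any model. The function being differentiated, $\sum_i w_i^2$, is chosen only because its gradient $2w$ is easy to verify by hand.

```python
import torch

# A tensor that tracks the operations applied to it.
w = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# "Forward" computation: loss = sum(w^2)
loss = (w ** 2).sum()

# The "backward" computation is derived automatically and fills w.grad.
loss.backward()

print(w.grad)    # d(loss)/dw = 2*w -> tensor([2., 4., 6.])
```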

Our model only has one building block (layer) which is a `Linear` layer.
We need to specify the size of the input (i.e. the number of coefficients in $W$ of our linear regressor) and the size of the output (i.e. how many scalars it produces) of the layer. We additionally request that our layer have a bias term $b$.

The `Linear` layer processes its input as $XW + b$, which is exactly the (first) equation of the linear regressor we saw before.



In [None]:
class LinearRegressor(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.regressor = torch.nn.Linear(in_features=P, out_features=1, bias=True)

    def forward(self, X):
        return self.regressor(X)

We can create an instance of our model and inspect its current parameters by using the `state_dict` method, which returns the building blocks of our model together with their current parameters. Note that `state_dict` is essentially a dictionary indexed by the names of the building blocks that we defined inside the constructor (plus some additional identifiers if a layer has more than one set of parameters).

In [None]:
lin_reg = LinearRegressor()

for param_name, param in lin_reg.state_dict().items():
    print(param_name, param)
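
One detail worth noticing when inspecting these parameters is their shape: a `Linear` layer stores its weight as `(out_features, in_features)`. The sketch below checks this on a small standalone layer (the sizes 4 and 1 are arbitrary) and reproduces the layer's output by hand.

```python
import torch

torch.manual_seed(0)
layer = torch.nn.Linear(in_features=4, out_features=1, bias=True)

# The weight has shape (out_features, in_features), the bias (out_features,).
for name, param in layer.state_dict().items():
    print(name, tuple(param.shape))

# Hence the layer computes X @ weight.T + bias.
X = torch.rand(5, 4)
manual = X @ layer.weight.T + layer.bias
print(torch.allclose(layer(X), manual))
```

This is why a $P\times 1$ weight vector needs to be transposed before being copied into such a layer.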

We can update the parameters via `state_dict`, re-using the OLS estimates we obtained before.

Note that torch is designed for Deep Learning: it does not provide routines for solving other kinds of ML problems (just use `sklearn` for those).

Next time, we'll see how we can unleash gradient-based iterative training routines in torch and compare the results w.r.t. the OLS estimators.

In [None]:
state_dict = lin_reg.state_dict()
state_dict["regressor.weight"] = W_star[:P].T
state_dict["regressor.bias"] = W_star[P]
lin_reg.load_state_dict(state_dict)

In [None]:
X_lin_reg = X[:,:P]
predictions_lin_reg = lin_reg(X_lin_reg) # equivalent to lin_reg.forward(X_lin_reg)
print(torch.nn.functional.mse_loss(predictions_lin_reg, y))

## Towards gradient-based training of our model

### Definition of a loss function

One key element that we need to train any neural network is a loss function, i.e. a function that quantifies how *good* our fit to the data is and that is differentiable w.r.t. the weights and biases of the network.

We saw some examples of common loss functions in the previous lecture, and all the main losses used in Deep Learning are already implemented and available in PyTorch. To name a few:

*   `torch.nn.MSELoss`
*   `torch.nn.CrossEntropyLoss`
*   `torch.nn.BCELoss`
*   `torch.nn.KLDivLoss`

You can also define your own custom loss function, and as long as you use built-in torch functions to compute it (and you keep it differentiable), you should be fine.

For example, you could build your own MSE loss like this:


In [None]:
def mseloss(output, target):
    loss = torch.mean((output - target)**2)
    return loss
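
As a sanity check, the sketch below compares this custom loss with the built-in `torch.nn.MSELoss` on random tensors, and verifies that gradients flow through it. The function is redefined here to keep the example self-contained; the tensor sizes are arbitrary.

```python
import torch

def mseloss(output, target):
    # Mean of the squared differences, built only from torch operations.
    return torch.mean((output - target) ** 2)

torch.manual_seed(0)
output = torch.rand(10, 1, requires_grad=True)
target = torch.rand(10, 1)

custom = mseloss(output, target)
builtin = torch.nn.MSELoss()(output, target)
print(torch.allclose(custom, builtin))

# The custom loss is differentiable: backward() fills output.grad.
custom.backward()
print(output.grad.shape)
```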

### Definition of a DataLoader object

To train any PyTorch model, it is useful to handle data through a `DataLoader` object. A `DataLoader` is an iterable wrapped around a `Dataset` object that allows you to easily run through your data in batches.

Starting from a set of `Tensor`s representing features and labels, it is easy to define the `Dataset` and its corresponding `DataLoader`:

In [None]:
dataset=torch.utils.data.TensorDataset(X,y)

In [None]:
dataloader=torch.utils.data.DataLoader(dataset, batch_size=1, shuffle=True)

In [None]:
X_0, y_0=next(iter(dataloader))

In [None]:
print(X_0, y_0)

For any specific need, you can build your own `Dataset` class. To make it work properly, you always have to implement three methods: `__init__`, `__len__` and `__getitem__`. More info on this [here](https://pytorch.org/docs/stable/data.html).
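
A minimal sketch of such a custom class is shown below. The class name `RegressionDataset` and the random data are hypothetical and only serve as an illustration; the example also uses a batch size larger than 1 to show how the `DataLoader` groups samples.

```python
import torch
from torch.utils.data import Dataset, DataLoader

class RegressionDataset(Dataset):    # hypothetical name, for illustration only
    def __init__(self, features, targets):
        assert len(features) == len(targets)
        self.features = features
        self.targets = targets

    def __len__(self):
        # Number of datapoints in the dataset.
        return len(self.features)

    def __getitem__(self, idx):
        # Return one (features, target) pair.
        return self.features[idx], self.targets[idx]

dataset = RegressionDataset(torch.rand(1000, 100), torch.rand(1000, 1))
loader = DataLoader(dataset, batch_size=32, shuffle=True)

X_batch, y_batch = next(iter(loader))
print(len(dataset), X_batch.shape, y_batch.shape)
```

For data that already live in tensors, `TensorDataset` does the same job; a custom class becomes useful when samples have to be loaded or transformed on the fly.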