# Exercise 2
## Authors: E. Vercesi; A. Dei Rossi, S. Huber*, L. Scarciglia.

In this exercise session you are going to learn the basics of PyTorch and pandas. We will start with PyTorch.

[PyTorch](https://pytorch.org/) is a Python library for scientific computing (as much as NumPy), but which can additionally run on GPUs. 
Hence, this is the computing library of choice for Deep Learning applications. 
PyTorch is developed by Meta. You might have also heard of its main competitor TensorFlow (Google). Although both have basically the same functionalities, in this course we would like you to stick to PyTorch (assignments made using TensorFlow won't be evaluated).

If you haven't done Exercise 1 on NumPy yet, we highly encourage to do it first: NumPy and PyTorch offer a vast overlap of functionalities, so understanding NumPy first is going to boost greatly your understanding of PyTorch.
To begin with, make sure you have installed it. If not, please do so (by typing `conda/pip install torch` from your environment, or using the GUI of your IDE). Also make sure that the version installed is 2.8.

In addition to that, also install [pandas](https://pandas.pydata.org/docs/getting_started/install.html). We will use it at the end of the notebook.

In [1]:
import torch  # If you see errors, use conda or pip to install torch in your virtual environment.
import numpy as np

torch.manual_seed(42)  # manual seed is to ensure repeatability of random numbers. 

<torch._C.Generator at 0x7da8f292f1b0>

## Create tensors

Tensors are like NumPy arrays, but they can live in GPUs.

1. Create a tensor out of a Python list [1, 2, 3].
2. Create a tensor out of a NumPy array [[2, 3, 4], [4, 3, 2]] (see method [`.from_numpy()`](https://pytorch.org/docs/stable/generated/torch.from_numpy.html)).
3. Convert the tensor of point 2 back to a NumPy array. (see method [`.numpy()`](https://pytorch.org/docs/stable/generated/torch.Tensor.numpy.html)).

In [2]:
## 1: create a tensor out of a Python list
list1 = [1,2,3]
tensor1 = torch.tensor(list1)
print (tensor1)

## 2: create a tensor out of a NumPy array
arr1 = np.array ([[2,3,4],[4,3,2]])
tensor2 = torch.from_numpy(arr1)
print (tensor2)

## 3: Convert the tensor back to NumPy.
t= torch.tensor ([[2,3,4], [4,3,2]])
arr2 = t.numpy ()
print (arr2)

tensor([1, 2, 3])
tensor([[2, 3, 4],
        [4, 3, 2]])
[[2 3 4]
 [4 3 2]]


1. Check the `.dtype` attribute of the above created tensors. 
2. Create a tensor of size 3 with values [1, 2, 3] but forcing the dtype to be float64.

In [3]:
## 1: check the dtype attribute
print("1: ", tensor1.dtype)
print("2: ", tensor2.dtype)
print("3: ", arr2.dtype)

## 2: create [1, 2, 3] with dtype float64
t2 = torch.tensor ([1,2,3] , dtype=torch.float64)
print("4: ", t2.dtype)

1:  torch.int64
2:  torch.int64
3:  int64
4:  torch.float64


##### PyTorch also offers some more advanced functions that can be used to create well-known matrices:

1. Create an identity matrix of size (5, 5) (see [`torch.eye()`](https://pytorch.org/docs/stable/generated/torch.eye.html)).
2. Create a matrix of all zeros of size (3, 4) (see [`torch.zeros()`](https://pytorch.org/docs/stable/generated/torch.zeros.html).
3. Create a matrix of all ones of size (2, 3) (see [`torch.ones()`](https://pytorch.org/docs/stable/generated/torch.ones.html).
4. Given a tensor of size (3, 2) of your choice, create a matrix of the same size (3, 2) filled with ones (equivalently zeros) (see [`torch.zeros_like()`](https://pytorch.org/docs/stable/generated/torch.zeros_like.html).
5. Create a matrix of size (3, 4) filled with numbers from 0 to 11 inclusive (same as in NumPy). Try both [`torch.arange()`](https://pytorch.org/docs/stable/generated/torch.arange.html) and [`torch.linspace()`](https://pytorch.org/docs/stable/generated/torch.linspace.html).

In [4]:
## 1: torch.eye()
m1= torch.eye(5)
print("m1:\n", m1)

## 2: torch.zeros()
m2 = torch.zeros(3,4)
print("m2: \n", m2)

## 3: torch.ones()
m3 = torch.ones(2,3)
print("m3: \n", m3)

## 4: torch.zeros_like()
t3= torch.tensor ([[1,2],[2,3],[3,4]])
m4= torch.zeros_like(t3)
print("m4: \n", m4)
## 5:
## torch.arange()
t4=torch.arange (0,6)
m5=t4.reshape (2,3)     #funzione per cambiare forma perche torch.arange restituisce uno di dimensione 1d
print ("m5: \n", m5)

## torch.linspace()
t5=torch.linspace(0,6 , steps=6)
m6=t5.reshape (2,3)
print ("m6: \n:", m6)

m1:
 tensor([[1., 0., 0., 0., 0.],
        [0., 1., 0., 0., 0.],
        [0., 0., 1., 0., 0.],
        [0., 0., 0., 1., 0.],
        [0., 0., 0., 0., 1.]])
m2: 
 tensor([[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]])
m3: 
 tensor([[1., 1., 1.],
        [1., 1., 1.]])
m4: 
 tensor([[0, 0],
        [0, 0],
        [0, 0]])
m5: 
 tensor([[0, 1, 2],
        [3, 4, 5]])
m6: 
: tensor([[0.0000, 1.2000, 2.4000],
        [3.6000, 4.8000, 6.0000]])


#### Random tensors

As in NumPy, you have a big choice of random distributions to sample you arrays from.
Create the same random arrays you did in NumPy in Exercise 1:
1) Create a random tensor of size 4 of uniform floating point numbers in the interval [0, 1). (see [`torch.rand`](https://pytorch.org/docs/stable/generated/torch.rand.html)).
torch rand ti mette gia numeri copresi tra 0 e 1 

2) Create a random tensor of size (3, 2) of uniform floating point numbers in the interval [0, 5). (hint: generate numbers in the interval [0, 1) and scale them up by 5).
3) Create a random tensor of size (2, 1, 2) of integers in the interval [10, 20]. (see [`torch.randint`](https://pytorch.org/docs/stable/generated/torch.randint.html), caraful with border conditions).
ATTENZIONE: Utilizzando torch.randint low è incluso [, high è escluso (mi riferisco al range di numeri che inserisco. 
   
4) Create a random tensor of size 10 over the normal distribution, mean 3 and std dev 2. (see [`torch.normal`](https://pytorch.org/docs/stable/generated/torch.normal.html)).

In [5]:
## 1: torch.rand()
t6 = torch.rand(4)
print ("t6:\n", t6)

## 2: torch.rand() scaled up to [0, 5)
m7 = torch.rand (3,2)
m7=m7*5
print ("m7: \n", m7)

## 3: torch.randint()
m8= torch.randint(10,20,(2,1,2))
print ("m8: \n", m8)

## 4: torch.normal()
t7 = torch.normal(mean=3.0, std=2.0, size=(10,))
print("t7:\n", t7)


t6:
 tensor([0.8823, 0.9150, 0.3829, 0.9593])
m7: 
 tensor([[1.9522, 3.0045],
        [1.2829, 3.9682],
        [4.7039, 0.6659]])
m8: 
 tensor([[[18, 14]],

        [[10, 14]]])
t7:
 tensor([3.7531, 2.6384, 3.7861, 3.8654, 0.2746, 5.7129, 4.3376, 1.5846, 2.3466,
        2.4424])


## Working with tensors' dimensions

In this section we will learn how to manipulate tensor's dimensions. Notice that they are extremely similar to NumPy methods: hence, if you have done exercise 1, this section should be quite straightforward.

### Access elements and slicing 

Create an identity matrix of size (4, 4) and access 
1. The element in position [0, 0].
2. The last element.
3. Element in position [2, 3].
Check that the returned elements are what you expect.

In [6]:
# Create the identity matrix
m9 = torch.eye(4)

## 1: access element in [0, 0]
print("elemento 0,0: ", m9[0][0])
## 2: access element in [3, 3]
print("elemento 0,0: ", m9[3][3])
## 3: access element in [2, 3]
print("elemento 0,0: ", m9[2][3])

elemento 0,0:  tensor(1.)
elemento 0,0:  tensor(1.)
elemento 0,0:  tensor(0.)


### Slicing

1. Create a random tensor of size (3, 4) of integers in the interval [5, 10].
2. Print the second row.
3. Print the third column.
4. Print the sub-matrix spanning from the second to the third row, from the second to the third column.

In [7]:
## 1: create a random tensor of size [3, 4].
mm1=torch.randint(5,10, (3,4))
## 2: print the second row.
print ("seconda riga: ", mm1[2,:])
## 3: print the third column.
print ("terza colonna: ", mm1[:,3])
## 4: sub-matrix
print("sub-matrix: ", mm1[1:3,1:3])

seconda riga:  tensor([7, 9, 7, 8])
terza colonna:  tensor([6, 7, 8])
sub-matrix:  tensor([[5, 6],
        [9, 7]])


### Access tensors' dimensions

1. Create a tensor $v$ of size (3, 4, 2, 4, 1) of random floats in [0, 1).
2. Print its shape. You can use both `.shape` and [`.size()`](https://docs.pytorch.org/docs/stable/generated/torch.Tensor.size.html), try them both.
3. Print its third dimension's size (2 in our example). Check [`.size()`](https://docs.pytorch.org/docs/stable/generated/torch.Tensor.size.html) function.
4. Print the number of dimensions of our vector (5 in our example). Check [`.dim()`](https://docs.pytorch.org/docs/stable/generated/torch.Tensor.dim.html#torch.Tensor.dim) or [`.ndim`](https://docs.pytorch.org/docs/stable/generated/torch.Tensor.dim.html#torch.Tensor.ndim).

In [8]:
## 1: create a random tensor v of size (3, 4, 2, 4, 1).
t8= torch.rand(3,4,2,4,1)
print ("t8", t8)

## 2: print v's shape using .shape and .size().
print ("t8 shape", t8.shape)
print ("t8 size", t8.size())

## 3: print the size of the third dimension of v.
print("t8 third dimension: ", t8.size(dim=2))

## 4: print the number of dimensions of v.
# Stampa il numero totale di dimensioni (commento di chat)
print("Numero di dimensioni:", t8.dim())
# oppure equivalente (commento di chat)
print("Numero di dimensioni (con .ndim):", t8.ndim)

t8 tensor([[[[[0.7886],
           [0.5895],
           [0.7539],
           [0.1952]],

          [[0.0050],
           [0.3068],
           [0.1165],
           [0.9103]]],


         [[[0.6440],
           [0.7071],
           [0.6581],
           [0.4913]],

          [[0.8913],
           [0.1447],
           [0.5315],
           [0.1587]]],


         [[[0.6542],
           [0.3278],
           [0.6532],
           [0.3958]],

          [[0.9147],
           [0.2036],
           [0.2018],
           [0.2018]]],


         [[[0.9497],
           [0.6666],
           [0.9811],
           [0.0874]],

          [[0.0041],
           [0.1088],
           [0.1637],
           [0.7025]]]],



        [[[[0.6790],
           [0.9155],
           [0.2418],
           [0.1591]],

          [[0.7653],
           [0.2979],
           [0.8035],
           [0.3813]]],


         [[[0.7860],
           [0.1115],
           [0.2477],
           [0.6524]],

          [[0.6057],
           [0.3725

## Permute dimensions

You can invert the order of the dimensions of a tensor. Create a random tensor of integers in the interval [0, 10) of size (2, 3, 4) and permute its dimensions so that the final size is (4, 2, 3). See [`torch.permute`](https://pytorch.org/docs/stable/generated/torch.permute.html).

Commento: qui ssto invertendo l'ordine delle dimensioni, per averlo in dimensione (4,2,3) devo indicargli l'ordine di disposizione di queste dimensioni!! Non direttamente le dimensioni!!

In [9]:
## 1: Create a random tensor. Check its shape (2, 3, 4)
t9= torch.randint (0,10,(2,3,4))
print("t9: ", t9, "\nShape: ", t9.shape)

## 2: Permute its dimensions. Check its shape (4, 2, 3)
t10= torch.permute(t9, (1,2,0))
print("t10: ", t10, "\nShape: ", t10.shape)


t9:  tensor([[[5, 9, 5, 2],
         [0, 8, 8, 7],
         [2, 2, 6, 8]],

        [[2, 7, 6, 3],
         [0, 3, 6, 0],
         [3, 7, 3, 8]]]) 
Shape:  torch.Size([2, 3, 4])
t10:  tensor([[[5, 2],
         [9, 7],
         [5, 6],
         [2, 3]],

        [[0, 0],
         [8, 3],
         [8, 6],
         [7, 0]],

        [[2, 3],
         [2, 7],
         [6, 3],
         [8, 8]]]) 
Shape:  torch.Size([3, 4, 2])


## Squeeze/unsqueeze

If you want increase the number of dimensions of your vector (similar to `np.newaxis`, this might turn useful in the context of broadcasting), you can use [`torch.unsqueeze`](https://pytorch.org/docs/stable/generated/torch.unsqueeze.html). If you want to reduce the number of dimensions of your vector by dropping dimensions of size 1 you can use [`torch.squeeze`](https://pytorch.org/docs/stable/generated/torch.squeeze.html) instead.

1. Create a random tensor uniform in [0, 1) of size (2, 2). Insert a new dimension so that the final shape is (2, 1, 2).

Torch.unsqueeze aumenta gia di una dimensione il tensore,con dim=0,1..n io gli sto dicendo dove inserirla!!

2. Add a dimension to the tensor of point 1, so that the final shape is (2, 1, 2, 1). Try to use negative indices as the argument of `torch.unsqueeze()`.
3. Turn the tensor back to its original shape (2, 2) by using `torch.squeeze()`.

In [10]:
## 1: Create a tensor of size (2, 2). Unsqueeze it so that its final shape is (2, 1, 2)
t10= torch.rand(2,2)
print(t10)
t11=torch.unsqueeze(t10, dim=1) 
print(t11)
## 2: Add an additional dimension to the tensor so that its shape is (2, 1, 2, 1). Use negative indices
t12=torch.unsqueeze(t11, dim=-1) #con dim =-1 mi riferisco all'ultima posizione del tensore, t=-2 alla penultima ecc...
print(t12)
## 3: Turn the tensor back to shape (2, 2)
t13= torch.squeeze(t12, dim=-1)
t13=torch.squeeze(t13 , dim=1)
print(t13)

tensor([[0.2709, 0.9295],
        [0.6115, 0.2234]])
tensor([[[0.2709, 0.9295]],

        [[0.6115, 0.2234]]])
tensor([[[[0.2709],
          [0.9295]]],


        [[[0.6115],
          [0.2234]]]])
tensor([[0.2709, 0.9295],
        [0.6115, 0.2234]])


## Concatenate and stack

If you have two tensors of compatible sizes, you can merge them into a unique tensor along one of their axes.
In order to get some intuition, think about having 2 2-dimensional tensors of size (3, 4). You can merge them along the first axis and get the final shape be (6, 4), or you can merge them along the second axis and get the final shape to be (3, 8), or you can go in 3D stacking one over the other (along the z-axis) and get a shape of (2, 3, 4). 

This is precisely what [`torch.concat`](https://pytorch.org/docs/stable/generated/torch.cat.html#torch.cat) (also called `.cat`) and [`torch.stack`](https://pytorch.org/docs/stable/generated/torch.stack) do. 
You should already be familiar with NumPy `axis` attribute. In PyTorch it is called `dim`. If you try using `axis` instead of `dim`, PyTorch allows you to do so. Though it is not recommended, since it is not written in the official documentation.

1. Concat $v$ and $w$ along the first dimension. Check that the final shape is (6, 4).
2. Concat $v$ and $w$ along the second dimension. Check that the final shape is (3, 8).
3. Concat $v$ and $w$ along a new dimension. Check that the final shape is (2, 3, 4).

In [11]:
v = torch.randint(0, 10, (3,4))
w = torch.randint(0, 10, (3,4))

## 1: concat along first dimension.
y = torch.cat((v,w) , dim=0)
print("y", y)
print(y.shape)

## 2: concat along second dimension.
z= torch.cat((v,w), dim=1)
print("z: ", z)
print(z.shape)
## 3: concat along new dimension.  per aggiungere una nuova dimensione devo usare stack, dim indica sempre dove aggiungere la nuova dimensione
g= torch.stack((v,w), dim=0)
print("g: ", g)
print(g.shape)

y tensor([[5, 4, 8, 1],
        [7, 9, 2, 6],
        [8, 2, 8, 2],
        [1, 0, 0, 6],
        [5, 0, 6, 9],
        [2, 4, 2, 0]])
torch.Size([6, 4])
z:  tensor([[5, 4, 8, 1, 1, 0, 0, 6],
        [7, 9, 2, 6, 5, 0, 6, 9],
        [8, 2, 8, 2, 2, 4, 2, 0]])
torch.Size([3, 8])
g:  tensor([[[5, 4, 8, 1],
         [7, 9, 2, 6],
         [8, 2, 8, 2]],

        [[1, 0, 0, 6],
         [5, 0, 6, 9],
         [2, 4, 2, 0]]])
torch.Size([2, 3, 4])


### Reshape and View

PyTorch lets you reshape your tensors, by keeping the same data, but re-arranging its dimensions.

Create a random vector of size (2, 3, 4), and make a new vector of size (6, 4) using [``torch.reshape``](https://docs.pytorch.org/docs/stable/generated/torch.Tensor.reshape.html) and [``torch.view``](https://www.geeksforgeeks.org/python/how-does-the-view-method-work-in-python-pytorch/).

- `torch.reshape` works on all vectors, but it might be less efficient when working on contiguous data.
- `torch.view` only works on contiguous data (see [``torch.is_contiguous``](https://docs.pytorch.org/docs/stable/generated/torch.Tensor.is_contiguous.html)), but it is more efficient.
- 
*Un tensore non contiguo è un tensore che è stato permutato, cioè significa che ha cambiato le sue dimensioni*

In general, ``torch.reshape`` is the safest option to use.

In [12]:
## 1: create a random tensor of size (2, 3, 4)
g1= torch.rand(2,3,4)

## 2: reshape it so that its size is (6, 4) using torch.reshape
g2= torch.reshape (g1, (6,4))
print (g1, "\n", g2)
# Is v altered by reshape?

## 3: reshape v so that its size is (6, 4) using torch.view
# Is v altered by view?
g3=g1.view(6,4)
print (g3)

## 4: as a final step, try to make a tensor not contiguous: e.g., permuting its dimensions
v = torch.rand(2, 3, 4)

# Permute v's dimensions by using torch.permute, so that the final shape is (3, 2, 4). Do it inplace.
v= v.permute(1,0,2)
# Make sure that v is not contiguous anymore.
print ("v is contiguous: ", v.is_contiguous())
# print(v.view(6, 4)) would yield an error.

#v=v.view(6,4)        chiaramenta da errore perchè non è contiguo


tensor([[[0.4505, 0.3881, 0.5073, 0.4701],
         [0.6202, 0.6401, 0.0459, 0.3155],
         [0.9211, 0.6948, 0.4751, 0.1985]],

        [[0.1941, 0.0521, 0.3370, 0.6689],
         [0.8188, 0.7308, 0.0580, 0.1993],
         [0.4211, 0.9837, 0.5723, 0.3705]]]) 
 tensor([[0.4505, 0.3881, 0.5073, 0.4701],
        [0.6202, 0.6401, 0.0459, 0.3155],
        [0.9211, 0.6948, 0.4751, 0.1985],
        [0.1941, 0.0521, 0.3370, 0.6689],
        [0.8188, 0.7308, 0.0580, 0.1993],
        [0.4211, 0.9837, 0.5723, 0.3705]])
tensor([[0.4505, 0.3881, 0.5073, 0.4701],
        [0.6202, 0.6401, 0.0459, 0.3155],
        [0.9211, 0.6948, 0.4751, 0.1985],
        [0.1941, 0.0521, 0.3370, 0.6689],
        [0.8188, 0.7308, 0.0580, 0.1993],
        [0.4211, 0.9837, 0.5723, 0.3705]])
v is contiguous:  False


## Broadcasting

Same as in NumPy, also PyTorch tensors allow [broadcasting](https://pytorch.org/docs/stable/notes/broadcasting.html).
When performing element-wise operations (like sums) on two tensors of mismatching sizes, the smaller tensor can adapt to the size of the larger tensor in case these simple rules apply:

- Each tensor has at least one dimension.
- When iterating over the dimension sizes, starting at the trailing (right-most) dimension, the dimension sizes must be
    -  equal
    -  one of them is 1
    -  one of them does not exist.

Let us see an example:

Assume you have $v = [[1, 2, 3], [4, 5, 6]]$ shape (2, 3) and $w=[3, 2, 1]$ shape (3). If we want to perform $v + w$ (element by element sum), it is clear that the dimensions don't match, but with the help of broadcasting we can still do it: $w$ is simply enlarged to reach size (2, 3) by copying itself twice on the first axis. Then, it is possible to perform element by element sum $v+w$.

Let's put broadcasting in practice:

1. Perform the above described example $v+w$ using tensors, check that the result size is (2, 3) and that numbers add up.
2. $r = [[1, 2], [3, 4], [5, 6]]$ and $l=[1, 2, 3]$. Compute $r + l$. It should raise errors. Why?
3. Adjust the size of $l$ in example 2 so that the sum works. What size should $l$ have in order for broadcasting to work on $r + l$?
4. Create random integers tensors $s$ of size (2, 1, 3, 1) and $t$ of size (1, 3, 1, 3). Does broadcasting work here in order to compute $s+t$? In case it does, predict the final shape of the result. 
 
 

In [13]:
## 1: compute v + w

## 2: compute r + l. It doesn't work, why?

## 3: adjust the size of l, and compute r + l

## 4: compute s + t


## PyTorch functions

In this section we are going to learn the basic functions of PyTorch.

### Mean, min, max, sum ...

These functions are quite self-explanatory, and they work the same way as in NumPy. The only detail we ought to pay attention to is the axis we want to perform the function along (in NumPy it was called `axis`, in PyTorch `dim`).

Create a random tensor $v$ of ints of size (3, 2, 4) and print it.

In order to be sure you have understood what is going on, try to predict the result and then check that your prediction is wrong/correct.

1. Compute the min value in the entire tensor.
2. Compute the max value along axis 0.
3. Compute the min along axis 1.
4. Multi-dimensional axes: take the sum over axes (0, 1). 


In [14]:
# Create v of shape (3, 2, 4)


In [15]:
## 1: Compute the min value of v.

## 2: Compute the max along axis 0.

## 3: Compute the min along axis 1.

## 4: Compute the sum over axes (0, 1)

### [`torch.dot`](https://docs.pytorch.org/docs/stable/generated/torch.dot.html#torch.dot), [`torch.matmul`](https://docs.pytorch.org/docs/stable/generated/torch.matmul), *

Unlike NumPy, Torch has a stricter policy on these operands:

- `*`: is the Hadamard product. I.e., element-wise product.
- [`torch.dot`](https://docs.pytorch.org/docs/stable/generated/torch.dot.html#torch.dot): only used to compute the dot product of two 1-dimensional tensors. Remember how confusing the dot product between multi-dimension NumPy vectors is (see Exercise 1)? PyTorch avoids this issue by simply forbidding the dimension of the input tensors to be greater than 1.
- [`torch.matmul`](https://docs.pytorch.org/docs/stable/generated/torch.matmul): or its alias `@` computes the matrix product. Can be used for larger than 2-dimensional tensors (it applies broadcasting, as much as in NumPy). Notice that the complexity of multiplying two $n\times n$ matrices is $O(n^3)$. We will take advantage of its relatively high time-complexity in order to show how much faster are GPUs with respect to CPUs later on in the notebook.

1. Create two random integer tensors $A$ and $B$ of compatible sizes and perform their Hadamard product (element by element product). Try these sizes (predict whether they work or not):
    - $A$ size (3, 4), $B$ size (3, 4).
    - $A$ size (3, 4), $B$ size (4, 4).
    - $A$ size (3, 4), $B$ size (1, 4).
2. Create two random 1-dimensional tensors $v, w$ and compute their dot product. If you can use multiple ways to compute it, check that indeed they return the same value.
3. Create $C$ of size (3, 4) and $D$ of size (4, 3). Compute the matrix product. Are the sizes compatible?
4. Create $E$ of size (3, 3) and $F$ of size (4, 3). Compute the matrix product. Are the sizes compatible? If not, use the transpose operator [`torch.t()`](https://docs.pytorch.org/docs/stable/generated/torch.t.html#torch.t) to adjust the dimensions of one of the two matrices and compute the matrix product.




In [16]:
## 1: Create A, B and perform hadamard product
# A (3,4), B (3,4)

# A (3,4), B (4,4)

# A (3,4), B (1,4)

## 2: Create 1-dimensional tensors v, w and compute their dot product.

## 3: Compute matrix product of C and D.

## 4: adjust dimensions using .T, and compute the matrix product E @ F

## Gradients

One of the useful features of PyTorch (that NumPy doesn't have) is that it is possible to compute automatically the gradient of functions. 
As you will see, the gradient of a function is one of the key ingredients of the backpropagation algorithm, used to train neural nets.
This is also one of the reason why PyTorch is so ubiquitous to neural nets applications.

Assume we have tensor $x = [2], y = [2]$. We have $z = 2x^2 + 3y = [14]$.

We know that $\frac{\delta z}{\delta x} = 4x$, $\frac{\delta z}{\delta y} = 3$. Since we are evaluating the point $x=2, y=2$, we get that the gradient is (8, 3). The gradients are going to be stored in `x.grad` and `y.grad` if we specify the option `requires_grad=True`. We can let PyTorch compute the gradients by invoking `z.backward()`. Check that indeed `x.grad` and `y.grad` hold the desired values.


In [17]:
x = torch.tensor([2], dtype=torch.float64, requires_grad=True)
y = torch.tensor([2], dtype=torch.float64, requires_grad=True)
z = 2 * x*x + 3 * y
z.backward()
print(x.grad)
print(y.grad)

tensor([8.], dtype=torch.float64)
tensor([3.], dtype=torch.float64)


1. Create tensors $s = [1]$ and $t = [1]$, define a new tensor $w = 5s + 6$ and compute its gradient. What is the gradient associated to $t$? (Notice that $w$ does not depend on $t$). 
2. What happens if we try to define an integer tensor with `requires_grad=True`?
3. What happens if we call `numpy()` on a tensor that has `requires_grad=True`?

In [18]:
## 1: gradient of t for w = 5s + 6.

## 2: integer tensor with requires_grad.

## 3: compute .numpy() of a tensor with requires_grad.

In this section we point out a very important feature of gradients, namely that they are *cumulative*! In order to see what does that mean, let's see in practice the example that was given in class:

1. Create tensors $x=[2], y=[3]$ (with flag `requires_grad=True`).
2. Compute $z = x * x + y$ and perform the backward pass.
3. Check that the gradients are as expected: $\frac{\delta z}{\delta x}=2x=4$, $\frac{\delta z}{\delta y} = 1$.
4. Compute $g = xy + 3x$ and perform che backward pass.
5. Check out the gradients: $\frac{\delta g}{\delta x}=y + 3=6$, $\frac{\delta g}{\delta y} = x = 2$.
6. Do you see the expected value? Can you explain why? (hint: gradients are *cumulative*).
7. In order to fix this potential issue, use [`torch.zero_grad()`](https://docs.pytorch.org/docs/stable/generated/torch.optim.Optimizer.zero_grad.html) in between the computation of $z$ and $g$. Do you observe the expected gradient now?

In [19]:
## 1: create x, y.

## 2: compute z.

## 3: check out gradients of x, y.

## 4: compute g.

## 5: check out gradients of x, y

## 7: Repeat 1-5 using torch.zero_grad()


## Device (GPU vs CPU)

In this section we will learn how do computation using the GPU instead of the CPU: notice that this is the main reason why, in Deep Learning applications, PyTorch is used over NumPy.

By default, tensors are accessed by the CPU. You can check it easily using the [`.device()`](https://pytorch.org/docs/stable/tensor_attributes.html#torch.device) method.
1) Create an identity matrix of size (4, 4) and access its device attribute.

In [20]:
## 1: see .device of a matrix

v = []  # create a tensor

Hence, every time we want to use the GPU, we need to explicitly move the tensors to the desired device. Careful here: your laptop doesn't necessarily have a dedicated GPU. And, even if it has one, it might not be compatible with CUDA (the NVIDIA interface that allows computations to be performed on the GPU).

You can check if CUDA is available on your machine by simply using [`cuda.is_available()`](https://pytorch.org/docs/stable/generated/torch.cuda.is_available.html).

In [21]:
torch.cuda.is_available()

False

If the above returns False, it could be either because you didn't install correctly CUDA, or because you laptop doesn't have a GPU compatible with it. 
If you have a MacBook with Apple Silicon processors, you can still use the device `mps`:

In [22]:
# For mac M1/2/3/* users
torch.backends.mps.is_available()

False

We can set the device to one of these three options:
- `cuda` (if you have a NVIDIA graphics card). Might be `cuda:0` etc if you have more than one.
- `mps` (if you have a MacBook with M1/2/3/* processor)
- `cpu` otherwise

If your laptop doesn't have any of the above-mentioned devices apart from the CPU, you can use Kaggle's notebooks: they offer free hours of GPU per week (they count the hours the kernel is running, not if you are actually using the notebook. Hence, remember to shut the kernel down when you don't use it!!!)

If you are using Kaggle platform for your projects (we recommend you to do that), you have at your disposal 30h/week of free GPUs: in order to activate it, you need to open a notebook, go to settings -> accelerator and you can select a GPU from there. If the GPU options are non-clickable, it is because you have to verify you account using your phone number. Go to home -> your picture (top right border) -> settings -> phone verification. Before the options become actually clickable you will need to wait a few minutes (<5').

In [23]:
device = torch.device('cuda' if torch.cuda.is_available() else 'mps' if torch.backends.mps.is_available() else 'cpu')
device

device(type='cpu')

Finally, move the tensor `v` you created earlier to the most convenient device. Use function [`.to`](https://pytorch.org/docs/stable/generated/torch.Tensor.to.html). Careful: is it an in-place method? Check that the device is indeed correct.

In [24]:
# Move vector v to the correct device. Check it is indeed on the desired device.

You can also create a tensor and send it directly to the correct device. 
1. Create a tensor of ones of size (3, 3) and specify in its constructor the `device` attribute. Check that, indeed, the tensor has been initialized with the correct device.

In [25]:
## 1: Create a tensor and initialize it to the correct device.

### GPUs vs CPUs

Now, we prove empirically that GPUs are much faster than CPUs at doing large calculations.

Create large tensors $G, H$ both of size (15000, 15000). Take their matrix product and measure how long it takes (use [`%%time`](https://ipython.readthedocs.io/en/9.2.0/interactive/magics.html) cell magic notebook function).

In [26]:
device = torch.device('cuda' if torch.cuda.is_available() else 'mps' if torch.backends.mps.is_available() else 'cpu')
device

device(type='cpu')

In [27]:
# Create E and F
G = torch.rand(15000, 15000)
H = torch.rand(15000, 15000)

In [28]:
%%time
G @ H

CPU times: user 1min 43s, sys: 1.42 s, total: 1min 44s
Wall time: 52.4 s


tensor([[3759.8022, 3755.5754, 3744.8298,  ..., 3729.0820, 3764.5232,
         3711.3091],
        [3784.5820, 3759.7922, 3757.6125,  ..., 3750.5540, 3778.2576,
         3745.9832],
        [3794.6638, 3795.0481, 3763.6118,  ..., 3769.0742, 3795.4316,
         3748.4395],
        ...,
        [3794.7727, 3776.9773, 3771.3870,  ..., 3764.1436, 3790.4336,
         3733.5981],
        [3785.4829, 3765.7024, 3772.3264,  ..., 3755.4990, 3798.6868,
         3737.8606],
        [3789.7456, 3769.3162, 3766.6492,  ..., 3745.2805, 3793.4749,
         3727.2974]])

Move $E$ and $F$ to the most convenient device at your disposal (different from CPU, if possible), and compute the same matrix product.

In [29]:
# Move the tensors to GPU in another cell, so that the time is not counted.
G = G.to(device)
H = H.to(device)

In [30]:
%%time
G @ H

CPU times: user 1min 42s, sys: 1.37 s, total: 1min 44s
Wall time: 52.2 s


tensor([[3759.8022, 3755.5754, 3744.8298,  ..., 3729.0820, 3764.5232,
         3711.3091],
        [3784.5820, 3759.7922, 3757.6125,  ..., 3750.5540, 3778.2576,
         3745.9832],
        [3794.6638, 3795.0481, 3763.6118,  ..., 3769.0742, 3795.4316,
         3748.4395],
        ...,
        [3794.7727, 3776.9773, 3771.3870,  ..., 3764.1436, 3790.4336,
         3733.5981],
        [3785.4829, 3765.7024, 3772.3264,  ..., 3755.4990, 3798.6868,
         3737.8606],
        [3789.7456, 3769.3162, 3766.6492,  ..., 3745.2805, 3793.4749,
         3727.2974]])

Side note: on my laptop (MacBook) I noticed a performance improvement by $\approx\times 10$. On Kaggle the performance improvement is much larger (from >20'' to <<1').

At the end of this task, if you are using Kaggle, remember to shut down your kernel (or Kaggle will continue charging you GPU time).

# A super quick introduction to pandas

[Pandas](https://pandas.pydata.org/) is the equivalent of a Python database (but it is also a "data analysis and manipulation tool"). Or Python's version of Excel, if you prefer.

Install it in your environment (using your IDE's GUI, or by typing in your command line `conda install pandas`).

Next, we will see the most basic usage of pandas: making queries and modifying your data. Keep in mind that pandas does much more than that: see this simple [cheatsheet](https://pandas.pydata.org/docs/user_guide/10min.html) for reference.

It is best practice, in pandas, to use the shortcut `pd` on import.

In [31]:
import pandas as pd

A table in pandas is called `pd.DataFrame`. Single columns, instead, are called `pd.Series` (a `pd.Series` is the pandas equivalent of a `torch.tensor` or `np.array`).
You can create a table using some data in a dictionary, or importing it from any well-formatted file (see, e.g., [`pd.read_csv`](https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html#pandas.read_csv)).

In [32]:
data = {
    "Name": ["Alice", "Bob", "Charlie", "Diana", "Ethan"],
    "Department": ["HR", "IT", "Finance", "IT", "HR"],
    "Salary": [50000, 60000, 55000, 65000, 52000],
    "Years_at_Company": [2, 5, 3, 7, 1],
}

# Create a DataFrame
df = pd.DataFrame(data, index=['a', 'b', 'c', 'd', 'e'])
df

Unnamed: 0,Name,Department,Salary,Years_at_Company
a,Alice,HR,50000,2
b,Bob,IT,60000,5
c,Charlie,Finance,55000,3
d,Diana,IT,65000,7
e,Ethan,HR,52000,1


As you see, the data is nicely displayed in a table. The additional column (without a title) is the index: namely, the name of every row. By default it is an increasing number, but you can change it to be anything (in our case, we used letters).

## Queries

You can select columns using the usual `[]` notation (this returns a `pd.Series`):

In [33]:
df['Department']

a         HR
b         IT
c    Finance
d         IT
e         HR
Name: Department, dtype: object

Or, multiple columns with the ``[[]]`` notation (this returns a `pd.DataFrame`):

In [34]:
df[['Department','Salary']]

Unnamed: 0,Department,Salary
a,HR,50000
b,IT,60000
c,Finance,55000
d,IT,65000
e,HR,52000


You can select rows (as much as in SQL) according to any condition: it is enough to create a boolean vector with the rows to select, and use the following syntax:

In [35]:
df[[True, True, False, False, True]]

Unnamed: 0,Name,Department,Salary,Years_at_Company
a,Alice,HR,50000,2
b,Bob,IT,60000,5
e,Ethan,HR,52000,1


You can create boolean vectors according to any condition using element-wise boolean operators:

In [36]:
df['Salary'] > 53000

a    False
b     True
c     True
d     True
e    False
Name: Salary, dtype: bool

Putting everything together, write a query to select the names of the people that earn more than 53000.

In [37]:
# Write a query to select the names of the people that earn more than 53000.
df

Unnamed: 0,Name,Department,Salary,Years_at_Company
a,Alice,HR,50000,2
b,Bob,IT,60000,5
c,Charlie,Finance,55000,3
d,Diana,IT,65000,7
e,Ethan,HR,52000,1


# [`pd.loc`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.loc.html) and [`pd.iloc`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.iloc.html) functions

Pandas offers additional ways to select rows and columns:

- [`pd.loc`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.loc.html): allows you to select rows and columns by the index value.
- [`pd.iloc`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.iloc.html): allows you to select rows and columns by the integer position. It is actually deprecated.

You can access the second row by either using the index value `'b'` (and function `.loc`) or by integer position 1 (and function `.iloc`) as in the following example.

In [38]:
df.loc['b']

Name                  Bob
Department             IT
Salary              60000
Years_at_Company        5
Name: b, dtype: object

In [39]:
df.iloc[1]

Name                  Bob
Department             IT
Salary              60000
Years_at_Company        5
Name: b, dtype: object

If you read more carefully the documentation of the two functions, you will see that, indeed, both [`pd.loc`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.loc.html) and [`pd.iloc`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.iloc.html) allow more flexibility like selecting intervals, arrays of booleans and so on.

## Apply functions

Anothe common feature of pandas, is applying functions element-wise to columns: see below for an example.

In [40]:
# Double every salary:
df['Salary'].apply(lambda x: x * 2)

a    100000
b    120000
c    110000
d    130000
e    104000
Name: Salary, dtype: int64

What is [`lambda`](https://www.w3schools.com/python/python_lambda.asp) above?

As a final exercise, write a query for the names of the employees with a salary above 51000 and below 56000 using the `.apply` operator.

In [41]:
# Write a query to select the names of the people that earn a salary between 51000 and 56000. Use the .apply() operator.
df


Unnamed: 0,Name,Department,Salary,Years_at_Company
a,Alice,HR,50000,2
b,Bob,IT,60000,5
c,Charlie,Finance,55000,3
d,Diana,IT,65000,7
e,Ethan,HR,52000,1
