# Tensors

**Note:** to use this notebook in Google Colab, create a new cell with
the following line and run it.

``` shell
!pip install git+https://gitlab.in2p3.fr/jbarnier/ateliers_deep_learning.git
```

Tensors are one of the basic data structures in pytorch. They are
essentially numerical arrays that can be processed by different types of
devices (CPU, GPU…).

A tensor can be created from a Python list.

In [None]:
import torch

python_list = [3.0, 5.0, -4.0]
x = torch.tensor(python_list)
x
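Besides its values, a tensor has a data type (`dtype`), a shape and a
device on which it is stored. The sketch below inspects these attributes
and moves the tensor to a GPU only if pytorch detects one (`"cuda"`); on a
machine without a GPU the tensor simply stays on the CPU.

```python
import torch

x = torch.tensor([3.0, 5.0, -4.0])

# Python floats become 32-bit floating point values by default
print(x.dtype)   # torch.float32
print(x.shape)   # torch.Size([3])
print(x.device)  # cpu

# Use a GPU if one is available, otherwise stay on the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"
x_on_device = x.to(device)
print(x_on_device.device)
```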

Computations on tensors are *vectorized*, which means that operations
are performed on the entire tensor at once. For example, adding a value
to a tensor will add it to each of its elements.

In [None]:
x = torch.tensor([3.0, 5.0, -4.0])
print(x + 4)

In [None]:
y = torch.tensor([1.0, 2.0, 3.0])
print(x - y)
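A vectorized expression gives the same result as an explicit Python loop
over the elements, but runs as a single tensor operation, which is much
faster on large tensors. A small comparison with the same values of `x`
and `y`:

```python
import torch

x = torch.tensor([3.0, 5.0, -4.0])
y = torch.tensor([1.0, 2.0, 3.0])

# Vectorized: one expression applied to whole tensors, element by element
vectorized = x * y + 4  # 3*1+4, 5*2+4, -4*3+4

# Same computation with an explicit loop over Python floats
looped = torch.tensor([xi * yi + 4 for xi, yi in zip(x.tolist(), y.tolist())])

print(vectorized)
print(torch.equal(vectorized, looped))
```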

Pytorch provides numerous functions for computing on tensors. Most of
them can be called either as a `torch` function or as a tensor method.

In [None]:
print(x.mean())
print(torch.mean(x))
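A few more examples of this function/method duality. Reductions such as
`sum` or `max` return a 0-dimensional tensor, which `.item()` converts to
a plain Python number.

```python
import torch

x = torch.tensor([3.0, 5.0, -4.0])

# Each line computes the same result in function form and in method form
print(torch.sum(x), x.sum())
print(torch.abs(x), x.abs())
print(torch.max(x), x.max())

# A 0-dimensional tensor can be converted to a Python number
total = x.sum().item()
print(total, type(total))  # 4.0 <class 'float'>
```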

**Exercise 1**

A temperature in Fahrenheit can be converted to Celsius with the
following formula:

$$T_{\textrm{Celsius}} = (T_{\textrm{Fahrenheit}} - 32) \times \frac{5}{9}$$

Create a Python function called `fahrenheit_to_celsius` which takes a
tensor of Fahrenheit temperatures as input and returns the corresponding
temperatures in Celsius. Apply the function to a tensor with the values
`[0, 32, 50, 100]`.

## Tensors gradients

When creating a tensor, if we specify `requires_grad=True`, then every
tensor obtained by applying `torch` operations to it keeps track of the
operations that produced it, so that their gradient can be computed
later.

In [None]:
w1 = torch.tensor(1.0, requires_grad=True)
w1_add = w1 + 4
w1_add


In [None]:
w2 = torch.tensor(2.0, requires_grad=True)
step1 = w2 + 4
step2 = torch.square(step1)
print(step1)
print(step2)
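The operation recorded by each resulting tensor is stored in its
`grad_fn` attribute. A tensor created directly with `requires_grad=True`
is a *leaf* of the computation and has no `grad_fn`, and operations on
tensors created without `requires_grad=True` are not tracked at all:

```python
import torch

w = torch.tensor(2.0, requires_grad=True)
step1 = w + 4
step2 = torch.square(step1)

# Each result remembers the operation that produced it
print(step1.grad_fn, step2.grad_fn)
print(step2.requires_grad)  # True

# w was created by us, not by an operation: it is a leaf with no grad_fn
print(w.is_leaf, w.grad_fn)  # True None

# Without requires_grad=True nothing is recorded
plain = torch.tensor(2.0) + 4
print(plain.requires_grad, plain.grad_fn)  # False None
```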


Input tensor gradients can then be computed by applying the `backward()`
method to an output tensor. They are stored as the `grad` attribute of
the input tensors.

In [None]:
w3 = torch.tensor(1.0, requires_grad=True)
w3_mult = 2 * w3 + 1
w3_mult.backward()
print(w3.grad)


This result can be read as “the gradient value of the function which
computes `w3_mult` from `w3`, when `w3` equals 1”. Here the function is
`2*w3 + 1`, so its gradient is always 2.
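Since the derivative of `2*w3 + 1` does not depend on `w3`, computing it
from several starting values should always give 2. A quick check, where a
new tensor is created at each iteration so that gradients do not mix:

```python
import torch

grads = []
for value in [-3.0, 0.0, 1.0, 10.0]:
    w3 = torch.tensor(value, requires_grad=True)
    w3_mult = 2 * w3 + 1
    w3_mult.backward()
    grads.append(w3.grad.item())

print(grads)  # [2.0, 2.0, 2.0, 2.0]
```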

If several functions are applied in sequence to a tensor, pytorch keeps
track of all of them and computes the gradient with respect to the
original tensor by applying the chain rule.

In [None]:
step2.backward()
print(w2.grad)

This result can be read as “the gradient value of the function which
computes `step2` from `w2`, when `w2` equals 2”. Here the function is
`(w2 + 4)²`, so its derivative is `2*(w2 + 4)`, *ie* `2*w2 + 8`, and the
value of this derivative when `w2` equals 2 is 12.
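The hand-computed derivative `2*w2 + 8` can be compared with the gradient
pytorch computes for a few values of `w2`. The helper `f` below is just an
illustrative name for the function `(w2 + 4)²`.

```python
import torch

def f(w):
    # The function (w + 4)² built with the same operations as above
    return torch.square(w + 4)

results = []
for value in [-4.0, 0.0, 2.0, 5.0]:
    w2 = torch.tensor(value, requires_grad=True)
    f(w2).backward()
    analytic = 2 * value + 8  # hand-computed derivative
    results.append((w2.grad.item(), analytic))
    print(value, w2.grad.item(), analytic)
```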

**Exercise 2**

Using tensors, compute the gradient of the function $1 /\log{x}$ when
$x$ is 10.

## Gradient descent

Imagine we have a tensor `x` of 7 numerical values.

In [None]:
x = torch.tensor([-5.0, -2.0, 1.0, 3.0, 5.0, 15.0, 18.0])

Using a predefined function, we plot these values along an axis.

In [None]:
from adl.tensors import plot_points1d

plot_points1d(x)

Now suppose we want to find the value of a parameter `w` for which the
sum of the squared distances between the values of `x` and `w` is
minimal.

For example, we could start with an arbitrary value of `w`, say 0.

In [None]:
plot_points1d(x, 0)

To see if our current value of `w` is minimal, we can compute the
gradient of the function that computes the sum of squared distances
between `x` values and `w` when `w` equals 0.

We already saw how to do that with pytorch:

In [None]:
w = torch.tensor(0.0, requires_grad=True)
y = (x - w).square().sum()
y.backward()
print(w.grad)


As we will do this computation several times, we will create a new
function `eval_w_squared`.

In [None]:
def eval_w_squared(value):
    w = torch.tensor(value, requires_grad=True)
    y = (x - w).square().sum()
    y.backward()
    print(f"sum of squared distances: {y}")
    print(f"gradient: {w.grad}")


eval_w_squared(0.0)

So the sum of squared distances when $w=0$ is 613.0, and the gradient
value of the sum of squared distances function when $w=0$ is -70. This
tells us two things:

1.  we are not at an optimum value (because the gradient is not equal to
    0)
2.  as the gradient is negative, this means that if we increase `w` a
    little bit, the value of the sum of squared distances will decrease
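Both numbers can be checked by hand: the derivative of
$\sum_i (x_i - w)^2$ with respect to $w$ is $-2\sum_i (x_i - w)$. A
comparison of this closed form with the gradient computed by pytorch, for
the values of `w` tried in this section:

```python
import torch

x = torch.tensor([-5.0, -2.0, 1.0, 3.0, 5.0, 15.0, 18.0])

results = {}
for value in [0.0, 2.0, 6.0, 5.0]:
    w = torch.tensor(value, requires_grad=True)
    y = (x - w).square().sum()
    y.backward()
    # Closed-form derivative of the sum of squared distances
    closed_form = (-2 * (x - value).sum()).item()
    results[value] = (y.item(), w.grad.item(), closed_form)
    print(value, results[value])
```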

We want our sum of squared distances to be minimal, so we will try with
a greater `w` value, say 2.

In [None]:
plot_points1d(x, 2)
eval_w_squared(2.0)

When `w` equals 2, the sum of squared distances is lower, but the
gradient value is still negative, so we try again with a greater value.

In [None]:
plot_points1d(x, 6)
eval_w_squared(6.0)

The sum of squared distances is lower, but the gradient is now positive.
This means that if we want our sum of squared distances to be lower, we
have to decrease the value of `w`. Let’s try with 5:

In [None]:
plot_points1d(x, 5)
eval_w_squared(5.0)

Now the gradient is equal to 0, which means that we have reached the
minimum (the sum of squared distances is a parabola in `w`, so it has a
single minimum). In fact, 5 is the mean of `x`, and the mean is precisely
the value which minimizes the sum of squared distances.

In [None]:
x.mean()
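The fact that the mean minimizes the sum of squared distances can be
derived directly. Writing $S(w) = \sum_{i=1}^{n} (x_i - w)^2$ for the sum
of squared distances between the $n$ values of `x` and $w$:

$$S'(w) = -2\sum_{i=1}^{n} (x_i - w) = 0 \iff \sum_{i=1}^{n} x_i = n\,w \iff w = \frac{1}{n}\sum_{i=1}^{n} x_i$$

Since $S''(w) = 2n > 0$, this single stationary point is the minimum. With
the values above, $\sum_i x_i = 35$ and $n = 7$, so $w = 5$.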

What we just did here, using gradient values to iteratively move towards
a minimum, is the principle of *gradient descent*.

**Exercise 3**

Create a function `eval_w_abs` which is the same as `eval_w_squared`
except that it computes the sum of the absolute values of the
differences between `x` elements and `w`.

Use this function to do a gradient descent and find the value of `w`
that minimizes the sum of the absolute values of differences.

What statistical function could be used to find this value directly?

## Minimizing a loss function

In machine learning or deep learning, a frequent goal is to predict
values from input data by adjusting the parameters of a model.

For example, the following two Python lists give the average monthly
temperatures at the Lyon-Bron weather station in 1924 and in 2024 (source
[infoclimat](https://www.infoclimat.fr/stations-meteo/analyses-mensuelles.php?mois=12&annee=2024)).

In [None]:
lyon1924 = [3.1, 1.3, 7.7, 11.0, 15.7, 18.0, 20.6, 16.7, 16.2, 11.9, 7.3, 3.4]
lyon2024 = [5.3, 8.9, 10.9, 12.5, 15.9, 20.5, 23.3, 24.3, 17.4, 15.8, 8.7, 3.8]

Our objective is to predict a 2024 temperature from a 1924 temperature.
For this prediction we will use a very simple model: we add a fixed
value to every 1924 temperature so as to get as close as possible to the
corresponding 2024 temperature.

With a more formal notation:

-   $x$ is our **input data**, *ie* the monthly 1924 temperatures
-   $y$ is the **true values** or **target values** we want to predict,
    *ie* the monthly 2024 temperatures
-   Our model computes the predictions as $x + w$, where $w$ is our unique **model parameter**
-   We want $w$ to be the value which minimizes the distance between our
    predictions and the true values

We will start with a $w$ value of 0.

In [None]:
x = torch.tensor(lyon1924)
y = torch.tensor(lyon2024)
w = torch.tensor(0.0, requires_grad=True)

We can compute what our predicted values would be after applying our
model, *ie* after computing $x + w$. These values are called the
*predictions*. We want them to be as close as possible to the target
values in $y$, so we measure the distance between $x + w$ and $y$ by
summing the squared differences between their elements.

This distance is called the **loss** function: it is the quantity we
want to minimize.

Here is the loss value for $w = 0$.

In [None]:
y_pred = x + w
loss = torch.sum(torch.square(y_pred - y))
print(f"loss: {loss}")

As we have already seen, we can call `backward` on the loss tensor;
`w.grad` then holds the gradient value of our loss function when `w`
equals 0.

In [None]:
loss.backward()
print(f"Gradient value: {w.grad.item()}")  # type: ignore

As we will repeat them several times, we will put these three steps
(computing the predictions, the loss value and the gradient) in a
function.

In [None]:
def eval_weight(w):
    w = torch.tensor(w, requires_grad=True)
    y_pred = x + w
    loss = torch.sum(torch.square(y_pred - y))
    loss.backward()
    print(f"loss: {loss}, gradient: {w.grad.item()}")  # type: ignore


eval_weight(0.0)

As seen above, the gradient value gives the direction in which $w$ must
go for the loss to increase. In this case, the gradient is negative, so
if we decrease $w$ the loss will increase. As we want to minimize the
loss, we want to go **in the opposite direction** of the gradient, and
thus we want to increase $w$.
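A quick numerical check of this reasoning: after computing the gradient
at $w = 0$, a small step against the gradient lowers the loss, while a
step along it raises the loss. The step size `eps` and the helper
`loss_at` are illustrative choices, not part of the notebook.

```python
import torch

x = torch.tensor([3.1, 1.3, 7.7, 11.0, 15.7, 18.0, 20.6, 16.7, 16.2, 11.9, 7.3, 3.4])
y = torch.tensor([5.3, 8.9, 10.9, 12.5, 15.9, 20.5, 23.3, 24.3, 17.4, 15.8, 8.7, 3.8])

def loss_at(w):
    # Sum of squared differences between predictions x + w and targets y
    return torch.sum(torch.square(x + w - y))

w = torch.tensor(0.0, requires_grad=True)
loss = loss_at(w)
loss.backward()
grad = w.grad.item()  # negative for w = 0

eps = 0.01
against = loss_at(0.0 - eps * grad).item()  # step opposite to the gradient
along = loss_at(0.0 + eps * grad).item()    # step in the gradient direction
print(grad, loss.item(), against, along)
```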

Let’s try with $w = 2$.

In [None]:
eval_weight(2.0)

The gradient is still negative, so to minimize the loss we will have to
increase $w$.

In [None]:
eval_weight(3.0)

This time the gradient is positive, so to lower the loss we will have to
decrease $w$.

In [None]:
eval_weight(2.8)

If we continue this process, we get closer and closer to the value of
$w$ for which the loss is minimal. In fact, we could have computed this
value directly as the mean of the differences between $y$ and $x$:

In [None]:
torch.mean(y - x)
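This shortcut can be derived by setting the derivative of the loss to
zero. Writing the loss as $L(w) = \sum_{i=1}^{n} (x_i + w - y_i)^2$, with
$n = 12$ months:

$$L'(w) = 2\sum_{i=1}^{n} (x_i + w - y_i) = 0 \iff w = \frac{1}{n}\sum_{i=1}^{n} (y_i - x_i)$$

As $L''(w) = 2n > 0$, this is the minimum. Here
$\sum_i (y_i - x_i) = 34.4$, so $w \approx 2.87$, which is consistent
with the gradient changing sign between $w = 2$ and $w = 3$.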

## Automating the gradient descent process

Until now we did the gradient descent “manually”, by selecting new
values based on the sign of the gradient at the current value. We will
now see how to automate this process a bit more.

As a convention, the prediction phase will be defined in a function
called `forward`, which takes our input data as argument (here our `x`
tensor) and applies transformative operations to compute the predicted
values.

To reuse our previous temperatures example, we define our input data
`x`, our true values `y`, our model parameter `w`, and a `forward`
function which computes predictions by applying our model, *ie* by
computing $x + w$.

In [None]:
lyon1924 = [3.1, 1.3, 7.7, 11.0, 15.7, 18.0, 20.6, 16.7, 16.2, 11.9, 7.3, 3.4]
lyon2024 = [5.3, 8.9, 10.9, 12.5, 15.9, 20.5, 23.3, 24.3, 17.4, 15.8, 8.7, 3.8]

x = torch.tensor(lyon1924)
y = torch.tensor(lyon2024)

w = torch.tensor(0.0, requires_grad=True)


def forward(x):
    return x + w
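The name `forward` mirrors the convention of pytorch's `torch.nn.Module`
class, whose `forward` method is run when a model object is called. As an
illustrative sketch only (the class name `AddConstant` is hypothetical and
not used elsewhere in this notebook), the same model could be written as a
module holding `w` as a `torch.nn.Parameter`:

```python
import torch

class AddConstant(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # A Parameter is a tensor that requires gradients and is registered by the module
        self.w = torch.nn.Parameter(torch.tensor(0.0))

    def forward(self, x):
        return x + self.w

model = AddConstant()
x = torch.tensor([3.1, 1.3, 7.7])
y_pred = model(x)  # calling the module runs forward(x)
print(y_pred)
print(list(model.parameters()))
```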


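Next we define our loss function, *ie* a measure of “distance” between
our predicted values and the true values. This loss function can be
defined manually (as we did previously), but we can also use predefined
loss functions provided by pytorch. For example, our loss could use
`torch.nn.MSELoss`, which computes the mean squared error. Because it
averages the squared differences instead of summing them, its gradient is
12 times smaller than the one of our previous loss (we have 12 monthly
values), but it reaches its minimum for the same value of $w$.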

In [None]:
loss_fn = torch.nn.MSELoss()
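`torch.nn.MSELoss` averages the squared differences by default; its
`reduction` argument can also be set to `"sum"` to get the sum of squares
used in the previous section. A check on the temperature data with
$w = 0$:

```python
import torch

x = torch.tensor([3.1, 1.3, 7.7, 11.0, 15.7, 18.0, 20.6, 16.7, 16.2, 11.9, 7.3, 3.4])
y = torch.tensor([5.3, 8.9, 10.9, 12.5, 15.9, 20.5, 23.3, 24.3, 17.4, 15.8, 8.7, 3.8])
y_pred = x + 0.0  # predictions for w = 0

mse = torch.nn.MSELoss()(y_pred, y)                 # mean of squared differences
sse = torch.nn.MSELoss(reduction="sum")(y_pred, y)  # sum of squared differences

print(mse.item(), sse.item())
# The sum divided by the number of values gives back the mean
print(torch.allclose(sse / len(y), mse))
```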

As we did above, a training step starts by applying `forward` to `x` to
compute predictions given the current `w` value, computing the
corresponding loss, and then calling `backward` to compute the gradient
of the loss with respect to `w`.

In [None]:
y_pred = forward(x)
loss = loss_fn(y_pred, y)
loss.backward()
print(f"loss: {loss}, gradient for w: {w.grad.item()}")  # type: ignore

To complete this step and make it a real “training”, we have to adjust
the value of $w$ in the direction opposite to its gradient. The simplest
way to do it is to subtract from $w$ its gradient value multiplied by a
factor called the step size, or **learning rate**.

In [None]:
step_size = 0.3
w.data = w.data - step_size * w.grad  # type: ignore
# Reset the gradient so that it is not added to the next one
w.grad.zero_()  # type: ignore
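Assigning to `w.data` works, but a common alternative, used in pytorch's
own tutorials, is to update the parameter in place inside a
`torch.no_grad()` block, so that the update itself is not recorded by
autograd. A self-contained sketch of one step written this way:

```python
import torch

x = torch.tensor([3.1, 1.3, 7.7, 11.0, 15.7, 18.0, 20.6, 16.7, 16.2, 11.9, 7.3, 3.4])
y = torch.tensor([5.3, 8.9, 10.9, 12.5, 15.9, 20.5, 23.3, 24.3, 17.4, 15.8, 8.7, 3.8])
w = torch.tensor(0.0, requires_grad=True)
loss_fn = torch.nn.MSELoss()
step_size = 0.3

loss = loss_fn(x + w, y)
loss.backward()

# Update w in place without recording this operation in the autograd graph
with torch.no_grad():
    w -= step_size * w.grad

# Reset the gradient before the next step
w.grad.zero_()
print(w)
```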

To run the training process, we apply these operations a certain number
of times, called **epochs**, using a `for` loop. Note that at the end of
each training step we have to “reset” the gradient of `w` by calling
`w.grad.zero_()`: otherwise `backward` would add the new gradient to the
one computed at the previous step.

In [None]:
epochs = 5
for epoch in range(epochs):
    y_pred = forward(x)
    loss = loss_fn(y_pred, y)
    loss.backward()
    w.data = w.data - step_size * w.grad  # type: ignore
    print(
        f"epoch: {epoch}, loss: {loss:.3f}, gradient: {w.grad.item():.3f}, w: {w.data.item():.4f}"  # type: ignore
    )
    w.grad.zero_()  # type: ignore
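To see how gradients accumulate when they are not reset, here is a
minimal demonstration with a fresh tensor:

```python
import torch

w = torch.tensor(1.0, requires_grad=True)

(2 * w).backward()
first = w.grad.item()   # 2.0

(2 * w).backward()
second = w.grad.item()  # 4.0: the new gradient was added to the stored one

w.grad.zero_()
(2 * w).backward()
after_reset = w.grad.item()  # 2.0 again

print(first, second, after_reset)
```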

So here is the complete code of our training process. If we run it for a
few epochs we can see that it converges towards the $w$ value that
minimizes the loss.

In [None]:
import torch

# Raw data
lyon1924 = [3.1, 1.3, 7.7, 11.0, 15.7, 18.0, 20.6, 16.7, 16.2, 11.9, 7.3, 3.4]
lyon2024 = [5.3, 8.9, 10.9, 12.5, 15.9, 20.5, 23.3, 24.3, 17.4, 15.8, 8.7, 3.8]

# Input data tensor
x = torch.tensor(lyon1924)
# True values tensor
y = torch.tensor(lyon2024)

# Loss function
loss_fn = torch.nn.MSELoss()

# Number of training steps
epochs = 10
# Learning rate
step_size = 0.3

# Model parameter
w = torch.tensor(0.0, requires_grad=True)


# Method to apply our model, ie compute predicted values from input data
def forward(x):
    return x + w


# Training process
for epoch in range(epochs):
    # Compute predictions
    y_pred = forward(x)
    # Compute loss (distance between predictions and targets)
    loss = loss_fn(y_pred, y)
    # Backpropagate to compute parameters gradient
    loss.backward()
    # Adjust parameter value
    w.data = w.data - step_size * w.grad  # type: ignore
    print(
        f"epoch: {epoch}, loss: {loss:.3f}, gradient: {w.grad.item():.3f}, w: {w.data.item():.4f}"  # type: ignore
    )
    # Reset parameter gradient
    w.grad.zero_()  # type: ignore
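pytorch can also perform the update and reset steps through an
optimizer. As a sketch, assuming plain `torch.optim.SGD` without
momentum, which subtracts `lr` times the gradient from each parameter
just like the manual update above, the training loop could become:

```python
import torch

lyon1924 = [3.1, 1.3, 7.7, 11.0, 15.7, 18.0, 20.6, 16.7, 16.2, 11.9, 7.3, 3.4]
lyon2024 = [5.3, 8.9, 10.9, 12.5, 15.9, 20.5, 23.3, 24.3, 17.4, 15.8, 8.7, 3.8]
x = torch.tensor(lyon1924)
y = torch.tensor(lyon2024)

w = torch.tensor(0.0, requires_grad=True)
loss_fn = torch.nn.MSELoss()
# The optimizer receives the parameters to adjust and the learning rate
optimizer = torch.optim.SGD([w], lr=0.3)

for epoch in range(10):
    optimizer.zero_grad()     # reset gradients from the previous step
    loss = loss_fn(x + w, y)  # compute predictions and loss
    loss.backward()           # compute the gradient of the loss for w
    optimizer.step()          # w <- w - lr * w.grad
    print(f"epoch: {epoch}, loss: {loss.item():.3f}, w: {w.item():.4f}")
```

Here `zero_grad()` is called at the start of each step rather than at the
end; both arrangements work as long as the gradient is reset between two
calls to `backward`.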

**Exercise 4**

We have two Python lists which give the measured diameters and
perimeters of a number of circles.

Use pytorch to run a training process to find the best value of the
parameter `w` for predicting the perimeters from the diameters. The model
used to compute the predicted values will be $y = x \times w$, where $x$
are the diameters and $y$ the perimeters.

*Hint*: you can use a step size of 0.01.

In [None]:
diameters = [1.4, 2.5, 2.0, 4.8, 4.7, 5.2, 1.3, 2.1, 8.3, 7.4]
perimeters = [4.4, 7.9, 6.3, 15.1, 14.8, 16.3, 4.1, 6.6, 26.1, 23.2]