# Deep dive into backpropagation

In [None]:
import torch

<img src="assets/regression_1d.drawio.svg" width="500">

In [None]:
model = torch.nn.Linear(
    in_features=1,
    out_features=1,
    bias=True
)
model.state_dict() # print the weights and biases of the model

In [None]:
x = torch.randn(1)
x

In [None]:
w = 3.0
b = 2.0

y_true = w * x + b
y_true

In [None]:
loss_fn = torch.nn.MSELoss()

### Forward pass

In [None]:
y_pred = model(x)
y_pred

In [None]:
error = loss_fn(y_pred, y_true)
error

Let's manually compute the error and compare it with the error computed above:

In [None]:
# With a single sample, the mean squared error reduces to the squared error
(y_pred - y_true)**2

$$
E =
\left(
\underbrace{
f(
\overbrace{
w \cdot x + b
}^{\sigma}
)
}_{y}
- y^*
\right)^2
$$

$$
\begin{align}
E &= \left( y - y^* \right)^2 \\
y &= f(\sigma) \\
\sigma &= w \cdot x + b \\
\end{align}
$$
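The three equations above can be evaluated one after the other in plain Python. This is only a minimal sketch: the values of `w`, `b` and `x` are hypothetical (they are not the ones drawn in this notebook), and `f` is assumed to be the identity because the model is a single linear layer with no activation.

```python
# Hypothetical values chosen for illustration (not taken from the notebook run)
w, b = 0.5, 0.1
x = 2.0
y_star = 3.0 * x + 2.0  # target generated by the true line y = 3x + 2


def f(sigma):
    # Assumption: identity activation, matching a bare linear layer
    return sigma


sigma = w * x + b      # pre-activation
y = f(sigma)           # model output
E = (y - y_star) ** 2  # squared error

print(sigma, y, E)
```

With these values, `sigma` and `y` are both close to 1.1 and `E` is close to 47.61.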

### Backward pass

#### Computation of $\frac{\partial E}{\partial w}$ and $\frac{\partial E}{\partial b}$ with PyTorch autograd

In [None]:
error.backward()

In [None]:
# Get gradients
for name, param in model.named_parameters():
    if param.requires_grad:
        print(f"Gradient for {name}: {param.grad}")

In [None]:
model.state_dict() # parameters are unchanged: backward() only fills .grad, no optimizer step was taken

This is our reference computation. We will use it to validate the manual computation that follows.

#### Manual computation of $\frac{\partial E}{\partial w}$

Let's rewrite the computation of $\frac{\partial E}{\partial w}$ manually.

Using the chain rule:

$$
\frac{\partial E}{\partial w} =
\frac{\partial E}{\partial \color{green}{\sigma}} ~
\frac{\partial \color{green}{\sigma}}{\partial w} ~
$$

The model has no activation function, so $f$ is the identity and $y = \sigma$. Knowing that:

$$
\begin{align}
\frac{\partial E}{\partial \color{green}{\sigma}} &= 2 (\sigma - y^*) \\
\frac{\partial \color{green}{\sigma}}{\partial w} &= x \\
\end{align}
$$

we can write:

$$
\frac{\partial E}{\partial w} = 2(\sigma - y^*) \cdot x
$$

Let's apply this formula to the previous example:

In [None]:
sigma = y_pred # no activation, so sigma is the model output itself

grad_E_w = 2 * (sigma - y_true) * x
grad_E_w

OK, we obtain the same result as with PyTorch autograd.
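The formula for $\frac{\partial E}{\partial w}$ can also be checked without autograd, using a central finite difference. The sketch below is one possible sanity check, not part of the original notebook: the helper `squared_error`, the step `eps` and the numeric values are all hypothetical choices made for illustration.

```python
def squared_error(w, b, x, y_star):
    # E = (sigma - y*)^2 with sigma = w * x + b (identity activation assumed)
    sigma = w * x + b
    return (sigma - y_star) ** 2


# Hypothetical values chosen for illustration
w, b, x = 0.5, 0.1, 2.0
y_star = 3.0 * x + 2.0

# Analytic gradient from the chain rule: dE/dw = 2 * (sigma - y*) * x
sigma = w * x + b
analytic_grad_w = 2 * (sigma - y_star) * x

# Numerical gradient: central difference around w
eps = 1e-6
numeric_grad_w = (
    squared_error(w + eps, b, x, y_star) - squared_error(w - eps, b, x, y_star)
) / (2 * eps)

print(analytic_grad_w, numeric_grad_w)
```

Both values should agree up to floating-point rounding, since the central difference is exact for a quadratic function apart from rounding error.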

#### Manual computation of $\frac{\partial E}{\partial b}$

Let's rewrite the computation of $\frac{\partial E}{\partial b}$ manually.

Using the chain rule:

$$
\frac{\partial E}{\partial b} =
\frac{\partial E}{\partial \color{green}{\sigma}} ~
\frac{\partial \color{green}{\sigma}}{\partial b} ~
$$

Again with $y = \sigma$ (no activation function), we know that:

$$
\begin{align}
\frac{\partial E}{\partial \color{green}{\sigma}} &= 2 (\sigma - y^*) \\
\frac{\partial \color{green}{\sigma}}{\partial b} &= 1 \\
\end{align}
$$

we can write:

$$
\frac{\partial E}{\partial b} = 2(\sigma - y^*)
$$

Let's apply this formula to the previous example:

In [None]:
# No activation function is applied, so sigma is the model output itself
sigma = y_pred

grad_b = 2 * (sigma - y_true)
grad_b

OK, we obtain the same result as with PyTorch autograd.
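Rather than comparing the printed values by eye, the comparison can be made explicit with `torch.allclose`. The following self-contained sketch repeats the steps of this notebook in a single cell. The fixed seed and the variable names `manual_grad_w`, `manual_grad_b`, `w_matches` and `b_matches` are assumptions added for illustration.

```python
import torch

torch.manual_seed(0)  # assumption: fixed seed, only for reproducibility

model = torch.nn.Linear(in_features=1, out_features=1, bias=True)
x = torch.randn(1)
y_true = 3.0 * x + 2.0

# Forward and backward pass with autograd
y_pred = model(x)
error = torch.nn.MSELoss()(y_pred, y_true)
error.backward()

# Manual gradients from the chain rule (no graph needed here)
with torch.no_grad():
    sigma = y_pred  # no activation, so sigma is the model output
    manual_grad_w = 2 * (sigma - y_true) * x
    manual_grad_b = 2 * (sigma - y_true)

# model.weight.grad has shape (1, 1); flatten it to compare with shape (1,)
w_matches = torch.allclose(manual_grad_w, model.weight.grad.flatten())
b_matches = torch.allclose(manual_grad_b, model.bias.grad)
print(w_matches, b_matches)
```

Both flags should be `True` if the manual formulas agree with autograd.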