# Autograd

So far, we have seen how to make calculations with torch and how to build a data generator.

So, in theory, we have enough knowledge to feed the data in batches to our machine learning model and perform calculations on it.

But how do we adjust the weights? How does the model learn which weights to change, and in which direction?

In [2]:
import torch

w = torch.tensor([2.], requires_grad=True)
b = torch.tensor([6.], requires_grad=True)

We start with two tensors, `w` (a weight) and `b` (a bias).

We compute a new tensor $Q$ from an input $x$:
$$
Q = w x + b
$$

In [3]:
x = torch.tensor([1.0, 2.0, 3.0])

In [8]:
Q = w * x + b
Q

tensor([ 8., 10., 12.], grad_fn=<AddBackward0>)
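The `grad_fn=<AddBackward0>` in the output is how autograd records the operation that produced `Q`. The short sketch below (it re-creates the tensors from above so it runs on its own) inspects this: tensors created with `requires_grad=True` are leaves of the graph, results computed from them carry a `grad_fn`, and the same formula applied to plain tensors records nothing.

```python
import torch

w = torch.tensor([2.], requires_grad=True)
b = torch.tensor([6.], requires_grad=True)
x = torch.tensor([1.0, 2.0, 3.0])

Q = w * x + b  # w and b have shape (1,) and are broadcast over the 3 entries of x

print(w.is_leaf, w.grad_fn)      # True None: w was created directly, not computed
print(Q.is_leaf, Q.grad_fn)      # Q is not a leaf; its grad_fn is an AddBackward0 node
print(Q.grad_fn.next_functions)  # edges back to the multiplication and to b

plain = 2.0 * x + 6.0            # same formula, but nothing requires gradients
print(plain.requires_grad, plain.grad_fn)  # False None
```

This recorded graph is what `backward()` later walks in reverse to compute gradients.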

This gives an outcome, but how do we know whether it is correct? For that, we need:

- some sort of ground truth
- a way to calculate the error

A common way to calculate the error is the mean squared error (MSE):

In [5]:
def mse(y: torch.Tensor, yhat: torch.Tensor) -> torch.Tensor:
    return ((y - yhat)**2).mean()
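Written out for $n$ targets $y_i$ and predictions $\hat{y}_i$, the function above computes $\mathcal{L} = \frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2$. PyTorch also provides this loss as `torch.nn.functional.mse_loss` (module form: `torch.nn.MSELoss`). A quick sketch checking that both agree, using the targets and predictions that appear in this notebook:

```python
import torch
import torch.nn.functional as F

def mse(y: torch.Tensor, yhat: torch.Tensor) -> torch.Tensor:
    # mean of the squared differences over all elements
    return ((y - yhat) ** 2).mean()

y = torch.tensor([5.0, 9.0, 13.0])      # targets
yhat = torch.tensor([8.0, 10.0, 12.0])  # predictions

print(mse(y, yhat))         # tensor(3.6667), i.e. (9 + 1 + 1) / 3
print(F.mse_loss(yhat, y))  # same value; the built-in takes (input, target)
```

Because the difference is squared, the argument order does not change the result; the built-in version is mainly convenient because it also supports other reductions (e.g. `reduction="sum"`).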

In [10]:
y = 4 * x + 1
y

tensor([ 5.,  9., 13.])

In [11]:
loss = mse(y, Q)

During training, we need the gradients of the error with respect to the parameters. This means we want:

$$
\frac{\partial \mathcal{L}}{\partial w}
$$
and
$$
\frac{\partial \mathcal{L}}{\partial b}
$$
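For this small model the derivatives can still be worked out with the chain rule. With $Q_i = w x_i + b$ and $\mathcal{L} = \frac{1}{n}\sum_{i=1}^{n} (y_i - Q_i)^2$, we have $\frac{\partial \mathcal{L}}{\partial Q_i} = \frac{2}{n}(Q_i - y_i)$, $\frac{\partial Q_i}{\partial w} = x_i$ and $\frac{\partial Q_i}{\partial b} = 1$, so:

$$
\frac{\partial \mathcal{L}}{\partial w} = \frac{2}{n}\sum_{i=1}^{n} (Q_i - y_i)\, x_i,
\qquad
\frac{\partial \mathcal{L}}{\partial b} = \frac{2}{n}\sum_{i=1}^{n} (Q_i - y_i)
$$

With the values above, $Q - y = (3, 1, -1)$ and $x = (1, 2, 3)$, so $\frac{\partial \mathcal{L}}{\partial w} = \frac{2}{3}(3 + 2 - 3) = \frac{4}{3}$ and $\frac{\partial \mathcal{L}}{\partial b} = \frac{2}{3}(3 + 1 - 1) = 2$.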

In [12]:
loss.backward()

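We could calculate the derivatives by hand, which is tedious, especially with many nested calculations. But because our two parameters `w` and `b` were created with `requires_grad=True`, PyTorch recorded every operation that used them, and `loss.backward()` computed the gradients and stored them in the `.grad` attribute of each parameter: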

In [14]:
w.grad, b.grad

(tensor([1.3333]), tensor([2.]))
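To confirm that autograd produced the right numbers, the sketch below recomputes the gradients by hand with the chain rule for $\mathcal{L} = \frac{1}{n}\sum_i (y_i - Q_i)^2$ and $Q_i = w x_i + b$, and compares them to `.grad`. It rebuilds everything from scratch so it runs on its own:

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0])
y = 4 * x + 1
w = torch.tensor([2.], requires_grad=True)
b = torch.tensor([6.], requires_grad=True)

Q = w * x + b
loss = ((y - Q) ** 2).mean()
loss.backward()  # autograd fills w.grad and b.grad

n = x.numel()
with torch.no_grad():
    residual = Q - y                      # (3, 1, -1)
    dL_dw = 2 / n * (residual * x).sum()  # (2/n) * sum((Q_i - y_i) * x_i)
    dL_db = 2 / n * residual.sum()        # (2/n) * sum(Q_i - y_i)

print(dL_dw, w.grad)  # both 4/3
print(dL_db, b.grad)  # both 2
```

For two parameters this is manageable; for a network with millions of parameters and many nested layers, autograd does the same bookkeeping automatically.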

We then adjust each weight by subtracting its gradient multiplied by a small factor, the learning rate $\eta$: $w \leftarrow w - \eta \frac{\partial \mathcal{L}}{\partial w}$. Typically the learning rate is set to `1e-3`, but it can be as large as `1e-1` and as small as `1e-5`.

It can even vary during training: you start with `1e-1` and, when the loss stops improving, decrease the learning rate by some factor, e.g. to `1e-2`.
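PyTorch provides learning-rate schedulers in `torch.optim.lr_scheduler`. One that matches the strategy described here is `ReduceLROnPlateau`, which multiplies the learning rate by `factor` once the monitored loss has not improved for more than `patience` steps. The sketch below is only an illustration: the dummy parameter and the validation losses are made up, purely to drive the scheduler.

```python
import torch

# Hypothetical setup: one dummy parameter, just so the optimizer has something to manage
param = torch.nn.Parameter(torch.zeros(1))
optimizer = torch.optim.SGD([param], lr=1e-1)

# Multiply the learning rate by 0.1 once the loss has not improved for more than 2 steps
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.1, patience=2
)

# Made-up validation losses: they improve at first, then stall at 0.7
val_losses = [1.0, 0.8, 0.7, 0.7, 0.7, 0.7]
lrs = []
for val_loss in val_losses:
    optimizer.step()          # a real training epoch would go here
    scheduler.step(val_loss)  # report the monitored loss to the scheduler
    lrs.append(optimizer.param_groups[0]["lr"])

print(lrs)  # stays at 0.1, then drops to about 0.01 on the last step
```

Note that `ReduceLROnPlateau.step()` takes the monitored value as an argument, unlike most other schedulers, which are stepped without arguments.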

In [15]:
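learning_rate = 1e-1
with torch.no_grad():  # keep the update itself out of the computation graph
    w -= learning_rate * w.grad
    b -= learning_rate * b.grad
w.grad.zero_()  # backward() accumulates into .grad, so reset it
b.grad.zero_()
w, b

(tensor([1.8667], requires_grad=True), tensor([5.8000], requires_grad=True))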

After the weights are adjusted, training continues with the next round:

- make a prediction
- calculate the loss
- calculate the gradients
- move the weights in the direction opposite to their gradients, scaled by the learning rate

And this is how the model learns!
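Putting these steps in a loop gives a complete, if tiny, training run. A minimal sketch, reusing the toy data from above (`y = 4x + 1`, starting from `w = 2`, `b = 6`); the step count of 1000 is an arbitrary choice. Two details matter: the update runs under `torch.no_grad()` so it is not recorded in the graph, and the gradients are reset after each step because `backward()` adds to `.grad` instead of overwriting it.

```python
import torch

# Toy setup from above: inputs, targets generated by y = 4x + 1, initial parameters
x = torch.tensor([1.0, 2.0, 3.0])
y = 4 * x + 1
w = torch.tensor([2.0], requires_grad=True)
b = torch.tensor([6.0], requires_grad=True)

def mse(y: torch.Tensor, yhat: torch.Tensor) -> torch.Tensor:
    return ((y - yhat) ** 2).mean()

learning_rate = 1e-1
for step in range(1000):
    Q = w * x + b          # make a prediction
    loss = mse(y, Q)       # calculate the loss
    loss.backward()        # calculate the gradients (stored in w.grad and b.grad)
    with torch.no_grad():  # adjust the weights without recording the update
        w -= learning_rate * w.grad
        b -= learning_rate * b.grad
    w.grad.zero_()         # gradients accumulate, so reset them before the next step
    b.grad.zero_()

print(w.item(), b.item())
```

After training, `w` and `b` should end up very close to the values 4 and 1 that generated the data.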

# From zero to deep learning
What would this look like with a real model?
Let's download a pretrained ResNet-18 model; see the [docs](https://pytorch.org/vision/main/models.html) for more models.

In [16]:
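import torch, torchvision
torch.hub.set_dir("../../models/")  # downloaded weights are cached under this directory
# weights= replaces the deprecated pretrained=True (torchvision >= 0.13)
model = torchvision.models.resnet18(weights=torchvision.models.ResNet18_Weights.DEFAULT)
data = torch.rand(1, 3, 64, 64)  # one dummy RGB image, shape (N, C, H, W)
labels = torch.rand(1, 1000)     # dummy targets: one value per class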

I will show the learning process from start to end. We will encounter:

- the prediction
- the loss
- the optimizer

There is a lot more to say about choosing a loss function and an optimizer, but the defaults often work reasonably well: use MSE for regression problems, and either `SGD` or `Adam` as the optimization algorithm. That is enough for now.

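We made a dummy image and dummy labels for prediction. Both are random, so the numbers below only demonstrate the mechanics. Note that images are often stored in the order (H, W, C), i.e. height, width and RGB channels, but torch uses the (C, H, W) convention, with an extra batch dimension in front: (N, C, H, W).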
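If an image is loaded as an (H, W, C) array, for example from an image library, it has to be permuted to (C, H, W) and given a batch dimension before it goes into the model. A sketch with a random stand-in image; the non-square 48x64 size is chosen only so that height and width are easy to tell apart:

```python
import torch

# Stand-in for an image loaded as (height, width, channels): 48 rows, 64 columns, RGB
img_hwc = torch.rand(48, 64, 3)

img_chw = img_hwc.permute(2, 0, 1)  # move channels first: (C, H, W) = (3, 48, 64)
batch = img_chw.unsqueeze(0)        # add a batch dimension: (N, C, H, W) = (1, 3, 48, 64)

print(batch.shape)  # torch.Size([1, 3, 48, 64])
```

`permute` only changes how the data is indexed; the pixel at row `h`, column `w` keeps its three channel values, now found at `batch[0, :, h, w]`.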

We can make a prediction:

In [17]:
yhat = model(data)
yhat.shape

torch.Size([1, 1000])

ResNet-18 is trained on the 1000 ImageNet classes, so the output is 1000 numbers: one score for every class it considers.
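These 1000 numbers are raw scores (logits). To turn them into class probabilities, you can apply a softmax over the class dimension, and `argmax` gives the index of the most likely class. A sketch on a random stand-in for `yhat` (with the pretrained model you would use its actual output):

```python
import torch

logits = torch.randn(1, 1000)  # stand-in for yhat: (batch, classes)

probs = logits.softmax(dim=1)  # probabilities per class; each row sums to 1
pred = probs.argmax(dim=1)     # index of the most likely class per image
top5 = probs.topk(5, dim=1)    # five highest probabilities and their class indices

print(pred, top5.indices)
```

With torchvision's pretrained weights, the class names are available as `ResNet18_Weights.DEFAULT.meta["categories"]`, so an index can be mapped to a readable label.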

We can calculate a loss:

In [19]:
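loss = mse(labels, yhat)  # mse(y, yhat) from the first part; labels play the role of y
loss.backward()           # fills .grad of every trainable parameter in the model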

Because ResNet-18 has many weights (around 11.7 million), and there are different algorithms for updating them, we need to pick an [optimizer](https://pytorch.org/docs/stable/optim.html).

For now, we pick SGD (stochastic gradient descent), here with momentum, from the [list of available optimizers](https://pytorch.org/docs/stable/optim.html#torch.optim.Optimizer).

In [20]:
optim = torch.optim.SGD(model.parameters(), lr=1e-2, momentum=0.9)

The gradients were already calculated by `loss.backward()`; with `.step()`, the optimizer uses them to adjust all trainable parameters.

In [21]:
optim.step()
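Without momentum, `SGD.step()` performs exactly the manual update from the first part, $w \leftarrow w - \eta \frac{\partial \mathcal{L}}{\partial w}$. A small check on the toy problem, with momentum left at its default of 0:

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0])
y = 4 * x + 1
w = torch.tensor([2.], requires_grad=True)
b = torch.tensor([6.], requires_grad=True)

optimizer = torch.optim.SGD([w, b], lr=1e-1)  # plain SGD: momentum defaults to 0

loss = ((y - (w * x + b)) ** 2).mean()
loss.backward()   # fills w.grad (4/3) and b.grad (2)
optimizer.step()  # in effect: w -= lr * w.grad, b -= lr * b.grad

print(w, b)       # about 1.8667 and 5.8, as in the manual update
```

With `momentum=0.9`, as used above for ResNet-18, each step additionally reuses a running sum of earlier gradients, so the result no longer matches the single manual update.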

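Once the parameters are updated, we can move on to the next batch: reset the gradients with `optim.zero_grad()`, make a prediction, calculate the loss, call `loss.backward()`, and call `optim.step()` again.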
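The full cycle with an optimizer can be sketched as a loop. To keep it small and fast, this sketch swaps the ResNet for a one-weight `torch.nn.Linear` model on the toy data from the first part (an assumption for illustration, not the notebook's model), and reuses the optimizer settings from above (`lr=1e-2`, `momentum=0.9`); the 1000 iterations are an arbitrary choice.

```python
import torch

torch.manual_seed(0)

# Toy regression data: y = 4x + 1, shaped (samples, features) as nn.Linear expects
x = torch.tensor([[1.0], [2.0], [3.0]])
y = 4 * x + 1

model = torch.nn.Linear(1, 1)  # stand-in for a real model such as resnet18
loss_fn = torch.nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2, momentum=0.9)

for epoch in range(1000):
    optimizer.zero_grad()    # clear gradients left over from the previous step
    yhat = model(x)          # make a prediction
    loss = loss_fn(yhat, y)  # calculate the loss
    loss.backward()          # calculate the gradients
    optimizer.step()         # adjust the weights

print(model.weight.item(), model.bias.item())  # close to 4 and 1
```

Switching to Adam only changes the optimizer line, e.g. `torch.optim.Adam(model.parameters(), lr=1e-3)`; the rest of the loop stays the same.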