# 2 - A GENTLE INTRODUCTION TO TORCH.AUTOGRAD

$\texttt{torch.autograd}$ is PyTorch’s automatic differentiation engine that powers neural network training. In this section, you will get a conceptual understanding of how autograd helps a neural network train.

## 2.1 Background

Neural networks (NNs) are a collection of nested functions that are executed on some input data. These functions are defined by parameters (consisting of weights and biases), which in PyTorch are stored in tensors.

Training a NN happens in two steps:

$\textbf{Forward Propagation}$: In forward prop, the NN makes its best guess about the correct output. It runs the input data through each of its functions to make this guess.

$\textbf{Backward Propagation}$: In backprop, the NN adjusts its parameters in proportion to the error in its guess. It does this by traversing backwards from the output, collecting the derivatives of the error with respect to the parameters of the functions (the gradients), and optimizing the parameters using gradient descent.
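
To make the two steps concrete, here is a minimal sketch in plain Python, with the derivative worked out by hand rather than by autograd. The one-parameter model $\texttt{w * x}$, the squared-error loss, and all the numbers are assumptions chosen purely for illustration.

```python
# Hypothetical toy model: prediction = w * x, scored with a squared-error loss.
def forward(w, x):
    return w * x

def loss_fn(prediction, target):
    return (prediction - target) ** 2

def grad_loss_wrt_w(w, x, target):
    # Chain rule: dL/dw = dL/dprediction * dprediction/dw
    #                   = 2 * (w * x - target) * x
    return 2.0 * (forward(w, x) - target) * x

w, x, target, lr = 0.5, 2.0, 3.0, 0.1

# Forward propagation: make a guess and measure the error.
loss_before = loss_fn(forward(w, x), target)

# Backward propagation: compute the gradient, then take one gradient descent step.
grad = grad_loss_wrt_w(w, x, target)
w_new = w - lr * grad

loss_after = loss_fn(forward(w_new, x), target)
print(loss_before, grad, w_new, loss_after)
```

The gradient is negative here, so the update increases $\texttt{w}$ and the loss after the step is smaller than before. Autograd automates the gradient computation that $\texttt{grad\_loss\_wrt\_w}$ does by hand.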

## 2.2 Usage in PyTorch

Let’s take a look at a single training step. For this example, we load a pretrained resnet18 model from $\texttt{torchvision}$. We create a random data tensor to represent a single image with 3 channels and a height and width of 64, along with its corresponding $\texttt{labels}$ tensor initialized to random values. The labels for this pretrained model have shape (1, 1000).

In [1]:
import torch
from torchvision.models import resnet18, ResNet18_Weights
model = resnet18(weights=ResNet18_Weights.DEFAULT)
data = torch.rand(1, 3, 64, 64)
labels = torch.rand(1, 1000)

Next, we run the input data through each of the model’s layers to make a prediction. This is the $\textbf{forward pass}$.

In [2]:
prediction = model(data) # forward pass

We use the model’s prediction and the corresponding labels to calculate the error ($\texttt{loss}$). The next step is to backpropagate this error through the network. Backward propagation is kicked off when we call $\texttt{.backward()}$ on the error tensor. Autograd then calculates the gradient for each model parameter and stores it in the parameter’s $\texttt{.grad}$ attribute.

In [3]:
loss = (prediction - labels).sum()
loss.backward() # backward pass
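
The effect of $\texttt{.backward()}$ on $\texttt{.grad}$ is easier to inspect on a small tensor than on resnet18. The following is a minimal sketch, assuming a two-element tensor $\texttt{w}$ standing in for a model parameter and a made-up loss; neither is part of the example above.

```python
import torch

# Hypothetical stand-in for a model parameter; requires_grad=True asks autograd to track it.
w = torch.tensor([1.0, 2.0], requires_grad=True)

# Forward pass: a scalar "error" built from w (illustrative only).
loss = (w ** 2).sum()
print(w.grad)  # None, because nothing has been backpropagated yet

# Backward pass: autograd computes d(loss)/dw = 2 * w and stores it in w.grad.
loss.backward()
print(w.grad)  # tensor([2., 4.])
```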

Next, we load an optimizer, in this case SGD (Stochastic Gradient Descent) with a learning rate of 0.01 and momentum (https://towardsdatascience.com/stochastic-gradient-descent-with-momentum-a84097641a5d) of 0.9. We register all the parameters of the model in the optimizer.

In [4]:
optim = torch.optim.SGD(model.parameters(), lr=1e-2, momentum=0.9)

Finally, we call $\texttt{.step()}$ to initiate gradient descent. The optimizer adjusts each parameter by its gradient stored in $\texttt{.grad}$.

In [5]:
optim.step() # gradient descent
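
To see what a single $\texttt{.step()}$ does numerically, the sketch below applies the same SGD settings to a tiny tensor. The tensor $\texttt{w}$ and the loss are assumptions for illustration; on the very first step the momentum buffer equals the gradient, so the update reduces to $\texttt{w - lr * w.grad}$.

```python
import torch

# Hypothetical stand-in for a model parameter.
w = torch.tensor([1.0, 2.0], requires_grad=True)
optim = torch.optim.SGD([w], lr=1e-2, momentum=0.9)

loss = (w ** 2).sum()
loss.backward()  # w.grad is now 2 * w = [2., 4.]

# What the first SGD step should produce: w - lr * grad.
expected = w.detach().clone() - 1e-2 * w.grad

optim.step()  # updates w in place using w.grad
print(w.detach(), expected)
```

On later steps the momentum term also mixes in the gradients from previous steps, so the update no longer equals the plain $\texttt{lr * grad}$ shown here.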

At this point, you have everything you need to train your neural network. For more details about how autograd works, see the final sections of https://pytorch.org/tutorials/beginner/blitz/autograd_tutorial.html