# Gentle Introduction to Autograd

## Background

Neural networks (NNs) are a collection of nested functions that are executed on some input data. These functions are defined by parameters (weights and biases), which PyTorch stores in tensors.

Training happens in two steps: <br>
**Forward Propagation**: In this step, the NN makes its best guess about the correct output. It runs the input data through each of its functions to make this guess. <br>
**Backward Propagation**: In this step, the NN adjusts its parameters in proportion to the error in its guess. It does this by traversing backwards from the output, collecting the derivatives of the error with respect to the parameters of the functions (the gradients).
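To make the two steps concrete, here is a minimal sketch. It assumes a hypothetical nested function `w2 * (w1 * x)` with two scalar parameters and a squared-error loss; none of these names or values come from the tutorial itself, and the parameter update is written by hand purely for illustration.

```python
import torch

# Hypothetical toy setup: one input, one target, two scalar parameters.
x = torch.tensor(2.0)
target = torch.tensor(10.0)
w1 = torch.tensor(1.0, requires_grad=True)
w2 = torch.tensor(1.0, requires_grad=True)

def forward(x):
    # Nested functions: the output of the inner function feeds the outer one.
    hidden = w1 * x
    return w2 * hidden

# Forward propagation: make a guess and measure the error.
loss_before = (forward(x) - target) ** 2

# Backward propagation: autograd fills w1.grad and w2.grad.
loss_before.backward()

# Adjust each parameter against its gradient (manual gradient descent).
lr = 0.01  # illustrative learning rate
with torch.no_grad():
    w1 -= lr * w1.grad
    w2 -= lr * w2.grad

# A second forward pass shows whether the error went down.
loss_after = (forward(x) - target) ** 2
print(loss_before.item(), loss_after.item())
```

With these particular values the second loss should come out smaller than the first, which is the whole point of stepping against the gradient.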

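## Usage in PyTorch
We load a pretrained `resnet18` model from `torchvision`, then create a random data tensor representing a single 3-channel 64x64 image and a random tensor of matching labels.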

In [2]:
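import torch, torchvision
# `pretrained=True` is deprecated in recent torchvision releases; `weights=` replaces it
model = torchvision.models.resnet18(weights=torchvision.models.ResNet18_Weights.DEFAULT)
data = torch.rand(1, 3, 64, 64)  # one random 3-channel 64x64 image
labels = torch.rand(1, 1000)     # random targets, one value per ImageNet class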

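Next, we run the input data through each layer of the model to make a prediction. This is the forward pass.

In [3]:
prediction = model(data)  # forward pass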

We use the model’s prediction and the corresponding label to calculate the error (`loss`). The next step is to backpropagate this error through the network. Backward propagation is kicked off when we call `.backward()` on the error tensor. Autograd then calculates the gradient for each model parameter and stores it in the parameter’s `.grad` attribute.


In [4]:
loss = (prediction - labels).sum()
loss.backward() # backward pass
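One way to see the effect of `.backward()` is to inspect `.grad` before and after the call. The sketch below assumes a small `torch.nn.Linear(4, 2)` layer as a hypothetical stand-in for `resnet18`, so that it runs without downloading weights; the loss has the same form as above.

```python
import torch

torch.manual_seed(0)

# Hypothetical stand-in model: a single linear layer instead of resnet18.
model = torch.nn.Linear(4, 2)
data = torch.rand(1, 4)
labels = torch.rand(1, 2)

# Before any backward pass, parameters have no gradient stored.
grads_before = [param.grad for param in model.parameters()]

prediction = model(data)            # forward pass
loss = (prediction - labels).sum()  # same loss form as in the tutorial
loss.backward()                     # backward pass

# After the backward pass, each parameter holds a gradient of its own shape.
for name, param in model.named_parameters():
    print(name, tuple(param.shape), tuple(param.grad.shape))
```

Each gradient has the same shape as the parameter it belongs to, which is what lets an optimizer update the parameters elementwise.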

Next, we load an optimizer, in this case `SGD` with a learning rate of 0.01 and momentum of 0.9. We register all the parameters of the model in the optimizer.

In [5]:
optim = torch.optim.SGD(model.parameters(), lr=1e-2, momentum=0.9)

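Finally, we call `.step()` to initiate gradient descent. The optimizer adjusts each registered parameter using the gradient stored in its `.grad` attribute.

In [6]:
optim.step()  # gradient descent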
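To check what a single `.step()` does, the following sketch compares the updated weights with a hand-computed update. It again assumes a hypothetical `torch.nn.Linear(4, 2)` layer in place of `resnet18`. On the very first step the SGD momentum buffer is initialized to the gradient itself, so the update should reduce to `param - lr * grad`; later steps would also include the momentum term.

```python
import torch

torch.manual_seed(0)

# Hypothetical stand-in model and data (not the resnet18 setup above).
model = torch.nn.Linear(4, 2)
optim = torch.optim.SGD(model.parameters(), lr=1e-2, momentum=0.9)
data = torch.rand(1, 4)
labels = torch.rand(1, 2)

loss = (model(data) - labels).sum()
loss.backward()

# Snapshot the weight and its gradient before the optimizer touches them.
weight_before = model.weight.detach().clone()
weight_grad = model.weight.grad.detach().clone()

optim.step()  # gradient descent

# First step with momentum: the buffer equals the gradient, so this should match.
expected = weight_before - 1e-2 * weight_grad
print(torch.allclose(model.weight, expected))
```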

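## Differentiation in Autograd

Let's look at how autograd collects gradients. We create two tensors `a` and `b` with `requires_grad=True`, which signals to autograd that every operation on them should be tracked.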

In [7]:
import torch
a = torch.tensor([2.,3.], requires_grad=True)
b = torch.tensor([6.,4.], requires_grad=True)

We create another tensor `Q` from `a` and `b`.<br>
$Q = 3a^{3} - b^{2}$

In [8]:
Q = 3*a**3 - b**2

Let's assume `a` and `b` are parameters of an NN, and `Q` is the error. In NN training, we want the gradients of the error with respect to the parameters, i.e.

$\frac{\partial Q}{\partial a} = 9a^{2}$

$\frac{\partial Q}{\partial b} = -2b$
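
As a worked check, the formulas can be evaluated elementwise at the values defined earlier, $a = (2, 3)$ and $b = (6, 4)$:

$\frac{\partial Q}{\partial a} = 9a^{2} = (9 \cdot 2^{2},\ 9 \cdot 3^{2}) = (36,\ 81)$

$\frac{\partial Q}{\partial b} = -2b = (-2 \cdot 6,\ -2 \cdot 4) = (-12,\ -8)$

These are the values autograd is expected to store for `a` and `b`.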

When we call `.backward()` on `Q`, autograd calculates these gradients and stores them in the respective tensors' `.grad` attributes.

We need to explicitly pass a `gradient` argument to `Q.backward()` because `Q` is a vector rather than a scalar. `gradient` is a tensor of the same shape as `Q`, and it represents the gradient of `Q` with respect to itself, i.e.

$\frac{dQ}{dQ} = 1$

Equivalently, we can aggregate `Q` into a scalar and call `.backward()` without an argument, as in `Q.sum().backward()`.

In [9]:
external_grad = torch.tensor([1., 1.])
Q.backward(gradient=external_grad)

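Gradients are now deposited in `a.grad` and `b.grad`. We can compare them with the analytic expressions for $\frac{\partial Q}{\partial a}$ and $\frac{\partial Q}{\partial b}$.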

In [10]:
print(9*a**2 == a.grad)
print(-2*b == b.grad)

tensor([True, True])
tensor([True, True])
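
The claim that the explicit `gradient` argument and the `Q.sum().backward()` form are equivalent can be checked directly. The sketch below rebuilds `a` and `b` twice through a hypothetical helper `make_inputs` (introduced here only so each approach starts from fresh tensors with empty `.grad`) and compares the results.

```python
import torch

def make_inputs():
    # Hypothetical helper: fresh leaf tensors with the same values as above.
    a = torch.tensor([2., 3.], requires_grad=True)
    b = torch.tensor([6., 4.], requires_grad=True)
    return a, b

# Approach 1: pass dQ/dQ explicitly as a tensor of ones.
a1, b1 = make_inputs()
Q1 = 3 * a1**3 - b1**2
Q1.backward(gradient=torch.ones_like(Q1))

# Approach 2: reduce Q to a scalar, then call backward without arguments.
a2, b2 = make_inputs()
Q2 = 3 * a2**3 - b2**2
Q2.sum().backward()

print(torch.equal(a1.grad, a2.grad))
print(torch.equal(b1.grad, b2.grad))
```

Fresh tensors are used for each approach because calling `.backward()` twice on the same leaves would accumulate into `.grad` rather than overwrite it.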
