# An introduction to torch.autograd

## Background

## Usage in PyTorch

In [None]:
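import torch, torchvision

# Load a ResNet-18 with pretrained ImageNet weights
# (newer torchvision releases prefer the `weights=` argument over `pretrained=`)
model = torchvision.models.resnet18(pretrained=True)

# One random 3-channel 64x64 image and a random target over the 1000 classes
data = torch.rand(1, 3, 64, 64)
labels = torch.rand(1, 1000)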

In [None]:
prediction = model(data) # forward pass

In [None]:
loss = (prediction - labels).sum()
loss.backward() # backward pass

In [None]:
optim = torch.optim.SGD(model.parameters(), lr=1e-2, momentum=0.9)

In [None]:
optim.step() # gradient descent
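
The cells above cover one training step: forward pass, loss, backward pass, and optimizer update. As a minimal sketch of the same sequence, the block below swaps in a small `torch.nn.Linear` model (an assumption made here so the example runs without downloading pretrained weights) and adds `optim.zero_grad()`, which is commonly called before `backward()` so that gradients from earlier steps do not accumulate.

```python
import torch

torch.manual_seed(0)

# Hypothetical stand-in for resnet18: a tiny linear model (assumption, not from the original)
model = torch.nn.Linear(4, 3)
data = torch.rand(8, 4)
labels = torch.rand(8, 3)

optim = torch.optim.SGD(model.parameters(), lr=1e-2, momentum=0.9)

# Snapshot the parameters so the effect of the update can be observed
before = [p.detach().clone() for p in model.parameters()]

optim.zero_grad()                    # clear any previously accumulated gradients
prediction = model(data)             # forward pass
loss = (prediction - labels).sum()
loss.backward()                      # backward pass: fills p.grad for each parameter
optim.step()                         # gradient descent update

after = [p.detach().clone() for p in model.parameters()]
changed = [not torch.equal(b, a) for b, a in zip(before, after)]
print(changed)
```

Each entry of `changed` reports whether the corresponding parameter tensor was modified by `optim.step()`.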

## Differentiation in autograd

In [1]:
import torch
a = torch.tensor([2., 3.], requires_grad=True)
b = torch.tensor([6., 4.], requires_grad=True)

In [2]:
a, b

(tensor([2., 3.], requires_grad=True), tensor([6., 4.], requires_grad=True))

In [3]:
a.shape, b.shape

(torch.Size([2]), torch.Size([2]))

In [4]:
Q = 3* a**3 - b**2
Q, Q.shape, type(Q)

(tensor([-12.,  65.], grad_fn=<SubBackward0>), torch.Size([2]), torch.Tensor)

Let’s assume `a` and `b` are parameters of an NN, and `Q` is the error.

In NN training, we want gradients of the error w.r.t. parameters, i.e.

$$
\begin{aligned}
\frac{\partial Q}{\partial a} &= 9a^2 \\
\frac{\partial Q}{\partial b} &= -2b
\end{aligned}
$$

When we call `.backward()` on Q, autograd calculates these gradients and stores them in the respective tensors’ `.grad` attribute.

We need to **explicitly** pass a `gradient` argument to `Q.backward()` because `Q` is a **vector**, not a scalar.

`gradient` is a **tensor** of the _same shape_ as Q, and it represents **the gradient of Q w.r.t. itself**, i.e.

$$
\frac{d Q}{d Q} = 1
$$

Equivalently, we can also aggregate Q into a **scalar** and call backward implicitly, like `Q.sum().backward()`.
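
To make the equivalence concrete, here is a small sketch that computes the gradients both ways on fresh tensors (named `p` and `q` here so the notebook's `a` and `b` are left untouched) and compares the two results. The helper `grads` is an illustrative name, not part of the original notebook.

```python
import torch

def grads(use_sum):
    # Fresh leaf tensors with the same values as a and b above
    p = torch.tensor([2., 3.], requires_grad=True)
    q = torch.tensor([6., 4.], requires_grad=True)
    out = 3 * p**3 - q**2
    if use_sum:
        out.sum().backward()                         # scalar: no gradient argument needed
    else:
        out.backward(gradient=torch.ones_like(out))  # vector: pass dQ/dQ = 1 explicitly
    return p.grad, q.grad

explicit = grads(use_sum=False)
implicit = grads(use_sum=True)

# Both routes should give 9*p**2 = [36, 81] and -2*q = [-12, -8]
print(explicit)
print(implicit)
```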

In [5]:
external_grad = torch.tensor([1.5, 1.9])
Q.backward(gradient=external_grad)

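Gradients are now deposited in `a.grad` and `b.grad`. Because `external_grad` is `[1.5, 1.9]` rather than all ones, each stored gradient is the analytic derivative scaled elementwise by `external_grad`, so the equality checks below against the unscaled derivatives come out `False`.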

In [6]:
# check if the gradients collected are correct
print(9*a**2, a.grad)
print(9*a**2 == a.grad)
print(-2*b, b.grad)
print(-2*b == b.grad)

tensor([36., 81.], grad_fn=<MulBackward0>) tensor([ 54.0000, 153.9000])
tensor([False, False])
tensor([-12.,  -8.], grad_fn=<MulBackward0>) tensor([-18.0000, -15.2000])
tensor([False, False])
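
The comparisons above print `False` because `backward(gradient=v)` stores a vector-Jacobian product; for this elementwise `Q`, that is `v` times the analytic derivative. A minimal sketch that checks this on fresh tensors (`p`, `q`, `out`, and `v` are illustrative names, not from the original notebook):

```python
import torch

p = torch.tensor([2., 3.], requires_grad=True)
q = torch.tensor([6., 4.], requires_grad=True)
out = 3 * p**3 - q**2

v = torch.tensor([1.5, 1.9])    # same values as external_grad above
out.backward(gradient=v)

with torch.no_grad():
    expected_p = v * 9 * p**2   # v * dQ/da
    expected_q = v * -2 * q     # v * dQ/db

# Use allclose rather than == to tolerate floating-point rounding
print(torch.allclose(p.grad, expected_p))
print(torch.allclose(q.grad, expected_q))
```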


In [None]:
c = torch.tensor([2., 3.], requires_grad=True)
d = torch.tensor([6., 4.], requires_grad=True)

L = 3 * c**3 - d**2

external_grad2 = torch.tensor([1.5, 0.5])
L.backward(gradient=external_grad2)

# compare the analytic derivatives with the gradients stored on c and d
print(9*c**2, c.grad)
print(9*c**2 == c.grad)
print(-2*d, d.grad)
print(-2*d == d.grad)

In [None]:
type(L)

In [26]:
import torch 

x = torch.tensor([1., 2.], requires_grad=True)

x

tensor([1., 2.], requires_grad=True)

In [27]:
y1 = 2*x[0]**2 + x[1]
y2 = 3*x[0] + 4*x[1]**3

y = torch.tensor([y1, y2])

y, type(y)

(tensor([ 4., 35.]), torch.Tensor)

In [30]:
external_grad3 = torch.tensor([1., 1.])
y.backward(gradient=external_grad3)

x.grad

RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
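
The `RuntimeError` above occurs because `torch.tensor([y1, y2])` copies the values into a new tensor that is not connected to the graph that produced `y1` and `y2`, so `y` has no `grad_fn`. One way around this, sketched here under the assumption that the goal is to backpropagate through both outputs, is to combine them with `torch.stack`, which keeps the graph intact:

```python
import torch

x = torch.tensor([1., 2.], requires_grad=True)

y1 = 2 * x[0]**2 + x[1]
y2 = 3 * x[0] + 4 * x[1]**3

# torch.stack is a differentiable op, so y keeps a grad_fn
y = torch.stack([y1, y2])

y.backward(gradient=torch.ones_like(y))

# With a gradient of ones, x.grad is the column-wise sum of the Jacobian:
# [4*x0 + 3, 1 + 12*x1**2] = [7, 49]
print(x.grad)
```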

## Optional Reading - Vector Calculus using `autograd`

Mathematically, if you have a vector-valued function $\overrightarrow{y}=f(\overrightarrow{x})$, then the gradient of $\overrightarrow{y}$ with respect to $\overrightarrow{x}$ is a **Jacobian matrix** $J$:

$$
J = \frac{\partial \overrightarrow{y}}{\partial \overrightarrow{x}}
= 
\begin{bmatrix}
\frac{\partial \overrightarrow{y}}{\partial x_1} &
\dots & 
\frac{\partial \overrightarrow{y}}{\partial x_n} 
\end{bmatrix}
=
\begin{bmatrix}
\frac{\partial y_1}{\partial x_1} & \dots & \frac{\partial y_1}{\partial x_n} \\
\vdots & \ddots & \vdots \\
\frac{\partial y_m}{\partial x_1} & \dots & \frac{\partial y_m}{\partial x_n} 
\end{bmatrix}
$$
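
As a rough illustration of this matrix, the sketch below uses `torch.autograd.functional.jacobian` to compute $J$ for the two-output function from the earlier cell (`y1`, `y2`), wrapped in a function named `f` here for convenience. It then forms the product of a vector `v` with $J$, which is the quantity that `backward(gradient=v)` accumulates into `.grad`.

```python
import torch
from torch.autograd.functional import jacobian

def f(x):
    # m = 2 outputs, n = 2 inputs
    y1 = 2 * x[0]**2 + x[1]
    y2 = 3 * x[0] + 4 * x[1]**3
    return torch.stack([y1, y2])

x = torch.tensor([1., 2.])

J = jacobian(f, x)     # shape (m, n); row i holds the partials of y_i
print(J)

v = torch.tensor([1., 1.])
vjp = v @ J            # vector-Jacobian product
print(vjp)
```

At `x = [1, 2]` the rows of `J` are `[4*x0, 1]` and `[3, 12*x1**2]`, so multiplying by a vector of ones sums each column.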