# Module 2 - A Gentle Introduction to `torch.autograd`
`torch.autograd` is PyTorch's automatic differentiation engine that powers neural network training.

## Background 
Neural networks (NNs) are collections of nested functions that are executed on some input data. These functions are defined by parameters (weights and biases), which PyTorch stores in tensors.

Training a NN happens in two steps:

- Forward propagation: In forward prop, the NN makes its best guess about the correct output. It runs the input data through each of its functions to make this guess.
- Backward propagation: In backprop, the NN adjusts its parameters in proportion to the error of its guess. It does this by traversing backwards from the output, collecting the derivatives of the error with respect to the parameters of the functions (the gradients), and optimizing the parameters using gradient descent.
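
As a rough sketch of the gradient descent step named above: for a parameter $\theta$, an error (loss) $L$, and a learning rate $\eta$ (these symbols are introduced here for illustration and do not appear in the original), plain gradient descent without momentum applies the update

$$
\theta \leftarrow \theta - \eta \, \frac{\partial L}{\partial \theta}
$$

The SGD optimizer used in the first code cell of this notebook adds a momentum term on top of this basic rule.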

In [12]:
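import torch
from torchvision.models import resnet18, ResNet18_Weights

# Load a pretrained resnet18 model.
model = resnet18(weights=ResNet18_Weights.DEFAULT)
# Random stand-in data: one 3-channel 64x64 image and a label vector of shape (1, 1000).
data = torch.rand(1, 3, 64, 64)
labels = torch.rand(1, 1000)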

# Forward pass: run the input data through each layer of the model to make a prediction.
prediction = model(data)

# Compute the error (loss) from the prediction and the corresponding label.
# Calling backward() on the loss tensor starts backprop: autograd calculates the
# gradient for each model parameter and stores it in the parameter's .grad attribute.
loss = (prediction - labels).sum()
loss.backward()

# Create an SGD optimizer (learning rate 0.01, momentum 0.9) over the model's parameters.
optim = torch.optim.SGD(model.parameters(), lr=1e-2, momentum=0.9)
# Gradient descent step: adjust each parameter using the gradient stored in its .grad.
optim.step()
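
To make the role of `.grad` more concrete, here is a minimal sketch that repeats the forward pass, `backward()`, and optimizer step on a much smaller model. The names `tiny_model`, `inputs`, `targets`, and `sgd` are hypothetical stand-ins, not part of the original notebook, and plain SGD without momentum is used so that the update can be checked by hand.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical stand-in model: a single linear layer instead of resnet18.
tiny_model = nn.Linear(4, 2)
inputs = torch.rand(3, 4)
targets = torch.rand(3, 2)

# Before any backward pass, no gradients have been stored.
grads_before = [p.grad for p in tiny_model.parameters()]

# Forward pass, then the same sum-of-differences loss used in the cell above.
loss = (tiny_model(inputs) - targets).sum()
loss.backward()

# After backward(), every parameter holds a .grad tensor with its own shape.
for name, p in tiny_model.named_parameters():
    print(name, tuple(p.shape), tuple(p.grad.shape))

# One plain SGD step (no momentum): p <- p - lr * p.grad
sgd = torch.optim.SGD(tiny_model.parameters(), lr=1e-2)
weight_before = tiny_model.weight.detach().clone()
expected = weight_before - 1e-2 * tiny_model.weight.grad
sgd.step()

# Gradients accumulate across backward() calls, so clear them before the next iteration.
sgd.zero_grad()
```

In a real training loop the forward pass, `backward()`, `step()`, and `zero_grad()` would repeat once per batch.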

In [35]:
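# Optional: run on Apple's Metal backend (MPS) when it is available.
# The outputs shown in this notebook were produced on an MPS device.
if torch.backends.mps.is_available():
    torch.set_default_device("mps")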
a = torch.tensor([2., 3.], requires_grad=True)
b = torch.tensor([6., 4.], requires_grad=True)

Q = 3*a**3 - b**2
print(Q)


tensor([-12.,  65.], device='mps:0', grad_fn=<SubBackward0>)
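
For reference, the derivatives that autograd should produce for this `Q` can be worked out by hand. With $a = (2, 3)$ and $b = (6, 4)$, each element of $Q$ depends only on the matching elements of $a$ and $b$:

$$
Q_i = 3a_i^3 - b_i^2, \qquad
\frac{\partial Q_i}{\partial a_i} = 9a_i^2, \qquad
\frac{\partial Q_i}{\partial b_i} = -2b_i
$$

Substituting the values gives $Q = (-12, 65)$, matching the printed tensor, with $\partial Q / \partial a = (36, 81)$ and $\partial Q / \partial b = (-12, -8)$.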


In [38]:
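# Q is a vector, so backward() needs an explicit gradient argument with the same shape as Q.
# A tensor of ones corresponds to dQ/dQ.
external_grad = torch.tensor([1., 1.])
a = torch.tensor([2., 3.], requires_grad=True)
b = torch.tensor([6., 4.], requires_grad=True)
Q = 3*a**3 - b**2
Q.backward(gradient=external_grad)

# The gradients are now stored in a.grad and b.grad.
# Compare them with the analytic derivatives 9a^2 and -2b.
print(9*a**2 == a.grad)
print(-2*b == b.grad)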

tensor([True, True], device='mps:0')
tensor([True, True], device='mps:0')
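
An equivalent way to get the same gradients is to reduce `Q` to a scalar first and call `backward()` without arguments. The sketch below compares both routes on the default device; the variable names `grad_a_explicit` and `grad_b_explicit` are introduced here only for illustration.

```python
import torch

a = torch.tensor([2., 3.], requires_grad=True)
b = torch.tensor([6., 4.], requires_grad=True)

# Route 1: pass an explicit gradient of ones with the same shape as Q.
Q = 3 * a**3 - b**2
Q.backward(gradient=torch.ones_like(Q))
grad_a_explicit = a.grad.clone()
grad_b_explicit = b.grad.clone()

# Reset the stored gradients; otherwise the second backward pass would add to them.
a.grad = None
b.grad = None

# Route 2: reduce Q to a scalar, then call backward() with no argument.
Q = 3 * a**3 - b**2
Q.sum().backward()

print(torch.equal(a.grad, grad_a_explicit))  # True
print(torch.equal(b.grad, grad_b_explicit))  # True
```

Both routes weight every element of `Q` equally, which is why they agree. A different `gradient` tensor would weight the elements differently.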


In a NN, parameters that don’t compute gradients are usually called frozen parameters. It is useful to “freeze” part of your model if you know in advance that you won’t need the gradients of those parameters (this offers some performance benefits by reducing autograd computations).
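
A minimal sketch of how gradient tracking propagates, using small random tensors whose names (`x`, `y`, `z`, `no_grad_result`, `grad_result`) are chosen here only for illustration: the result of an operation requires gradients as soon as at least one of its inputs does.

```python
import torch

x = torch.rand(5, 5)                      # requires_grad defaults to False
y = torch.rand(5, 5)
z = torch.rand(5, 5, requires_grad=True)

no_grad_result = x + y   # neither input tracks gradients
grad_result = x + z      # one input tracks gradients

print(no_grad_result.requires_grad)  # False
print(grad_result.requires_grad)     # True
```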

In finetuning, we freeze most of the model and typically only modify the classifier layers to make predictions on new labels. Let’s walk through a small example to demonstrate this. As before, we load a pretrained resnet18 model, and freeze all the parameters.

In [39]:
from torch import nn, optim

model = resnet18(weights=ResNet18_Weights.DEFAULT)

# Freeze all the parameters in the network
for param in model.parameters():
    param.requires_grad = False
    

Let’s say we want to finetune the model on a new dataset with 10 labels. In resnet18, the classifier is the last linear layer, `model.fc`. We can replace it with a new linear layer (unfrozen by default) that acts as our classifier.


In [41]:
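# resnet18's final layer takes 512 input features; the new head outputs 10 classes.
model.fc = nn.Linear(512, 10)
# All parameters are registered with the optimizer, but only the weights and bias
# of model.fc compute gradients, so only they are updated.
optimizer = optim.SGD(model.parameters(), lr=1e-2, momentum=0.9)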
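
One way to check which parameters are still trainable after freezing is sketched below. To keep the example self-contained and free of downloads it uses `TinyNet`, a hypothetical two-layer stand-in for a pretrained network, rather than resnet18. Passing only the trainable parameters to the optimizer is an optional variation, not something the original cell does.

```python
import torch
from torch import nn, optim

# Hypothetical stand-in for a pretrained network: a small backbone plus a classifier head named fc.
class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = nn.Linear(8, 16)
        self.fc = nn.Linear(16, 4)

    def forward(self, x):
        return self.fc(torch.relu(self.backbone(x)))

model = TinyNet()

# Freeze every parameter, then swap in a fresh (unfrozen) 10-class head.
for param in model.parameters():
    param.requires_grad = False
model.fc = nn.Linear(16, 10)

# List the parameters that will still receive gradients.
trainable = [name for name, p in model.named_parameters() if p.requires_grad]
print(trainable)  # ['fc.weight', 'fc.bias']

# Optional variation: hand only the trainable parameters to the optimizer.
optimizer = optim.SGD(
    [p for p in model.parameters() if p.requires_grad], lr=1e-2, momentum=0.9
)

# One training step on random data: frozen parameters never get a .grad.
loss = model(torch.rand(2, 8)).sum()
loss.backward()
optimizer.step()
frozen_grads = [p.grad for p in model.backbone.parameters()]
```

The same `named_parameters()` check should work on the resnet18 model above, where only `fc.weight` and `fc.bias` would be expected to appear.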
