# Introduction


**What?** Introduction to automatic differentiation in PyTorch



# Import python modules

In [6]:
import torch, torchvision

In [7]:
print("torch version: ",torch.__version__)
print("torchvision version:", torchvision.__version__)

torch version:  1.7.1
torchvision version: 0.8.2


# Background

In [None]:
"""
Forward propagation: in the forward pass, the neural network (NN) makes its best guess about the correct output
by running the input data through each of its functions.

Backward propagation: in backprop, the NN adjusts its parameters in proportion to the error in its guess. It does
this by traversing backwards from the output, collecting the derivatives of the error with respect to the
parameters of the functions (the gradients), and optimizing the parameters using gradient descent.
"""
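The two passes described above can be traced by hand on a toy case. The sketch below is not part of the original notebook: it assumes a hypothetical one-parameter model `prediction = w * x` with a squared error, and the names `w`, `x`, `target` and `lr` are illustrative only. It computes the gradient with the chain rule manually, which is the step PyTorch's autograd automates.

```python
# Hypothetical one-parameter "network": prediction = w * x (illustrative values).
w = 0.5                 # parameter to learn
x, target = 2.0, 3.0    # one input and its desired output
lr = 0.1                # learning rate

# Forward pass: run the input through the model and measure the error.
prediction = w * x
error = (prediction - target) ** 2

# Backward pass: chain rule from the error back to the parameter.
d_error_d_prediction = 2.0 * (prediction - target)
d_prediction_d_w = x
grad_w = d_error_d_prediction * d_prediction_d_w

# Gradient descent: move the parameter against its gradient.
w_updated = w - lr * grad_w

print("prediction:", prediction)
print("error:", error)
print("grad_w:", grad_w)
print("w_updated:", w_updated)
```

Here the gradient is negative, so the update increases `w` and moves the prediction closer to the target.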

# Loading the model

In [None]:
"""
Let's load a pretrained ResNet-18 model from torchvision. We create a random data tensor representing a single
image with 3 channels and a height and width of 64, and a corresponding label tensor of shape (1, 1000) filled with random values.
"""

In [9]:
model = torchvision.models.resnet18(pretrained=True)
data = torch.rand(1, 3, 64, 64)
labels = torch.rand(1, 1000)

In [25]:
print("Data's shape: ", data.shape)
print("Labels' shape: ", labels.shape)

Data's shape:  torch.Size([1, 3, 64, 64])
Labels' shape:  torch.Size([1, 1000])


# Forward pass

In [26]:
prediction = model(data)
print("Prediction's shape: ", prediction.shape)

Prediction's shape:  torch.Size([1, 1000])


# Backward propagation

In [27]:
loss = (prediction - labels).sum()
print("Loss: ", loss)
loss.backward()

Loss:  tensor(-495.5042, grad_fn=<SumBackward0>)
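To see what `loss.backward()` leaves behind, the sketch below repeats the same loss on a much smaller model. It is an illustration under assumptions, not the notebook's code: a hypothetical `torch.nn.Linear(3, 2)` layer stands in for the ResNet, and the tensor sizes are arbitrary. Before the backward pass each parameter's `.grad` is `None`; afterwards it holds the derivative of the loss with respect to that parameter.

```python
import torch

torch.manual_seed(0)  # reproducible random values

# Hypothetical stand-in for the ResNet: one small linear layer.
layer = torch.nn.Linear(3, 2)
data = torch.rand(1, 3)
labels = torch.rand(1, 2)

print(layer.weight.grad)  # None: no backward pass has run yet

prediction = layer(data)                # forward pass
loss = (prediction - labels).sum()      # same loss form as in the notebook
loss.backward()                         # backward pass fills .grad

print(layer.weight.grad.shape)  # torch.Size([2, 3]), same shape as the weight
print(layer.bias.grad)          # tensor([1., 1.])
```

Because the loss is a plain sum, the gradient with respect to each bias entry is 1, and each row of the weight gradient equals the input `data`.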


# Train the model

In [None]:
"""
Next, we create an optimizer, in this case SGD with a learning rate of 0.01 and momentum of 0.9, and register
all the parameters of the model with it.
Finally, we call .step() to run one gradient descent step. The optimizer adjusts each parameter using the
gradient stored in its .grad attribute.
"""

In [31]:
optim = torch.optim.SGD(model.parameters(), lr=1e-2, momentum=0.9)

In [32]:
# perform gradient descent step
optim.step()
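The arithmetic behind a single `.step()` can be sketched in plain Python. This is a simplified illustration, not the library's implementation: it assumes the update rule documented for `torch.optim.SGD` with default settings (no dampening, no weight decay, no Nesterov momentum), namely `v = momentum * v + grad` followed by `param = param - lr * v`. The function `sgd_momentum_step` and the numbers are hypothetical.

```python
def sgd_momentum_step(params, grads, velocities, lr=1e-2, momentum=0.9):
    """Return updated (params, velocities) after one SGD-with-momentum step."""
    new_params, new_velocities = [], []
    for p, g, v in zip(params, grads, velocities):
        v = momentum * v + g           # accumulate a running direction
        new_velocities.append(v)
        new_params.append(p - lr * v)  # step against the accumulated direction
    return new_params, new_velocities


# Illustrative scalar "parameters" and their gradients.
params = [1.0, -2.0]
grads = [0.5, -1.0]
velocities = [0.0, 0.0]  # momentum buffers start at zero

# First step: the velocity equals the gradient, so this is plain gradient descent.
params, velocities = sgd_momentum_step(params, grads, velocities)
print(params)

# Second step with the same gradients: the velocity has grown, so the step is larger.
params2, velocities2 = sgd_momentum_step(params, grads, velocities)
print(params2)
```

With momentum, repeated gradients in the same direction produce progressively larger steps, which is why the second update moves further than the first.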

# References


- https://pytorch.org/tutorials/beginner/blitz/autograd_tutorial.html#sphx-glr-beginner-blitz-autograd-tutorial-py

