<a href="https://colab.research.google.com/github/smlra-kjsce/Pytorch-101/blob/main/Autograd_and_optimizers.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Autograd

Autograd is PyTorch's automatic differentiation engine. It computes the gradients of a result with respect to the tensors that produced it. To enable it for a tensor, create the tensor with requires_grad=True. PyTorch then records every operation performed on that tensor in a computational graph, which it later traverses to calculate the gradients.


In [None]:
import torch

x=torch.randn(3,2,requires_grad=True)
print(x)

# The output shows grad_fn=<AddBackward0>, i.e. y was produced by an addition
y = x + 10
print(y)

# The output shows grad_fn=<MulBackward0>, i.e. z was produced by a multiplication
z = y*y*y
print(z)

# The output shows grad_fn=<MeanBackward0>, i.e. w was produced by taking the mean
w = z.mean()
print(w)

tensor([[ 0.5518,  0.7587],
        [ 1.2387,  0.4535],
        [-0.0475, -0.1571]], requires_grad=True)
tensor([[10.5518, 10.7587],
        [11.2387, 10.4535],
        [ 9.9525,  9.8429]], grad_fn=<AddBackward0>)
tensor([[1174.8466, 1245.3203],
        [1419.5282, 1142.3099],
        [ 985.8307,  953.6094]], grad_fn=<MulBackward0>)
tensor(1153.5742, grad_fn=<MeanBackward0>)
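
As a rough check of what the graph above records, the following sketch inspects the leaf and intermediate tensors and compares the gradient autograd produces against the chain rule worked out by hand. The seed and the names `expected` and `grads_match` are illustrative assumptions, not part of the original notebook.

```python
import torch

torch.manual_seed(0)  # arbitrary seed, only to make the run repeatable
x = torch.randn(3, 2, requires_grad=True)

y = x + 10
z = y * y * y
w = z.mean()

# x was created by the user, so it is a leaf tensor with no grad_fn
print(x.is_leaf, x.grad_fn)
# y and w were produced by operations, so each records how it was made
print(y.is_leaf, type(y.grad_fn).__name__)
print(type(w.grad_fn).__name__)

w.backward()

# Chain rule by hand: w = mean((x + 10)^3), so dw/dx = 3 * (x + 10)^2 / N
expected = 3 * (x.detach() + 10) ** 2 / x.numel()
grads_match = torch.allclose(x.grad, expected)
print(grads_match)
```

Only the leaf tensor `x` gets a populated `.grad`; the intermediate tensors `y`, `z` and `w` carry a `grad_fn` instead.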


To compute the gradients, we just need to call the .backward() method on the final result. The gradients are calculated with respect to the original tensor and are stored in its .grad attribute, e.g. x.grad

In [None]:
x=torch.randn(2,4,requires_grad=True)
print(x)

y=x+2
print(y)

z=y.mean()
print(z)

z.backward()
print(x.grad)

tensor([[-1.0426, -1.5887, -1.3168, -1.5934],
        [ 0.7231, -0.5572, -0.3969,  0.3268]], requires_grad=True)
tensor([[0.9574, 0.4113, 0.6832, 0.4066],
        [2.7231, 1.4428, 1.6031, 2.3268]], grad_fn=<AddBackward0>)
tensor(1.3193, grad_fn=<MeanBackward0>)
tensor([[0.1250, 0.1250, 0.1250, 0.1250],
        [0.1250, 0.1250, 0.1250, 0.1250]])
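
Every entry of the gradient above is 0.1250 because z is the mean of 8 elements: z = (1/8) * sum(x_i + 2), so the derivative of z with respect to each x_i is 1/8. A minimal sketch that checks this, with `n` and `expected` as illustrative names:

```python
import torch

x = torch.randn(2, 4, requires_grad=True)
z = (x + 2).mean()
z.backward()

# z = (1/N) * sum(x_i + 2), so dz/dx_i = 1/N for every element
n = x.numel()                              # 8 elements in a 2x4 tensor
expected = torch.full_like(x, 1.0 / n)     # every entry is 1/8
mean_grad_ok = torch.allclose(x.grad, expected)
print(mean_grad_ok)
```

The result does not depend on the random values in x, since the derivative of a mean is the same constant for every input element.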


Note that the last value was a single-valued tensor, i.e. a scalar (as we computed the mean), so .backward() needed no argument. However, if the last value is not a scalar, we need to pass a tensor of the same shape as that value to .backward(). PyTorch weights each element of the output by the corresponding element of this tensor before differentiating (a vector-Jacobian product), which tells it how much each output element contributes to the gradients.

In [None]:
x=torch.randn(2,4,requires_grad=True)
print(x)

y=x+2
print(y)

z=y*y
print(z)

w = torch.randn(2,4)
z.backward(w)
print(x.grad)

tensor([[-0.8952,  0.9157, -0.4119,  1.4888],
        [-0.1434, -0.7998,  1.0399, -2.0248]], requires_grad=True)
tensor([[ 1.1048,  2.9157,  1.5881,  3.4888],
        [ 1.8566,  1.2002,  3.0399, -0.0248]], grad_fn=<AddBackward0>)
tensor([[1.2206e+00, 8.5011e+00, 2.5219e+00, 1.2172e+01],
        [3.4469e+00, 1.4405e+00, 9.2412e+00, 6.1321e-04]],
       grad_fn=<MulBackward0>)
tensor([[-3.6753, -3.5920,  1.4077,  2.7785],
        [ 6.5904,  0.6255, -1.8888,  0.0136]])
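
To make the role of the tensor passed to .backward() concrete, here is a small sketch under the same setup. Because z is computed elementwise, its derivative is 2 * (x + 2), and the stored gradient should be that derivative scaled by the passed tensor. The names `v`, `expected` and the two `*_matches` flags are illustrative.

```python
import torch

x = torch.randn(2, 4, requires_grad=True)
z = (x + 2) * (x + 2)          # non-scalar output, same shape as x

v = torch.randn(2, 4)          # one weight per element of z
z.backward(v)                  # gradient of sum(v * z) with respect to x

# Elementwise: dz_i/dx_i = 2 * (x_i + 2), so x.grad = v * 2 * (x + 2)
expected = v * 2 * (x.detach() + 2)
vjp_matches = torch.allclose(x.grad, expected)

# Passing a tensor of ones recovers the plain elementwise derivative
x.grad.zero_()
z2 = (x + 2) * (x + 2)
z2.backward(torch.ones_like(z2))
ones_matches = torch.allclose(x.grad, 2 * (x.detach() + 2))
print(vjp_matches, ones_matches)
```

A random weight tensor, as used in the cell above, is mainly useful for demonstration; a tensor of ones is the more common choice when the unweighted derivative is wanted.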


Sometimes we do not need PyTorch to track the gradients. In such cases we can either set requires_grad to False in place with x.requires_grad_(False), obtain a tensor without gradient tracking via x.detach(), or wrap the computation in 'with torch.no_grad():'

In [None]:
x=torch.randn(2,4,requires_grad=True)
print(x)

# As we can see, we do not have the grad_fn in the y and z tensors
y = x.detach()
print(y)

with torch.no_grad():
  z = x+1
  print(z)

tensor([[ 0.6527,  0.9514,  0.7132,  0.1135],
        [ 1.1598, -1.6017, -0.1744, -0.3231]], requires_grad=True)
tensor([[ 0.6527,  0.9514,  0.7132,  0.1135],
        [ 1.1598, -1.6017, -0.1744, -0.3231]])
tensor([[ 1.6527,  1.9514,  1.7132,  1.1135],
        [ 2.1598, -0.6017,  0.8256,  0.6769]])
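
The cell above covers detach() and no_grad(). The remaining option, switching tracking off on the tensor itself, might look like the sketch below. It also shows one caveat worth knowing: the tensor returned by detach() shares its data with the original. The tensors `a` and `b` are illustrative.

```python
import torch

x = torch.randn(2, 4, requires_grad=True)

# Switch tracking off in place on the tensor itself
x.requires_grad_(False)
y = x + 1
print(x.requires_grad, y.grad_fn)

# detach() returns a tensor that shares storage with the original,
# so an in-place change to the detached tensor is visible in both
a = torch.ones(3, requires_grad=True)
b = a.detach()
b[0] = 5.0
print(a)
```

If an independent copy is needed, `a.detach().clone()` avoids the shared storage.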


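During training we often need to flush out the gradients, because .backward() adds to whatever is already stored in .grad instead of overwriting it. Without a reset, the gradients accumulate again and again across iterations and epochs. Flushing is achieved simply by using the tensor.grad.zero_() method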

In [None]:
x=torch.randn(2,4,requires_grad=True)
print(x)

for i in range(3):
  y = (x+2).mean()
  y.backward()
  print(x.grad)

print("The above adds the gradients again and again and hence is incorrect. The correct one is shown below ")

for i in range(3):
  y = (x+2).mean()
  y.backward()
  print(x.grad)
  x.grad.zero_()     # This flushes out the gradients

tensor([[ 0.4291,  1.9847, -0.7129, -0.7841],
        [-1.3326, -0.3136, -1.0314,  0.8431]], requires_grad=True)
tensor([[0.1250, 0.1250, 0.1250, 0.1250],
        [0.1250, 0.1250, 0.1250, 0.1250]])
tensor([[0.2500, 0.2500, 0.2500, 0.2500],
        [0.2500, 0.2500, 0.2500, 0.2500]])
tensor([[0.3750, 0.3750, 0.3750, 0.3750],
        [0.3750, 0.3750, 0.3750, 0.3750]])
The above adds the gradients again and again and hence is incorrect. The correct one is shown below 
tensor([[0.5000, 0.5000, 0.5000, 0.5000],
        [0.5000, 0.5000, 0.5000, 0.5000]])
tensor([[0.1250, 0.1250, 0.1250, 0.1250],
        [0.1250, 0.1250, 0.1250, 0.1250]])
tensor([[0.1250, 0.1250, 0.1250, 0.1250],
        [0.1250, 0.1250, 0.1250, 0.1250]])
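
Note that the first tensor printed by the second loop is 0.5000 rather than 0.1250: the 0.3750 left over from the first loop is still stored in x.grad when the second loop starts. A minimal sketch that clears the leftover gradient before the second loop, with `accumulated` and `per_step` as illustrative names:

```python
import torch

x = torch.randn(2, 4, requires_grad=True)

# Three backward passes without zeroing: x.grad ends at 3 * 0.125 = 0.375
for _ in range(3):
    (x + 2).mean().backward()
accumulated = x.grad.clone()

# Clear the leftover gradient before starting the second loop
x.grad.zero_()

per_step = []
for _ in range(3):
    (x + 2).mean().backward()
    per_step.append(x.grad.clone())   # snapshot before it is zeroed
    x.grad.zero_()                    # flush so the next pass starts from 0

print(accumulated)
print(per_step[0])
```

With the extra zero_() call, every pass of the second loop reports 0.125, including the first.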


Having learnt this, let us now implement a small linear regression in PyTorch using autograd.

In [None]:
# We import nn for the loss function.
import torch.nn as nn

# Our data points, here we have only two :(
x = torch.tensor([10.0,5.0])
y = torch.tensor([10.0,5.0])

# Initialize both w and b as 1
w = torch.ones(1,requires_grad=True)
b = torch.ones(1,requires_grad=True)

# Define the forward pass
def forward(x):
  return w*x+b

# Define hyperparameters i.e. learning rate and epochs
epochs = 100
learning_rate = 0.001

# Create the loss function and the optimizer once, before the loop
# We use simple Mean squared loss (MSE) and SGD (Stochastic Gradient Descent) from the torch.optim module
loss = nn.MSELoss()
optimizer = torch.optim.SGD([w,b],lr=learning_rate)

# Build the linear regression loop and train it for the specified number of epochs
for i in range (epochs):

  # Forward Pass
  y_predicted = forward(x)

  # Calculating the loss (prediction first, then target)
  L = loss(y_predicted,y)

  # Calculating gradients of the loss (Backward pass or backpropagation)
  L.backward()

  # Manual weight update starts here
  # PyTorch should not track these updates, as they are not part of backprop, so we do them inside no_grad()
  # with torch.no_grad():
    # w -= learning_rate*(w.grad)
    # b -= learning_rate*(b.grad)

  # WARNING: Do not forget to flush out the gradients
  # w.grad.zero_()
  # b.grad.zero_()
  # Manual weight update ends here

  # Instead of manually updating the weights, we use the optimizer from the torch.optim module
  optimizer.step()
  optimizer.zero_grad()

  if (i%10 == 0 ):
    print("Epoch " + str(i))
    print("Loss " + str(L))

print(y_predicted)

Epoch 0
Loss tensor(1., grad_fn=<MseLossBackward>)
Epoch 10
Loss tensor(0.1568, grad_fn=<MseLossBackward>)
Epoch 20
Loss tensor(0.1004, grad_fn=<MseLossBackward>)
Epoch 30
Loss tensor(0.0963, grad_fn=<MseLossBackward>)
Epoch 40
Loss tensor(0.0957, grad_fn=<MseLossBackward>)
Epoch 50
Loss tensor(0.0953, grad_fn=<MseLossBackward>)
Epoch 60
Loss tensor(0.0949, grad_fn=<MseLossBackward>)
Epoch 70
Loss tensor(0.0945, grad_fn=<MseLossBackward>)
Epoch 80
Loss tensor(0.0942, grad_fn=<MseLossBackward>)
Epoch 90
Loss tensor(0.0938, grad_fn=<MseLossBackward>)
tensor([9.8048, 5.3858], grad_fn=<AddBackward0>)
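
The manual update that is commented out in the loop above and the optimizer-based update should reach the same parameters, since plain SGD without momentum applies exactly the rule w = w - lr * grad. The sketch below runs both variants on the same data and compares them. The helper `train` and its `use_optimizer` flag are hypothetical names introduced only for this comparison.

```python
import torch

x = torch.tensor([10.0, 5.0])
y = torch.tensor([10.0, 5.0])
learning_rate = 0.001

def train(use_optimizer, epochs=100):
    """Fit y = w*x + b with either torch.optim.SGD or a manual update."""
    w = torch.ones(1, requires_grad=True)
    b = torch.ones(1, requires_grad=True)
    loss_fn = torch.nn.MSELoss()
    optimizer = torch.optim.SGD([w, b], lr=learning_rate)

    for _ in range(epochs):
        loss = loss_fn(w * x + b, y)
        loss.backward()

        if use_optimizer:
            optimizer.step()
            optimizer.zero_grad()
        else:
            # Manual update: in-place, outside of gradient tracking
            with torch.no_grad():
                w -= learning_rate * w.grad
                b -= learning_rate * b.grad
            # Flush the gradients so they do not accumulate
            w.grad.zero_()
            b.grad.zero_()

    return w.detach(), b.detach()

w_opt, b_opt = train(use_optimizer=True)
w_man, b_man = train(use_optimizer=False)
same_result = torch.allclose(w_opt, w_man) and torch.allclose(b_opt, b_man)
print(same_result)
```

This equivalence holds for plain SGD only; optimizers that keep internal state, such as SGD with momentum, would not match the manual rule shown here.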


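A very important point to note in the above code is that, in the manual update of w and b, writing w = w - lr * (w.grad) won't work. This is because that statement creates a new tensor and rebinds the name 'w' to it, and the new tensor is no longer the leaf tensor that autograd populates .grad for. The in-place form w -= lr * (w.grad) modifies the existing tensor instead, hence we use it.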
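
The difference between the two forms can be observed directly. In this sketch the out-of-place result is stored under a separate, hypothetical name `w_new` so that both tensors can be inspected; writing `w = w - lr * w.grad` would bind that same new tensor to the name `w`. The learning rate and the toy loss are arbitrary.

```python
import torch

lr = 0.1  # arbitrary learning rate for the demonstration
w = torch.ones(1, requires_grad=True)
(w * 3).sum().backward()      # toy loss, so that w.grad is populated

original_id = id(w)
with torch.no_grad():
    w -= lr * w.grad          # in-place: modifies the existing tensor
same_object = id(w) == original_id
still_tracked = w.requires_grad and w.is_leaf

with torch.no_grad():
    w_new = w - lr * w.grad   # out-of-place: creates a new tensor
print(same_object, still_tracked)
print(w_new is w, w_new.requires_grad, w_new.grad)
```

After the in-place update, `w` is still the same tracked leaf tensor, whereas `w_new` is a different object that does not require gradients and has no `.grad`.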