## Autograd
Handling gradients in PyTorch

In [7]:
import math
import torch

In [3]:
# y = x^2
# dy_dx = 2*x

def dy_dx(x):
    return 2*x

dy_dx(3)

6

In [6]:
# z = sin(y) 
# chain rule -> dz_dx = dz_dy * dy_dx
# dz_dy = cos(y) = cos(x^2)

def dz_dx(x):
    return 2*x*math.cos(x**2)

dz_dx(3)

-5.466781571308061
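
A hand-derived gradient like this can be sanity-checked numerically. The sketch below is not part of the original notebook: the helper `central_difference` and the step size `h` are illustrative assumptions. It compares the chain-rule result with a finite-difference estimate of the same derivative.

```python
import math

def z(x):
    # z = sin(y) with y = x^2
    return math.sin(x ** 2)

def dz_dx(x):
    # analytic derivative from the chain rule: cos(x^2) * 2x
    return 2 * x * math.cos(x ** 2)

def central_difference(f, x, h=1e-6):
    # hypothetical helper: numerical estimate of f'(x)
    return (f(x + h) - f(x - h)) / (2 * h)

analytic = dz_dx(3.0)
numeric = central_difference(z, 3.0)
print(analytic, numeric)  # the two values should agree to several decimal places
```

If the two numbers disagree noticeably, the hand derivation is the first thing to recheck.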

In [None]:
# binary cross entropy: L = -[y*log(y_hat) + (1 - y)*log(1 - y_hat)]

### How to use Autograd

We create a `tensor` with `requires_grad=True`.
This tells PyTorch to record every operation performed on this tensor so that gradients can be computed with respect to it.
When `backward()` is called, the gradient is computed and stored in the tensor's `.grad` attribute, where we can read it.

In [21]:
x = torch.tensor(3.0, requires_grad=True) #requires_grad is False by default

When we run the next operation, PyTorch builds a computation graph in which `x` is squared to produce `y`. This is what allows `dy_dx` to be calculated later.

In [22]:
y = x**2

In [23]:
x

tensor(3., requires_grad=True)

In [24]:
y

tensor(9., grad_fn=<PowBackward0>)

In [None]:
# y.backward()

In [25]:
x.grad  # None at this point: backward() has not been called yet

In [26]:
z = torch.sin(y)
z

tensor(0.4121, grad_fn=<SinBackward0>)

In [27]:
z.backward()

In [28]:
x.grad

tensor(-5.4668)
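
This matches the value from the manual `dz_dx(3)` computed earlier. As a minimal sketch (the variable name `manual` is an illustrative assumption), the two can be compared directly:

```python
import math
import torch

x = torch.tensor(3.0, requires_grad=True)
z = torch.sin(x ** 2)
z.backward()  # fills x.grad with dz/dx

# chain rule by hand: dz/dx = 2x * cos(x^2)
manual = 2 * 3.0 * math.cos(3.0 ** 2)

print(x.grad.item(), manual)  # equal up to float32 rounding
```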

In [31]:
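# y is a non-leaf (intermediate) tensor, so autograd does not retain its gradient:
# this returns None and PyTorch emits a UserWarning
y.grad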
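
By default, only leaf tensors such as `x` get their `.grad` populated. If the gradient of an intermediate tensor is needed, one option is `retain_grad()`. A minimal sketch, assuming we want dz/dy for the same graph:

```python
import torch

x = torch.tensor(3.0, requires_grad=True)
y = x ** 2
y.retain_grad()  # ask autograd to keep the gradient of the non-leaf tensor y
z = torch.sin(y)
z.backward()

print(x.is_leaf, y.is_leaf)  # True False
print(y.grad)                # dz/dy = cos(y) = cos(9)
print(x.grad)                # dz/dx = 2x * cos(x^2)
```

`retain_grad()` has to be called before `backward()`, otherwise the intermediate gradient has already been discarded.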


### Training a simple binary classifier manually

In [32]:
# inputs 
x = torch.tensor(6.7) # cgpa
y = torch.tensor(0.0) # not placed

# parameters
w = torch.tensor(1.0) #weight
b = torch.tensor(0.0) #bias

In [33]:
# Binary cross entropy for loss function
def binary_cross_entropy_loss(y_predicted, y_target):
    epsilon = 1e-8 # to prevent log(0)

    y_predicted = torch.clamp(y_predicted, epsilon, 1-epsilon)

    return -(y_target*torch.log(y_predicted) + (1-y_target)*torch.log(1-y_predicted))

In [34]:
# forward pass

# 1. linear transform
z = w * x + b

# 2. sigmoid
y_predicted = torch.sigmoid(z)

# 3. compute loss
loss = binary_cross_entropy_loss(y_predicted, y)

In [35]:
loss

tensor(6.7012)

In [39]:
# backpropagation

# 1. dL/d(y_pred): loss with respect to the prediction
dloss_dy_pred = (y_predicted - y) / (y_predicted * (1 - y_predicted))
# 2. d(y_pred)/dz: prediction with respect to z (sigmoid derivative)
dy_pred_dz = y_predicted * (1 - y_predicted)
# 3. dz/dw and dz/db: z with respect to w and b
dz_dw = x
dz_db = 1  # bias contributes directly to z
# 4. chain rule: multiply the local derivatives together
dL_dw = dloss_dy_pred * dy_pred_dz * dz_dw
dL_db = dloss_dy_pred * dy_pred_dz * dz_db

In [40]:
print(f"Manual gradient of loss wrt weight dw: {dL_dw}")
print(f"Manual gradient of loss wrt bias db: {dL_db}")


Manual gradient of loss wrt weight dw: 6.691762447357178
Manual gradient of loss wrt bias db: 0.998770534992218
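
These numbers can be checked on paper, because the first two factors of the chain rule cancel. In the derivation below, $\hat{y}$ stands for `y_predicted` and $L$ for the loss (notation chosen here for brevity):

```latex
\begin{aligned}
\frac{\partial L}{\partial z}
  &= \frac{\partial L}{\partial \hat{y}} \cdot \frac{\partial \hat{y}}{\partial z}
   = \frac{\hat{y} - y}{\hat{y}(1 - \hat{y})} \cdot \hat{y}(1 - \hat{y})
   = \hat{y} - y \\
\frac{\partial L}{\partial w} &= (\hat{y} - y)\, x, \qquad
\frac{\partial L}{\partial b} = \hat{y} - y
\end{aligned}
```

With $x = 6.7$, $y = 0$ and $\hat{y} \approx 0.99877$, this gives $\partial L/\partial b \approx 0.9988$ and $\partial L/\partial w \approx 6.7 \times 0.99877 \approx 6.6918$, in line with the printed values.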


### Doing the same work, but with the coolness of PyTorch

In [41]:
x = torch.tensor(6.7)
y = torch.tensor(0.0)

Note that here we set `requires_grad=True` for `w` and `b`, since these are the parameters whose gradients we actually need.

In [42]:
w = torch.tensor(1.0, requires_grad=True)
b = torch.tensor(0.0, requires_grad=True)

In [43]:
z = w*x + b
z

tensor(6.7000, grad_fn=<AddBackward0>)

In [44]:
y_predicted = torch.sigmoid(z)
y_predicted

tensor(0.9988, grad_fn=<SigmoidBackward0>)

In [45]:
loss = binary_cross_entropy_loss(y_predicted, y)
loss

tensor(6.7012, grad_fn=<NegBackward0>)

Now we simply call `backward()` on the loss, and the computation graph does its magic: autograd walks it backwards and fills in `w.grad` and `b.grad`.

In [46]:
loss.backward()

In [47]:
print(w.grad)

tensor(6.6918)


In [48]:
print(b.grad)

tensor(0.9988)
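
These agree with the manually derived gradients. As a hedged sketch, the check can be automated with `torch.allclose`. The loss is written inline here without the clamp, and the names `expected_dw` and `expected_db` are illustrative assumptions based on the closed form `(y_predicted - y) * x` and `y_predicted - y`:

```python
import torch

x = torch.tensor(6.7)
y = torch.tensor(0.0)
w = torch.tensor(1.0, requires_grad=True)
b = torch.tensor(0.0, requires_grad=True)

y_predicted = torch.sigmoid(w * x + b)
# binary cross entropy written inline (clamp omitted for brevity)
loss = -(y * torch.log(y_predicted) + (1 - y) * torch.log(1 - y_predicted))
loss.backward()

# closed-form gradients, computed without extending the graph
with torch.no_grad():
    expected_dw = (y_predicted - y) * x
    expected_db = y_predicted - y

print(torch.allclose(w.grad, expected_dw, atol=1e-4))
print(torch.allclose(b.grad, expected_db, atol=1e-4))
```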


### Be aware of Gradient Accumulation

In [54]:
x = torch.tensor(2.0, requires_grad=True)
x

tensor(2., requires_grad=True)

In [62]:
y = x**2
y

tensor(4., grad_fn=<PowBackward0>)

In [63]:
y.backward()

In [64]:
x.grad

tensor(4.)

After the gradient has been computed once, running another forward pass followed by another backward pass adds the new gradient to the value already stored in `x.grad` (it accumulates). So we need to clear the gradients before running the next forward pass.

In [61]:
x.grad.zero_()

tensor(0.)
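
The accumulation and the reset can be seen end to end in one short run. This is a minimal sketch; the names `accumulated` and `fresh` are illustrative:

```python
import torch

x = torch.tensor(2.0, requires_grad=True)

# Two forward/backward passes without clearing: the gradients add up.
for _ in range(2):
    y = x ** 2
    y.backward()
accumulated = x.grad.item()  # 4 + 4 = 8

# Clear the stored gradient in place, then run one more pass.
x.grad.zero_()
y = x ** 2
y.backward()
fresh = x.grad.item()  # 4

print(accumulated, fresh)
```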

In [None]:
# x.requires_grad_(False)
# use this when gradient tracking needs to be turned off
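
Besides `requires_grad_(False)`, PyTorch offers two other common ways to switch tracking off: the `torch.no_grad()` context manager and `detach()`. These are not used in the notebook above. The sketch below just shows all three side by side:

```python
import torch

x = torch.tensor(2.0, requires_grad=True)

# Option 1: temporarily disable tracking inside a block.
with torch.no_grad():
    y = x ** 2
print(y.requires_grad)  # False

# Option 2: detach() returns a new tensor that is cut off from the graph.
x_detached = x.detach()
print(x_detached.requires_grad)  # False

# Option 3: switch tracking off in place on the leaf tensor itself.
x.requires_grad_(False)
print(x.requires_grad)  # False
```

`torch.no_grad()` is the usual choice for inference or for parameter updates, since it leaves the tensor itself unchanged.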