In [77]:
import torch

In PyTorch, setting requires_grad=True on a tensor tells autograd to record every operation performed on it, so that gradients with respect to that tensor can be computed later. This is the basis of automatic differentiation (autograd).
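
As a small illustration of what "tracking" means, the sketch below compares a tensor created with requires_grad=True against a plain one. The names a, c, tracked and untracked are only for this example.

```python
import torch

# A leaf tensor created by the user; autograd records operations on it.
a = torch.tensor(2.0, requires_grad=True)
# A plain tensor; operations that involve only such tensors are not recorded.
c = torch.tensor(2.0)

tracked = a * 3      # has a grad_fn because one input requires grad
untracked = c * 3    # no input requires grad, so no graph is built

print(a.is_leaf, a.grad_fn)                      # leaf tensors have no grad_fn
print(tracked.requires_grad, tracked.grad_fn)
print(untracked.requires_grad, untracked.grad_fn)
```

The grad_fn attribute is how a result remembers which operation produced it; backward() later walks these links in reverse.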

In [78]:
x = torch.tensor(3.0, requires_grad=True)

In [79]:
y = x**2

In [80]:
x

tensor(3., requires_grad=True)

In [81]:
y

tensor(9., grad_fn=<PowBackward0>)

Calling y.backward() starts backpropagation in PyTorch's automatic differentiation engine (autograd). It computes the gradient of y with respect to every leaf tensor that has requires_grad=True and contributed to the computation of y. Here the only such tensor is x, so it computes dy/dx.
For y = x**2, the derivative dy/dx is 2x.

In [82]:
y.backward()

After y.backward() is called, x.grad becomes available and stores the computed gradient of y with respect to x.
Since x was 3.0, the gradient 2x evaluates to 2 * 3.0 = 6.0. This is why x.grad outputs tensor(6.).

In [83]:
x.grad

tensor(6.)

In [84]:
# New example: a composed function, z = sin(x**2)
x = torch.tensor(6.0, requires_grad=True)

In [85]:
y = x**2

In [86]:
z = torch.sin(y)

In [87]:
x

tensor(6., requires_grad=True)

In [88]:
y

tensor(36., grad_fn=<PowBackward0>)

In [89]:
z

tensor(-0.9918, grad_fn=<SinBackward0>)

In [90]:
z.backward()

In [91]:
x.grad

tensor(-1.5356)
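
The value above can be checked against the chain rule: for z = sin(x**2), dz/dx = cos(x**2) * 2x. The following is a minimal sketch of that check; the name analytic is introduced only for this example.

```python
import math
import torch

x = torch.tensor(6.0, requires_grad=True)
z = torch.sin(x**2)
z.backward()

# Chain rule by hand: dz/dx = cos(x^2) * 2x, evaluated at x = 6
analytic = math.cos(6.0**2) * 2 * 6.0

print(x.grad.item(), analytic)  # the two values should agree to float32 precision
```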

# **Ordinary life (the hard way)**
# **Manual chain rule**

In [92]:
#Input
x = torch.tensor(6.6) #Input feature
y = torch.tensor(0.0) #True Label (binary)

w = torch.tensor(1.0) #(weight)
b = torch.tensor(0.0) # bias (a float, so its dtype matches w)

In [93]:
# Binary cross-entropy loss for a scalar prediction
def binary_cross_entropy(prediction, target):
  epsilon = 1e-8 #to prevent log(0)
  prediction = torch.clamp(prediction, epsilon, 1-epsilon)
  return -(target*torch.log(prediction) + (1-target)*torch.log(1-prediction))

In [94]:
#Forward pass
z = w*x + b
y_pred = torch.sigmoid(z)
loss = binary_cross_entropy(y_pred, y)

In [95]:
loss

tensor(6.6013)

In [96]:
# Backpropagation

# 1. dL/d(y_pred)
d_loss_dy_pred = (y_pred - y) / (y_pred * (1 - y_pred))

# 2. dy_pred/dz
dy_pred_d_z = y_pred * (1 - y_pred)

# 3. dz/dw and dz/db
dz_d_w = x
dz_d_b = 1 # the bias is added directly to z, so dz/db = 1

# Chain rule
dL_dw = d_loss_dy_pred * dy_pred_d_z * dz_d_w
dL_db = d_loss_dy_pred * dy_pred_d_z * dz_d_b

In [97]:
print(f"Manual gradient of loss w.r.t. weight (dL/dw): {dL_dw}")
print(f"Manual gradient of loss w.r.t. bias (dL/db): {dL_db}")

Manual gradient of loss w.r.t. weight (dL/dw): 6.591033935546875
Manual gradient of loss w.r.t. bias (dL/db): 0.9986414909362793


# **Mentos life (the easy way)**

In [98]:
x = torch.tensor(6.7)
y = torch.tensor(0.0)


In [99]:
w = torch.tensor(1.0, requires_grad=True)
b = torch.tensor(0.0, requires_grad=True)

In [100]:
w

tensor(1., requires_grad=True)

In [101]:
b

tensor(0., requires_grad=True)

In [102]:
z = w*x + b
z

tensor(6.7000, grad_fn=<AddBackward0>)

In [103]:
y_pred = torch.sigmoid(z)
y_pred

tensor(0.9988, grad_fn=<SigmoidBackward0>)

In [104]:
loss = binary_cross_entropy(y_pred, y)
loss

tensor(6.7012, grad_fn=<NegBackward0>)

In [105]:
loss.backward()

In [106]:
x

tensor(6.7000)

In [107]:
print(w.grad)
print(b.grad)

tensor(6.6918)
tensor(0.9988)


# **Autograd also works for vectors**

In [136]:
x = torch.tensor([1.0,2.0,3.0], requires_grad=True)


In [109]:
y = (x**2).mean()
y

tensor(4.6667, grad_fn=<MeanBackward0>)

In [110]:
y.backward()

In [111]:
x.grad

tensor([0.6667, 1.3333, 2.0000])
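
These values follow from the derivative of the mean of squares: for y = (1/n) * sum(x_i**2), each component of the gradient is 2 * x_i / n. A minimal check, with expected as a name introduced only for this example:

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = (x**2).mean()
y.backward()

# Analytic gradient: dy/dx_i = 2 * x_i / n
expected = 2 * x.detach() / x.numel()

print(x.grad)
print(expected)
```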

In [112]:
"""
In PyTorch, gradients accumulate by default during backpropagation. This means that if you run backward() multiple times without clearing the gradients,
 the new gradients will be added to the existing ones. This is useful in some advanced scenarios,
 like accumulating gradients over mini-batches before taking a single optimization step.
 However, for most standard training loops, you want to calculate the gradients for the current pass only,
 so you need to reset them to zero before each new backward pass. If you don't clear them, the gradients from previous computations will interfere with the current ones,
 leading to incorrect updates of your model's parameters.
"""
x.grad.zero_()

tensor([0., 0., 0.])
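
To make the accumulation behaviour concrete, the sketch below calls backward() twice without zeroing, then once more after zero_(). The names first, accumulated and fresh are introduced only for this example.

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

(x**2).mean().backward()
first = x.grad.clone()          # gradient from a single backward pass

(x**2).mean().backward()        # second pass without clearing x.grad
accumulated = x.grad.clone()    # the new gradient was added to the old one

x.grad.zero_()                  # reset in place
(x**2).mean().backward()
fresh = x.grad.clone()          # matches the single-pass gradient again

print(first, accumulated, fresh)
```

In a training loop the same reset is usually done through the optimizer's zero_grad() method rather than on each tensor by hand.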

# **How to disable gradient tracking**
When you're making predictions with a trained model or evaluating its performance, you don't need to compute gradients. Disabling gradient tracking (torch.no_grad()) saves memory and speeds up computations because PyTorch doesn't build the computation graph. This is crucial for efficient deployment and evaluation.

In [113]:
#forward pass
x= torch.tensor(2.0, requires_grad=True)


In [114]:
x

tensor(2., requires_grad=True)

In [115]:
y = x ** 2
y

tensor(4., grad_fn=<PowBackward0>)

In [116]:
#backward pass
y.backward()

In [117]:
x.grad

tensor(4.)

In [118]:
# Disabling gradient tracking

# Option 1: requires_grad_(False)
x.requires_grad_(False)

tensor(2.)

In [119]:
x

tensor(2.)

In [120]:
y = x ** 2

In [121]:
y

tensor(4.)

In [122]:
# This now raises an error, because gradient tracking was disabled on x
y.backward()

RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

In [123]:
# Option 2: detach()
x= torch.tensor(2.0, requires_grad=True)

The line z = x.detach() creates a new tensor z that holds the same value as x but is detached from the computation graph. It is not an independent copy: z shares its underlying storage with x. z has requires_grad=False even though x has requires_grad=True, so operations performed on z are not tracked. This is useful when you want to use a tensor's value without recording further computation history or contributing to gradient calculation.

In [124]:
z = x.detach()
z

tensor(2.)
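
One detail worth seeing in code is that detach() returns a tensor sharing memory with the original, so in-place edits on the detached tensor show up in the original too. The sketch below uses a small vector and the name independent purely for illustration.

```python
import torch

x = torch.tensor([2.0, 3.0], requires_grad=True)
z = x.detach()

print(z.requires_grad)                # detached tensors do not require grad
print(z.data_ptr() == x.data_ptr())   # both point at the same memory

z[0] = 10.0    # in-place edit through the detached tensor...
print(x)       # ...is visible through x as well

# For a truly independent copy, clone after detaching
independent = x.detach().clone()
print(independent.data_ptr() == x.data_ptr())
```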

In [125]:
y = x**3

In [126]:
y

tensor(8., grad_fn=<PowBackward0>)

In [127]:
y1 = z **2

In [128]:
y1

tensor(4.)

In [129]:
y1.backward()

RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

y1.backward() fails because y1 was computed from z, which is detached from the computation graph, so y1 has no grad_fn to backpropagate through.

In [130]:
#Option 3. torch.no_grad()

In [132]:
x = torch.tensor(5.0, requires_grad=True)
x

tensor(5., requires_grad=True)

In [134]:
with torch.no_grad(): # without this block, y would be tracked and y.backward() would work
  y = x ** 2
y

tensor(25.)

In [135]:
y.backward()

RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
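
Unlike requires_grad_(False), torch.no_grad() does not change x itself; it only suspends tracking for operations inside the block. A minimal sketch, with y_eval and y_train as names introduced only for this example:

```python
import torch

x = torch.tensor(5.0, requires_grad=True)

with torch.no_grad():
    y_eval = x**2      # computed without building a graph

y_train = x**2         # outside the block, tracking is active again
y_train.backward()

print(x.requires_grad)        # x still requires grad
print(y_eval.requires_grad)   # the result from inside the block does not
print(x.grad)                 # dy/dx = 2x evaluated at x = 5
```

This is why no_grad() is convenient for evaluation: the same parameters can be used for training again afterwards without re-enabling anything.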