##### **Autograd:**
We need **autograd** because it automatically computes gradients (derivatives), which are essential for training neural networks using backpropagation.


**General Steps of Using Autograd in PyTorch**

1. **Create tensors with gradient tracking**

   ```python
   x = torch.tensor(value, requires_grad=True)
   ```

   * Tensors with `requires_grad=True` are tracked by autograd. Only floating-point (or complex) tensors can require gradients, so write `3.0`, not `3`.

2. **Define the computation / forward pass**

   ```python
   y = some_function(x)
   ```

   * PyTorch builds a **computational graph** connecting inputs to outputs.

3. **Compute gradients with `.backward()`**

   ```python
   y.backward()
   ```

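   * Applies the **chain rule** automatically, walking the graph backward from `y` to its inputs.
   * For scalar outputs, you can call `.backward()` directly; for a non-scalar output, either reduce it to a scalar first (e.g. `y.sum()`) or pass a `gradient` tensor of the same shape.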

4. **Access the gradients**

   ```python
   x.grad
   ```

   * The gradient of the output with respect to each leaf tensor that has `requires_grad=True` is stored in that tensor's `.grad` attribute.

5. **Use gradients for optimization (optional)**

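   * Typically, parameters are updated by an optimizer such as `torch.optim.SGD`. Done by hand, the update must happen in place inside `torch.no_grad()` so that autograd does not record it:

     ```python
     with torch.no_grad():
         x -= learning_rate * x.grad
     ```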
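The five steps above can be combined into a small gradient-descent loop. A minimal sketch, assuming a toy objective f(x) = (x - 3)^2, a starting point of 0 and a learning rate of 0.1 (all chosen only for illustration):

```python
import torch

# Step 1: a tensor tracked by autograd (starting point is arbitrary)
x = torch.tensor(0.0, requires_grad=True)
learning_rate = 0.1  # assumed value for this sketch

for step in range(50):
    loss = (x - 3) ** 2              # Step 2: forward pass (toy objective)
    loss.backward()                  # Step 3: compute d(loss)/dx
    with torch.no_grad():            # Step 5: update without recording it in the graph
        x -= learning_rate * x.grad  # Step 4: read the gradient from x.grad
    x.grad.zero_()                   # clear the gradient so the next step starts fresh

print(x.item())  # approaches the minimum at x = 3
```

Each step shrinks the distance to 3 by a constant factor, so after 50 steps `x` is very close to the minimum.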

* Example without using autograd:

```python
# Derivative of y = x^2, worked out by hand: dy/dx = 2x
def dy_dx(x):
    return 2 * x

print(dy_dx(3))
# Output: 6
```


* Using Autograd

```python
import torch

x = torch.tensor(3.0, requires_grad=True)
y = x ** 2      # forward pass: y = x^2
y.backward()    # autograd computes dy/dx = 2x
print(x.grad)
# Output: tensor(6.)
```
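To check that autograd agrees with the hand-derived formula at more than one point, a short sketch that evaluates both for a few arbitrary inputs:

```python
import torch

def dy_dx(x):
    return 2 * x  # hand-derived derivative of y = x**2

results = []
for value in [-2.0, 0.5, 3.0]:
    x = torch.tensor(value, requires_grad=True)  # new leaf each time, so no accumulation
    y = x ** 2
    y.backward()                                 # autograd computes dy/dx
    results.append((value, x.grad.item(), dy_dx(value)))
    print(f"x = {value}: autograd {x.grad.item()}, manual {dy_dx(value)}")
```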


* Example without using autograd:

```python
import math

# Chain rule by hand: y = x^2 and z = sin(y), so dz/dx = cos(x^2) * 2x
def dz_dx(x):
    return 2 * x * math.cos(x ** 2)

print(dz_dx(4))
# Output: -7.661275842587077
```

* Using Autograd

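```python
x = torch.tensor(4.0, requires_grad=True)
y = x ** 2
z = torch.sin(y)
z.backward()  # backpropagate from z through sin and square down to x
print(x.grad, y.grad)
# Output: tensor(-7.6613) None

# Graph: z <- sin <- y <- square <- x
# Only leaf tensors (here x) keep .grad by default, so y.grad is None.
```

Reading `.grad` on a non-leaf tensor such as `y` also makes PyTorch emit a `UserWarning`.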
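If the intermediate gradient dz/dy is needed, `Tensor.retain_grad()` asks autograd to keep it on a non-leaf tensor. A minimal sketch of the same sin(x^2) example:

```python
import torch

x = torch.tensor(4.0, requires_grad=True)
y = x ** 2
y.retain_grad()     # keep dz/dy on this non-leaf tensor (must be called before backward)
z = torch.sin(y)
z.backward()

print(x.grad)  # dz/dx = cos(x^2) * 2x
print(y.grad)  # dz/dy = cos(y) = cos(16)
```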


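* Manual backpropagation through a binary cross-entropy (`BCE`) loss

`torch.clamp(input, min, max)` limits every element to the range [`min`, `max`]; the loss function below relies on it to keep predictions away from exactly 0 and 1.

```python
import torch
x = torch.tensor([2, 1, 4, 9, 8, 6, 7])
x = torch.clamp(x, min=3, max=7)  # values below 3 become 3, values above 7 become 7
print(x)
# Output: tensor([3, 3, 4, 7, 7, 6, 7])
```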


```python
import torch

x = torch.tensor(6.7)  # input feature
y = torch.tensor(0.0)  # true label

w = torch.tensor(1.0)  # weight
b = torch.tensor(0.0)  # bias
```

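```python
def binary_cross_entropy(prediction, target):
    # Clamp predictions into [epsilon, 1 - epsilon] to avoid log(0).
    # Caveat: in float32, 1 - 1e-8 rounds to 1.0, so the upper bound
    # does not protect log(1 - prediction) when prediction is exactly 1.0.
    epsilon = 1e-8
    prediction = torch.clamp(prediction, epsilon, 1 - epsilon)
    return -(target * torch.log(prediction) + (1 - target) * torch.log(1 - prediction))
```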

```python
# Forward pass
z = w * x + b                          # linear model
y_pred = torch.sigmoid(z)              # predicted probability of class 1
loss = binary_cross_entropy(y_pred, y)
print(loss)
# Output: tensor(6.7012)
```
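PyTorch also provides built-in versions of this loss in `torch.nn.functional`, which can serve as a cross-check for the hand-written `binary_cross_entropy`. A sketch, assuming the same values for `x`, `y`, `w` and `b`; `binary_cross_entropy_with_logits` is an alternative not used in these notes that takes the raw score `z` and applies the sigmoid internally in a numerically stable way:

```python
import torch
import torch.nn.functional as F

x = torch.tensor(6.7)  # input feature
y = torch.tensor(0.0)  # true label
w = torch.tensor(1.0)
b = torch.tensor(0.0)

z = w * x + b
y_pred = torch.sigmoid(z)

# Built-in BCE on probabilities (same formula as the hand-written version)
loss_prob = F.binary_cross_entropy(y_pred, y)

# Built-in BCE on the raw score z (sigmoid applied inside the loss)
loss_logits = F.binary_cross_entropy_with_logits(z, y)

print(loss_prob, loss_logits)
```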


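```python
# Chain rule, one local derivative at a time
dl_dypred = (y_pred - y) / (y_pred * (1 - y_pred))  # dL/dy_pred for BCE
dypred_dz = y_pred * (1 - y_pred)                   # derivative of the sigmoid
dz_dw = x                                           # from z = w*x + b
dz_db = 1
dL_dw = dl_dypred * dypred_dz * dz_dw
dL_db = dl_dypred * dypred_dz * dz_db
```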

```python
print(f"Manual Gradient of loss w.r.t weight (dw): {dL_dw}")
print(f"Manual Gradient of loss w.r.t bias (db): {dL_db}")
# Output:
# Manual Gradient of loss w.r.t weight (dw): 6.691762447357178
# Manual Gradient of loss w.r.t bias (db): 0.998770534992218
```
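The chain-rule product in the manual code collapses to a simple expression, because the sigmoid derivative cancels the denominator of the BCE derivative. Writing $\hat y$ for `y_pred`:

$$
L = -\big(y\log\hat y + (1-y)\log(1-\hat y)\big)
$$

$$
\frac{\partial L}{\partial \hat y} = -\frac{y}{\hat y} + \frac{1-y}{1-\hat y} = \frac{\hat y - y}{\hat y(1-\hat y)},\qquad
\frac{\partial \hat y}{\partial z} = \hat y(1-\hat y),\qquad
\frac{\partial z}{\partial w} = x,\qquad
\frac{\partial z}{\partial b} = 1
$$

$$
\frac{\partial L}{\partial w} = (\hat y - y)\,x \approx 0.99877 \times 6.7 \approx 6.6918,\qquad
\frac{\partial L}{\partial b} = \hat y - y \approx 0.99877
$$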


* Using Autograd

```python
x = torch.tensor(6.7)
y = torch.tensor(0.0)

# The parameters are now tracked by autograd
w_new = torch.tensor(1.0, requires_grad=True)
b_new = torch.tensor(0.0, requires_grad=True)
```

```python
z = w_new * x + b_new     # linear model on the input x
y_pred = torch.sigmoid(z)
print(y_pred)
# Output: tensor(0.9988, grad_fn=<SigmoidBackward0>)
```


```python
loss = binary_cross_entropy(y_pred, y)
print(loss)
# Output: tensor(6.7012, grad_fn=<NegBackward0>)
```


```python
loss.backward()  # fills w_new.grad and b_new.grad
print(w_new.grad, b_new.grad)
# Output: tensor(6.6918) tensor(0.9988)
```

These match the manual gradients computed above.
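Step 5 of the general recipe notes that an optimizer usually applies the gradients. A minimal sketch of a few training steps on the same single example with `torch.optim.SGD`; the learning rate of 0.1 and the three steps are arbitrary choices, and the built-in `torch.nn.functional.binary_cross_entropy` stands in for the hand-written loss:

```python
import torch
import torch.nn.functional as F

x = torch.tensor(6.7)  # input feature
y = torch.tensor(0.0)  # true label
w = torch.tensor(1.0, requires_grad=True)
b = torch.tensor(0.0, requires_grad=True)

optimizer = torch.optim.SGD([w, b], lr=0.1)  # assumed learning rate

losses = []
for step in range(3):
    optimizer.zero_grad()                  # clear gradients from the previous step
    y_pred = torch.sigmoid(w * x + b)      # forward pass
    loss = F.binary_cross_entropy(y_pred, y)
    loss.backward()                        # fills w.grad and b.grad
    optimizer.step()                       # w -= lr * w.grad, b -= lr * b.grad
    losses.append(loss.item())
    print(f"step {step}: loss = {loss.item():.4f}")
```

The loss should fall from about 6.70 on the first step, since each update moves `w` and `b` against their gradients.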


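* Autograd with a vector input

Here `y` is the mean of the squared elements, so it is a scalar and `.backward()` works directly. With n = 3 elements, dy/dx_i = 2x_i / 3.

```python
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = (x ** 2).mean()  # scalar output
y.backward()
print(x.grad)
# Output: tensor([0.6667, 1.3333, 2.0000])
```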
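When the output is not a scalar, `.backward()` needs a `gradient` argument of the same shape (the vector in the vector-Jacobian product). A minimal sketch, passing a tensor of ones, which gives the same result as calling `y.sum().backward()`:

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = x ** 2  # non-scalar output with shape (3,)

# Calling y.backward() with no argument would raise a RuntimeError here,
# because y has more than one element.
y.backward(gradient=torch.ones_like(y))
print(x.grad)  # dy_i/dx_i = 2 * x_i -> tensor([2., 4., 6.])
```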

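* Clearing gradients

Gradients accumulate: every `.backward()` call adds to the existing `.grad` instead of overwriting it, so `.grad` has to be reset before the next forward/backward pass.

```python
x = torch.tensor(2.0, requires_grad=True)

y = x ** 2
y.backward()

# Re-running the two lines above would add another 4 to x.grad each time
print(x.grad)
# Output: tensor(4.)

x.grad.zero_()  # clear the gradient in place (returns the zeroed tensor)
# Output: tensor(0.)
```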
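Running the forward and backward pass several times without clearing makes the accumulation visible. A short sketch:

```python
import torch

x = torch.tensor(2.0, requires_grad=True)

history = []
for _ in range(3):
    y = x ** 2        # fresh forward pass each time
    y.backward()      # adds dy/dx = 4 to x.grad
    history.append(x.grad.item())

print(history)        # [4.0, 8.0, 12.0]

x.grad.zero_()        # reset before the next real update
print(x.grad)         # tensor(0.)
```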

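* When don't we need gradient tracking?

When making predictions (inference): no `.backward()` call follows, so building the computational graph only wastes memory and computation.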
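PyTorch offers several ways to skip gradient tracking. A sketch of three common ones (`torch.no_grad()`, `Tensor.detach()` and `Tensor.requires_grad_(False)`); which one fits depends on whether tracking should stop for a block of code, for a single result, or for a tensor permanently:

```python
import torch

x = torch.tensor(2.0, requires_grad=True)

# Option 1: context manager, nothing computed inside is recorded
with torch.no_grad():
    y = x ** 2
print(y.requires_grad)  # False

# Option 2: detach one result from the graph
z = (x ** 2).detach()
print(z.requires_grad)  # False

# Option 3: switch tracking off on the tensor itself
x.requires_grad_(False)
w = x ** 2
print(w.requires_grad)  # False
```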