# **PyTorch Autograd**

Autograd is PyTorch's automatic differentiation engine. It records the operations performed on tensors that have `requires_grad=True` and builds a computation graph dynamically as the code runs. During backpropagation, this graph is used to compute the gradients that optimizers need to update model parameters.

### **How Does It Work?**
1. **Computation Graph:**
   - When you perform operations on tensors, autograd dynamically builds a directed acyclic graph (DAG) where nodes represent operations and edges represent the flow of data.
   - This graph allows autograd to trace how each tensor is derived from others.

2. **Backward Pass:**
   - When you call `.backward()` on a tensor (typically a scalar loss), autograd traverses the graph in reverse order (hence "backpropagation") and applies the chain rule to compute gradients for the leaf tensors that have `requires_grad=True`.

3. **Gradient Storage:**
   - Gradients are accumulated in the `.grad` attribute of the corresponding leaf tensor.
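
The three steps above can be sketched in a few lines. This is a minimal illustration, not part of the original notebook; the tensor names `a`, `b`, and `c` are chosen only for this example.

```python
import torch

# Leaf tensor created by the user; autograd tracks operations on it.
a = torch.tensor(2.0, requires_grad=True)

# Each operation adds a node to the graph and records it in grad_fn.
b = a * 3      # b = 3a
c = b ** 2     # c = (3a)^2 = 9a^2

print(a.is_leaf, a.grad_fn)   # leaf tensors have no grad_fn
print(b.is_leaf, b.grad_fn)   # b was produced by a multiplication node
print(c.grad_fn)              # c was produced by a power node

# Backward pass: dc/da = 18a, which is 36 at a = 2.
c.backward()
print(a.grad)                 # gradient stored on the leaf tensor
```

Only `a` receives a `.grad` value here, because it is the only leaf tensor with `requires_grad=True`; `b` and `c` are intermediate results.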

## **Import Dependencies**

In [1]:
import numpy as np
import matplotlib.pyplot as plt
import torch

import warnings
warnings.filterwarnings('ignore')

## **Calculate Gradient Manually**

**Example-1:**<br>
We want to calculate the gradient of the function:

$$ y = x^2 $$

The derivative of \( y \) with respect to \( x \) is:

$$ \frac{\partial y}{\partial x} = 2x $$

Using Python, we can compute the gradient at a specific value of \( x \) with the following code:


In [3]:
# Function to calculate the gradient of y = x^2
def dy_dx(x):
    """
    Calculate the derivative of y = x^2 with respect to x.

    Parameters:
        x (float or int): The value of x at which the gradient is evaluated.

    Returns:
        float: The gradient (2 * x).
    """
    return 2 * x

# Example usage
x = 3
gradient = dy_dx(x)
print(f"The gradient of y = x^2 at x = {x} is {gradient}.")

The gradient of y = x^2 at x = 3 is 6.


**Example-2:**<br>
We want to calculate the gradient of the function:
$$ y = x^2 $$
$$ z = \sin {y} $$

The derivative of \( z \) with respect to \( x \) is:

$$ \frac{\partial z}{\partial x} = \frac{\partial z}{\partial y} \cdot \frac{\partial y}{\partial x} $$
$$ \frac{\partial z}{\partial x} = \cos(y) \cdot 2x $$
$$ \frac{\partial z}{\partial x} = \cos(x^2) \cdot 2x $$

Using Python, we can compute the gradient at a specific value of \( x \) with the following code:

In [3]:
# Function to calculate the gradient of z
import math

def dz_dx(x):
    """
    Calculate the derivative of z with respect to x.

    Parameters:
        x (float or int): The value of x at which the gradient is evaluated.

    Returns:
        float: The gradient.
    """
    return math.cos(x**2) * (2 * x)

# Example usage
x = 3
gradient = dz_dx(x)
print(f"The gradient of z at x = {x} is {gradient:.2f}.")

The gradient of z at x = 3 is -5.47.
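
One way to sanity-check hand-derived gradients like the two above is to compare them with a numerical approximation. The sketch below uses a central finite difference; the helper name `central_difference` and the step size `h` are assumptions made for this illustration and do not appear in the original notebook.

```python
import math

def central_difference(f, x, h=1e-5):
    """Approximate f'(x) with a symmetric finite difference."""
    return (f(x + h) - f(x - h)) / (2 * h)

x = 3.0

# Example-1: y = x^2, analytic derivative 2x
numeric_dy = central_difference(lambda t: t ** 2, x)
analytic_dy = 2 * x

# Example-2: z = sin(x^2), analytic derivative cos(x^2) * 2x
numeric_dz = central_difference(lambda t: math.sin(t ** 2), x)
analytic_dz = math.cos(x ** 2) * 2 * x

print(f"dy/dx: numeric={numeric_dy:.4f}, analytic={analytic_dy:.4f}")
print(f"dz/dx: numeric={numeric_dz:.4f}, analytic={analytic_dz:.4f}")
```

The numeric and analytic values should agree to several decimal places, which suggests the chain-rule derivation is correct.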


## **Calculate Gradient using PyTorch**

In [4]:
# Example-1
# Define a tensor with gradient tracking enabled
x = torch.tensor(3.0, requires_grad=True)

# Define the function y = x^2
y = x**2

# Print the values of x and y
print("x:", x)
print("y:", y)

# Perform backpropagation to compute the gradient
y.backward()

# Print the gradient of y with respect to x
print("Gradient (dy/dx):", x.grad)

x: tensor(3., requires_grad=True)
y: tensor(9., grad_fn=<PowBackward0>)
Gradient (dy/dx): tensor(6.)


In [5]:
# Example-2
# Define a tensor with gradient tracking enabled
x = torch.tensor(3.0, requires_grad=True)

# Define the function y = x^2
y = x**2

# Define the function z = sin(y)
z = torch.sin(y)

# Print the values of x, y, and z
print("x:", x)
print("y:", y)
print("z:", z)

# Perform backpropagation to compute the gradient
z.backward()

# Print the gradient of z with respect to x
print("Gradient (dz/dx):", x.grad)

x: tensor(3., requires_grad=True)
y: tensor(9., grad_fn=<PowBackward0>)
z: tensor(0.4121, grad_fn=<SinBackward0>)
Gradient (dz/dx): tensor(-5.4668)


## **Manual Gradient of Loss Calculation w.r.t Weight and Bias**

1. Linear Transformation:
$$ z = w \cdot x + b $$
2. Activation:
$$ y_{pred} = σ(z) = \frac{1}{1 + e^{-z}} $$
3. Loss Function (Binary Cross-Entropy Loss):
$$ L = -[y_{target} \cdot \ln(y_{pred}) + (1 - y_{target}) \cdot \ln(1 - y_{pred})] $$


In [6]:
# Inputs
x = torch.tensor(6.7) # Input feature
y = torch.tensor(0.0) # True Label (Binary)

w = torch.tensor(1.0) # Weight
b = torch.tensor(0.0) # Bias

In [7]:
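# Binary Cross-Entropy Loss for a scalar prediction
def binary_cross_entropy_loss(prediction, target):
    # 1e-7 rather than 1e-8: in float32, 1 - 1e-8 rounds to exactly 1.0,
    # which would leave the upper clamp ineffective
    epsilon = 1e-7
    # Clamp the prediction away from 0 and 1 so that log() stays finite
    prediction = torch.clamp(prediction, epsilon, 1 - epsilon)
    return -(target * torch.log(prediction) + (1 - target) * torch.log(1 - prediction))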

In [10]:
# Forward pass
z = w * x + b # Weighted sum (Linear Transformation)
y_pred = torch.sigmoid(z) # Predicted Probability (Activation)

# Compute binary cross-entropy loss
loss = binary_cross_entropy_loss(y_pred, y)
print(loss)

tensor(6.7012)


In [12]:
# Derivatives:
# 1. dL/d(y_pred): Loss with respect to the prediction (y_pred)
dloss_dy_pred = (y_pred - y) / (y_pred * (1 - y_pred))

# 2. d(y_pred)/dz: Prediction (y_pred) with respect to z (sigmoid derivative)
dy_pred_dz = y_pred * (1 - y_pred)

# 3. dz/dw and dz/db: z with respect to w and b
dz_dw = x # dz/dw = x
dz_db = 1 # dz/db = 1 (bias contributes directly to z)

dL_dw = dloss_dy_pred * dy_pred_dz * dz_dw
dL_db = dloss_dy_pred * dy_pred_dz * dz_db
print(f"Manual Gradient of loss w.r.t weight (dw): {dL_dw:.4f}")
print(f"Manual Gradient of loss w.r.t bias (db): {dL_db:.4f}")

Manual Gradient of loss w.r.t weight (dw): 6.6918
Manual Gradient of loss w.r.t bias (db): 0.9988
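
The three factors multiplied in the cell above follow from the chain rule. As a worked sketch based on the definitions of \( z \), \( y_{pred} \), and \( L \) given earlier:

$$ \frac{\partial L}{\partial w} = \frac{\partial L}{\partial y_{pred}} \cdot \frac{\partial y_{pred}}{\partial z} \cdot \frac{\partial z}{\partial w} $$

$$ \frac{\partial L}{\partial y_{pred}} = \frac{y_{pred} - y_{target}}{y_{pred} (1 - y_{pred})}, \qquad \frac{\partial y_{pred}}{\partial z} = y_{pred} (1 - y_{pred}), \qquad \frac{\partial z}{\partial w} = x, \qquad \frac{\partial z}{\partial b} = 1 $$

The factor \( y_{pred} (1 - y_{pred}) \) cancels, leaving:

$$ \frac{\partial L}{\partial w} = (y_{pred} - y_{target}) \cdot x, \qquad \frac{\partial L}{\partial b} = y_{pred} - y_{target} $$

With \( x = 6.7 \), \( y_{target} = 0 \), and \( y_{pred} \approx 0.9988 \), this gives approximately 6.6918 and 0.9988, matching the printed values.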


## **Automatic Gradient of Loss Calculation w.r.t Weight and Bias using Autograd**

In [14]:
# Inputs
x = torch.tensor(6.7) # Input feature
y = torch.tensor(0.0) # True Label (Binary)

w = torch.tensor(1.0, requires_grad=True) # Weight
b = torch.tensor(0.0, requires_grad=True) # Bias

In [15]:
# Forward pass
z = w * x + b # Weighted sum (Linear Transformation)
y_pred = torch.sigmoid(z) # Predicted Probability (Activation)

# Compute binary cross-entropy loss
loss = binary_cross_entropy_loss(y_pred, y)
print(loss)

tensor(6.7012, grad_fn=<NegBackward0>)


In [16]:
loss.backward()

print(w.grad)
print(b.grad)

tensor(6.6918)
tensor(0.9988)
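
To confirm that the autograd results match the manual calculation, the two can be compared programmatically. The following is a self-contained sketch rather than a cell from the original notebook: it writes the loss inline instead of calling `binary_cross_entropy_loss`, and the names `manual_dw` and `manual_db` are introduced only for this comparison.

```python
import torch

x = torch.tensor(6.7)                       # input feature
y = torch.tensor(0.0)                       # true label
w = torch.tensor(1.0, requires_grad=True)   # weight
b = torch.tensor(0.0, requires_grad=True)   # bias

# Forward pass: linear transformation, sigmoid, binary cross-entropy
y_pred = torch.sigmoid(w * x + b)
loss = -(y * torch.log(y_pred) + (1 - y) * torch.log(1 - y_pred))

# Gradients from autograd
loss.backward()

# Closed-form gradients from the chain rule: (y_pred - y) * x and (y_pred - y)
with torch.no_grad():
    manual_dw = (y_pred - y) * x
    manual_db = y_pred - y

print(torch.allclose(w.grad, manual_dw, atol=1e-4))
print(torch.allclose(b.grad, manual_db, atol=1e-4))
```

Both comparisons should print `True`, up to floating-point tolerance.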


## **Calculate Gradients for Multiple Inputs**

In [38]:
# Create a PyTorch tensor with multiple inputs
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
print(x)
y = (x ** 2).mean()
print(y)

tensor([1., 2., 3.], requires_grad=True)
tensor(4.6667, grad_fn=<MeanBackward0>)


In [39]:
y.backward()
x.grad

tensor([0.6667, 1.3333, 2.0000])
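
These values can be checked by hand. Since `y` is the mean of the squared inputs, with \( n = 3 \) elements:

$$ y = \frac{1}{n} \sum_{i=1}^{n} x_i^2 \quad \Rightarrow \quad \frac{\partial y}{\partial x_i} = \frac{2 x_i}{n} $$

For \( x = [1, 2, 3] \), this gives \( [2/3, 4/3, 2] \), which matches the output above.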

## **Clearing Gradients**
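PyTorch accumulates gradients: each call to `.backward()` adds to the existing `.grad` value instead of overwriting it, so gradients must be cleared between backward passes. When using an optimizer, call `optimizer.zero_grad()`. When managing tensors manually, either zero the gradient in place with `.grad.zero_()`, as in the cell below, or assign `None` to the `.grad` attribute.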

In [40]:
x = torch.tensor(6.0, requires_grad=True)
y = (x ** 2)

In [41]:
y.backward()
print(x.grad)
x.grad.zero_()

tensor(12.)


tensor(0.)
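
The need for clearing becomes visible when `.backward()` is called more than once. The sketch below is an illustration added for this purpose; the variable names `first`, `accumulated`, and `after_reset` are hypothetical and not from the original notebook.

```python
import torch

x = torch.tensor(6.0, requires_grad=True)

# First backward pass: dy/dx = 2x = 12
(x ** 2).backward()
first = x.grad.clone()

# Second backward pass without clearing: the new gradient is added to the old one
(x ** 2).backward()
accumulated = x.grad.clone()

# Reset by assigning None, then run again to recover the plain gradient
x.grad = None
(x ** 2).backward()
after_reset = x.grad.clone()

print(first, accumulated, after_reset)  # tensor(12.) tensor(24.) tensor(12.)
```

Without the reset, every additional backward pass would add another 12 to `x.grad`.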

## **Disable Gradient Tracking**
In PyTorch, you can disable gradient tracking when gradients are not needed, such as during inference or evaluation, to reduce memory use and computation. This is done using the `torch.no_grad()` context manager or by setting `requires_grad=False` for specific tensors.

In [42]:
# Create tensors with gradient tracking enabled
x = torch.tensor(6.7, requires_grad=True)
w = torch.tensor(1.0, requires_grad=True)
b = torch.tensor(0.0, requires_grad=True)

# Perform operations without gradient tracking
with torch.no_grad():
    z = w * x + b  # No gradients will be tracked for this operation
    y_pred = torch.sigmoid(z)

print(f"z: {z}")
print(f"y_pred: {y_pred}")

# Verify that gradients are not tracked
print(f"Requires Grad (z): {z.requires_grad}")  # Output: False

z: 6.699999809265137
y_pred: 0.998770534992218
Requires Grad (z): False
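
The second option, turning off tracking for a specific tensor, can be sketched with the in-place method `requires_grad_()`. This example is an addition for illustration and reuses the tensor names from the cell above.

```python
import torch

w = torch.tensor(1.0, requires_grad=True)
x = torch.tensor(6.7)

# Freeze w in place; later operations on it are no longer tracked.
w.requires_grad_(False)
z = w * x

print(w.requires_grad)  # False
print(z.requires_grad)  # False
print(z.grad_fn)        # None
```

Unlike `torch.no_grad()`, which affects every operation inside the block, this approach only stops tracking for the chosen tensor.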
