In [6]:
import torch
import matplotlib.pyplot as plt

1. Automatic Differentiation with torch.autograd

    Forward Propagation:
        Computes the model's output by passing the input data through the network layers. It is often called the forward pass.

    Backward Propagation:
        Calculates the gradients of the loss with respect to the model's parameters using the chain rule, enabling parameter updates to minimize the loss.
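
The two passes can be sketched on a toy problem. This is a minimal illustration, not part of the original notebook: the one-parameter model w * x_in and the names w, x_in, target, pred and loss are assumptions chosen for the example.

```python
import torch

# Hypothetical one-parameter model: prediction = w * x_in
w = torch.tensor(1.5, requires_grad=True)
x_in = torch.tensor(2.0)
target = torch.tensor(4.0)

# Forward pass: compute the output and the loss
pred = w * x_in                 # 1.5 * 2.0 = 3.0
loss = (pred - target) ** 2     # (3.0 - 4.0)^2 = 1.0

# Backward pass: chain rule gives d(loss)/dw = 2 * (pred - target) * x_in
loss.backward()
print(w.grad)  # tensor(-4.)
```

A negative gradient means that increasing w would reduce the loss, which is the information an optimizer uses for the parameter update.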



1.1. torch.autograd

    We create two tensors x and y with requires_grad=True, indicating that we want to compute gradients for these tensors.
    We perform simple operations on x and y to obtain z.
    Computing Gradients: We call z.sum().backward() to compute the gradients with respect to x and y. backward() needs a scalar, so z is summed first. The gradients are stored in the grad attribute of each tensor.

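In the following example:

    The operation is z = x⋅y + y^2

    The partial derivative of z with respect to x is ∂z/∂x = y
    The partial derivative of z with respect to y is ∂z/∂y = x + 2y

Given x = 2.0 and y = 3.0:

    The gradient of z w.r.t. x is 3.0
    The gradient of z w.r.t. y is 2.0 + 2⋅3.0 = 8.0

Cell In [7] uses two-element tensors x = [2.0, 5.0] and y = [3.0, 7.0], so the values above are the first entry of each gradient; the second entries are 7.0 and 5.0 + 2⋅7.0 = 19.0.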

Tensors that require gradients will have their operations tracked by PyTorch's autograd engine, enabling the computation of gradients during backpropagation.
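
The scalar case worked out above (x = 2.0, y = 3.0) can be checked directly. This is a small sketch; the names x_s, y_s and z_s are chosen here only to avoid clashing with the notebook's x, y and z.

```python
import torch

x_s = torch.tensor(2.0, requires_grad=True)
y_s = torch.tensor(3.0, requires_grad=True)

z_s = x_s * y_s + y_s**2   # 2*3 + 3^2 = 15
z_s.backward()             # z_s is a scalar, so no reduction is needed

print(x_s.grad)  # tensor(3.)  -> dz/dx = y
print(y_s.grad)  # tensor(8.)  -> dz/dy = x + 2y
```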

In [7]:
# Create tensors with requires_grad=True
x = torch.tensor([2.0, 5.0], requires_grad=True)
y = torch.tensor([3.0, 7.0], requires_grad=True)

# Perform some operations
z = x * y + y**2

z.retain_grad()  # z is a non-leaf tensor; by default its .grad is not kept after backward()

# Compute the gradients
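z_sum = z.sum()   # backward() needs a scalar, so reduce z first
z_sum.backward()  # backward() returns None; gradients land in .grad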


print(f"Gradient of x: {x.grad}")
print(f"Gradient of y: {y.grad}")
print(f"Gradient of z: {z.grad}")
print(f"Result of the operation: z = {z.detach()}")

Gradient of x: tensor([3., 7.])
Gradient of y: tensor([ 8., 19.])
Gradient of z: tensor([1., 1.])
Result of the operation: z = tensor([15., 84.])
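
The effect of retain_grad() is easier to see side by side. The sketch below is an illustration with made-up tensors a, b and c: b is an intermediate tensor without retain_grad(), c is the same computation with it.

```python
import warnings
import torch

a = torch.tensor([2.0, 5.0], requires_grad=True)  # leaf tensor
b = a * 3                                          # non-leaf, gradient not retained
c = a * 3                                          # non-leaf ...
c.retain_grad()                                    # ... but autograd is asked to keep c.grad

(b.sum() + c.sum()).backward()

print(a.is_leaf, b.is_leaf)  # True False
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # reading .grad on a non-leaf tensor emits a UserWarning
    b_grad = b.grad
print(b_grad)  # None
print(c.grad)  # tensor([1., 1.])
print(a.grad)  # tensor([6., 6.])
```

The gradient of a sum with respect to each of its elements is 1, which is why c.grad (and z.grad in the cell above) is a tensor of ones.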


In [8]:
from torchviz import make_dot

# Visualize the computation graph
dot = make_dot(z, params={"x": x, "y": y, "z" : z})
dot.render("grad_computation_graph", format="png")

'grad_computation_graph.png'

1.3. Detaching Tensors from Computation Graph

The detach() method is used to create a new tensor that shares storage with the original tensor but without tracking operations. When you call detach(), it returns a new tensor that does not require gradients. This is useful when you want to perform operations on a tensor without affecting the computation graph.

In [9]:
# Let's detach z from the computation graph
print("Before detaching z from computation: ", z.requires_grad)
z_det = z.detach()
print("After detaching z from computation: ", z_det.requires_grad)

Before detaching z from computation:  True
After detaching z from computation:  False
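
Because detach() shares storage with the original tensor, an in-place change to the detached tensor is also visible through the original. A minimal sketch with illustrative names t, u and u_det:

```python
import torch

t = torch.tensor([1.0, 2.0], requires_grad=True)
u = t * 2            # tracked by autograd
u_det = u.detach()   # same underlying data, no graph attached

print(u_det.requires_grad)               # False
print(u_det.data_ptr() == u.data_ptr())  # True: both point at the same memory

# Writing into the detached tensor also changes what u sees
u_det[0] = 100.0
print(u[0].item())  # 100.0
```

If an independent copy is needed, u.detach().clone() allocates new memory instead.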


1.4. Can Backpropagation be performed when requires_grad=False?

Now the same tensors x and y are created with requires_grad=False.

When attempting to compute the gradients using z.backward(), a RuntimeError is raised because neither input requires gradients, so z itself does not require gradients and has no grad_fn.

Because requires_grad=False was used, autograd records no operations and no computation graph is built.


In [10]:
x = torch.tensor(2.0, requires_grad=False)
y = torch.tensor(3.0, requires_grad=False)


# Perform simple operations
z = x * y + y**2


# Compute the gradients
z.backward()

RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
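
One way to see the cause before the error appears is to inspect requires_grad and grad_fn on the result. The sketch below uses the illustrative names p, q and r and catches the exception so the rest of a notebook can keep running.

```python
import torch

p = torch.tensor(2.0)   # requires_grad defaults to False
q = torch.tensor(3.0)
r = p * q + q**2

print(r.requires_grad, r.grad_fn)  # False None

try:
    r.backward()
    raised = False
except RuntimeError as err:
    raised = True
    print(f"RuntimeError: {err}")
```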

2. Backpropagation in Neural Networks


    The loss is calculated between the prediction and the target using loss(prediction, target)
    Then backpropagation is performed using loss.backward()
    We update the weights using optimizer.step()
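
The three steps above can be sketched as a single training step. Everything specific here is an assumption for illustration: the nn.Linear model, the MSE loss, the SGD optimizer, the learning rate and the random data. The optimizer.zero_grad() call is not in the list above; it is added because PyTorch accumulates gradients across backward() calls.

```python
import torch
from torch import nn

torch.manual_seed(0)
model = nn.Linear(3, 1)                                   # hypothetical model
loss_fn = nn.MSELoss()                                    # hypothetical loss
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)   # hypothetical optimizer

inputs = torch.randn(8, 3)   # random stand-in data
target = torch.randn(8, 1)

weight_before = model.weight.detach().clone()

optimizer.zero_grad()                # clear gradients left from any previous step
prediction = model(inputs)           # forward pass
loss = loss_fn(prediction, target)   # loss between prediction and target
loss.backward()                      # backpropagation: fills .grad on each parameter
optimizer.step()                     # update the weights using those gradients

print(torch.equal(weight_before, model.weight.detach()))  # False: the weights moved
```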


Quiz questions and answers:

1. What do the .backward() method and the .grad attribute do?

- .backward() computes the gradients; .grad stores them.

2. What does the retain_graph=True argument in the backward() function do?

- It retains the computation graph after the backward pass for further backward calls.
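
A short sketch of this behaviour, using an illustrative tensor v. Note that gradients from repeated backward() calls accumulate in .grad.

```python
import torch

v = torch.tensor(2.0, requires_grad=True)
out = v ** 3                      # d(out)/dv = 3 * v^2 = 12

out.backward(retain_graph=True)   # keep the graph alive after this call
print(v.grad)                     # tensor(12.)

out.backward()                    # second call works because the graph was retained
print(v.grad)                     # tensor(24.)  (12 + 12, accumulated)
```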

3. In backpropagation, if you have a non-differentiable operation (like Dropout() or ReLU()) in the computation graph, how does PyTorch compute gradients for other operations?

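- Autograd propagates gradients through these operations using a mask. ReLU passes the gradient where its input was positive and outputs zero elsewhere (including at 0); dropout reuses the mask sampled in the forward pass, scaled by 1/(1-p). All other operations receive their gradients through the chain rule as usual.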
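
The masking can be observed on small tensors. This is a sketch with illustrative names h, d and dropped; the dropout input is all ones so that the gradient and the forward output coincide.

```python
import torch
import torch.nn.functional as F

# ReLU: gradient is 1 where the input was positive, 0 elsewhere (including at 0)
h = torch.tensor([-1.0, 0.0, 2.0], requires_grad=True)
F.relu(h).sum().backward()
print(h.grad)  # tensor([0., 0., 1.])

# Dropout: the backward pass reuses the mask sampled in the forward pass
torch.manual_seed(0)
d = torch.ones(6, requires_grad=True)
dropped = F.dropout(d, p=0.5, training=True)   # kept entries are scaled by 1/(1-p) = 2
dropped.sum().backward()
print(torch.equal(d.grad, dropped.detach()))   # True: same zeros, same scaling
```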

4. What will happen when calling .backward() on a tensor with more than one element?

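- PyTorch will raise a RuntimeError unless a gradient argument is passed to backward(), because gradients can be created implicitly only for scalar outputs (refer to section 1.1, where z is reduced to a scalar with z.sum(); see the variable z_sum)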
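
Both outcomes can be reproduced in a few lines. The names m and n are illustrative; passing gradient=torch.ones_like(n) is equivalent to calling n.sum().backward().

```python
import torch

m = torch.tensor([2.0, 5.0], requires_grad=True)
n = m ** 2                 # two elements, not a scalar

try:
    n.backward()           # no gradient argument: fails for a non-scalar output
    raised = False
except RuntimeError as err:
    raised = True
    print(f"RuntimeError: {err}")

# Supplying one weight per output element makes the call valid
n.backward(gradient=torch.ones_like(n))
print(m.grad)  # tensor([ 4., 10.])  -> dn/dm = 2m
```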

5.  import torch
    Y = torch.tensor([1.0], requires_grad=True)
    with torch.no_grad():
        new_tensor = Y * 2
        print(new_tensor.requires_grad, Y.requires_grad)

What will be the output of the above program?

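- False True (new_tensor is created inside torch.no_grad(), so it does not track gradients; Y keeps requires_grad=True)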