In [2]:
import torch
import matplotlib.pyplot as plt
import numpy as np

# 1. Automatic Differentiation with `torch.autograd`

Before working with autograd, let us define two basic terms:

- **Forward Propagation**:
  - Computes the model's output by passing the input data through the network layers. This is often called the forward pass.

- **Backward Propagation**:
  - Calculates the gradients of the loss with respect to the model's parameters using the chain rule, enabling parameter updates to minimize the loss.

### 1.1 `torch.autograd`



- We create two tensors `x` and `y` with `requires_grad=True`, indicating that we want to compute gradients for these tensors.



- We perform simple operations on `x` and `y` to obtain `z`.

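- We call `z.backward()` to compute the gradients of `z` with respect to `x` and `y`. The gradients are stored in the `.grad` attribute of each tensor. Called without arguments, `backward()` requires a scalar, so a non-scalar `z` is first reduced, for example with `z.sum()`.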




In the following example:


- The operation is $ z = x \cdot y + y^2 $.
- The partial derivative of $ z $ with respect to $ x $ is $ \frac{\partial z}{\partial x} = y $.
- The partial derivative of $ z $ with respect to $ y $ is $ \frac{\partial z}{\partial y} = x + 2y $.

Given $ x = 2.0 $ and $ y = 3.0 $:

- The gradient of $ z $ w.r.t. $ x $ is $ 3.0 $.
- The gradient of $ z $ w.r.t. $ y $ is $ 2.0 + 2 \cdot 3.0 = 8.0 $.

Tensors that require gradients will have their operations tracked by PyTorch's autograd engine, enabling the computation of gradients during backpropagation.
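The worked example above can be checked directly with scalar tensors. The following is a minimal sketch that assumes the same values, $x = 2.0$ and $y = 3.0$, and compares the gradients autograd returns with the hand-derived ones.

```python
import torch

# Scalar inputs matching the worked example (x = 2.0, y = 3.0)
x = torch.tensor(2.0, requires_grad=True)
y = torch.tensor(3.0, requires_grad=True)

z = x * y + y**2   # z = 2*3 + 3^2 = 15

z.backward()       # z is already a scalar, so no reduction is needed

print(f"z     = {z.item()}")        # 15.0
print(f"dz/dx = {x.grad.item()}")   # y = 3.0
print(f"dz/dy = {y.grad.item()}")   # x + 2y = 8.0
```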


<img src=https://learnopencv.com/wp-content/uploads/2024/07/Autograd-Computation-Graph-2-2.png height = 500>


The automatic differentiation provided by `torch.autograd` handles the chain rule calculations needed for backpropagation through the entire network, so the gradients do not have to be derived by hand.






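Writing the intermediate results as $p = x \cdot y$ and $q = y^2$, so that $z = p + q$, the chain rule gives the following.

For $\frac{\partial z}{\partial x}$: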

$$\frac{\partial z}{\partial x} = \frac{\partial z}{\partial p} \frac{\partial p}{\partial x} + \frac{\partial z}{\partial q} \frac{\partial q}{\partial x} = 1 \cdot y + 1 \cdot 0 = y$$

For $\frac{\partial z}{\partial y}$:

$$\frac{\partial z}{\partial y} = \frac{\partial z}{\partial p} \frac{\partial p}{\partial y} + \frac{\partial z}{\partial q} \frac{\partial q}{\partial y} = 1 \cdot x + 1 \cdot 2y = x + 2y$$

These equations correspond to the chain rule calculations happening behind the scenes, demonstrating how PyTorch's autograd system computes gradients through the computational graph.
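To see these terms individually, the intermediate nodes can be built explicitly. The sketch below assumes $p = x \cdot y$ and $q = y^2$ (consistent with the partial derivatives above) and uses `retain_grad()` to keep $\frac{\partial z}{\partial p}$ and $\frac{\partial z}{\partial q}$, then assembles the chain rule by hand and compares it with what autograd stores in `x.grad` and `y.grad`.

```python
import torch

x = torch.tensor(2.0, requires_grad=True)
y = torch.tensor(3.0, requires_grad=True)

# Intermediate nodes of the graph (p and q as assumed in the lead-in)
p = x * y      # dp/dx = y, dp/dy = x
q = y**2       # dq/dx = 0, dq/dy = 2y
z = p + q      # dz/dp = dz/dq = 1

# p and q are non-leaf tensors, so ask autograd to keep their gradients
p.retain_grad()
q.retain_grad()
z.backward()

# Assemble the chain rule by hand from the local derivatives
with torch.no_grad():
    manual_dz_dx = p.grad * y + q.grad * 0.0
    manual_dz_dy = p.grad * x + q.grad * 2 * y

print(f"dz/dp = {p.grad.item()}, dz/dq = {q.grad.item()}")
print(f"dz/dx: manual = {manual_dz_dx.item()}, autograd = {x.grad.item()}")
print(f"dz/dy: manual = {manual_dz_dy.item()}, autograd = {y.grad.item()}")
```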

In [6]:
# Create tensors with requires_grad=True
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = torch.tensor([[4.0], [5.0]], requires_grad=True)

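# Perform some operations; x (shape [3]) and y (shape [2, 1]) broadcast to shape [2, 3]
z = x * y + y**2

# z is a non-leaf tensor, so its .grad is only kept if we ask for it
z.retain_grad()

# backward() needs a scalar, so reduce z with sum() before computing the gradients
z_sum = z.sum()
z_sum.backward()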

print(z)
print(z_sum)
print(f"Gradient of x: {x.grad}")
print(f"Gradient of y: {y.grad}")
print(f"Gradient of z: {z.grad}")
print(f"Result of the operation: z = {z.detach()}")

tensor([[20., 24., 28.],
        [30., 35., 40.]], grad_fn=<AddBackward0>)
tensor(177., grad_fn=<SumBackward0>)
Gradient of x: tensor([9., 9., 9.])
Gradient of y: tensor([[30.],
        [36.]])
Gradient of z: tensor([[1., 1., 1.],
        [1., 1., 1.]])
Result of the operation: z = tensor([[20., 24., 28.],
        [30., 35., 40.]])


In [10]:
x, y, x.shape, y.shape, (x * y).shape, x*y

(tensor([1., 2., 3.], requires_grad=True),
 tensor([[4.],
         [5.]], requires_grad=True),
 torch.Size([3]),
 torch.Size([2, 1]),
 torch.Size([2, 3]),
 tensor([[ 4.,  8., 12.],
         [ 5., 10., 15.]], grad_fn=<MulBackward0>))

In [11]:
y**2

tensor([[16.],
        [25.]], grad_fn=<PowBackward0>)
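The shapes above explain the gradient values printed earlier. Because `x` and `y` are broadcast to shape `[2, 3]`, every element of `x` is used in both rows and every element of `y` in all three columns, so each gradient is a sum over the broadcast dimension. The sketch below recomputes the expected gradients by hand under that reasoning; the names `expected_x_grad` and `expected_y_grad` are illustrative.

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)   # shape [3]
y = torch.tensor([[4.0], [5.0]], requires_grad=True)    # shape [2, 1]

z = x * y + y**2          # broadcast to shape [2, 3]
z.sum().backward()

with torch.no_grad():
    # d z[i, j] / d x[j] = y[i]; summing over rows i gives 4 + 5 = 9 for every x[j]
    expected_x_grad = y.sum().expand_as(x)
    # d z[i, j] / d y[i] = x[j] + 2 * y[i]; summing over the 3 columns j
    expected_y_grad = x.sum() + x.numel() * 2 * y

print(x.grad, expected_x_grad)   # both tensor([9., 9., 9.])
print(y.grad, expected_y_grad)   # both tensor([[30.], [36.]])
```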

### 1.2 Gradient Computation Graph



A computation graph records the sequence of operations performed on tensors in a neural network, showing how each operation contributes to the final result. Autograd builds this graph during the forward pass and traverses it in reverse to compute gradients. Inspecting it is useful for understanding and debugging the flow of data and gradients in deep learning models.

[torchviz](https://github.com/szagoruyko/pytorchviz) is a tool used to visualize the computation graph of any PyTorch model.
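When a rendered diagram is not needed, the same graph can also be inspected in text form through each tensor's `grad_fn` attribute. The sketch below walks `grad_fn.next_functions` recursively for the expression $z = x \cdot y + y^2$; the helper `collect_nodes` is a hypothetical name written for this illustration and is not part of PyTorch or torchviz.

```python
import torch

def collect_nodes(fn, depth=0, out=None):
    """Depth-first walk of the backward graph; returns (depth, node name) pairs."""
    if out is None:
        out = []
    if fn is None:          # inputs that do not require gradients have no node
        return out
    out.append((depth, type(fn).__name__))
    for next_fn, _ in fn.next_functions:
        collect_nodes(next_fn, depth + 1, out)
    return out

x = torch.tensor(2.0, requires_grad=True)
y = torch.tensor(3.0, requires_grad=True)
z = x * y + y**2

nodes = collect_nodes(z.grad_fn)
for depth, name in nodes:
    print("  " * depth + name)   # indentation reflects depth in the graph
```

The root node corresponds to the addition, its children to the multiplication and the power, and the `AccumulateGrad` leaves are where gradients are written into `x.grad` and `y.grad`.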


<img src=https://learnopencv.com/wp-content/uploads/2024/07/Autograd-Operators-Graph-1-1.png height = 500 >
