
In [1]:
import torch

In [2]:
x = torch.tensor(4.0, requires_grad=True)

In [3]:
x

tensor(4., requires_grad=True)

In [4]:
y = x**2
y

tensor(16., grad_fn=<PowBackward0>)

In [5]:
y.backward()

In [6]:
# prompt: what is y.backward(), why do we need to call it to get x.grad, and what does this have to do with ANNs?
# y.backward() computes dy/dx and stores it in x.grad; here dy/dx = 2*x = 8 at x = 4.

x.grad

tensor(8.)

In [7]:
print(x.grad)

tensor(8.)


In [17]:
lst = [[2.,3.,1.],[4.,5.,3.],[7.,6.,4.]]
torch_input = torch.tensor(lst, requires_grad=True)

In [18]:
torch_input

tensor([[2., 3., 1.],
        [4., 5., 3.],
        [7., 6., 4.]], requires_grad=True)

In [19]:
torch_input**3

tensor([[  8.,  27.,   1.],
        [ 64., 125.,  27.],
        [343., 216.,  64.]], grad_fn=<PowBackward0>)

In [20]:
torch_input**2

tensor([[ 4.,  9.,  1.],
        [16., 25.,  9.],
        [49., 36., 16.]], grad_fn=<PowBackward0>)

In [21]:
y = torch_input**3 + torch_input**2

In [22]:
y

tensor([[ 12.,  36.,   2.],
        [ 80., 150.,  36.],
        [392., 252.,  80.]], grad_fn=<AddBackward0>)

In [23]:
z = y.sum()
z

tensor(1040., grad_fn=<SumBackward0>)

In [24]:
z.backward()

In [26]:
torch_input.grad

tensor([[ 16.,  33.,   5.],
        [ 56.,  85.,  33.],
        [161., 120.,  56.]])

In [None]:
# prompt: explain the torch_input.grad output mathematically

Let's break down the `torch_input.grad` output mathematically.

**Background:**

* **Computational Graph:** PyTorch automatically constructs a computational graph when you perform operations on tensors with `requires_grad=True`. This graph tracks the operations and allows for automatic differentiation.
* **Backpropagation:** When you call `z.backward()`, PyTorch performs backpropagation through the computational graph. It computes the gradient of `z` with respect to every leaf tensor that has `requires_grad=True` (here, `torch_input`) and accumulates the result in that tensor's `.grad` attribute.
* **Gradient:** The gradient represents the rate of change of a function with respect to its input variables. In this case, we are interested in the gradient of `z` with respect to `torch_input`.
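
As a small sketch of these three ideas, the snippet below inspects the graph nodes PyTorch records and shows where the gradient ends up. The names `x`, `y`, and `z` are illustrative and simply mirror the cells above.

```python
import torch

# Leaf tensor: created directly by the user and tracked by autograd.
x = torch.tensor([[2., 3., 1.], [4., 5., 3.], [7., 6., 4.]], requires_grad=True)

y = x**3 + x**2   # non-leaf tensor; it records the operation that produced it
z = y.sum()       # scalar output

print(x.is_leaf, y.is_leaf)   # True False
print(y.grad_fn)              # an AddBackward0 node
print(z.grad_fn)              # a SumBackward0 node

print(x.grad)                 # None: no backward pass has run yet
z.backward()                  # walk the graph backward, starting from z
print(x.grad.shape)           # torch.Size([3, 3]): one gradient entry per input element
```

Only the leaf tensor `x` receives a populated `.grad` here; by default, intermediate tensors such as `y` do not keep theirs.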


**Mathematical Explanation for `torch_input.grad`:**

1. **`y = torch_input**3 + torch_input**2`:**  `y` is the element-wise sum of two terms, `torch_input**3` and `torch_input**2`, so `y` has the same 3x3 shape as `torch_input`.
2. **`z = y.sum()`:** We are calculating the sum of all elements in `y`, which is our scalar output `z`.
3. **Backpropagation (`z.backward()`):**  We want to compute the gradient of `z` with respect to each element in `torch_input`.

**Consider the gradient of `z` with respect to a single element, `torch_input[i, j]`:**

* **Chain Rule:** We'll use the chain rule to calculate this gradient. The chain rule states that if we have a function `z = f(y)` and `y = g(x)`, then the derivative of `z` with respect to `x` is `dz/dx = dz/dy * dy/dx`.
* **Gradient of `z` with respect to `y[i, j]`:** Since `z` is the sum of all elements in `y`, the gradient of `z` with respect to `y[i, j]` is simply `dz/dy[i, j] = 1`.
* **Gradient of `y[i, j]` with respect to `torch_input[i, j]`:**
   * `y[i, j] = torch_input[i, j]**3 + torch_input[i, j]**2`.
   * `dy[i, j]/d(torch_input[i, j]) = 3 * torch_input[i, j]**2 + 2 * torch_input[i, j]` (using the power rule of differentiation).
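* **Combining via the chain rule:** Because every operation is element-wise, `y[i, j]` is the only element of `y` that depends on `torch_input[i, j]`, so a single term remains:
   `dz/d(torch_input[i, j]) = dz/dy[i, j] * dy[i, j]/d(torch_input[i, j]) = 1 * (3 * torch_input[i, j]**2 + 2 * torch_input[i, j])`.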
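
The two chain-rule factors can also be computed separately. The sketch below uses `torch.autograd.grad` instead of `backward()` so that each factor is returned explicitly; the names `x`, `dz_dy`, and `dz_dx` are illustrative, not part of the notebook.

```python
import torch

x = torch.tensor([[2., 3., 1.], [4., 5., 3.], [7., 6., 4.]], requires_grad=True)
y = x**3 + x**2
z = y.sum()

# Factor 1: dz/dy. retain_graph=True keeps the graph alive for the second call.
(dz_dy,) = torch.autograd.grad(z, y, retain_graph=True)
print(dz_dy)  # every entry is 1, because z is a plain sum over y

# Factor 2 combined with factor 1: grad_outputs multiplies dz/dy into dy/dx.
(dz_dx,) = torch.autograd.grad(y, x, grad_outputs=dz_dy)
print(dz_dx)  # equals 3 * x**2 + 2 * x, element by element
```

Passing `dz_dy` as `grad_outputs` is what "multiplying by the upstream gradient" means in practice; `backward()` performs the same product at every node of the graph.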


**In essence, each element of `torch_input.grad` will contain the following value:**

`3 * torch_input[i, j]**2 + 2 * torch_input[i, j]`


**Example:** If `torch_input[0, 0]` is 2, then the corresponding element in `torch_input.grad` will be:
`3 * 2**2 + 2 * 2 = 12 + 4 = 16`.
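
To check the formula against every element rather than just `[0, 0]`, a short sketch like the following compares autograd's result with the closed-form expression. Here `x` stands in for `torch_input`, and `expected` is an illustrative name.

```python
import torch

x = torch.tensor([[2., 3., 1.], [4., 5., 3.], [7., 6., 4.]], requires_grad=True)
z = (x**3 + x**2).sum()
z.backward()

# Closed-form gradient from the power rule, computed without autograd tracking.
with torch.no_grad():
    expected = 3 * x**2 + 2 * x

print(torch.allclose(x.grad, expected))  # True
print(x.grad[0, 0].item())               # 16.0
```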


**In summary:** `torch_input.grad` holds the gradient of the scalar output `z` with respect to each element of `torch_input`. Autograd computes it with the chain rule, and an optimizer such as gradient descent would use it to update `torch_input` in the direction that decreases `z`.
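
A minimal sketch of one such gradient descent update is shown below. The learning rate `lr` and the names `z_before` and `z_after` are assumptions made for illustration; the notebook itself does not perform an update.

```python
import torch

x = torch.tensor([[2., 3., 1.], [4., 5., 3.], [7., 6., 4.]], requires_grad=True)
lr = 0.01  # hypothetical learning rate, chosen only for illustration

z_before = (x**3 + x**2).sum()
z_before.backward()            # fills x.grad with 3 * x**2 + 2 * x

# Update the values in place without recording the update in the graph.
with torch.no_grad():
    x -= lr * x.grad
x.grad.zero_()                 # gradients accumulate, so reset before the next pass

z_after = (x**3 + x**2).sum()
print(z_before.item(), z_after.item())  # z_after is smaller than z_before
```

This is only a toy step: `z` as defined here has no minimum, so repeating the update would keep pushing the values down rather than converge. In a neural network the same pattern is applied to the weights, with a loss function that is bounded below.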