# Automatic Gradient Calculation

To train a neural network, we use the backpropagation algorithm. This algorithm requires the gradient of the loss function with respect to the parameters of the model. In this notebook, we will see how to calculate the gradient of a function using automatic differentiation.

PyTorch provides a built-in module, `torch.autograd`, for automatic differentiation. We can use it to automatically calculate the gradients of the loss function with respect to the parameters.

In [1]:
import torch

x = torch.ones(5) # input tensor
y = torch.zeros(3) # expected output
w = torch.randn(5, 3, requires_grad=True)
b = torch.randn(3, requires_grad=True)
z = torch.matmul(x, w) + b
loss = torch.nn.functional.binary_cross_entropy_with_logits(z, y)

## Tensors, Functions and Computational Graph

![](https://pytorch.org/tutorials/_static/img/basics/comp-graph.png)

`w` and `b` are the parameters of the model that we want to optimize. We need to be able to compute the gradients of the loss function with respect to these variables. To do this, we set the `requires_grad` property of these tensors to `True`.
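
As a small aside, `requires_grad` does not have to be fixed when a tensor is created. The sketch below uses a fresh tensor named `w_late` (a name chosen here for illustration only) and switches tracking on afterwards with the in-place `requires_grad_()` method:

```python
import torch

w_late = torch.randn(5, 3)          # created without gradient tracking
flag_before = w_late.requires_grad

w_late.requires_grad_(True)         # in-place switch: start tracking gradients
flag_after = w_late.requires_grad

print(flag_before, flag_after)
```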


In [2]:
print('Gradient function for z =', z.grad_fn)
print('Gradient function for loss =', loss.grad_fn)

Gradient function for z = <AddBackward0 object at 0x000001F650ADA350>
Gradient function for loss = <BinaryCrossEntropyWithLogitsBackward0 object at 0x000001F650ADA4A0>


A function that we apply to tensors is an object of class `Function`. This object knows how to compute the function in the forward direction and how to compute its derivative during the backward pass. A reference to the backward propagation function is stored in the `grad_fn` property of a tensor.
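
To make this concrete, the sketch below rebuilds the same `x`, `w`, `b` and `z` as in the first cell and compares a tensor we created directly with one produced by an operation. Only the latter carries a `grad_fn`; tensors created by the user are the leaves of the graph.

```python
import torch

x = torch.ones(5)
w = torch.randn(5, 3, requires_grad=True)
b = torch.randn(3, requires_grad=True)
z = torch.matmul(x, w) + b

# w was created directly, so it is a leaf and has no backward function
print(w.is_leaf, w.grad_fn)

# z is the result of an addition, so it references that operation's backward function
print(z.is_leaf, type(z.grad_fn).__name__)
```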

## Computing Gradients

$\frac{\partial loss}{\partial w}$ and $\frac{\partial loss}{\partial b}$ can be computed by calling `loss.backward()`. The values of the gradients are stored in the `.grad` property of the respective tensors.

In [3]:
loss.backward()
print(w.grad)
print(b.grad)

tensor([[0.0023, 0.0586, 0.3229],
        [0.0023, 0.0586, 0.3229],
        [0.0023, 0.0586, 0.3229],
        [0.0023, 0.0586, 0.3229],
        [0.0023, 0.0586, 0.3229]])
tensor([0.0023, 0.0586, 0.3229])
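
As a sanity check, these gradients can be reproduced by hand. The sketch below assumes the default `mean` reduction of `binary_cross_entropy_with_logits`, for which $\frac{\partial loss}{\partial z_i} = (\sigma(z_i) - y_i)/3$ with three outputs. Since $z = xw + b$, the chain rule then gives $\frac{\partial loss}{\partial b} = \frac{\partial loss}{\partial z}$, and $\frac{\partial loss}{\partial w}$ is the outer product of $x$ with $\frac{\partial loss}{\partial z}$. The name `dz` is introduced here for illustration.

```python
import torch

x = torch.ones(5)
y = torch.zeros(3)
w = torch.randn(5, 3, requires_grad=True)
b = torch.randn(3, requires_grad=True)
z = torch.matmul(x, w) + b
loss = torch.nn.functional.binary_cross_entropy_with_logits(z, y)
loss.backward()

# Hand-derived gradient of the mean-reduced loss with respect to z
dz = (torch.sigmoid(z.detach()) - y) / y.numel()

# Compare autograd's results with the hand-derived ones
bias_matches = torch.allclose(b.grad, dz, atol=1e-6)
weight_matches = torch.allclose(w.grad, torch.outer(x, dz), atol=1e-6)
print(bias_matches, weight_matches)
```

Because `x` is a vector of ones, every row of the outer product equals `dz`, which is why all rows of `w.grad` in the output above are identical to `b.grad`.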


## Disabling Gradient Tracking

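By default, tensors are created with `requires_grad=False`. Any tensor computed from a tensor with `requires_grad=True` tracks its computational history and supports gradient computation.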

When we only need the forward computation, we can stop this tracking by wrapping the code in the `torch.no_grad()` context manager.
Alternatively, we can call the `detach()` method on a tensor to create a new tensor that does not require gradients.

In [5]:
z = torch.matmul(x, w) + b
print(z.requires_grad)

with torch.no_grad():
    z = torch.matmul(x, w) + b
print(z.requires_grad)

True
False


In [6]:
z = torch.matmul(x, w) + b
z_det = z.detach()
print(z_det.requires_grad)

False


These techniques are useful in the following situations:
- To freeze some parameters of the neural network, so that they are not updated during training.
- To speed up computation when we only run the forward pass, because no graph has to be recorded.
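
The first situation can be sketched as follows. This is a minimal illustration rather than a full training setup; `backbone` and `head` are hypothetical names, with `backbone` standing in for a layer whose parameters should stay fixed.

```python
import torch

backbone = torch.nn.Linear(5, 4)   # stands in for a layer we want to freeze (hypothetical)
head = torch.nn.Linear(4, 3)       # the part we still want to train

# Freeze the backbone: its parameters stop requiring gradients
for p in backbone.parameters():
    p.requires_grad_(False)

out = head(backbone(torch.ones(5)))
out.sum().backward()

print(backbone.weight.grad)          # None: no gradient was computed for frozen parameters
print(head.weight.grad is not None)  # True: the head still receives gradients
```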

## More on Computational Graphs

Autograd keeps a record of data (tensors) and all executed operations (along with the resulting new tensors) in a directed acyclic graph (DAG). This is called a computational graph. In this graph, the leaves are the input tensors and the roots are the output tensors.

Forward Pass
- run the input tensor through the model
- maintain a record of the operations in the DAG

Backward Pass
- compute the gradients of the loss function with respect to the parameters using back propagation
- the gradients are computed using the chain rule
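
The two passes can be traced on a tiny scalar example. The sketch below uses made-up values (`w = 3`, `x = 2`) and compares the gradient from autograd with the chain rule applied by hand.

```python
import torch

w = torch.tensor(3.0, requires_grad=True)
x = torch.tensor(2.0)

# Forward pass: autograd records the multiplication and the power
y = w * x
loss = y ** 2

# Backward pass: autograd walks the recorded graph from loss back to w
loss.backward()

# Chain rule by hand: d(loss)/dw = d(loss)/dy * dy/dw = 2*y * x
manual = 2 * y.item() * x.item()

print(w.grad.item(), manual)
```

Both values are 24, since $2 \cdot 6 \cdot 2 = 24$.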

## Tensor Gradients and Jacobian Products

PyTorch computes the *Jacobian product* rather than the Jacobian matrix itself.

Consider a vector function $\vec{y}=f(\vec{x})$, where $\vec{x}=\langle x_1, x_2, \dots, x_n \rangle$ and $\vec{y}=\langle y_1, y_2, \dots, y_m \rangle$.

The Jacobian matrix is given by:
$$
J=\begin{pmatrix}
\frac{\partial y_1}{\partial x_1} & \frac{\partial y_1}{\partial x_2} & \cdots & \frac{\partial y_1}{\partial x_n} \\
\frac{\partial y_2}{\partial x_1} & \frac{\partial y_2}{\partial x_2} & \cdots & \frac{\partial y_2}{\partial x_n} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial y_m}{\partial x_1} & \frac{\partial y_m}{\partial x_2} & \cdots & \frac{\partial y_m}{\partial x_n}
\end{pmatrix}
$$

Let $v$ be the gradient of the scalar loss function with respect to $\vec{y}$:
$$
v=
\begin{pmatrix}
\frac{\partial loss}{\partial y_1} \\
\frac{\partial loss}{\partial y_2} \\
\vdots \\
\frac{\partial loss}{\partial y_m}
\end{pmatrix}
$$

The **Jacobian product** is $v^T \cdot J$. 
$$
v^T \cdot J=
\begin{pmatrix}
\frac{\partial loss}{\partial y_1} & \frac{\partial loss}{\partial y_2} & \cdots & \frac{\partial loss}{\partial y_m}
\end{pmatrix}
\begin{pmatrix}
\frac{\partial y_1}{\partial x_1} & \frac{\partial y_1}{\partial x_2} & \cdots & \frac{\partial y_1}{\partial x_n} \\
\frac{\partial y_2}{\partial x_1} & \frac{\partial y_2}{\partial x_2} & \cdots & \frac{\partial y_2}{\partial x_n} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial y_m}{\partial x_1} & \frac{\partial y_m}{\partial x_2} & \cdots & \frac{\partial y_m}{\partial x_n}
\end{pmatrix} =
\begin{pmatrix}
\frac{\partial loss}{\partial x_1} & \frac{\partial loss}{\partial x_2} & \cdots & \frac{\partial loss}{\partial x_n}
\end{pmatrix}
$$



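This product can be computed by calling `backward` on the output tensor with $v$ as the argument; $v$ must have the same shape as the output. In the cell below, `backward` is called three times on the same graph, which is why `retain_graph=True` is passed. PyTorch accumulates gradients in `.grad`, so the second call doubles the stored values until they are reset with `inp.grad.zero_()`.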

In [7]:
inp = torch.eye(5, requires_grad=True)
out = (inp+1).pow(2).T
out.backward(torch.ones_like(out), retain_graph=True)
print("First call\n", inp.grad)
out.backward(torch.ones_like(out), retain_graph=True)
print("\nSecond call\n", inp.grad)
inp.grad.zero_()
out.backward(torch.ones_like(out), retain_graph=True)
print("\nCall after zeroing gradients\n", inp.grad)

First call
 tensor([[4., 2., 2., 2., 2.],
        [2., 4., 2., 2., 2.],
        [2., 2., 4., 2., 2.],
        [2., 2., 2., 4., 2.],
        [2., 2., 2., 2., 4.]])

Second call
 tensor([[8., 4., 4., 4., 4.],
        [4., 8., 4., 4., 4.],
        [4., 4., 8., 4., 4.],
        [4., 4., 4., 8., 4.],
        [4., 4., 4., 4., 8.]])

Call after zeroing gradients
 tensor([[4., 2., 2., 2., 2.],
        [2., 4., 2., 2., 2.],
        [2., 2., 4., 2., 2.],
        [2., 2., 2., 4., 2.],
        [2., 2., 2., 2., 4.]])
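
To connect `backward` with the formula for $v^T \cdot J$, the sketch below builds the full Jacobian of a small linear map with `torch.autograd.functional.jacobian` and compares $v^T \cdot J$ with the result of `backward(v)`. The matrix `A`, the function `f` and the vector `v` are made-up values chosen for illustration; in practice the full Jacobian is rarely formed, which is the point of computing the product directly.

```python
import torch
from torch.autograd.functional import jacobian

A = torch.tensor([[1.0, 2.0, 3.0],
                  [4.0, 5.0, 6.0]])   # m = 2 outputs, n = 3 inputs

def f(x):
    return A @ x                      # linear map, so its Jacobian is A itself

x = torch.tensor([1.0, 1.0, 1.0], requires_grad=True)
v = torch.tensor([1.0, 10.0])         # plays the role of d(loss)/dy

J = jacobian(f, x)                    # full Jacobian, shape (2, 3)

y = f(x)
y.backward(v)                         # computes v^T J without forming J

print(J)
print(x.grad, v @ J)
```

Here `x.grad` and `v @ J` agree: both are the row vector $v^T \cdot J$ with one entry per input $x_i$.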
