<a href="https://colab.research.google.com/github/pavanraja753/PyTorch_Learning/blob/main/Autograd.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 4.2 Autograd

Conceptually, the forward pass is a standard tensor computation, and the DAG of tensor operations is required only to compute derivatives.

When executing tensor operations, PyTorch can automatically construct, on the fly, the graph of operations needed to compute the gradient of any quantity with respect to any tensor involved. This has two main benefits:

- Simpler syntax: one just needs to write the forward pass as a standard sequence of Python operations.

- Greater flexibility: since the graph is not static, the forward pass can be dynamically modulated.

A `Tensor` has a Boolean field `requires_grad`, set to `False` by default, which states whether PyTorch should build the graph of operations so that gradients with respect to it can be computed.

The result of a tensor operation has this flag set to `True` if any of its operands has it set to `True`.

In [2]:
import torch

In [3]:
x = torch.tensor([1.0, 2.0])
y = torch.tensor([4.0, 5.0])
z = torch.tensor([7.0, 3.0])

In [4]:
x.requires_grad

False

In [5]:
(x+y).requires_grad

False

In [6]:
z.requires_grad = True
(x+z).requires_grad

True

## Only floating point (and complex) tensors can have their gradient computed.

In [7]:
x = torch.tensor([1., 10.],requires_grad=True)

In [8]:
x = torch.tensor([1, 10],requires_grad=True)

RuntimeError: Only Tensors of floating point and complex dtype can require gradients
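As a small sketch of a way around this restriction, an integer tensor can be converted to a floating point dtype before gradients are enabled. The names `counts` and `x_float` are illustrative only.

```python
import torch

# Integer tensors cannot require gradients, so convert to float first.
counts = torch.tensor([1, 10])               # dtype is torch.int64
x_float = counts.float().requires_grad_()    # float32 copy that tracks gradients

# Asking the integer tensor itself to track gradients raises a RuntimeError.
try:
    counts.requires_grad_()
    int_failed = False
except RuntimeError:
    int_failed = True

print(counts.dtype, x_float.dtype, x_float.requires_grad, int_failed)
```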

`torch.autograd.grad(outputs, inputs)` computes and returns the gradient of outputs with respect to inputs.

In [None]:
t = torch.tensor([1., 2., 4.]).requires_grad_()
u = torch.tensor([10., 20.]).requires_grad_()
a = t.pow(2).sum() + u.log().sum()
torch.autograd.grad(a,(t,u))

`inputs` can be a single tensor, but the result is still a one-element tuple.

If `outputs` is a tuple, the result is the sum of the gradients of its elements.
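Both behaviors can be checked on a small example. This is only a sketch, with tensors chosen so that the gradients are easy to verify by hand.

```python
import torch

t = torch.tensor([1., 2., 4.]).requires_grad_()

# A single input tensor still yields a one-element tuple.
a = t.pow(2).sum()
(grad_a,) = torch.autograd.grad(a, t)          # d(sum t^2)/dt = 2t

# With a tuple of outputs, the gradients of its elements are summed.
b = t.pow(2).sum()                             # gradient 2t
c = t.sum()                                    # gradient 1
(grad_sum,) = torch.autograd.grad((b, c), t)   # 2t + 1

print(grad_a)
print(grad_sum)
```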

<br>

The function `Tensor.backward()` accumulates gradients in the `grad` fields of tensors which are not results of operations, the “leaves” in the autograd graph.

In [None]:
x = torch.tensor([ -3., 2., 5. ]).requires_grad_()
u = x.pow(3).sum()
print(x.grad)
u.backward()
print(x.grad)

This function is an alternative to `torch.autograd.grad(...)` and is the standard choice for training models.

<br>

- `Tensor.backward()` is convenient in deep learning, where the main use is gradient descent: each gradient is stored in the `grad` field of its own tensor, ready to be subtracted from that tensor.
- To do the same with `torch.autograd.grad()`, we would have to associate every returned gradient with its tensor ourselves.
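To make the bookkeeping concrete, here is a minimal sketch of one gradient step done with `torch.autograd.grad()`, where the returned gradients are paired with their tensors by position. The toy loss, the parameter values, and the step size of 0.1 are assumptions made for illustration.

```python
import torch

w = torch.tensor([1., -2.]).requires_grad_()
b = torch.tensor(0.5).requires_grad_()
params = (w, b)

loss = w.pow(2).sum() + b.pow(2)

# Gradients come back as a tuple in the same order as `params`.
grads = torch.autograd.grad(loss, params)

with torch.no_grad():
    for p, g in zip(params, grads):
        p -= 0.1 * g     # here g = 2p, so each parameter becomes 0.8 * p

print(w, b)
```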


**`Tensor.backward()` accumulates the gradients in the `grad` fields of tensors, so one may have to set them to zero before calling it.**

This accumulating behavior is desirable in particular to compute the gradient of a loss summed over several mini-batches, or the gradient of a sum of losses.
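The accumulation can be observed directly by calling `backward()` twice on the same computation. The sketch below resets the gradient by assigning `None` to the `grad` field, which is one possible way to clear it.

```python
import torch

x = torch.tensor([-3., 2., 5.]).requires_grad_()

x.pow(2).sum().backward()        # x.grad = 2x
first = x.grad.clone()

x.pow(2).sum().backward()        # the new gradient is added to the old one
accumulated = x.grad.clone()     # 2x + 2x = 4x

x.grad = None                    # clear the field before the next pass
x.pow(2).sum().backward()
fresh = x.grad.clone()           # back to 2x

print(first, accumulated, fresh)
```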

## So we can run a forward/backward pass on the following computation

In [9]:
w1 = torch.rand(5, 5).requires_grad_()
w2 = torch.rand(5, 5).requires_grad_()
x = torch.empty(5).normal_()

In [10]:
x0 = x
x1 = w1 @ x
x2 = x0 + w2 @ x1
x3 = w1 @ (x1 + x2)

q = x3.norm()

q.backward()

- The difference between TensorFlow (as we saw in lecture 4.1. “DAG networks”) and PyTorch here is that the variable `q` actually contains the result of the computation.

- During the tensor operations, PyTorch recorded everything needed to compute the gradient later, if required.

- When calling `q.backward()`, PyTorch runs backward through this recorded graph to fill the `grad` fields of the parameters.
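These three points can be checked on a smaller computation. The following is a sketch with a single weight matrix, not the exact network above.

```python
import torch

w = torch.rand(5, 5).requires_grad_()
x = torch.empty(5).normal_()

q = (w @ x).norm()

value = q.item()                    # the result is available immediately
has_graph = q.grad_fn is not None   # the graph was recorded during the forward pass
grad_before = w.grad                # nothing computed yet, so this is None

q.backward()                        # runs the recorded graph backward
grad_after_shape = w.grad.shape     # the grad field now matches w's shape

print(value, has_graph, grad_before, grad_after_shape)
```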

# 4.3 Autograd Machinery

We can visualize the full graph built during a computation.
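One way to inspect that graph without extra tooling is to walk it from the `grad_fn` of the result through `next_functions`. The helper `list_graph_nodes` is a hypothetical name introduced for this sketch, and the node class names it returns depend on the PyTorch version.

```python
import torch

def list_graph_nodes(root_fn):
    """Return the class names of the backward nodes reachable from root_fn."""
    names, stack, seen = [], [root_fn], set()
    while stack:
        fn = stack.pop()
        # Inputs that do not require gradients appear as None.
        if fn is None or fn in seen:
            continue
        seen.add(fn)
        names.append(type(fn).__name__)
        # Each entry of next_functions is a (node, input index) pair.
        stack.extend(next_fn for next_fn, _ in fn.next_functions)
    return names

w = torch.rand(3, 3).requires_grad_()
x = torch.rand(3)
q = (w @ x).norm()

node_names = list_graph_nodes(q.grad_fn)
print(node_names)
```

The leaf tensor `w` shows up as an `AccumulateGrad` node, which is where `backward()` writes into `w.grad`.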

The `torch.no_grad()` context switches off the autograd machinery, and can be used for operations such as parameter updates.


In [11]:
w = torch.empty(10, 784).normal_(0, 1e-3).requires_grad_()
b = torch.empty(10).normal_(0, 1e-3).requires_grad_()

In [None]:
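x = torch.randn(1000, 784)
y = torch.randn(1000, 1)  # broadcasts against the (1000, 10) predictions
for k in range(100):
    y_hat = x @ w.t() + b
    loss = (y_hat - y).pow(2).mean()
    # Reset the gradients, since backward() accumulates into them
    w.grad, b.grad = None, None
    loss.backward()

    # Update the parameters without recording the operations in the graph
    with torch.no_grad():
        w -= 0.1 * w.grad
        b -= 0.1 * b.grad

    print(w)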

- The `detach()` method creates a tensor which shares the data, but does not require gradient computation, and is not connected to the current graph.
- This method should be used when the gradient should not be propagated beyond a variable, or to update leaf tensors.
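The effect of `detach()` on gradient propagation can be seen in a short sketch, where the same intermediate value is used once through the graph and once detached.

```python
import torch

a = torch.tensor([1., 2.]).requires_grad_()
b = a * 3
c = b.detach()            # same data as b, but cut off from the graph

out = (b + c).sum()
out.backward()            # the gradient reaches a only through b

print(c.requires_grad)    # False
print(a.grad)             # 3 per element, not 6
```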

In [None]:
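a = torch.tensor(0.5, requires_grad=True)
b = torch.tensor(-0.5, requires_grad=True)
eta = 0.1  # step size (assumed; same value as in the loop above)

for k in range(100):
    loss = (a - 1)**2 + (b + 1)**2 + (a - b)**2
    # autograd.grad returns the gradients instead of storing them in .grad
    grad_a, grad_b = torch.autograd.grad(loss, (a, b))
    with torch.no_grad():
        # In-place updates keep a and b as leaf tensors that require gradients
        a -= eta * grad_a
        b -= eta * grad_b

print(a, b)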