# Intro to Torch Autograd

Autograd is *the* core concept of PyTorch. It keeps a history of all operations performed
on tensors so that their partial derivatives can be computed. Of course, all ML frameworks
do this in some way. The crucial difference is that autograd records this history *at runtime*,
which lets it track gradients correctly through conditional execution.
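
As a minimal sketch of what runtime tracking means in practice, consider a function whose branch depends on the value of its input. The function `f` and the input values here are made up for illustration; the point is only that the recorded graph follows whichever branch actually ran.

```python
import torch

def f(x):
    # Hypothetical function: the branch taken depends on the runtime value of x.
    if x.item() > 0:
        return x ** 2
    return -3 * x

x_pos = torch.tensor(2.0, requires_grad=True)
x_neg = torch.tensor(-2.0, requires_grad=True)

f(x_pos).backward()
f(x_neg).backward()

print(x_pos.grad)  # derivative of x**2 at x = 2
print(x_neg.grad)  # derivative of -3 * x
```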

We will have a brief look at how autograd works.

[Autograd Reference Documentation](https://pytorch.org/docs/stable/autograd.html)

In [None]:
import torch

import matplotlib.pyplot as plt
import math

Most tensor factories support the `requires_grad` argument. When set to `True`, it enables
gradient tracking for subsequent operations on the tensor.


In [None]:
a = torch.linspace(0., 2. * math.pi, steps=25, requires_grad=True)
print(a)
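
The following sketch illustrates how tracking carries over to results: an operation is tracked as soon as at least one of its inputs requires gradients. The tensor names `tracked` and `plain` are chosen for this example only.

```python
import torch

tracked = torch.ones(3, requires_grad=True)
plain = torch.ones(3)  # requires_grad defaults to False

print(tracked.requires_grad, plain.requires_grad)

# A result is tracked if any of its inputs is tracked.
print((tracked * 2).requires_grad)
print((plain * 2).requires_grad)
print((tracked + plain).requires_grad)
```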

Let's perform a computation and plot the result. Never mind the `detach()` for now; we'll
explain it later.


In [None]:
b = torch.sin(a)
plt.plot(a.detach(), b.detach())

Let’s have a closer look at the tensor `b`. When we print it,
we see an indicator that it is tracking its computation history:


In [None]:
print(b)

The `grad_fn` attribute records that `b` was produced by `sin()`, so the derivative of `sin()`
has to be evaluated during gradient backpropagation.

Let's add a few more operations to make this more interesting.

In [None]:
c = 2 * b
print(c)

d = c + 1
print(d)



Finally, let’s compute a single-element output. When you call `.backward()`
on a tensor with no arguments, it expects the tensor to contain only a single element,
as is the case when computing a loss function.


In [None]:
out = d.sum()
print(out)
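
To illustrate the single-element requirement, here is a small sketch that calls `backward()` on the 25-element tensor directly. Without arguments the call is expected to fail; passing an explicit `gradient` of ones should give the same result as summing first. The helper variable names are assumptions made for this example.

```python
import math
import torch

a = torch.linspace(0., 2. * math.pi, steps=25, requires_grad=True)

# backward() without arguments only works for single-element tensors.
d = 2 * torch.sin(a) + 1
try:
    d.backward()
    raised = False
except RuntimeError as err:
    raised = True
    print("backward() on a 25-element tensor failed:", err)

# Supplying a gradient of ones weights every element equally ...
d = 2 * torch.sin(a) + 1
d.backward(gradient=torch.ones_like(d))
grad_from_vector = a.grad.clone()

# ... which matches calling backward() on the sum.
a.grad = None  # reset, because gradients accumulate across backward() calls
d = 2 * torch.sin(a) + 1
d.sum().backward()
print(torch.allclose(grad_from_vector, a.grad))
```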

The `grad_fn` field lets us trace back through all operations via its `next_functions` property.


In [None]:
print('d:')
print(d.grad_fn)
print(d.grad_fn.next_functions)
print(d.grad_fn.next_functions[0][0].next_functions)
print(d.grad_fn.next_functions[0][0].next_functions[0][0].next_functions)
print(d.grad_fn.next_functions[0][0].next_functions[0][0].next_functions[0][0].next_functions)
print('\nc:')
print(c.grad_fn)
print('\nb:')
print(b.grad_fn)
print('\na:')
print(a.grad_fn)
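
Instead of chaining `next_functions` by hand, the graph can be walked recursively. The helper `walk` below is a hypothetical utility written for this notebook, not part of PyTorch; it relies only on `grad_fn` and `next_functions`, and skips the `None` entries that stand for inputs without gradient tracking (such as the constants `2` and `1`).

```python
import math
import torch

a = torch.linspace(0., 2. * math.pi, steps=25, requires_grad=True)
d = 2 * torch.sin(a) + 1

def walk(fn, depth=0, names=None):
    """Print the backward graph below `fn` and collect the node names."""
    if names is None:
        names = []
    if fn is None:  # input that does not require gradients, e.g. a constant
        return names
    name = type(fn).__name__
    names.append(name)
    print("  " * depth + name)
    for next_fn, _ in fn.next_functions:
        walk(next_fn, depth + 1, names)
    return names

names = walk(d.grad_fn)
```

The last node printed should be the one that accumulates the gradient into the leaf tensor `a`.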

We can use the `backward()` method on the output and access the `grad`
property to get the actual derivatives.

Note that this is indeed the derivative of $2 \sin(a) + 1$, namely $2 \cos(a)$!

In [None]:
out.backward()
print(a.grad)
plt.plot(a.detach(), a.grad.detach())
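
Besides inspecting the plot, the result can be checked numerically. This sketch repeats the computation and compares `a.grad` with the analytic derivative $2 \cos(a)$; the tolerance passed to `torch.allclose` is an arbitrary choice for this example.

```python
import math
import torch

a = torch.linspace(0., 2. * math.pi, steps=25, requires_grad=True)
out = (2 * torch.sin(a) + 1).sum()
out.backward()

# Analytic derivative of 2 * sin(a) + 1 with respect to a.
expected = 2 * torch.cos(a.detach())
print(torch.allclose(a.grad, expected, atol=1e-6))
```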


## Turning Autograd Off and On

There are situations where you will need fine-grained control over whether autograd
is enabled. There are multiple ways to do this, depending on the situation.

The simplest is to change the `requires_grad` flag on a tensor directly:

In [None]:
a = torch.ones(2, 3, requires_grad=True)
print(a)

b1 = 2 * a
print(b1)

a.requires_grad = False
b2 = 2 * a
print(b2)
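
A short sketch of the same idea with explicit checks: here the flag is switched off with the in-place method `requires_grad_()`, which should behave like assigning to the attribute, and the effect is read from `requires_grad` and `grad_fn` rather than from the printed tensors.

```python
import torch

a = torch.ones(2, 3, requires_grad=True)
b1 = 2 * a               # computed while tracking is on

a.requires_grad_(False)  # same effect as a.requires_grad = False
b2 = 2 * a               # computed while tracking is off

print(b1.requires_grad, b1.grad_fn)
print(b2.requires_grad, b2.grad_fn)
```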

If you only need autograd turned off temporarily, a better way is to use `torch.no_grad()`:


In [None]:
a = torch.ones(2, 3, requires_grad=True) * 2
b = torch.ones(2, 3, requires_grad=True) * 3

c1 = a + b
print(c1)

with torch.no_grad():
    c2 = a + b

print(c2)

c3 = a * b
print(c3)
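
To make the scope of the context manager explicit, this sketch queries `torch.is_grad_enabled()` inside and after the `with` block. The variable names `inside` and `after` are introduced for this example only.

```python
import torch

a = torch.ones(2, 3, requires_grad=True) * 2
b = torch.ones(2, 3, requires_grad=True) * 3

with torch.no_grad():
    inside = torch.is_grad_enabled()  # tracking is off inside the block
    c2 = a + b

after = torch.is_grad_enabled()       # and restored afterwards
c3 = a * b

print(inside, c2.requires_grad)
print(after, c3.requires_grad)
```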



`torch.no_grad()` can also be used as a function or method decorator:

(There’s a corresponding context manager, `torch.enable_grad()`, for turning autograd on.
It may also be used as a decorator.)

In [None]:
def add_tensors1(x, y):
    return x + y

@torch.no_grad()
def add_tensors2(x, y):
    return x + y


a = torch.ones(2, 3, requires_grad=True) * 2
b = torch.ones(2, 3, requires_grad=True) * 3

c1 = add_tensors1(a, b)
print(c1)

c2 = add_tensors2(a, b)
print(c2)
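
The following sketch shows `torch.enable_grad()` in both forms, as a decorator and as a context manager, switching tracking back on inside a `torch.no_grad()` block. The function `double_tracked` is a made-up example.

```python
import torch

x = torch.ones(3, requires_grad=True)

@torch.enable_grad()
def double_tracked(t):
    # Tracking is on inside this function, whatever the caller's setting.
    return 2 * t

with torch.no_grad():
    y_off = 2 * x              # not tracked
    y_on = double_tracked(x)   # tracked, thanks to the decorator
    with torch.enable_grad():
        z_on = 3 * x           # tracked, thanks to the nested context manager

print(y_off.requires_grad, y_on.requires_grad, z_on.requires_grad)
```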

Finally, you may have a tensor that requires gradient tracking,
but you want a version of it that does not. For this we have the Tensor object’s `detach()` method.
It returns a new tensor that is detached from the computation history. Note that the new tensor
shares its underlying data with the original; it is not an independent copy.

We did this above when we wanted to plot some of our tensors. This is because matplotlib expects
a NumPy array as input, and the implicit conversion from a PyTorch tensor to a NumPy array is not
enabled for tensors with `requires_grad=True`.

In [None]:
x = torch.rand(5, requires_grad=True)
y = x.detach()

print(x)
print(y)
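
One detail worth checking by hand is that a detached tensor shares memory with its source. The sketch below compares the data pointers and shows that an in-place write to the detached tensor is visible in the original; adding `clone()` is one way to get an independent copy. The value `42.0` is arbitrary.

```python
import torch

x = torch.rand(5, requires_grad=True)
y = x.detach()          # no gradient tracking, same memory as x
z = x.detach().clone()  # no gradient tracking, independent memory

print(y.requires_grad)
print(y.data_ptr() == x.data_ptr())
print(z.data_ptr() == x.data_ptr())

# Writing to y also changes x, because both share the same storage.
y[0] = 42.0
print(x[0].item(), z[0].item())
```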



## Pitfalls

### Autograd and In-place Operations

In every example in this notebook so far, we’ve used variables to capture the intermediate values of a computation.
Autograd needs these intermediate values to perform gradient computations. For this reason, you must be careful about
using in-place operations when using autograd. Doing so can destroy information you need to compute derivatives in the
`backward()` call. PyTorch will even stop you if you attempt an in-place operation on a leaf variable that requires
autograd, as shown below.

In [None]:
a = torch.linspace(0., 2. * math.pi, steps=25, requires_grad=True)
torch.sin_(a)  # raises a RuntimeError: in-place operation on a leaf tensor that requires grad
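
The sketch below catches both kinds of failure so that the cell runs to completion. The first case is the in-place operation on a leaf tensor. The second is an assumed example of destroying an intermediate value: `torch.exp` keeps its result for the backward pass, so modifying that result in place should make `backward()` fail.

```python
import math
import torch

a = torch.linspace(0., 2. * math.pi, steps=25, requires_grad=True)

# Case 1: in-place operation on a leaf tensor that requires grad.
try:
    torch.sin_(a)
    leaf_error = None
except RuntimeError as err:
    leaf_error = err
print("leaf:", leaf_error)

# Case 2: in-place operation on an intermediate result needed by backward().
b = torch.exp(a)
b.add_(1)  # overwrites the value that the backward pass of exp() relies on
try:
    b.sum().backward()
    backward_error = None
except RuntimeError as err:
    backward_error = err
print("non-leaf:", backward_error)
```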


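### Single-element tensors are not Python scalars!

A single-element tensor that tracks gradients still carries a reference to its computation graph.
Accumulating such tensors, for example to sum up the loss while monitoring model training, keeps
every one of those graphs alive, which looks like a memory leak. This is a common mistake.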


In [None]:
print(out)
accu = 0.0
print(accu)
accu = accu + 2 * out
print(accu)

This can be avoided by using the `item()` method, which returns the value as a plain Python number.


In [None]:
print(out)
print(out.item())
accu = 0.0
accu = accu + 2 * out.item()
print(accu)
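
In a monitoring loop the difference looks roughly like this. The parameter `w` and the "loss" are hypothetical stand-ins for a real model; the sketch only contrasts accumulating the tensor itself with accumulating `loss.item()`.

```python
import torch

w = torch.ones(3, requires_grad=True)  # hypothetical model parameter

total_tensor = 0.0
total_float = 0.0
for step in range(3):
    loss = (w * (step + 1)).sum()       # hypothetical stand-in for a loss
    total_tensor = total_tensor + loss  # keeps the graph of every step alive
    total_float += loss.item()          # plain Python float, no graph attached

print(type(total_tensor).__name__, total_tensor.grad_fn is not None)
print(type(total_float).__name__, total_float)
```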


