# Autograd: automatic differentiation

- `torch.Tensor` is the central class of the package.
- If you set its attribute `.requires_grad` to `True`, it starts to track all operations on it.
- When you finish your computation, you can call `.backward()` and have all the gradients computed automatically.
- The gradient for this tensor will be accumulated into the `.grad` attribute.

- To stop a tensor from tracking history, you can call `.detach()`, which returns a tensor that is detached from the computation history and whose future computations are not tracked.
- To prevent tracking history (and using memory), you can also wrap the code block in `with torch.no_grad():`.
- This is particularly helpful when evaluating a model, because the model may have trainable parameters with `requires_grad=True` whose gradients we don’t need.

- There's one more class that is very important for the autograd implementation: `Function`.

- `Tensor` and `Function` are interconnected and build up an acyclic graph that encodes a complete history of computation.

- Each tensor has a `.grad_fn` attribute that references the `Function` that created it (except for tensors created by the user, whose `grad_fn` is `None`).

- If you want to compute the derivatives, you can call `.backward()` on a `Tensor`.

- If the `Tensor` is a scalar (i.e. it holds a single element), you don't need to pass any arguments to `backward()`. If it has more elements, you need to pass a `gradient` argument that is a tensor of matching shape.
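
Two of the points above are easy to miss in practice: gradients are accumulated rather than overwritten, and a non-scalar tensor needs an explicit `gradient` argument. The following is a minimal sketch of both, using a small tensor `w` that is made up for illustration and is not part of the notebook's own cells.

```python
import torch

w = torch.ones(3, requires_grad=True)  # illustrative tensor, not from the notebook

# Each backward() call adds to w.grad instead of overwriting it.
(w * 2).sum().backward()
first = w.grad.clone()     # gradient after one pass
(w * 2).sum().backward()
second = w.grad.clone()    # gradient after two passes: twice as large

# Reset the accumulated gradient before the next pass.
w.grad.zero_()

# A non-scalar output cannot create its gradient implicitly.
try:
    (w * 2).backward()
    raised = False
except RuntimeError:
    raised = True

# Passing a tensor of the same shape as the output works.
(w * 2).backward(torch.ones(3))
third = w.grad.clone()

print(first, second, raised, third)
```

In a training loop this accumulation is why gradients are usually zeroed before each new backward pass.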

In [1]:
import torch
torch.__version__

'1.6.0'

Create a tensor and set `requires_grad=True` to track computation with it:

In [16]:
x = torch.ones(2, 2, requires_grad=True)
x

tensor([[1., 1.],
        [1., 1.]], requires_grad=True)

Do some operations:

In [17]:
y = x + 2
y

tensor([[3., 3.],
        [3., 3.]], grad_fn=<AddBackward0>)

`y` was created as a result of an operation, so it has a `grad_fn`.

In [18]:
y.grad_fn

<AddBackward0 at 0x7f38eb4926a0>

Do more operations on `y`:

In [19]:
z = y * y * 3
z

tensor([[27., 27.],
        [27., 27.]], grad_fn=<MulBackward0>)

In [20]:
out = z.mean()
out

tensor(27., grad_fn=<MeanBackward0>)
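
The chain of `grad_fn` objects built by these cells can be inspected directly. The sketch below rebuilds the same computation and walks backwards from `out` through the `next_functions` attribute of each node. Treat `next_functions` as a lower-level inspection aid rather than something to rely on in production code; the variable `chain` is introduced here only for illustration.

```python
import torch

x = torch.ones(2, 2, requires_grad=True)
y = x + 2
z = y * y * 3
out = z.mean()

# Walk from the result back towards the leaf tensor.
node = out.grad_fn
chain = []
while node is not None:
    chain.append(type(node).__name__)
    # next_functions holds (node, index) pairs; inputs that need no
    # gradient (such as the constants 2 and 3) appear as None.
    parents = [fn for fn, _ in node.next_functions if fn is not None]
    node = parents[0] if parents else None

print(" -> ".join(chain))
print(x.grad_fn)  # x was created by the user, so it has no grad_fn
```

The walk ends at an `AccumulateGrad` node, which is the node that writes the result into `x.grad` during `backward()`.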

`.requires_grad_()` changes an existing Tensor's `requires_grad` flag in-place.

In [9]:
a = torch.randn(2, 2)
a = (a * 3) / (a - 1)
a

tensor([[ 1.4668, -3.3510],
        [ 5.7669, -3.4547]])

In [10]:
a.requires_grad

False

In [11]:
a.requires_grad_(True)
a.requires_grad

True

In [12]:
b = (a*a).sum()
b

tensor(58.5728, grad_fn=<SumBackward0>)

In [13]:
b.grad_fn

<SumBackward0 at 0x7f38eb495df0>
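
Since `b` is the sum of the squared entries of `a`, its gradient with respect to `a` should be `2 * a`. The sketch below checks this under the assumption of a fixed random seed, which is added here only to make the run repeatable and is not part of the cells above.

```python
import torch

torch.manual_seed(0)  # assumed seed, only for repeatability
a = torch.randn(2, 2)
a = (a * 3) / (a - 1)
a.requires_grad_(True)  # a is a leaf: it was computed without tracking

b = (a * a).sum()
b.backward()

# d/da_i of sum_i a_i^2 is 2 * a_i
matches = torch.allclose(a.grad, 2 * a.detach())
print(matches)
```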

# Calculating gradient

In [14]:
out

tensor(27., grad_fn=<MeanBackward0>)

In [21]:
out.backward()

In [22]:
x.grad

tensor([[4.5000, 4.5000],
        [4.5000, 4.5000]])

To see where the value `4.5` comes from, write the `out` tensor as $o$. Then:

$$
o = \frac{1}{4} \sum_{i} z_{i}
$$

$$
z_{i} = 3(x_{i} + 2)^2
$$

$$
\left. z_{i} \right|_{x_{i}=1} = 27
$$

$$
\frac{\partial o}{\partial x_{i}} = \frac{3}{2}\left( x_{i} + 2 \right)
$$

$$
\left. \frac{\partial o}{\partial x_{i}} \right|_{x_{i}=1} = \frac{9}{2} = 4.5
$$
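
The closed-form derivative can be compared against autograd for inputs other than ones. This is a small sketch with input values chosen arbitrarily for illustration; `analytic` is a name introduced here.

```python
import torch

# Arbitrary inputs, so that each entry gets a different gradient.
x = torch.tensor([[1.0, 2.0], [3.0, 4.0]], requires_grad=True)

out = (3 * (x + 2) ** 2).mean()
out.backward()

# Hand-derived formula: d(out)/dx_i = 3/2 * (x_i + 2)
analytic = 1.5 * (x.detach() + 2)

print(x.grad)
print(torch.allclose(x.grad, analytic))
```

The entry for $x_{i} = 1$ is again `4.5`, matching the result above.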

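Generally speaking, `torch.autograd` is an engine for computing vector-Jacobian products. Let $J$ be the Jacobian matrix of a vector-valued function $\vec{y} = f(\vec{x})$, with entries $J_{ij} = \partial y_{i} / \partial x_{j}$. Given any vector $v = \left( v_{1} \; v_{2} \; \cdots \; v_{m} \right)^{T}$, autograd computes the product $v^{T} \cdot J$. If $v$ happens to be the gradient of a scalar function $l = g(\vec{y})$, that is, $v = \left( \frac{\partial l}{\partial y_{1}} \; \cdots \; \frac{\partial l}{\partial y_{m}} \right)^{T}$, then by the chain rule, the vector-Jacobian product is the gradient of $l$ with respect to $\vec{x}$:

$$
J^{T} \cdot v = \left( \frac{\partial l}{\partial x_{1}} \; \cdots \; \frac{\partial l}{\partial x_{n}} \right)^{T}
$$

(Note that $v^{T} \cdot J$ gives a row vector, which can be treated as a column vector by taking $J^{T} \cdot v$.)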
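
As a numerical check of this identity, the sketch below builds the full Jacobian with `torch.autograd.functional.jacobian` (available in recent PyTorch releases, including the 1.6.0 used here) and compares $J^{T} \cdot v$ with what `backward(v)` stores in `.grad`. The function `f` and the values of `x` and `v` are made up for illustration.

```python
import torch
from torch.autograd.functional import jacobian


def f(x):
    # Illustrative non-linear map from R^3 to R^3.
    return x ** 2 + x.sum()


x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
v = torch.tensor([0.1, 1.0, 0.0001])

# Vector-Jacobian product via backward(): the result lands in x.grad.
y = f(x)
y.backward(v)

# Full Jacobian, J[i, j] = dy_i / dx_j, then the explicit product J^T . v
J = jacobian(f, x.detach())
expected = J.t() @ v

print(torch.allclose(x.grad, expected))
```

Building the full Jacobian is only practical for small problems; `backward(v)` obtains the same product without ever materializing $J$.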

This characteristic of the vector-Jacobian product makes it very convenient to feed external gradients into a model that has a non-scalar output.

Now let’s take a look at an example of a vector-Jacobian product:

In [23]:
x = torch.randn(3, requires_grad=True)

In [24]:
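y = x * 2
# Keep doubling until the norm reaches 1000; detach() reads the
# current values without recording the check in the graph.
while y.detach().norm() < 1000:
    y = y * 2
y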

tensor([  26.0017, -251.4094, 1184.7880], grad_fn=<MulBackward0>)

In [25]:
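# y is not a scalar, so backward() needs a vector v of the same shape;
# autograd then computes the vector-Jacobian product with v.
v = torch.tensor([0.1, 1.0, 0.0001], dtype=torch.float)
y.backward(v)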

In [26]:
x.grad

tensor([1.0240e+02, 1.0240e+03, 1.0240e-01])
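
In the run above, `x.grad` is exactly `1024 * v`: the loop only doubles `y`, so `y` equals a power of two times `x`, and the Jacobian is that factor times the identity matrix. The sketch below repeats the cell with fixed inputs (chosen here for repeatability, instead of `torch.randn`) and tracks the factor in a helper variable `factor` that is not part of the original cells.

```python
import torch

# Fixed values instead of randn, so the doubling count is repeatable.
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

y = x * 2
factor = 2
while y.detach().norm() < 1000:
    y = y * 2
    factor *= 2

v = torch.tensor([0.1, 1.0, 0.0001])
y.backward(v)

# y = factor * x, so J = factor * I and x.grad = factor * v
print(factor)
print(torch.allclose(x.grad, factor * v))
```

For these inputs the loop stops at a factor of 512, because the norm of `x` is about 3.74 and 256 times that is still below 1000.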

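You can also stop autograd from tracking history on tensors with `requires_grad=True`, either by wrapping the code block in `with torch.no_grad():` or by calling `.detach()` to get a tensor with the same content that does not require gradients.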

In [27]:
x.requires_grad

True

In [28]:
(x**2).requires_grad

True

In [30]:
with torch.no_grad():
    print((x ** 2).requires_grad)

False
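
The same pattern applies when evaluating a model. The sketch below uses a single `torch.nn.Linear` layer as a hypothetical stand-in for a trained model, with random inputs; neither appears in the notebook's own cells.

```python
import torch

model = torch.nn.Linear(3, 1)  # hypothetical stand-in for a trained model
inputs = torch.randn(4, 3)     # made-up evaluation batch

with torch.no_grad():
    preds = model(inputs)

print(model.weight.requires_grad)  # the parameters stay trainable
print(preds.requires_grad)         # the output is not tracked
print(preds.grad_fn)               # so no graph was built for it
```

Because no graph is recorded inside the block, the intermediate results needed for a backward pass are not kept in memory.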


In [31]:
x.requires_grad

True

In [32]:
y = x.detach()

In [33]:
y.requires_grad

False

In [34]:
x.eq(y)

tensor([True, True, True])

In [35]:
x, y

(tensor([ 0.0254, -0.2455,  1.1570], requires_grad=True),
 tensor([ 0.0254, -0.2455,  1.1570]))
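
One detail worth knowing about `.detach()` is that the returned tensor shares its underlying storage with the original, so an in-place write through one is visible through the other. The sketch below illustrates this with made-up values; `independent` is a name introduced here for a real copy obtained with `.clone()`.

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = x.detach()

# Both tensors point at the same memory.
same_storage = x.data_ptr() == y.data_ptr()

# An in-place write through the detached tensor changes x as well.
y[0] = 10.0

# clone() makes a real copy, so this write leaves x untouched.
independent = x.detach().clone()
independent[1] = -1.0

print(same_storage)
print(x)
print(independent)
```

If you need a detached tensor that you can modify freely, `x.detach().clone()` is the safer choice.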