
Autograd: Automatic Differentiation
===================================

Central to all neural networks in PyTorch is the ``autograd`` package.

The ``autograd`` package provides automatic differentiation for all operations
on Tensors. It is a define-by-run framework, which means that your backprop is
defined by how your code is run, and that every single iteration can be
different.
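
As a rough illustration of "define-by-run", the sketch below uses an ordinary Python loop to decide how many operations get recorded, so two calls build two different graphs. The helper ``forward`` and its ``n_steps`` parameter are hypothetical names made up for this example.

```python
import torch

def forward(x, n_steps):
    # Hypothetical helper: plain Python control flow decides
    # how many multiplications autograd records.
    y = x
    for _ in range(n_steps):
        y = y * 2
    return y.sum()

x = torch.ones(3, requires_grad=True)

out_a = forward(x, n_steps=1)      # graph with one multiplication
out_a.backward()
grad_one_step = x.grad.clone()     # d(sum(2x))/dx = 2 per element

x.grad.zero_()                     # clear the stored gradient before reusing x
out_b = forward(x, n_steps=3)      # graph with three multiplications
out_b.backward()
grad_three_steps = x.grad.clone()  # d(sum(8x))/dx = 8 per element

print(grad_one_step, grad_three_steps)
```

The same function gives different gradients because the graph is rebuilt from whatever code actually ran on each call.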


Tensor
--------

``torch.Tensor`` is the central class of the package. If you set its attribute
``.requires_grad`` to ``True``, autograd starts to track all operations on it. When
you finish your computation, you can call ``.backward()`` and have all the
gradients computed automatically. The gradient for this tensor is
accumulated into its ``.grad`` attribute.
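
To make "accumulated" concrete, here is a minimal sketch in which ``.backward()`` is called twice on the same leaf tensor. The tensor name ``w`` and the variables ``first`` and ``second`` are illustrative only.

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)

(w * 3).sum().backward()
first = w.grad.clone()    # gradient of sum(3w) is 3 per element

(w * 3).sum().backward()  # a second backward pass adds to the existing .grad
second = w.grad.clone()   # now 6 per element

w.grad.zero_()            # reset when accumulation is not wanted
print(first, second, w.grad)
```

Because gradients add up rather than being overwritten, ``.grad`` usually needs to be reset between independent backward passes.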

In [None]:
import torch

Create a tensor and set `requires_grad=True` to track computation with it. 

In [None]:
x = torch.ones(2, 2, requires_grad=True)
print(x)

Equivalently, call `Tensor.requires_grad_()` on an existing tensor to enable tracking in place.

In [None]:
x = torch.ones(2, 2).requires_grad_()
print(x)

Do a tensor operation:



In [None]:
y = x + 2
print(y)

Do more operations on ``y``:



In [None]:
z = y * y * 3
out = z.mean()
print(z)
print(out)
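
As a quick sanity check on the cells above, the following sketch rebuilds the same chain of operations and confirms that every result derived from ``x`` is itself tracked, and that ``out`` holds the expected scalar.

```python
import torch

x = torch.ones(2, 2, requires_grad=True)
y = x + 2        # every entry is 3
z = y * y * 3    # every entry is 27
out = z.mean()   # scalar tensor

# Tracking propagates to everything computed from x.
print(y.requires_grad, z.requires_grad, out.requires_grad)
print(out.item())  # 27.0
```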

Gradients
---------
Let's backprop now.
Because ``out`` contains a single scalar, ``out.backward()`` is
equivalent to ``out.backward(torch.tensor(1.))``.



In [None]:
out.backward()
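
The equivalence stated above can be checked with a small sketch. The helper ``grad_of_out`` and its ``explicit_seed`` flag are hypothetical names used only here; the helper rebuilds the graph each time so the two calls do not accumulate into the same ``.grad``.

```python
import torch

def grad_of_out(explicit_seed):
    # Hypothetical helper: rebuild x and out from scratch,
    # then backprop with or without an explicit gradient argument.
    x = torch.ones(2, 2, requires_grad=True)
    y = x + 2
    out = (y * y * 3).mean()
    if explicit_seed:
        out.backward(torch.tensor(1.))
    else:
        out.backward()
    return x.grad

implicit = grad_of_out(explicit_seed=False)
explicit = grad_of_out(explicit_seed=True)
print(torch.equal(implicit, explicit))  # True
```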

Print the gradients d(out)/dx:




In [None]:
print(x.grad)

You should see a 2x2 matrix filled with ``4.5``. Let’s call the ``out``
*Tensor* “$o$”.
We have that $o = \frac{1}{4}\sum_i z_i$,
$z_i = 3(x_i+2)^2$ and $z_i\bigr\rvert_{x_i=1} = 27$.
By the chain rule,
$\frac{\partial o}{\partial x_i} = \frac{1}{4}\cdot 6(x_i+2) = \frac{3}{2}(x_i+2)$, hence
$\frac{\partial o}{\partial x_i}\bigr\rvert_{x_i=1} = \frac{9}{2} = 4.5$.
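
The closed-form result can also be compared against autograd for inputs other than all ones. This is a minimal sketch; the input values and the name ``analytic`` are chosen only for illustration.

```python
import torch

# Non-uniform input, so each entry gets a different gradient.
x = torch.tensor([[0.0, 1.0], [2.0, 3.0]], requires_grad=True)
y = x + 2
out = (y * y * 3).mean()
out.backward()

# Closed form from the derivation: d(out)/dx_i = 3/2 * (x_i + 2)
analytic = 1.5 * (x.detach() + 2)

print(x.grad)
print(torch.allclose(x.grad, analytic))  # True
```

The entry with $x_i = 1$ again gives ``4.5``, matching the hand calculation.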



**Read Later:**

Documentation of ``autograd`` and ``Function`` is at
https://pytorch.org/docs/autograd

