In [0]:
%matplotlib inline

Autograd: automatic differentiation
===================================

The ``autograd`` package is central to all neural networks in PyTorch.
Let's briefly introduce this package and then train our first simple neural network.

The ``autograd`` package provides automatic differentiation for all operations on tensors.
It is a define-by-run framework, which means that backpropagation is determined by how your code runs, so every iteration can be different.
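
As a minimal sketch of what "define-by-run" means, the snippet below uses an ordinary Python loop whose number of iterations depends on the input values. The helper name `scale_until_large` and the threshold are illustrative assumptions, not part of the tutorial.

```python
import torch

def scale_until_large(x, threshold=10.0):
    """Double x until its norm reaches the threshold; the loop count depends on the data."""
    y = x * 2
    steps = 1
    while y.norm() < threshold:
        y = y * 2
        steps += 1
    return y, steps

x = torch.tensor([1.0, 1.0], requires_grad=True)
y, steps = scale_until_large(x)

# The graph was built while the loop ran, so backward follows exactly those steps.
y.sum().backward()
print(steps, x.grad)  # each gradient entry equals 2 ** steps
```

A different input would take a different number of doublings, and the recorded graph (and therefore the gradient) would change with it.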


Examples

Tensor
--------

``torch.Tensor`` is the core class of this package. If you set its attribute
``.requires_grad`` to ``True``, all operations on the tensor are tracked.
When the computation is finished, you can call ``.backward()`` to have all gradients computed automatically.
The gradient for this tensor is accumulated into its ``.grad`` attribute.
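
The accumulation behaviour is easy to overlook, so here is a small sketch of it. The variable names `first` and `second` are only for illustration.

```python
import torch

x = torch.ones(2, requires_grad=True)

(x * 3).sum().backward()
first = x.grad.clone()       # gradient after one backward pass

(x * 3).sum().backward()     # a second pass adds to the existing .grad
second = x.grad.clone()

x.grad.zero_()               # reset before the next independent computation
print(first, second, x.grad)
```

This is why training loops usually clear gradients between iterations.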

To stop a tensor from tracking history, call ``.detach()`` to detach it from the computation history and prevent future computation from being tracked.
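
A short sketch of ``.detach()`` in use; the tensors here are made up for illustration.

```python
import torch

x = torch.ones(2, requires_grad=True)
y = x * 2
d = y.detach()                 # same values as y, but cut off from the graph

print(y.requires_grad, d.requires_grad)
print(d.grad_fn)

z = (d * 3).sum()              # built only from the detached tensor
print(z.requires_grad)         # nothing upstream is tracked any more
```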

To prevent history tracking (and the memory it uses), you can also wrap a code block in ``with torch.no_grad():``.
This is particularly useful when evaluating a model, because the model may have trainable parameters with ``requires_grad=True`` for which we do not need gradients.
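
The evaluation case might look like the following sketch. The ``torch.nn.Linear`` layer is only a hypothetical stand-in for a trained model, and the input is random.

```python
import torch

model = torch.nn.Linear(3, 1)        # hypothetical stand-in for a trained model
inputs = torch.randn(4, 3)

with torch.no_grad():
    predictions = model(inputs)      # forward pass without building a graph

print(model.weight.requires_grad)    # the parameters themselves stay trainable
print(predictions.requires_grad)     # the output carries no history
```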

There is one more class that is very important for the autograd implementation: ``Function``.

``Tensor`` and ``Function`` are interconnected and build up an acyclic
graph that encodes a complete history of computation. Each tensor has
a ``.grad_fn`` attribute that references the ``Function`` that created
the ``Tensor`` (except for tensors created by the user, whose
``grad_fn`` is ``None``).
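
To make the graph structure more concrete, the sketch below inspects ``grad_fn`` on a few tensors. It relies on the ``is_leaf`` and ``next_functions`` attributes, which are not mentioned in the text above; the printed class names may vary between PyTorch versions.

```python
import torch

x = torch.ones(2, 2, requires_grad=True)   # created by the user: a leaf tensor
y = x + 2
out = (y * y).mean()

print(x.grad_fn, x.is_leaf)                # user-created tensors have no grad_fn
print(type(y.grad_fn).__name__)            # the Function that produced y
print(type(out.grad_fn).__name__)          # the Function that produced out

# Each Function links back to the Functions that produced its inputs,
# which is how the acyclic graph is walked during backward().
print(out.grad_fn.next_functions)
```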

If you want to compute derivatives, call ``.backward()`` on a ``Tensor``.
If the ``Tensor`` is a scalar (that is, it holds a single element), you do not need to pass any arguments to ``backward()``.
If it has more elements, you need to pass a ``gradient`` argument, a tensor of matching shape.
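
The difference between the scalar and non-scalar cases can be sketched as follows. The tensors are made up for illustration, and the exact error message depends on the PyTorch version.

```python
import torch

x = torch.ones(3, requires_grad=True)
y = x * 2                                   # non-scalar result

try:
    y.backward()                            # no gradient argument supplied
    raised = False
except RuntimeError as err:
    raised = True
    print("RuntimeError:", err)

y.backward(gradient=torch.ones_like(y))     # gradient has the same shape as y
print(x.grad)
```

Passing a tensor of ones here weights every output element equally, which gives the same result as calling ``y.sum().backward()``.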

***Translator's note: in other articles you may see a Tensor wrapped in a Variable to get automatic gradient computation. The Variable API has been deprecated since version 0.4, and Tensor can now be used directly. See:***
(https://pytorch.org/docs/stable/autograd.html#variable-deprecated)

These points are described in more detail below.

In [0]:
import torch

Create a tensor and set ``requires_grad=True`` to track its computation history:


In [0]:
x = torch.ones(2, 2, requires_grad=True)
print(x)

tensor([[1., 1.],
        [1., 1.]], requires_grad=True)


Operate on tensors:

In [0]:
y = x + 2
print(y)

tensor([[3., 3.],
        [3., 3.]], grad_fn=<AddBackward0>)


``y`` was created as the result of an operation, so it has a ``grad_fn``.

In [0]:
print(y.grad_fn)

<AddBackward0 object at 0x000002004F7CC248>


Perform more operations on ``y``:


In [0]:
z = y * y * 3
out = z.mean()

print(z, out)

tensor([[27., 27.],
        [27., 27.]], grad_fn=<MulBackward0>) tensor(27., grad_fn=<MeanBackward0>)


``.requires_grad_(...)`` changes the ``requires_grad`` flag of an existing tensor in place.


The flag defaults to ``False`` if not specified.


In [0]:
a = torch.randn(2, 2)
a = ((a * 3) / (a - 1))
print(a.requires_grad)
a.requires_grad_(True)
print(a.requires_grad)
b = (a * a).sum()
print(b.grad_fn)

False
True
<SumBackward0 object at 0x000002004F7D5608>


Gradients
---------
Now let's backpropagate.
Because ``out`` contains a single scalar, ``out.backward()`` is equivalent to ``out.backward(torch.tensor(1.))``.



In [0]:
out.backward()

Print the gradients d(out)/dx:




In [0]:
print(x.grad)

tensor([[4.5000, 4.5000],
        [4.5000, 4.5000]])


You should get a matrix of ``4.5``. Call the ``out``
*Tensor* "$o$".

We have $o = \frac{1}{4}\sum_i z_i$, with
$z_i = 3(x_i+2)^2$ and $z_i\bigr\rvert_{x_i=1} = 27$.

Therefore,
$\frac{\partial o}{\partial x_i} = \frac{3}{2}(x_i+2)$, and hence
$\frac{\partial o}{\partial x_i}\bigr\rvert_{x_i=1} = \frac{9}{2} = 4.5$.
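
The derivation can be checked numerically. This sketch rebuilds the same computation from scratch and compares autograd's result with the closed form $\frac{3}{2}(x_i+2)$; the name `analytic` is introduced here only for illustration.

```python
import torch

x = torch.ones(2, 2, requires_grad=True)
out = (3 * (x + 2) ** 2).mean()      # o = 1/4 * sum_i 3 * (x_i + 2)^2
out.backward()

analytic = 1.5 * (x.detach() + 2)    # 3/2 * (x_i + 2), evaluated at x_i = 1
print(x.grad)
print(torch.allclose(x.grad, analytic))
```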



Mathematically, if we have a vector-valued function $\vec{y} = f(\vec{x})$, the gradient of $\vec{y}$ with respect to $\vec{x}$ is a Jacobian matrix:

$J = \begin{pmatrix} \frac{\partial y_{1}}{\partial x_{1}} & \cdots & \frac{\partial y_{1}}{\partial x_{n}} \\ \vdots & \ddots & \vdots \\ \frac{\partial y_{m}}{\partial x_{1}} & \cdots & \frac{\partial y_{m}}{\partial x_{n}} \end{pmatrix}$

Generally speaking, `torch.autograd` is an engine for computing vector-Jacobian products. That is, given any vector $v=(v_{1}\;v_{2}\;\cdots\;v_{m})^{T}$, it computes the product $v^{T}\cdot J$. If $v$ happens to be the gradient of a scalar function $l=g(\vec{y})$, that is, $v=(\frac{\partial l}{\partial  y_{1}}\;\cdots\;\frac{\partial l}{\partial  y_{m}})^{T}$, then by the chain rule the vector-Jacobian product is the gradient of $l$ with respect to $\vec{x}$:

$J^{T}\cdot v = \begin{pmatrix} \frac{\partial y_{1}}{\partial x_{1}} & \cdots & \frac{\partial y_{m}}{\partial x_{1}} \\ \vdots & \ddots & \vdots \\ \frac{\partial y_{1}}{\partial x_{n}} & \cdots & \frac{\partial y_{m}}{\partial x_{n}} \end{pmatrix} \begin{pmatrix} \frac{\partial l}{\partial y_{1}}\\ \vdots \\ \frac{\partial l}{\partial y_{m}} \end{pmatrix} = \begin{pmatrix} \frac{\partial l}{\partial x_{1}}\\ \vdots \\ \frac{\partial l}{\partial x_{n}} \end{pmatrix}$

(Note that $v^{T}\cdot J$ gives a row vector, which can be treated as a column vector by taking $J^{T}\cdot v$.)

This property of the vector-Jacobian product makes it very convenient to feed external gradients into a model that has non-scalar output.
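
To see that ``backward`` really computes $v^{T}\cdot J$, the sketch below builds the full Jacobian of a small function and compares the explicit product with autograd's result. It uses ``torch.autograd.functional.jacobian``, which is not covered in this tutorial; the function `f` and the vector `v` are made up for illustration.

```python
import torch
from torch.autograd.functional import jacobian

def f(x):
    return 3 * x ** 2       # elementwise, so the Jacobian is diagonal with entries 6 * x_i

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
v = torch.tensor([0.1, 1.0, 0.01])

J = jacobian(f, x)          # full 3x3 matrix, J[i, j] = dy_i / dx_j
explicit = v @ J            # v^T J computed from the full matrix

f(x).backward(v)            # autograd computes the same product without forming J
print(explicit)
print(x.grad)
```

Forming the full matrix costs memory proportional to the product of the input and output sizes, which is why autograd works with the product instead.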


Now let's look at an example of a vector-Jacobian product:

In [0]:
x = torch.randn(3, requires_grad=True)

y = x * 2
while y.data.norm() < 1000:
    y = y * 2

print(y)

tensor([ 293.4463,   50.6356, 1031.2501], grad_fn=<MulBackward0>)


In this case, `y` is no longer a scalar. `torch.autograd` cannot compute the full Jacobian directly, but if we only want the vector-Jacobian product, we simply pass the vector to `backward` as an argument:

In [0]:
gradients = torch.tensor([0.1, 1.0, 0.0001], dtype=torch.float)
y.backward(gradients)

print(x.grad)

tensor([5.1200e+01, 5.1200e+02, 5.1200e-02])


If a tensor has ``.requires_grad=True`` but you do not want autograd to track its history,
you can wrap the code block in ``with torch.no_grad():``:


In [0]:
print(x.requires_grad)
print((x ** 2).requires_grad)

with torch.no_grad():
    print((x ** 2).requires_grad)

True
True
False


**Read later:**

  Official documentation for ``autograd`` and ``Function``: https://pytorch.org/docs/autograd

