# AutoGrad: Automatic Differentiation
The __autograd__ package provides automatic differentiation for all operations on Tensors. It is a define-by-run framework: backpropagation is defined by how the code runs, so every single iteration can be different.
### Tensor
__torch.Tensor__ is the central class of the package. If we set its attribute __.requires_grad__ to __True__, it starts to track all operations on it. When we finish the computation we can call __.backward()__ and have all the gradients computed automatically. The gradient for this tensor will be accumulated into its __.grad__ attribute.
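As a small illustration of "accumulated", the sketch below calls __.backward()__ twice on freshly built graphs and inspects __.grad__ after each call. The tensor `w` and the function `sum(w * w)` are made up for this example.

```python
import torch

# Leaf tensor whose operations are tracked by autograd (hypothetical values).
w = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# First backward pass: d(sum(w * w))/dw = 2 * w
(w * w).sum().backward()
first = w.grad.clone()

# Second backward pass on a fresh graph: the new gradient is ADDED to .grad
(w * w).sum().backward()
second = w.grad.clone()

print(first)   # 2 * w
print(second)  # 4 * w, because gradients accumulate

# Reset the accumulated gradient in place before the next pass.
w.grad.zero_()
print(w.grad)
```

This is why training loops usually clear gradients between iterations.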

To stop a tensor from tracking history, we can call __.detach()__ to detach it from the computation history, and to prevent future computation from being tracked.

To prevent tracking history (and using memory), we can also wrap the code block in __with torch.no_grad():__. This is particularly helpful when evaluating a model, because the model may have trainable parameters with __requires_grad=True__ whose gradients we do not need.
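A minimal sketch of the evaluation case, assuming a tiny stand-in model (`torch.nn.Linear(4, 2)`) and random inputs that are not part of this notebook:

```python
import torch

# Hypothetical tiny model; its parameters have requires_grad=True by default.
model = torch.nn.Linear(4, 2)
inputs = torch.randn(8, 4)

tracked = model(inputs)        # a graph is recorded for this output
with torch.no_grad():
    untracked = model(inputs)  # no graph is recorded inside the block

print(model.weight.requires_grad)  # True
print(tracked.requires_grad)       # True
print(untracked.requires_grad)     # False
print(untracked.grad_fn)           # None
```

The parameters keep __requires_grad=True__ throughout; only the results computed inside the block are untracked.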

One more class is central to the autograd implementation: __Function__.

__Tensor__ and __Function__ are interconnected and build up an acyclic graph that encodes a complete history of the computation. Each tensor has a __.grad_fn__ attribute that references the __Function__ that created the __Tensor__ (except for Tensors created by the user, whose __grad_fn__ is __None__).
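The graph can be inspected directly. The sketch below uses made-up tensors `a`, `b`, and `c`, and relies on the `is_leaf` attribute and the `next_functions` attribute of graph nodes to walk one step backward through the history:

```python
import torch

a = torch.ones(2, requires_grad=True)  # created by the user: a leaf
b = a * 3                              # created by an operation
c = b.sum()                            # created by another operation

print(a.grad_fn, a.is_leaf)              # None True
print(b.grad_fn is not None, b.is_leaf)  # True False

# Each Function links back to the Functions that produced its inputs.
print(c.grad_fn)
print(c.grad_fn.next_functions)
```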

To compute the derivatives, we call __.backward()__ on a __Tensor__. If the __Tensor__ is a scalar (i.e. it holds a single element), __backward()__ needs no arguments. If it has more elements, we must pass a __gradient__ argument: a tensor of matching shape.
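The difference between the scalar and non-scalar cases can be sketched as follows. The tensors are illustrative, and the exact wording of the error message may vary between PyTorch versions, so it is printed rather than assumed:

```python
import torch

x = torch.ones(3, requires_grad=True)
y = x * 2  # three elements, so y is not a scalar

raised = False
try:
    y.backward()  # no gradient argument for a non-scalar output
except RuntimeError as err:
    raised = True
    print("RuntimeError:", err)

# Passing a tensor with the same shape as y works.
y.backward(torch.ones_like(y))
print(x.grad)  # each entry is d(sum(y))/dx_i = 2
```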

In [1]:
import torch

Create a tensor and set __requires_grad=True__ to track computation with it.

In [2]:
x = torch.ones(2, 2, requires_grad=True)
print(x)

tensor([[1., 1.],
        [1., 1.]], requires_grad=True)


Do a tensor operation

In [3]:
y = x + 2
print(y)

tensor([[3., 3.],
        [3., 3.]], grad_fn=<AddBackward0>)


__y__ was created as a result of an operation, so it has a __grad_fn__.

In [4]:
print(y.grad_fn)

<AddBackward0 object at 0x000002460A82BA48>


Do more operations on __y__

In [5]:
z = y * y * 3
out = z.mean()
print(z, out)

tensor([[27., 27.],
        [27., 27.]], grad_fn=<MulBackward0>) tensor(27., grad_fn=<MeanBackward0>)


__.requires_grad_( ... )__ changes an existing Tensor's __requires_grad__ flag in-place. A tensor's flag is __False__ unless it is set at creation, and calling __.requires_grad_()__ with no argument sets it to __True__.

In [6]:
a = torch.randn(2, 2)
a = ((a * 3) / (a - 1))
print(a.requires_grad)
a.requires_grad_(True)
print(a.requires_grad)
b = (a * a).sum()
print(b.grad_fn)

False
True
<SumBackward0 object at 0x000002460A830D88>
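A short sketch of the two defaults involved: a newly created tensor starts with the flag off, while the method's own argument defaults to __True__. The tensor `a` below is a fresh leaf tensor made up for this illustration.

```python
import torch

states = []

a = torch.randn(2, 2)
states.append(a.requires_grad)  # False: the creation default

a.requires_grad_()              # no argument: enables tracking
states.append(a.requires_grad)  # True

a.requires_grad_(False)         # explicit argument: disables tracking again
states.append(a.requires_grad)  # False

print(states)
```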


### Gradients

Do backprop now. Because __out__ contains a single scalar, __out.backward()__ is equivalent to __out.backward(torch.tensor(1.))__.

In [7]:
out.backward()

Print the gradients __d(out)/dx__.

In [8]:
print(x.grad)

tensor([[4.5000, 4.5000],
        [4.5000, 4.5000]])
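The value 4.5 can be checked by hand. With out = (1/4) Σ z_i and z_i = 3 (x_i + 2)^2, the derivative is d(out)/dx_i = (3/2) (x_i + 2), which is 4.5 at x_i = 1. The sketch below rebuilds the same computation and compares autograd's result with this closed form; the variable `expected` is introduced only for the check.

```python
import torch

x = torch.ones(2, 2, requires_grad=True)
out = (3 * (x + 2) ** 2).mean()  # same computation as y, z, out above
out.backward()

# Closed form derived by hand: (3/2) * (x_i + 2)
expected = 1.5 * (x.detach() + 2)

print(x.grad)
print(torch.allclose(x.grad, expected))
```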


Mathematically, if we have a vector valued function __y⃗ = f(x⃗)__, then the gradient of __y⃗__ with respect to __x⃗__ is a __Jacobian matrix__.

Generally speaking, __torch.autograd__ is an engine for computing vector-Jacobian products: given a vector __v__, it computes __vᵀ·J__, which is the chain rule applied during backpropagation.

#### Example of vector-Jacobian product

In [9]:
x = torch.randn(3, requires_grad=True)
y = x * 2
while y.detach().norm() < 1000:
    y = y * 2
print(y)

tensor([ -942.4899, -1495.9058,   910.5910], grad_fn=<MulBackward0>)


In this case __y__ is no longer a scalar. __torch.autograd__ does not compute the full Jacobian directly, but if we just want the vector-Jacobian product, we can pass the vector to __backward__ as an argument.

In [10]:
v = torch.tensor([0.1, 1.0, 0.0001], dtype=torch.float)
y.backward(v)
print(x.grad)

tensor([1.0240e+02, 1.0240e+03, 1.0240e-01])
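To see that __backward(v)__ really yields the vector-Jacobian product, the sketch below builds the full Jacobian with `torch.autograd.functional.jacobian` and multiplies it by `v` explicitly. The helper `f` and its fixed factor of 1024 are assumptions chosen to mirror the output above; in the notebook the factor depends on the random input.

```python
import torch
from torch.autograd.functional import jacobian

def f(x):
    # Illustrative elementwise map with a fixed factor (assumed, not from the notebook).
    return x * 1024

x = torch.randn(3, requires_grad=True)
v = torch.tensor([0.1, 1.0, 0.0001])

# Vector-Jacobian product via backward: the result lands in x.grad.
y = f(x)
y.backward(v)

# Full 3x3 Jacobian, computed explicitly for comparison.
J = jacobian(f, x.detach())

print(J)
print(v @ J)
print(torch.allclose(x.grad, v @ J))
```

Building the full Jacobian costs one backward pass per output element, which is why autograd works with the product instead.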


We can also stop autograd from tracking history on Tensors with __requires_grad=True__ by wrapping the code block in __with torch.no_grad():__.

In [11]:
print(x.requires_grad)
print((x ** 2).requires_grad)
with torch.no_grad():
    print((x ** 2).requires_grad)

True
True
False


Alternatively, use __.detach()__ to get a new Tensor with the same content that does not require gradients.

In [12]:
print(x.requires_grad)
y = x.detach()
print(y.requires_grad)
print(x.eq(y))
print(x.eq(y).all())

True
False
tensor([True, True, True])
tensor(True)
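One detail worth knowing: the tensor returned by __.detach()__ shares its storage with the original, so in-place writes through one are visible through the other. A sketch with made-up values, using `.clone()` when an independent copy is wanted:

```python
import torch

x = torch.ones(3, requires_grad=True)
y = x.detach()

y[0] = 5.0                           # in-place write through the detached tensor
print(x)                             # x sees the change
print(x.data_ptr() == y.data_ptr())  # same underlying memory

z = x.detach().clone()               # independent copy
z[1] = -1.0
print(x)                             # unchanged by the write to z
```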
