## Autograd: automatic differentiation


`autograd` is the central package for all neural networks in PyTorch.

The `autograd` package provides automatic differentiation for all operations on Tensors. It is a define-by-run framework: your backprop is defined by how your code runs, so every single iteration can be different.
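To make "define-by-run" concrete, here is a minimal sketch. It assumes a recent PyTorch release (0.4 or later), where `requires_grad=True` can be set on a tensor directly; `double_until_large` and its `threshold` parameter are hypothetical names used only for illustration. The number of loop iterations depends on the input values, so each call records a different graph and yields a different gradient.

```python
import torch

def double_until_large(x, threshold=10.0):
    """Hypothetical helper: keep doubling until the norm reaches the threshold."""
    y = x * 2
    doublings = 1
    # The loop count depends on the data, so the recorded graph differs per input.
    while y.norm() < threshold:
        y = y * 2
        doublings += 1
    return y.sum(), doublings

a = torch.tensor([1.0, 1.0], requires_grad=True)
b = torch.tensor([6.0, 8.0], requires_grad=True)

out_a, n_a = double_until_large(a)
out_b, n_b = double_until_large(b)
out_a.backward()
out_b.backward()

print(n_a, a.grad)  # three doublings, so d(out)/da is 8 per element
print(n_b, b.grad)  # one doubling, so d(out)/db is 2 per element
```

The same Python function produces two different backward passes, one per input, without any separate graph-definition step.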

### Variable
`autograd.Variable` is the central class of the package. It is a thin wrapper around a Tensor and supports nearly all operations defined on Tensor.

Once you have finished your computation, call `.backward()` on the result to have all the gradients computed automatically.

<img src="img/Variable.png">

- The `.data` attribute refers to the raw tensor.
- The `.grad` attribute accumulates the gradient w.r.t. this `Variable`.
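Because `.grad` accumulates rather than overwrites, repeated backward passes add up. The sketch below illustrates this, assuming a recent PyTorch release (0.4 or later) in which tensors track gradients directly; the variable names `first` and `second` are illustrative only.

```python
import torch

x = torch.ones(2, requires_grad=True)

# .data exposes the underlying values without gradient tracking.
print(x.data.requires_grad)   # False

(x * 3).sum().backward()
first = x.grad.clone()        # tensor([3., 3.])

(x * 3).sum().backward()      # a second backward pass adds to the stored gradient
second = x.grad.clone()       # tensor([6., 6.])

x.grad.zero_()                # reset before the next independent computation
print(first, second, x.grad)
```

Zeroing the gradient between independent computations avoids mixing their contributions.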

`Variable` and `Function` are interconnected and build up an acyclic graph that encodes a complete history of the computation.
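The idea of a graph that records the history of a computation can be sketched in plain Python. The toy `Var` class below is a hypothetical illustration, not PyTorch's implementation: each value remembers its parents and the local derivative with respect to each, and `backward()` walks that record in reverse.

```python
class Var:
    """Toy scalar variable that records how it was created (illustration only)."""

    def __init__(self, value, parents=()):
        self.value = value
        self.grad = 0.0
        # Each entry is (parent Var, local derivative d(self)/d(parent)).
        self.parents = parents

    def __add__(self, other):
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))

    def backward(self):
        # Order the graph so every node comes after the nodes it depends on.
        order, seen = [], set()

        def visit(v):
            if id(v) not in seen:
                seen.add(id(v))
                for parent, _ in v.parents:
                    visit(parent)
                order.append(v)

        visit(self)
        self.grad = 1.0
        # Apply the chain rule from the output back towards the leaves.
        for v in reversed(order):
            for parent, local in v.parents:
                parent.grad += local * v.grad


x = Var(1.0)
y = x + Var(2.0)
z = y * y * Var(3.0)
z.backward()
print(z.value, x.grad)  # 27.0 18.0
```

Here `x` has no parents, which plays the role of a user-created value, while `y` and `z` carry a record of the operations that produced them.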

The `.grad_fn` attribute refers to the `Function` that created this `Variable`. For a Variable created by the user, `.grad_fn` is `None`; such Variables are called `leaf` Variables.
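A quick way to tell leaves from computed values is to inspect `grad_fn` together with the `is_leaf` attribute. This is a minimal sketch, assuming a recent PyTorch release (0.4 or later) where tensors carry these attributes directly.

```python
import torch

x = torch.ones(2, 2, requires_grad=True)  # created by the user
y = x + 2                                 # created by an operation

print(x.grad_fn, x.is_leaf)               # None True
print(y.grad_fn is not None, y.is_leaf)   # True False
```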

Call `.backward()` on a Variable to compute the derivatives (gradients).
For a scalar Variable (one that holds a single element), `.backward()` needs no arguments. For a Variable that holds multiple elements, specify the `grad_output` argument, a tensor of the same shape.
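The difference between the scalar and non-scalar cases can be sketched as follows. This assumes a recent PyTorch release (0.4 or later); the tensor is passed positionally, and the name `weights` is illustrative only. Each element of the supplied tensor scales the gradient that flows back from the matching element of the output.

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# Scalar output: no argument needed.
(x * 2).sum().backward()
scalar_grad = x.grad.clone()   # tensor([2., 2., 2.])
x.grad.zero_()

# Non-scalar output: calling backward() without an argument is an error.
y = x * 2
try:
    y.backward()
    raised = False
except RuntimeError:
    raised = True

# Supplying a tensor of the same shape as y makes the call well defined.
weights = torch.tensor([0.1, 1.0, 0.0001])
y.backward(weights)
print(raised, x.grad)          # x.grad equals 2 * weights
```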

In [4]:
import torch
from torch.autograd import Variable

In [6]:
x = Variable(torch.ones(2,2), requires_grad=True)
x

Variable containing:
 1  1
 1  1
[torch.FloatTensor of size 2x2]

In [8]:
print(x.grad_fn)

None


In [9]:
y = x + 2
y

Variable containing:
 3  3
 3  3
[torch.FloatTensor of size 2x2]

In [11]:
# y was created as the result of an operation, so it has a grad_fn
y.grad_fn

<AddBackward0 at 0x7f3b0848f710>

In [12]:
z = y * y * 3
z

Variable containing:
 27  27
 27  27
[torch.FloatTensor of size 2x2]

In [13]:
z.grad_fn

<MulBackward0 at 0x7f3b0848f8d0>

In [14]:
out = z.mean()
out

Variable containing:
 27
[torch.FloatTensor of size 1]

In [15]:
out.grad_fn

<MeanBackward1 at 0x7f3b0848f7f0>

### Gradient

In [17]:
print(x.grad)

None


Run backprop with `out.backward()` to compute the gradient d(out)/dx, which is stored in `x.grad`.

In [20]:
out.backward()

In [21]:
x.grad

Variable containing:
 4.5000  4.5000
 4.5000  4.5000
[torch.FloatTensor of size 2x2]

<img src="img/autograd.jpg">
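The value 4.5 can be checked by hand. Writing $o$ for `out`, and using $y_i = x_i + 2$ and $z_i = 3 y_i^2$ from the cells above:

$$
o = \frac{1}{4}\sum_{i} z_i = \frac{1}{4}\sum_{i} 3\,(x_i + 2)^2
$$

$$
\frac{\partial o}{\partial x_i} = \frac{1}{4}\cdot 6\,(x_i + 2) = \frac{3}{2}\,(x_i + 2),
\qquad
\left.\frac{\partial o}{\partial x_i}\right|_{x_i = 1} = \frac{3}{2}\cdot 3 = 4.5
$$

Every entry of `x` equals 1, so every entry of `x.grad` is 4.5, matching the output above.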

In [28]:
x = torch.randn(3)
x = Variable(x, requires_grad=True)
x

Variable containing:
-1.4167
 0.2049
-1.4206
[torch.FloatTensor of size 3]

In [29]:
y = x * 2
y

Variable containing:
-2.8334
 0.4098
-2.8411
[torch.FloatTensor of size 3]

In [30]:
y.data.norm()

4.033368288363117

In [31]:
while y.data.norm() < 1000:
    y = y * 2

print(y)

Variable containing:
-725.3428
 104.9154
-727.3336
[torch.FloatTensor of size 3]



In [59]:
gradients = torch.FloatTensor([0.1, 1.0, 0.0001])
y.backward(gradients)

x.grad

Variable containing:
 2416.7734
 8805.3945
 -863.2974
[torch.FloatTensor of size 3]

Documentation for `Variable` and `Function` is available [here](http://pytorch.org/docs/autograd).