# Automatic differentiation
We use automatic differentiation to compute gradients, which spares us from deriving and coding the derivatives by hand.

### Elves' execution (everything by hand)
![no graph](./images/no_graph.png)

Elves have a lot of time to waste, since they are basically immortal. If you are too, you are allowed to ignore automatic differentiation...

### The static computational graph (deferred execution)
![static graph](./images/static_graph.png)

The neuron is compiled into a symbolic graph in which each node represents an individual operation (second row), with placeholders standing in for the inputs and outputs. The graph is defined first and executed later, once concrete values are fed in.
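
As a rough illustration of deferred execution, the toy sketch below builds a symbolic graph for a linear neuron and evaluates it only afterwards. It is plain Python, not the API of any real framework: `Placeholder`, `Op`, `run`, and the `*_in` names are all hypothetical and exist only for this example.

```python
# Toy deferred-execution graph; every name here is hypothetical.
from dataclasses import dataclass
from typing import Dict, Union


@dataclass(frozen=True)
class Placeholder:
    name: str  # symbolic input, bound to a value only at run time


@dataclass(frozen=True)
class Op:
    kind: str  # "add" or "mul"
    left: "Node"
    right: "Node"


Node = Union[Placeholder, Op]


def run(node: Node, feed: Dict[str, float]) -> float:
    """Evaluate the graph once concrete inputs are supplied."""
    if isinstance(node, Placeholder):
        return feed[node.name]
    lhs, rhs = run(node.left, feed), run(node.right, feed)
    return lhs + rhs if node.kind == "add" else lhs * rhs


# Step 1: define the graph for w * x + b. Nothing is computed yet.
w_in, x_in, b_in = Placeholder("w"), Placeholder("x"), Placeholder("b")
neuron = Op("add", Op("mul", w_in, x_in), b_in)

# Step 2: execute the same graph with different inputs.
out1 = run(neuron, {"w": 2.0, "x": 3.0, "b": 1.0})
out2 = run(neuron, {"w": 2.0, "x": -1.0, "b": 1.0})
print(out1, out2)
```

The graph structure is fixed before any data is seen, which is what makes it "static".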

### The dynamic computational graph (immediate execution)
![dynamic graph](./images/dynamic_graph.png)

The computational graph (CG) is built node by node as the code is eagerly evaluated. This makes conditional behavior easier to express, since the CG can change between successive forward passes.
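
A minimal sketch of such conditional behavior, assuming a hypothetical `forward` function whose branch depends on the input values. Each call records a different graph, and `backward()` differentiates whichever branch actually ran.

```python
import torch


def forward(t: torch.Tensor) -> torch.Tensor:
    # Data-dependent branch: the recorded graph differs from call to call.
    if t.sum() > 0:
        return (t ** 2).sum()
    return (3 * t).sum()


x_pos = torch.tensor([1.0, 2.0], requires_grad=True)
x_neg = torch.tensor([-1.0, -2.0], requires_grad=True)

forward(x_pos).backward()  # took the quadratic branch
forward(x_neg).backward()  # took the linear branch

print(x_pos.grad)  # gradient of sum(t^2) is 2 * t
print(x_neg.grad)  # gradient of sum(3 * t) is 3 for every element
```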

In [None]:
import torch
a = torch.tensor([[1., 2], [3, 4]], requires_grad=True)
b = torch.ones((2, 2), requires_grad=True)
a, b

In [None]:
c = a + b
d = b * c
d = d + a
d.retain_grad()

In [None]:
c.requires_grad

In [None]:
from torchviz import make_dot
e = torch.mean(d) + torch.mean(b)
make_dot(e)

In [None]:
e.backward()
# To save memory, the buffers of the graph are freed during the backward pass,
# so calling backward a second time raises an error:
# e.backward()  # RuntimeError

# To keep the graph alive, pass retain_graph=True to backward()
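
The effect of `retain_graph=True` can be sketched on a small standalone example. The tensors `p` and `q` are made up for this illustration and are not part of the notebook above. Note that gradients accumulate in `.grad` across backward calls.

```python
import torch

p = torch.tensor([1.0, 2.0], requires_grad=True)
q = (p * p).sum()

q.backward(retain_graph=True)  # keep the saved buffers for another pass
first = p.grad.clone()         # 2 * p

q.backward()                   # second pass works; the buffers are freed now
second = p.grad.clone()        # gradients accumulated: 4 * p

try:
    q.backward()               # third pass: the buffers are gone
except RuntimeError as err:
    print("backward failed:", type(err).__name__)

print(first, second)
```

If accumulation is not wanted, reset the gradient between passes, for example with `p.grad.zero_()`.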

In [None]:
b.grad  # b is a leaf of the graph, so its gradient is stored

In [None]:
c.grad  # c is not a leaf, so its gradient is not stored (None, with a warning)

In [None]:
d.grad  # d is not a leaf, but d.retain_grad() was called, so d.retains_grad is True
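
The leaf / non-leaf distinction can be checked directly with `is_leaf` and `retains_grad`. The sketch below uses fresh tensors (`u`, `v`, `w` are names chosen for this example only) so it does not interfere with the cells above.

```python
import torch

u = torch.tensor([1.0, 2.0], requires_grad=True)  # leaf: created by the user
v = u * 2                                         # non-leaf: result of an operation
w = v + 1                                         # non-leaf, but we keep its gradient
w.retain_grad()

w.sum().backward()

print(u.is_leaf, v.is_leaf, w.is_leaf)
print(v.retains_grad, w.retains_grad)
print(u.grad)  # d(sum(2u + 1))/du = 2 for every element
print(w.grad)  # d(sum(w))/dw = 1 for every element
```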

In [None]:
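# Dynamic graph (define-by-run)
# This is reminiscent of the difference between Python variables and C variables
x = torch.tensor([1., 2.], requires_grad=True)
y = x.sum()

# The number of multiplication nodes in the graph depends on the data.
# detach() is preferred over the legacy .data attribute.
while y.detach().norm() < 12:
    y = y * 1.2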

In [None]:
make_dot(y)

In [None]:
y.backward()

In [None]:
x.grad
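
As a sanity check on the gradient above: if the loop runs `n` times, then `y = 1.2**n * x.sum()`, so every entry of the gradient should equal `1.2**n`. The sketch below repeats the loop on fresh tensors and counts the iterations; `x_chk`, `y_chk`, and `n_iter` are names introduced only for this check.

```python
import torch

x_chk = torch.tensor([1.0, 2.0], requires_grad=True)
y_chk = x_chk.sum()

n_iter = 0
while y_chk.detach().norm() < 12:
    y_chk = y_chk * 1.2
    n_iter += 1  # one more multiplication node in the graph

y_chk.backward()

# Each multiplication by 1.2 contributes a factor of 1.2 to the gradient.
expected = torch.full_like(x_chk, 1.2 ** n_iter)
print(n_iter, x_chk.grad)
print(torch.allclose(x_chk.grad, expected))
```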

In [None]:
# you cannot get a numpy array from a tensor which requires grad
# x.numpy() # error

# we must detach it from the computational graph
print(x)
print(x.detach())
print(x.detach().numpy())

In [None]:
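# detach() and numpy() both return views on the same storage as x
x_dn = x.detach().numpy()
# an in-place write to a leaf that requires grad must happen under no_grad()
with torch.no_grad():
    x[0] = 99
print(x_dn)  # the NumPy array reflects the change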

## Resources
[Automatic differentiation in PyTorch](https://openreview.net/pdf?id=BJJsrmfCZ)

[Autograd tutorial](https://pytorch.org/tutorials/beginner/blitz/autograd_tutorial.html#sphx-glr-beginner-blitz-autograd-tutorial-py)

[Nice overview of PyTorch (the pictures above are taken from here!)](https://pytorch.org/assets/deep-learning/Deep-Learning-with-PyTorch.pdf)