# Automatic Differentiation
Deep learning frameworks such as [PyTorch](https://pytorch.org/) and [TensorFlow](https://www.tensorflow.org/) implement automatic differentiation. In short, they compute the derivatives of a composite function with respect to its inputs by building a computational graph. The graph has a node for each intermediate value and each operation. It can be traversed in a forward pass, to compute the output from the inputs, and in a backward pass, to compute the derivative of the output with respect to each intermediate node. Let's implement the computational graph for the function:
$$
f(x_1, x_2) = \ln\left(x_1^2 + \sqrt{x_2}\right)
$$
<p align="center">
  <img src="../imgs/autograd.png" width="50%"/>
</p>
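
The backward pass is the chain rule applied node by node, starting from the output. As a worked sketch, assume the intermediate nodes are named $h_1 = x_1^2$, $h_2 = \sqrt{x_2}$, $z = h_1 + h_2$ and $o = \ln z$ (these names follow the code in this notebook; the labels in the figure may differ):

$$
\begin{aligned}
\frac{\partial o}{\partial z} &= \frac{1}{z}, \\
\frac{\partial o}{\partial h_1} &= \frac{\partial o}{\partial z}\cdot\frac{\partial z}{\partial h_1} = \frac{1}{z}, \qquad
\frac{\partial o}{\partial h_2} = \frac{\partial o}{\partial z}\cdot\frac{\partial z}{\partial h_2} = \frac{1}{z}, \\
\frac{\partial o}{\partial x_1} &= \frac{\partial o}{\partial h_1}\cdot\frac{\partial h_1}{\partial x_1} = \frac{2 x_1}{x_1^2 + \sqrt{x_2}}, \\
\frac{\partial o}{\partial x_2} &= \frac{\partial o}{\partial h_2}\cdot\frac{\partial h_2}{\partial x_2} = \frac{1}{2\sqrt{x_2}\left(x_1^2 + \sqrt{x_2}\right)}.
\end{aligned}
$$

Each line reuses the result of the line above it, which is why a single backward traversal of the graph is enough to obtain every derivative.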

By default, PyTorch stores gradients only for leaf tensors created with `requires_grad=True`. To keep the gradient of a non-leaf tensor, call `tensor.retain_grad()` on it before running the backward pass.
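
To make the leaf versus non-leaf distinction concrete, here is a minimal sketch on a toy graph that is separate from the example in this notebook. The names `x`, `h_plain`, `h_kept` and `y` are made up for illustration, and the `retains_grad` attribute assumes a reasonably recent PyTorch release.

```python
import torch

# Hypothetical toy graph, unrelated to f(x1, x2) in this notebook.
x = torch.tensor([2.0], requires_grad=True)  # leaf tensor created by the user

h_plain = x * 3.0   # non-leaf tensor, gradient will NOT be kept
h_kept = x * 3.0    # non-leaf tensor, gradient will be kept
h_kept.retain_grad()

y = (h_plain + h_kept).sum()
y.backward()

# Leaf status of each tensor.
print(x.is_leaf, h_plain.is_leaf, h_kept.is_leaf)
# Whether a non-leaf tensor will have .grad populated by backward().
print(h_plain.retains_grad, h_kept.retains_grad)
# Gradients that were actually stored.
print(x.grad)
print(h_kept.grad)
```

Only `x` and `h_kept` should end up with a populated `.grad`; reading `h_plain.grad` returns `None`, and PyTorch typically emits a warning when you do so.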

In [1]:
import torch

# Leaf tensors: the inputs of the graph, with gradient tracking enabled.
x1 = torch.tensor([2.0], dtype=torch.float32, requires_grad=True)
x2 = torch.tensor([3.0], dtype=torch.float32, requires_grad=True)

In [2]:
h1 = torch.pow(x1,2)
h2 = torch.sqrt(x2)
h1.retain_grad()
h2.retain_grad()

In [3]:
z = h1 + h2
z.retain_grad()

In [4]:
o = torch.log(z)
o.retain_grad()

In [5]:
# retain_graph=True keeps the graph so backward() can be called again (gradients then accumulate).
o.backward(retain_graph=True)

In [6]:
print(f"do/do = {o.grad}")
print(f"do/dz = {z.grad}")
print(f"do/dh1 = {h1.grad}")
print(f"do/dh2 = {h2.grad}")
print(f"do/dx1 = {x1.grad}")
print(f"do/dx2 = {x2.grad}")

do/do = tensor([1.])
do/dz = tensor([0.1745])
do/dh1 = tensor([0.1745])
do/dh2 = tensor([0.1745])
do/dx1 = tensor([0.6978])
do/dx2 = tensor([0.0504])
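
To see what the framework is doing under the hood, the same graph can be rebuilt by hand. The following is a minimal pure-Python sketch of reverse-mode differentiation on scalars, not how PyTorch is actually implemented; the `Node` class and the helpers `square`, `sqrt`, `add`, `log` and `backward` are hypothetical names introduced only for this illustration.

```python
import math


class Node:
    """Hypothetical scalar graph node: a value, its gradient, and its parents."""

    def __init__(self, value, parents=()):
        self.value = value
        self.grad = 0.0
        # Each entry is (parent_node, local derivative d(self)/d(parent)).
        self.parents = parents


def square(a):
    return Node(a.value ** 2, ((a, 2.0 * a.value),))


def sqrt(a):
    v = math.sqrt(a.value)
    return Node(v, ((a, 0.5 / v),))


def add(a, b):
    return Node(a.value + b.value, ((a, 1.0), (b, 1.0)))


def log(a):
    return Node(math.log(a.value), ((a, 1.0 / a.value),))


def backward(output):
    """Propagate d(output)/d(node) from the output back to every ancestor."""
    order, seen = [], set()

    def visit(node):
        # Depth-first search that appends a node after all of its parents.
        if id(node) in seen:
            return
        seen.add(id(node))
        for parent, _ in node.parents:
            visit(parent)
        order.append(node)

    visit(output)
    output.grad = 1.0  # do/do
    # Reverse topological order: a node is processed before its parents.
    for node in reversed(order):
        for parent, local in node.parents:
            parent.grad += node.grad * local  # chain rule


# Forward pass: same graph as the PyTorch cells above.
x1, x2 = Node(2.0), Node(3.0)
h1, h2 = square(x1), sqrt(x2)
z = add(h1, h2)
o = log(z)

# Backward pass.
backward(o)

for name, node in [("o", o), ("z", z), ("h1", h1), ("h2", h2), ("x1", x1), ("x2", x2)]:
    print(f"do/d{name} = {node.grad:.4f}")
```

Rounded to four decimals, the printed gradients should agree with the PyTorch output above. Storing the local derivative on each edge during the forward pass is a simplification chosen for brevity; real frameworks store a backward function per operation instead.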
