In [1]:
import torch

# Introduction to Autograd

`torch.autograd` is the PyTorch module whose purpose is to compute the gradients of functions using the backpropagation method.

Backpropagation is an algorithm relying on the following theorem, known as the chain rule:
$$ (f(g(x)))' = g'(x) * f'(g(x)) $$

It means that to compute $(f(g(x)))'$ we only need to know $f'$, $g(x)$ and $g'(x)$.
In the context of neural networks this is useful because they are compositions of simple functions such as activation functions and linear/affine functions. We know $f'$ in closed form, and a recursive call gives us $g(x)$ and $g'(x)$: this is the backpropagation algorithm.
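As a quick sanity check of the chain rule, the sketch below compares the analytic derivative with a finite-difference estimate. The choices $f = \sin$, $g(x) = x^2$, the evaluation point and the step size `h` are illustrative assumptions, not part of the original notebook.

```python
import math

# Illustrative choice of functions (an assumption for this example)
def f(u):
    return math.sin(u)

def f_prime(u):
    return math.cos(u)

def g(x):
    return x ** 2

def g_prime(x):
    return 2 * x

def chain_rule_derivative(x):
    # (f(g(x)))' = g'(x) * f'(g(x))
    return g_prime(x) * f_prime(g(x))

def finite_difference(x, h=1e-6):
    # Central-difference approximation of the derivative of f(g(x))
    return (f(g(x + h)) - f(g(x - h))) / (2 * h)

x0 = 1.5
analytic = chain_rule_derivative(x0)
numeric = finite_difference(x0)
print(analytic, numeric)
```

The two printed values should agree to several decimal places, which is what the theorem predicts.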

The job of Autograd is to keep track of which tensors were produced by which functions, so that backpropagation can be performed.
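One way to see this bookkeeping is to inspect the `grad_fn` attribute of a tensor. The sketch below uses small illustrative tensors `a`, `b`, `c` (names chosen for this example only); the class names it prints may vary with the PyTorch version.

```python
import torch

a = torch.tensor(3.0, requires_grad=True)  # leaf tensor created by the user
b = a ** 2                                 # result of a power operation
c = 2 * b                                  # result of a multiplication

# Leaf tensors have no grad_fn; tensors produced by operations do.
print(a.grad_fn)
print(type(b.grad_fn).__name__)
print(type(c.grad_fn).__name__)

# next_functions links a node to the nodes that produced its inputs.
# Entries are (node, index) pairs; the node is None for inputs that
# do not require a gradient, such as the constant 2.
parents_of_c = [fn for fn, _ in c.grad_fn.next_functions if fn is not None]
print(parents_of_c)
```

Following `next_functions` from the final result back to the leaves is essentially the traversal that `backward()` performs.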

# Computational graph

If we need to compute the gradient of a function with respect to a tensor t, then we need to 
define t with the parameter requires_grad=True. Let's try to minimise the function $2(x^2 + y^2)$.

In [4]:
x = torch.tensor(3,dtype=torch.float,requires_grad=True)
y = torch.tensor(2,dtype=torch.float,requires_grad=True)
z = x**2+y**2
t = 2*z

We define:
- $S : (x,y) \rightarrow x+y$ 
- $P : x \rightarrow x^2$
- $M : x \rightarrow 2x$.

Then we have : 
$$
\begin{cases}
u = P(x) \\
v= P(y) \\
z = S(u,v) \\
t = M(z) = M(z(u,v)) = M(z(u(x),v(y)))
\end{cases}
$$
and 
$$
\begin{cases}
x = 3\\
y = 2 \\
u = 9 \\
v = 4 \\
z = 13 \\
t = 26
\end{cases}
$$
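The forward pass above can be reproduced with plain Python functions. This is only a sketch that mirrors the definitions of $P$, $S$ and $M$; it does not use autograd.

```python
def P(a):
    # P : x -> x^2
    return a ** 2

def S(a, b):
    # S : (x, y) -> x + y
    return a + b

def M(a):
    # M : x -> 2x
    return 2 * a

x, y = 3.0, 2.0
u = P(x)
v = P(y)
z = S(u, v)
t = M(z)
print(u, v, z, t)  # 9.0 4.0 13.0 26.0
```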


To minimise the function we propose to use the gradient descent algorithm. To do that we need to 
compute the gradient of $M(S(P(x),P(y)))$ with respect to the parameters $x,y$. We apply the backpropagation method: 
1) We compute the gradient of $t$ with respect to $z$ at the point $z=13$: it is $2$.
2) We compute the gradient of $t$ with respect to $u$ and $v$ at the point $(u=9,v=4)$: 
    $$
        \frac{\partial t(9,4)}{\partial u} = \frac{\partial M(13)}{\partial z} \cdot \frac{\partial z(9,4)}{\partial u} = 2 \cdot 1 = 2 $$
    We get the same result if we do the computation for $v$.
3) We compute the gradient of $t$ with respect to $x$ and $y$ at the point $(x=3,y=2)$:
    $$ t(x,y) = M(S(P(x),P(y))) $$
    $$
    \begin{aligned}
        \frac{\partial t(3,2)}{\partial x}
        &= \frac{\partial (M \circ S)(9,4)}{\partial u} \cdot \frac{\partial u(3)}{\partial x} + \frac{\partial (M \circ S)(9,4)}{\partial v} \cdot \frac{\partial v(2)}{\partial x} \\
        &= \frac{\partial (M \circ S)(9,4)}{\partial u} \cdot \frac{\partial u(3)}{\partial x} + 0 \\
        &= 2 \cdot (2 \cdot 3) = 12
    \end{aligned}
    $$
    The second term vanishes because $v$ does not depend on $x$. The same computation for $y$ gives $2 \cdot (2 \cdot 2) = 8$.
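Since $t = 2(x^2 + y^2)$, its partial derivatives are $4x$ and $4y$, that is $12$ and $8$ at $(3, 2)$. The sketch below checks these values with `backward()` and then runs a few gradient descent steps. The learning rate and the number of steps are arbitrary choices made for this illustration.

```python
import torch

x = torch.tensor(3.0, requires_grad=True)
y = torch.tensor(2.0, requires_grad=True)

# Forward pass builds the graph, backward pass fills x.grad and y.grad
t = 2 * (x ** 2 + y ** 2)
t.backward()
grad_x, grad_y = x.grad.item(), y.grad.item()
print(grad_x, grad_y)  # 12.0 8.0

learning_rate = 0.1  # illustrative value, not from the original notebook
for step in range(50):
    # Gradients accumulate across backward() calls, so reset them first
    x.grad.zero_()
    y.grad.zero_()
    t = 2 * (x ** 2 + y ** 2)
    t.backward()
    # Update the parameters without recording the update in the graph
    with torch.no_grad():
        x -= learning_rate * x.grad
        y -= learning_rate * y.grad

final_t = (2 * (x ** 2 + y ** 2)).item()
print(final_t)
```

With this learning rate each step multiplies $x$ and $y$ by $0.6$, so the value of $t$ shrinks towards its minimum at $(0, 0)$.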


In [12]:
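t = torch.ones((2,2),dtype=torch.float,requires_grad=True)
z = 2*t  # this value is discarded: z is rebound on the next line
z = -t   # z is a non-leaf tensor (the result of an operation on t)
u = z.sum()
u.backward()
# .grad is only populated for leaf tensors such as t, so this prints None
# (PyTorch also emits a UserWarning about reading .grad on a non-leaf tensor)
print(z.grad)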

None
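The result is `None` because `z` is an intermediate (non-leaf) tensor. As a sketch of two ways around this, one can read the gradient on the leaf `t` instead, or call `retain_grad()` on `z` before the backward pass:

```python
import torch

t = torch.ones((2, 2), dtype=torch.float, requires_grad=True)
z = -t
z.retain_grad()  # ask autograd to keep the gradient of this non-leaf tensor
u = z.sum()
u.backward()

print(t.grad)  # du/dt: every entry is -1
print(z.grad)  # du/dz: every entry is 1
```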




https://pytorch.org/tutorials/beginner/basics/autogradqs_tutorial.html