<img style="float: right;" src="../htwlogo.jpg">

# Gradients with PyTorch

**Author**: Dive into Deep Learning, adapted by _Erik Rodner_<br>
**Lecture**: Computer Vision and Machine Learning I

In the following exercise, we will look at automatic differentiation.

https://d2l.ai/chapter_preliminaries/autograd.html

The following code requires PyTorch. If you want the graph visualization, you also **need Graphviz installed** (https://graphviz.org/download/).


Let us first define a tensor that can store a corresponding gradient.

In [None]:
import torch
import sys
import os
%load_ext autoreload
%autoreload 2

sys.path.append(os.path.join("..", "utils"))

from torchutils import make_dot
import numpy as np

In [None]:
x = torch.arange(4.0)
x

In [None]:
x.requires_grad_(True)  # Same as `x = torch.arange(4.0, requires_grad=True)`
x.grad  # The default value is None

``x`` is just a single node in the computation graph. So let's define a new variable ``y``
that is computed from ``x``. Mathematically, we can write $y(x) = 2 x^T x = 2 \sum\limits_{i=1}^D x_i * x_i = 2 \sum\limits_{i=1}^D x_i^2$

In [None]:
y = 2 * torch.dot(x, x) # forward step
y

Computing the gradient $\nabla_x y = ( \frac{\partial y}{\partial x_1}, \frac{\partial y}{\partial x_2}, \frac{\partial y}{\partial x_3}, \frac{\partial y}{\partial x_4} )$ takes two steps:
1. run the backward step with ``y.backward()`` and then
2. read the result from ``x.grad``.

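Please note that this only works when ``x`` holds concrete values: the gradient is evaluated numerically at the current value of ``x``, not derived as a symbolic expression.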

In [None]:
y.backward()
x.grad
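
As a quick sanity check, the result can be compared with the gradient derived by hand. Since $y = 2 \sum_i x_i^2$, each partial derivative is $\frac{\partial y}{\partial x_i} = 4 x_i$. The following sketch rebuilds ``x`` and ``y`` from scratch so that it runs on its own; the names ``expected`` and ``matches`` are only introduced for illustration.

```python
import torch

x = torch.arange(4.0, requires_grad=True)
y = 2 * torch.dot(x, x)  # forward step: y = 2 * sum_i x_i^2
y.backward()             # backward step: fills x.grad

# Hand-derived gradient: d/dx_i (2 * x_i^2) = 4 * x_i
expected = 4 * x.detach()
matches = torch.equal(x.grad, expected)
print(x.grad, matches)
```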

Let's redefine $y(x)$ as $y(x) = \sum_i x_i = x_1 + x_2 + x_3 + x_4$ and recompute the gradient: $\nabla_x y = ( \frac{\partial y}{\partial x_1}, \frac{\partial y}{\partial x_2}, \frac{\partial y}{\partial x_3}, \frac{\partial y}{\partial x_4} )$

In [None]:
x.grad.zero_()  # gradients accumulate across backward calls, so reset them first
y = x.sum() # forward pass
y.backward()
x.grad
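
The call to ``x.grad.zero_()`` matters because PyTorch adds each new gradient to whatever is already stored in ``x.grad``. The following minimal sketch illustrates this by running both examples from above once without and once with the reset; the names ``first``, ``accumulated`` and ``fresh`` are only used here for illustration.

```python
import torch

x = torch.arange(4.0, requires_grad=True)

# First function: y = 2 * x^T x, gradient 4 * x
(2 * torch.dot(x, x)).backward()
first = x.grad.clone()

# Second function without resetting: the new gradient (all ones) is added on top
x.sum().backward()
accumulated = x.grad.clone()

# Second function after resetting: only the gradient of the sum remains
x.grad.zero_()
x.sum().backward()
fresh = x.grad.clone()

print(first, accumulated, fresh)
```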

Now, let us move to an example with 3 tensors:
1. ``z`` contains only 1s
2. $y = x^T z = \sum\limits_i x_i * z_i = x_1*z_1 + x_2 * z_2 + x_3 * z_3 + x_4 * z_4$

In [None]:
z = torch.ones(4, requires_grad=True)
y = torch.dot(x, z)

In [None]:
x.grad.zero_()
y.backward()

After the backward operation, ``x.grad`` contains $\nabla_x y$ and ``z.grad`` contains $\nabla_z y$

In [None]:
x.grad

In [None]:
z.grad
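
For the dot product $y = x^T z$, the hand-derived gradients are $\nabla_x y = z$ and $\nabla_z y = x$, so each tensor's gradient should equal the other tensor. A small self-contained check, with ``x_ok`` and ``z_ok`` as illustrative names, could look like this:

```python
import torch

x = torch.arange(4.0, requires_grad=True)
z = torch.ones(4, requires_grad=True)

y = torch.dot(x, z)  # forward step
y.backward()         # fills both x.grad and z.grad

# d y / d x_i = z_i  and  d y / d z_i = x_i
x_ok = torch.equal(x.grad, z.detach())
z_ok = torch.equal(z.grad, x.detach())
print(x.grad, z.grad, x_ok, z_ok)
```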

We can also visualize the computation graph and see which nodes of the graph store a gradient.

In [None]:
make_dot(y, locals())

We can also build more complex graphs like this one:
1. $z_i = x_i^2 \enspace \forall i$
2. $y = \sum\limits_i x_i * z_i$

In [None]:
z = x**2
y = torch.dot(x, z)

Note that ``z`` is now just an intermediate variable, which is not shown in the following graph. However, it is still part of the computation graph.

In [None]:
make_dot(y, locals())
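
If Graphviz is not available, the graph structure can also be inspected directly on the tensors. The sketch below assumes nothing beyond PyTorch itself: it uses ``is_leaf`` and ``grad_fn`` to distinguish the leaf tensor ``x`` from the intermediate result ``z``, and lists the backward nodes that feed into ``y``. The exact node names printed depend on the PyTorch version, so none are shown here.

```python
import torch

x = torch.arange(4.0, requires_grad=True)
z = x ** 2
y = torch.dot(x, z)

# x was created by the user: it is a leaf and has no grad_fn
print(x.is_leaf, x.grad_fn)

# z was computed from x: it is not a leaf and remembers how it was created
print(z.is_leaf, z.grad_fn)

# Backward nodes of the two inputs of the dot product
inputs = [type(fn).__name__ for fn, _ in y.grad_fn.next_functions if fn is not None]
print(inputs)
```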

In [None]:
x.grad.zero_()
z.retain_grad() # z is not a leaf tensor, so its gradient is only kept if we request it
y.backward()

In [None]:
x.grad
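
Again, the result can be checked by hand. Substituting $z_i = x_i^2$ gives $y = \sum_i x_i^3$, so $\frac{\partial y}{\partial x_i} = 3 x_i^2$. The gradient stored for the intermediate tensor is $\frac{\partial y}{\partial z_i} = x_i$. The following self-contained sketch verifies both; ``x_ok`` and ``z_ok`` are illustrative names.

```python
import torch

x = torch.arange(4.0, requires_grad=True)
z = x ** 2
z.retain_grad()      # keep the gradient of the intermediate tensor
y = torch.dot(x, z)  # y = sum_i x_i^3
y.backward()

# Total derivative with respect to x: 3 * x^2
x_ok = torch.equal(x.grad, 3 * x.detach() ** 2)
# Derivative with respect to the intermediate z: x
z_ok = torch.equal(z.grad, x.detach())
print(x.grad, z.grad, x_ok, z_ok)
```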