# `torch.autograd`: Computing derivatives

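PyTorch builds the computation graph on the fly as operations execute (a dynamic graph), unlike TensorFlow 1.x, which defines a static graph before running it.

Backpropagation applies the chain rule along this graph to compute derivatives.

After `backward()`, the derivatives are stored in the `.grad` attribute of the leaf tensors that were created with `requires_grad=True`.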

<img src="http://media5.datahacker.rs/2021/01/54-1-1536x735.jpg" width=60%>
(Figure from http://datahacker.rs/004-computational-graph-and-autograd-with-pytorch/)

Links:
- https://datahacker.rs/004-computational-graph-and-autograd-with-pytorch/
- http://colah.github.io/posts/2015-08-Backprop/
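
As a small illustration of the points above, the following sketch inspects the graph that PyTorch records while an expression is evaluated. The values and variable names are chosen for this example only; `is_leaf` and `grad_fn` are standard tensor attributes.

```python
import torch

x = torch.tensor(5.0)                      # constant input, no gradient tracked
w = torch.tensor(3.0, requires_grad=True)  # leaf tensor that autograd tracks

z = x * w**2  # the graph is recorded while this line executes

print(w.is_leaf, w.grad_fn)  # True None: created by the user, not by an operation
print(z.is_leaf, z.grad_fn)  # False, and grad_fn refers to the multiplication node

z.backward()
print(w.grad)  # the derivative lands on the leaf: 2 * x * w = 30
```

Only `w` receives a gradient here, because it is the only leaf created with `requires_grad=True`.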

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import torch
%matplotlib inline
%config InlineBackend.figure_format='retina'

In [None]:
x = torch.tensor(5.0)

In [None]:
w = torch.tensor(3.0, requires_grad=True)

In [None]:
z = x * w**2
z

In [None]:
z.backward()
print(f'x.grad = {x.grad}')  # None: x was created without requires_grad=True

In [None]:
print(f'w.grad = {w.grad}')

In [None]:
w.grad

Analytically, $\frac{\partial z}{\partial w} = 2 x w$, which is $2 \cdot 5 \cdot 3 = 30$ here:

In [None]:
2 * x * w
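
Another way to sanity-check an autograd result is a finite-difference approximation. This is only a sketch: the helper `f`, the step size `h`, and the use of `float64` are assumptions made for this example, not part of the notebook.

```python
import torch

def f(w, x=5.0):
    # Same function as above: z = x * w^2
    return x * w**2

w0 = torch.tensor(3.0, dtype=torch.float64, requires_grad=True)
f(w0).backward()

h = 1e-4  # step size, an arbitrary choice for this sketch
with torch.no_grad():
    # Central difference: (f(w + h) - f(w - h)) / (2h)
    numeric = (f(w0 + h) - f(w0 - h)) / (2 * h)

print(w0.grad.item(), numeric.item())  # both close to 30
```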

### Computing derivatives ... or not
See the `requires_grad` argument of `torch.tensor`: https://pytorch.org/docs/stable/generated/torch.tensor.html#torch.tensor

In [None]:
x = torch.tensor(2.0, requires_grad=True)
y = x*x
print(f'y.requires_grad = {y.requires_grad}')  # True: y was computed from x
z = x*y  # z = x^3
z.backward()
print(f'dz/dx = {x.grad}')  # 3 * x^2 = 12

We can "detach" a tensor from the computation graph, so that autograd treats it as a constant...
https://pytorch.org/docs/stable/generated/torch.Tensor.detach.html

In [None]:
x = torch.tensor(2.0, requires_grad=True)
y = x*x
y = y.detach()  # y.requires_grad = False is not allowed on a non-leaf tensor
print(f'y.requires_grad = {y.requires_grad}')  # False
z = x*y  # y is now the constant 4
z.backward()
print(f'dz/dx = {x.grad}')  # 4, not 12

or compute it inside a `torch.no_grad()` block, so that no graph is recorded for it:

In [None]:
x = torch.tensor(2.0, requires_grad=True)
with torch.no_grad():
    y = x*x
print(f'y.requires_grad = {y.requires_grad}')
z = x*y
z.backward()
print(f'dz/dx = {x.grad}')  # 4: y is a constant here, as with detach()
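
The three cells above can be compared side by side. The helper `dz_dx` and its `mode` strings are hypothetical names introduced only for this sketch.

```python
import torch

def dz_dx(mode):
    # Returns dz/dx at x = 2 for z = x * y, with y = x * x built in different ways
    x = torch.tensor(2.0, requires_grad=True)
    if mode == "tracked":
        y = x * x
    elif mode == "detach":
        y = (x * x).detach()
    else:  # "no_grad"
        with torch.no_grad():
            y = x * x
    z = x * y
    z.backward()
    return x.grad.item()

for mode in ("tracked", "detach", "no_grad"):
    print(mode, dz_dx(mode))
# tracked: 3 * x^2 = 12; detach and no_grad: y is the constant 4, so dz/dx = 4
```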

### Computation graphs are not trees

Reusing a tensor in several places means the graph is no longer a tree but a directed acyclic graph (DAG). The gradients arriving along each path are summed.

In [None]:
x = torch.tensor(2.0, requires_grad=True)
y = 3*x
z = x**2
w = y + z + x
w.backward()
x.grad

$\frac{\partial w}{\partial x} = \frac{\partial}{\partial x}(3x + x^2 + x) = 3 + 2x + 1$

In [None]:
3 + 2*x + 1
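
To see the per-path contributions separately, one option is `torch.autograd.grad`, which returns gradients instead of accumulating them in `x.grad`. The names `g_y`, `g_z`, `g_x` and `g_w` are made up for this sketch.

```python
import torch

x = torch.tensor(2.0, requires_grad=True)

# Gradient of each branch, computed on its own small graph
(g_y,) = torch.autograd.grad(3 * x, x)    # path through y = 3x
(g_z,) = torch.autograd.grad(x**2, x)     # path through z = x^2
(g_x,) = torch.autograd.grad(1.0 * x, x)  # direct use of x

# Gradient of the full expression w = y + z + x
(g_w,) = torch.autograd.grad(3 * x + x**2 + x, x)

print(g_y + g_z + g_x, g_w)  # both 8
```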

In [None]:
x = torch.tensor(3.0, requires_grad=True)
y = x**2
z1 = 3*y
z2 = 4*y

In [None]:
z1.backward()  # pass retain_graph=True to keep the shared node y usable for z2.backward()
x.grad

In [None]:
z2.backward()  # RuntimeError unless retain_graph=True was passed to z1.backward()
x.grad
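
A minimal sketch of both outcomes for the two cells above, assuming the same values. The names `second_backward_failed`, `grad_after_z1` and `grad_after_z2` are introduced only for illustration.

```python
import torch

# Without retain_graph: the first backward frees the buffers of the shared node y
x = torch.tensor(3.0, requires_grad=True)
y = x**2
z1, z2 = 3 * y, 4 * y
z1.backward()
try:
    z2.backward()
    second_backward_failed = False
except RuntimeError:
    second_backward_failed = True

# With retain_graph=True: both calls work and the gradients add up
x = torch.tensor(3.0, requires_grad=True)
y = x**2
z1, z2 = 3 * y, 4 * y
z1.backward(retain_graph=True)
grad_after_z1 = x.grad.clone()  # 3 * 2x = 18
z2.backward()
grad_after_z2 = x.grad.clone()  # 18 + 4 * 2x = 42

print(second_backward_failed, grad_after_z1, grad_after_z2)
```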

### Accumulating effect

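`.grad` accumulates the gradient: each `backward()` call adds to it instead of overwriting it. (`torch.autograd.grad`, by contrast, returns gradients without touching `.grad`.) Take a look:
https://pytorch.org/docs/stable/generated/torch.autograd.grad.html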

In [None]:
x = torch.tensor(3.0, requires_grad=True)
y = x**2
z = 3*y
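print(x.grad)  # None: backward() has not been called yet

z.backward()
print(x.grad)  # 18 = 3 * 2x

# x.grad.zero_()  # uncomment to reset the gradient before the second pass
y = x**2
z = 3*y
z.backward()
print(x.grad)  # 36: the second gradient was added to the first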
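
If accumulation is not wanted, the gradient can be reset before each pass. The loop below is a sketch of that pattern; the list `grads` is a name used only here.

```python
import torch

x = torch.tensor(3.0, requires_grad=True)
grads = []
for _ in range(2):
    if x.grad is not None:
        x.grad.zero_()  # reset in place so the passes do not add up
    z = 3 * x**2
    z.backward()
    grads.append(x.grad.item())

print(grads)  # [18.0, 18.0]
```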

#### Derivatives of scalars with respect to tensors

In [None]:
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = (x**2).sum()
y.backward()
x.grad  # 2 * x = tensor([2., 4., 6.])
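
The output above is reduced to a scalar with `.sum()` because `backward()` without arguments only works on scalar outputs. As a hedged sketch, a vector output can still be differentiated by passing an explicit `gradient` argument; using all ones is an assumption that makes the result match the `.sum()` version. The flag `raised` is a name used only here.

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

y = x**2  # vector output, not a scalar
try:
    y.backward()  # no implicit gradient for non-scalar outputs
    raised = False
except RuntimeError as err:
    raised = True
    print("RuntimeError:", err)

# A weight of 1 for each output element is equivalent to y.sum().backward()
y = x**2
y.backward(gradient=torch.ones_like(y))
print(x.grad)  # 2 * x
```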

#### Don't do in-place modifications to tensors

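Writing in place into a leaf tensor that requires grad raises a `RuntimeError`. But it's fine to do `x = 4 * x`, because that creates a new tensor instead of overwriting the old one.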

In [None]:
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
x

In [None]:
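x[1] = x[2] + 1  # RuntimeError: in-place write into a leaf tensor that requires grad
#x = 4*x         # out-of-place alternative: builds a new tensor
x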

In [None]:
y = (x**2).sum()
y.backward()
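
One possible way around the in-place restriction is to write into a copy and leave the leaf untouched. This is a sketch under that assumption; `x_new` and `in_place_failed` are names introduced for the example.

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

try:
    x[1] = x[2] + 1  # in-place write into a leaf tensor that requires grad
    in_place_failed = False
except RuntimeError:
    in_place_failed = True

# Out-of-place alternative: modify a clone, which is not a leaf
x_new = x.clone()
x_new[1] = x[2] + 1  # x_new is now [x0, x2 + 1, x2]

y = (x_new**2).sum()
y.backward()
print(in_place_failed, x.grad)
```

Here `x[1]` no longer influences `y`, so its gradient is 0, while `x[2]` receives contributions from two positions.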

#### Results can be slightly different from what you expect...

Because the graph records only the operations that actually ran, functions like `max()` are differentiable: the gradient is 1 for the element that was selected and 0 for all the others.

In [None]:
x = torch.tensor([1.0, 2.0, 4.0, 3.0, 0.5], requires_grad=True)
max_x = torch.max(x)
max_x

In [None]:
max_x.backward()
x.grad
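
The gradient of `max()` can be checked against a one-hot vector at the position of the maximum. The tensor `expected` is a name used only in this sketch.

```python
import torch

x = torch.tensor([1.0, 2.0, 4.0, 3.0, 0.5], requires_grad=True)
torch.max(x).backward()

# Expected gradient: 1 at the position of the maximum, 0 elsewhere
expected = torch.zeros_like(x)
expected[torch.argmax(x)] = 1.0

print(x.grad, expected)
```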