# `torch.autograd`: Computing derivatives

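PyTorch builds the computation graph on the fly as operations execute (a dynamic graph), unlike TensorFlow 1.x, which defines a static graph before running it.

Calling `backward()` walks this graph and applies the chain rule (backpropagation) to compute derivatives.

The derivatives are stored in the `.grad` attribute of the leaf tensors that were created with `requires_grad=True`.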

<img src="http://media5.datahacker.rs/2021/01/54-1-1536x735.jpg" width=60%>
(Figure from http://datahacker.rs/004-computational-graph-and-autograd-with-pytorch/)

Links:
- https://datahacker.rs/004-computational-graph-and-autograd-with-pytorch/
- http://colah.github.io/posts/2015-08-Backprop/

In [1]:
import numpy as np
import matplotlib.pyplot as plt
import torch
%matplotlib inline
%config InlineBackend.figure_format='retina'

In [2]:
x = torch.tensor(5.0)

In [3]:
w = torch.tensor(3.0, requires_grad=True)

In [4]:
z = x * w**2
z

tensor(45., grad_fn=<MulBackward0>)

In [5]:
z.backward()
print(f'x.grad = {x.grad}')

x.grad = None


In [7]:
print(f'w.grad = {w.grad}')

w.grad = 30.0


In [8]:
w.grad

tensor(30.)

Check by hand: $\frac{\partial z}{\partial w} = 2 x w = 2 \cdot 5 \cdot 3 = 30$

In [9]:
2 * x * w

tensor(30., grad_fn=<MulBackward0>)
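To make the "leaf" terminology concrete, here is a minimal sketch that repeats the example above and inspects `is_leaf` on each tensor. It only uses the tensors already defined in this section; nothing new is assumed beyond the standard `Tensor.is_leaf` attribute.

```python
import torch

x = torch.tensor(5.0)                      # leaf, but does not require grad
w = torch.tensor(3.0, requires_grad=True)  # leaf that requires grad
z = x * w**2                               # non-leaf: produced by operations

print(x.is_leaf, w.is_leaf, z.is_leaf)     # True True False

z.backward()
print(w.grad)  # tensor(30.): populated, because w is a leaf with requires_grad=True
print(x.grad)  # None: x never asked for a gradient
```

Only `w` ends up with a gradient: `x` is a leaf but was created without `requires_grad=True`, and `z` is an intermediate result.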

### Computing derivatives ... or not
https://pytorch.org/docs/stable/generated/torch.tensor.html#torch.tensor

In [10]:
x = torch.tensor(2.0, requires_grad=True)
y = x*x
print(f'y.requires_grad = {y.requires_grad}')
z = x*y
z.backward()
print(f'dz/dx = {x.grad}')

y.requires_grad = True
dz/dx = 12.0


We can "detach" a tensor from the computation graph, so that autograd treats it as a constant:
https://pytorch.org/docs/stable/generated/torch.Tensor.detach.html

In [11]:
x = torch.tensor(2.0, requires_grad=True)
y = x*x
y = y.detach()  # y is not a leaf, so setting y.requires_grad = False directly raises an error
print(f'y.requires_grad = {y.requires_grad}')
z = x*y
z.backward()
print(f'dz/dx = {x.grad}')

y.requires_grad = False
dz/dx = 4.0


or compute it inside a `torch.no_grad()` block:

In [12]:
x = torch.tensor(2.0, requires_grad=True)
with torch.no_grad():
    y = x*x
print(f'y.requires_grad = {y.requires_grad}')
z = x*y
z.backward()
print(f'dz/dx = {x.grad}')

y.requires_grad = False
dz/dx = 4.0
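The two results (12 versus 4) can be checked by hand. When $y = x^2$ is tracked, the product rule applies to both factors of $z = x y$. When $y$ is detached or computed under `torch.no_grad()`, autograd sees it as a constant $c = 4$:

$$
\text{tracked: } \frac{\partial z}{\partial x} = y + x \frac{\partial y}{\partial x} = x^2 + 2x^2 = 3x^2 = 12 \quad (x = 2)
$$

$$
\text{detached: } \frac{\partial z}{\partial x} = \frac{\partial}{\partial x}(c\,x) = c = 4
$$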


### Computation graphs are not trees

When a tensor is used in more than one place, the computation graph is no longer a tree but a directed acyclic graph (DAG). The gradients arriving along each path are summed.

In [33]:
x = torch.tensor(2.0, requires_grad=True)
y = 3*x
z = x**2
w = y + z + x
w.backward()
x.grad

tensor(8.)

$\frac{\partial w}{\partial x} = \frac{\partial}{\partial x}(3x + x^2 + x) = 3 + 2x + 1$

In [34]:
3 + 2*x + 1

tensor(8., grad_fn=<AddBackward0>)

In [35]:
x = torch.tensor(3.0, requires_grad=True)
y = x**2
z1 = 3*y
z2 = 4*y

In [36]:
z1.backward()  # without retain_graph=True, the graph's saved tensors are freed after this call
x.grad

tensor(18.)

In [37]:
z2.backward()
x.grad

RuntimeError: ignored
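The second `backward()` fails because `z1` and `z2` share the node `y = x**2`, and the first backward pass frees the tensors saved for it. A minimal sketch of the fix hinted at in the comment above, passing `retain_graph=True` to the first call:

```python
import torch

x = torch.tensor(3.0, requires_grad=True)
y = x**2
z1 = 3*y
z2 = 4*y

z1.backward(retain_graph=True)  # keep the saved tensors for the shared node y
print(x.grad)                   # tensor(18.)  = 6x

z2.backward()                   # now succeeds, and adds 8x = 24 to x.grad
print(x.grad)                   # tensor(42.)
```

Note that the second result is 42 rather than 24, because the gradient of `z2` is added to the gradient already stored in `x.grad`.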

### Accumulating effect

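`.grad` accumulates the gradient: every call to `backward()` adds to it rather than overwriting it. See also `torch.autograd.grad`, which returns gradients without writing to `.grad`: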
https://pytorch.org/docs/stable/generated/torch.autograd.grad.html

In [29]:
x = torch.tensor(3.0, requires_grad=True)
y = x**2
z = 3*y
print(x.grad)

z.backward()
print(x.grad)

#x.grad.zero_()
y = x**2
z = 3*y
z.backward()
print(x.grad)

None
tensor(18.)
tensor(36.)
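If the accumulation is not wanted, the commented-out `x.grad.zero_()` line above is the usual remedy. A minimal sketch of the same cell with the reset enabled (the name `first` is only introduced here for illustration):

```python
import torch

x = torch.tensor(3.0, requires_grad=True)

z = 3 * x**2
z.backward()
first = x.grad.clone()   # tensor(18.)

x.grad.zero_()           # reset the accumulated gradient in place

z = 3 * x**2             # rebuild the graph; the old one was freed by backward()
z.backward()
print(first, x.grad)     # tensor(18.) tensor(18.)
```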


#### Derivatives of scalars with respect to tensors

In [20]:
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = (x**2).sum()
y.backward()
x.grad

tensor([2., 4., 6.])
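`backward()` with no arguments only works on a scalar, which is why the example above sums first. As a sketch of the alternative, a non-scalar output can be differentiated by passing a `gradient` tensor of the same shape; using a tensor of ones is equivalent to calling `.sum()` before `backward()`:

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = x**2                                   # y is a vector, not a scalar

# y.backward() alone would raise an error here, since y is not a scalar.
y.backward(gradient=torch.ones_like(y))    # weights each output element by 1
print(x.grad)                              # tensor([2., 4., 6.])
```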

#### Don't do in-place modifications to tensors

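In-place writes to a leaf tensor that requires grad raise an error. Rebinding the name, as in `x = 4 * x`, is fine because it creates a new tensor.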

In [21]:
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
x

tensor([1., 2., 3.], requires_grad=True)

In [22]:
x[1] = x[2] + 1
#x = 4*x
x

RuntimeError: ignored

In [30]:
y = (x**2).sum()
y.backward()

#### Results can be slightly different from what you expect...

Because the graph is built as the computation runs, autograd records which element a function like `max()` actually selected. The gradient then flows only to that element, and every other entry gets zero.

In [31]:
x = torch.tensor([1.0, 2.0, 4.0, 3.0, 0.5], requires_grad=True)
max_x = torch.max(x)
max_x

tensor(4., grad_fn=<MaxBackward1>)

In [32]:
max_x.backward()
x.grad

tensor([0., 0., 1., 0., 0.])
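The one-hot gradient can be sanity-checked with finite differences: nudging the largest element moves the maximum by the same amount, while nudging any other element leaves it unchanged. A minimal sketch (the step `eps` and the names `fd` and `bumped` are illustrative choices, not part of the original notebook):

```python
import torch

x = torch.tensor([1.0, 2.0, 4.0, 3.0, 0.5], requires_grad=True)
torch.max(x).backward()

eps = 0.25  # small enough that no other element overtakes the maximum
with torch.no_grad():
    base = torch.max(x)
    fd = torch.zeros_like(x)
    for i in range(x.numel()):
        bumped = x.clone()
        bumped[i] += eps                        # nudge one element at a time
        fd[i] = (torch.max(bumped) - base) / eps

print(fd)      # tensor([0., 0., 1., 0., 0.])
print(x.grad)  # tensor([0., 0., 1., 0., 0.])
```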