In [7]:
import torch

Setting `requires_grad=True` tells autograd to track every operation performed on the tensor, so that gradients with respect to it can be computed later.

In [8]:
x = torch.tensor(2.0, requires_grad=True)
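
To make the effect of the flag concrete, the following minimal sketch inspects a few standard tensor attributes (`requires_grad`, `is_leaf`, `grad`, `grad_fn`). The names `y` and `c` are introduced here only for illustration.

```python
import torch

# A tensor created with requires_grad=True is a leaf of the autograd graph.
x = torch.tensor(2.0, requires_grad=True)
print(x.requires_grad)        # tracking is enabled
print(x.is_leaf)              # created by the user, not by an operation
print(x.grad)                 # None: no backward pass has run yet

# A result computed from x is tracked too and records how it was produced.
y = 3 * x
print(y.requires_grad)
print(y.grad_fn is not None)

# Without requires_grad, no graph is recorded (c is an illustrative tensor).
c = torch.tensor(2.0)
print((3 * c).requires_grad)
```

The `.grad` attribute stays `None` until a backward pass populates it.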

Example function 1: $y(x) = 2 x^{4} + x^{3} + 3 x^{2} + 5 x + 1$

In [9]:
y = 2*x**4 + x**3 + 3*x**2 + 5*x + 1
print(y)

tensor(63., grad_fn=<AddBackward0>)


Calling the `backward` method on the tensor `y` computes the derivative of $y$ with respect to $x$, evaluated at $x = 2.0$, and stores it in the `.grad` attribute of `x`:
$$\frac{dy}{dx}=8(x)^3+3(x)^2+6(x)+5 $$
$$\left. \frac{dy}{dx} \right|_{x=2}=8(2)^3+3(2)^2+6(2)+5 = 64+12+12+5 = 93$$

In [10]:
y.backward()
print(x.grad)

tensor(93.)


In [12]:
# verify the result against the analytic derivative
x.grad == 8*x**3 + 3*x**2 + 6*x + 5

tensor(True)
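
The exact `==` comparison succeeds here because every value involved is a small integer that floating point represents exactly. In general a tolerance-based check is safer. The sketch below repeats the verification with `torch.isclose` and adds an independent numerical estimate from a central finite difference. The helper `f`, the step `h`, and the switch to `float64` are choices made for this example only.

```python
import torch

def f(x):
    # Same polynomial as Example function 1.
    return 2 * x**4 + x**3 + 3 * x**2 + 5 * x + 1

# float64 keeps the finite-difference error small (a choice made for this sketch).
x = torch.tensor(2.0, dtype=torch.float64, requires_grad=True)
f(x).backward()

with torch.no_grad():
    # Analytic derivative evaluated at x.
    analytic = 8 * x**3 + 3 * x**2 + 6 * x + 5
    # Central finite difference as an independent numerical estimate.
    h = 1e-4
    numeric = (f(x + h) - f(x - h)) / (2 * h)

print(x.grad)
print(torch.isclose(x.grad, analytic))
print(torch.isclose(x.grad, numeric, rtol=1e-6))
```

The finite-difference value only approximates the derivative, which is why it is compared with a relative tolerance rather than with `==`.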

### Multistep backpropagation

In [13]:
x = torch.tensor([[1.,2,3],[3 ,2 ,1]], requires_grad=True)
print(x)

tensor([[1., 2., 3.],
        [3., 2., 1.]], requires_grad=True)


In [14]:
y = 3*x + 2
print(y)

tensor([[ 5.,  8., 11.],
        [11.,  8.,  5.]], grad_fn=<AddBackward0>)


In [15]:
z = 2*y**2
print(z)

tensor([[ 50., 128., 242.],
        [242., 128.,  50.]], grad_fn=<MulBackward0>)


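Because `z` is not a scalar, `backward` needs a `gradient` argument with the same shape as `z` (here also the shape of `x`). Passing `gradient=torch.ones_like(x)` gives every component of `z` a weight of 1, so `x.grad` holds the elementwise derivative. For a scalar output, no `gradient` argument is needed.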

In [16]:
z.backward(gradient=torch.ones_like(x))
print(x.grad)

tensor([[ 60.,  96., 132.],
        [132.,  96.,  60.]])
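
One way to read the `gradient` argument is as a set of weights applied to the components of `z` before differentiating. With all weights equal to 1, the result should match calling `backward()` on `z.sum()`. The sketch below compares the two variants and also shows the error raised when the argument is omitted for a non-scalar tensor. The helper `build` and the names `xa`, `xb`, `xc` are introduced only for this illustration.

```python
import torch

def build(x):
    # Same two-step computation as above: y = 3x + 2, z = 2y^2.
    y = 3 * x + 2
    return 2 * y**2

data = [[1., 2., 3.], [3., 2., 1.]]

# Variant A: non-scalar output with an explicit gradient of ones.
xa = torch.tensor(data, requires_grad=True)
za = build(xa)
za.backward(gradient=torch.ones_like(za))

# Variant B: reduce to a scalar with sum(), then backward() with no argument.
xb = torch.tensor(data, requires_grad=True)
build(xb).sum().backward()

print(xa.grad)
print(torch.equal(xa.grad, xb.grad))

# backward() on a non-scalar tensor without a gradient argument raises an error.
xc = torch.tensor(data, requires_grad=True)
try:
    build(xc).backward()
except RuntimeError as err:
    print("RuntimeError:", err)
```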


Each component of the tensor $z$ can be written as a function of the corresponding $x_i$:

$$z_i = 2(y_i)^2 = 2(3x_i+2)^2$$

To evaluate $\frac {\partial z_i}{\partial x_i}$, the chain rule can be used: $\left[f(g(x))\right]' = f'(g(x))\,g'(x)$

$$f(g(x)) = 2(g(x))^2 $$
$$f'(g(x)) = 4g(x) $$
$$g(x) = 3x+2$$ 
$$g'(x) = 3 $$
$$\frac {\partial z_i}{\partial x_i} = 4g(x_i) 3 = 12(3x_i+2) $$

Evaluating the derivative for each component $x_i$ of the tensor $x$:

$$\left. \frac{\partial z_i}{\partial x_i} \right|_{x_i=1} = 12(3(1)+2) = 60$$
$$\left. \frac{\partial z_i}{\partial x_i} \right|_{x_i=2} = 12(3(2)+2) = 96$$
$$\left. \frac{\partial z_i}{\partial x_i} \right|_{x_i=3} = 12(3(3)+2) = 132$$






In [17]:
# verify the result against the analytic derivative
x.grad == 12*(3*x+2)

tensor([[True, True, True],
        [True, True, True]])
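
The two chain-rule factors can also be inspected separately. The sketch below assumes that keeping the gradient of the intermediate tensor `y` is acceptable for illustration: `retain_grad()` asks autograd to store it, since gradients of non-leaf tensors are discarded by default. The names `dz_dy` and `dy_dx` are introduced here and correspond to $f'(g(x))$ and $g'(x)$.

```python
import torch

x = torch.tensor([[1., 2., 3.], [3., 2., 1.]], requires_grad=True)
y = 3 * x + 2
y.retain_grad()   # keep the gradient of this intermediate (non-leaf) tensor
z = 2 * y**2

z.backward(gradient=torch.ones_like(z))

with torch.no_grad():
    dz_dy = 4 * y   # f'(g(x)) = 4 g(x)
    dy_dx = 3.0     # g'(x) = 3

print(y.grad)                                # outer factor, dz/dy
print(torch.equal(y.grad, dz_dy))
print(torch.equal(x.grad, dz_dy * dy_dx))    # product of the two factors
```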

It is common to reduce the output to a scalar, for example with a mean, before performing the backward pass.

In [18]:
x = torch.tensor([[1.,2,3],[3 ,2 ,1]], requires_grad=True)

In [19]:
y = 3*x + 2

In [24]:
z = 2*y**2
print(z)

tensor([[ 50., 128., 242.],
        [242., 128.,  50.]], grad_fn=<MulBackward0>)


In [21]:
out = z.mean()
print(out)

tensor(140., grad_fn=<MeanBackward0>)


In [22]:
out.backward()

In [23]:
print(x.grad)

tensor([[10., 16., 22.],
        [22., 16., 10.]])


The scalar tensor $out = o$ can be written in terms of the components $z_j(y_j(x_j))$ as:

$$  o = \frac {1} {6}\sum_{j=1}^{6} z_j(y_j(x_j)) $$

Taking into account that the derivative of a sum is the sum of the derivatives:

$$ \left(\displaystyle\sum_{i=1}^nf_i(x)\right)^\prime=\displaystyle\sum_{i=1}^nf_i^\prime(x) $$

The derivative of $o$ with respect to a single component $x_i$ is:

$$ \frac {\partial o}{\partial x_i} = \frac {1} {6} \sum_{j=1}^6\frac {\partial [z_j(y_j(x_j))]} {\partial x_i} = \frac {1} {6} \sum_{j=1}^6\frac {\partial z_j}{\partial y_j} \frac {\partial y_j}{\partial x_i} $$

Each $y_j$ depends only on $x_j$, so $\frac {\partial y_j}{\partial x_i} = 0$ for $j \neq i$ and only the $j = i$ term of the sum survives. Then

$$\frac {\partial o}{\partial x_i} = \frac {1} {6} \frac {\partial z_i}{\partial y_i} \frac {\partial y_i}{\partial x_i}  $$

With 

$$\frac {\partial z_i}{\partial y_i} \frac {\partial y_i}{\partial x_i} = \frac {\partial z_i}{\partial x_i} = 12(3x_i+2) $$

$$\frac {\partial o}{\partial x_i} = \frac {1} {6} 12(3x_i+2) = 2(3x_i+2) $$



In [25]:
# verify the result against the analytic derivative
x.grad == 2*(3*x+2)

tensor([[True, True, True],
        [True, True, True]])
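
The factor $\frac{1}{6}$ in the derivation can be cross-checked against the `gradient` argument used earlier: taking the mean and calling `backward()` should give the same result as calling `backward` on `z` directly with every weight set to $1/N$, where $N$ is the number of elements. This is a minimal sketch; the names `xa`, `xb`, `za`, `zb` are introduced only for the comparison.

```python
import torch

data = [[1., 2., 3.], [3., 2., 1.]]

# Variant A: reduce with mean(), then backward() on the scalar.
xa = torch.tensor(data, requires_grad=True)
za = 2 * (3 * xa + 2) ** 2
za.mean().backward()

# Variant B: backward on the non-scalar z with every weight set to 1/N.
xb = torch.tensor(data, requires_grad=True)
zb = 2 * (3 * xb + 2) ** 2
zb.backward(gradient=torch.full_like(zb, 1.0 / zb.numel()))

print(xa.grad)
print(torch.allclose(xa.grad, xb.grad))
```

`torch.allclose` is used instead of `==` because $1/6$ is not exactly representable in floating point.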