# PyTorch MNIST example dissected

In this notebook we'll explore the components of the
[PyTorch MNIST example](https://github.com/pytorch/examples/tree/master/mnist)
one-by-one.

* Part 1: [Loading the data](1_mnist_load.ipynb)
* Part 2: [Model components and forward propagation](2_mnist_model.ipynb)
* Part 3: [Autodiff and backpropagation](3_mnist_backprop.ipynb) <-- **you are here**
* Part 4: [Training the model](4_mnist_train.ipynb)
* Part 5: [Visualizing the results](5_mnist_visualize.ipynb)

## 3 Backpropagation

Before we start training our model, let's explore the auto-differentiation functionality of PyTorch.

PyTorch has good [online documentation](https://pytorch.org/docs/stable/notes/autograd.html) on the subject; below we will focus more on the autograd internals.

In [1]:
import torch

Recall from [Part 2](2_mnist_model.ipynb) that the parameters of the neural net had the flag `requires_grad=True`. It indicates that the tensor will participate in backpropagation:

In [2]:
x = torch.tensor(10., requires_grad=True)
y = torch.tensor(20., requires_grad=True)

z = x * y

print("z =", z, "requires_grad =", z.requires_grad)

z = tensor(200., grad_fn=<ThMulBackward>) requires_grad = True


If at least one operand has the `requires_grad` flag set, the result will also have `requires_grad=True`. The resulting tensor also has a `grad_fn` attribute, which holds a link to the operation that produced it.
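As a small illustration of this rule, the sketch below mixes a tracked and an untracked tensor. The names `p`, `q`, `r` and `s` are made up for this example and do not appear elsewhere in the notebook.

```python
import torch

# One operand tracks gradients, the other does not.
p = torch.tensor(3., requires_grad=True)
q = torch.tensor(4.)  # requires_grad defaults to False

r = p * q  # at least one operand is tracked
s = q * q  # no operand is tracked

print(r.requires_grad, r.grad_fn)  # tracked, with a backward node attached
print(s.requires_grad, s.grad_fn)  # untracked, no backward node
print(p.is_leaf, p.grad_fn)        # p was created directly, not by an operation
```

Tensors created directly by the user, such as `p`, are leaves of the graph, so their `grad_fn` is `None` even though they require gradients.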

Now we can propagate the gradient back to `x` and `y` using the `.backward()` method:

In [3]:
z.backward(torch.tensor(2.))

Gradients are stored in the `.grad` field of each tensor:

In [4]:
x.grad, y.grad

(tensor(40.), tensor(20.))

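No surprises here: the value `2` passed to `.backward()` is the incoming gradient, which the chain rule multiplies by the local derivatives, so

$2 \cdot \frac{\partial z}{\partial x} = 2y = 40$

and

$2 \cdot \frac{\partial z}{\partial y} = 2x = 20$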
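To make the role of the argument to `.backward()` concrete, here is a minimal sketch that recomputes the two gradients by hand and compares them with what autograd stored. The names `upstream`, `expected_x` and `expected_y` are introduced only for this illustration.

```python
import torch

x = torch.tensor(10., requires_grad=True)
y = torch.tensor(20., requires_grad=True)
z = x * y

upstream = torch.tensor(2.)  # the value passed to z.backward()
z.backward(upstream)

# Chain rule by hand: upstream gradient times the local derivative.
expected_x = upstream * y.detach()  # local derivative of z w.r.t. x is y
expected_y = upstream * x.detach()  # local derivative of z w.r.t. y is x

print(x.grad, expected_x)
print(y.grad, expected_y)
```

Passing a different value to `.backward()` scales both gradients by the same factor.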

The same applies to matrix operations, e.g.

In [58]:
a = torch.tensor([[100., 200.], [300., 400.]], requires_grad=True)
b = torch.tensor([[8., 7.], [6., 5.]], requires_grad=True)
c = torch.eye(2, requires_grad=False) * 3  # <-- Set it to False for a change

d = a.matmul(b) #.matmul(c)

d.backward(torch.tensor([[10., 20.], [30., 40.]]))

print(d)
print(a.grad)
print(b.grad)
print(c.grad)

tensor([[2000., 1700.],
        [4800., 4100.]], grad_fn=<MmBackward>)
tensor([[220., 160.],
        [520., 380.]])
tensor([[10000., 14000.],
        [14000., 20000.]])
None


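That is, write $G$ for the tensor passed to `.backward()`. With `.matmul(c)` commented out, $d_{ij} = \sum_k a_{ik}b_{kj}$, e.g.

$d_{11} = a_{11}b_{11} + a_{12}b_{21}$

so the chain rule gives

$\frac{\partial}{\partial a_{11}} \sum_{i,j} G_{ij}d_{ij} = G_{11}b_{11} + G_{12}b_{12} = 10 \cdot 8 + 20 \cdot 7 = 220$

and

$\frac{\partial}{\partial b_{11}} \sum_{i,j} G_{ij}d_{ij} = G_{11}a_{11} + G_{21}a_{21} = 10 \cdot 100 + 30 \cdot 300 = 10000$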
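For a plain matrix product, the stored gradients can be checked against the closed-form expressions: the gradient for `a` is the incoming gradient times `b` transposed, and the gradient for `b` is `a` transposed times the incoming gradient. The sketch below does this for the tensors above; `g`, `manual_a_grad` and `manual_b_grad` are names introduced only for this check.

```python
import torch

a = torch.tensor([[100., 200.], [300., 400.]], requires_grad=True)
b = torch.tensor([[8., 7.], [6., 5.]], requires_grad=True)
g = torch.tensor([[10., 20.], [30., 40.]])  # incoming gradient for d

d = a.matmul(b)
d.backward(g)

# Recompute the gradients by hand, without recording a new graph.
with torch.no_grad():
    manual_a_grad = g.matmul(b.t())  # gradient w.r.t. a
    manual_b_grad = a.t().matmul(g)  # gradient w.r.t. b

print(torch.allclose(a.grad, manual_a_grad))
print(torch.allclose(b.grad, manual_b_grad))
```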


Note that `c.grad` is `None`, as we've explicitly turned off gradient tracking for that tensor.
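In the cell above, `.matmul(c)` is commented out, so `c` never enters the computation. The sketch below is a variant (not the original cell) in which `c` does take part in the product, to show that a tensor with `requires_grad=False` still receives no gradient. The all-ones incoming gradient is an arbitrary choice for this illustration.

```python
import torch

a = torch.tensor([[100., 200.], [300., 400.]], requires_grad=True)
b = torch.tensor([[8., 7.], [6., 5.]], requires_grad=True)
c = torch.eye(2, requires_grad=False) * 3  # not tracked by autograd

d = a.matmul(b).matmul(c)     # this time c is part of the computation
d.backward(torch.ones(2, 2))  # arbitrary incoming gradient

print(d.requires_grad)  # the result is tracked because a and b are
print(a.grad)
print(b.grad)
print(c.grad)           # still no gradient for the untracked tensor
```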

PyTorch allows you to do 