Recommended materials
====

1. PyTorch Official Tutorial \[[Link](https://pytorch.org/tutorials/)\]
2. DeepLearning Zero to All \[[English](https://www.youtube.com/playlist?list=PLlMkM4tgfjnJ3I-dbhO9JTw7gNty6o_2m)\] \[[Korean](https://www.youtube.com/playlist?list=PLQ28Nx3M4JrhkqBVIXg-i5_CVVoS1UzAv)\]
3. Neural Network Programming - Deep Learning with PyTorch \[[English](https://www.youtube.com/playlist?list=PLZbbT5o_s2xrfNyHZsM6ufI0iZENK9xgG)\]

Autograd and Automatic Differentiation
====

[reference](https://pytorch.org/tutorials/beginner/blitz/autograd_tutorial.html#sphx-glr-beginner-blitz-autograd-tutorial-py)

The **`autograd`** package provides automatic differentiation for all operations on **`Tensors`**.   
It is a define-by-run framework, which means that your backprop is defined by how your code is run, and that every single iteration can be different.
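A minimal sketch of what define-by-run means in practice. It assumes a hypothetical toy function `f` (not from the tutorial) whose Python `if` branch depends on the input value. Autograd records only the operations that actually execute, so each call builds its own graph and yields a different gradient formula.

```python
import torch

def f(x):
    # Hypothetical toy function: the graph is recorded while this code runs,
    # so the branch taken (and hence the gradient) depends on the input value
    if x.item() > 0:
        return x * x      # dy/dx = 2x
    else:
        return -3 * x     # dy/dx = -3

grads = {}
for value in (2.0, -1.0):
    x = torch.tensor(value, requires_grad=True)
    y = f(x)              # a fresh graph is built on every call
    y.backward()
    grads[value] = x.grad.item()
    print(f"x = {value}: dy/dx = {grads[value]}")
# x = 2.0: dy/dx = 4.0
# x = -1.0: dy/dx = -3.0
```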

## Forward Function
![](https://drive.google.com/uc?export=view&id=1h8IKgTjOOci0Q75Y9XyA9X7m49bUgX04)

Let's take a look at this simple feedforward function, which takes `p`, `q`, and `s` as input variables and computes the output variable `t`:
```
r = p + q
t = r * s = (p + q) * s
```
  
If we assign 3, 5, and 2 to `p`, `q`, and `s`, respectively, then `t` equals 16:
```
p = 3  
q = 5  
s = 2  
r = p + q = 8  
t = r * s = (p + q) * s = 8 * 2 = 16
```

This simple computation is easy to express in **PyTorch**:

```python
import torch

p = torch.tensor(3.)
q = torch.tensor(5.)
s = torch.tensor(2.)
r = p + q  # tensor(8.)
t = r * s  # tensor(16.)

print('r :', r)
print('t :', t)
```

## Backward Function
![](https://drive.google.com/uc?export=view&id=1kCKhSO3HFcm_bygQBO2fyluvq1Ku5ov0)

The core of deep learning is training deep neural networks with gradients that are back-propagated from the outputs (and labels).  
For this simple feedforward function, we can easily work through each step of the backward pass by hand with the chain rule and obtain the gradients we are interested in, e.g. `dt/dp`, `dt/dq`, and `dt/ds`.

```
p = 3  
q = 5  
s = 2  
r = p + q = 8  
t = r * s = (p + q) * s = 8 * 2 = 16

dt/dp = dt/dr * dr/dp = s * 1 = s = 2  
dt/dq = dt/dr * dr/dq = s * 1 = s = 2  
dt/ds = r = 8
```
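The same backward pass can be written out step by step in plain Python (no autograd yet). This is a minimal sketch that starts from `dt/dt = 1` and multiplies by each local derivative while walking the graph from `t` back to the inputs; the variable names such as `dt_dr` are illustrative only.

```python
# Forward pass: compute and keep the intermediate value r
p, q, s = 3.0, 5.0, 2.0
r = p + q            # 8.0
t = r * s            # 16.0

# Backward pass: start from dt/dt = 1 and apply the chain rule node by node
dt_dt = 1.0
dt_dr = dt_dt * s    # t = r * s  ->  dt/dr = s
dt_ds = dt_dt * r    # t = r * s  ->  dt/ds = r
dt_dp = dt_dr * 1.0  # r = p + q  ->  dr/dp = 1
dt_dq = dt_dr * 1.0  # r = p + q  ->  dr/dq = 1

print(dt_dp, dt_dq, dt_ds)  # 2.0 2.0 8.0
```

Note that the backward pass needs the values saved during the forward pass (`r` and `s`), which is why autograd keeps intermediate results around until `.backward()` has run.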

However, computing gradients by hand becomes labor-intensive, and even intractable, when the function is very complex or the neural network is very **deep**.

## Autograd automatically tracks all operations on Tensors and gives us gradients

**`torch.Tensor`** is the central class of the **`torch`** package. If you set its **`.requires_grad`** attribute to **`True`**, autograd starts tracking all operations on it. When you finish your computation, you can call **`.backward()`** to have all the gradients computed automatically. The gradient with respect to this tensor is accumulated into its **`.grad`** attribute.
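"Accumulated" means that each call to `.backward()` adds to `.grad` rather than overwriting it. A minimal sketch, assuming a toy one-element tensor `w` and the function `2 * w` (neither is from the original example):

```python
import torch

w = torch.tensor([3.], requires_grad=True)

# First backward pass: d(2w)/dw = 2
(2 * w).sum().backward()
first = w.grad.item()    # 2.0

# Second backward pass on a freshly built graph: the new gradient is ADDED
(2 * w).sum().backward()
second = w.grad.item()   # 4.0 (2 + 2)

# Reset the accumulated gradient before the next, independent computation
w.grad.zero_()
after = w.grad.item()    # 0.0

print(first, second, after)  # 2.0 4.0 0.0
```

In training loops this reset is usually done with `optimizer.zero_grad()` before each backward pass, so that gradients from different batches do not mix.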

```python
import torch

p = torch.tensor([3.], requires_grad=True)
q = torch.tensor([5.], requires_grad=True)
s = torch.tensor([2.], requires_grad=True)
r = p + q
t = r * s

print('r :', r.item())
print('t :', t.item())
# The lines below would raise an AttributeError for now: p.grad, q.grad, and s.grad
# are still None because we have not run backpropagation yet
# print('dt/dp :', p.grad.item())
# print('dt/dq :', q.grad.item())
# print('dt/ds :', s.grad.item())
```

```python
# Now run backpropagation, starting from the output variable t
t.backward()
```

```python
# Expected: dt/dp = 2.0, dt/dq = 2.0, dt/ds = 8.0 (matching the manual computation)
print('dt/dp :', p.grad.item())
print('dt/dq :', q.grad.item())
print('dt/ds :', s.grad.item())
```
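Autograd populates `.grad` only for leaf tensors (here `p`, `q`, and `s`) by default. A minimal self-contained sketch that also asks autograd to keep the gradient of the intermediate tensor `r` via `r.retain_grad()`, and checks the results against the hand-derived chain rule. It also inspects `.grad_fn`, which records the operation that produced each non-leaf tensor.

```python
import torch

p = torch.tensor([3.], requires_grad=True)
q = torch.tensor([5.], requires_grad=True)
s = torch.tensor([2.], requires_grad=True)
r = p + q
r.retain_grad()   # r is a non-leaf tensor; keep its .grad after backward()
t = r * s
t.backward()

print('dt/dr :', r.grad.item())  # 2.0, equal to s
print('dt/dp :', p.grad.item())  # 2.0
print('dt/dq :', q.grad.item())  # 2.0
print('dt/ds :', s.grad.item())  # 8.0, equal to r

# Each non-leaf tensor remembers the operation that created it
print(type(r.grad_fn).__name__)  # AddBackward0
print(type(t.grad_fn).__name__)  # MulBackward0

# Autograd agrees with the manual chain rule: dt/dp = s and dt/ds = r
assert p.grad.item() == s.item()
assert s.grad.item() == r.item()
```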