## PyTorch Gradients
### Summary
* Define original equation.
* Substitute the `x` value into the equation.
* Calculate gradients with `o.backward()`.
* Access the gradients of `x` through `x.grad`.

In [1]:
# import 
import torch

### Tensor
If you set a tensor's `.requires_grad` attribute to `True`, PyTorch starts to track all operations on it. We will use the following equation as the example:
$$ y_i = 5(x_i + 1)^2 $$

In [2]:
x = torch.ones(2, requires_grad=True)
x

tensor([1., 1.], requires_grad=True)
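
To make "tracking" a little more concrete, here is a minimal sketch comparing a tensor created without `requires_grad` to one created with it. The names `a` and `b` are only for illustration and are not part of the notebook.

```python
import torch

a = torch.ones(2)                      # requires_grad defaults to False
b = torch.ones(2, requires_grad=True)  # operations on b are recorded

# A result computed from `a` has no grad_fn, because nothing is tracked.
print(a.requires_grad, (a + 1).grad_fn)

# A result computed from `b` carries a grad_fn node pointing back to the operation.
print(b.requires_grad, (b + 1).grad_fn)
```

The `grad_fn` attribute is what later allows `backward()` to walk back through the recorded operations.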

$$ y_i|_{x_i=1} = 5(1+1)^2 = 5(4) = 20 $$

In [3]:
y = 5 * (x + 1) ** 2
y

tensor([20., 20.], grad_fn=<MulBackward0>)

`backward()` should be called only on a scalar, or on a non-scalar tensor with an explicit `gradient` argument of the same shape.
* Let's reduce `y` to a scalar:
$$ o = \frac{1}{2} \sum_{i}y_i$$
We want to reduce `y`, which is currently `[20, 20]`, to a single value, and the easiest way to do that is to take the mean of its elements.

In [4]:
o = (1/2) * torch.sum(y) 
o 

tensor(20., grad_fn=<MulBackward0>)

We get 20 again because the sum of `y` is 20 + 20 = 40, and dividing it by 2 gives the scalar 20.
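
Since the reduction is just a mean, it may help to see that the hand-written factor of `1/2` matches PyTorch's built-in mean. This is a small sketch; the names `o_manual`, `o_mean`, and `o_general` are illustrative and do not appear in the notebook.

```python
import torch

x = torch.ones(2, requires_grad=True)
y = 5 * (x + 1) ** 2

o_manual = (1 / 2) * torch.sum(y)      # the notebook's formula for two elements
o_mean = y.mean()                      # built-in mean over all elements
o_general = torch.sum(y) / y.numel()   # works for any number of elements

print(o_manual, o_mean, o_general)
```

Using `y.mean()` or dividing by `y.numel()` avoids hard-coding the element count.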

1. Recap `y` equation: $$y_i=5(x_i + 1)^2$$
2. Recap `o` equation: $$o = \frac{1}{2} \sum_{i}y_i$$

    We take the mean of `y`, which is the sum of its elements divided by two because there are only two elements. If `y` had 100 elements, we would sum them and divide by 100.

3. Substitute `y` into the `o` equation: $$o = \frac{1}{2} \sum_{i}5(x_i + 1)^2$$

$$\frac{\partial o}{\partial x_{i}} = \frac{1}{2}[10(x_i+1)]$$
$$\frac{\partial o}{\partial x_{i}}|_{x_i=1} = \frac{1}{2}[10(1+1)] = \frac{10}{2}(2) = 10$$
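
The hand-derived value of 10 can be sanity-checked without autograd by approximating each partial derivative with a central finite difference. This is only a sketch: `o_of_x` is a hypothetical helper written for this check, and `float64` and the step size `eps` are assumptions chosen to keep rounding error small.

```python
import torch

def o_of_x(x):
    # Hypothetical helper: o = (1/2) * sum_i 5 * (x_i + 1)^2
    return 0.5 * torch.sum(5 * (x + 1) ** 2)

x0 = torch.ones(2, dtype=torch.float64)
eps = 1e-6

finite_diff = torch.zeros_like(x0)
for i in range(x0.numel()):
    step = torch.zeros_like(x0)
    step[i] = eps
    # Central difference approximates the partial derivative with respect to x_i.
    finite_diff[i] = (o_of_x(x0 + step) - o_of_x(x0 - step)) / (2 * eps)

analytic = 5 * (x0 + 1)  # (1/2) * 10 * (x_i + 1) from the derivation

print(finite_diff)
print(analytic)
```

Both tensors should agree closely with the value of 10 obtained by hand at `x_i = 1`.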

Calling `backward()` calculates the gradients.

In [5]:
o.backward() # calculates the gradients.

To access the gradients, we just read `x.grad`.

In [6]:
x.grad

tensor([10., 10.])
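
As noted earlier, `backward()` can also be called on a non-scalar tensor if it is given a gradient with respect to that tensor. The sketch below skips the reduction to `o` and instead passes the weights that the mean would have applied. The name `weights` is illustrative, and the value 0.5 assumes the same two-element mean used above.

```python
import torch

x = torch.ones(2, requires_grad=True)
y = 5 * (x + 1) ** 2  # non-scalar: two elements

# Each weight is the derivative of o = (1/2) * sum(y) with respect to y_i, i.e. 0.5.
weights = torch.full_like(y, 0.5)
y.backward(gradient=weights)

print(x.grad)  # tensor([10., 10.])
```

This gives the same gradients as calling `o.backward()` on the scalar, because the weights encode the same reduction.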