In [69]:
import torch

### Automatic Differentiation

Calculating derivatives is a crucial step in all the optimization algorithms that we will use to train deep networks. Modern deep learning frameworks take the work of computing them by hand off our plates by offering automatic differentiation (often shortened to autograd).

### Explanation based on a simple function

**y = 2x<sup>T</sup>x**, where **x** is a vector

In [70]:
x = torch.arange(4.0)
x

tensor([0., 1., 2., 3.])

Before calculating the gradient, we need a place to store it. Real-life workloads process and store large amounts of data, so careful memory management is essential to avoid running out of memory.
For this reason, the gradient of **y** with respect to the vector **x** is stored alongside **x** itself, in its `grad` attribute, rather than in newly allocated memory each time. The gradient has the same shape as **x**.

In [71]:
x.requires_grad_(True)
# We can also define this when creating a tensor by x = torch.arange(4.0, requires_grad=True)

# The gradient is None until backward() has been called
print(x.grad)

None


In [72]:
# torch.matmul would also work here, but the operation it performs depends on the
# dimensions of its inputs, whereas torch.dot always computes the dot product of two 1-D tensors.
# The dot product is also known as the scalar product.
y = 2 * torch.dot(x, x)
y

tensor(28., grad_fn=<MulBackward0>)
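
The difference between `dot` and `matmul` mentioned in the comment above can be sketched as follows. The tensors `v` and `m` are hypothetical and are kept separate from the notebook's `x` so that its gradient state is not disturbed.

```python
import torch

# Hypothetical example tensors, independent of the notebook's x.
v = torch.arange(4.0)                 # 1-D tensor: [0., 1., 2., 3.]
m = torch.arange(8.0).reshape(2, 4)   # 2-D tensor of shape (2, 4)

dot_result = torch.dot(v, v)      # inner product of two 1-D tensors
matmul_vec = torch.matmul(v, v)   # 1-D with 1-D: also an inner product
matmul_mat = torch.matmul(m, v)   # 2-D with 1-D: matrix-vector product

print(dot_result, matmul_vec, matmul_mat)

# torch.dot only accepts 1-D tensors, so a 2-D input raises an error.
dot_rejected_2d = False
try:
    torch.dot(m, v)
except RuntimeError as err:
    dot_rejected_2d = True
    print("torch.dot failed:", err)
```

Using `torch.dot` therefore documents the intent (a scalar product of two vectors) and fails loudly if a tensor of another shape slips in.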

In [73]:
x.grad  # Still None
y.backward()  # Compute the gradient of y with respect to x; the result is stored in x.grad
x.grad

tensor([ 0.,  4.,  8., 12.])

In [74]:
# The gradient of y with respect to x should be 4 * x:

# y = 2 * dot(x, x) = 2 * sum(x_i ** 2)
# dy/dx_i = 2 * 2 * x_i = 4 * x_i
# so the gradient is 4 * x

x.grad == 4 * x

tensor([True, True, True, True])
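
As an additional sanity check, the autograd result can be compared against a numerical approximation. The sketch below uses central finite differences; the helper `f`, the tensor `w`, and the step size `eps` are assumptions made for this illustration, and `float64` is used to keep rounding error small.

```python
import torch

def f(t):
    # Same function as in the notebook: y = 2 * x^T x
    return 2 * torch.dot(t, t)

# Hypothetical tensor, independent of the notebook's x.
w = torch.arange(4.0, dtype=torch.float64, requires_grad=True)
f(w).backward()  # autograd gradient is now in w.grad

eps = 1e-6  # assumed step size
numeric = torch.zeros_like(w)
with torch.no_grad():
    for i in range(w.numel()):
        step = torch.zeros_like(w)
        step[i] = eps
        # Central difference along coordinate i
        numeric[i] = (f(w + step) - f(w - step)) / (2 * eps)

matches = torch.allclose(w.grad, numeric, atol=1e-6)
print(matches)
```

Because the function is quadratic, the central difference agrees with `4 * w` up to floating-point rounding.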

In [75]:
u = x.sum()
u.backward()
u


tensor(6., grad_fn=<SumBackward0>)

In [76]:
x.grad
# Result: tensor([ 1.,  5.,  9., 13.]) instead of all ones,
# because PyTorch does NOT automatically reset the gradient buffer.
# Instead, the new gradient is added to the already-stored gradient.
# This behavior comes in handy when we want to optimize the sum of multiple objective functions.

tensor([ 1.,  5.,  9., 13.])
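
The accumulation behavior can be illustrated with two objectives. In this sketch (the names `p`, `q`, `loss_a`, and `loss_b` are hypothetical), calling `backward()` on each objective in turn leaves the same gradient as one backward pass over their sum.

```python
import torch

p = torch.arange(4.0, requires_grad=True)  # hypothetical tensor

loss_a = 2 * torch.dot(p, p)  # gradient: 4 * p
loss_b = p.sum()              # gradient: all ones

loss_a.backward()
loss_b.backward()             # added on top of the gradient from loss_a
accumulated = p.grad.clone()

# Single backward pass on the summed objective, using a fresh tensor.
q = torch.arange(4.0, requires_grad=True)
(2 * torch.dot(q, q) + q.sum()).backward()

print(accumulated)
print(torch.equal(accumulated, q.grad))
```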

In [77]:
# To reset the gradient, call x.grad.zero_() before the next backward pass
x.grad.zero_()
u = x.sum()
u.backward()
x.grad

tensor([1., 1., 1., 1.])
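
The effect of resetting becomes clearer over repeated backward passes. The following sketch, using a hypothetical tensor `r`, compares three passes without a reset against three passes with `zero_()` before each one, and also shows that assigning `None` to `grad` discards the buffer entirely.

```python
import torch

r = torch.arange(4.0, requires_grad=True)  # hypothetical tensor

# Without a reset, each backward pass adds another vector of ones.
for _ in range(3):
    r.sum().backward()
no_reset = r.grad.clone()

# With a reset before each pass, only the latest gradient is kept.
for _ in range(3):
    r.grad.zero_()
    r.sum().backward()
with_reset = r.grad.clone()

# Alternative: drop the buffer so that r.grad is None again.
r.grad = None

print(no_reset, with_reset, r.grad)
```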