# Automatic Differentiation
Differentiation is a crucial step in nearly all deep learning optimization algorithms.

Deep learning frameworks expedite this step
by calculating derivatives automatically, a technique known as *automatic differentiation*.
In practice,
the system builds a *computational graph* from the model we define,
tracking which data are combined through
which operations to produce the output.
Automatic differentiation enables the system to subsequently backpropagate gradients.


## A Simple Example

**As a simple example, consider differentiating the function
$y = 2\mathbf{x}^{\top}\mathbf{x}$
with respect to the column vector $\mathbf{x}$.**

To start, let us create the variable `x` and assign it an initial value.

In [5]:
import torch

x = torch.arange(4.0)
x

tensor([0., 1., 2., 3.])

**Before we even calculate the gradient
of $y$ with respect to $\mathbf{x}$,
we will need a place to store it.**
It is important that we do not allocate new memory
every time we take a derivative with respect to a parameter
because we will often update the same parameters
thousands or millions of times
and could quickly run out of memory.
Note that the gradient of a scalar-valued function
with respect to a vector $\mathbf{x}$
is itself vector-valued and has the same shape as $\mathbf{x}$.


In [6]:
x.requires_grad_(True)  # Same as `x = torch.arange(4.0, requires_grad=True)`
x.grad  # The default value is None
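
As a minimal sketch of what the cell above changes, the following checks the tracking flag before and after the call, and confirms that no gradient is stored until backpropagation has run. The names `before`, `after`, `grad_is_none`, and `x2` are illustrative and do not appear in the original cells.

```python
import torch

x = torch.arange(4.0)
before = x.requires_grad        # False: a freshly created tensor is not tracked

x.requires_grad_(True)          # in-place switch, as in the cell above
after = x.requires_grad         # True
grad_is_none = x.grad is None   # True: nothing has been backpropagated yet

# Equivalent one-step construction mentioned in the comment above
x2 = torch.arange(4.0, requires_grad=True)

print(before, after, grad_is_none, x2.requires_grad)
```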

**Now let us calculate $y$.**


In [7]:
y = 2 * torch.dot(x, x)
y

tensor(28., grad_fn=<MulBackward0>)

Since `x` is a vector of length 4,
a dot product of `x` and `x` is performed,
yielding the scalar output that we assign to `y`.
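
The `grad_fn=<MulBackward0>` in the output above is a handle to the computational graph recorded for `y`. The sketch below takes one step through that graph. It assumes the `name()` method and `next_functions` attribute that PyTorch exposes on graph nodes; the exact node names printed may differ between PyTorch versions.

```python
import torch

x = torch.arange(4.0, requires_grad=True)
y = 2 * torch.dot(x, x)

# The node that produced `y` is the multiplication by 2.
root = y.grad_fn
root_name = root.name()

# Each entry of `next_functions` is a (node, input_index) pair. The node is
# None for inputs that need no gradient, such as the constant 2.
input_names = [fn.name() if fn is not None else None
               for fn, _ in root.next_functions]

print(root_name)
print(input_names)
```

One of the inputs is the node recorded for the dot product, which is how backpropagation knows where to send the gradient next.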

Next, **we can automatically calculate the gradient of `y`
with respect to each component of `x`**
by calling the function for backpropagation and printing the gradient.

In [8]:
y.backward()
x.grad

tensor([ 0.,  4.,  8., 12.])

We know that **the gradient of the function $y = 2\mathbf{x}^{\top}\mathbf{x}$
with respect to $\mathbf{x}$ should be $4\mathbf{x}$.**
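
For completeness, here is one way to see this by writing the function out componentwise, where $x_i$ denotes the $i$-th entry of $\mathbf{x}$:

$$
y = 2\mathbf{x}^{\top}\mathbf{x} = 2\sum_{i} x_i^2,
\qquad
\frac{\partial y}{\partial x_j} = 4x_j,
\qquad
\nabla_{\mathbf{x}}\, y = 4\mathbf{x}.
$$

With $\mathbf{x} = [0, 1, 2, 3]^{\top}$ this gives $[0, 4, 8, 12]^{\top}$, matching the values stored in `x.grad`.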

Let us quickly verify that our desired gradient was calculated correctly.

In [9]:
x.grad == 4 * x

tensor([True, True, True, True])
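
When no closed-form gradient is available for comparison, a finite-difference check is a common alternative. The following is a rough sketch under stated assumptions rather than part of the original example: the helper `f`, the step size `eps`, and the use of `float64` are choices made here for illustration, and `torch.allclose` compares within a tolerance instead of testing exact equality.

```python
import torch

def f(v):
    # Same function as in the example: y = 2 * x^T x
    return 2 * torch.dot(v, v)

# float64 keeps the rounding error of the finite differences small
x = torch.arange(4.0, dtype=torch.float64, requires_grad=True)
f(x).backward()

eps = 1e-6  # illustrative step size
numeric = torch.zeros_like(x)
with torch.no_grad():
    for i in range(x.numel()):
        step = torch.zeros_like(x)
        step[i] = eps
        # Central difference approximation of the i-th partial derivative
        numeric[i] = (f(x + step) - f(x - step)) / (2 * eps)

print(torch.allclose(x.grad, numeric))
```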

**Now let us calculate another function of `x`.**


In [10]:
# PyTorch accumulates gradients by default, so we need to clear the previous values
x.grad.zero_()
y = x.sum()
y.backward()
x.grad

tensor([1., 1., 1., 1.])
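
To illustrate why the call to `x.grad.zero_()` matters, the sketch below runs backpropagation twice without clearing the gradient, then once more after clearing it. The variable names `first`, `second`, and `third` are introduced here only for illustration.

```python
import torch

x = torch.arange(4.0, requires_grad=True)

x.sum().backward()
first = x.grad.clone()    # tensor([1., 1., 1., 1.])

x.sum().backward()        # no zero_() in between, so the new gradient is added
second = x.grad.clone()   # tensor([2., 2., 2., 2.])

x.grad.zero_()            # reset the stored gradient in place
x.sum().backward()
third = x.grad.clone()    # tensor([1., 1., 1., 1.])

print(first, second, third)
```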