# Automatic differentiation with `autograd`:

In machine learning, we <I>train</I> models to get better and better as a function of experience. As discussed in the <I>gradient descent</I> notebook, <I>getting better</I> means minimizing a <I>loss function</I>, i.e., a score that answers "how bad is our model?" With machine learning models, we choose loss functions to be differentiable with respect to our parameters. Put simply, this means that for each of the model's parameters, we can determine how much <I>increasing</I> or <I>decreasing</I> it would affect the loss.

<I>MXNet's</I> autograd expedites this work by automatically calculating derivatives. `mxnet.autograd` allows us to take derivatives while writing ordinary imperative code. Let's go through it step by step.

In [2]:
import mxnet as mx
from mxnet import nd, autograd
mx.random.seed(1)

### Attaching gradients:

As a toy example, let's say we are interested in differentiating a function `f = 2 * (x**2)` with respect to the parameter `x`. We can start by assigning an initial value to `x`.

In [3]:
x = nd.array([[1,2],[3,4]])

Once we compute the gradient of `f` with respect to `x`, we'll need a place to store it. In MXNet, we can tell an NDArray that we plan to store a gradient by invoking its `attach_grad()` method.

In [4]:
x.attach_grad()

We can instruct MXNet to start recording by placing code inside a `with autograd.record():` block.

In [5]:
with autograd.record():
    y = x*2
    z = y*x

Let's backprop by calling `z.backward()`.

In [6]:
z.backward()
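
To build intuition for what recording and `backward()` do, here is a minimal sketch of reverse-mode differentiation on plain Python scalars. It is not how MXNet is implemented; the `Var` class and its fields are hypothetical names made up for this illustration. Each multiplication remembers its inputs and local derivatives, and `backward()` replays that record in reverse while applying the chain rule.

```python
# Toy reverse-mode autodiff on scalars. This is NOT MXNet's implementation;
# `Var` is a hypothetical helper used only for illustration.
class Var:
    """A scalar value that remembers how it was computed."""

    def __init__(self, value, parents=()):
        self.value = float(value)
        self.grad = 0.0
        # Each parent is a pair: (input Var, local derivative of self w.r.t. it).
        self.parents = parents

    def __mul__(self, other):
        if not isinstance(other, Var):
            other = Var(other)  # wrap plain numbers as constants
        return Var(self.value * other.value,
                   parents=((self, other.value), (other, self.value)))

    def backward(self):
        # Order the recorded graph so every node comes after its inputs.
        order, seen = [], set()

        def visit(node):
            if id(node) in seen:
                return
            seen.add(id(node))
            for parent, _ in node.parents:
                visit(parent)
            order.append(node)

        visit(self)
        self.grad = 1.0  # derivative of the output with respect to itself
        # Walk from the output back to the inputs, applying the chain rule.
        for node in reversed(order):
            for parent, local in node.parents:
                parent.grad += node.grad * local


x = Var(3.0)
y = x * 2      # recorded: y depends on x
z = y * x      # recorded: z depends on y and on x
z.backward()
print(z.value, x.grad)  # 18.0 12.0
```

For `x = 3` the sketch reports a gradient of 12, which matches 4\*x. MXNet does the same kind of bookkeeping for whole NDArrays inside the `autograd.record()` block.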

Now let's check that this is the expected output. Remember that `y = x * 2` and `z = x * y`, so `z` should equal `2 * x * x`. After doing backprop with `z.backward()`, we expect the gradient dz/dx = 4\*x: since dy/dx = 2, the product rule gives dz/dx = y + x \* dy/dx = 2\*x + 2\*x. So, if everything went according to plan, `x.grad` should be an NDArray with the values `[[4, 8],[12, 16]]`.

In [7]:
print(x.grad)


[[ 4.  8.]
 [12. 16.]]
<NDArray 2x2 @cpu(0)>
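
One way to double-check a gradient like this without trusting any autodiff machinery is a finite-difference test. The sketch below uses plain Python lists rather than NDArrays, and the helper names `total_z` and `numerical_grad` are made up for this example. It assumes that calling `backward()` on a non-scalar `z` differentiates the sum of its entries, so it nudges each entry of `x` up and down by a small `eps` and measures how much that sum changes.

```python
# Central finite differences in pure Python; no MXNet required.
# `total_z` and `numerical_grad` are hypothetical helper names.
def total_z(x):
    """Sum of z = 2 * x * x over all entries of a nested list."""
    return sum(2.0 * v * v for row in x for v in row)


def numerical_grad(f, x, eps=1e-4):
    """Approximate df/dx entry by entry with central differences."""
    grad = [[0.0 for _ in row] for row in x]
    for i, row in enumerate(x):
        for j, _ in enumerate(row):
            up = [r[:] for r in x]      # copy of x with one entry nudged up
            down = [r[:] for r in x]    # copy of x with the same entry nudged down
            up[i][j] += eps
            down[i][j] -= eps
            grad[i][j] = (f(up) - f(down)) / (2 * eps)
    return grad


x_values = [[1.0, 2.0], [3.0, 4.0]]
approx = numerical_grad(total_z, x_values)
print([[round(g, 3) for g in row] for row in approx])
# [[4.0, 8.0], [12.0, 16.0]]
```

The approximation agrees with the values stored in `x.grad`. Finite differences need two function evaluations per parameter, which is why automatic differentiation is preferred for models with many parameters.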


In [4]:
#test
mx.nd.sum(x)


[10.]
<NDArray 1 @cpu(0)>