# **Chapter 4 - THE PRELIMINARIES: A CRASHCOURSE**

## **4.3 Automatic Differentiation**

In [126]:
from mxnet import autograd, nd

#### **4.3.1 A Simple Example**

- As a toy example, say that we are interested in differentiating the mapping **y = 2x⊤x** with respect to the column vector x.<br> 
  To start, let’s create the variable x and assign it an initial value.

In [133]:
x = nd.arange(4)
x


[0. 1. 2. 3.]
<NDArray 4 @cpu(0)>

In [134]:
x.attach_grad()  # allocate space to store the gradient with respect to x

In [135]:
with autograd.record():  # record the computation so it can be differentiated
    y = 2 * nd.dot(x, x)
y


[28.]
<NDArray 1 @cpu(0)>

In [136]:
y.backward()  # backpropagate from y; the result is stored in x.grad

- The gradient of the function **y = 2x⊤x** with respect to x should be **_4x_**, because ∂(x⊤x)/∂x = 2x.<br> 
  Now let’s verify that the gradient produced is correct.

In [139]:
print(x)
print(x.grad)
print(x.grad - 4 * x)


[0. 1. 2. 3.]
<NDArray 4 @cpu(0)>

[ 0.  4.  8. 12.]
<NDArray 4 @cpu(0)>

[0. 0. 0. 0.]
<NDArray 4 @cpu(0)>
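
The same result can be cross-checked without MXNet. The sketch below is a plain-Python illustration, not part of the original notebook: it approximates the gradient of y = 2x⊤x by central finite differences and compares it with 4x. The helper names `f` and `numerical_grad` and the step size `eps` are assumptions chosen for this example.

```python
def f(x):
    # f(x) = 2 * x^T x for a plain Python list x
    return 2.0 * sum(xi * xi for xi in x)


def numerical_grad(func, x, eps=1e-6):
    # Central-difference approximation of each partial derivative of func at x
    grad = []
    for i in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[i] += eps
        x_minus[i] -= eps
        grad.append((func(x_plus) - func(x_minus)) / (2 * eps))
    return grad


x = [0.0, 1.0, 2.0, 3.0]
approx = numerical_grad(f, x)      # numerical estimate of the gradient
exact = [4.0 * xi for xi in x]     # analytic gradient 4x
max_error = max(abs(a - e) for a, e in zip(approx, exact))
print(approx)
print(max_error)
```

The maximum error should be tiny (limited only by floating-point rounding), which agrees with the zero difference reported by `x.grad - 4 * x` above.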


In [118]:
with autograd.record():
    y = x.norm()  # L2 norm of x
y.backward()
x.grad  # expected to equal x / ||x||


[0.         0.26726124 0.5345225  0.80178374]
<NDArray 4 @cpu(0)>
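
For x ≠ 0 the gradient of the L2 norm ‖x‖ is x / ‖x‖, which is what the output above shows. As a quick sanity check, the following plain-Python sketch (an illustration that does not use MXNet; the variable names are chosen for this example) evaluates that closed form for the same x.

```python
import math

x = [0.0, 1.0, 2.0, 3.0]

# ||x||_2 = sqrt(0 + 1 + 4 + 9) = sqrt(14)
norm = math.sqrt(sum(xi * xi for xi in x))

# Closed-form gradient of the L2 norm: x / ||x||_2 (valid for x != 0)
norm_grad = [xi / norm for xi in x]
print(norm_grad)
```

The printed values should match the entries of `x.grad` above up to single-precision rounding.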

#### 4.3.2 Backward for Non-scalar Variable

In [156]:
print('x vector : ', x)

with autograd.record(): # y is a vector
    y = x * x
print('y vector : ', y)    
y.backward()  # for a non-scalar y, this backpropagates the sum of its elements
print('x.grad : ', x.grad)

u = x.copy()
u.attach_grad()

with autograd.record(): # v is scalar
    v = (u * u).sum()
print('v scalar : ', v)    
v.backward()
print('u.grad : ', u.grad)

x.grad - u.grad

x vector :  
[0. 1. 2. 3.]
<NDArray 4 @cpu(0)>
y vector :  
[0. 1. 4. 9.]
<NDArray 4 @cpu(0)>
x.grad :  
[0. 2. 4. 6.]
<NDArray 4 @cpu(0)>
v scalar :  
[14.]
<NDArray 1 @cpu(0)>
u.grad :  
[0. 2. 4. 6.]
<NDArray 4 @cpu(0)>



[0. 0. 0. 0.]
<NDArray 4 @cpu(0)>
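
The outputs above show that calling backward on the vector `y = x * x` gives the same gradient as calling it on the scalar `(u * u).sum()`. One way to see why is to write out the Jacobian explicitly. The sketch below is a plain-Python illustration under that interpretation (the names `jacobian`, `head`, and `grad_of_sum` are assumptions for this example): it weights every element of y by 1 and multiplies by the Jacobian of the elementwise square.

```python
x = [0.0, 1.0, 2.0, 3.0]
n = len(x)

# Jacobian of the elementwise square y_i = x_i * x_i:
# J[i][j] = dy_i / dx_j, which is 2 * x_i on the diagonal and 0 elsewhere
jacobian = [[2.0 * x[i] if i == j else 0.0 for j in range(n)] for i in range(n)]

# Differentiating sum(y) corresponds to giving every y_i a weight of 1
head = [1.0] * n

# Weighted sum over the rows of the Jacobian (a vector-Jacobian product)
grad_of_sum = [sum(head[i] * jacobian[i][j] for i in range(n)) for j in range(n)]
print(grad_of_sum)  # [0.0, 2.0, 4.0, 6.0]
```

The result equals 2x, matching both `x.grad` and `u.grad` in the cell above.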

#### 4.3.3 Detach Computations

In [161]:
with autograd.record():
    y = x * x
    u = y.detach()  # same values as y, but treated as a constant during backward
    z = u * x
print('x : ', x)
print('u : ', u)
print('z : ', z)

z.backward()

print('x.grad : ', x.grad)
print('u : ', u)

x.grad - u

x :  
[0. 1. 2. 3.]
<NDArray 4 @cpu(0)>
u :  
[0. 1. 4. 9.]
<NDArray 4 @cpu(0)>
z :  
[ 0.  1.  8. 27.]
<NDArray 4 @cpu(0)>
x.grad :  
[0. 1. 4. 9.]
<NDArray 4 @cpu(0)>
u :  
[0. 1. 4. 9.]
<NDArray 4 @cpu(0)>



[0. 0. 0. 0.]
<NDArray 4 @cpu(0)>

- The backward pass above treats **u** as a constant equal to x², so it computes **_∂(ux)/∂x = u_** instead of **_∂x³/∂x = 3x²_**.

- Since the computation of y is still recorded, we can call y.backward() to get **∂y/∂x = 2x**.

In [162]:
y.backward()
print('y : ', y)
print('x.grad : ', x.grad)
print('x : ', x)

x.grad - 2*x

y :  
[0. 1. 4. 9.]
<NDArray 4 @cpu(0)>
x.grad :  
[0. 2. 4. 6.]
<NDArray 4 @cpu(0)>
x :  
[0. 1. 2. 3.]
<NDArray 4 @cpu(0)>



[0. 0. 0. 0.]
<NDArray 4 @cpu(0)>
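
The effect of detaching can also be reproduced numerically. The sketch below is a plain-Python illustration that does not use MXNet; `numerical_grad`, `z_detached`, `z_full`, and `u_const` are hypothetical names introduced here. It compares the gradient of z when u is frozen at x² with the gradient when the dependence of u on x is kept. The elements of z are summed, in line with the behavior of backward on a non-scalar seen in 4.3.2.

```python
def numerical_grad(func, x, eps=1e-6):
    # Central-difference approximation of each partial derivative of func at x
    grad = []
    for i in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[i] += eps
        x_minus[i] -= eps
        grad.append((func(x_plus) - func(x_minus)) / (2 * eps))
    return grad


x = [0.0, 1.0, 2.0, 3.0]
u_const = [xi * xi for xi in x]  # frozen copy of x * x, playing the role of the detached u


def z_detached(v):
    # u is held fixed, so only the explicit factor v depends on the input
    return sum(u_i * v_i for u_i, v_i in zip(u_const, v))


def z_full(v):
    # No detach: z = v * v * v, so u changes together with v
    return sum(v_i ** 3 for v_i in v)


grad_detached = numerical_grad(z_detached, x)  # close to u = x^2
grad_full = numerical_grad(z_full, x)          # close to 3 * x^2
print(grad_detached)
print(grad_full)
```

The first gradient should be close to [0, 1, 4, 9], the value of `x.grad` after `z.backward()` in the detach example, while the second should be close to [0, 3, 12, 27], which is what backward would produce if u were not detached.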

#### 4.3.4 Attach Gradients to Internal Variables

#### 4.3.5 Head gradients

#### 4.3.6 Computing the Gradient of Python Control Flow

#### 4.3.7 Training Mode and Prediction Mode 

#### 4.3.8 Summary


#### 4.3.9 Exercises

____