
## Install `mxnet` with GPU

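* Install the CUDA toolkit: `conda install -c anaconda cudatoolkit=10.1`
* Install an MXNet build that matches the CUDA version, for example `pip install mxnet-cu101` for CUDA 10.1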

## Data Manipulation
### NDArray

In [4]:
import mxnet as mx
from mxnet import nd

In [6]:
x = nd.arange(12)
x


[ 0.  1.  2.  3.  4.  5.  6.  7.  8.  9. 10. 11.]
<NDArray 12 @cpu(0)>

In [7]:
x.shape

(12,)

In [8]:
x.size

12

In [9]:
X = x.reshape((3,4))
X


[[ 0.  1.  2.  3.]
 [ 4.  5.  6.  7.]
 [ 8.  9. 10. 11.]]
<NDArray 3x4 @cpu(0)>

**A `-1` in the shape tells `reshape` to infer that dimension from the total number of elements and the other dimensions**

In [10]:
X = x.reshape((-1,4))
X


[[ 0.  1.  2.  3.]
 [ 4.  5.  6.  7.]
 [ 8.  9. 10. 11.]]
<NDArray 3x4 @cpu(0)>

In [11]:
X = x.reshape((3,-1))
X


[[ 0.  1.  2.  3.]
 [ 4.  5.  6.  7.]
 [ 8.  9. 10. 11.]]
<NDArray 3x4 @cpu(0)>
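
The inference behind `-1` can be sketched in plain Python. This is an illustration of the arithmetic only, not MXNet's implementation; `infer_shape` is a hypothetical helper name, and it assumes at most one `-1` entry.

```python
from math import prod

def infer_shape(size, shape):
    """Resolve a single -1 entry in `shape` for an array holding `size` elements.

    Hypothetical helper for illustration; not part of MXNet.
    """
    unknown = [i for i, d in enumerate(shape) if d == -1]
    if len(unknown) > 1:
        raise ValueError("only one dimension may be -1")
    if not unknown:
        if prod(shape) != size:
            raise ValueError("shape does not match the number of elements")
        return tuple(shape)
    # Product of the dimensions that were given explicitly.
    known = prod(d for d in shape if d != -1)
    if known == 0 or size % known != 0:
        raise ValueError("cannot infer the missing dimension")
    resolved = list(shape)
    resolved[unknown[0]] = size // known
    return tuple(resolved)

print(infer_shape(12, (-1, 4)))  # (3, 4)
print(infer_shape(12, (3, -1)))  # (3, 4)
```

Both calls resolve to `(3, 4)`, which is why the two `reshape` cells above print the same array.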

In [12]:
nd.zeros((2,3,4))


[[[0. 0. 0. 0.]
  [0. 0. 0. 0.]
  [0. 0. 0. 0.]]

 [[0. 0. 0. 0.]
  [0. 0. 0. 0.]
  [0. 0. 0. 0.]]]
<NDArray 2x3x4 @cpu(0)>

In [13]:
nd.ones((3,4))


[[1. 1. 1. 1.]
 [1. 1. 1. 1.]
 [1. 1. 1. 1.]]
<NDArray 3x4 @cpu(0)>

In [15]:
Y = nd.array([[2,1,4,3],[1,2,3,4],[4,3,2,1]])
Y


[[2. 1. 4. 3.]
 [1. 2. 3. 4.]
 [4. 3. 2. 1.]]
<NDArray 3x4 @cpu(0)>

In [16]:
nd.random.normal(0,1,shape = (3,4))


[[ 1.1630785   0.4838046   0.29956347  0.15302546]
 [-1.1688148   1.5580711  -0.5459446  -2.3556297 ]
 [ 0.5414402   2.6785066   1.2546345  -0.54877394]]
<NDArray 3x4 @cpu(0)>

### Calculation / Operation

In [17]:
print(X)
print(Y)


[[ 0.  1.  2.  3.]
 [ 4.  5.  6.  7.]
 [ 8.  9. 10. 11.]]
<NDArray 3x4 @cpu(0)>

[[2. 1. 4. 3.]
 [1. 2. 3. 4.]
 [4. 3. 2. 1.]]
<NDArray 3x4 @cpu(0)>


In [18]:
X + Y


[[ 2.  2.  6.  6.]
 [ 5.  7.  9. 11.]
 [12. 12. 12. 12.]]
<NDArray 3x4 @cpu(0)>

In [19]:
X * Y


[[ 0.  1.  8.  9.]
 [ 4. 10. 18. 28.]
 [32. 27. 20. 11.]]
<NDArray 3x4 @cpu(0)>

In [20]:
X / Y


[[ 0.    1.    0.5   1.  ]
 [ 4.    2.5   2.    1.75]
 [ 2.    3.    5.   11.  ]]
<NDArray 3x4 @cpu(0)>

In [21]:
Y.exp()


[[ 7.389056   2.7182817 54.59815   20.085537 ]
 [ 2.7182817  7.389056  20.085537  54.59815  ]
 [54.59815   20.085537   7.389056   2.7182817]]
<NDArray 3x4 @cpu(0)>

In [22]:
nd.dot(X,Y.T)


[[ 18.  20.  10.]
 [ 58.  60.  50.]
 [ 98. 100.  90.]]
<NDArray 3x3 @cpu(0)>

In [23]:
nd.concat(X,Y,dim = 0)


[[ 0.  1.  2.  3.]
 [ 4.  5.  6.  7.]
 [ 8.  9. 10. 11.]
 [ 2.  1.  4.  3.]
 [ 1.  2.  3.  4.]
 [ 4.  3.  2.  1.]]
<NDArray 6x4 @cpu(0)>

In [24]:
nd.concat(X,Y,dim = 1)


[[ 0.  1.  2.  3.  2.  1.  4.  3.]
 [ 4.  5.  6.  7.  1.  2.  3.  4.]
 [ 8.  9. 10. 11.  4.  3.  2.  1.]]
<NDArray 3x8 @cpu(0)>

In [25]:
X == Y


[[0. 1. 0. 1.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]]
<NDArray 3x4 @cpu(0)>

In [26]:
X.sum()


[66.]
<NDArray 1 @cpu(0)>

In [27]:
X


[[ 0.  1.  2.  3.]
 [ 4.  5.  6.  7.]
 [ 8.  9. 10. 11.]]
<NDArray 3x4 @cpu(0)>

**$\ell_2$ norm: $\|X\|_2 = \sqrt{\sum_{i,j} |x_{ij}|^2}$**

In [30]:
X.norm().asscalar()

22.494442
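
As a cross-check of the value above, the same norm can be computed by hand in plain Python. The nested list `rows` simply copies the entries of `X` as printed earlier; nothing here calls MXNet.

```python
import math

# Entries of X as shown above (a 3x4 matrix holding 0..11).
rows = [
    [0, 1, 2, 3],
    [4, 5, 6, 7],
    [8, 9, 10, 11],
]

# Sum of squared entries, then the square root.
sum_of_squares = sum(v * v for row in rows for v in row)
l2_norm = math.sqrt(sum_of_squares)

print(sum_of_squares)     # 506
print(round(l2_norm, 4))  # 22.4944
```

The small difference in the last digits compared with `22.494442` comes from MXNet using 32-bit floats by default.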

### Broadcast 

In [31]:
A = nd.arange(3).reshape((3,1))
B = nd.arange(2).reshape((1,2))
A,B

(
 [[0.]
  [1.]
  [2.]]
 <NDArray 3x1 @cpu(0)>,
 
 [[0. 1.]]
 <NDArray 1x2 @cpu(0)>)

In [32]:
A+B


[[0. 1.]
 [1. 2.]
 [2. 3.]]
<NDArray 3x2 @cpu(0)>
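
The result above follows the usual broadcasting rule: an axis of length 1 is stretched to match the other operand. The sketch below assumes the NumPy-style rule and uses nested lists instead of NDArrays; `broadcast_shape` and `broadcast_add_2d` are hypothetical helper names used only for illustration.

```python
from itertools import zip_longest

def broadcast_shape(shape_a, shape_b):
    """Return the broadcast result shape, aligning axes from the right."""
    result = []
    for da, db in zip_longest(reversed(shape_a), reversed(shape_b), fillvalue=1):
        if da == db or db == 1:
            result.append(da)
        elif da == 1:
            result.append(db)
        else:
            raise ValueError(f"incompatible dimensions {da} and {db}")
    return tuple(reversed(result))

def broadcast_add_2d(a, b):
    """Add two 2-D nested lists, repeating any axis of length 1."""
    rows, cols = broadcast_shape((len(a), len(a[0])), (len(b), len(b[0])))

    def pick(m, i, j):
        # An axis of length 1 always reads index 0, which "stretches" it.
        return m[i if len(m) > 1 else 0][j if len(m[0]) > 1 else 0]

    return [[pick(a, i, j) + pick(b, i, j) for j in range(cols)]
            for i in range(rows)]

A = [[0.0], [1.0], [2.0]]  # shape (3, 1)
B = [[0.0, 1.0]]           # shape (1, 2)

print(broadcast_shape((3, 1), (1, 2)))  # (3, 2)
print(broadcast_add_2d(A, B))           # [[0.0, 1.0], [1.0, 2.0], [2.0, 3.0]]
```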

### Indexing

In [33]:
X


[[ 0.  1.  2.  3.]
 [ 4.  5.  6.  7.]
 [ 8.  9. 10. 11.]]
<NDArray 3x4 @cpu(0)>

In [34]:
X[1:3]


[[ 4.  5.  6.  7.]
 [ 8.  9. 10. 11.]]
<NDArray 2x4 @cpu(0)>

In [35]:
X[1,2]


[6.]
<NDArray 1 @cpu(0)>

In [36]:
X[1:2,:] = 12
X


[[ 0.  1.  2.  3.]
 [12. 12. 12. 12.]
 [ 8.  9. 10. 11.]]
<NDArray 3x4 @cpu(0)>

### Memory cost

**The result of an operation is stored in newly allocated memory: `Y = Y + X` rebinds `Y` to a new array**

In [37]:
before = id(Y)
Y = Y + X
id(Y) == before

False

**Slice assignment `Z[:] = ...` writes the result into the memory that `Z` already owns**

In [38]:
Z = Y.zeros_like()
before = id(Z)
Z[:] = X + Y
id(Z) == before

True

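**`Z[:] = X + Y` still allocates a temporary buffer for `X + Y` before copying it into `Z`; passing `out` writes the result directly and avoids that temporary**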

In [39]:
nd.elemwise_add(X,Y,out = Z)
id(Z) == before

True

In [41]:
before = id(X)
X += Y
id(X) == before

True
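
The three patterns in this section (rebinding, slice assignment, and writing element by element) have a close analogue in plain Python lists. The sketch below is only an analogy under that assumption; it does not measure MXNet memory, and the variable names are illustrative.

```python
x = [0.0, 1.0, 2.0]
y = [2.0, 1.0, 4.0]

# 1. Rebinding: the sum is a brand-new list and the name `y` now points to it.
before = id(y)
y = [a + b for a, b in zip(x, y)]
rebinding_kept_object = id(y) == before

# 2. Slice assignment: `z` keeps its identity, but the right-hand side is
#    still built as a temporary list before being copied into `z`.
z = [0.0] * len(x)
before = id(z)
z[:] = [a + b for a, b in zip(x, y)]
slice_kept_object = id(z) == before

# 3. Writing each element directly: no temporary list is created at all.
for i in range(len(z)):
    z[i] = x[i] + y[i]
loop_kept_object = id(z) == before

print(rebinding_kept_object)  # False
print(slice_kept_object)      # True
print(loop_kept_object)       # True
print(z)                      # [2.0, 3.0, 8.0]
```

Pattern 3 plays the role of `nd.elemwise_add(X, Y, out=Z)`: the destination is fixed up front and results land there without an intermediate copy.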

### Transform between `NDArray` and `NumPy`

**`NumPy` to `NDArray`**

In [42]:
import numpy as np
P = np.ones((2,3))
D = nd.array(P)
D


[[1. 1. 1.]
 [1. 1. 1.]]
<NDArray 2x3 @cpu(0)>

**`NDArray` to `NumPy`**

In [43]:
D.asnumpy()

array([[1., 1., 1.],
       [1., 1., 1.]], dtype=float32)

## Calculate gradient

### Quick example
**Calculate the gradient of $y=2x^Tx$ with respect to the column vector $x$**

In [44]:
from mxnet import autograd,nd
x = nd.arange(4).reshape((4,1))
x


[[0.]
 [1.]
 [2.]
 [3.]]
<NDArray 4x1 @cpu(0)>

**Call `attach_grad()` to allocate the memory that will hold the gradient of `x`**

In [45]:
x.attach_grad()

In [46]:
with autograd.record():
    y = 2 * nd.dot(x.T,x)

**Call `backward()` to compute the gradient. Note that if `y` is not a scalar, `MXNet` by default sums the elements of `y` into a scalar and computes the gradient of that sum**

In [47]:
y.backward()

**As expected, the gradient of `y` with respect to `x` is `4x`**

In [50]:
x.grad


[[ 0.]
 [ 4.]
 [ 8.]
 [12.]]
<NDArray 4x1 @cpu(0)>
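
A short derivation, using only the definition of $y$ above, shows where these values come from. Writing $x = (x_1, \dots, x_4)^T$:

$$
y = 2x^Tx = 2\sum_{i=1}^{4} x_i^2, \qquad \frac{\partial y}{\partial x_i} = 4x_i, \qquad \nabla_x y = 4x .
$$

For $x = (0, 1, 2, 3)^T$ this gives $(0, 4, 8, 12)^T$, which matches the `x.grad` output.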

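### Training and Testing mode

**By default `MXNet` runs in prediction mode; inside `autograd.record()` it switches to training mode**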

In [51]:
print(autograd.is_training())

False


In [53]:
with autograd.record():
    print(autograd.is_training())

True


### get gradient of control flows

In [57]:
def f(a):
    b = a * 2
    while b.norm().asscalar() < 1000:
        b = b*2
    if b.sum().asscalar() > 0:
        c = b
    else:
        c = 100 * b
    return c

f(nd.array([1]))


[1024.]
<NDArray 1 @cpu(0)>

In [58]:
a = nd.random.normal(shape = 1)
a.attach_grad()
with autograd.record():
    c = f(a)
c.backward()

In [59]:
a.grad


[204800.]
<NDArray 1 @cpu(0)>

In [60]:
c / a


[204800.]
<NDArray 1 @cpu(0)>
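
Why does `a.grad` equal `c / a`? For a given input, `f` only multiplies `a` by constants (a power of 2, then possibly 100), so `c = k * a` for some `k` that depends on which branches ran, and the gradient is `k = c / a`. The sketch below checks this numerically with plain Python floats standing in for one-element NDArrays. `numeric_grad` is a hypothetical helper, and the input `-0.75` is an arbitrary illustrative value, not the random value drawn in the notebook.

```python
import math

def f(a):
    """Scalar version of the notebook's f; abs() plays the role of norm()."""
    b = a * 2
    while abs(b) < 1000:  # never terminates for a == 0, like the original
        b = b * 2
    return b if b > 0 else 100 * b

def numeric_grad(func, a, eps=1e-6):
    """Central finite-difference estimate of d func / d a (illustrative helper)."""
    return (func(a + eps) - func(a - eps)) / (2 * eps)

a = -0.75
c = f(a)

print(c)      # -153600.0
print(c / a)  # 204800.0
# The finite-difference estimate agrees with c / a up to rounding error.
print(math.isclose(numeric_grad(f, a), c / a, rel_tol=1e-6))  # True
```

Here the loop doubles `b` until it reaches `2048 * a`, and the `else` branch multiplies by 100, so `k = 204800`.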

## Summary
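* Use the `autograd` module to compute gradients: call `attach_grad()` on the input, record the computation inside `autograd.record()`, then call `backward()`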

In [61]:
a = nd.random.normal(shape = (3,4))
a.attach_grad()
with autograd.record():
    c = f(a)
c.backward()

In [65]:
a.grad


[[25600. 25600. 25600. 25600.]
 [25600. 25600. 25600. 25600.]
 [25600. 25600. 25600. 25600.]]
<NDArray 3x4 @cpu(0)>

In [72]:
a.norm().asscalar()

4.5939484

In [63]:
a.grad == c / a.sum()


[[0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]]
<NDArray 3x4 @cpu(0)>
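
One reading of the all-zero comparison above: for a matrix input, `f` still scales every element by the same constant `k`, so the gradient equals the elementwise ratio `c / a`, whereas `c / a.sum()` divides by a single scalar and gives a different value for every element. The plain-Python sketch below illustrates this with a flat list in place of the 3x4 NDArray; the input values are arbitrary, not the random ones from the notebook.

```python
import math

def f(values):
    """List version of the notebook's f: norm and sum are taken over all elements."""
    b = [2 * v for v in values]
    while math.sqrt(sum(v * v for v in b)) < 1000:
        b = [2 * v for v in b]
    scale = 1 if sum(b) > 0 else 100
    return [scale * v for v in b]

a = [0.5, -1.25, -2.0, 0.25]
c = f(a)

elementwise = [ci / ai for ci, ai in zip(c, a)]  # analogue of c / a
by_sum = [ci / sum(a) for ci in c]               # analogue of c / a.sum()

print(elementwise)  # [51200.0, 51200.0, 51200.0, 51200.0]
print(by_sum)       # [-10240.0, 25600.0, 40960.0, -5120.0]
```

Only the elementwise ratio is constant across elements, which is the behaviour the gradient `a.grad` shows in the notebook output.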