# NDArray Tutorial


One of the main objects in MXNet is the multidimensional array provided by the package `mxnet.ndarray`, or `mxnet.nd` for short. If you are familiar with the scientific computing Python package [NumPy](http://www.numpy.org/), `mxnet.ndarray` is similar to `numpy.ndarray` in many respects.

## The Basics

A multidimensional array is a table of numbers of the same type. For example, the coordinates of a point in 3D space, `[1, 2, 3]`, form a 1-dimensional array whose single dimension has a length of 3. The following example shows a 2-dimensional array: the first dimension has a length of 2, and the second dimension has a length of 3.
```
[[0, 1, 2]
 [3, 4, 5]]
```
The array class is called `NDArray`. Some important attributes of an `NDArray` object are:

- **ndarray.shape** the dimensions of the array. It is a tuple of integers indicating the length of the array in each dimension. For a matrix with `n` rows and `m` columns, the `shape` will be `(n, m)`.
- **ndarray.dtype** a `numpy` type object describing the type of the elements.
- **ndarray.size** the total number of elements in the array, which equals the product of the elements of `shape`.
- **ndarray.context** the device on which this array is stored. A device can be the CPU or the i-th GPU.
- **ndarray.handle** the pointer to the corresponding C++ object. Normally we won't need to use this attribute.
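
The relationship between `shape` and `size` can be checked without MXNet. The following is a minimal pure-Python sketch; the helper name `size_from_shape` is made up for this illustration and is not part of the MXNet API.

```python
from functools import reduce
import operator

def size_from_shape(shape):
    """Return the number of elements implied by a shape tuple (hypothetical helper)."""
    # The product over an empty shape is 1: a scalar holds one element.
    return reduce(operator.mul, shape, 1)

print(size_from_shape((1, 2)))  # 2
print(size_from_shape((2, 3)))  # 6
```

The shape `(1, 2)` is the one reported for `mx.nd.array([[2,3]])` in the example below, whose `size` is 2.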

### An example

In [1]:
import mxnet as mx
a = mx.nd.array([[2,3]])
{'shape': a.shape, 'size':a.size, 'data type':a.dtype, 'context':a.context, 'type':type(a)}

{'context': cpu(0),
 'data type': numpy.float32,
 'shape': (1L, 2L),
 'size': 2,
 'type': mxnet.ndarray.NDArray}

### Array Creation 
An array can be created in multiple ways. For example, we can create an array from a regular Python list or tuple by using the `array` function:

In [2]:
a = mx.nd.array([1,2,3])  # create a 1-dimensional array with a python list
b = mx.nd.array([[1,2,3], [2,3,4]])  # create a 2-dimensional array with a nested Python list

or even from a `numpy.ndarray` object:

In [3]:
import numpy as np
c = np.arange(15).reshape(3,5)
a = mx.nd.array(c)  # create a 2-dimensional array from a numpy.ndarray object

We can specify the element type with the option `dtype`, which accepts a NumPy type. By default, `float32` is used.

In [4]:
a = mx.nd.array([1,2,3])  # float32 is used by default
b = mx.nd.array([1,2,3], dtype=np.int32)  # create an int32 array
c = mx.nd.array([1.2, 2.3], dtype=np.float16)  # create a 16-bit float array
(a.dtype, b.dtype, c.dtype)

(numpy.float32, numpy.int32, numpy.float16)

If we only know the size but not the element values, there are several functions to create arrays with initial placeholder content:

In [5]:
a = mx.nd.zeros((2,3))    # create a 2-dimensional array full of zeros with shape (2,3)  
b = mx.nd.ones((2,3))     # create an array of the same shape full of ones
c = mx.nd.full((2,3), 7)  # create an array of the same shape with all elements set to 7
d = mx.nd.empty((2,3))    # create an array of the same shape whose initial content is arbitrary and depends on the state of the memory

### Printing Arrays
We usually convert an `NDArray` to a `numpy.ndarray` with the method `asnumpy` before printing, since printing the `NDArray` itself may not show its elements. NumPy uses the following layout:
- the last axis is printed from left to right,
- the second-to-last is printed from top to bottom,
- the rest are also printed from top to bottom, with each slice separated from the next by an empty line.

In [6]:
b = mx.nd.ones((2,3))
print(b.asnumpy())
c = mx.nd.zeros((1000,1000))
print(c.asnumpy())

[[ 1.  1.  1.]
 [ 1.  1.  1.]]
[[ 0.  0.  0. ...,  0.  0.  0.]
 [ 0.  0.  0. ...,  0.  0.  0.]
 [ 0.  0.  0. ...,  0.  0.  0.]
 ..., 
 [ 0.  0.  0. ...,  0.  0.  0.]
 [ 0.  0.  0. ...,  0.  0.  0.]
 [ 0.  0.  0. ...,  0.  0.  0.]]


### Copies
Data is *NOT* copied by normal assignment or when an array is passed as a function argument. In Python, `=` only binds another name to the same object; to copy the data we must call a method.

In [7]:
a = mx.nd.ones((2,2))
b = a  # copy by reference 
print(b is a)
def f(x):  # also copy by reference
    print(id(x))
f(a)
print(id(a))

True
140484456318800
140484456318800


The `copy` method makes a deep copy of the array and its data.

In [8]:
b = a.copy()
print (b is a)

False


We can also use the `copyto` method or the slice operator `[]` to avoid additional memory allocation.

In [9]:
b = mx.nd.ones(a.shape)
print(id(b))
b[:] = a
print(id(b))
a.copyto(b)
print(id(b))

140484456317456
140484456317456
140484456317456


### Basic Operations
Arithmetic operators on arrays apply *elementwise*. A new array is created and filled with the result.

In [10]:
a = mx.nd.ones((2,3))
b = mx.nd.ones((2,3))
c = a + b  # elementwise addition
d = - c    # elementwise negation
print(d.asnumpy())
e = mx.nd.sin(c**2).T  # elementwise pow and sin, and then transpose
print(e.asnumpy())
f = mx.nd.maximum(a, c)  # elementwise max
print(f.asnumpy())

[[-2. -2. -2.]
 [-2. -2. -2.]]
[[-0.7568025 -0.7568025]
 [-0.7568025 -0.7568025]
 [-0.7568025 -0.7568025]]
[[ 2.  2.  2.]
 [ 2.  2.  2.]]


Similar to NumPy, `*` is used for elementwise multiplication, while matrix-matrix multiplication is left to `dot`.

In [11]:
a = mx.nd.ones((2,2))
b = a * a
c = mx.nd.dot(a,a)
print(b.asnumpy())
print(c.asnumpy())

[[ 1.  1.]
 [ 1.  1.]]
[[ 2.  2.]
 [ 2.  2.]]


The assignment operators such as `+=` and `*=` act in place to modify an existing array rather than create a new one.

In [12]:
a = mx.nd.ones((2,2))
b = mx.nd.ones(a.shape)
print(id(b))
b += a
print(id(b))
print(b.asnumpy())

140484456318736
140484456318736
[[ 2.  2.]
 [ 2.  2.]]


### Indexing and Slicing
The slice operator `[]` applies to axis 0.

In [13]:
a = mx.nd.array(np.arange(6).reshape(3,2))
print(a[:].asnumpy())
a[1:2] = 1
print(a.asnumpy())

[[ 0.  1.]
 [ 2.  3.]
 [ 4.  5.]]
[[ 0.  1.]
 [ 1.  1.]
 [ 4.  5.]]


We can also slice a particular axis with the method `slice_axis`:

In [14]:
d = mx.nd.slice_axis(a, axis=1, begin=1, end=2)
print(d.asnumpy())

[[ 1.]
 [ 1.]
 [ 5.]]


### Shape Manipulation 
The shape of the array can be changed as long as the size remains the same.

In [15]:
a = mx.nd.array(np.arange(24).reshape(4,6))
print(a.asnumpy())
b = a.reshape((2,3,4))  # the number of dimensions can change as well
print(b.asnumpy())

[[  0.   1.   2.   3.   4.   5.]
 [  6.   7.   8.   9.  10.  11.]
 [ 12.  13.  14.  15.  16.  17.]
 [ 18.  19.  20.  21.  22.  23.]]
[[[  0.   1.   2.   3.]
  [  4.   5.   6.   7.]
  [  8.   9.  10.  11.]]

 [[ 12.  13.  14.  15.]
  [ 16.  17.  18.  19.]
  [ 20.  21.  22.  23.]]]
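
The output above appears consistent with row-major ordering: the elements keep their flattened order and are only regrouped. The following pure-Python sketch on nested lists illustrates that idea under this assumption. It does not call MXNet, and `reshape_to_3d` is a hypothetical helper name.

```python
def reshape_to_3d(rows, shape):
    """Reshape a 2-dimensional nested list to 3 dimensions (hypothetical helper)."""
    flat = [x for row in rows for x in row]  # flatten in row-major order
    d0, d1, d2 = shape
    if d0 * d1 * d2 != len(flat):
        raise ValueError("the total size must remain the same")
    # Element (i, j, k) of the result is flat[(i * d1 + j) * d2 + k].
    return [[flat[(i * d1 + j) * d2:(i * d1 + j + 1) * d2] for j in range(d1)]
            for i in range(d0)]

a = [list(range(r * 6, r * 6 + 6)) for r in range(4)]  # 4x6, values 0..23
b = reshape_to_3d(a, (2, 3, 4))
print(b[0][0])  # [0, 1, 2, 3]
print(b[1][2])  # [20, 21, 22, 23]
```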


The method `concatenate` stacks multiple arrays along the first dimension. (Their shapes must be the same.)

In [16]:
a = mx.nd.ones((2,3))
b = mx.nd.ones((2,3))*2
c = mx.nd.concatenate([a,b])
print(c.asnumpy())

[[ 1.  1.  1.]
 [ 1.  1.  1.]
 [ 2.  2.  2.]
 [ 2.  2.  2.]]


### Reduce

We can reduce the array to a scalar, or reduce it along a particular axis.

In [17]:
a = mx.nd.ones((2,3))
b = mx.nd.sum(a)  # sum over all elements
print(b.asnumpy())
c = mx.nd.sum_axis(a, axis=1)  # sum over axis 1
print(c.asnumpy())

[ 6.]
[ 3.  3.]
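
To make the two reductions above concrete, here is a pure-Python sketch on nested lists. It does not use MXNet; `sum_all` and `sum_axis1` are hypothetical helper names chosen for this illustration.

```python
def sum_all(rows):
    """Sum every element of a 2-dimensional nested list."""
    return sum(sum(row) for row in rows)

def sum_axis1(rows):
    """Sum along axis 1: collapse each row into a single number."""
    return [sum(row) for row in rows]

a = [[1.0, 1.0, 1.0],
     [1.0, 1.0, 1.0]]  # same values as mx.nd.ones((2,3))
print(sum_all(a))    # 6.0
print(sum_axis1(a))  # [3.0, 3.0]
```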


### Broadcast
We can also broadcast an array by duplicating its values along an axis of length 1.

In [18]:
a = mx.nd.array(np.arange(6).reshape(6,1))
b = a.broadcast_to((6,2))  # broadcast along axis 1
print(b.asnumpy())
c = a.reshape((2,1,1,3))
d = c.broadcast_to((2,2,2,3))  # broadcast along axes 1 and 2.
print(d.asnumpy())

[[ 0.  0.]
 [ 1.  1.]
 [ 2.  2.]
 [ 3.  3.]
 [ 4.  4.]
 [ 5.  5.]]
[[[[ 0.  1.  2.]
   [ 0.  1.  2.]]

  [[ 0.  1.  2.]
   [ 0.  1.  2.]]]


 [[[ 3.  4.  5.]
   [ 3.  4.  5.]]

  [[ 3.  4.  5.]
   [ 3.  4.  5.]]]]


Broadcasting can also be applied to operations such as `*` and `+`.

In [19]:
a = mx.nd.ones((3,1))
b = mx.nd.ones((1,2))
c = a + b  # when the shapes differ, they are broadcast automatically
print(c.asnumpy())

[[ 2.  2.]
 [ 2.  2.]
 [ 2.  2.]]
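
The result shape in the example above can be reasoned about dimension by dimension. The sketch below assumes a NumPy-style rule for shapes of equal rank (each pair of lengths must match, or one of them must be 1). It is an illustration in plain Python rather than MXNet's implementation, and `broadcast_shape` is a hypothetical helper name.

```python
def broadcast_shape(shape_a, shape_b):
    """Return the shape produced by broadcasting two shapes of equal rank."""
    if len(shape_a) != len(shape_b):
        raise ValueError("this sketch only handles shapes of equal rank")
    result = []
    for m, n in zip(shape_a, shape_b):
        if m == n or n == 1:
            result.append(m)   # lengths agree, or the second one is stretched
        elif m == 1:
            result.append(n)   # the first one is stretched
        else:
            raise ValueError("cannot broadcast %r with %r" % (shape_a, shape_b))
    return tuple(result)

print(broadcast_shape((3, 1), (1, 2)))  # (3, 2)
```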


## Advanced Topics
There are some advanced features in `mxnet.ndarray` that make MXNet different from other libraries.

### GPU Support

By default, operators are executed on the CPU. It is easy to switch to another computation resource, such as a GPU, if one is available. The device information is stored in `ndarray.context`. When MXNet is compiled with the flag `USE_CUDA=1` and there is at least one Nvidia GPU card, we can make all computations run on GPU 0 by using the context `mx.gpu(0)`, or simply `mx.gpu()`. If there is more than one GPU, the second GPU is represented by `mx.gpu(1)`.

In [20]:
def f():
    a = mx.nd.ones((100,100))
    b = mx.nd.ones((100,100))
    c = a + b
    print('running on ', c.context)
    print(c.asnumpy())
f()  # by default mx.cpu() is used
with mx.Context(mx.gpu()):  # change the default context to the first GPU for everything inside this block
    f()


('running on ', cpu(0))
[[ 2.  2.  2. ...,  2.  2.  2.]
 [ 2.  2.  2. ...,  2.  2.  2.]
 [ 2.  2.  2. ...,  2.  2.  2.]
 ..., 
 [ 2.  2.  2. ...,  2.  2.  2.]
 [ 2.  2.  2. ...,  2.  2.  2.]
 [ 2.  2.  2. ...,  2.  2.  2.]]
('running on ', gpu(0))
[[ 2.  2.  2. ...,  2.  2.  2.]
 [ 2.  2.  2. ...,  2.  2.  2.]
 [ 2.  2.  2. ...,  2.  2.  2.]
 ..., 
 [ 2.  2.  2. ...,  2.  2.  2.]
 [ 2.  2.  2. ...,  2.  2.  2.]
 [ 2.  2.  2. ...,  2.  2.  2.]]


We can also explicitly specify the context when creating an array:

In [21]:
a = mx.nd.ones((100, 100), mx.gpu(0))  # specify the context explicitly for each array
b = mx.nd.ones((100, 100), mx.gpu(0))
c = a + b
print(c.context)

gpu(0)


Currently MXNet requires two arrays to sit on the same device for computation. There are several methods for copying data between devices.

In [22]:
a = mx.nd.ones((100,100), mx.cpu())
b = mx.nd.ones((100,100), mx.gpu())
c = mx.nd.ones((100,100), mx.gpu())
a.copyto(c)  # copy from CPU to GPU
d = b + c
print(d.context)
e = b.as_in_context(c.context) + c  # same as above
print(e.context)

gpu(0)
gpu(0)


### Serialize From/To (Distributed) Filesystems
There are two ways to save data to (and load it from) disk easily. The first way uses `pickle`; `NDArray` is pickle-compatible.

In [23]:
import pickle as pkl
a = mx.nd.ones((2, 3))
# pack and then dump to disk
data = pkl.dumps(a)
with open('tmp.pickle', 'wb') as f:
    pkl.dump(data, f)
# load from disk and then unpack
with open('tmp.pickle', 'rb') as f:
    data = pkl.load(f)
b = pkl.loads(data)
print(b.asnumpy())

[[ 1.  1.  1.]
 [ 1.  1.  1.]]


The second way is to dump directly to disk in binary format by using the methods `save` and `load`.

In [24]:
# load and save a list
a = mx.nd.ones((2,3))
b = mx.nd.ones((2,3))*2               
mx.nd.save("temp.ndarray", [a,b])
c = mx.nd.load("temp.ndarray")
print(c[0].asnumpy())
print(c[1].asnumpy())

[[ 1.  1.  1.]
 [ 1.  1.  1.]]
[[ 2.  2.  2.]
 [ 2.  2.  2.]]


In [25]:
# load and save a dict
mx.nd.save("temp.ndarray", {'a':a, 'b':b})  # save in dict form
c = mx.nd.load("temp.ndarray")
print(c['a'].asnumpy())
print(c['b'].asnumpy())

[[ 1.  1.  1.]
 [ 1.  1.  1.]]
[[ 2.  2.  2.]
 [ 2.  2.  2.]]


Using `save` and `load` is better than pickle in two respects:
1. The data saved with the Python interface can be used by another language binding. For example, if we save the data in Python:
```python
a = mx.nd.ones((2, 3))
mx.nd.save("temp.ndarray", [a,])
```
then we can load it into R:
```R
a <- mx.nd.load("temp.ndarray")
as.array(a[[1]])
##      [,1] [,2] [,3]
## [1,]    1    1    1
## [2,]    1    1    1
```
2. If a distributed filesystem such as Amazon S3 or Hadoop HDFS is set up, we can directly save to and load from it. 
```python
mx.nd.save('s3://mybucket/mydata.ndarray', [a,])  # if compiled with USE_S3=1
mx.nd.save('hdfs:///users/myname/mydata.bin', [a,])  # if compiled with USE_HDFS=1
```


### Lazy Evaluation and Auto Parallelization

MXNet uses lazy evaluation for better performance. When we run `a=b+1` in Python, the Python thread just pushes the operation into the backend engine and then returns. This optimization has two benefits:
1. The main Python thread can continue to execute other computations once the previous one is pushed. This is useful for frontend languages with heavy overheads.
2. It is easier for the backend engine to explore further optimizations, such as auto parallelization, which will be discussed shortly.

The backend engine is able to resolve the data dependencies and schedule the computations correctly. This is transparent to frontend users. We can explicitly call the method `wait_to_read` on the result array to wait until the computation has finished. Operations that copy data from an array to other packages, such as `asnumpy`, will implicitly call `wait_to_read`.

In [26]:
import time

def do(x, n):
    """push computation into the backend engine"""
    return [mx.nd.dot(x,x) for i in range(n)]
def wait(x):
    """wait until all results are available"""
    for y in x:
        y.wait_to_read()
        
tic = time.time()
a = mx.nd.ones((1000,1000))
b = do(a, 50)
print('time for all computations to be pushed into the backend engine: %f sec' % (time.time() - tic))
wait(b)
print('time for all computations to finish: %f sec' % (time.time() - tic))

time for all computations to be pushed into the backend engine: 0.002128 sec
time for all computations to finish: 0.820602 sec
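
The push-then-wait pattern can be imitated with Python's standard library. The sketch below is only an analogy and says nothing about how the MXNet engine is implemented: a thread pool stands in for the backend engine, `submit` for pushing an operation, and `result` for `wait_to_read`. The function `slow_square` is a made-up stand-in for an expensive operator.

```python
from concurrent.futures import ThreadPoolExecutor

def slow_square(x):
    """Stand-in for an expensive operator (hypothetical)."""
    total = 0
    for _ in range(100000):
        total += x * x
    return total // 100000

with ThreadPoolExecutor(max_workers=2) as engine:
    # submit() returns a Future right away, much like pushing an operation
    # into a backend engine returns before the result is ready.
    futures = [engine.submit(slow_square, i) for i in range(4)]
    # result() blocks until the value is available, which is the role
    # wait_to_read plays in the cell above.
    results = [f.result() for f in futures]

print(results)  # [0, 1, 4, 9]
```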


Besides analyzing data read and write dependencies, the backend engine is able to schedule computations with no dependency in parallel. For example, in the following code
```python
a = mx.nd.ones((2,3))
b = a + 1
c = a + 2
d = b * c
```
the second and third statements can be executed in parallel.
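
One way to see why those two statements are independent is to write down which variables each statement reads and group the statements into levels. The sketch below is a simplified static analysis in plain Python; a real engine schedules dynamically, and `parallel_levels` is a hypothetical helper, not an MXNet function.

```python
# Each variable maps to the variables it reads (taken from the snippet above).
deps = {"a": [], "b": ["a"], "c": ["a"], "d": ["b", "c"]}

def parallel_levels(deps):
    """Group nodes into levels; nodes in one level do not depend on each other."""
    level = {}

    def depth(node):
        # Level = length of the longest dependency chain leading to the node.
        if node not in level:
            level[node] = 1 + max((depth(p) for p in deps[node]), default=-1)
        return level[node]

    for node in deps:
        depth(node)
    grouped = {}
    for node, lv in level.items():
        grouped.setdefault(lv, []).append(node)
    return [sorted(grouped[lv]) for lv in sorted(grouped)]

print(parallel_levels(deps))  # [['a'], ['b', 'c'], ['d']]
```

Here `b` and `c` land in the same level, so nothing prevents them from running at the same time, while `d` has to wait for both.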

In [27]:
# run computation on CPU first, and then on GPU
n = 50
a = mx.nd.ones((1000,1000))
b = mx.nd.ones((2000,2000), mx.gpu())
tic = time.time()
c = do(a, n)
wait(c)
d = do(b, n)
wait(d)
print('time for all computations to finish: %f sec' % (time.time() - tic))


time for all computations to finish: 1.437338 sec


In [28]:
# the backend engine will try to parallelize the CPU and GPU computations.
tic = time.time()
c = do(a, n)
d = do(b, n)
wait(c)
wait(d)
print('improved parallelization: %f sec' % (time.time() - tic))


improved parallelization: 1.104402 sec


## Current Status

We try our best to keep the NDArray API the same as NumPy's, but it is not fully NumPy-compatible yet. Here we summarize some major differences, which we hope to fix soon. Contributions are welcome.

- Slicing and indexing.
    - NDArray can only slice one dimension at a time; namely, we cannot use `x[:, 1]` to slice both dimensions.
    - Only contiguous indices are supported; we cannot do `x[1:2:3]`.
    - Boolean indices, such as `x[y==1]`, are not supported.
- Lack of reduce functions such as `max`, `min`...

## Further Reading
- [NDArray API](http://mxnet.dmlc.ml/en/latest/packages/python/ndarray.html) Documentation for all NDArray methods.
- [MinPy](https://github.com/dmlc/minpy) An ongoing project that aims to be fully NumPy-compatible, with GPU and automatic differentiation support.