# linear algebra

Initially this was a type-along from the MXNet UC Berkeley course on the Apache MXNet YouTube channel.

I've since added examples from the MXNet docs for `mxnet.ndarray` and `mxnet.contrib`.

[docs](https://mxnet.incubator.apache.org/versions/1.8.0/api/python/docs/api/ndarray/ndarray.html#)

In [1]:
from mxnet import nd
import mxnet as mx
print ('mx version ', mx.__version__)

mx version  1.9.0


In MXNet, scalars are NDArrays with one element.

In [2]:
x = nd.array([3.0])
y = nd.array([2.0])

print('x + y = ', x + y)
print('x - y = ', x - y)
print('x * y = ', x * y)
print('x / y = ', x / y)
print('x ** y = ', x ** y)
print('x ** y = ', nd.power(x,y))

x + y =  
[5.]
<NDArray 1 @cpu(0)>
x - y =  
[1.]
<NDArray 1 @cpu(0)>
x * y =  
[6.]
<NDArray 1 @cpu(0)>
x / y =  
[1.5]
<NDArray 1 @cpu(0)>
x ** y =  
[9.]
<NDArray 1 @cpu(0)>
x ** y =  
[9.]
<NDArray 1 @cpu(0)>


We can convert a one-element NDArray to a Python float by calling its `asscalar` method.  NOTE: doing this often is a bad idea.  MXNet runs NDArray operations asynchronously, so `asscalar` has to wait until the result is ready before it can hand the value and process control back to Python.  It's not done in parallel.

In [3]:
print(x)
print(x.asscalar())


[3.]
<NDArray 1 @cpu(0)>
3.0


# Vectors
A vector such as [1, 3, 4, 2] is represented as a 1D NDArray.

In [4]:
x = nd.arange(5)
print('x = ', x)

x =  
[0. 1. 2. 3. 4.]
<NDArray 5 @cpu(0)>


In [5]:
x[3]


[3.]
<NDArray 1 @cpu(0)>

# Length, dimensionality and shape
The length of a vector is commonly called its *dimension*.  As with an ordinary Python list, we can access the length of an NDArray by calling Python's built-in `len()` function.

We can also access a vector's length via its `.shape` attribute.  The shape is a tuple that lists the array's size along each of its axes.


In [6]:
x.shape

(5,)

The word dimension is overloaded between number of axes and number of elements.  To avoid confusion, when we say 2D array or 3D array, we mean an array with 2 or 3 axes respectively.  But, if we say `n-dimensional` vector, we mean a vector of length `n`.
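
To make the two meanings concrete, here is a minimal plain-Python sketch on nested lists rather than NDArrays. `shape_of` is a hypothetical helper I made up for illustration; it is not part of MXNet.

```python
def shape_of(nested):
    """Return the shape of a regular nested list as a tuple, one entry per axis."""
    shape = []
    while isinstance(nested, list):
        shape.append(len(nested))
        nested = nested[0] if nested else None
    return tuple(shape)

vector = [0.0, 1.0, 2.0, 3.0, 4.0]            # a 5-dimensional vector: 1 axis, length 5
matrix = [[0.0, 1.0, 2.0], [3.0, 4.0, 5.0]]   # a 2D array: 2 axes

print(len(vector), shape_of(vector))             # length and shape of the vector
print(len(shape_of(matrix)), shape_of(matrix))   # number of axes and shape of the matrix
```

The number of axes is the length of the shape tuple, while the "dimension" of a vector is the single entry inside it.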

In [7]:
a = 2
x = nd.array([1,2,3])
y = nd.array([10,20,30])
print('a * x = ', a * x)
print('a * x + y = ', a * x + y)

a * x =  
[2. 4. 6.]
<NDArray 3 @cpu(0)>
a * x + y =  
[12. 24. 36.]
<NDArray 3 @cpu(0)>


# Matrices

Just as vectors generalize scalars from order 0 to order 1, matrices generalize vectors from 1D to 2D.  Matrices, which we'll typically denote with capital letters (A,B,C), are represented in code as arrays with 2 axes.  Visually, we can draw a matrix as a table, where each entry $a_{ij}$ belongs to the i-th row and j-th column.

$$
\begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1m}\\
a_{21} & a_{22} & \cdots & a_{2m}\\
\vdots & \vdots & \ddots & \vdots\\
a_{n1} & a_{n2} & \cdots & a_{nm}
\end{pmatrix}
$$

In [8]:
print(nd.arange(20))
A = nd.arange(20).reshape((2,10)) # NOTE, rows, columns
print('A = ', A)


[ 0.  1.  2.  3.  4.  5.  6.  7.  8.  9. 10. 11. 12. 13. 14. 15. 16. 17.
 18. 19.]
<NDArray 20 @cpu(0)>
A =  
[[ 0.  1.  2.  3.  4.  5.  6.  7.  8.  9.]
 [10. 11. 12. 13. 14. 15. 16. 17. 18. 19.]]
<NDArray 2x10 @cpu(0)>


We can access elements $a_{ij}$ by specifying row *i* and column *j*.  Leaving the column index out selects the whole row, so `A[1]` returns every element of row 1.

Selecting via `:` also takes all elements along the respective axis.

We can transpose the matrix with the `T` attribute.  That is, if $B=A^{T}$ then $b_{ij}=a_{ji}\,\, \forall{i,j}$

In [9]:
print('A[1] = ', A[1])

A[1] =  
[10. 11. 12. 13. 14. 15. 16. 17. 18. 19.]
<NDArray 10 @cpu(0)>


In [10]:
print('A[1,:] = ', A[1,:])

A[1,:] =  
[10. 11. 12. 13. 14. 15. 16. 17. 18. 19.]
<NDArray 10 @cpu(0)>


In [11]:
print('A.T = ', A.T)

A.T =  
[[ 0. 10.]
 [ 1. 11.]
 [ 2. 12.]
 [ 3. 13.]
 [ 4. 14.]
 [ 5. 15.]
 [ 6. 16.]
 [ 7. 17.]
 [ 8. 18.]
 [ 9. 19.]]
<NDArray 10x2 @cpu(0)>


## Subset a matrix

[docs for `slice`](https://mxnet.apache.org/versions/master/api/python/docs/api/legacy/ndarray/ndarray.html?highlight=mxnet%20ndarray%20arange#mxnet.ndarray.slice)

In [12]:
four_by_four = nd.arange(16).reshape(4,4)
print('four_by_four = ', four_by_four)

four_by_four =  
[[ 0.  1.  2.  3.]
 [ 4.  5.  6.  7.]
 [ 8.  9. 10. 11.]
 [12. 13. 14. 15.]]
<NDArray 4x4 @cpu(0)>


In [13]:
# A bare : selects everything along that axis.  With numbers on either side it is a range, where the end is exclusive.
two_by_two = four_by_four[0:2,1:3]
print('two_by_two = ', two_by_two)

two_by_two =  
[[1. 2.]
 [5. 6.]]
<NDArray 2x2 @cpu(0)>


In [14]:
# you can also create a sliced copy with slice()
another2x2 = four_by_four.slice(begin=(1,1), end=(3,3))
print('another2x2 = ', another2x2)

another2x2 =  
[[ 5.  6.]
 [ 9. 10.]]
<NDArray 2x2 @cpu(0)>


## indexing

```
[start : stop (exclusive) : step]
```

In [15]:
x = nd.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype='int32')
x


[0 1 2 3 4 5 6 7 8 9]
<NDArray 10 @cpu(0)>

In [16]:
# Select elements 1 up to (but not including) 7, using a step of 2
x[1:7:2]


[1 3 5]
<NDArray 3 @cpu(0)>

# negative indexing

In a slice `i:j:k`, negative i and j are interpreted as n + i and n + j, where n is the number of elements in the corresponding dimension.

A negative step k makes stepping go towards smaller indices.

In [17]:
x[-2:10]


[8 9]
<NDArray 2 @cpu(0)>

In [18]:
x[10:6:-1]


[9 8 7]
<NDArray 3 @cpu(0)>
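
The results in the cells above match what Python's own list slicing gives, so the `start:stop:step` rules can be checked on a plain list. This is only a sketch on a Python list (no MXNet involved), assuming the same ten integers as `x`.

```python
x = list(range(10))      # [0, 1, ..., 9], so n = 10
n = len(x)

print(x[1:7:2])          # start 1, stop 7 (exclusive), step 2 -> [1, 3, 5]
print(x[-2:10])          # -2 is read as n + (-2) = 8        -> [8, 9]
print(x[10:6:-1])        # negative step walks downwards      -> [9, 8, 7]

# A negative start is just shorthand for n + start.
assert x[-2:10] == x[n - 2:10]
```

Note that the stop index is never included, whichever direction the step goes.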

Another example using a matrix and NDArrays to assist with indexing

![img](imgs/ndarray_complex_index.png)

In [19]:
a = nd.arange(50).reshape(10, 5) # Array to be indexed
# b = [9 7 5 3 1], which means start with row 9 and step down by 2 to row 1
b = nd.arange(9, -1, -2) # Indexing array. Start at 9, step down by 2 each time until 1
print('a = ', a)
print('b (indexing) = ', b)
print('nd.arange(a.shape[1]) = ', nd.arange(a.shape[1]))
result = a[b, nd.arange(a.shape[1])]
print('result = ', result)

a =  
[[ 0.  1.  2.  3.  4.]
 [ 5.  6.  7.  8.  9.]
 [10. 11. 12. 13. 14.]
 [15. 16. 17. 18. 19.]
 [20. 21. 22. 23. 24.]
 [25. 26. 27. 28. 29.]
 [30. 31. 32. 33. 34.]
 [35. 36. 37. 38. 39.]
 [40. 41. 42. 43. 44.]
 [45. 46. 47. 48. 49.]]
<NDArray 10x5 @cpu(0)>
b (indexing) =  
[9. 7. 5. 3. 1.]
<NDArray 5 @cpu(0)>
nd.arange(a.shape[1]) =  
[0. 1. 2. 3. 4.]
<NDArray 5 @cpu(0)>
result =  
[45. 36. 27. 18.  9.]
<NDArray 5 @cpu(0)>
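
When two index arrays are passed like this, they are paired up position by position: element k of the result is `a[b[k], cols[k]]`. A minimal plain-Python sketch of that pairing, using nested lists with the same values as `a` (the names `row_idx` and `col_idx` are mine):

```python
rows, cols = 10, 5
# Same values as nd.arange(50).reshape(10, 5)
a = [[r * cols + c for c in range(cols)] for r in range(rows)]

row_idx = list(range(9, -1, -2))   # [9, 7, 5, 3, 1]
col_idx = list(range(cols))        # [0, 1, 2, 3, 4]

# result[k] = a[row_idx[k]][col_idx[k]]
result = [a[r][c] for r, c in zip(row_idx, col_idx)]
print(result)   # [45, 36, 27, 18, 9]
```

So the result walks up the anti-diagonal: row 9 column 0, row 7 column 1, and so on.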


# Expand dimensions

[here](https://mxnet.incubator.apache.org/versions/1.8.0/api/python/docs/api/ndarray/ndarray.html#mxnet.ndarray.expand_dims)

Prototype

```
mxnet.ndarray.expand_dims(data=None, axis=_Null, out=None, name=None, **kwargs)
```    

Example

Inserts a new axis of size 1 into the array shape.  For example, given `x` with shape (2,3,4), `expand_dims(x, axis=1)` will return a new array with shape (2,1,3,4).

In [20]:
X = nd.array([[0,1],[2,3]])
print(X)
X = X.reshape(1,2,2)
print(X)
X = X.expand_dims(axis=0)
print(X)


[[0. 1.]
 [2. 3.]]
<NDArray 2x2 @cpu(0)>

[[[0. 1.]
  [2. 3.]]]
<NDArray 1x2x2 @cpu(0)>

[[[[0. 1.]
   [2. 3.]]]]
<NDArray 1x1x2x2 @cpu(0)>
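
The effect on the shape can be sketched in plain Python. `expanded_shape` is a hypothetical helper that only computes the resulting shape tuple, and I assume a non-negative `axis`; it does not touch any data.

```python
def expanded_shape(shape, axis):
    """Shape after inserting a new axis of size 1 at position `axis` (axis >= 0)."""
    shape = list(shape)
    shape.insert(axis, 1)
    return tuple(shape)

print(expanded_shape((2, 3, 4), axis=1))   # (2, 1, 3, 4), the example from the docs
print(expanded_shape((1, 2, 2), axis=0))   # (1, 1, 2, 2), the cell above
```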


# shape_array

[here](https://mxnet.incubator.apache.org/versions/1.8.0/api/python/docs/api/ndarray/ndarray.html#mxnet.ndarray.shape_array)

Returns a 1D int64 array containing the shape of data

prototype

```
nd.shape_array(data=None, out=None, name=None, **kwargs)
```

Example
```
shape_array([[1,2,3,4], [5,6,7,8]]) = [2,4]
```

In [21]:
X = nd.array([[0,1],[2,3]])
print(X)
X = X.reshape(1,2,2)
print(X)
X = X.expand_dims(axis=0)
print(X)
X.shape_array()


[[0. 1.]
 [2. 3.]]
<NDArray 2x2 @cpu(0)>

[[[0. 1.]
  [2. 3.]]]
<NDArray 1x2x2 @cpu(0)>

[[[[0. 1.]
   [2. 3.]]]]
<NDArray 1x1x2x2 @cpu(0)>



[1 1 2 2]
<NDArray 4 @cpu(0)>

# Flatten an Array

In [22]:
# flatten() keeps the first axis and collapses all the others, so a 3D, 4D, ... array becomes 2D
X = nd.array([[0,1],[2,3]])
print(X)
X = X.reshape(1,2,2)
print(X)
print('X.flatten()',X.flatten())

Y = nd.array([[0,1],[2,3]])
print(Y)
print('Y.flatten()',Y.flatten())



[[0. 1.]
 [2. 3.]]
<NDArray 2x2 @cpu(0)>

[[[0. 1.]
  [2. 3.]]]
<NDArray 1x2x2 @cpu(0)>
X.flatten() 
[[0. 1. 2. 3.]]
<NDArray 1x4 @cpu(0)>

[[0. 1.]
 [2. 3.]]
<NDArray 2x2 @cpu(0)>
Y.flatten() 
[[0. 1.]
 [2. 3.]]
<NDArray 2x2 @cpu(0)>
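
This explains why `Y.flatten()` leaves the 2x2 array unchanged: it is already 2D. A small sketch of the shape rule as I understand it, with a hypothetical helper `flattened_shape` that works on shape tuples only:

```python
from functools import reduce
from operator import mul

def flattened_shape(shape):
    """Keep the first axis and collapse all remaining axes into one."""
    return (shape[0], reduce(mul, shape[1:], 1))

print(flattened_shape((1, 2, 2)))   # (1, 4), matches X.flatten() above
print(flattened_shape((2, 2)))      # (2, 2), already 2D so nothing changes
print(flattened_shape((2, 3, 4)))   # (2, 12)
```

To get a true 1D vector you would reshape instead, for example `X.reshape((-1,))`.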


# Choosing elements based on index

[docs](https://mxnet.apache.org/versions/1.8.0/api/python/docs/api/ndarray/ndarray.html#mxnet.ndarray.choose_element_0index)

Picks elements from an input array according to the input indices along the given axis.

prototype
```
ndarray.choose_element_0index(data=None, index=None, axis=_Null, keepdims=_Null, mode=_Null, out=None, name=None, **kwargs)
```

In [23]:
X = nd.array([[0,1,2],[3,4,5],[6,7,8]])
print('X.shape ', X.shape)
print(X)
# these are equivalent
#Y = nd.array([2,2,3]) # an out-of-range column (3 > max column 2) is clipped to the last column (default mode='clip')
Y = nd.array([2,2,2])



X.shape  (3, 3)

[[0. 1. 2.]
 [3. 4. 5.]
 [6. 7. 8.]]
<NDArray 3x3 @cpu(0)>


In [24]:
# this one makes sense: axis 1 indexes the columns
# for each row, pick the column specified by the index: col=2, col=2, col=2
# (with Y = [2,2,3] the nonexistent col=3 is clipped to the last column)
print(nd.choose_element_0index(X, Y)) # if axis is not specified, the last axis is used, which is axis=1 here
print(nd.choose_element_0index(X, Y, axis=1)) 


[2. 5. 8.]
<NDArray 3 @cpu(0)>

[2. 5. 8.]
<NDArray 3 @cpu(0)>


In [25]:
# For each column, pick row 2,2,2
print(nd.choose_element_0index(X, Y, axis=0)) 
# no axis=2
#print(nd.choose_element_0index(X, Y, axis=2)) 



[6. 7. 8.]
<NDArray 3 @cpu(0)>


In [26]:
X = nd.array([[0,1,2],[3,4,5],[6,7,8]])
print('X.shape ', X.shape)
print(X)

X.shape  (3, 3)

[[0. 1. 2.]
 [3. 4. 5.]
 [6. 7. 8.]]
<NDArray 3x3 @cpu(0)>


In [27]:
Y = nd.array([[1],[0],[2]])
print('Y.shape ', Y.shape)
print(Y)

Y.shape  (3, 1)

[[1.]
 [0.]
 [2.]]
<NDArray 3x1 @cpu(0)>


In [28]:
# for each row, pick col=1,col=0, col=2 and maintain the rows
nd.choose_element_0index(X, Y, axis=1, keepdims=True)


[[1.]
 [3.]
 [8.]]
<NDArray 3x1 @cpu(0)>

# pick is equivalent

[docs](https://mxnet.apache.org/versions/1.8.0/api/python/docs/api/ndarray/ndarray.html#mxnet.ndarray.pick)

In [29]:
X = nd.array([[0,1,2],[3,4,5],[6,7,8]])
print('X.shape ', X.shape)
print(X)

X.shape  (3, 3)

[[0. 1. 2.]
 [3. 4. 5.]
 [6. 7. 8.]]
<NDArray 3x3 @cpu(0)>


In [30]:
Y = nd.array([[1],[0],[2]])
print('Y.shape ', Y.shape)
print(Y)

Y.shape  (3, 1)

[[1.]
 [0.]
 [2.]]
<NDArray 3x1 @cpu(0)>


In [31]:
# picks elements with specified indices along axis 1 and dims are maintained
nd.pick(X, Y, axis=1, keepdims=True)


[[1.]
 [3.]
 [8.]]
<NDArray 3x1 @cpu(0)>
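
The prototype above has a `mode` parameter. As far as I can tell its default, `'clip'`, is why an out-of-range column index still returns the last column. Below is a plain-Python sketch of that behaviour for `axis=1` only; `pick_per_row` is a hypothetical helper, not MXNet's implementation.

```python
def pick_per_row(data, index, mode="clip"):
    """For each row, return the element in the column named by `index`.

    Out-of-range columns are clipped to the valid range ("clip")
    or wrapped around ("wrap").
    """
    picked = []
    for row, col in zip(data, index):
        col = int(col)
        if mode == "clip":
            col = max(0, min(col, len(row) - 1))
        else:  # "wrap"
            col = col % len(row)
        picked.append(row[col])
    return picked

X = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
print(pick_per_row(X, [2, 2, 2]))               # [2, 5, 8]
print(pick_per_row(X, [2, 2, 3]))               # [2, 5, 8]: column 3 is clipped to 2
print(pick_per_row(X, [2, 2, 3], mode="wrap"))  # [2, 5, 6]: column 3 wraps to 0
print(pick_per_row(X, [1, 0, 2]))               # [1, 3, 8]
```

With `keepdims=True` the same values come back as a 3x1 column instead of a flat vector.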

# Tensors
Just as vectors generalize scalars, and matrices generalize vectors, we can increase the number of axes further.  When working with images, the axes correspond to the height, width and the three RGB color channels.

In [32]:
X = nd.arange(24).reshape((2,3,4))  # two submatrices of three rows by four columns
print('X = ', X)
print('X.shape = ', X.shape)


X =  
[[[ 0.  1.  2.  3.]
  [ 4.  5.  6.  7.]
  [ 8.  9. 10. 11.]]

 [[12. 13. 14. 15.]
  [16. 17. 18. 19.]
  [20. 21. 22. 23.]]]
<NDArray 2x3x4 @cpu(0)>
X.shape =  (2, 3, 4)


# Basic properties of tensor arithmetic

Given two tensors `X` and `Y` with the same shape, $\alpha X + Y$ has the same shape (numerical mathematicians call this the `AXPY` operation).

In [33]:
a = 2
x = nd.ones(3)
y = nd.zeros(3)
print('x.shape = ', x.shape)
print('y.shape = ', y.shape)
print('(a * x).shape = ', (a * x).shape)
print('(a * x + y).shape = ', (a * x + y).shape)

x.shape =  (3,)
y.shape =  (3,)
(a * x).shape =  (3,)
(a * x + y).shape =  (3,)


# sums and means
In math we express sums using the $\sum$ symbol.  To express the sum of elements in a vector `u` of length `d`, we can write $\sum_{i=1}^{i=d}{u_{i}}$.  In code, we can just call `nd.sum()`.

In [34]:
x = nd.arange(12)
x = x.reshape((3,4))
print('x = ', x)
print('nd.sum(x) = ', nd.sum(x))

x =  
[[ 0.  1.  2.  3.]
 [ 4.  5.  6.  7.]
 [ 8.  9. 10. 11.]]
<NDArray 3x4 @cpu(0)>
nd.sum(x) =  
[66.]
<NDArray 1 @cpu(0)>


Sum along axis 0: this adds the rows together, giving one total per column

In [35]:
print('nd.sum(x,0) = ', nd.sum(x,0))
# or
print('nd.sum(x,0) = ', nd.sum(x,axis=0))


nd.sum(x,0) =  
[12. 15. 18. 21.]
<NDArray 4 @cpu(0)>
nd.sum(x,0) =  
[12. 15. 18. 21.]
<NDArray 4 @cpu(0)>


Sum along axis 1: this adds the columns together, giving one total per row

In [36]:
print('nd.sum(x,1) = ', nd.sum(x,axis=1))
# or
print('nd.sum(x,1) = ', nd.sum(x,1))


nd.sum(x,1) =  
[ 6. 22. 38.]
<NDArray 3 @cpu(0)>
nd.sum(x,1) =  
[ 6. 22. 38.]
<NDArray 3 @cpu(0)>


Sum along both axes, i.e. over every element

In [37]:
print('nd.sum(x,(0,1)) = ', nd.sum(x,(0,1)))


nd.sum(x,(0,1)) =  
[66.]
<NDArray 1 @cpu(0)>
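
Which axis gives which result is easy to mix up, so here is a plain-Python sketch on a nested list with the same values as `x`. It only illustrates the arithmetic and does not use MXNet.

```python
x = [[0, 1, 2, 3],
     [4, 5, 6, 7],
     [8, 9, 10, 11]]

# axis=0: add the rows together, leaving one sum per column
sum_axis0 = [sum(col) for col in zip(*x)]
# axis=1: add the columns together, leaving one sum per row
sum_axis1 = [sum(row) for row in x]
# both axes: a single total
total = sum(sum_axis1)

print(sum_axis0)  # [12, 15, 18, 21]
print(sum_axis1)  # [6, 22, 38]
print(total)      # 66
```

The axis you name is the one that disappears from the shape: summing a 3x4 array over axis 0 leaves 4 values, over axis 1 leaves 3.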


# Mean
A related quantity is the mean.  We calculate the mean by dividing the sum by the total number of elements.  In code this is `nd.mean()`

$$mean(u) = \frac{1}{d} \sum_{i=1}^{i=d} u_{i} $$

and

$$mean(A) = \frac{1}{n*m} \sum_{i=1}^{i=n} \sum_{j=1}^{j=m} a_{ij} $$

In [38]:
print('A = ', A)
print('nd.mean(A) = ', nd.mean(A))
print('nd.sum(A) / A.size = ', nd.sum(A) / A.size)

A =  
[[ 0.  1.  2.  3.  4.  5.  6.  7.  8.  9.]
 [10. 11. 12. 13. 14. 15. 16. 17. 18. 19.]]
<NDArray 2x10 @cpu(0)>
nd.mean(A) =  
[9.5]
<NDArray 1 @cpu(0)>
nd.sum(A) / A.size =  
[9.5]
<NDArray 1 @cpu(0)>


# Dot products

Given two vectors `u` and `v`, the dot product $u^Tv$ is a sum over the products of the corresponding elements: $$u^Tv = \sum_{i=1}^{i=d} u_i * v_i$$

In [39]:
x = nd.arange(4) + 1.0
y = nd.ones(4)
print('x = ', x)
print('y = ', y)
print('nd.dot(x,y) = ', nd.dot(x,y))
# also equivalent
print('(x * y).sum() = ', (x * y).sum())

x =  
[1. 2. 3. 4.]
<NDArray 4 @cpu(0)>
y =  
[1. 1. 1. 1.]
<NDArray 4 @cpu(0)>
nd.dot(x,y) =  
[10.]
<NDArray 1 @cpu(0)>
(x * y).sum() =  
[10.]
<NDArray 1 @cpu(0)>


Note that we can express the dot product of two vectors `nd.dot(u,v)` equivalently by performing an element-wise multiplication and then a sum:

In [40]:
nd.sum(x * y)


[10.]
<NDArray 1 @cpu(0)>

# Matrix-vector products

\begin{align}
A\mathbf{v} =
\begin{pmatrix}
a_{11} & a_{12} \\
a_{21} & a_{22}
\end{pmatrix}
\begin{pmatrix} x \\ y \end{pmatrix} = 
\begin{pmatrix}
a_1^T \mathbf{v} \\
a_2^T \mathbf{v}
\end{pmatrix} = 
\begin{pmatrix}
x a_{11} + y a_{12} \\
x a_{21} + y a_{22}
\end{pmatrix}
\end{align}

Here $a_i^T$ is the i-th row of $A$ and $\mathbf{v} = (x, y)^T$.

$\in$ means "is a member of".

$\mathbb{R}$ is the set of real numbers: all negative and positive numbers, including zero and fractions.


So you can think of multiplication by a matrix $A\in\mathbb{R}^{m \times n}$ as a transformation that maps vectors from $\mathbb{R}^n$ to $\mathbb{R}^m$.

We can also use matrix-vector products to describe the calculations of each layer in a neural network.  Expressing matrix-vector products in code with `ndarray`, we use the same `nd.dot()` function as for dot products.

In [41]:
print('A = ', A) # shape (2, 10)
print('x = ', x) # shape (4,)

# He did this.  It does not work for me: A has 20 elements and a (3,4) reshape
# only holds 12.  I looked in the online help text and nothing is mentioned about
# truncating rows, i.e. with 4 columns it needs 5 rows and not three as in his text.
# (Another reason to look for the git repo rather than typing along!)
#A = A.reshape((3,4))   

# Later he refers to this as 3x4.  He must have had a 2x6?

# My choice
A = A.reshape((5,4))
print('A now is = ', A)


# * is an element-by-element multiply.  It multiplies each row of A by x.
# first row is 1*0  2*1 3*2 4*3
# second row is 1*4 2*5 3*6 4*7 etc
print('A*x = ', A*x)

# dot multiplies each row of A by x and then sums across that row
print('nd.dot(A,x) = ', nd.dot(A,x))


# 
# He mentioned that MATLAB's .* notation is equivalent to nd *

# He also mentioned broadcasting: the entries of x are repeated until they cover all elements of A.


A =  
[[ 0.  1.  2.  3.  4.  5.  6.  7.  8.  9.]
 [10. 11. 12. 13. 14. 15. 16. 17. 18. 19.]]
<NDArray 2x10 @cpu(0)>
x =  
[1. 2. 3. 4.]
<NDArray 4 @cpu(0)>
A now is =  
[[ 0.  1.  2.  3.]
 [ 4.  5.  6.  7.]
 [ 8.  9. 10. 11.]
 [12. 13. 14. 15.]
 [16. 17. 18. 19.]]
<NDArray 5x4 @cpu(0)>
A*x =  
[[ 0.  2.  6. 12.]
 [ 4. 10. 18. 28.]
 [ 8. 18. 30. 44.]
 [12. 26. 42. 60.]
 [16. 34. 54. 76.]]
<NDArray 5x4 @cpu(0)>
nd.dot(A,x) =  
[ 20.  60. 100. 140. 180.]
<NDArray 5 @cpu(0)>
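
The relationship between `A*x` and `nd.dot(A,x)` in the cell above can be sketched in plain Python: the matrix-vector product is the broadcast element-wise product followed by a sum across each row. Nested lists stand in for the NDArrays here.

```python
# Same values as the 5x4 A above
A = [[r * 4 + c for c in range(4)] for r in range(5)]
x = [1, 2, 3, 4]

# Element-wise multiply with broadcasting: x is reused for every row of A.
elementwise = [[a * b for a, b in zip(row, x)] for row in A]

# Matrix-vector product: the same products, then summed across each row.
matvec = [sum(row) for row in elementwise]

print(elementwise[0])  # [0, 2, 6, 12]
print(matvec)          # [20, 60, 100, 140, 180]
```

A 5x4 matrix therefore turns a vector of length 4 into a vector of length 5.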


# Matrix-Matrix multiplication

If you have gotten the hang of dot products and matrix-vector multiplication, then matrix-matrix multiplication should be pretty straightforward.

Say we have two matrices, $A\in\mathbb{R}^{n \times k}$ and $B\in\mathbb{R}^{k \times m}$.

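\begin{align}
A=\begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1k} \\
a_{21} & a_{22} & \cdots & a_{2k} \\
\vdots & \vdots & \ddots & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nk}
\end{pmatrix},\quad
B=\begin{pmatrix}
b_{11} & b_{12} & \cdots & b_{1m} \\
b_{21} & b_{22} & \cdots & b_{2m} \\
\vdots & \vdots & \ddots & \vdots \\
b_{k1} & b_{k2} & \cdots & b_{km}
\end{pmatrix}
\end{align}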


You can think of the matrix-matrix multiplication $AB$ as simply performing `m` matrix-vector products, one per column of $B$, and stitching the results together. 

\begin{align}
AB=\begin{pmatrix}
\cdots & a_1^T & \cdots \\
\cdots & a_2^T & \cdots \\
 & \vdots & \\
\cdots & a_n^T & \cdots 
\end{pmatrix}
\begin{pmatrix}
\vdots & \vdots & & \vdots \\
b_{1} & b_{2} & \cdots & b_{m} \\
\vdots & \vdots & & \vdots 
\end{pmatrix}
=\begin{pmatrix}
a_1^Tb_1 & a_1^Tb_2 & \cdots & a_1^Tb_m \\
a_2^Tb_1 & a_2^Tb_2 & \cdots & a_2^Tb_m \\
\vdots & \vdots & \ddots & \vdots \\
a_n^Tb_1 & a_n^Tb_2 & \cdots & a_n^Tb_m
\end{pmatrix}
\end{align}

In [42]:
A=nd.arange(4).reshape((2,2))
print('A = ', A)
B=nd.arange(1, stop=5, step=1).reshape((2,2))
print('B = ', B)

# This works when the inner dimensions match (columns of A == rows of B)
nd.dot(A,B)

# If the dims do not match, I've seen the transpose operator used
# nd.dot(A,B.T)

A =  
[[0. 1.]
 [2. 3.]]
<NDArray 2x2 @cpu(0)>
B =  
[[1. 2.]
 [3. 4.]]
<NDArray 2x2 @cpu(0)>



[[ 3.  4.]
 [11. 16.]]
<NDArray 2x2 @cpu(0)>
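
The "m matrix-vector products stitched together" view can be written out directly. This is a plain-Python sketch on nested lists, with helper names (`matvec`, `matmul`) of my own choosing; it illustrates the definition and is not how MXNet computes `nd.dot`.

```python
def matvec(A, v):
    """Matrix-vector product: one dot product per row of A."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def matmul(A, B):
    """Compute AB one column at a time and stitch the columns together."""
    columns_of_B = list(zip(*B))                         # m columns, each of length k
    result_columns = [matvec(A, col) for col in columns_of_B]
    return [list(row) for row in zip(*result_columns)]   # turn columns back into rows

A = [[0, 1], [2, 3]]
B = [[1, 2], [3, 4]]
print(matmul(A, B))   # [[3, 4], [11, 16]]
```

Each column of the result is `A` applied to the matching column of `B`, which is why the number of columns of `A` must equal the number of rows of `B`.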

### Multiplication element wise versus normal matrix multiply

### element-wise multiplication

This is the equivalent of MATLAB's `.*` operator.

In [43]:
L=nd.array([[1,2],[3,4]])
#L=nd.array(nd.array([[1,1]]))
L.shape

(2, 2)

In [44]:
R=nd.array([[0,-1],[1,0]])
R


[[ 0. -1.]
 [ 1.  0.]]
<NDArray 2x2 @cpu(0)>

In [45]:
L*R


[[ 0. -2.]
 [ 3.  0.]]
<NDArray 2x2 @cpu(0)>

### Normal multiplication

In [46]:
nd.dot(L,R)


[[ 2. -1.]
 [ 4. -3.]]
<NDArray 2x2 @cpu(0)>

# Norms

All norms must satisfy a handful of properties:

* $  \left\Vert  \alpha A \right\Vert = |\alpha| \left\Vert A \right\Vert$
* $  \left\Vert A + B \right\Vert \leq \left\Vert A \right\Vert + \left\Vert B \right\Vert$
* $  \left\Vert A  \right\Vert \geq 0$
* $  \left\Vert A  \right\Vert = 0 \text{ if and only if } a_{ij} = 0 \;\forall\,i,j$

To calculate the $L_{2}$ norm, we can just call `nd.norm()`

In [47]:
nd.norm(x)


[5.477226]
<NDArray 1 @cpu(0)>

To calculate the $L_1$ norm we can simply take the absolute value of each element and then sum over the elements.

https://mxnet.apache.org/versions/1.6/api/r/docs/api/mx.symbol.sum.html

In [48]:
print("L1 norm = ", nd.sum(nd.abs(x)))
# or
print("L1 norm = ", x.abs().sum())


L1 norm =  
[10.]
<NDArray 1 @cpu(0)>
L1 norm =  
[10.]
<NDArray 1 @cpu(0)>
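
Both norms are short enough to write out by hand. The following is a plain-Python sketch on a list with the same values as `x`; the function names `l1_norm` and `l2_norm` are mine.

```python
import math

def l1_norm(v):
    """Sum of absolute values."""
    return sum(abs(e) for e in v)

def l2_norm(v):
    """Square root of the sum of squares."""
    return math.sqrt(sum(e * e for e in v))

x = [1.0, 2.0, 3.0, 4.0]
print(l1_norm(x))            # 10.0
print(round(l2_norm(x), 6))  # 5.477226

# With negative entries the absolute value matters: a plain sum would give 0 here.
print(l1_norm([-3.0, 3.0]))  # 6.0
```

For the all-positive `x` used above, a plain sum and the $L_1$ norm happen to agree, which can hide a missing `abs`.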


# SUM

Computes the sum of array elements over given axes.

[docs](https://mxnet.apache.org/versions/1.8.0/api/python/docs/api/ndarray/ndarray.html#mxnet.ndarray.sum)

prototype

```
sum(data=None, axis=_Null, keepdims=_Null, exclude=_Null, out=None, name=None, **kwargs)
```

In [49]:
X = nd.array([[0,1,2],[3,4,5],[6,7,8]])
print('X.shape ', X.shape)
print(X)

X.shape  (3, 3)

[[0. 1. 2.]
 [3. 4. 5.]
 [6. 7. 8.]]
<NDArray 3x3 @cpu(0)>


In [50]:
X.sum(axis=0)


[ 9. 12. 15.]
<NDArray 3 @cpu(0)>

In [51]:
X.sum(axis=1, keepdims=True)


[[ 3.]
 [12.]
 [21.]]
<NDArray 3x1 @cpu(0)>

# CUMSUM

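This is interesting.  It computes a running (cumulative) sum along the given axis.  For the output with axis=0 (down the rows):

* the first row is just the first row.
* the second row is the sum of the first and second rows.
* the third row is the sum of the first three rows, and so on.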

[docs](https://mxnet.apache.org/versions/1.8.0/api/python/docs/api/ndarray/ndarray.html#mxnet.ndarray.cumsum)

prototype

```
ndarray.cumsum(a=None, axis=_Null, dtype=_Null, out=None, name=None, **kwargs)
```

In [52]:
X = nd.array([[0,1,2],[3,4,5],[6,7,8]])
print('X.shape ', X.shape)
print(X)

X.shape  (3, 3)

[[0. 1. 2.]
 [3. 4. 5.]
 [6. 7. 8.]]
<NDArray 3x3 @cpu(0)>


In [53]:
nd.cumsum(X, axis=0)


[[ 0.  1.  2.]
 [ 3.  5.  7.]
 [ 9. 12. 15.]]
<NDArray 3x3 @cpu(0)>
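
The running sum can be reproduced with `itertools.accumulate` from the standard library. This is a plain-Python sketch on a nested list with the same values as `X`, showing both axes for comparison.

```python
from itertools import accumulate

X = [[0, 1, 2],
     [3, 4, 5],
     [6, 7, 8]]

# axis=0: running sum down each column
cols = [list(accumulate(col)) for col in zip(*X)]
cumsum_axis0 = [list(row) for row in zip(*cols)]

# axis=1: running sum along each row
cumsum_axis1 = [list(accumulate(row)) for row in X]

print(cumsum_axis0)  # [[0, 1, 2], [3, 5, 7], [9, 12, 15]]
print(cumsum_axis1)  # [[0, 1, 3], [3, 7, 12], [6, 13, 21]]
```

The last row of the axis=0 result equals `X.sum(axis=0)`, and unlike `sum` the output keeps the shape of the input.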

# where
Return the elements, either from x or y, depending on the condition.

[docs](https://mxnet.apache.org/versions/1.8.0/api/python/docs/api/ndarray/ndarray.html#mxnet.ndarray.where)

prototype

```
mxnet.ndarray.where(condition=None, x=None, y=None, out=None, name=None, **kwargs)
```

#### from api example

In [83]:
# official example 1
# x    = [[1, 2],
#         [3, 4]]
# y    = [[5, 6],
#         [7, 8]]
# cond = [[ 0, 1],
#         [-1, 0]]
#
#  0 is false, so pull y from the same position
#  1 is true, so pull x
# -1 is nonzero, hence true, so pull x
#  0 is false, so pull y
x = nd.array([[1, 2], [3, 4]])
y = nd.array([[5, 6], [7, 8]])
cond = nd.array([[0, 1], [-1, 0]])
# should return  = [[5, 2], [3, 8]]
nd.where(cond, x, y)


[[5. 2.]
 [3. 8.]]
<NDArray 2x2 @cpu(0)>

In [82]:
# official example 2
csr_cond = nd.cast_storage(cond, 'csr')
# should return  = [[5, 2], [3, 8]]
nd.where(csr_cond, x, y)


[[5. 2.]
 [3. 8.]]
<NDArray 2x2 @cpu(0)>
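
The selection rule is simple enough to sketch in plain Python: any nonzero condition counts as true and pulls from `x`, zero pulls from `y`. This hypothetical `where` works on 2D nested lists of equal shape only and is not MXNet's implementation.

```python
def where(cond, x, y):
    """Element-wise select for 2D nested lists of equal shape."""
    return [
        [xv if c != 0 else yv for c, xv, yv in zip(c_row, x_row, y_row)]
        for c_row, x_row, y_row in zip(cond, x, y)
    ]

x = [[1, 2], [3, 4]]
y = [[5, 6], [7, 8]]
cond = [[0, 1], [-1, 0]]
print(where(cond, x, y))   # [[5, 2], [3, 8]]
```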

In [54]:
X = nd.array([[1,2,0],[3,4,0],[6,7,0]])
print('X.shape ', X.shape)
print(X)

X.shape  (3, 3)

[[1. 2. 0.]
 [3. 4. 0.]
 [6. 7. 0.]]
<NDArray 3x3 @cpu(0)>


In [67]:
Y = nd.array([1, 0, 0])
print('Y.shape ', Y.shape)
print(Y)

Y.shape  (3,)

[1. 0. 0.]
<NDArray 3 @cpu(0)>


In [68]:
nd.where(condition=Y, x=X, y=X)


[[1. 2. 0.]
 [3. 4. 0.]
 [6. 7. 0.]]
<NDArray 3x3 @cpu(0)>

### From Thomas Delteil

[here](https://github.com/apache/incubator-mxnet/issues/8546#issuecomment-402563089)

In [66]:
labels = nd.array([-1, 0, 1, 0, 1])
cls_scores = nd.array([0.1, 0.2, 0.3, 0.4, 0.5])
#cls_scores = nd.where(labels != -1, cls_scores, labels)  
# Merges the x and y tensors element by element: pulls x where the condition is met, y where it is not.
cls_scores = nd.where(labels != -1, x=cls_scores, y=labels)  
cls_scores


[-1.   0.2  0.3  0.4  0.5]
<NDArray 5 @cpu(0)>

In [84]:
labels = nd.array([1, 0, 1, 0, 1])
cls_scores = nd.array([0.1, 0.2, 0.3, 0.4, 0.5])
#cls_scores = nd.where(labels == 0, cls_scores, labels)  
# Merges the x and y tensors element by element: pulls x where the condition is met, y where it is not.
cls_scores = nd.where(labels == 0, x=cls_scores, y=labels)  
cls_scores


[1.  0.2 1.  0.4 1. ]
<NDArray 5 @cpu(0)>