# **Chapter 4 - THE PRELIMINARIES: A CRASH COURSE**

## **4.1 Data Manipulation**

#### **4.1.1 Getting Started**

In [89]:
from mxnet import nd

- NDArrays represent (possibly multi-dimensional) arrays of numerical values. <br>
  NDArrays with one axis correspond (in math-speak) to vectors. NDArrays with two axes correspond to matrices. <br>
  For arrays with more than two axes, mathematicians do not have special names; they simply call them **_tensors_**.

In [90]:
x = nd.arange(12) 
x


[ 0.  1.  2.  3.  4.  5.  6.  7.  8.  9. 10. 11.]
<NDArray 12 @cpu(0)>

In [91]:
x.shape

(12,)

In [92]:
x.reshape((3, 4))


[[ 0.  1.  2.  3.]
 [ 4.  5.  6.  7.]
 [ 8.  9. 10. 11.]]
<NDArray 3x4 @cpu(0)>

- NDArray can infer one dimension automatically once the others are given. We invoke this capability by placing **_-1_** for the dimension that we would like NDArray to infer. <br>
  In our case, instead of x.reshape((3, 4)), we could have equivalently used x.reshape((-1, 4)) or x.reshape((3, -1)).
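The inference behind the **_-1_** placeholder is simple arithmetic: the missing dimension is the total number of elements divided by the product of the known dimensions. Below is a minimal sketch in plain Python; `infer_shape` is a hypothetical helper written for illustration, not an MXNet function, and it ignores any other special values that reshape may accept.

```python
from math import prod


def infer_shape(size, shape):
    """Resolve a single -1 entry in `shape` for an array holding `size` elements.

    Hypothetical helper for illustration only; not part of MXNet.
    """
    if list(shape).count(-1) > 1:
        raise ValueError("only one dimension can be inferred")
    # Product of the dimensions that were given explicitly.
    known = prod(d for d in shape if d != -1)
    if -1 in shape:
        if known == 0 or size % known != 0:
            raise ValueError(f"cannot reshape {size} elements into {shape}")
        return tuple(size // known if d == -1 else d for d in shape)
    if known != size:
        raise ValueError(f"cannot reshape {size} elements into {shape}")
    return tuple(shape)


print(infer_shape(12, (3, -1)))  # (3, 4)
print(infer_shape(12, (-1, 4)))  # (3, 4)
print(infer_shape(12, (6, -1)))  # (6, 2)
```

A shape such as (5, -1) raises an error here, because 12 elements cannot be split into 5 equal rows.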


In [93]:
x.reshape((3, -1))


[[ 0.  1.  2.  3.]
 [ 4.  5.  6.  7.]
 [ 8.  9. 10. 11.]]
<NDArray 3x4 @cpu(0)>

In [94]:
x.reshape((4, -1))


[[ 0.  1.  2.]
 [ 3.  4.  5.]
 [ 6.  7.  8.]
 [ 9. 10. 11.]]
<NDArray 4x3 @cpu(0)>

In [95]:
x.reshape((6, -1))


[[ 0.  1.]
 [ 2.  3.]
 [ 4.  5.]
 [ 6.  7.]
 [ 8.  9.]
 [10. 11.]]
<NDArray 6x2 @cpu(0)>

In [96]:
nd.empty((3, 4))


[[-4.1247744e-32  4.5664113e-41  1.0900574e+01  3.0930861e-41]
 [ 0.0000000e+00  0.0000000e+00  0.0000000e+00  0.0000000e+00]
 [ 0.0000000e+00  0.0000000e+00  0.0000000e+00  0.0000000e+00]]
<NDArray 3x4 @cpu(0)>

- The empty method just grabs some memory and hands us back a matrix without setting the values of any of its entries. <br>
  This is very efficient, but it means that the entries might take arbitrary values, **_including very big ones!_**

#### **4.1.2 Operations** 

In [97]:
x = nd.array([1, 2, 4, 8]) 
print(x)
y = nd.ones_like(x) * 2 
print(y)


[1. 2. 4. 8.]
<NDArray 4 @cpu(0)>

[2. 2. 2. 2.]
<NDArray 4 @cpu(0)>


In [98]:
print('x =', x) 
print('x + y', x + y) 
print('x - y', x - y) 
print('x * y', x * y) 
print('x ** y', x ** y) 
print('x / y', x / y)

x = 
[1. 2. 4. 8.]
<NDArray 4 @cpu(0)>
x + y 
[ 3.  4.  6. 10.]
<NDArray 4 @cpu(0)>
x - y 
[-1.  0.  2.  6.]
<NDArray 4 @cpu(0)>
x * y 
[ 2.  4.  8. 16.]
<NDArray 4 @cpu(0)>
x ** y 
[ 1.  4. 16. 64.]
<NDArray 4 @cpu(0)>
x / y 
[0.5 1.  2.  4. ]
<NDArray 4 @cpu(0)>


In [99]:
x.exp()


[2.7182817e+00 7.3890562e+00 5.4598148e+01 2.9809580e+03]
<NDArray 4 @cpu(0)>

In [100]:
x = nd.arange(12).reshape((3,4)) 
y = nd.array([[2, 1, 4, 3], [1, 2, 3, 4], [4, 3, 2, 1]]) 
print(x)
print(y)


[[ 0.  1.  2.  3.]
 [ 4.  5.  6.  7.]
 [ 8.  9. 10. 11.]]
<NDArray 3x4 @cpu(0)>

[[2. 1. 4. 3.]
 [1. 2. 3. 4.]
 [4. 3. 2. 1.]]
<NDArray 3x4 @cpu(0)>


In [101]:
nd.dot(x, y.T)


[[ 18.  20.  10.]
 [ 58.  60.  50.]
 [ 98. 100.  90.]]
<NDArray 3x3 @cpu(0)>

In [102]:
nd.concat(x, y, dim=0)


[[ 0.  1.  2.  3.]
 [ 4.  5.  6.  7.]
 [ 8.  9. 10. 11.]
 [ 2.  1.  4.  3.]
 [ 1.  2.  3.  4.]
 [ 4.  3.  2.  1.]]
<NDArray 6x4 @cpu(0)>

In [103]:
nd.concat(x, y, dim=1)


[[ 0.  1.  2.  3.  2.  1.  4.  3.]
 [ 4.  5.  6.  7.  1.  2.  3.  4.]
 [ 8.  9. 10. 11.  4.  3.  2.  1.]]
<NDArray 3x8 @cpu(0)>

- We can also merge multiple NDArrays. For that, we need to tell the system along which dimension to merge.<br>
  The two cells above merge two matrices along dimension 0 (stacking rows) and dimension 1 (extending columns), respectively.
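To make the role of the dimension argument concrete, here is a small sketch that does the same merge on nested Python lists. The `concat` function is a hypothetical stand-in written for this note and only handles matrices; it is not how `nd.concat` is implemented.

```python
def concat(a, b, dim):
    """Concatenate two matrices stored as lists of rows (illustrative helper)."""
    if dim == 0:
        # Stack the rows of b under the rows of a: column counts must agree.
        if any(len(row) != len(a[0]) for row in a + b):
            raise ValueError("dim=0 requires the same number of columns")
        return [list(row) for row in a + b]
    if dim == 1:
        # Append each row of b to the matching row of a: row counts must agree.
        if len(a) != len(b):
            raise ValueError("dim=1 requires the same number of rows")
        return [list(ra) + list(rb) for ra, rb in zip(a, b)]
    raise ValueError("dim must be 0 or 1 for matrices")


x = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]
y = [[2, 1, 4, 3], [1, 2, 3, 4], [4, 3, 2, 1]]

rows = concat(x, y, dim=0)
cols = concat(x, y, dim=1)
print(len(rows), len(rows[0]))  # 6 4
print(len(cols), len(cols[0]))  # 3 8
```

The resulting shapes, 6x4 and 3x8, match the NDArray outputs shown for `dim=0` and `dim=1`.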

In [104]:
x == y


[[0. 1. 0. 1.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]]
<NDArray 3x4 @cpu(0)>

In [105]:
x.sum()


[66.]
<NDArray 1 @cpu(0)>

In [106]:
x.norm().asscalar()

22.494442

#### **4.1.3 Broadcast Mechanism**

In [107]:
a = nd.arange(3).reshape((3, 1))
b = nd.arange(2).reshape((1, 2))
a, b

(
 [[0.]
  [1.]
  [2.]]
 <NDArray 3x1 @cpu(0)>, 
 [[0. 1.]]
 <NDArray 1x2 @cpu(0)>)

In [108]:
a + b


[[0. 1.]
 [1. 2.]
 [2. 3.]]
<NDArray 3x2 @cpu(0)>

In [109]:
c = nd.arange(12).reshape((3, 2, 2))
d = nd.arange(4).reshape((2, 2))
c, d

(
 [[[ 0.  1.]
   [ 2.  3.]]
 
  [[ 4.  5.]
   [ 6.  7.]]
 
  [[ 8.  9.]
   [10. 11.]]]
 <NDArray 3x2x2 @cpu(0)>, 
 [[0. 1.]
  [2. 3.]]
 <NDArray 2x2 @cpu(0)>)

In [110]:
c + d


[[[ 0.  2.]
  [ 4.  6.]]

 [[ 4.  6.]
  [ 8. 10.]]

 [[ 8. 10.]
  [12. 14.]]]
<NDArray 3x2x2 @cpu(0)>
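Both results above are consistent with the NumPy-style broadcasting rule: shapes are aligned from the last axis backwards, a missing axis counts as size 1, and an axis of size 1 is stretched to match the other operand. The sketch below computes only the resulting shape; `broadcast_shape` is a hypothetical helper, and the assumption that NDArray follows exactly this rule in every case is ours, inferred from the two examples.

```python
from itertools import zip_longest


def broadcast_shape(s1, s2):
    """Return the shape produced by broadcasting shapes s1 and s2 (illustrative)."""
    out = []
    # Walk both shapes from the trailing axis; pad the shorter one with 1s.
    for d1, d2 in zip_longest(reversed(s1), reversed(s2), fillvalue=1):
        if d1 != d2 and 1 not in (d1, d2):
            raise ValueError(f"shapes {s1} and {s2} cannot be broadcast")
        # An axis of size 1 takes the size of the other operand.
        out.append(d2 if d1 == 1 else d1)
    return tuple(reversed(out))


print(broadcast_shape((3, 1), (1, 2)))     # (3, 2)
print(broadcast_shape((3, 2, 2), (2, 2)))  # (3, 2, 2)
```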

#### **4.1.4 Indexing and Slicing**

In [35]:
x = nd.arange(12).reshape((3,4)) 
print(x)
print(x[1:3])


[[ 0.  1.  2.  3.]
 [ 4.  5.  6.  7.]
 [ 8.  9. 10. 11.]]
<NDArray 3x4 @cpu(0)>

[[ 4.  5.  6.  7.]
 [ 8.  9. 10. 11.]]
<NDArray 2x4 @cpu(0)>


In [36]:
x[1, 2] = 9
x


[[ 0.  1.  2.  3.]
 [ 4.  5.  9.  7.]
 [ 8.  9. 10. 11.]]
<NDArray 3x4 @cpu(0)>

In [37]:
x[0:2, :] = 12
x


[[12. 12. 12. 12.]
 [12. 12. 12. 12.]
 [ 8.  9. 10. 11.]]
<NDArray 3x4 @cpu(0)>

#### **4.1.5 Saving Memory**

In [39]:
y = nd.array([[2, 1, 4, 3], [1, 2, 3, 4], [4, 3, 2, 1]]) 
y


[[2. 1. 4. 3.]
 [1. 2. 3. 4.]
 [4. 3. 2. 1.]]
<NDArray 3x4 @cpu(0)>

In [50]:
before = id(y)
print(before)

y = y + x
print(y)
print(id(y))

id(y) == before

139959697573264

[[298. 293. 308. 303.]
 [293. 298. 303. 308.]
 [212. 231. 250. 269.]]
<NDArray 3x4 @cpu(0)>
139959697656848


False

In [44]:
z = y.zeros_like()
print('id(z):', id(z))
z[:] = x + y
print('id(z):', id(z))

id(z): 139962319609360
id(z): 139962319609360


In [45]:
before = id(z)
nd.elemwise_add(x, y, out=z)
id(z) == before

True

In [55]:
before = id(x)
x += y
id(x) == before

True
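The difference between `y = y + x` and `z[:] = x + y` comes down to rebinding a name versus writing into an existing buffer. The same distinction can be seen with ordinary Python lists, which stand in for NDArrays in this sketch (an analogy only; memory management inside MXNet is not modelled here).

```python
x = [1.0, 2.0, 3.0]
y = [10.0, 20.0, 30.0]

# Rebinding: the right-hand side builds a new list, then the name y points to it.
before = id(y)
y = [a + b for a, b in zip(x, y)]
rebinding_kept_buffer = id(y) == before

# Slice assignment: the values are written into the list that z already names.
z = [0.0] * len(x)
before = id(z)
z[:] = [a + b for a, b in zip(x, y)]
slice_assignment_kept_buffer = id(z) == before

print(rebinding_kept_buffer, slice_assignment_kept_buffer)  # False True
```

As in the notebook, the right-hand side of the slice assignment still builds a temporary result before copying it; the `out=` form shown above is the variant that avoids that temporary.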

#### **4.1.6 Mutual Transformation of NDArray and NumPy**

In [51]:
import numpy as np

a = x.asnumpy()
print(type(a))

b = nd.array(a)
print(type(b))

<class 'numpy.ndarray'>
<class 'mxnet.ndarray.ndarray.NDArray'>


____

## **4.2 Linear Algebra**

In [60]:
from mxnet import nd

#### **4.2.1 Scalars**

In [61]:
x = nd.array([3.0])
y = nd.array([2.0])

print('x + y = ', x + y)
print('x * y = ', x * y)
print('x / y = ', x / y)
print('x ** y = ', nd.power(x,y))

x + y =  
[5.]
<NDArray 1 @cpu(0)>
x * y =  
[6.]
<NDArray 1 @cpu(0)>
x / y =  
[1.5]
<NDArray 1 @cpu(0)>
x ** y =  
[9.]
<NDArray 1 @cpu(0)>


#### **4.2.2 Vectors**

In [62]:
x = nd.arange(4)
print('x = ', x)

x =  
[0. 1. 2. 3.]
<NDArray 4 @cpu(0)>


In [63]:
x[3]


[3.]
<NDArray 1 @cpu(0)>

#### **4.2.3 Length, dimensionality and shape**

In [64]:
x.shape

(4,)

In [65]:
a = 2
x = nd.array([1,2,3])
y = nd.array([10,20,30])

print(a * x)
print(a * x + y)


[2. 4. 6.]
<NDArray 3 @cpu(0)>

[12. 24. 36.]
<NDArray 3 @cpu(0)>


#### **4.2.4 Matrices**

In [66]:
A = nd.arange(20).reshape((5,4))
print(A)


[[ 0.  1.  2.  3.]
 [ 4.  5.  6.  7.]
 [ 8.  9. 10. 11.]
 [12. 13. 14. 15.]
 [16. 17. 18. 19.]]
<NDArray 5x4 @cpu(0)>


In [67]:
print(A.T)


[[ 0.  4.  8. 12. 16.]
 [ 1.  5.  9. 13. 17.]
 [ 2.  6. 10. 14. 18.]
 [ 3.  7. 11. 15. 19.]]
<NDArray 4x5 @cpu(0)>


#### **4.2.5 Tensors**

- Just as vectors generalize scalars, and matrices generalize vectors, we can actually build data structures with even more axes.<br> 
  **Tensors give us a generic way of discussing arrays with an arbitrary number of axes.**<br>
  Vectors, for example, are first-order tensors, and matrices are second-order tensors.

In [68]:
X = nd.arange(24).reshape((2, 3, 4))

print('X.shape =', X.shape)
print('X =', X)

X.shape = (2, 3, 4)
X = 
[[[ 0.  1.  2.  3.]
  [ 4.  5.  6.  7.]
  [ 8.  9. 10. 11.]]

 [[12. 13. 14. 15.]
  [16. 17. 18. 19.]
  [20. 21. 22. 23.]]]
<NDArray 2x3x4 @cpu(0)>


#### **4.2.6 Basic properties of tensor arithmetic**

In [69]:
a = 2
x = nd.ones(3)
y = nd.zeros(3)

print(x.shape)
print(y.shape)
print((a * x).shape)
print((a * x + y).shape)

(3,)
(3,)
(3,)
(3,)


#### **4.2.7 Sums and means**

In [70]:
print(x)
print(nd.sum(x))


[1. 1. 1.]
<NDArray 3 @cpu(0)>

[3.]
<NDArray 1 @cpu(0)>


In [73]:
print(A)
print(nd.sum(A))


[[ 0.  1.  2.  3.]
 [ 4.  5.  6.  7.]
 [ 8.  9. 10. 11.]
 [12. 13. 14. 15.]
 [16. 17. 18. 19.]]
<NDArray 5x4 @cpu(0)>

[190.]
<NDArray 1 @cpu(0)>


In [74]:
print(nd.mean(A))
print(nd.sum(A) / A.size)


[9.5]
<NDArray 1 @cpu(0)>

[9.5]
<NDArray 1 @cpu(0)>


#### **4.2.8 Dot products**

In [75]:
x = nd.arange(4)
y = nd.ones(4)
print(x, y, nd.dot(x, y))


[0. 1. 2. 3.]
<NDArray 4 @cpu(0)> 
[1. 1. 1. 1.]
<NDArray 4 @cpu(0)> 
[6.]
<NDArray 1 @cpu(0)>


In [76]:
nd.sum(x * y)


[6.]
<NDArray 1 @cpu(0)>

#### **4.2.9 Matrix-vector products**

In [81]:
A.shape, x.shape

((5, 4), (4,))

In [82]:
nd.dot(A, x)


[ 14.  38.  62.  86. 110.]
<NDArray 5 @cpu(0)>

#### **4.2.10 Matrix-matrix multiplication**

In [85]:
B = nd.ones(shape=(4, 3))
A.shape, B.shape, nd.dot(A, B)

((5, 4), (4, 3), 
 [[ 6.  6.  6.]
  [22. 22. 22.]
  [38. 38. 38.]
  [54. 54. 54.]
  [70. 70. 70.]]
 <NDArray 5x3 @cpu(0)>)
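The three products in sections 4.2.8 to 4.2.10 build on one another: a dot product sums elementwise products, a matrix-vector product takes one dot product per row, and a matrix-matrix product takes one dot product per (row, column) pair. The plain-Python sketch below reproduces the numbers above; the helper names `dot`, `matvec`, and `matmul` are ours, chosen for illustration.

```python
def dot(u, v):
    """Sum of elementwise products of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def matvec(A, x):
    """One dot product per row of A."""
    return [dot(row, x) for row in A]


def matmul(A, B):
    """One dot product per (row of A, column of B) pair."""
    cols = list(zip(*B))  # columns of B
    return [[dot(row, col) for col in cols] for row in A]


A = [[float(4 * i + j) for j in range(4)] for i in range(5)]  # same values as nd.arange(20).reshape((5, 4))
x = [0.0, 1.0, 2.0, 3.0]
B = [[1.0] * 3 for _ in range(4)]

print(dot(x, [1.0] * 4))  # 6.0
print(matvec(A, x))       # [14.0, 38.0, 62.0, 86.0, 110.0]
print(matmul(A, B)[0])    # [6.0, 6.0, 6.0]
```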

#### **4.2.11 Norms**

In [88]:
# L2 norm
x, nd.norm(x)

(
 [0. 1. 2. 3.]
 <NDArray 4 @cpu(0)>, 
 [3.7416573]
 <NDArray 1 @cpu(0)>)

In [87]:
# L1 norm
nd.sum(nd.abs(x))


[6.]
<NDArray 1 @cpu(0)>

#### **4.2.12 Norms and objectives**

- While we do not want to get too far ahead of ourselves, we do want you to anticipate why these concepts are useful. <br>
  In machine learning we are often trying to solve optimization problems: <br>
  maximize the probability assigned to observed data; <br>
  minimize the distance between predictions and the ground-truth observations; <br>
  assign vector representations to items (like words, products, or news articles) such that the distance between similar items is minimized and the distance between dissimilar items is maximized. <br>
  Oftentimes, these objectives, perhaps the most important component of a machine learning algorithm (besides the data itself), are expressed as norms.
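As one concrete reading of "distance between predictions and observations", the sketch below measures the residual between two vectors with the L1 and L2 norms from section 4.2.11. The prediction and target values are invented for illustration and do not come from any dataset.

```python
from math import sqrt


def l1_norm(v):
    """Sum of absolute values."""
    return sum(abs(a) for a in v)


def l2_norm(v):
    """Square root of the sum of squares."""
    return sqrt(sum(a * a for a in v))


# Made-up numbers, used only to show the computation.
predictions = [2.5, 0.0, 2.0, 8.0]
targets = [3.0, -0.5, 2.0, 7.0]
residual = [p - t for p, t in zip(predictions, targets)]

print(l1_norm(residual))  # 2.0
print(l2_norm(residual))  # sqrt(1.5), about 1.2247

# Sanity check against the vector used in section 4.2.11.
print(l2_norm([0.0, 1.0, 2.0, 3.0]))  # sqrt(14), about 3.7417
```

A training objective of this kind would then look for model parameters that make such a norm as small as possible.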

#### **4.2.13 Intermediate linear algebra**

##### **Basic vector properties**

- **_Additive axioms_** (we assume that x,y,z are all vectors): <br>
  x + y = y + x and (x + y) + z = x + (y + z) and 0 + x = x + 0 = x and (−x) + x = x + (−x) = 0.

- **_Multiplicative axioms_** (we assume that x is a vector and a, b are scalars): <br>
  0 · x = 0 and 1 · x = x and (ab)x = a(bx).

- **_Distributive axioms_** (we assume that x and y are vectors and a, b are scalars): <br>
  a(x + y) = ax + ay and (a + b)x = ax + bx.
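These axioms can be spot-checked on concrete vectors. The sketch below uses integer lists so that every comparison is exact; the helpers `add`, `scale`, and `neg` and the sample values are chosen for illustration, and a passing check on one example is of course not a proof.

```python
def add(u, v):
    """Elementwise vector addition."""
    return [a + b for a, b in zip(u, v)]


def scale(a, v):
    """Multiply every entry of v by the scalar a."""
    return [a * c for c in v]


def neg(v):
    """Additive inverse of v."""
    return scale(-1, v)


x, y, z = [1, 2, 3], [10, 20, 30], [-4, 0, 5]
a, b = 2, 3
zero = [0, 0, 0]

checks = {
    "x + y = y + x": add(x, y) == add(y, x),
    "(x + y) + z = x + (y + z)": add(add(x, y), z) == add(x, add(y, z)),
    "0 + x = x + 0 = x": add(zero, x) == x == add(x, zero),
    "(-x) + x = x + (-x) = 0": add(neg(x), x) == zero == add(x, neg(x)),
    "0 * x = 0": scale(0, x) == zero,
    "1 * x = x": scale(1, x) == x,
    "(ab)x = a(bx)": scale(a * b, x) == scale(a, scale(b, x)),
    "a(x + y) = ax + ay": scale(a, add(x, y)) == add(scale(a, x), scale(a, y)),
    "(a + b)x = ax + bx": scale(a + b, x) == add(scale(a, x), scale(b, x)),
}

for name, ok in checks.items():
    print(f"{name}: {ok}")
```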

##### **Special matrices**

- **_Symmetric Matrix_** M⊤ = M

- **_Antisymmetric Matrix_**

- **_Diagonally Dominant Matrix_**

- **_Positive Definite Matrix_**
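Only the symmetric case is defined in the list above. The sketch below fills in the other three with their usual textbook definitions, which are our assumption about what the list intends: antisymmetric means M⊤ = −M; diagonally dominant (weak form) means |M_ii| ≥ Σ_{j≠i} |M_ij| in every row; positive definite means x⊤Mx > 0 for every nonzero x, tested here for symmetric matrices by attempting a Cholesky factorization. All function names are hypothetical.

```python
from math import sqrt


def transpose(m):
    return [list(col) for col in zip(*m)]


def is_symmetric(m):
    return transpose(m) == [list(row) for row in m]


def is_antisymmetric(m):
    return transpose(m) == [[-v for v in row] for row in m]


def is_diagonally_dominant(m):
    n = len(m)
    return all(
        abs(m[i][i]) >= sum(abs(m[i][j]) for j in range(n) if j != i)
        for i in range(n)
    )


def is_positive_definite(m):
    """Symmetric matrices only: succeeds exactly when Cholesky finds positive pivots."""
    if not is_symmetric(m):
        return False
    n = len(m)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                pivot = m[i][i] - s
                if pivot <= 0:
                    return False
                low[i][j] = sqrt(pivot)
            else:
                low[i][j] = (m[i][j] - s) / low[j][j]
    return True


M = [[2, -1], [-1, 2]]
print(is_symmetric(M), is_diagonally_dominant(M), is_positive_definite(M))  # True True True
print(is_antisymmetric([[0, 1], [-1, 0]]))                                   # True
print(is_positive_definite([[1, 2], [2, 1]]))                                # False
```

The last matrix is symmetric but has a negative eigenvalue, which is why the factorization fails.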

____

## **4.3 Automatic Differentiation**

In [126]:
from mxnet import autograd, nd

#### **4.3.1 A simple Example**

- As a toy example, say that we are interested in differentiating the mapping **y = 2x⊤x** with respect to the column vector x.<br> 
  To start, let’s create the variable x and assign it an initial value.

In [133]:
x = nd.arange(4)
x


[0. 1. 2. 3.]
<NDArray 4 @cpu(0)>

In [134]:
x.attach_grad()

In [135]:
with autograd.record():
    y = 2 * nd.dot(x, x)
y    


[28.]
<NDArray 1 @cpu(0)>

In [136]:
y.backward()

- The gradient of the function **y = 2x⊤x** with respect to x should be **_4x_**.<br> 
  Now let’s verify that the gradient produced is correct.

In [139]:
print(x)
print(x.grad)
print(x.grad - 4 * x)


[0. 1. 2. 3.]
<NDArray 4 @cpu(0)>

[ 0.  4.  8. 12.]
<NDArray 4 @cpu(0)>

[0. 0. 0. 0.]
<NDArray 4 @cpu(0)>
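The same result can be cross-checked without autograd by approximating each partial derivative with a central finite difference. This is a plain-Python sketch; `numerical_grad` and the step size `eps` are illustrative choices of ours, not part of MXNet.

```python
def f(x):
    """y = 2 * x^T x for a vector stored as a list."""
    return 2 * sum(v * v for v in x)


def numerical_grad(func, x, eps=1e-6):
    """Approximate the gradient of func at x, one coordinate at a time."""
    grad = []
    for i in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[i] += eps
        x_minus[i] -= eps
        grad.append((func(x_plus) - func(x_minus)) / (2 * eps))
    return grad


x = [0.0, 1.0, 2.0, 3.0]
approx = numerical_grad(f, x)
exact = [4 * v for v in x]

print(f(x))  # 28.0
# Largest gap between the approximation and 4x; expected to be tiny.
print(max(abs(a - e) for a, e in zip(approx, exact)))
```

The value 28.0 matches `y` above, and the approximate gradient agrees with 4x up to rounding error.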


In [118]:
with autograd.record():
    y = x.norm()
y.backward()
x.grad


[0.         0.26726124 0.5345225  0.80178374]
<NDArray 4 @cpu(0)>

#### 4.3.2 Backward for Non-scalar Variable

In [156]:
print('x vector : ', x)

with autograd.record(): # y is a vector
    y = x * x
print('y vector : ', y)    
y.backward()
print('x.grad : ', x.grad)

u = x.copy()
u.attach_grad()

with autograd.record(): # v is scalar
    v = (u * u).sum()
print('v scalar : ', v)    
v.backward()
print('u.grad : ', u.grad)

x.grad - u.grad

x vector :  
[0. 1. 2. 3.]
<NDArray 4 @cpu(0)>
y vector :  
[0. 1. 4. 9.]
<NDArray 4 @cpu(0)>
x.grad :  
[0. 2. 4. 6.]
<NDArray 4 @cpu(0)>
v scalar :  
[14.]
<NDArray 1 @cpu(0)>
u.grad :  
[0. 2. 4. 6.]
<NDArray 4 @cpu(0)>



[0. 0. 0. 0.]
<NDArray 4 @cpu(0)>

#### 4.3.3 Detach Computations

In [161]:
with autograd.record():
    y = x * x
    u = y.detach()
    z = u * x
print('x : ', x)
print('u : ', u)
print('z : ', z)

z.backward()

print('x.grad : ', x.grad)
print('u : ', u)

x.grad - u

x :  
[0. 1. 2. 3.]
<NDArray 4 @cpu(0)>
u :  
[0. 1. 4. 9.]
<NDArray 4 @cpu(0)>
z :  
[ 0.  1.  8. 27.]
<NDArray 4 @cpu(0)>
x.grad :  
[0. 1. 4. 9.]
<NDArray 4 @cpu(0)>
u :  
[0. 1. 4. 9.]
<NDArray 4 @cpu(0)>



[0. 0. 0. 0.]
<NDArray 4 @cpu(0)>

- The backward call above computes **_∂(u·x)/∂x = u_**, with u = x² treated as a constant, instead of **∂x³/∂x = 3x²**.

- Since the computation of y is still recorded, we can call y.backward() to get **∂y/∂x = 2x**.

In [162]:
y.backward()
print('y : ', y)
print('x.grad : ', x.grad)
print('x : ', x)

x.grad - 2*x

y :  
[0. 1. 4. 9.]
<NDArray 4 @cpu(0)>
x.grad :  
[0. 2. 4. 6.]
<NDArray 4 @cpu(0)>
x :  
[0. 1. 2. 3.]
<NDArray 4 @cpu(0)>



[0. 0. 0. 0.]
<NDArray 4 @cpu(0)>
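What detaching changes can be reproduced by hand: if u is frozen at the value x², then z = u·x has derivative u with respect to x, whereas differentiating x³ directly gives 3x². The sketch below compares the two per element with a finite difference; it is plain Python, and `derivative` is a hypothetical helper that does not model autograd itself.

```python
def derivative(g, t, eps=1e-6):
    """Central finite-difference approximation of g'(t)."""
    return (g(t + eps) - g(t - eps)) / (2 * eps)


x = [0.0, 1.0, 2.0, 3.0]
u = [v * v for v in x]  # frozen copy of y = x * x, playing the role of y.detach()

# u is held constant while x varies, as in z = u * x after detach.
detached_grad = [derivative(lambda t, c=c: c * t, v) for c, v in zip(u, x)]

# No detach: z = x * x * x depends on x through every factor.
full_grad = [derivative(lambda t: t ** 3, v) for v in x]

print([round(g, 4) for g in detached_grad])  # close to u = [0, 1, 4, 9]
print([round(g, 4) for g in full_grad])      # close to 3x^2 = [0, 3, 12, 27]
```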

#### 4.3.4 Attach Gradients to Internal Variables

#### 4.3.5 Head gradients

#### 4.3.6 Computing the Gradient of Python Control Flow

#### 4.3.7 Training Mode and Prediction Mode

#### 4.3.8 Summary

#### 4.3.9 Exercises

____

## 4.4 Probability and Statistics

#### 4.4.1 Basic probability theory 

#### 4.4.2 Dealing with multiple random variables

#### 4.4.3 Conditional independence

#### 4.4.4 Sampling

#### 4.4.5 Summary

#### 4.4.6 Exercises

____

## 4.5 Naive Bayes Classification

#### 4.5.1 Optical Character Recognition

#### 4.5.2 The Probabilistic Model for Classification

#### 4.5.3 The Naive Bayes Classifier

#### 4.5.4 Training

#### 4.5.5 Summary

#### 4.5.6 Exercises

____

## 4.6 Documentation

#### 4.6.1 Finding all the functions and classes in the module 

#### 4.6.2 Finding the usage of specific functions and classes

#### 4.6.3 API Documentation

#### 4.6.4 Exercise 

- Look up ones_like and autograd in the API documentation. 