NumPy is the standard numerical library of the Python ecosystem. It enables fast computations on array-like structures. The central object of the `numpy` library is the `ndarray`: a *homogeneous* $n$-dimensional array, meaning that all of its elements have the same type.

The efficiency of `ndarray` objects comes from the fact that element-wise operations are implemented in C, which keeps the per-element overhead low. Loop-like operations on array-like structures can often be rewritten with the corresponding `ndarray` operations. This process is called *vectorisation*; it improves efficiency and should be kept in mind in scientific programming.
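
As a minimal sketch of what vectorisation means, the snippet below computes the same sum of squares twice: once with an explicit Python loop and once with a single array expression. The names `values`, `loop_total` and `vectorised_total` are illustrative only.

```python
import numpy as np

values = np.arange(1, 6)  # the integers 1, 2, 3, 4, 5

# Loop version: every element is handled by the Python interpreter.
loop_total = 0
for v in values:
    loop_total += v * v

# Vectorised version: the squaring and the sum both run in compiled code.
vectorised_total = np.sum(values ** 2)

print(loop_total, vectorised_total)  # both equal 55
```

On five elements the difference is invisible; the gain appears on large arrays.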

## Defining an `ndarray` object

In [28]:
import numpy as np

In [29]:
np_matrix = np.array([[1, 2, 3, 4], [-1, 0, 1, 2]], dtype='float64')
np_matrix

array([[ 1.,  2.,  3.,  4.],
       [-1.,  0.,  1.,  2.]])

In [30]:
np_matrix.dtype

dtype('float64')

A $2$-dimensional `ndarray` can be built from a list of lists, in which case the matrix is given row by row. An `ndarray` object comes with many attributes; we will meet a number of them along the way. Here are the ones describing the shape of the array.

In [31]:
np_matrix.ndim

2

In [32]:
np_matrix.shape

(2, 4)
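
A few other attributes follow the same pattern. The sketch below rebuilds the matrix defined above under the hypothetical name `m` and reads its `size`, `itemsize` and `nbytes` attributes.

```python
import numpy as np

m = np.array([[1, 2, 3, 4], [-1, 0, 1, 2]], dtype='float64')

print(m.size)      # total number of elements: 2 * 4 = 8
print(m.itemsize)  # bytes per element: 8 for float64
print(m.nbytes)    # size * itemsize = 64
```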

In many cases one needs to initialize an `ndarray`, either with random coefficients or as a matrix of a specific type. Here are the standard constructors available.

In [33]:
Z = np.zeros(4, dtype='float64')

In [34]:
Z.shape
Z

array([ 0.,  0.,  0.,  0.])

In [35]:
np.zeros((3, 4), dtype='int64')

array([[0, 0, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 0]])

In [36]:
np.ones(4)

array([ 1.,  1.,  1.,  1.])

In [37]:
np.ones((3, 2))

array([[ 1.,  1.],
       [ 1.,  1.],
       [ 1.,  1.]])

In [38]:
np.identity(5)

array([[ 1.,  0.,  0.,  0.,  0.],
       [ 0.,  1.,  0.,  0.,  0.],
       [ 0.,  0.,  1.,  0.,  0.],
       [ 0.,  0.,  0.,  1.,  0.],
       [ 0.,  0.,  0.,  0.,  1.]])

In [39]:
np.diag([1, 2, 3, 4, 5])

array([[1, 0, 0, 0, 0],
       [0, 2, 0, 0, 0],
       [0, 0, 3, 0, 0],
       [0, 0, 0, 4, 0],
       [0, 0, 0, 0, 5]])
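
A few related constructors may also be handy. The sketch below is not part of the original notebook; the variable names are illustrative, and the functions used are `np.full`, `np.eye` and `np.zeros_like`.

```python
import numpy as np

filled = np.full((2, 3), 7)        # 2x3 array where every entry is 7
shifted = np.eye(3, k=1)           # ones on the first diagonal above the main one
template = np.zeros_like(filled)   # zeros with the shape and dtype of `filled`

print(filled)
print(shifted)
print(template)
```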

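To build a random `ndarray`, one can use the random generators built into `numpy`: `rand` draws uniformly from $[0, 1)$, `randn` draws from the standard normal distribution, and `randint` draws integers.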

In [40]:
np.random.rand(2, 4)

array([[ 0.72940135,  0.356009  ,  0.79669859,  0.14608245],
       [ 0.95778512,  0.21656914,  0.68271512,  0.86626767]])

In [41]:
np.random.randn(2, 4)

array([[ 1.06814318,  0.43862567,  0.0051862 ,  1.91809189],
       [-0.02020801, -1.13698834,  1.58550318,  0.68893061]])

In [42]:
np.random.randint?
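
For reproducible experiments it can help to seed the generator. The sketch below uses `np.random.default_rng`, the generator interface available in NumPy 1.17 and later. It is an alternative to the `np.random.rand`, `randn` and `randint` functions called in the cells above, not what those cells use. The seed `0` and the variable names are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)            # seeded generator (arbitrary seed)

uniform = rng.random((2, 4))              # uniform floats in [0, 1)
normal = rng.standard_normal((2, 4))      # standard normal samples
integers = rng.integers(0, 10, size=20)   # integers in [0, 10), upper bound excluded

# Re-creating the generator with the same seed reproduces the same draws.
again = np.random.default_rng(0).random((2, 4))
print(np.array_equal(uniform, again))     # True
```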

A useful way of building matrices is to reshape a one-dimensional `numpy` array.

In [43]:
np_A = np.random.randint(10, size=20)
np_A

array([1, 7, 5, 3, 3, 0, 3, 5, 8, 2, 5, 1, 4, 2, 9, 2, 4, 3, 1, 5])

In [44]:
np_A.ndim

1

In [45]:
np_A = np_A.reshape(4, -1)  # -1 lets NumPy infer this dimension from the array size.
np_A

array([[1, 7, 5, 3, 3],
       [0, 3, 5, 8, 2],
       [5, 1, 4, 2, 9],
       [2, 4, 3, 1, 5]])

In [46]:
np_A.shape

(4, 5)

In [47]:
np_A.T

array([[1, 0, 5, 2],
       [7, 3, 1, 4],
       [5, 5, 4, 3],
       [3, 8, 2, 1],
       [3, 2, 9, 5]])
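
To make the role of `-1` concrete, here is a small sketch on a fresh array (the name `flat` is illustrative). NumPy infers the missing dimension from the total number of elements and raises a `ValueError` when no integer fits.

```python
import numpy as np

flat = np.arange(20)

print(flat.reshape(4, -1).shape)   # (4, 5): 20 / 4 = 5 columns
print(flat.reshape(-1, 2).shape)   # (10, 2): 20 / 2 = 10 rows

try:
    flat.reshape(3, -1)            # 20 is not divisible by 3
except ValueError as error:
    print("reshape failed:", error)
```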

Another useful constructor is `arange`, the `numpy` version of Python's `range`. It returns a one-dimensional array containing an arithmetic sequence, following the `range` syntax.

In [48]:
np.arange(10)

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [49]:
np.arange(2, 10)

array([2, 3, 4, 5, 6, 7, 8, 9])

In [50]:
np.arange(10, 2, -1)

array([10,  9,  8,  7,  6,  5,  4,  3])
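
The sketch below puts `arange` side by side with `range` for the same arguments, and shows that `arange` also accepts floats, which `range` does not. The chosen bounds and steps are arbitrary.

```python
import numpy as np

# Same arguments as range: start, stop (excluded), step.
from_numpy = np.arange(0, 10, 3)
from_python = list(range(0, 10, 3))
print(from_numpy, from_python)      # both contain 0, 3, 6, 9

# Unlike range, arange also accepts floats.
quarters = np.arange(0.0, 1.0, 0.25)
print(quarters)                     # 0, 0.25, 0.5, 0.75
```

With non-integer steps, floating-point rounding can make the number of elements hard to predict, which is one reason to prefer `linspace` for floats.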

In many applications one needs a sequence of evenly spaced floats sampling an interval of the real line. One way of generating such a `numpy` array is the `linspace` function.

In [52]:
np.linspace?

In [53]:
np.linspace(1, 10, 100)

array([  1.        ,   1.09090909,   1.18181818,   1.27272727,
         1.36363636,   1.45454545,   1.54545455,   1.63636364,
         1.72727273,   1.81818182,   1.90909091,   2.        ,
         2.09090909,   2.18181818,   2.27272727,   2.36363636,
         2.45454545,   2.54545455,   2.63636364,   2.72727273,
         2.81818182,   2.90909091,   3.        ,   3.09090909,
         3.18181818,   3.27272727,   3.36363636,   3.45454545,
         3.54545455,   3.63636364,   3.72727273,   3.81818182,
         3.90909091,   4.        ,   4.09090909,   4.18181818,
         4.27272727,   4.36363636,   4.45454545,   4.54545455,
         4.63636364,   4.72727273,   4.81818182,   4.90909091,
         5.        ,   5.09090909,   5.18181818,   5.27272727,
         5.36363636,   5.45454545,   5.54545455,   5.63636364,
         5.72727273,   5.81818182,   5.90909091,   6.        ,
         6.09090909,   6.18181818,   6.27272727,   6.36363636,
         6.45454545,   6.54545455,   6.63636364,   6.72
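
`linspace` includes both endpoints by default, so `num` points leave `num - 1` gaps between them. The sketch below uses a short grid to make this visible; the names `grid`, `step` and `open_grid` are illustrative, and `retstep=True` asks NumPy to return the spacing as well.

```python
import numpy as np

grid, step = np.linspace(0, 1, 5, retstep=True)
print(grid)   # 0, 0.25, 0.5, 0.75, 1
print(step)   # 0.25, i.e. (1 - 0) / (5 - 1)

# endpoint=False drops the right end, which changes the spacing.
open_grid = np.linspace(0, 1, 5, endpoint=False)
print(open_grid)  # approximately 0, 0.2, 0.4, 0.6, 0.8
```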

## Slicing 

There are many different ways of slicing an `ndarray`. One needs to be careful: some return a view on part of the array, while others copy it.

In [54]:
np_A

array([[1, 7, 5, 3, 3],
       [0, 3, 5, 8, 2],
       [5, 1, 4, 2, 9],
       [2, 4, 3, 1, 5]])

Standard slicing gives views on sub-arrays of an `ndarray`.

In [55]:
np_A[1]

array([0, 3, 5, 8, 2])

In [56]:
np_A[1, 0]

0

In [57]:
np_A[:, 0]

array([1, 0, 5, 2])

In [58]:
np_A[1:3]

array([[0, 3, 5, 8, 2],
       [5, 1, 4, 2, 9]])

In [59]:
np_A[1:3, 1:4]

array([[3, 5, 8],
       [1, 4, 2]])
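
The practical consequence of a view is that writing through it modifies the original array. The sketch below works on a separate array (hypothetical name `base`) so that `np_A` is left untouched; `np.shares_memory` reports whether two arrays overlap in memory.

```python
import numpy as np

base = np.arange(12).reshape(3, 4)

window = base[1:3, 1:3]                   # basic slice: a view
print(np.shares_memory(base, window))     # True

window[0, 0] = -1                         # writes through to base
print(base[1, 1])                         # -1

independent = base[1:3, 1:3].copy()       # explicit copy
independent[0, 0] = 99
print(base[1, 1])                         # still -1
```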

Boolean masks select rows, columns or elements; unlike standard slicing, they return a copy.

In [60]:
np_A

array([[1, 7, 5, 3, 3],
       [0, 3, 5, 8, 2],
       [5, 1, 4, 2, 9],
       [2, 4, 3, 1, 5]])

In [61]:
np_A[[False, True, False, True]] 

array([[0, 3, 5, 8, 2],
       [2, 4, 3, 1, 5]])

In [62]:
np_A[0] < 2

array([ True, False, False, False, False], dtype=bool)

In [63]:
np_A[:, np_A[0] < 2]

array([[1],
       [0],
       [5],
       [2]])
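
A mask can also have the full shape of the array, in which case the selected elements come back as a one-dimensional array. The sketch below, on a small hypothetical array `data`, also checks that the result does not share memory with the original.

```python
import numpy as np

data = np.array([[1, 7, 5],
                 [0, 3, 8]])

mask = data > 4                  # boolean array with the same shape as data
selected = data[mask]            # 1-D array of the matching elements
print(selected)                  # [7 5 8]

selected[0] = 100                # modifies the copy only
print(data[0, 1])                # 7
print(np.shares_memory(data, selected))  # False
```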

## Setting Coefficient Values

In [64]:
np_A = np_A.reshape(4, -1)
np_A

array([[1, 7, 5, 3, 3],
       [0, 3, 5, 8, 2],
       [5, 1, 4, 2, 9],
       [2, 4, 3, 1, 5]])

In [65]:
np_A[2, 2] = 0
np_A

array([[1, 7, 5, 3, 3],
       [0, 3, 5, 8, 2],
       [5, 1, 0, 2, 9],
       [2, 4, 3, 1, 5]])

In [66]:
np_A[1, :] = -3
np_A

array([[ 1,  7,  5,  3,  3],
       [-3, -3, -3, -3, -3],
       [ 5,  1,  0,  2,  9],
       [ 2,  4,  3,  1,  5]])
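
Assignment also accepts sequences and boolean masks on the left-hand side. The sketch below uses a separate hypothetical array `table`: a row receives one value per column, a column receives a repeated scalar, and a mask replaces every entry that satisfies a condition.

```python
import numpy as np

table = np.arange(12).reshape(3, 4)

table[0, :] = [9, 8, 7, 6]       # one value per column
table[:, -1] = 0                 # a scalar is repeated along the slice
table[table > 8] = -1            # mask assignment: every entry above 8

print(table)
# [[-1  8  7  0]
#  [ 4  5  6  0]
#  [ 8 -1 -1  0]]
```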

## Universal Functions

Many standard mathematical functions are reimplemented in `numpy` as universal functions, which apply element-wise to whole arrays and run efficiently.

In [67]:
np.exp(np_A)

array([[  2.71828183e+00,   1.09663316e+03,   1.48413159e+02,
          2.00855369e+01,   2.00855369e+01],
       [  4.97870684e-02,   4.97870684e-02,   4.97870684e-02,
          4.97870684e-02,   4.97870684e-02],
       [  1.48413159e+02,   2.71828183e+00,   1.00000000e+00,
          7.38905610e+00,   8.10308393e+03],
       [  7.38905610e+00,   5.45981500e+01,   2.00855369e+01,
          2.71828183e+00,   1.48413159e+02]])

In [68]:
np.sqrt(np.exp(np_A))

array([[  1.64872127,  33.11545196,  12.18249396,   4.48168907,
          4.48168907],
       [  0.22313016,   0.22313016,   0.22313016,   0.22313016,
          0.22313016],
       [ 12.18249396,   1.64872127,   1.        ,   2.71828183,  90.0171313 ],
       [  2.71828183,   7.3890561 ,   4.48168907,   1.64872127,
         12.18249396]])

In [69]:
np_B = np.random.randint(100, size=20)
np_B = np_B.reshape(4, 5)

In [70]:
np_A + np_B

array([[88, 43, 90, 47, 63],
       [94,  1, 66, 39,  7],
       [91, 96, 70, 92, 99],
       [51, 12, 90, 34, 18]])

In [71]:
np_A * np_B

array([[  87,  252,  425,  132,  180],
       [-291,  -12, -207, -126,  -30],
       [ 430,   95,    0,  180,  810],
       [  98,   32,  261,   33,   65]])

In [72]:
np.maximum(np_A, np_B)

array([[87, 36, 85, 44, 60],
       [97,  4, 69, 42, 10],
       [86, 95, 70, 90, 90],
       [49,  8, 87, 33, 13]])

In [73]:
np.dot(np_A, np_B.T)

array([[ 1076,   626,  1641,   678],
       [ -936,  -666, -1293,  -570],
       [ 1099,   663,  1515,   436],
       [  917,   509,  1302,   489]])
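
Note that `*` multiplies element by element, whereas `np.dot` computes the matrix product. The sketch below contrasts the two on small hypothetical arrays `a` and `b`; for 2-D arrays the `@` operator gives the same result as `np.dot`.

```python
import numpy as np

a = np.array([[1, 2],
              [3, 4]])
b = np.array([[0, 1],
              [1, 0]])

print(a * b)         # element-wise: [[0, 2], [3, 0]]
print(np.dot(a, b))  # matrix product: [[2, 1], [4, 3]]
print(a @ b)         # same matrix product with the @ operator
```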

## Exercise

Look into saving and loading numpy arrays.

In [80]:
np.save("np_A", np_A)

In [81]:
np.load("np_A.npy")

array([[ 1,  7,  5,  3,  3],
       [-3, -3, -3, -3, -3],
       [ 5,  1,  0,  2,  9],
       [ 2,  4,  3,  1,  5]])
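
`np.save` writes a single array to a binary `.npy` file. Two other options are sketched below as a possible extension of the exercise: `np.savez` stores several arrays in one archive, and `np.savetxt` writes a human-readable text file. The file names and variable names are hypothetical, and everything is written to a temporary directory.

```python
import os
import tempfile

import numpy as np

first = np.arange(6).reshape(2, 3)
second = np.linspace(0, 1, 5)

with tempfile.TemporaryDirectory() as folder:
    # Several arrays in one .npz archive, retrieved by keyword name.
    archive_path = os.path.join(folder, "arrays.npz")
    np.savez(archive_path, first=first, second=second)
    with np.load(archive_path) as archive:
        first_back = archive["first"]
        second_back = archive["second"]

    # Human-readable text file for a 2-D array.
    text_path = os.path.join(folder, "first.txt")
    np.savetxt(text_path, first, fmt="%d")
    text_back = np.loadtxt(text_path, dtype=int)

print(np.array_equal(first, first_back), np.array_equal(first, text_back))
```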

## Exercise

Compare the efficiency of `numpy` matrix multiplication with that of a naive function using built-in structures.

In [82]:
# Defining a matrix as a list of lists in raw python
n = 200
M = [[np.random.randint(100) for _ in range(n)] for _ in range(n)]

In [83]:
%%time
M_square = [[0 for _ in range(n)] for _ in range(n)] 
for i in range(n):
    for j in range(n):
        for k in range(n):
            M_square[i][j] += M[i][k]*M[k][j]

CPU times: user 2.35 s, sys: 0 ns, total: 2.35 s
Wall time: 2.35 s


In [85]:
np_M = np.array(M)

In [86]:
%%time
np_M_square = np.dot(np_M, np_M)

CPU times: user 24 ms, sys: 0 ns, total: 24 ms
Wall time: 22.3 ms
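
The `%%time` magic only exists inside IPython and Jupyter. As a sketch of the same comparison in a plain script, the snippet below uses the standard `timeit` module. The function name `naive_square` is hypothetical, and the matrix is smaller than the `n = 200` used above to keep the run short. Timings depend on the machine, so none are quoted here.

```python
import timeit

import numpy as np

def naive_square(matrix):
    """Square a matrix stored as a list of lists with three nested loops."""
    size = len(matrix)
    result = [[0] * size for _ in range(size)]
    for i in range(size):
        for j in range(size):
            for k in range(size):
                result[i][j] += matrix[i][k] * matrix[k][j]
    return result

size = 50  # assumption: a small size chosen only to keep the example fast
np_small = np.random.randint(100, size=(size, size))
list_small = np_small.tolist()

naive_seconds = timeit.timeit(lambda: naive_square(list_small), number=3)
numpy_seconds = timeit.timeit(lambda: np.dot(np_small, np_small), number=3)

# Both approaches should agree on the result.
same_result = np.array_equal(np.array(naive_square(list_small)),
                             np.dot(np_small, np_small))
print(same_result)
print(f"naive: {naive_seconds:.4f} s, numpy: {numpy_seconds:.4f} s")
```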


## Exercise

Simulate a random walk using both `numpy` and built-in structures. Compare the two implementations.

* Using the `matplotlib` documentation, write a function that plots a random walk.
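
One possible starting point for this exercise, not a reference solution: a simple symmetric walk on the integers with steps of +1 or -1. The function names `walk_python` and `walk_numpy` are hypothetical, and the plotting part is left out on purpose.

```python
import random

import numpy as np

def walk_python(n_steps):
    """Positions of a +1/-1 random walk, built with a list and a loop."""
    positions = [0]
    for _ in range(n_steps):
        positions.append(positions[-1] + random.choice((-1, 1)))
    return positions

def walk_numpy(n_steps):
    """Same kind of walk, vectorised: draw all steps at once, then accumulate."""
    steps = 2 * np.random.randint(0, 2, size=n_steps) - 1   # values in {-1, 1}
    return np.concatenate(([0], np.cumsum(steps)))

path = walk_numpy(1000)
print(len(walk_python(1000)), path.shape)   # 1001 (1001,)
```

Both functions return the starting position followed by one position per step, so their outputs can be timed and plotted in the same way.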