TIP 22 - BACKGROUND
---------

NumPy is a powerful library for numerical programming. The essential problem that NumPy solves is fast array processing. A typical use might be to create an array of 1 million random draws from a uniform distribution and compute the mean.

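Using pure (uncompiled) Python, this calculation would be slow, because the interpreter executes the loop one instruction at a time. Compiled languages such as C and Fortran avoid this cost: various optimizations can be carried out during compilation, when the compiler sees the instructions as a whole.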

However, for a task like the one described above there's no need to switch back to C or Fortran. Instead we can use NumPy, which sends operations in batches to optimized C and Fortran code.

In [1]:
import numpy as np

x = np.random.uniform(0, 1, size=1000000)

x.mean()

0.50014278039683913
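
As a rough check that the single NumPy call computes the same quantity as an explicit loop, here is a minimal sketch. The names `loop_mean` and `numpy_mean` are illustrative, the seed is fixed only to make the run repeatable, and the sample is smaller than 1 million so the loop finishes quickly.

```python
import numpy as np

np.random.seed(0)                      # fixed seed, for repeatability only
n = 100_000                            # smaller than 1 million to keep the loop quick
x = np.random.uniform(0, 1, size=n)

# Pure Python: visit the elements one at a time
total = 0.0
for value in x:
    total += value
loop_mean = total / n

# NumPy: a single call that runs in compiled code
numpy_mean = x.mean()

print(loop_mean, numpy_mean)           # the two agree up to rounding error
```

Timing the two versions (for example with `%timeit` in a notebook) should show the loop to be much slower, although the exact ratio depends on the machine.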

TIP 23 - ARRAYS
--------

The most important thing that NumPy defines is an array data type formally called a ```numpy.ndarray```.

To create a NumPy array containing only zeros we use ```np.zeros```:

In [2]:
import numpy as np
a = np.zeros(3)
a

array([ 0.,  0.,  0.])

In [3]:
type(a)

numpy.ndarray

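A call such as ```np.zeros(10)``` returns a flat, one-dimensional array whose ```shape``` is a one-element tuple. Assigning a new tuple to ```shape``` rearranges the same elements into the rectangular layout we want, provided the total number of elements is unchanged.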

In [7]:
z = np.zeros(10)
z

array([ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.])

In [5]:
z.shape

(10,)

In [8]:
z.shape = (10, 1)
z

array([[ 0.],
       [ 0.],
       [ 0.],
       [ 0.],
       [ 0.],
       [ 0.],
       [ 0.],
       [ 0.],
       [ 0.],
       [ 0.]])

In [9]:
z.shape = (5, 2)
z

array([[ 0.,  0.],
       [ 0.,  0.],
       [ 0.,  0.],
       [ 0.,  0.],
       [ 0.,  0.]])
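
An alternative to assigning to ```shape``` is the ```reshape``` method, which leaves the original array's shape alone and returns a new array object. The sketch below is illustrative only; the name `z2` is an assumption, not something from the text above.

```python
import numpy as np

z = np.zeros(10)
z2 = z.reshape(5, 2)     # new array object with shape (5, 2)

print(z.shape)           # (10,)  the original keeps its shape
print(z2.shape)          # (5, 2)

# The total number of elements must match, so this raises ValueError
try:
    z.reshape(3, 4)
except ValueError as err:
    print("cannot reshape:", err)
```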

```np.empty``` allocates an array without initializing its entries, so it initially holds whatever values happen to be in memory. It is meant to be populated with data afterwards.

In [15]:
y = np.empty(6)
y

array([  0.00000000e+000,   0.00000000e+000,   1.48219694e-323,
         0.00000000e+000,   2.15783944e-314,   4.17201482e-309])
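
A typical pattern, sketched here under the assumption that every slot is written before it is read, is to allocate with ```np.empty``` and then fill the array in a loop:

```python
import numpy as np

y = np.empty(6)          # contents are arbitrary at this point
for i in range(6):
    y[i] = i ** 2        # overwrite every slot before reading it

print(y)                 # the squares 0, 1, 4, 9, 16, 25 stored as floats
```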

In [17]:
x = np.linspace(2, 4, 5)  # Evenly spaced from 2 to 4, with 5 elements
x

array([ 2. ,  2.5,  3. ,  3.5,  4. ])

In [18]:
w = np.identity(2)
w

array([[ 1.,  0.],
       [ 0.,  1.]])

In [23]:
z = np.array([10, 20])                 # ndarray from Python list
z

array([10, 20])

In [22]:
type(z)

numpy.ndarray

In [24]:
z = np.array((10, 20), dtype=float)    # Here 'float' is equivalent to 'np.float64'
z

array([ 10.,  20.])

In [25]:
z = np.array([[1, 2], [3, 4]])         # 2D array from a list of lists
z

array([[1, 2],
       [3, 4]])

In [26]:
z.shape

(2, 2)

TIP 24 - INDEXING
-------------

Flat arrays:

In [27]:
z = np.linspace(1, 2, 5)
z

array([ 1.  ,  1.25,  1.5 ,  1.75,  2.  ])

In [28]:
z[0]

1.0

In [29]:
z[0:2]  # Two elements, starting at element 0

array([ 1.  ,  1.25])

In [30]:
z[-1]

2.0

Matrices (indices are still zero-based):

In [31]:
z = np.array([[1, 2], [3, 4]])
z

array([[1, 2],
       [3, 4]])

In [32]:
z[0, 0]

1

In [33]:
z[0, 1]

2

Whole rows and columns:

In [34]:
z[0,:]

array([1, 2])

In [35]:
z[:,1]

array([2, 4])

Arrays of integers can also be used to extract elements:

In [38]:
z = np.linspace(2, 4, 5)
z

array([ 2. ,  2.5,  3. ,  3.5,  4. ])

In [39]:
indices = np.array((0, 2, 3))
z[indices]

array([ 2. ,  3. ,  3.5])
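
One difference between the two styles of indexing is worth keeping in mind: a basic slice is a view onto the original data, while indexing with an array of integers produces a new array. The following sketch illustrates this; the names `s` and `f` are illustrative.

```python
import numpy as np

z = np.linspace(2, 4, 5)

s = z[0:2]                       # basic slice: a view onto z's data
f = z[np.array((0, 2, 3))]       # integer-array indexing: a new array

s[0] = -1.0                      # changes z[0] as well
f[0] = 99.0                      # leaves z untouched

print(z)                         # first element is now -1.0, the rest unchanged
print(np.shares_memory(s, z))    # True
print(np.shares_memory(f, z))    # False
```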

TIP 25 - METHODS
-------------

Arrays have useful methods, all of which are carefully optimized as described above.

In [40]:
A = np.array((4, 3, 2, 1))
A

array([4, 3, 2, 1])

In [41]:
A.sort()              # Sorts A in place
A

array([1, 2, 3, 4])

In [42]:
A.sum()               # Sum

10

In [43]:
A.mean()              # Mean

2.5

In [44]:
A.max()               # Max

4

In [45]:
A.argmax()            # Returns the index of the maximal element

3

In [46]:
A.cumsum()            # Cumulative sum of the elements of A

array([ 1,  3,  6, 10])

In [47]:
A.cumprod()           # Cumulative product of the elements of A

array([ 1,  2,  6, 24])

In [48]:
A.var()               # Variance

1.25

In [49]:
A.std()               # Standard deviation

1.1180339887498949

In [52]:
A.shape = (2, 2)
A

array([[1, 2],
       [3, 4]])

In [53]:
A.T                   # Equivalent to A.transpose()

array([[1, 3],
       [2, 4]])
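
For two-dimensional arrays, many of these methods accept an ```axis``` argument that applies the operation along columns or rows rather than over all elements. A short sketch:

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])

print(A.sum())          # 10, over all elements
print(A.sum(axis=0))    # [4 6], one sum per column
print(A.sum(axis=1))    # [3 7], one sum per row
print(A.max(axis=0))    # [3 4], one maximum per column
```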

TIP 26 - SEARCH SORTED
---------

If ```z``` is a nondecreasing array, then ```z.searchsorted(a)``` returns the index of the first element of ```z``` that is >= ```a```.

In [54]:
z = np.linspace(2, 4, 5)
z

array([ 2. ,  2.5,  3. ,  3.5,  4. ])

In [55]:
z.searchsorted(2.2)

1

In [56]:
z.searchsorted(2.5)

1

In [57]:
z.searchsorted(2.6)

2
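
The method also accepts a ```side``` argument and an array of query values. The sketch below shows how ```side='right'``` changes the result when the query value is itself an element of ```z```; the variable names are illustrative.

```python
import numpy as np

z = np.linspace(2, 4, 5)                      # [2, 2.5, 3, 3.5, 4]

left = z.searchsorted(2.5)                    # first index with z[i] >= 2.5
right = z.searchsorted(2.5, side='right')     # first index with z[i] > 2.5
many = z.searchsorted([2.2, 2.5, 2.6])        # several queries at once

print(left, right)                            # 1 2
print(many)                                   # [1 1 2]
```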

TIP 27 - ALGEBRAIC OPERATIONS ON ARRAYS
-----------

The algebraic operators +, -, *, / and ** all act elementwise on arrays.

In [58]:
a = np.array([1, 2, 3, 4])

b = np.array([5, 6, 7, 8])

a + b

array([ 6,  8, 10, 12])

In [59]:
a * b

array([ 5, 12, 21, 32])

In [60]:
a + 10

array([11, 12, 13, 14])

In [61]:
a * 10

array([10, 20, 30, 40])

In [62]:
A = np.ones((2, 2))

B = np.ones((2, 2))

A + B

array([[ 2.,  2.],
       [ 2.,  2.]])

In [63]:
A + 10

array([[ 11.,  11.],
       [ 11.,  11.]])

In [64]:
A * B

array([[ 1.,  1.],
       [ 1.,  1.]])

In particular, ```A * B``` is not the matrix product; it is the elementwise product.

TIP 28 - MATRIX MULTIPLICATION
---------------

We use ```np.dot(A, B)``` for matrix multiplication (and for the dot product of vectors):

In [2]:
import numpy as np

A = [[1,2],[3,4]]

B = [[1,0],[0,1]]

np.dot(A, B)

array([[1, 2],
       [3, 4]])

In [3]:
np.dot([1,2], [3,4])

11
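
Since multiplying by the identity hides the difference, the following sketch uses a permutation matrix to contrast the elementwise product with the matrix product. It also uses the ```@``` operator, which on NumPy arrays (Python 3.5 and later) gives the same result as ```np.dot``` for two-dimensional inputs.

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[0, 1], [1, 0]])       # swaps the columns of A when multiplied on the right

elementwise = A * B                  # multiplies matching entries
matrix = A @ B                       # rows of A times columns of B

print(elementwise)                   # rows are [0, 2] and [3, 0]
print(matrix)                        # rows are [2, 1] and [4, 3]
print(np.array_equal(matrix, np.dot(A, B)))   # True
```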

TIP 29 - COPIES
-------------

In Python, assignment binds names to objects, so the command ```b = a``` does not copy the array. Instead, ```b``` becomes another reference to the same array, and changes made through ```b``` are visible through ```a```. This avoids copies, which are expensive in terms of both speed and memory.

It is of course possible to make ```b``` an independent copy of ```a``` when required using ```np.copyto```.

In [73]:
a = np.random.randn(3)
a

array([ 1.22614266, -0.09554738,  0.37694868])

In [75]:
b = a
b[0] = 0.0
a

array([ 0.        , -0.09554738,  0.37694868])

In [76]:
a = np.random.randn(3)
a

array([ 2.00238603,  1.36623348, -0.46906479])

In [78]:
b = np.empty_like(a)  # empty array with same shape as a
np.copyto(b, a)  # copy to b from a
b[0] = 0.0
a

array([ 2.00238603,  1.36623348, -0.46906479])
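
A shorter route to an independent copy is the array's ```copy``` method. The sketch below contrasts it with plain assignment and uses ```np.shares_memory``` to confirm which arrays share data; the names `alias` and `independent` are illustrative.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])

alias = a                    # another name for the same array
independent = a.copy()       # new array with its own data

alias[0] = 0.0               # visible through a
independent[1] = -5.0        # not visible through a

print(a)                                   # [0. 2. 3.]
print(np.shares_memory(a, alias))          # True
print(np.shares_memory(a, independent))    # False
```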

TIP 30 - STANDARD FUNCTIONS
----------
NumPy provides versions of the standard functions log, exp, sin, etc. that act elementwise on arrays. This eliminates the need for explicit element-by-element loops:

In [79]:
z = np.array([1, 2, 3])
np.sin(z)

array([ 0.84147098,  0.90929743,  0.14112001])
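
For comparison, this is roughly what the explicit loop would look like using the standard library's ```math.sin```. It is a sketch only; the names `looped` and `vectorized` are illustrative.

```python
import math
import numpy as np

z = np.array([1, 2, 3])

# Explicit loop: apply math.sin to one element at a time
looped = np.empty(len(z))
for i in range(len(z)):
    looped[i] = math.sin(z[i])

# Vectorized: one call acting on the whole array
vectorized = np.sin(z)

print(np.allclose(looped, vectorized))   # True
```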

TIP 31 - COMPARISONS
----------

Comparisons between arrays are done elementwise:

In [4]:
z = np.array([2, 3])
y = np.array([2, 3])
z == y

array([ True,  True], dtype=bool)

We can also do comparisons against scalars:

In [5]:
z = np.linspace(0, 10, 5)
z

array([  0. ,   2.5,   5. ,   7.5,  10. ])

In [6]:
z > 3

array([False, False,  True,  True,  True], dtype=bool)

In [7]:
b = z > 3
b

array([False, False,  True,  True,  True], dtype=bool)

In [8]:
z[b]

array([  5. ,   7.5,  10. ])

This can also be performed in a single step:

In [9]:
z[z > 3]

array([  5. ,   7.5,  10. ])
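
Boolean arrays can be combined and reused in a few other ways. The sketch below assumes the elementwise operator ```&``` for "and" (the parentheses are needed because ```&``` binds more tightly than the comparisons) and shows counting and assignment through a mask.

```python
import numpy as np

z = np.linspace(0, 10, 5)        # [0, 2.5, 5, 7.5, 10]

mask = (z > 3) & (z < 9)         # elementwise "and" of two conditions
selected = z[mask]               # elements strictly between 3 and 9
count = (z > 3).sum()            # True counts as 1, so this counts matches

z[z > 3] = 0.0                   # assign to the selected elements in place

print(selected)                  # [5.  7.5]
print(count)                     # 3
print(z)                         # only 0 and 2.5 remain nonzero candidates
```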

TIP 32 - SUBPACKAGES
--------------

We've seen how to generate random variables using ```np.random```. This fits into a more general framework in which NumPy provides functionality through subpackages.

In [10]:
z = np.random.randn(10000)  # Generate standard normals
y = np.random.binomial(10, 0.5, size=1000)    # 1,000 draws from Bin(10, 0.5)
y.mean()

4.9649999999999999
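
Newer NumPy releases (1.17 and later) also offer a generator object created by ```np.random.default_rng```, which makes it easy to reproduce a sequence of draws from a seed. The sketch below assumes such a version is installed; the seed value 1234 is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1234)          # seeded generator (seed chosen arbitrarily)
z = rng.standard_normal(10000)             # standard normals
y = rng.binomial(10, 0.5, size=1000)       # 1,000 draws from Bin(10, 0.5)

print(y.mean())                            # close to the theoretical mean of 5

# A second generator with the same seed reproduces the same draws
rng_again = np.random.default_rng(1234)
z_again = rng_again.standard_normal(10000)
print(np.array_equal(z, z_again))          # True
```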

Another commonly used subpackage is ```np.linalg```, which provides linear algebra routines such as the determinant and the inverse of a matrix.

In [11]:
A = np.array([[1, 2], [3, 4]])

np.linalg.det(A)           # Compute the determinant

-2.0000000000000004

In [12]:
np.linalg.inv(A)           # Compute the inverse

array([[-2. ,  1. ],
       [ 1.5, -0.5]])
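
A quick way to check an inverse is to multiply it back against the original matrix and compare with the identity, allowing for rounding error. The sketch below also uses ```np.linalg.solve```, which solves a linear system directly without forming the inverse; the right-hand side `b` is an illustrative choice.

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
A_inv = np.linalg.inv(A)

# A times its inverse should equal the identity, up to rounding error
print(np.allclose(A @ A_inv, np.eye(2)))   # True

# Solve A x = b directly
b = np.array([5, 6])
x = np.linalg.solve(A, b)
print(np.allclose(A @ x, b))               # True
```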