# NumPy Arrays

NumPy (module ``numpy``) provides an array data type with vectorized operations, similar to those in MATLAB or IDL.

In [1]:
import numpy as np

The ``numpy`` module contains a number of functions for generating common arrays. Here we create two NumPy arrays containing 5 elements each:

In [2]:
x = np.arange(5)
x

array([0, 1, 2, 3, 4])

In [3]:
y = np.ones(5)
y

array([ 1.,  1.,  1.,  1.,  1.])
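``np.arange`` and ``np.ones`` are only two of these generators. The sketch below shows a few others that NumPy provides (``np.zeros``, ``np.full``, ``np.linspace``, and ``np.arange`` with a step); the variable names are just illustrative.

```python
import numpy as np

zeros = np.zeros(3)           # three 0.0 values (float64 by default)
full = np.full(3, 7)          # three copies of 7
evens = np.arange(0, 10, 2)   # start, stop (exclusive), step -> 0, 2, 4, 6, 8
grid = np.linspace(0, 1, 5)   # 5 evenly spaced points, both endpoints included

print(zeros.tolist())   # [0.0, 0.0, 0.0]
print(full.tolist())    # [7, 7, 7]
print(evens.tolist())   # [0, 2, 4, 6, 8]
print(grid.tolist())    # [0.0, 0.25, 0.5, 0.75, 1.0]
```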

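Operations are vectorized: arithmetic on arrays is applied element by element, so we can write it just as we would with scalar variables, as long as the array shapes are compatible. Scalars such as ``0.005`` and ``3`` below are applied to every element.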

In [4]:
x - (y+0.005) * 3

array([-3.015, -2.015, -1.015, -0.015,  0.985])
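To make "vectorized" concrete, the following sketch (re-creating the ``x`` and ``y`` from above) checks that the array expression gives the same values as an explicit element-by-element loop. It also shows what happens when the shapes are not compatible: strictly, NumPy requires shapes that are compatible under its broadcasting rules, and arrays of lengths 5 and 3 are not.

```python
import numpy as np

x = np.arange(5)
y = np.ones(5)

# One vectorized expression...
vectorized = x - (y + 0.005) * 3
# ...matches the same formula applied element by element in a Python loop
looped = [x[i] - (y[i] + 0.005) * 3 for i in range(len(x))]
print(np.allclose(vectorized, looped))  # True

# Shapes (5,) and (3,) cannot be broadcast together, so NumPy raises ValueError
shape_error = None
try:
    x + np.ones(3)
except ValueError as err:
    shape_error = err
print(shape_error)
```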

If you put objects of different types into an array, NumPy will pick a single data type that can hold them all.

The results might not be quite what you expect!

In [5]:
np.array([3,3,"string",5,5])

array(['3', '3', 'string', '5', '5'], 
      dtype='<U21')

In [6]:
_[3] * 5

'55555'
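In the example above every element became a Unicode string, so ``* 5`` meant string repetition rather than multiplication. A small sketch of how to inspect that and, where it makes sense, convert the numeric entries back explicitly (the index list ``[0, 1, 3, 4]`` simply skips the ``"string"`` entry):

```python
import numpy as np

mixed = np.array([3, 3, "string", 5, 5])

# dtype kind 'U' means every element is stored as a Unicode string
print(mixed.dtype.kind)   # U
print(mixed[3] * 5)       # 55555 (string repetition)

# Select the numeric-looking entries and convert them to integers
numbers = mixed[[0, 1, 3, 4]].astype(int)
print((numbers * 5).tolist())  # [15, 15, 25, 25]
```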

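More sensibly, when all the values are numeric, NumPy chooses a type that avoids losing precision; mixing integers and floats, for example, gives a float array. You can also request a type explicitly with the ``dtype`` argument, as below. Note that ``np.float128`` (extended precision) is not available on every platform, and that the printed output is rounded for display.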

In [9]:
z = np.array([5,6.66666666666,7,8,9], dtype=np.float128)
z

array([ 5.0,  6.6666667,  7.0,  8.0,  9.0], dtype=float128)
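The cell above passes ``dtype`` explicitly, so it shows a requested type rather than one NumPy chose. This sketch leaves out ``dtype`` to show the automatic choice, and uses ``np.result_type`` to ask NumPy which type a mix of dtypes would be promoted to.

```python
import numpy as np

# No dtype given: the integers are upcast to float64 so 6.66666666666 is kept
auto = np.array([5, 6.66666666666, 7, 8, 9])
print(auto.dtype)                  # float64

# Printing rounds for display, but the stored value is unchanged
print(auto[1] == 6.66666666666)    # True

# float32 can hold every int16 exactly, but not every int32
print(np.result_type(np.int16, np.float32))  # float32
print(np.result_type(np.int32, np.float32))  # float64
```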

NumPy arrays support many of the same operations as ordinary Python lists, such as indexing, slicing, iteration, and built-ins like ``sorted``:

In [7]:
sorted(x - y * 3)

[-3.0, -2.0, -1.0, 0.0, 1.0]
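A few more list-style operations on an array, as a quick sketch. Two differences from lists are worth knowing: the built-in ``sorted`` returns a plain ``list``, while ``np.sort`` returns an array, and ``+`` adds arrays element by element instead of concatenating them.

```python
import numpy as np

x = np.arange(5)

print(len(x))        # 5
print(x[-1])         # 4 (negative indexing)
print(x[1:3])        # [1 2] (slicing)
print(3 in x)        # True (membership test)

print(type(sorted(x)))    # <class 'list'>
print(type(np.sort(x)))   # <class 'numpy.ndarray'>

# + is elementwise addition for arrays, but concatenation for lists
print(x + x)                              # [0 2 4 6 8]
print(list(range(5)) + list(range(5)))    # [0, 1, 2, 3, 4, 0, 1, 2, 3, 4]
```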

...except that the data types must match: a NumPy array holds values of a single data type (its ``dtype``).

* This allows the values to be packed contiguously in memory, like a C array.

In [8]:
y.dtype

dtype('float64')
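The fixed ``dtype`` is what makes the compact layout possible: every element occupies the same number of bytes. The sketch below reads that off the ``itemsize``, ``nbytes``, and ``flags`` attributes, and shows that a value assigned into an integer array is converted to the array's type (here a float is truncated).

```python
import numpy as np

y = np.ones(5)
# 5 float64 elements x 8 bytes each = 40 bytes of data
print(y.dtype, y.itemsize, y.nbytes)   # float64 8 40
print(y.flags["C_CONTIGUOUS"])         # True

x = np.arange(5)
x[0] = 2.7      # converted to x's integer dtype
print(x[0])     # 2
```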

## Speed comparison

Math with NumPy arrays is typically much faster, and more concise, than the equivalent pure-Python loops, because the loop over elements runs in compiled code instead of in the Python interpreter.

Consider computing $c_i = 1.324\cdot a_i - 12.99\cdot b_i + 1$ for every element $i$ of two equal-length sequences $a$ and $b$.

In pure Python we would define:

In [10]:
def py_add(a, b):
    # Visit the elements one at a time in the Python interpreter
    assert len(a) == len(b), "a and b must have the same length"
    c = [0] * len(a)
    for i in range(len(a)):
        c[i] = 1.324 * a[i] - 12.99 * b[i] + 1
    return c

Using NumPy we could instead define:

In [11]:
def np_add(a, b):
    return 1.324 * a - 12.99 * b + 1
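One caveat: ``np_add`` relies on its inputs already being NumPy arrays. With plain Python lists, ``1.324 * [1, 2]`` raises ``TypeError``, because a list can only be repeated by an integer. A minimal sketch of a variant that converts its inputs first with ``np.asarray``; the name ``np_add_any`` is hypothetical and not part of the original notebook.

```python
import numpy as np

def np_add(a, b):
    return 1.324 * a - 12.99 * b + 1

# Plain lists fail: float * list is not defined
lists_failed = False
try:
    np_add([1, 2], [3, 4])
except TypeError:
    lists_failed = True
print(lists_failed)  # True

# Hypothetical variant: convert any array-like input to a float array first
def np_add_any(a, b):
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return 1.324 * a - 12.99 * b + 1

print(np_add_any([1, 2], [3, 4]))
```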

Now let's create a couple of very large arrays to work with:

In [12]:
a = np.arange(1000000)
b = np.random.randn(1000000)
len(a)

1000000
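``np.random.randn`` belongs to NumPy's legacy random interface. Newer code usually draws from a ``Generator`` created with ``np.random.default_rng`` (NumPy 1.17 and later); passing a ``seed`` makes the numbers reproducible. A sketch, with the caveat that the values will not match the ``randn`` output shown in this notebook:

```python
import numpy as np

rng = np.random.default_rng(seed=42)    # seeded generator
b = rng.standard_normal(1_000_000)      # standard normal samples, like randn
print(b.shape)                          # (1000000,)

# The same seed reproduces exactly the same numbers
b_again = np.random.default_rng(seed=42).standard_normal(1_000_000)
print(np.array_equal(b, b_again))       # True
```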

In [13]:
b[0:20]

array([ -1.04038635e+00,  -1.12557264e+00,  -3.35934524e-01,
        -1.05243357e+00,   7.79713901e-01,   1.62961031e+00,
         7.28904392e-01,   2.18205762e+00,   1.85499487e+00,
        -8.25888753e-02,   5.25834259e-02,  -2.05213712e+00,
         4.66048496e-01,  -1.48532950e+00,  -2.04051704e-03,
        -1.12309195e+00,   3.76113414e-02,  -3.35193490e-02,
         1.04619624e+00,  -1.71511853e+00])

Use the IPython magic command ``%timeit`` to measure the performance of both approaches.

In [14]:
%timeit py_add(a,b)

1 loop, best of 3: 3.31 s per loop


In [15]:
%timeit np_add(a,b)

100 loops, best of 3: 9.27 ms per loop
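In the run above, NumPy was roughly 350 times faster (3.31 s versus 9.27 ms per call). Outside IPython, the standard-library ``timeit`` module can make the same comparison. The sketch below also checks that both functions compute the same values, and uses smaller arrays (100,000 elements) to keep it quick; the exact timings will depend on your machine.

```python
import timeit
import numpy as np

def py_add(a, b):
    assert len(a) == len(b), "a and b must have the same length"
    c = [0] * len(a)
    for i in range(len(a)):
        c[i] = 1.324 * a[i] - 12.99 * b[i] + 1
    return c

def np_add(a, b):
    return 1.324 * a - 12.99 * b + 1

a = np.arange(100_000)
b = np.random.default_rng(0).standard_normal(100_000)

# Both implementations should agree (up to floating-point rounding)
same = np.allclose(py_add(a, b), np_add(a, b))

# Best of 3 repeats, one call each
py_time = min(timeit.repeat(lambda: py_add(a, b), repeat=3, number=1))
np_time = min(timeit.repeat(lambda: np_add(a, b), repeat=3, number=1))
print(f"pure Python: {py_time * 1e3:.1f} ms, NumPy: {np_time * 1e3:.2f} ms")
```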
