# NumPy Arrays

NumPy (module ``numpy``) provides an array data type with vectorized operations, similar to MATLAB or IDL.

In [1]:
import numpy as np

Create two NumPy arrays containing 5 elements each. The ``numpy`` module contains a number of functions for generating common arrays:

In [9]:
x = np.arange(5)
x

array([0, 1, 2, 3, 4])

In [10]:
y = np.ones(5)
y

array([ 1.,  1.,  1.,  1.,  1.])
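A few other constructors follow the same pattern. The sketch below is an illustration rather than part of the original session; the variable names (`zeros`, `evens`, `grid`, `ident`) are chosen here only for the example.

```python
import numpy as np

zeros = np.zeros(5)               # five 0.0 values, float64 by default
evens = np.arange(0, 10, 2)       # start, stop (exclusive), step
grid = np.linspace(0.0, 1.0, 5)   # 5 evenly spaced points, both endpoints included
ident = np.eye(3)                 # 3x3 identity matrix

print(zeros)
print(evens)
print(grid)
print(ident.shape)
```

Note that ``np.arange`` excludes the stop value while ``np.linspace`` includes it by default.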

Operations are vectorized, so we can do arithmetic with arrays (as long as the dimensions match!) as we would with scalar variables.

In [20]:
x - y * 3

array([-3., -2., -1.,  0.,  1.])
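To make the "dimensions match" condition concrete, here is a minimal sketch. The array `z` and the names `same_shape`, `with_scalar`, and `mismatch_error` are hypothetical and introduced only for this illustration.

```python
import numpy as np

x = np.arange(5)      # shape (5,)
y = np.ones(5)        # shape (5,)
z = np.ones(4)        # shape (4,), incompatible with x

same_shape = x + y    # element-wise addition of equal-length arrays
with_scalar = x * 2   # a scalar is applied to every element

try:
    x + z             # lengths 5 and 4 cannot be combined
    mismatch_error = None
except ValueError as err:
    mismatch_error = err

print(same_shape)
print(with_scalar)
print(type(mismatch_error).__name__)
```

Arrays of equal shape combine element by element, a scalar combines with any array, and mismatched lengths raise a ``ValueError`` instead of silently truncating.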

If the input mixes types, NumPy converts every element to a single common type, here a string:

In [17]:
np.array([3,3,"string",5,5])

array(['3', '3', 'string', '5', '5'], 
      dtype='|S21')

In [19]:
_17[3]

'5'

NumPy arrays support many of the same operations as ordinary Python lists:

In [21]:
sorted(x - y * 3)

[-3.0, -2.0, -1.0, 0.0, 1.0]

...except the data type must match! A NumPy array only holds values of a single data type.

* This allows the values to be packed efficiently in memory, like C arrays.

In [22]:
x.dtype

dtype('int64')
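The single-type rule also applies to numeric input: mixing integers and floats produces a float array. The following sketch assumes nothing beyond ``numpy`` itself; the names `ints`, `mixed`, and `as_float` are illustrative.

```python
import numpy as np

ints = np.array([1, 2, 3])
mixed = np.array([1, 2.5, 3])        # the integers are upcast to float
as_float = ints.astype(np.float64)   # explicit conversion returns a new array

print(ints.dtype.kind)    # 'i' for integer; the exact width depends on the platform
print(mixed.dtype)
print(as_float.dtype)
```

``astype`` leaves the original array unchanged, so `ints` keeps its integer type after the conversion.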

## Speed comparison

Math with NumPy arrays is much faster and more intuitive than the equivalent native Python operations.

Consider the function $y = 1.324\cdot a - 12.99\cdot b + 1$.

In pure Python we would define:

In [23]:
def py_add(a, b):
    c = []
    for i in range(len(a)):  # range works in both Python 2 and Python 3
        c.append(1.324 * a[i] - 12.99*b[i] + 1)
    return c

Using NumPy we could instead define:

In [24]:
def np_add(a, b):
    return 1.324 * a - 12.99 * b + 1

Now let's create a couple of very large arrays to work with:

In [25]:
a = np.arange(1e6)
b = np.random.randn(int(1e6))  # randn expects integer dimensions
len(a)

1000000

In [27]:
b[0:20]

array([ 0.49883948, -0.37267717,  0.46815053, -0.33360697,  0.4578345 ,
        1.09583529, -1.0082755 ,  0.15779526,  0.80375381,  0.42321135,
       -1.16654497, -0.85785497,  1.6719712 ,  0.46119872, -0.14166226,
       -1.16675487, -0.7568906 , -0.59884587,  0.62254346,  0.34340959])

Use the magic function ``%timeit`` to test the performance of both approaches.

In [28]:
%timeit py_add(a,b)

1 loop, best of 3: 855 ms per loop


In [29]:
%timeit np_add(a,b)

100 loops, best of 3: 10.1 ms per loop
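Outside IPython, where ``%timeit`` is not available, a similar comparison can be sketched with the standard-library ``timeit`` module. This is an illustration under stated assumptions, not the notebook's own measurement: the array size `n` is reduced to keep the run short, and absolute timings will vary by machine.

```python
import timeit

import numpy as np


def py_add(a, b):
    # Pure-Python version: one interpreted loop iteration per element.
    return [1.324 * a[i] - 12.99 * b[i] + 1 for i in range(len(a))]


def np_add(a, b):
    # Vectorized version: the loop runs inside NumPy's compiled code.
    return 1.324 * a - 12.99 * b + 1


n = 100000  # assumption: smaller than the 1e6 elements used above
a = np.arange(n, dtype=float)
b = np.random.randn(n)

# Both implementations should agree to floating-point tolerance.
results_match = np.allclose(py_add(a, b), np_add(a, b))

# Best of three single runs for each approach.
py_time = min(timeit.repeat(lambda: py_add(a, b), number=1, repeat=3))
np_time = min(timeit.repeat(lambda: np_add(a, b), number=1, repeat=3))

print("results match:", results_match)
print("pure Python: %.4f s" % py_time)
print("NumPy:       %.4f s" % np_time)
```

Taking the minimum of several repeats reduces the influence of background activity on the measurement, which mirrors the "best of 3" figure that ``%timeit`` reports above.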
