# NumPy Arrays

NumPy (module ``numpy``) provides an array datatype with vectorized operations (similar to Matlab or IDL).

In [1]:
import numpy as np

Create two NumPy arrays containing 5 elements each. The ``numpy`` module contains a number of functions for generating common arrays:

In [2]:
x = np.arange(5)
x

array([0, 1, 2, 3, 4])

In [3]:
y = np.ones(5)
y

array([ 1.,  1.,  1.,  1.,  1.])

Operations are vectorized, so we can do arithmetic with arrays (as long as their shapes are compatible!) just as we would with scalar variables.

In [4]:
x - y * 3

array([-3., -2., -1.,  0.,  1.])
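
The shape caveat can be sketched as follows. This is a minimal illustration rather than part of the original notebook, and the array ``z`` is a hypothetical name introduced only for the example. Arrays of the same shape combine element by element, a scalar is applied to every element, and two 1-D arrays of different lengths (here 5 and 3) raise a ``ValueError``.

```python
import numpy as np

x = np.arange(5)        # array([0, 1, 2, 3, 4])
y = np.ones(5)          # five 1.0 values
z = np.ones(3)          # hypothetical array with a different length

same_shape = x + y      # element-wise: shapes (5,) and (5,) match
with_scalar = x * 2     # the scalar is applied to every element

try:
    x + z               # shapes (5,) and (3,) cannot be combined
    mismatch_raised = False
except ValueError:
    mismatch_raised = True
```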

NumPy arrays support many of the same sequence operations as ordinary Python lists:

In [5]:
sorted(x - y * 3)

[-3.0, -2.0, -1.0, 0.0, 1.0]
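
A few more list-style operations, as a rough sketch (the variable names below are illustrative assumptions, not from the original notebook). Length, indexing, slicing, membership tests, and iteration behave much as they do for lists. One difference worth keeping in mind is that ``+`` adds arrays element by element instead of concatenating them.

```python
import numpy as np

x = np.arange(5)

length = len(x)              # number of elements
first, last = x[0], x[-1]    # indexing, including negative indices
middle = x[1:4]              # slicing returns another array
has_three = 3 in x           # membership test
total = sum(v for v in x)    # arrays can be iterated like lists

# Unlike lists, + is element-wise addition, not concatenation
list_concat = [0, 1] + [2, 3]
array_sum = np.array([0, 1]) + np.array([2, 3])
```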

...except the data type must match! A NumPy array only holds values of a single data type.

* This allows them to be packed efficiently in memory, like C arrays.

In [6]:
x.dtype

dtype('int64')
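
The single-dtype rule has a few practical consequences, sketched below under the assumption of default NumPy settings (the names ``mixed``, ``ints``, and ``as_float`` are hypothetical). Mixing integers and floats at creation time yields a float array, assigning a float into an integer array drops the fractional part, and ``astype`` makes a converted copy.

```python
import numpy as np

# Integers and a float in one array: everything is stored as float
mixed = np.array([1, 2.5, 3])

# Assigning a float into an integer array truncates the value
ints = np.arange(5)
ints[0] = 7.9

# astype returns a new array with the requested dtype
as_float = ints.astype(float)

# Elements are fixed-size, so total memory is size * itemsize
packed = ints.nbytes == ints.size * ints.itemsize
```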

## Speed comparison

Math with NumPy arrays is typically much faster, and reads more naturally, than the equivalent native Python loops.

Consider the function $y = 1.324\cdot a - 12.99\cdot b + 1$

In pure Python we would define:

In [8]:
def py_add(a, b):
    c = []
    for i in range(len(a)):
        c.append(1.324 * a[i] - 12.99 * b[i] + 1)
    return c

Using NumPy we could instead define:

In [9]:
def np_add(a, b):
    return 1.324 * a - 12.99 * b + 1

Now let's create a couple of very large arrays to work with:

In [10]:
a = np.arange(1e6)
b = np.random.randn(int(1e6))  # the size must be an integer, not a float
len(a)

1000000

Use the magic function ``%timeit`` to test the performance of both approaches.

In [11]:
%timeit py_add(a,b)

1 loops, best of 3: 761 ms per loop


In [12]:
%timeit np_add(a,b)

100 loops, best of 3: 10.2 ms per loop
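
Outside IPython, where ``%timeit`` is not available, a similar comparison can be sketched with the standard-library ``timeit`` module. This is an illustrative sketch, not the notebook's benchmark: the array size ``n`` is an assumption chosen to keep the run short, and the timings will vary by machine. It also checks that both functions produce the same values.

```python
import timeit
import numpy as np

def py_add(a, b):
    # Pure-Python loop, one element at a time
    c = []
    for i in range(len(a)):
        c.append(1.324 * a[i] - 12.99 * b[i] + 1)
    return c

def np_add(a, b):
    # Vectorized: the loop runs in compiled code inside NumPy
    return 1.324 * a - 12.99 * b + 1

n = 100_000  # assumption: smaller than the notebook's 1e6 to keep this quick
a = np.arange(n, dtype=float)
b = np.random.randn(n)

# Both implementations should agree up to floating-point rounding
results_match = np.allclose(py_add(a, b), np_add(a, b))

# Total time for 3 calls of each function
t_py = timeit.timeit(lambda: py_add(a, b), number=3)
t_np = timeit.timeit(lambda: np_add(a, b), number=3)
print(f"pure Python: {t_py:.4f} s, NumPy: {t_np:.4f} s (3 calls each)")
```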
