# Array-Oriented Programming

Using NumPy arrays enables you to express many kinds of data processing tasks as concise array expressions, which removes the need to write explicit loops. This practice of replacing loops with array expressions is called *vectorization*. In general, vectorized array operations are often one or two orders of magnitude faster than their pure Python equivalents, with the biggest impact in numerical computations.
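
As a minimal sketch of what vectorization means in practice, the snippet below computes the same result twice: once with an explicit loop and once with a single array expression. The array `values` and the formula `x ** 2 + 1` are illustrative assumptions, not part of the example that follows.

```python
import numpy as np

values = np.arange(5, dtype=float)  # hypothetical sample data: 0.0 .. 4.0

# Loop version: every element is handled in interpreted Python code.
looped = np.empty_like(values)
for i in range(len(values)):
    looped[i] = values[i] ** 2 + 1.0

# Vectorized version: one array expression, no explicit loop.
vectorized = values ** 2 + 1.0

print(np.array_equal(looped, vectorized))
```

Both versions produce the same array; the vectorized form is shorter and pushes the per-element work into NumPy's compiled code.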

As an example, let's evaluate the function `sqrt(x^2 + y^2)` across a regular grid of values.

In [67]:
import numpy as np

In [68]:
points = np.arange(-5, 5, 0.01)

Here, `points` holds 1000 equally spaced values. The `meshgrid()` function takes two 1D arrays and produces two 2D arrays that together contain every (x, y) pair drawn from the two inputs:

In [69]:
xs, ys = np.meshgrid(points, points)

In [70]:
xs

array([[-5.  , -4.99, -4.98, ...,  4.97,  4.98,  4.99],
       [-5.  , -4.99, -4.98, ...,  4.97,  4.98,  4.99],
       [-5.  , -4.99, -4.98, ...,  4.97,  4.98,  4.99],
       ...,
       [-5.  , -4.99, -4.98, ...,  4.97,  4.98,  4.99],
       [-5.  , -4.99, -4.98, ...,  4.97,  4.98,  4.99],
       [-5.  , -4.99, -4.98, ...,  4.97,  4.98,  4.99]])
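
The 1000 × 1000 output is hard to read at a glance, so here is a small sketch of the same call on tiny inputs. The arrays `x` and `y` are hypothetical and chosen only to make the layout visible.

```python
import numpy as np

x = np.array([0, 1, 2])  # hypothetical x coordinates
y = np.array([10, 20])   # hypothetical y coordinates

gx, gy = np.meshgrid(x, y)

# With the default indexing='xy', both outputs have shape (len(y), len(x)).
# gx repeats x along every row; gy repeats each y value across its row.
print(gx)
print(gy)
```

Pairing `gx[i, j]` with `gy[i, j]` gives one grid point, which is why an expression written in terms of the two arrays is evaluated at every point of the grid.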

Now, evaluating the function is a matter of writing the same expression you would write with two points:

In [71]:
%time z = np.sqrt(xs ** 2 + ys ** 2)

CPU times: user 0 ns, sys: 222 ms, total: 222 ms
Wall time: 233 ms


In [72]:
z

array([[7.07106781, 7.06400028, 7.05693985, ..., 7.04988652, 7.05693985,
        7.06400028],
       [7.06400028, 7.05692568, 7.04985815, ..., 7.04279774, 7.04985815,
        7.05692568],
       [7.05693985, 7.04985815, 7.04278354, ..., 7.03571603, 7.04278354,
        7.04985815],
       ...,
       [7.04988652, 7.04279774, 7.03571603, ..., 7.0286414 , 7.03571603,
        7.04279774],
       [7.05693985, 7.04985815, 7.04278354, ..., 7.03571603, 7.04278354,
        7.04985815],
       [7.06400028, 7.05692568, 7.04985815, ..., 7.04279774, 7.04985815,
        7.05692568]])

## Conditional Logic as Array Operations

The `numpy.where` function is a vectorized version of the ternary expression `x if condition else y`. Suppose we have a boolean array and two arrays of values:

In [73]:
xarr = np.arange(15) + 0.1
yarr = np.arange(15) + 0.2

In [74]:
# Random boolean mask with one entry per element of xarr and yarr
mask = np.random.choice([True, False], size=15)
mask

array([ True, False,  True,  True,  True,  True, False, False, False,
       False,  True, False, False, False, False])

Suppose we want to take a value from `xarr` whenever the corresponding value in `mask` is True, and otherwise take the value from `yarr`. A list comprehension doing this might look like:

In [75]:
%timeit [(x if m else y) for m, x, y in zip(mask, xarr, yarr)]

11.3 µs ± 326 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)


Although this works, it has two problems. First, it will not be very fast for large arrays, because all the work is done in interpreted Python code. Second, it will not work with multidimensional arrays unless you add nested loops. With `where()` we can do this more concisely:

In [76]:
%timeit np.where(mask, xarr, yarr)

2.68 µs ± 28.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
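
To illustrate the point about multidimensional arrays, the sketch below compares a nested list comprehension with a single `np.where` call on 2D inputs. The arrays `cond`, `left`, and `right` are hypothetical and exist only for this illustration.

```python
import numpy as np

cond = np.array([[True, False], [False, True]])  # hypothetical 2x2 mask
left = np.array([[1.0, 2.0], [3.0, 4.0]])
right = np.array([[-1.0, -2.0], [-3.0, -4.0]])

# Pure-Python equivalent: one comprehension level per array dimension.
nested = [[(l if c else r) for c, l, r in zip(c_row, l_row, r_row)]
          for c_row, l_row, r_row in zip(cond, left, right)]

# np.where uses the same call regardless of the number of dimensions.
picked = np.where(cond, left, right)

print(np.array_equal(np.array(nested), picked))
```

The comprehension has to grow by one level for each extra dimension, while the `np.where` call stays unchanged.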


In this measurement, our list comprehension is about 4 times slower than the vectorized version. The second and third arguments to `where()` don't need to be arrays; one or both of them can be scalars. A useful application of this is masking data. Suppose we have a dataset in which we want to replace all negative values with 0. This is easy with `where()`:

In [79]:
a = np.random.randn(10, 10)

In [81]:
a

array([[-2.20302665, -0.61390337, -2.54058693, -0.72424268,  1.00206805,
        -0.72982098, -0.45723666,  1.67182593,  1.01264522,  0.05765577],
       [ 0.39698014, -0.14967381, -2.10907265, -0.03142134,  0.10020257,
        -0.53826631, -1.33109636,  0.12893053,  2.06430783, -1.50428213],
       [-0.6605047 ,  0.93245544, -1.43171449,  1.68121844,  0.7620569 ,
        -0.8316734 ,  0.40425373, -0.77703922, -0.25853696, -0.42799875],
       [-0.86045771,  0.5884398 , -1.4829827 ,  0.38073089,  0.32761923,
        -0.68931152, -0.15839306, -1.70489105,  0.23393342, -0.06584003],
       [-1.34040182,  0.89438112,  0.65813786, -0.10964945, -0.7760108 ,
        -0.54210343,  0.9078766 , -0.28270344,  0.10127039,  1.95625879],
       [ 0.78618308,  0.97840891,  0.22726628, -0.81492472, -1.20414097,
        -0.56786495, -0.77759112,  0.66347548,  1.81336315, -0.45029782],
       [-0.43237719,  0.5403987 ,  0.87390357, -0.68445767,  0.10901413,
         0.37443348,  0.75627998,  1.93014442

In [83]:
np.where(a < 0, 0, a)

array([[0.        , 0.        , 0.        , 0.        , 1.00206805,
        0.        , 0.        , 1.67182593, 1.01264522, 0.05765577],
       [0.39698014, 0.        , 0.        , 0.        , 0.10020257,
        0.        , 0.        , 0.12893053, 2.06430783, 0.        ],
       [0.        , 0.93245544, 0.        , 1.68121844, 0.7620569 ,
        0.        , 0.40425373, 0.        , 0.        , 0.        ],
       [0.        , 0.5884398 , 0.        , 0.38073089, 0.32761923,
        0.        , 0.        , 0.        , 0.23393342, 0.        ],
       [0.        , 0.89438112, 0.65813786, 0.        , 0.        ,
        0.        , 0.9078766 , 0.        , 0.10127039, 1.95625879],
       [0.78618308, 0.97840891, 0.22726628, 0.        , 0.        ,
        0.        , 0.        , 0.66347548, 1.81336315, 0.        ],
       [0.        , 0.5403987 , 0.87390357, 0.        , 0.10901413,
        0.37443348, 0.75627998, 1.93014442, 0.        , 0.87575939],
       [0.        , 0.        , 0.0257978
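
The same pattern works when both replacement arguments are scalars, or when one is a scalar and the other an array. The following is a small sketch on a hypothetical array `data`; the thresholds and replacement values are assumptions chosen for illustration.

```python
import numpy as np

data = np.array([[-1.5, 0.3], [2.0, -0.7]])  # hypothetical values

# Both replacement arguments are scalars: label each entry by its sign.
signs = np.where(data > 0, 1, -1)

# One scalar and one array: cap values above 1.0, keep the rest unchanged.
capped = np.where(data > 1.0, 1.0, data)

print(signs)
print(capped)
```

In each case the scalar is broadcast against the condition array, so the result has the same shape as `data`.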

## Statistical Methods

A set of mathematical functions that compute statistics over an entire array, or over the data along one axis, is available as methods of the array class. You can use aggregations (reductions) such as `sum`, `mean`, and `std` either by calling the method on the array instance or by using the corresponding top-level NumPy function.

In [88]:
a

array([[-2.20302665, -0.61390337, -2.54058693, -0.72424268,  1.00206805,
        -0.72982098, -0.45723666,  1.67182593,  1.01264522,  0.05765577],
       [ 0.39698014, -0.14967381, -2.10907265, -0.03142134,  0.10020257,
        -0.53826631, -1.33109636,  0.12893053,  2.06430783, -1.50428213],
       [-0.6605047 ,  0.93245544, -1.43171449,  1.68121844,  0.7620569 ,
        -0.8316734 ,  0.40425373, -0.77703922, -0.25853696, -0.42799875],
       [-0.86045771,  0.5884398 , -1.4829827 ,  0.38073089,  0.32761923,
        -0.68931152, -0.15839306, -1.70489105,  0.23393342, -0.06584003],
       [-1.34040182,  0.89438112,  0.65813786, -0.10964945, -0.7760108 ,
        -0.54210343,  0.9078766 , -0.28270344,  0.10127039,  1.95625879],
       [ 0.78618308,  0.97840891,  0.22726628, -0.81492472, -1.20414097,
        -0.56786495, -0.77759112,  0.66347548,  1.81336315, -0.45029782],
       [-0.43237719,  0.5403987 ,  0.87390357, -0.68445767,  0.10901413,
         0.37443348,  0.75627998,  1.93014442

In [85]:
a.mean()

-0.1874701096167288

In [86]:
np.mean(a)

-0.1874701096167288

In [87]:
a.sum()

-18.74701096167288
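
The calls above reduce over the whole array. As a hedged sketch of the per-axis form mentioned at the start of this section, the snippet below passes the `axis` argument on a small hypothetical array `m`, so the results are easy to check by hand.

```python
import numpy as np

m = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])  # hypothetical 2x3 data

col_means = m.mean(axis=0)  # reduce over rows: one mean per column
row_sums = m.sum(axis=1)    # reduce over columns: one sum per row
overall_std = m.std()       # no axis: standard deviation of all elements

print(col_means)
print(row_sums)
print(np.isclose(overall_std, np.std(m)))  # method and function agree
```

Here `axis=0` collapses the rows and `axis=1` collapses the columns; omitting `axis` reduces over every element, as in the `a.mean()` and `a.sum()` calls above.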