> This is one of the 100 recipes of the [IPython Cookbook](http://ipython-books.github.io/), the definitive guide to high-performance scientific computing and data science in Python.


# 4.8. Making efficient selections in arrays with NumPy

In [1]:
import numpy as np

In [2]:
# Memory address of the array's data buffer (note: this shadows the built-in id).
id = lambda x: x.__array_interface__['data'][0]

We create a large array (100,000 x 100 double-precision values, about 80 MB).

In [3]:
n, d = 100000, 100

In [5]:
a = np.random.random_sample((n, d))
aid = id(a)
print(a)

[[ 0.55018576  0.5077209   0.32207054 ...,  0.1759042   0.40456074
   0.72045229]
 [ 0.49654852  0.10445829  0.35981761 ...,  0.81583951  0.64972485
   0.76342936]
 [ 0.3679865   0.5734621   0.05725147 ...,  0.88270858  0.95325079
   0.84138566]
 ..., 
 [ 0.93199799  0.25924236  0.75891025 ...,  0.78919081  0.46252381
   0.42590794]
 [ 0.39260158  0.02782338  0.61697793 ...,  0.71679825  0.86234797
   0.82528389]
 [ 0.18653985  0.60269072  0.74888681 ...,  0.73380749  0.67840095
   0.21951377]]


## Array views and fancy indexing

We select every tenth row using two different methods: with a slice, which returns a view, and with fancy indexing.

In [None]:
b1 = a[::10]
b2 = a[np.arange(0, n, 10)]

In [None]:
np.array_equal(b1, b2)

The view refers to the original data buffer, whereas fancy indexing yields a copy.

In [None]:
id(b1) == aid, id(b2) == aid
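Comparing buffer addresses works here because the slice starts at the first element of a. A slice that starts elsewhere is still a view, but its address differs. The following is a minimal sketch of a more general check, assuming a NumPy version that provides np.shares_memory; the names data_address, v0, v1 and c are illustrative and not part of the recipe.

```python
import numpy as np

def data_address(x):
    # Address of the first element of the array's data buffer.
    return x.__array_interface__['data'][0]

a = np.random.random_sample((1000, 10))
v0 = a[::10]                    # view starting at the first row
v1 = a[1::10]                   # view starting at the second row
c = a[np.arange(0, 1000, 10)]   # fancy indexing: a copy

same_start = data_address(v0) == data_address(a)   # same starting address
offset_view = data_address(v1) == data_address(a)  # different address, still a view
shares_view = np.shares_memory(a, v1)              # the view overlaps a's buffer
shares_copy = np.shares_memory(a, c)               # the copy has its own buffer

print(same_start, offset_view, shares_view, shares_copy)
```

The address comparison only detects views that begin at the same element, whereas np.shares_memory reports any overlap between the two buffers.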

Fancy indexing is several orders of magnitude slower here because it copies a large array, whereas slicing only creates a new view on the existing buffer. In return, fancy indexing is more general: it can select any portion of an array (using any list of indices), not just a strided selection.

In [None]:
%timeit a[::10]

In [None]:
%timeit a[np.arange(0, n, 10)]
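To make the trade-off concrete, here is a small sketch on a toy array (the array and the names picked and view are illustrative assumptions, not part of the recipe). It shows that fancy indexing accepts an arbitrary list of rows and returns a copy, while a strided slice returns a view onto the original data.

```python
import numpy as np

a = np.arange(12).reshape(4, 3)

# Fancy indexing accepts any list of row indices: arbitrary order, repeats allowed.
picked = a[[3, 0, 0, 2]]

# The result is a copy, so writing to it leaves `a` untouched.
picked[0, 0] = -1

# A strided slice is a view, so writing to it changes `a` itself.
view = a[::2]
view[0, 0] = 100

print(a)
print(picked)
```

The copy is what makes fancy indexing expensive on large arrays, and it is also what makes the result safe to modify independently of the source.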

## Alternatives to fancy indexing: list of indices

Given a list of indices, there are two ways of selecting the corresponding sub-array: fancy indexing, or the np.take function.

In [None]:
i = np.arange(0, n, 10)

In [None]:
b1 = a[i]
b2 = np.take(a, i, axis=0)

In [None]:
np.array_equal(b1, b2)

In [None]:
%timeit a[i]

In [None]:
%timeit np.take(a, i, axis=0)

In this benchmark, np.take is faster than fancy indexing.

**Note**: Performance of fancy indexing has been improved in recent versions of NumPy; this trick is especially useful on older versions of NumPy.
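Since the gap depends on the NumPy version, it may be worth measuring on your own installation. The %timeit magic is only available in IPython, so the sketch below uses the standard timeit module instead. The smaller array size and the number of repetitions are arbitrary choices made to keep the run short, and no particular result is implied.

```python
import timeit
import numpy as np

# Smaller array than in the recipe so that the sketch runs quickly (assumed sizes).
n, d = 10000, 100
a = np.random.random_sample((n, d))
i = np.arange(0, n, 10)

runs = 100
# Total time in seconds for `runs` executions of each selection.
t_fancy = timeit.timeit(lambda: a[i], number=runs)
t_take = timeit.timeit(lambda: np.take(a, i, axis=0), number=runs)

print("fancy indexing: %.1f us per loop" % (t_fancy / runs * 1e6))
print("np.take:        %.1f us per loop" % (t_take / runs * 1e6))
print("NumPy version:  %s" % np.__version__)
```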

## Alternatives to fancy indexing: mask of booleans

Let's create a mask of booleans, where each value indicates whether the corresponding row of a needs to be selected.

In [None]:
i = np.random.random_sample(n) < .5

The selection can be made using fancy indexing or the np.compress function.

In [None]:
b1 = a[i]
b2 = np.compress(i, a, axis=0)

In [None]:
np.array_equal(b1, b2)

In [None]:
%timeit a[i]

In [None]:
%timeit np.compress(i, a, axis=0)

Once again, the alternative to fancy indexing (here, np.compress) is faster.
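The two approaches covered in this recipe can also be combined: a boolean mask can be converted to a list of indices with np.flatnonzero and then passed to np.take. The sketch below checks on a small array that the three selections agree; the seed, the sizes and the names mask, b1, b2 and b3 are illustrative assumptions, and it makes no claim about which variant is fastest.

```python
import numpy as np

rng = np.random.RandomState(0)  # fixed seed, only for reproducibility
a = rng.random_sample((1000, 5))
mask = rng.random_sample(1000) < .5

b1 = a[mask]                                   # boolean fancy indexing
b2 = np.compress(mask, a, axis=0)              # np.compress with the same mask
b3 = np.take(a, np.flatnonzero(mask), axis=0)  # mask converted to indices

# Each result has one row per True value in the mask.
print(b1.shape, b2.shape, b3.shape, mask.sum())
```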

> [IPython Cookbook](http://ipython-books.github.io/), by [Cyrille Rossant](http://cyrille.rossant.net), Packt Publishing, 2014 (500 pages).