> This is one of the 100 recipes of the [IPython Cookbook](http://ipython-books.github.io/), the definitive guide to high-performance scientific computing and data science in Python.


# 4.8. Making efficient selections in arrays with NumPy

In [1]:
import numpy as np

In [2]:
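# Return the memory address of the first byte of the array's data buffer.
# Note: this deliberately shadows Python's built-in id() for the rest of the recipe.
id = lambda x: x.__array_interface__['data'][0]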
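
The helper above compares arrays by the address of their first element. The sketch below, which uses a hypothetical helper name `data_address` and a small hypothetical array `demo`, suggests why that comparison identifies a view only when the view starts at the first element of its parent: a view that starts later has a shifted address even though it shares the same buffer.

```python
import numpy as np

def data_address(x):
    """Return the memory address of the first element of array x."""
    return x.__array_interface__['data'][0]

# Small illustrative array (not the large array used in the recipe).
demo = np.arange(10, dtype=np.int64)

# A strided view that starts at element 0 has the same starting address.
same_start = data_address(demo[::2]) == data_address(demo)

# A view that starts at element 1 is shifted by one item (8 bytes for int64).
offset = data_address(demo[1:]) - data_address(demo)

print(same_start)
print(offset)
```

For a check that does not depend on where the view starts, `np.shares_memory` is an alternative.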

We create a large array.

In [3]:
n, d = 100000, 100

In [19]:
a = np.random.random_sample((n, d)); aid = id(a)
print(a)
print(a.shape)

[[ 0.12244177  0.91476988  0.49651874 ...,  0.78331324  0.60397791
   0.2963022 ]
 [ 0.99868592  0.03647682  0.15306486 ...,  0.38934841  0.37463091
   0.17552922]
 [ 0.88705423  0.16283962  0.64356109 ...,  0.88209551  0.76812215
   0.94613389]
 ..., 
 [ 0.64943325  0.52114212  0.30639615 ...,  0.20489939  0.24337615
   0.32623384]
 [ 0.33298506  0.01995627  0.83498921 ...,  0.93481185  0.49159768
   0.95320871]
 [ 0.21199617  0.6825873   0.30943955 ...,  0.69251929  0.95378877
   0.84859493]]
(100000, 100)


## Array views and fancy indexing

We take a selection using two different methods: with a view and with fancy indexing.

In [20]:
b1 = a[::10]
b2 = a[np.arange(0, n, 10)]
print(np.arange(0, 10, 2))
print(type(np.arange(0, n, 10)))
print()
print(b1)
print(b1.shape)
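# Use a separate name so that the large array `a` is not overwritten.
small = np.array([111,2222,32,454,565,786,887,228,19,110])
print(small[[1,2]])  # a list triggers fancy indexing: elements 1 and 2
# A tuple is read as one index per axis, so this raises IndexError on a 1-D array.
print(small[(1,2)])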

[0 2 4 6 8]
<class 'numpy.ndarray'>

[[ 0.12244177  0.91476988  0.49651874 ...,  0.78331324  0.60397791
   0.2963022 ]
 [ 0.31361518  0.45538291  0.93550425 ...,  0.67722948  0.99149909
   0.17902664]
 [ 0.76310626  0.59070447  0.34235967 ...,  0.99969801  0.47123304
   0.36770397]
 ..., 
 [ 0.31553581  0.00408134  0.44312397 ...,  0.62774265  0.7027292
   0.97735481]
 [ 0.7737982   0.99797381  0.767959   ...,  0.32571516  0.54105122
   0.45112336]
 [ 0.17369569  0.04245214  0.68432632 ...,  0.85014477  0.86120262
   0.34316655]]
(10000, 100)
[2222   32]


IndexError: too many indices for array

In [None]:
np.array_equal(b1, b2)

The view refers to the original data buffer, whereas fancy indexing yields a copy.

In [None]:
id(b1) == aid, id(b2) == aid
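
As a complementary check, here is a minimal sketch on a small hypothetical array `demo`. It uses `np.shares_memory` and an in-place write to illustrate the difference between a view and a copy; the index list `[7, 2, 2]` is arbitrary and chosen only for illustration.

```python
import numpy as np

demo = np.arange(20).reshape(10, 2)

view = demo[::5]          # basic slicing returns a view
copy = demo[[7, 2, 2]]    # fancy indexing returns a copy (rows may repeat)

print(np.shares_memory(demo, view))  # True
print(np.shares_memory(demo, copy))  # False

# Writing through the view modifies the original array...
view[0, 0] = -1
print(demo[0, 0])  # -1

# ...whereas writing to the copy leaves the original untouched.
copy[0, 0] = -2
print(demo[7, 0])  # 14
```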

Fancy indexing is several orders of magnitude slower because it copies a large array, whereas creating a view copies no data. On the other hand, fancy indexing is more general: it can select any portion of an array (using any list of indices), not just a strided selection.

In [None]:
%timeit a[::10]

In [None]:
%timeit a[np.arange(0, n, 10)]
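
Outside IPython, where the `%timeit` magic is unavailable, a similar comparison can be sketched with the standard `timeit` module. The array here is smaller than the one in the recipe (an assumption made to keep the example light), and the variable names are illustrative; absolute timings depend on the machine and the NumPy version.

```python
import timeit
import numpy as np

# Smaller array than in the recipe, to keep the sketch quick to run.
n, d = 10000, 100
a = np.random.random_sample((n, d))
idx = np.arange(0, n, 10)

# Both expressions select the same rows.
same = np.array_equal(a[::10], a[idx])

# Total time in seconds for 100 repetitions of each selection.
t_view = timeit.timeit(lambda: a[::10], number=100)
t_fancy = timeit.timeit(lambda: a[idx], number=100)

print(same)
print("view:  %.6f s" % t_view)
print("fancy: %.6f s" % t_fancy)
```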

## Alternatives to fancy indexing: list of indices

Given a list of indices, there are two ways of selecting the corresponding sub-array: fancy indexing, or the np.take function.

In [None]:
i = np.arange(0, n, 10)

In [None]:
b1 = a[i]
b2 = np.take(a, i, axis=0)

In [None]:
np.array_equal(b1, b2)

In [None]:
%timeit a[i]

In [None]:
%timeit np.take(a, i, axis=0)

Using np.take instead of fancy indexing is faster.

**Note**: Performance of fancy indexing has been improved in recent versions of NumPy; this trick is especially useful on older versions of NumPy.
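
To make the correspondence between the two forms concrete, the following sketch checks a few equivalences on a small hypothetical array `demo`. The index lists are arbitrary examples.

```python
import numpy as np

demo = np.arange(12).reshape(3, 4)
rows = [2, 0]
cols = [3, 1, 1]

# Along axis 0, np.take matches fancy indexing on rows.
rows_equal = np.array_equal(np.take(demo, rows, axis=0), demo[rows])

# Along axis 1, np.take matches fancy indexing on columns.
cols_equal = np.array_equal(np.take(demo, cols, axis=1), demo[:, cols])

# Without an axis argument, np.take indexes the flattened array.
flat = np.take(demo, [0, 5])

print(rows_equal, cols_equal)
print(flat)
```

Note that omitting `axis` changes the meaning of the indices, which is why the recipe passes `axis=0` explicitly.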

## Alternatives to fancy indexing: mask of booleans

Let's create a mask of booleans, where each value indicates whether the corresponding row of a needs to be selected.

In [None]:
i = np.random.random_sample(n) < .5

The selection can be made using fancy indexing or the np.compress function.

In [None]:
b1 = a[i]
b2 = np.compress(i, a, axis=0)

In [None]:
np.array_equal(b1, b2)

In [None]:
%timeit a[i]

In [None]:
%timeit np.compress(i, a, axis=0)

Once again, the alternative method to fancy indexing is faster.
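
The sketch below, on a small hypothetical array `demo`, checks that boolean indexing and `np.compress` return the same rows. It also shows a third route, which is not part of the original recipe: converting the mask to integer indices with `np.flatnonzero` and passing them to `np.take`.

```python
import numpy as np

np.random.seed(0)  # fixed seed so the sketch is reproducible
demo = np.random.random_sample((1000, 5))
mask = np.random.random_sample(1000) < .5

by_mask = demo[mask]
by_compress = np.compress(mask, demo, axis=0)

# Assumption: an additional route, converting the mask to integer indices first.
by_take = np.take(demo, np.flatnonzero(mask), axis=0)

print(np.array_equal(by_mask, by_compress))
print(np.array_equal(by_mask, by_take))
print(by_mask.shape[0] == mask.sum())
```

Which of these is fastest is likely to depend on the NumPy version and the array shape, so it is worth timing them on the actual data.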

> You'll find all the explanations, figures, references, and much more in the book (to be released later this summer).

> [IPython Cookbook](http://ipython-books.github.io/), by [Cyrille Rossant](http://cyrille.rossant.net), Packt Publishing, 2014 (500 pages).