**Chapter 03 Anatomy of an array**

# Introduction

Let's consider a simple example where we want to clear all the values from an array whose dtype is np.float32. How does one write it to maximize speed? The syntax below is rather obvious (at least for those familiar with numpy), but the question is which operation is the fastest.

In [17]:
import numpy as np

Z = np.ones(4 * 1000000, np.float32)
Z

array([1., 1., 1., ..., 1., 1., 1.], dtype=float32)

In [18]:
Z.dtype

dtype('float32')

In [19]:
Z[...] = 0
Z

array([0., 0., 0., ..., 0., 0., 0.], dtype=float32)

In [23]:
timeit("Z.view(np.float16)[...] = 0", globals())

83.7 ns ± 5.43 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)


In [24]:
timeit("Z.view(np.int16)[...] = 0", globals())

82.7 ns ± 5.37 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)


In [25]:
timeit("Z.view(np.int32)[...] = 0", globals())

82 ns ± 5.27 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)


In [26]:
timeit("Z.view(np.float32)[...] = 0", globals())

81.7 ns ± 3.94 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)


In [27]:
timeit("Z.view(np.int64)[...] = 0", globals())

84 ns ± 6.62 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)


In [28]:
timeit("Z.view(np.float64)[...] = 0", globals())

81 ns ± 4.17 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)


In [29]:
timeit("Z.view(np.complex128)[...] = 0", globals())

77.6 ns ± 3.94 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)


In [30]:
timeit("Z.view(np.int8)[...] = 0", globals())

75.1 ns ± 0.676 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)


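Interestingly enough, the obvious way of clearing all the values is not necessarily the fastest. Viewing the array as another data type, such as the larger np.float64 or a byte array (np.int8), converts and copies nothing: the same memory is simply cleared through a different code path. In the runs above, the np.int8 view is the fastest (75.1 ns against 81.7 ns for np.float32, about 8% faster), while the other timings differ by no more than the reported standard deviations. The reasons for such differences are to be found in the internal numpy machinery and in compiler optimizations, so the exact figures depend on the machine and on the numpy version. This simple example illustrates the philosophy of numpy, as we'll see in the next section.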
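
The comparison can also be reproduced with the standard library alone. The snippet below is a minimal sketch, not the harness used for the cells above: it assumes the standard `timeit` module rather than the `timeit` helper called in those cells, and the `clear` function and the `timings` dictionary are names introduced here for illustration only.

```python
import timeit

import numpy as np

Z = np.ones(4 * 1000000, np.float32)


def clear(array, dtype):
    """Zero the array through a view of the given dtype (no data is copied)."""
    array.view(dtype)[...] = 0


# Hypothetical selection of dtypes; each item size divides the buffer size.
dtypes = [np.float32, np.float64, np.int8]

timings = {}
for dtype in dtypes:
    # Best of 3 repeats of 10 fills each, converted to seconds per fill.
    runs = timeit.repeat(lambda d=dtype: clear(Z, d), number=10, repeat=3)
    timings[np.dtype(dtype).name] = min(runs) / 10

for name, seconds in timings.items():
    print(f"{name:>8}: {seconds * 1e3:.3f} ms per fill")
```

Taking the minimum over several repeats reduces the influence of other processes on the measurement. No expected timings are given here because they vary from one machine to another.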

# Memory layout

The [numpy documentation](https://docs.scipy.org/doc/numpy/reference/arrays.ndarray.html) defines the ndarray class very clearly:

> An instance of class ndarray consists of a contiguous one-dimensional segment of computer memory (owned by the array, or by some other object), combined with an indexing scheme that maps N integers into the location of an item in the block.

Said differently, an array is mostly a contiguous block of memory whose parts can be accessed using an indexing scheme. Such an indexing scheme is in turn defined by a **_shape_** and a **_data type_**, and this is precisely what is needed when you define a new array:

In [50]:
Z = np.arange(9).reshape(3, 3).astype(np.int16)
Z

array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]], dtype=int16)

In [51]:
Z.itemsize

2

In [52]:
Z.shape

(3, 3)

In [54]:
Z.ndim

2

Furthermore, because Z is not a view, we can deduce the [strides](https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.strides.html#numpy.ndarray.strides) of the array, which define the number of bytes to step in each dimension when traversing the array.

In [57]:
strides = Z.shape[1]*Z.itemsize, Z.itemsize
strides

(6, 2)

In [58]:
Z.strides

(6, 2)
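
The strides are what turn an index into a location in the memory block. As a minimal sketch of that mapping, the snippet below computes the byte offset of one item by hand and reads the item back from a raw copy of the buffer. The helper `byte_offset` is a hypothetical name introduced for illustration, and it assumes non-negative indices into an array that starts at the beginning of its buffer.

```python
import numpy as np

Z = np.arange(9).reshape(3, 3).astype(np.int16)


def byte_offset(index, strides):
    """Byte offset of an item from the start of the buffer."""
    return sum(i * s for i, s in zip(index, strides))


# Raw copy of the array's bytes in C (row-major) order.
raw = Z.tobytes()

index = (1, 2)
start = byte_offset(index, Z.strides)  # 1*6 + 2*2 = 10

# Reinterpret the two bytes found at that offset as a single int16 item.
item = np.frombuffer(raw, dtype=Z.dtype, count=1, offset=start)[0]

print(start, item, Z[index])  # → 10 5 5
```

The same arithmetic applies to any number of dimensions: one index and one stride per dimension, summed.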


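# Views and copies

Indexing with a slice returns a view that shares its memory with the original array, whereas indexing with a list of indices (fancy indexing) returns a copy, as the two examples below show.
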
In [68]:
Z = np.zeros(9)
Z

array([0., 0., 0., 0., 0., 0., 0., 0., 0.])

In [70]:
Z_view = Z[:3]
Z_view

array([0., 0., 0.])

In [72]:
Z_view[...] = 1
Z_view

array([1., 1., 1.])

In [74]:
Z

array([1., 1., 1., 0., 0., 0., 0., 0., 0.])

In [82]:
Y = np.arange(9)
Y

array([0, 1, 2, 3, 4, 5, 6, 7, 8])

In [92]:
Y_copy = Y[[1,-2,-1]]
Y_copy

array([1, 7, 8])

In [95]:
Y_copy[...] = 9
Y_copy

array([9, 9, 9])

In [96]:
Y

array([0, 1, 2, 3, 4, 5, 6, 7, 8])
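
Rather than modifying the result and inspecting the original, one can ask numpy directly whether two arrays share memory. The following sketch repeats the two cases above and checks them with `np.shares_memory` and the `base` attribute. The variable names `view_shares` and `copy_shares` are introduced here for illustration.

```python
import numpy as np

Z = np.zeros(9)
Z_view = Z[:3]           # slice: a view on the memory of Z

Y = np.arange(9)
Y_copy = Y[[1, -2, -1]]  # list of indices: a new array with its own memory

view_shares = np.shares_memory(Z, Z_view)
copy_shares = np.shares_memory(Y, Y_copy)

print(view_shares, Z_view.base is Z)  # → True True
print(copy_shares, Y_copy.base is Y)  # → False False
```

For a view, `base` refers to the array that owns the memory, which is why writing to `Z_view` changed `Z` while writing to `Y_copy` left `Y` untouched.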