# Advanced NumPy

In [1]:
import numpy as np

## ndarray Object Internals

In [2]:
# TODO: complete this section

## Advanced Array Manipulation

### Reshaping Arrays

In [3]:
arr = np.arange(8)

In [4]:
arr

array([0, 1, 2, 3, 4, 5, 6, 7])

In [5]:
arr.reshape((4, 2))

array([[0, 1],
       [2, 3],
       [4, 5],
       [6, 7]])

In [6]:
arr.reshape((4, 2)).reshape((2, 4))

array([[0, 1, 2, 3],
       [4, 5, 6, 7]])

One of the passed shape dimensions can be -1, in which case the value used for that dimension will be inferred from the data:

In [7]:
arr = np.arange(15)

In [8]:
arr.reshape((5, -1))

array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11],
       [12, 13, 14]])
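The `-1` shortcut has two constraints, sketched below with the same 15-element array. Only one dimension may be left for NumPy to infer, and the sizes you do give must divide the total number of elements evenly. Otherwise `reshape` raises a `ValueError`.

```python
import numpy as np

arr = np.arange(15)

# -1 can appear in any position; NumPy solves for it from arr.size
a = arr.reshape((-1, 5))      # inferred shape (3, 5)
b = arr.reshape(-1)           # shape (15,), a one-dimensional view

# Only one dimension may be -1, and the known sizes must divide arr.size
for bad_shape in [(-1, -1), (-1, 4)]:
    try:
        arr.reshape(bad_shape)
    except ValueError as exc:
        print(bad_shape, "->", exc)
```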

Since an array's shape attribute is a tuple, it can be passed to `reshape`, too:

In [9]:
other_arr = np.ones((3, 5))

In [10]:
other_arr.shape

(3, 5)

In [11]:
arr.reshape(other_arr.shape)

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])

The opposite operation of reshape, going from a higher-dimensional array back to one dimension, is typically known as _flattening_ or _raveling_:

In [12]:
arr = np.arange(15).reshape((5, 3))

In [13]:
arr

array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11],
       [12, 13, 14]])

In [14]:
arr.ravel()

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])

`ravel` does not produce a copy of the underlying values if the values in the result were contiguous in the original array. The `flatten` method behaves like ravel except it always _returns a copy of the data_:

In [15]:
arr.flatten()

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])
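The view-versus-copy behavior can be checked directly with `np.shares_memory`. The sketch below uses the same `(5, 3)` array; it assumes default C-order raveling, and uses a transpose as an example of an input whose values are not contiguous in C order.

```python
import numpy as np

arr = np.arange(15).reshape((5, 3))

raveled = arr.ravel()          # C-contiguous input: ravel returns a view
flattened = arr.flatten()      # flatten always returns a fresh copy

print(np.shares_memory(arr, raveled))    # True
print(np.shares_memory(arr, flattened))  # False

# Writing through the view changes arr; writing to the copy does not
raveled[0] = 100
flattened[1] = -1
print(arr[0])                  # first row is now 100, 1, 2

# The transpose is not C-contiguous, so ravel has to copy here
print(np.shares_memory(arr.T, arr.T.ravel()))  # False
```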

`a.ravel()`:

(i) Returns a view of the original array whenever possible; a copy is made only when the values cannot be read in the requested order from the existing memory layout.

(ii) When a view is returned, modifying the result also modifies the original array.

(iii) Is usually faster than `flatten()`, because it avoids copying the data when a view is possible.

(iv) Is available both as a library-level function (`np.ravel`) and as an ndarray method.

`a.flatten()`:

(i) Always returns a copy of the original array.

(ii) Modifying the result never affects the original array.

(iii) Is comparatively slower than `ravel()`, because it always allocates new memory and copies the data.

(iv) Is available only as a method of an ndarray object.

The data can be reshaped or raveled in different orders:

#### C Versus Fortran Order

NumPy gives you control and flexibility over the layout of your data in memory. By default, NumPy arrays are created in **row major** order. Spatially this means that if you have a two-dimensional array of data, **the items in each row of the array are stored in adjacent memory locations**. The alternative to row major ordering is **column major** order, which means that values within each column of data are stored in adjacent memory locations.

Row and column major order are also known as **C** and **Fortran** order, respectively.

Functions like reshape and ravel accept an **order** argument indicating the order in which to read and place the data. In most cases this is set to 'C' or 'F' (there are also less commonly used options such as 'A' and, for ravel, 'K').

![alt text](images/orders.png "Reshaping in C (row major) or Fortran (column major) order")

In [16]:
arr = np.arange(12).reshape((3, 4))

In [17]:
arr.ravel()

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11])

In [18]:
arr.ravel("F")

array([ 0,  4,  8,  1,  5,  9,  2,  6, 10,  3,  7, 11])

The key difference between C and Fortran order is the way in which the dimensions are walked:

C/row major order:

Traverse higher dimensions first (e.g., axis 1 before advancing on axis 0).

Fortran/column major order:

Traverse higher dimensions last (e.g., axis 0 before advancing on axis 1).
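A small sketch of both traversal orders, using only the 12-element range from above: filling a `(3, 4)` array in each order, then raveling back with the matching order. The `flags` and `strides` attributes show the memory layout; exact stride values depend on the platform's default integer size, so none are hard-coded here.

```python
import numpy as np

arr = np.arange(12)

# Fill a 3x4 array in C order (along rows first) and in Fortran order (down columns first)
c_arr = arr.reshape((3, 4), order="C")
f_arr = arr.reshape((3, 4), order="F")
print(c_arr[0], f_arr[0])      # first rows: 0 1 2 3 versus 0 3 6 9

# Raveling with the same order recovers the original sequence
assert (c_arr.ravel(order="C") == arr).all()
assert (f_arr.ravel(order="F") == arr).all()

# A Fortran-contiguous copy holds the same values with a different layout:
# in C order the last axis has the smallest stride, in Fortran order the first does
f_copy = np.asfortranarray(c_arr)
print(c_arr.flags["C_CONTIGUOUS"], f_copy.flags["F_CONTIGUOUS"])
print(c_arr.strides, f_copy.strides)
```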


### Concatenating and Splitting Arrays

`numpy.concatenate` takes a sequence (tuple, list, etc.) of arrays and joins them together in order along the input axis:

In [19]:
arr1 = np.array([[1, 2, 3], [4, 5, 6]])

In [20]:
arr2 = np.array([[7, 8, 9], [10, 11, 12]])

In [21]:
np.concatenate([arr1, arr2], axis=0)

array([[ 1,  2,  3],
       [ 4,  5,  6],
       [ 7,  8,  9],
       [10, 11, 12]])

In [22]:
np.concatenate([arr1, arr2], axis=1)

array([[ 1,  2,  3,  7,  8,  9],
       [ 4,  5,  6, 10, 11, 12]])

There are some convenience functions, like `vstack` and `hstack`, for common kinds of concatenation. The preceding operations could have been expressed as:

In [23]:
np.vstack((arr1, arr2))

array([[ 1,  2,  3],
       [ 4,  5,  6],
       [ 7,  8,  9],
       [10, 11, 12]])

In [24]:
np.hstack((arr1, arr2))

array([[ 1,  2,  3,  7,  8,  9],
       [ 4,  5,  6, 10, 11, 12]])
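These helpers treat one-dimensional inputs differently, which is easy to trip over. The sketch below compares `vstack`, `hstack`, `column_stack`, and `np.stack` on two 1D arrays; unlike `concatenate`, `np.stack` always creates a new axis.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# vstack treats 1D inputs as rows of a 2D array: shape (2, 3)
rows = np.vstack((a, b))

# hstack on 1D inputs joins them end to end: shape (6,)
flat = np.hstack((a, b))

# column_stack turns 1D inputs into columns: shape (3, 2)
cols = np.column_stack((a, b))

# np.stack inserts a new axis; with axis=0 it matches vstack here
stacked = np.stack((a, b), axis=0)
```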

`split` slices apart an array into multiple arrays along an axis:

In [25]:
arr = np.random.randn(5, 2)

In [26]:
arr

array([[-1.72024202, -0.07894658],
       [ 0.84036023,  1.08687879],
       [-1.14617989,  0.62440376],
       [ 0.55847118,  0.80446232],
       [-0.85696054,  1.77101892]])

In [27]:
first, second, third = np.split(arr, [1, 3])

In [28]:
first

array([[-1.72024202, -0.07894658]])

In [29]:
second

array([[ 0.84036023,  1.08687879],
       [-1.14617989,  0.62440376]])

In [30]:
third

array([[ 0.55847118,  0.80446232],
       [-0.85696054,  1.77101892]])

The list `[1, 3]` passed to `np.split` indicates the indices at which to split the array, so the pieces are `arr[:1]`, `arr[1:3]`, and `arr[3:]`.

![alt text](images/concatenation.png "Array concatenation functions")
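`split` also accepts an integer number of equal pieces, and has axis-specific shorthands. The sketch below uses a deterministic `(5, 2)` array instead of random data. Note that an integer must divide the axis length evenly; `np.array_split` is the variant that tolerates uneven pieces.

```python
import numpy as np

arr = np.arange(10).reshape(5, 2)

# A list of indices gives the split points: arr[:1], arr[1:3], arr[3:]
first, second, third = np.split(arr, [1, 3])

# An integer gives the number of equal-sized pieces along the axis
left, right = np.split(arr, 2, axis=1)      # two (5, 1) column blocks

# vsplit / hsplit are shorthands for axis=0 and axis=1
top, bottom = np.vsplit(arr[:4], 2)         # two (2, 2) row blocks

try:
    np.split(arr, 2)                         # 5 rows cannot be split evenly in two
except ValueError as exc:
    print("error:", exc)

# array_split allows unequal pieces (the earlier pieces get the extra rows)
uneven = np.array_split(arr, 2)
```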

### Stacking Helpers: `r_` and `c_`

There are two special objects in the NumPy namespace, `r_` and `c_`, that make stacking arrays more concise:

In [31]:
arr = np.arange(6)
arr1 = arr.reshape((3, 2))
arr2 = np.random.randn(3, 2)

In [32]:
arr1

array([[0, 1],
       [2, 3],
       [4, 5]])

In [33]:
arr2

array([[-0.86955105,  0.48323758],
       [-0.36226666,  0.89730989],
       [-1.11039436,  0.19035115]])

In [34]:
np.r_[arr1, arr2]

array([[ 0.        ,  1.        ],
       [ 2.        ,  3.        ],
       [ 4.        ,  5.        ],
       [-0.86955105,  0.48323758],
       [-0.36226666,  0.89730989],
       [-1.11039436,  0.19035115]])

In [35]:
np.c_[np.r_[arr1, arr2], arr]

array([[ 0.        ,  1.        ,  0.        ],
       [ 2.        ,  3.        ,  1.        ],
       [ 4.        ,  5.        ,  2.        ],
       [-0.86955105,  0.48323758,  3.        ],
       [-0.36226666,  0.89730989,  4.        ],
       [-1.11039436,  0.19035115,  5.        ]])

`r_` and `c_` can also translate slices into arrays:

In [36]:
np.c_[1:6, -10:-5]

array([[  1, -10],
       [  2,  -9],
       [  3,  -8],
       [  4,  -7],
       [  5,  -6]])
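A few more slice translations with `r_`, as a sketch. A plain slice acts like `np.arange`, scalars can be mixed in, and an imaginary step such as `5j` is read as a number of points with the endpoint included, like `np.linspace`.

```python
import numpy as np

# A plain slice becomes np.arange(start, stop, step)
a = np.r_[0:10:3]              # 0, 3, 6, 9

# Slices and scalars are concatenated in order
b = np.r_[1:4, 0, 4]           # 1, 2, 3, 0, 4

# An imaginary step means "this many points, endpoint included"
c = np.r_[0:1:5j]              # same values as np.linspace(0, 1, 5)
```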

### Repeating Elements: tile and repeat

`repeat` replicates each element in an array some number of times, producing a larger array:

In [37]:
arr = np.arange(3)

In [38]:
arr

array([0, 1, 2])

In [39]:
arr.repeat(3)

array([0, 0, 0, 1, 1, 1, 2, 2, 2])

By default, if you pass an integer, each element will be repeated that number of times.
If you pass an array of integers, each element can be repeated a different number of times:

In [40]:
arr.repeat([2, 3, 4])

array([0, 0, 1, 1, 1, 2, 2, 2, 2])

Multidimensional arrays can have their elements repeated along a particular axis.

In [41]:
arr = np.arange(4).reshape(2, 2)

In [42]:
arr

array([[0, 1],
       [2, 3]])

In [43]:
arr.repeat(2, axis=0)

array([[0, 1],
       [0, 1],
       [2, 3],
       [2, 3]])

In [44]:
arr.repeat([2, 3], axis=0)

array([[0, 1],
       [0, 1],
       [2, 3],
       [2, 3],
       [2, 3]])

In [45]:
arr.repeat([2, 3], axis=1)

array([[0, 0, 1, 1, 1],
       [2, 2, 3, 3, 3]])
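Two details worth checking, sketched below: without an `axis`, `repeat` works on the flattened array and returns a 1D result, and a list of counts must have exactly one entry per element along the chosen axis.

```python
import numpy as np

arr = np.arange(4).reshape(2, 2)

# Without an axis, repeat flattens first and returns a 1D array
flat = arr.repeat(2)

# np.repeat is the function form of the same operation
same = np.repeat(arr, 2, axis=0)

# The counts list must match the length of the axis (2 here, not 3)
try:
    arr.repeat([1, 2, 3], axis=0)
except ValueError as exc:
    print("error:", exc)
```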

`tile` is a shortcut for stacking copies of an array along an axis. Visually you can think of it as being akin to _laying down tiles_:

In [46]:
arr

array([[0, 1],
       [2, 3]])

In [47]:
np.tile(arr, 2)

array([[0, 1, 0, 1],
       [2, 3, 2, 3]])

The second argument is the number of tiles; with a scalar, the copies are laid side by side along the last axis (each row is extended), rather than stacked down the rows. The second argument to `tile` can also be a tuple indicating the layout of the _tiling_:

In [48]:
arr = np.array([[1, 2], [3, 4]])

In [49]:
arr

array([[1, 2],
       [3, 4]])

In [50]:
np.tile(arr, (2, 1))

array([[1, 2],
       [3, 4],
       [1, 2],
       [3, 4]])

In [51]:
np.tile(arr, (3, 2))

array([[1, 2, 1, 2],
       [3, 4, 3, 4],
       [1, 2, 1, 2],
       [3, 4, 3, 4],
       [1, 2, 1, 2],
       [3, 4, 3, 4]])
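The difference between `tile` and `repeat` is clearest on a 1D array, as in this sketch. It also shows what happens when the tuple has more entries than the array has dimensions: the array is promoted by prepending length-1 axes before tiling.

```python
import numpy as np

arr = np.array([1, 2, 3])

# tile repeats the whole array; repeat repeats each element
tiled = np.tile(arr, 2)
repeated = arr.repeat(2)

# reps has 2 entries but arr has 1 dimension: (3,) is treated as (1, 3),
# giving 2 copies down and 2 across, shape (2, 6)
grid = np.tile(arr, (2, 2))
```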

### Fancy Indexing Equivalents: `take()` and `put()`

As you may recall, one way to get and set subsets of arrays is by fancy indexing using integer arrays:

In [52]:
arr = np.arange(10) * 100

In [53]:
arr

array([  0, 100, 200, 300, 400, 500, 600, 700, 800, 900])

In [54]:
inds = [7, 1, 2, 6]

In [55]:
arr[inds]

array([700, 100, 200, 600])

There are alternative ndarray methods that are useful in the special case of only making a selection on a single axis:

In [56]:
arr.take(inds)

array([700, 100, 200, 600])

In [57]:
arr.put(inds, 42)

In [58]:
arr

array([  0,  42,  42, 300, 400, 500,  42,  42, 800, 900])

In [59]:
arr.put(inds, [40, 41, 42, 43])

In [60]:
arr

array([  0,  41,  42, 300, 400, 500,  43,  40, 800, 900])

To use take along other axes, you can pass the `axis` keyword:

In [61]:
inds = [2, 0, 2, 1]

In [62]:
arr = np.arange(8).reshape(2, 4)

In [63]:
arr

array([[0, 1, 2, 3],
       [4, 5, 6, 7]])

In [64]:
arr.take(inds, axis=1)

array([[2, 0, 2, 1],
       [6, 4, 6, 5]])

`put` does not accept an axis argument but rather indexes into the flattened (one-dimensional, C order) version of the array. Thus, when you need to set elements using an index array on other axes, it is often easiest to use fancy indexing.
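The sketch below contrasts the two on the same `(2, 4)` array: `take` with `axis=1` matches fancy indexing on columns, while `put` interprets its indices as flat C-order positions, so index 5 lands at row 1, column 1.

```python
import numpy as np

arr = np.arange(8).reshape(2, 4)
inds = [2, 0, 2, 1]

# take along axis 1 selects the same columns as fancy indexing
assert (arr.take(inds, axis=1) == arr[:, inds]).all()

# put ignores the shape and uses flat (C-order) positions
arr.put([0, 5], -1)            # flat 0 -> (0, 0), flat 5 -> (1, 1)

# To set entire columns from an index array, fancy indexing is simpler
arr[:, [2, 3]] = 99
```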

## Broadcasting

*Broadcasting* describes how arithmetic works between arrays of different shapes.

The simplest example of broadcasting occurs when combining a scalar value with an array:

In [65]:
arr = np.arange(5)

In [66]:
arr

array([0, 1, 2, 3, 4])

In [67]:
arr * 4

array([ 0,  4,  8, 12, 16])

We can demean each column of an array by subtracting the column means. In this case, it is very simple:

In [68]:
arr = np.arange(12).reshape(4, 3)

In [69]:
arr

array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11]])

In [70]:
arr.mean(0)

array([4.5, 5.5, 6.5])

In [71]:
demeaned = arr - arr.mean(0)

In [72]:
demeaned

array([[-4.5, -4.5, -4.5],
       [-1.5, -1.5, -1.5],
       [ 1.5,  1.5,  1.5],
       [ 4.5,  4.5,  4.5]])

In [73]:
demeaned.mean(0)

array([0., 0., 0.])

### The Broadcasting Rules

Two arrays are compatible for broadcasting if for each *trailing dimension* (i.e., starting from the end) the axis lengths match or if *either of the lengths is 1*. Broadcasting is then performed over the *missing or length 1 dimensions*.

Broadcasting in NumPy follows a strict set of rules to determine the interaction between the two arrays:

**Rule 1**: If the two arrays differ in their number of dimensions, the shape of the one with fewer dimensions is padded with ones on its leading (left) side.

**Rule 2**: If the shape of the two arrays does not match in any dimension, the array with a shape equal to 1 in that dimension is stretched to match the other shape.

**Rule 3**: If in any dimension the sizes disagree and neither is equal to 1, an error is raised.
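To make the three rules concrete, here is a small pure-Python sketch that computes the result shape of broadcasting two shapes. `broadcast_shape` is a hypothetical helper written for illustration; NumPy 1.20 and later exposes the same computation as `np.broadcast_shapes`, which is used here only as a cross-check.

```python
import numpy as np

def broadcast_shape(shape_a, shape_b):
    """Hypothetical helper: apply the three broadcasting rules by hand."""
    # Rule 1: pad the shorter shape with ones on its leading (left) side
    ndim = max(len(shape_a), len(shape_b))
    a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)
    result = []
    for da, db in zip(a, b):
        if da == db or db == 1:
            result.append(da)    # sizes match, or b is stretched (Rule 2)
        elif da == 1:
            result.append(db)    # a is stretched (Rule 2)
        else:
            # Rule 3: sizes disagree and neither is 1
            raise ValueError(f"incompatible shapes {shape_a} and {shape_b}")
    return tuple(result)

print(broadcast_shape((4, 3), (3,)))        # (4, 3)
print(broadcast_shape((4, 3), (4, 1)))      # (4, 3)
print(broadcast_shape((3, 1, 5), (4, 1)))   # (3, 4, 5)
print(np.broadcast_shapes((3, 1, 5), (4, 1)))
```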

Note that the number of dimensions of an ndarray can be obtained with its `ndim` attribute, and its shape with the `shape` attribute.

![alt text](images/broadcast0.png "Broadcasting over axis 0 with a 1D array")

Since `arr.mean(0)` has length 3, it is compatible for broadcasting across axis 0 because the trailing dimension of `arr` is 3 and therefore matches. Suppose we wished instead to subtract the mean value from each row. According to the rules, to subtract over axis 1 (i.e., subtract the row mean from each row), the smaller array must have shape (4, 1). Passing `keepdims=True` to `mean` already produces that shape; an explicit `reshape((4, 1))` achieves the same:

In [74]:
arr

array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11]])

In [75]:
row_means = arr.mean(1, keepdims=True)

In [76]:
row_means

array([[ 1.],
       [ 4.],
       [ 7.],
       [10.]])

In [77]:
row_means.shape

(4, 1)

In [78]:
row_means.reshape((4, 1))

array([[ 1.],
       [ 4.],
       [ 7.],
       [10.]])

In [79]:
demeaned = arr - row_means.reshape((4, 1))

In [80]:
demeaned

array([[-1.,  0.,  1.],
       [-1.,  0.,  1.],
       [-1.,  0.,  1.],
       [-1.,  0.,  1.]])

![alt text](images/broadcast1.png "Broadcasting over axis 1 of a 2D array")
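Without `keepdims`, the row means have shape `(4,)`, and the rules show why that fails: padded on the left it becomes `(1, 4)`, whose trailing size 4 disagrees with the trailing size 3 of `arr`. A minimal sketch:

```python
import numpy as np

arr = np.arange(12).reshape(4, 3)

# Shape (4,) does not broadcast against (4, 3): trailing sizes 4 and 3 disagree
row_means_flat = arr.mean(1)
try:
    arr - row_means_flat
except ValueError as exc:
    print("error:", exc)

# Reshaping to (4, 1) makes the length-1 axis stretch across the columns
demeaned = arr - row_means_flat.reshape((4, 1))
```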

Adding a two-dimensional array to a three-dimensional one across axis 0.

![alt text](images/broadcast2.png "Broadcasting over axis 0 of a 3D array")

In the three-dimensional case, broadcasting over any of the three dimensions is only a matter of reshaping the data to be shape-compatible.

![alt text](images/broadcast3d.png "Compatible 2D array shapes for broadcasting over a 3D array")
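The shapes in the figure above can be verified directly. This sketch uses a `(4, 4, 3)` array of zeros and one compatible smaller array per axis; in each case the length-1 (or missing) dimension is the one being broadcast over.

```python
import numpy as np

arr = np.zeros((4, 4, 3))

over_axis0 = np.ones((4, 3))      # padded to (1, 4, 3): broadcast over axis 0
over_axis1 = np.ones((4, 1, 3))   # broadcast over axis 1
over_axis2 = np.ones((4, 4, 1))   # broadcast over axis 2

results = [arr + other for other in (over_axis0, over_axis1, over_axis2)]
for other, res in zip((over_axis0, over_axis1, over_axis2), results):
    print(other.shape, "->", res.shape)
```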

According to the broadcasting rule, the “broadcast dimensions” must be 1 in the smaller array.

A common problem, therefore, is needing to add a new axis with length 1 specifically for broadcasting purposes. Using `reshape` is one option, but inserting an axis requires constructing a tuple indicating the new shape. Thus, NumPy arrays offer a special syntax for inserting new axes by indexing. We use the special `np.newaxis` attribute along with "full" slices to insert the `newaxis`:

In [81]:
x = np.arange(10)

In [82]:
x, x.shape

(array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]), (10,))

In [83]:
x2 = x[:, np.newaxis]

In [84]:
x2, x2.shape

(array([[0],
        [1],
        [2],
        [3],
        [4],
        [5],
        [6],
        [7],
        [8],
        [9]]),
 (10, 1))

In [85]:
arr = np.arange(16).reshape(4, 4)

In [86]:
arr

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]])

In [87]:
arr_3d = arr[:, np.newaxis, :]

In [88]:
arr_3d.shape

(4, 1, 4)

In [89]:
arr_3d

array([[[ 0,  1,  2,  3]],

       [[ 4,  5,  6,  7]],

       [[ 8,  9, 10, 11]],

       [[12, 13, 14, 15]]])

In [90]:
arr_1d = np.random.normal(size=3)

In [91]:
arr_1d

array([-1.69939913,  0.12424106,  1.6237344 ])

In [92]:
arr_1d[np.newaxis, :]

array([[-1.69939913,  0.12424106,  1.6237344 ]])
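Two related facts, shown in the sketch below: `np.newaxis` is simply an alias for `None`, and `np.expand_dims` is a function form that inserts a length-1 axis at a given position. Combining a `(10, 1)` column with a `(1, 10)` row then broadcasts to a full `(10, 10)` table.

```python
import numpy as np

x = np.arange(10)

# np.newaxis is just an alias for None
assert np.newaxis is None

# expand_dims inserts a length-1 axis at the given position
col = np.expand_dims(x, axis=1)   # same as x[:, np.newaxis], shape (10, 1)
row = np.expand_dims(x, axis=0)   # same as x[np.newaxis, :], shape (1, 10)

# (10, 1) against (1, 10) broadcasts to a (10, 10) multiplication table
table = col * row
```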

Thus, if we had a three-dimensional array and wanted to demean axis 2, say, we would need to write:

In [93]:
arr = np.arange(60).reshape(3, 4, 5)

In [94]:
arr

array([[[ 0,  1,  2,  3,  4],
        [ 5,  6,  7,  8,  9],
        [10, 11, 12, 13, 14],
        [15, 16, 17, 18, 19]],

       [[20, 21, 22, 23, 24],
        [25, 26, 27, 28, 29],
        [30, 31, 32, 33, 34],
        [35, 36, 37, 38, 39]],

       [[40, 41, 42, 43, 44],
        [45, 46, 47, 48, 49],
        [50, 51, 52, 53, 54],
        [55, 56, 57, 58, 59]]])

In [95]:
depth_means = arr.mean(2)

In [96]:
depth_means

array([[ 2.,  7., 12., 17.],
       [22., 27., 32., 37.],
       [42., 47., 52., 57.]])

In [97]:
depth_means.shape

(3, 4)

In [98]:
depth_means[:, :, np.newaxis].shape

(3, 4, 1)

In [99]:
demeaned = arr - depth_means[:, :, np.newaxis]

You might be wondering if there’s a way to generalize demeaning over an axis without sacrificing performance. There is, but it requires some indexing gymnastics:

In [100]:
def demean_axis(arr, axis=0):
    means = arr.mean(axis)
    # This generalizes things like [:, :, np.newaxis] to N dimensions
    indexer = [slice(None)] * arr.ndim
    indexer[axis] = np.newaxis
    # Index with a tuple: recent NumPy versions reject a list of slices/None here
    return arr - means[tuple(indexer)]
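A simpler alternative, sketched under a hypothetical name `demean_axis_keepdims`, relies on `keepdims=True` instead of building an indexer: the reduced axis stays in place with length 1, so the means broadcast straight back against `arr`.

```python
import numpy as np

def demean_axis_keepdims(arr, axis=0):
    """Hypothetical alternative to demean_axis using keepdims."""
    # keepdims=True keeps the reduced axis as length 1 for broadcasting
    return arr - arr.mean(axis=axis, keepdims=True)

arr = np.arange(60).reshape(3, 4, 5)
for axis in range(arr.ndim):
    out = demean_axis_keepdims(arr, axis)
    # every slice along the chosen axis now has (numerically) zero mean
    assert np.allclose(out.mean(axis=axis), 0)
```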

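### Setting Array Values by Broadcasting

The same broadcasting rules that govern arithmetic operations also apply when setting values through array indexing. In the simplest case, a scalar is broadcast to every element: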

In [101]:
arr = np.zeros((4, 3))

In [102]:
arr[:] = 5

If we have a one-dimensional array of values that we want to set into the columns of the array, we can do that as long as the shape is compatible:

In [103]:
col = np.array([1.28, -0.42, 0.44, 1.6])

In [104]:
arr

array([[5., 5., 5.],
       [5., 5., 5.],
       [5., 5., 5.],
       [5., 5., 5.]])

In [105]:
col[:, np.newaxis]

array([[ 1.28],
       [-0.42],
       [ 0.44],
       [ 1.6 ]])

In [106]:
arr[:] = col[:, np.newaxis]

In [107]:
arr

array([[ 1.28,  1.28,  1.28],
       [-0.42, -0.42, -0.42],
       [ 0.44,  0.44,  0.44],
       [ 1.6 ,  1.6 ,  1.6 ]])

In [108]:
arr[:2] = [[-1.37], [0.509]]

In [109]:
arr

array([[-1.37 , -1.37 , -1.37 ],
       [ 0.509,  0.509,  0.509],
       [ 0.44 ,  0.44 ,  0.44 ],
       [ 1.6  ,  1.6  ,  1.6  ]])
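Assignment follows the same compatibility check as arithmetic, as this sketch shows. A length-3 row fits the trailing axis of a `(4, 3)` array, a length-4 vector does not (hence the explicit `[:, np.newaxis]` above), and assigning into a single column selection takes a plain `(4,)` array.

```python
import numpy as np

arr = np.zeros((4, 3))

# A length-3 row broadcasts across all 4 rows
arr[:] = np.array([1.0, 2.0, 3.0])

# A length-4 vector does not fit the trailing axis of length 3
try:
    arr[:] = np.array([1.0, 2.0, 3.0, 4.0])
except ValueError as exc:
    print("error:", exc)

# Selecting a single column gives a (4,) target, so a (4,) array fits
arr[:, 0] = np.array([10.0, 20.0, 30.0, 40.0])
```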