# 02.01 NumPy Operations

NumPy arrays behave like lists in some respects and like simple values in others.
As with lists you can retrieve parts of an array,
but you can also perform vectorized operations on all (or some) of the values of an array.
We will now look at the operations on NumPy arrays that are most useful
for working with data and which will lead us further towards data science.

To keep each section self-contained we will perform the required imports
from the previous sections at the top.
If some import looks strange, go back and check the previous sections.
For now we only know about NumPy.

In [1]:
import numpy as np

## Indexing and Slicing

As with lists, indexing and slicing are done with square brackets.
One dimensional indexing works pretty much the same as with a list.
Let's create an array and check.

In [2]:
x = np.arange(9)
x

array([0, 1, 2, 3, 4, 5, 6, 7, 8])

In [3]:
x[3]

3

In [4]:
x[1:5]

array([1, 2, 3, 4])

In more dimensions we add an extra index.
The index is understood as a tuple of integers or slice objects.
Behind the scenes this is just a cleverly designed Python `__getitem__` method.

In [5]:
x = np.arange(18).reshape((3, 6))
x, x.shape

(array([[ 0,  1,  2,  3,  4,  5],
        [ 6,  7,  8,  9, 10, 11],
        [12, 13, 14, 15, 16, 17]]),
 (3, 6))

In [6]:
x[1, 1]

7
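
The tuple interpretation of the index can be checked directly.
The following is a minimal sketch; the variable names are only illustrative.
It compares the comma syntax with an explicitly built tuple of integers and `slice` objects.

```python
import numpy as np

x = np.arange(18).reshape((3, 6))

# A comma-separated index is packed into a tuple before NumPy sees it.
a = x[1, 1]
b = x[(1, 1)]

# A slice written with colons becomes a `slice` object inside that tuple.
row_tail = x[1, 3:]
row_tail_explicit = x[(1, slice(3, None))]

print(a == b)
print(np.array_equal(row_tail, row_tail_explicit))
```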

Slicing can become complicated with several dimensions, so let's try to memorize some operations.

Note: remember that slicing in Python uses the **`[start:stop:step]`** syntax.
With a positive step, a component that is not provided is taken as:

- no start: `start=0`
- no stop: the slice runs to the end of the axis (this differs from `stop=-1`, which excludes the last element)
- no step: `step=1`

This also means that `[:]` means "take everything", since it starts at 0, runs to the end and uses a step of 1.
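
A short sketch to check how omitted slice components behave on a one dimensional array.
Note that an omitted stop runs to the end of the axis, which is not the same as writing `-1` explicitly.
The variable names are only illustrative.

```python
import numpy as np

x = np.arange(9)

everything = x[:]            # nothing given: the whole array
explicit = x[0:len(x):1]     # the same slice written out in full
without_last = x[0:-1]       # a stop of -1 excludes the last element
every_other = x[::2]         # only the step is given

print(everything)
print(without_last)
print(every_other)
```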

![Slice 1: Select](np-slice-1-select.svg)

<div style="text-align:right;"><sup>np-slice-1-select.svg</sup></div>

In [7]:
x[1,3:]

array([ 9, 10, 11])

![Slice 2: All Values](np-slice-2-all-values.svg)


<div style="text-align:right;"><sup>np-slice-2-all-values.svg</sup></div>

In [8]:
x[0:2,:]

array([[ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11]])

![Slice 3: Slice Both](np-slice-3-slice-both.svg)

<div style="text-align:right;"><sup>np-slice-3-slice-both.svg</sup></div>

In [9]:
x[1:3,2:5]

array([[ 8,  9, 10],
       [14, 15, 16]])

![Slice 4: Step](np-slice-4-step.svg)

<div style="text-align:right;"><sup>np-slice-4-step.svg</sup></div>

In [10]:
x[:,::2]

array([[ 0,  2,  4],
       [ 6,  8, 10],
       [12, 14, 16]])

![Slice 5: Step Both](np-slice-5-step-both.svg)

<div style="text-align:right;"><sup>np-slice-5-step-both.svg</sup></div>

In [11]:
x[::2,1::3]

array([[ 1,  4],
       [13, 16]])

### Quirk, omitting `:`

Thanks to the tuple-of-slices syntax that NumPy uses one can omit the `:` from the last dimension.
The following works:

In [12]:
x[1]

array([ 6,  7,  8,  9, 10, 11])

and is equivalent to

In [13]:
x[1,]

array([ 6,  7,  8,  9, 10, 11])

and equivalent to

In [14]:
x[1,:]

array([ 6,  7,  8,  9, 10, 11])

But this one will not work

In [15]:
x[,1]

SyntaxError: invalid syntax (<ipython-input-15-cefd0d14ff16>, line 1)

The correct way is to use `:` in the first dimension

In [16]:
x[:,1]

array([ 1,  7, 13])

Always explicitly use `:` to mean that you are taking the full dimension.
It works this way in NumPy because the array can be understood as a list of lists,
and `x[1]` takes one of those lists (the one at index 1), i.e. **a row**.
When we get to `pandas`, a single index will mean **a column**.

### Modifying slices

Contrary to Python list slices, which are copies, NumPy array slices are views on the data.
Therefore assigning to a slice, or to an array obtained by slicing, modifies the original array.

In [17]:
x = np.arange(18).reshape((3, 6))
x

array([[ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11],
       [12, 13, 14, 15, 16, 17]])

In [18]:
x[:2,::2]

array([[ 0,  2,  4],
       [ 6,  8, 10]])

In [19]:
x[:2, ::2] = np.zeros((2, 3))
x

array([[ 0,  1,  0,  3,  0,  5],
       [ 0,  7,  0,  9,  0, 11],
       [12, 13, 14, 15, 16, 17]])

Array flags are another piece of metadata.  One such flag, named `owndata`,
tells you whether an array owns its memory (true) or borrows it from another array, as a view does (false).
To get a new array from a view one can use the `copy` method.

In [20]:
x = np.arange(18).reshape((3, 6))
y = x[:2,::2]
y.flags.owndata

False

In [21]:
y[:] = np.zeros((2, 3))
x

array([[ 0,  1,  0,  3,  0,  5],
       [ 0,  7,  0,  9,  0, 11],
       [12, 13, 14, 15, 16, 17]])

In [22]:
x = np.arange(18).reshape((3, 6))
y = x[:2,::2].copy()
y.flags.owndata

True
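
The difference between a list slice, an array view and an array copy can be seen side by side.
This is a small sketch with illustrative names; `np.shares_memory` is used here as an additional check that was not part of the cells above.

```python
import numpy as np

# Slicing a list copies the selected elements.
lst = [0, 1, 2, 3, 4]
lst_part = lst[:2]
lst_part[0] = 99             # the original list is unchanged

# Slicing an array returns a view on the same memory.
arr = np.arange(5)
arr_view = arr[:2]
arr_view[0] = 99             # the original array changes too

arr_copy = arr[:2].copy()
arr_copy[1] = -1             # the copy is independent

print(lst)
print(arr)
print(np.shares_memory(arr, arr_view), np.shares_memory(arr, arr_copy))
```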

### Concatenating and splitting

Concatenation can be performed in several ways.
The main procedure is `np.concatenate`, which accepts an `axis=` parameter.
The **axis** can be very confusing since it means different things in the PyData
group of libraries.  For now remember that in NumPy it means the dimension of the array.

To be concatenated, arrays must have the same shape on every axis except the one used.

In [23]:
x = np.arange(18).reshape((3, 6))
y = np.arange(12).reshape((2, 6))
np.concatenate((x, y), axis=0)

array([[ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11],
       [12, 13, 14, 15, 16, 17],
       [ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11]])

In [24]:
x = np.arange(18).reshape((3, 6))
y = np.arange(12).reshape((3, 4))
np.concatenate((x, y), axis=1)

array([[ 0,  1,  2,  3,  4,  5,  0,  1,  2,  3],
       [ 6,  7,  8,  9, 10, 11,  4,  5,  6,  7],
       [12, 13, 14, 15, 16, 17,  8,  9, 10, 11]])

There are also `np.vstack` and `np.hstack`, which for two dimensional arrays are equivalent to `axis=0` and `axis=1` respectively.

In [25]:
x = np.arange(18).reshape((3, 6))
y = np.arange(12).reshape((2, 6))
np.vstack((x, y))

array([[ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11],
       [12, 13, 14, 15, 16, 17],
       [ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11]])

In [26]:
x = np.arange(18).reshape((3, 6))
y = np.arange(12).reshape((3, 4))
np.hstack((x, y))

array([[ 0,  1,  2,  3,  4,  5,  0,  1,  2,  3],
       [ 6,  7,  8,  9, 10, 11,  4,  5,  6,  7],
       [12, 13, 14, 15, 16, 17,  8,  9, 10, 11]])
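
The shape requirement can be checked with a small sketch.
It reuses the shapes from the cells above and shows what happens when the arrays disagree on an axis other than the one being joined.
The `failed` flag is only there for illustration.

```python
import numpy as np

x = np.arange(18).reshape((3, 6))
y = np.arange(12).reshape((2, 6))

# Along axis 0 only the number of columns (6) has to agree.
stacked = np.concatenate((x, y), axis=0)
print(stacked.shape)

# Along axis 1 the number of rows would have to agree, and 3 != 2.
try:
    np.concatenate((x, y), axis=1)
    failed = False
except ValueError as err:
    failed = True
    print("cannot concatenate:", err)
```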

`np.split` separates the array into pieces.  Can you tell how?

In [27]:
np.split(np.arange(9), 3)

[array([0, 1, 2]), array([3, 4, 5]), array([6, 7, 8])]

In [28]:
np.split(np.arange(6), 2)

[array([0, 1, 2]), array([3, 4, 5])]

Similar to concatenate it accepts an `axis=` argument,
and there are `np.vsplit` and `np.hsplit`.

Note: There are also `np.dstack` and `np.dsplit` that are equivalent to `axis=2`.
Yet, we will not be delving into three dimensional arrays too often.
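
A brief sketch of the splitting variants on a two dimensional array, with illustrative names.
An integer asks for equal pieces, whilst a list gives the positions at which to cut.

```python
import numpy as np

x = np.arange(18).reshape((3, 6))

# An integer asks for that many equal pieces along the given axis.
left, right = np.split(x, 2, axis=1)

# A list gives the positions at which to cut instead.
a, b, c = np.split(np.arange(9), [2, 5])

# np.hsplit splits along axis=1 for a two dimensional array.
h_left, h_right = np.hsplit(x, 2)

print(left.shape, right.shape)
print(a, b, c)
```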

### Sorting

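The `sort` method of an array sorts it in place.  Also, `axis=None` cannot be used with it.

Note: `-1` in NumPy refers to the end, e.g. `axis=-1` is the last dimension, and `reshape(-1)` flattens the array because the size of that dimension is inferred from the number of elements.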

In [29]:
x = np.arange(18, 0, -1).reshape((3, 6))
x

array([[18, 17, 16, 15, 14, 13],
       [12, 11, 10,  9,  8,  7],
       [ 6,  5,  4,  3,  2,  1]])

In [30]:
x = np.arange(18, 0, -1).reshape((3, 6))
x.sort(axis=-1)
x

array([[13, 14, 15, 16, 17, 18],
       [ 7,  8,  9, 10, 11, 12],
       [ 1,  2,  3,  4,  5,  6]])

In [31]:
x = np.arange(18, 0, -1).reshape((3, 6))
y = x.reshape(-1)
y.sort()
y

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17,
       18])

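Another way to get a sorted array, instead of sorting in place, is to use `argsort`, which returns the indexes that would sort the array.  Indexing with those indexes produces the sorted array.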

In [32]:
x = np.arange(18, 0, -1).reshape((3, 6))
x.argsort()

array([[5, 4, 3, 2, 1, 0],
       [5, 4, 3, 2, 1, 0],
       [5, 4, 3, 2, 1, 0]])

In [33]:
x = np.arange(18, 0, -1).reshape((3, 6))
x[:,x.argsort()][:,0,:]

array([[13, 14, 15, 16, 17, 18],
       [ 7,  8,  9, 10, 11, 12],
       [ 1,  2,  3,  4,  5,  6]])
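
Two alternatives that leave the original array untouched are sketched here.
`np.sort` and `np.take_along_axis` are NumPy functions that were not used in the cells above; the latter applies the indexes from `argsort` row by row.

```python
import numpy as np

x = np.arange(18, 0, -1).reshape((3, 6))

# np.sort returns a sorted copy and leaves x untouched.
sorted_copy = np.sort(x, axis=-1)

# argsort returns indexes; take_along_axis applies them row by row.
order = x.argsort(axis=-1)
sorted_by_index = np.take_along_axis(x, order, axis=-1)

print(sorted_copy)
print(x[0])
```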

### Fancy Indexing

The sorting with `argsort` was an example of fancy indexing,
where one indexes an array using another array.

In [34]:
x = np.arange(18).reshape((3, 6))
x

array([[ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11],
       [12, 13, 14, 15, 16, 17]])

In [35]:
x[np.array([1, 2, 2]), np.array([1, 1, 3])]

array([ 7, 13, 15])

In [36]:
x[np.array([1, 1]), :]

array([[ 6,  7,  8,  9, 10, 11],
       [ 6,  7,  8,  9, 10, 11]])
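
One difference from slicing is worth checking: fancy indexing returns a new array, not a view.
A minimal sketch, with illustrative names, that selects the same rows both ways:

```python
import numpy as np

x = np.arange(18).reshape((3, 6))

rows = np.array([0, 2])
picked = x[rows, :]          # fancy indexing: a new array
sliced = x[0:3:2, :]         # slicing the same rows: a view

picked[0, 0] = -1            # does not touch x
print(x[0, 0])
print(np.shares_memory(x, picked), np.shares_memory(x, sliced))
```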

### Boolean Logic and Masks

Since we can use arrays to index arrays we can create arrays that work as masks.
With fancy indexing we used arrays of integers that work as indexes.
A mask is a boolean array of the same shape as the array being indexed,
and we retrieve (index) only the entries for which the mask is true.
But first, how do we generate a boolean array?

In [37]:
x = np.arange(18).reshape((3, 6))
x < 12

array([[ True,  True,  True,  True,  True,  True],
       [ True,  True,  True,  True,  True,  True],
       [False, False, False, False, False, False]])

In [38]:
x[x < 12]

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11])

We can use the `|` (or) and the `&` (and) operators.
Unfortunately in plain Python these are bitwise operators and have higher precedence than comparisons,
therefore we need to add parentheses around the expressions.

In [39]:
x[(x < 12) & (x > 3)]

array([ 4,  5,  6,  7,  8,  9, 10, 11])
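
What goes wrong without the parentheses can be seen in a small sketch.
The `raised` flag is only illustrative, and `np.logical_and` is shown as the function form of the same operation.

```python
import numpy as np

x = np.arange(18).reshape((3, 6))

# With parentheses each comparison is evaluated first.
mask = (x < 12) & (x > 3)

# Without them Python evaluates `12 & x` first and the expression
# becomes a chained comparison, which fails for arrays.
try:
    x[x < 12 & x > 3]
    raised = False
except ValueError:
    raised = True

# np.logical_and is the function form of the same operation.
same_mask = np.logical_and(x < 12, x > 3)
print(raised, np.array_equal(mask, same_mask))
```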

One can retain the shape by masking columns or rows only.
To build row and column based masks one can use `np.all` and `np.any`.

In [40]:
x[np.all((x < 12) | (x < -2), axis=1), :]

array([[ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11]])
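
The same idea works for columns.
This sketch, with an arbitrarily chosen threshold, uses `np.any` along `axis=0` to build one boolean per column.

```python
import numpy as np

x = np.arange(18).reshape((3, 6))

# One boolean per column: True if any value in that column exceeds 15.
col_mask = np.any(x > 15, axis=0)

# Keep all rows, but only the selected columns.
selected = x[:, col_mask]
print(col_mask)
print(selected)
```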

The boolean operators are a form of NumPy ufuncs,
functions that operate element-wise over an array.

## Broadcasting

When you add (or subtract, or multiply) a number and an array,
the operation is applied between the number and every element.
We say that the number is broadcast against all elements.
But we are not limited to broadcasting single numbers.
Two arrays can be broadcast together as long as, comparing their shapes from the last dimension backwards, each pair of sizes is equal or one of them is 1.

![Broadcast Simple](np-broadcast-simple.svg)

<div style="text-align:right;"><sup>np-broadcast-simple.svg</sup></div>

In [41]:
x[0,:] + 42

array([42, 43, 44, 45, 46, 47])

![Broadcast Axis](np-broadcast-axis.svg)

<div style="text-align:right;"><sup>np-broadcast-axis.svg</sup></div>

In [42]:
x[:,2:4] + np.arange(1, 3)

array([[ 3,  5],
       [ 9, 11],
       [15, 17]])

In [43]:
x[:,2:4] + np.arange(1, 4)[:, np.newaxis]

array([[ 3,  4],
       [10, 11],
       [17, 18]])

The row vector example is equivalent to the following.
Part of the broadcasting procedure is to *add an axis at the front* of the array with fewer dimensions.
For finer control one can do it explicitly.

In [44]:
x[:,2:4] + np.arange(1, 3)[np.newaxis, :]

array([[ 3,  5],
       [ 9, 11],
       [15, 17]])
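
To see what broadcasting does to each operand, one can stretch it explicitly.
The sketch below uses `np.broadcast_to`, which was not used in the cells above, and also tries a shape that cannot be broadcast.
The names are only illustrative.

```python
import numpy as np

block = np.arange(18).reshape((3, 6))[:, 2:4]   # shape (3, 2)
row = np.arange(1, 3)                           # shape (2,)
col = np.arange(1, 4)[:, np.newaxis]            # shape (3, 1)

# broadcast_to shows the array each operand is stretched into.
print(np.broadcast_to(row, block.shape))
print(np.broadcast_to(col, block.shape))

# Shape (3,) does not match the last axis (2) and is not 1, so it fails.
try:
    block + np.arange(1, 4)
    compatible = True
except ValueError:
    compatible = False
print(compatible)
```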

![Broadcasting](np-broadcast-tv.svg)

<div style="text-align:right;"><sup>np-broadcast-tv.svg</sup></div>

## Ufuncs

To perform operations on NumPy arrays one adds, subtracts, multiplies, divides,
or applies other operations as if dealing with plain numbers.
Behind the scenes NumPy performs the operation element-wise across the array.

In [45]:
x = np.arange(6).reshape((2, 3))
x

array([[0, 1, 2],
       [3, 4, 5]])

In [46]:
x + 7

array([[ 7,  8,  9],
       [10, 11, 12]])

In [47]:
x / 3.14

array([[0.        , 0.31847134, 0.63694268],
       [0.95541401, 1.27388535, 1.59235669]])

In [48]:
x // 3.14

array([[0., 0., 0.],
       [0., 1., 1.]])

The [full list of ufuncs][docufunc] is quite long; some of them are:

| ufunc           | operator | description |
|:--------------- |:-------- |:----------- |
| np.add          | \+       | Add arguments element-wise. |
| np.subtract     | \-       | Subtract arguments element-wise. |
| np.multiply     | \*       | Multiply arguments element-wise. |
| np.divide       | /        | Returns a true division of the inputs, element-wise. |
| np.floor_divide | //       | Return the largest integer smaller or equal to the division of the inputs. |
| np.negative     | \-       | Numerical negative, element-wise. |
| np.power        | \*\*     | First array elements raised to powers from second array, element-wise. |
| np.mod          | %        | Return element-wise remainder of division. |
| np.sin          |          | Trigonometric sine, element-wise. |
| np.cos          |          | Cosine element-wise. |
| np.tan          |          | Compute tangent element-wise. |
| np.arcsin       |          | Inverse sine, element-wise. |
| np.arccos       |          | Inverse cosine, element-wise. |
| np.arctan       |          | Inverse tangent, element-wise. |

Note that not all operations are linked to an operator.
You can use them by directly invoking the ufunc on the array, e.g. `np.sin(x)`.
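
A quick sketch comparing operators with the ufuncs behind them, and calling a ufunc that has no operator.
The input values are chosen only for illustration.

```python
import numpy as np

x = np.arange(6).reshape((2, 3))

# The operator and the ufunc are two spellings of the same operation.
print(np.array_equal(x + 7, np.add(x, 7)))
print(np.array_equal(x ** 2, np.power(x, 2)))

# Ufuncs without an operator are called as functions.
angles = np.array([0.0, np.pi / 2, np.pi])
sines = np.sin(angles)
print(np.round(sines, 3))
```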

[docufunc]: https://docs.scipy.org/doc/numpy/reference/ufuncs.html#available-ufuncs

There are [aggregation functions][docustat] too.  Some of them are:

| aggregation function | nan-safe     | description |
|:-------------------- |:------------ |:----------- |
| np.sum               | np.nansum    | Compute sum of elements |
| np.prod              | np.nanprod   | Compute product of elements |
| np.mean              | np.nanmean   | Compute mean of elements |
| np.average           |              | Compute the weighted average |
| np.median            | np.nanmedian | Compute median of elements |
| np.std               | np.nanstd    | Compute standard deviation |
| np.var               | np.nanvar    | Compute variance |
| np.min               | np.nanmin    | Find minimum value |
| np.max               | np.nanmax    | Find maximum value |

[docustat]: https://docs.scipy.org/doc/numpy/reference/routines.math.html
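
The difference between a plain aggregation and its nan-safe counterpart shows up as soon as a `nan` is present.
A minimal sketch with made-up values:

```python
import numpy as np

values = np.array([1.0, 2.0, np.nan, 4.0])

plain_sum = np.sum(values)        # nan propagates through the sum
safe_sum = np.nansum(values)      # nan is treated as zero
safe_mean = np.nanmean(values)    # mean over the three valid entries

print(plain_sum, safe_sum, safe_mean)
```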

What sets aggregation functions apart is that they again accept the `axis=` parameter.
Yet, the tricky part is that the argument means the *axis along which the aggregation
will take place*, i.e. the axis that is collapsed and disappears from the result.  For example:

In [49]:
x = np.arange(18).reshape((3, 6))
x

array([[ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11],
       [12, 13, 14, 15, 16, 17]])

![Aggregation Axis None](np-agg-axis-none.svg)

<div style="text-align:right;"><sup>np-agg-axis-none.svg</sup></div>

In [50]:
np.sum(x, axis=None)

153

![Aggregation Axis Zero](np-agg-axis-0.svg)

<div style="text-align:right;"><sup>np-agg-axis-0.svg</sup></div>

In [51]:
np.sum(x, axis=0)

array([18, 21, 24, 27, 30, 33])

![Aggregation Axis One](np-agg-axis-1.svg)

<div style="text-align:right;"><sup>np-agg-axis-1.svg</sup></div>

In [52]:
np.sum(x, axis=1)

array([15, 51, 87])
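
Looking at the shapes makes the rule easier to remember.
The sketch below also uses the `keepdims=` argument, which was not used in the cells above, to keep the collapsed axis so that the result can be broadcast against the original array.
The names are only illustrative.

```python
import numpy as np

x = np.arange(18).reshape((3, 6))

# The axis passed is the one that disappears from the result.
print(np.sum(x, axis=0).shape)   # rows collapsed
print(np.sum(x, axis=1).shape)   # columns collapsed

# keepdims=True keeps the collapsed axis with length 1,
# so the result broadcasts against the original array.
row_totals = np.sum(x, axis=1, keepdims=True)
shares = x / row_totals
print(row_totals.shape)
print(shares.sum(axis=1))
```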