# Numpy Arrays - What can we do with them?

[<center><img src="images/numpy_logo.png" width="400"/></center>](https://numpy.org/doc/stable/user/quickstart.html)

## Table of Contents

* [Accessing Array Elements](#aae)
    * [Indexing and Slicing](#Indx_Slic)
    * [Masking/Filtering](#Mask_Filt)
<br>
<br>
* [Math with Numpy Arrays](#mwna)
    * [Math Operators and Functions - Ufuncs](#Operators)
    * [Comparison](#Comparison)
    * [Constants](#Constants)
<br>
<br>
* [Changing Shape](#Changing_Shape)
    * [The axis argument](#Axis)
    * [Reshape](#Reshape)
    * [Removing Dimensions](#Removing_Dimensions)
        * [Flattening](#Flattening)
        * [Squeezing](#Squeezing)
    * [Adding Dimensions](#Adding_Dimensions)
        * [Extending](#Extending)
        * [Combining Arrays](#Combining_Arrays)
        * [Repeating Arrays](#Repeating_Arrays)
    * [Aggregation](#Aggregation)
    * [Broadcasting](#Broadcasting)
<br>
<br>
* [Miscellaneous](#Miscellaneous)
    * [Random Number Generation](#Random)


## Imports

In [1]:
import numpy as np
print(np.__version__)

1.24.3


<a id='aae'></a>
# Accessing Array Elements

<a id='Indx_Slic'></a>
## Indexing and Slicing
To index a single element of an `N`-dimensional `array` we can use the following syntax: <br>
`array[dim1, dim2, dim3, ..., dimN]`, where `dimN` is the **index** in the **N**th dimension.

In [4]:
a = np.arange(27).reshape((3,3,3))
a

array([[[ 0,  1,  2],
        [ 3,  4,  5],
        [ 6,  7,  8]],

       [[ 9, 10, 11],
        [12, 13, 14],
        [15, 16, 17]],

       [[18, 19, 20],
        [21, 22, 23],
        [24, 25, 26]]])

In [5]:
a[2,0,1]

19
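Indexing with fewer indices than the array has dimensions returns a sub-array instead of a single element. A short sketch, re-creating `a` so the cell runs on its own:

```python
import numpy as np

a = np.arange(27).reshape((3, 3, 3))

# One index per dimension selects a single element
element = a[2, 0, 1]
# Chained indexing reaches the same element, but builds an intermediate array at every step
chained = a[2][0][1]

# Fewer indices than dimensions: the remaining dimensions are kept
block = a[2]      # shape (3, 3): the last 3x3 block
row = a[2, 0]     # shape (3,): first row of that block

print(element, chained)  # 19 19
print(row)               # [18 19 20]
```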

To access multiple elements we can **slice** an `array` with the following syntax:<br>
`array[start:stop:step]`, where `start` is included, `stop` is excluded, and `step` sets the stride size.<br>

In [6]:
a = np.arange(100)
a[0:50:5]

array([ 0,  5, 10, 15, 20, 25, 30, 35, 40, 45])

We can also slice across multiple dimensions.

In [7]:
a = np.arange(100).reshape((10,10))
a[0:8:2, 0:10:5]


array([[ 0,  5],
       [20, 25],
       [40, 45],
       [60, 65]])
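Basic slices like the ones above return a *view* on the same memory rather than a copy, so writing into a slice also changes the original array. Calling `.copy()` gives an independent array. If `start` or `stop` is omitted, it defaults to the beginning or end of the dimension:

```python
import numpy as np

a = np.arange(100).reshape((10, 10))

sub = a[0:8:2, 0:10:5]      # a view: no data is copied
sub[0, 0] = -1              # writes through to a
print(a[0, 0])              # -1

independent = a[:2, :2].copy()  # start omitted -> 0, stop 2 is excluded
independent[:] = 0              # does not touch a
print(a[:2, :2])
```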

#### negative indices
We can also **index** and **slice** using **negative integers**. Negative indices count from the back of the array: `-1` refers to the last element, `-2` to the second to last, and so on.

In [8]:
a = np.arange(10)
a

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [9]:
a[-1]

9

In [10]:
a[-2]

8

If we set the **step** (stride) to a **negative integer**, the array is traversed backwards. A step of `-1` reverses the array.

In [11]:
a[::-1]

array([9, 8, 7, 6, 5, 4, 3, 2, 1, 0])
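Negative values also work for `start` and `stop`, and they can be combined with a negative step:

```python
import numpy as np

a = np.arange(10)

last_three = a[-3:]              # from the third-to-last element to the end
all_but_last = a[:-1]            # everything except the last element
every_second_reversed = a[::-2]  # start at the end, step backwards by 2

print(last_three)             # [7 8 9]
print(every_second_reversed)  # [9 7 5 3 1]
```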

<a id='Mask_Filt'></a>
## Masking/Filtering

One way of **filtering** an `array` is by creating a **boolean mask**.<br>
A **boolean mask** is an array of the same shape whose values are either `True` or `False`, depending on whether or not the respective array value satisfies the given **condition**.

In [36]:
a = np.arange(10)

mask =  a%2 == 0 # condition: divisible by 2
mask

array([ True, False,  True, False,  True, False,  True, False,  True,
       False])

With the **boolean mask** we can now **filter** the `array`. This returns the values at the positions where the mask is `True`:

In [37]:
a[mask]

array([0, 2, 4, 6, 8])
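You can combine several conditions with the element-wise operators `&` (and), `|` (or) and `~` (not). Each condition needs its own parentheses, because these operators bind more tightly than comparisons. You can also use a mask on the left-hand side of an assignment:

```python
import numpy as np

a = np.arange(10)

# Even numbers greater than 2
even_and_big = a[(a % 2 == 0) & (a > 2)]
print(even_and_big)  # [4 6 8]

# Odd numbers or 0
odd_or_zero = a[(a % 2 == 1) | (a == 0)]

# Set all values above 5 to 0, in place (on a copy, so a stays unchanged)
clipped = a.copy()
clipped[clipped > 5] = 0
print(clipped)  # [0 1 2 3 4 5 0 0 0 0]
```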

We can more or less use masking and filtering synonymously, but there are also [other filtering methods](https://stackoverflow.com/questions/58422690/filtering-reducing-a-numpy-array) that don't rely on masking.

<br>

<a id='mwna'></a>
# Math with Numpy Arrays

<a id='Operators'></a>
## Math Operators and Functions - Ufuncs
**Ufuncs** (short for "universal functions") are functions that operate **element-wise** on arrays.

They are called "universal" because they work on arrays of any shape or size (and on scalars), and they are a fundamental building block of numpy's array processing capabilities.

**Ufuncs** are vectorized. The loop over the elements runs in compiled C code instead of the Python interpreter, and it can often process several elements at once. This way we don't have to rely on Python's slow loops.
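One way to see the difference is to time a pure-Python loop against the equivalent ufunc expression. The sketch below uses the standard-library `timeit` module. The absolute numbers depend on the machine, so no output is shown:

```python
import timeit
import numpy as np

a = np.arange(100_000)

def python_loop(arr):
    # One interpreter-level iteration per element
    return np.array([x * 2 + 1 for x in arr])

def vectorized(arr):
    # The same element-wise work runs inside NumPy's compiled loops
    return arr * 2 + 1

t_loop = timeit.timeit(lambda: python_loop(a), number=10)
t_vec = timeit.timeit(lambda: vectorized(a), number=10)
print(f"loop: {t_loop:.4f} s, vectorized: {t_vec:.4f} s")
```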

In [54]:
a = np.arange(10)
b = np.arange(10)

### Mathematical operators
In mathematics we distinguish between operators and functions. An operator such as `+` acts on operands (for example two numbers), while a function maps its inputs to an output. Some functions can also be seen as operators, but not all. For numpy arrays, the arithmetic operators are shorthand for ufuncs. For example, `a + 2` calls `np.add(a, 2)`.

#### addition

In [55]:
# same as np.add(a, 2)
a + 2

array([ 2,  3,  4,  5,  6,  7,  8,  9, 10, 11])

In [58]:
a + b

array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18])

#### subtraction

In [56]:
# same as np.subtract(a, 2)
a - 2

array([-2, -1,  0,  1,  2,  3,  4,  5,  6,  7])

In [59]:
a - b

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

#### multiplication

In [57]:
a * 2

array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18])

Note that multiplication with the `*` operator will also be performed element-wise and not as matrix multiplication.

In [64]:
a * b

array([ 0,  1,  4,  9, 16, 25, 36, 49, 64, 81])

#### division

In [65]:
a / 2

array([0. , 0.5, 1. , 1.5, 2. , 2.5, 3. , 3.5, 4. , 4.5])

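#### matrix multiplication
The proper operator for matrix multiplication is `@` (equivalent to `np.matmul`). For two 1-D arrays, as here, it computes their inner product, which is a single number.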

In [66]:
a @ b

285
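With 2-D arrays, `@` performs ordinary matrix multiplication, so the inner dimensions must agree. In contrast, `*` needs shapes that are compatible element by element (see [Broadcasting](#Broadcasting)). A small sketch:

```python
import numpy as np

A = np.arange(6).reshape((2, 3))   # [[0 1 2], [3 4 5]]
B = np.arange(6).reshape((3, 2))   # [[0 1], [2 3], [4 5]]

product = A @ B          # (2, 3) @ (3, 2) -> (2, 2)
print(product)
# [[10 13]
#  [28 40]]

same = np.matmul(A, B)   # function form of the @ operator
inner = np.arange(10) @ np.arange(10)   # 1-D @ 1-D -> scalar inner product, 285

try:
    A * B                # element-wise: shapes (2, 3) and (3, 2) do not broadcast
except ValueError as err:
    print("ValueError:", err)
```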

### Mathematical functions

In [73]:
np.sin(a)

array([ 0.        ,  0.84147098,  0.90929743,  0.14112001, -0.7568025 ,
       -0.95892427, -0.2794155 ,  0.6569866 ,  0.98935825,  0.41211849])

In [74]:
np.exp(a)

array([1.00000000e+00, 2.71828183e+00, 7.38905610e+00, 2.00855369e+01,
       5.45981500e+01, 1.48413159e+02, 4.03428793e+02, 1.09663316e+03,
       2.98095799e+03, 8.10308393e+03])

In [76]:
np.log(a[1:])

array([0.        , 0.69314718, 1.09861229, 1.38629436, 1.60943791,
       1.79175947, 1.94591015, 2.07944154, 2.19722458])

In [77]:
np.log2(a[1:])

array([0.        , 1.        , 1.5849625 , 2.        , 2.32192809,
       2.5849625 , 2.80735492, 3.        , 3.169925  ])

#### List of all **[ufuncs](https://numpy.org/doc/stable/reference/ufuncs.html#available-ufuncs)**<br>
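Besides the element-wise call, every ufunc object also has methods such as `reduce`, `accumulate` and `outer`. A sketch with `np.add` and `np.multiply`:

```python
import numpy as np

x = np.arange(1, 5)               # [1 2 3 4]

total = np.add.reduce(x)          # 1 + 2 + 3 + 4 = 10, same as np.sum(x)
running = np.add.accumulate(x)    # running sums [ 1  3  6 10], same as np.cumsum(x)
table = np.multiply.outer(x, x)   # 4x4 multiplication table

print(total, running)
print(table)
```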


<a id='Comparison'></a>
## Comparison

### equal `==`

In [81]:
val = 1.2-1
val

0.19999999999999996

In [91]:
a = np.array([val, 2, 3, 4, 5], dtype=np.float32)
b = np.array([val, 2, 3, 8, 10], dtype=np.float64)

In [92]:
a == b

array([False,  True,  True, False, False])

Because `a` and `b` have different `dtype`s, `val` is stored with a different precision in each array (32-bit vs. 64-bit floating point),<br>
and thus the first elements are **not** equal.

In [93]:
a[0]

0.2

In [94]:
b[0]

0.19999999999999996

To address this issue we can use `np.isclose()`

In [95]:
np.isclose(a, b)

array([ True,  True,  True, False, False])
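`np.isclose` checks `abs(a - b) <= atol + rtol * abs(b)` element by element. The defaults are `rtol=1e-05` and `atol=1e-08`. `np.allclose` applies the same test but returns one boolean for the whole array. You can loosen the tolerances when needed:

```python
import numpy as np

print(0.1 + 0.2 == 0.3)             # False: floating-point rounding
print(np.isclose(0.1 + 0.2, 0.3))   # True

print(np.isclose(1.0, 1.1))             # False with the default tolerances
print(np.isclose(1.0, 1.1, atol=0.2))   # True: |1.0 - 1.1| <= 0.2 + 1e-05 * 1.1

a = np.array([1.0, 2.0, 3.0])
b = a + 1e-9
print(np.allclose(a, b))   # True: a single bool for the whole array
```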

### not equal `!=`

In [100]:
a != b

array([ True, False, False,  True,  True])

### smaller `<` / greater `>`

In [96]:
a < b

array([False, False, False,  True,  True])

In [97]:
a > b

array([ True, False, False, False, False])

### smaller/greater equal `<=` `>=`

In [98]:
a <= b

array([False,  True,  True,  True,  True])

In [99]:
a >= b

array([ True,  True,  True, False, False])

### `np.nan` != `np.nan`

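`np.nan` ("not a number") represents **undefined** values. Under the IEEE 754 floating-point standard, `nan` is not equal to anything, not even to itself.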

In [101]:
np.nan == np.nan

False

To check whether something is `nan`, we can use `np.isnan()`:


In [119]:
np.isnan(np.nan)

True
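`nan` propagates through most calculations. For that reason NumPy offers `np.isnan` for element-wise checks and nan-aware aggregations such as `np.nanmean` that skip missing values:

```python
import numpy as np

x = np.array([1.0, np.nan, 3.0])

nan_mask = np.isnan(x)            # [False  True False]
cleaned = x[~nan_mask]            # drop the nan: [1. 3.]

print(np.mean(x))      # nan, since nan propagates
print(np.nanmean(x))   # 2.0, nan values are ignored

# Element-wise comparison treats nan as unequal, even to itself
print(np.array_equal(x, x))                   # False
print(np.array_equal(x, x, equal_nan=True))   # True
```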

### `all()`

To check if a given comparison is true for all array elements we can use `all()`.<br>
So we can check for example if *all* array elements are equal.

In [108]:
a = np.arange(10)
b = np.arange(10)

(a == b).all()

True

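#### Watch out!

When shapes differ, `all()` can be misleading. Below, `a == b` broadcasts to an empty array, and `all()` of an empty array is `True` because no element makes the comparison fail.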

In [110]:
a = np.array([])
b = np.array([1])

In [111]:
a == b

array([], dtype=bool)

In [112]:
(a == b).all()

True
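To compare whole arrays including their shapes, `np.array_equal` is the safer choice:

```python
import numpy as np

a = np.array([])
b = np.array([1])

print((a == b).all())         # True, although the arrays clearly differ
print(np.array_equal(a, b))   # False: shapes (0,) and (1,) do not match

c = np.arange(10)
print(np.array_equal(c, np.arange(10)))   # True
```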

<a id='Constants'></a>
## Constants
[Numpy documentation - constants](https://numpy.org/doc/stable/reference/constants.html)

In [113]:
np.pi

3.141592653589793

In [114]:
np.e

2.718281828459045

In [116]:
np.inf > 999999999999999999999999999999999999999999999999999999

True

In [117]:
-np.inf

-inf

In [118]:
np.inf - np.inf

nan
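To detect infinite or undefined values element by element, you can use `np.isinf`, `np.isnan` and `np.isfinite`:

```python
import numpy as np

values = np.array([1.0, np.inf, -np.inf, np.nan])

print(np.isinf(values))     # [False  True  True False]
print(np.isnan(values))     # [False False False  True]
print(np.isfinite(values))  # [ True False False False]
```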

<br>

<a id='Changing_Shape'></a>
# Changing Shape

<a id='Axis'></a>
## The `axis` argument

Many numpy functions allow you to specify an `axis` argument. In practice they are quite simple, but it can take a while to wrap your head around how they work. We will attempt a short explanation and demonstration here. For a more comprehensive guide check out this helpful tutorial by Joshua Ebner: https://www.sharpsightlabs.com/blog/numpy-axes-explained/ 

By definition, the axis number of the dimension is the index of that dimension within the array's shape. It is also the position used to access that dimension during indexing.
Axes can be interpreted as the axes of a coordinate system, or in the specific case of a 2 dimensional array, the rows and columns of a table.

For a 2D array, specifying `axis=0` applies operations "column-wise" (down the rows), while `axis=1` applies operations "row-wise" (across the columns).

No matter how many dimensions an array has, you can access its last dimension with `axis=-1`, its second to last dimension with `axis=-2` and so on.

<img src="images/npaxes.jpg" width="700"/>

Let's take a look at some 2D examples:

In [329]:
a = np.arange(9).reshape((3,3))
a

array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

When specifying `axis=0` the sum will be calculated "column-wise", or "top-down":

In [330]:
#np.sum calculates the sum of all given values and will be properly introduced later
np.sum(a, axis=0)

array([ 9, 12, 15])

When specifying `axis=1` the sum will be calculated "row-wise", or "sideways":

In [331]:
np.sum(a, axis=1)

array([ 3, 12, 21])

Specifying `axis=-1` is the same as specifying the last axis of an array, `axis=1` in this case:

In [332]:
np.sum(a, axis=-1)

array([ 3, 12, 21])
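The summed axis disappears from the result's shape. Sometimes you want to keep it with size 1, for example to combine the result with the original array later. Aggregation functions accept `keepdims=True` for this:

```python
import numpy as np

a = np.arange(9).reshape((3, 3))

col_sums = np.sum(a, axis=0)                  # shape (3,)
row_sums = np.sum(a, axis=1, keepdims=True)   # shape (3, 1)

# keepdims keeps the result aligned with the rows of a,
# so each row can be divided by its own sum
row_fractions = a / row_sums
print(row_fractions.sum(axis=1))   # [1. 1. 1.]
```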

An example with a three-dimensional array can be found in the [Aggregation](#Aggregation) section.

<a id='Reshape'></a>
## Reshape

With `np.reshape` we can change the shape of an existing array without changing its contents. The shape here means the number of dimensions and the size of each one. <br>
We can remove or add as many dimensions as we want, as long as the new shape holds exactly as many entries as the array (the product of the new dimension sizes must equal the array's size)!<br>
It can be called both as a numpy function or as a method on an array. Both do the exact same thing.

In [333]:
a = np.arange(10)
a

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [334]:
# The reshape function, it needs to be given an array and a shape
np.reshape(a, (5,2))

array([[0, 1],
       [2, 3],
       [4, 5],
       [6, 7],
       [8, 9]])

In [335]:
# When calling reshape as a method of an array it only needs to be given the shape
a.reshape((5,2))

array([[0, 1],
       [2, 3],
       [4, 5],
       [6, 7],
       [8, 9]])

We can specify one shape dimension with -1. In this case the value will be automatically inferred from the length of the array and remaining dimensions.

In [336]:
a.reshape((5,-1))

array([[0, 1],
       [2, 3],
       [4, 5],
       [6, 7],
       [8, 9]])
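The `-1` placeholder can appear in any position, but only once. Here `np.arange` creates an array that is contiguous in memory. For such an array, `reshape` returns a view that shares the original's data instead of making a copy:

```python
import numpy as np

a = np.arange(10)

print(a.reshape((-1, 5)).shape)   # (2, 5): the first dimension is inferred
print(a.reshape((2, -1)).shape)   # (2, 5): the second dimension is inferred

b = a.reshape((2, 5))
print(np.shares_memory(a, b))     # True: b is a view on a's data

try:
    a.reshape((-1, -1))           # only one dimension can be inferred
except ValueError as err:
    print("ValueError:", err)
```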

If we try to reshape `a` into a 3x3 matrix, we get an error, because `a` has 10 entries while a 3x3 matrix has only 9.

In [337]:
a.reshape((3,3))

ValueError: cannot reshape array of size 10 into shape (3,3)

If we remove the last element of `a` by slicing before the operation, the reshape works just fine and creates a 3x3 matrix without the last element of `a`.

In [338]:
a[:-1].reshape((3,3))

array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

<a id='Removing_Dimensions'></a>
## Removing Dimensions

<a id='Flattening'></a>
### Flattening

In [339]:
a = np.ones((2,2,2))
a

array([[[1., 1.],
        [1., 1.]],

       [[1., 1.],
        [1., 1.]]])

In [340]:
a.flatten()

array([1., 1., 1., 1., 1., 1., 1., 1.])
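`flatten` always returns a copy. `np.ravel` (or the `.ravel()` method) gives the same 1-D result but returns a view whenever possible. That avoids copying, with the side effect that writes go back to the original array:

```python
import numpy as np

a = np.ones((2, 2, 2))

flat_copy = a.flatten()
flat_view = a.ravel()       # a is contiguous, so this is a view

flat_copy[0] = 5            # a is unchanged
print(a[0, 0, 0])           # 1.0

flat_view[0] = 5            # writes through to a
print(a[0, 0, 0])           # 5.0
```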

<a id='Squeezing'></a>
### Squeezing
Sometimes we have unnecessary dimensions that we want to get rid of. In such cases we can `squeeze` the array.<br>
Squeezing the array removes all dimensions of length 1.

In [341]:
a = np.array([[[[1, 2], [3, 4], [5, 6]]]])
a

array([[[[1, 2],
         [3, 4],
         [5, 6]]]])

In [342]:
a.shape

(1, 1, 3, 2)

In [343]:
a = a.squeeze()
a

array([[1, 2],
       [3, 4],
       [5, 6]])

In [344]:
a.shape

(3, 2)
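By default `squeeze` removes every axis of length 1. The `axis` argument limits it to specific axes. Asking it to remove an axis whose length is not 1 raises an error:

```python
import numpy as np

a = np.array([[[[1, 2], [3, 4], [5, 6]]]])   # shape (1, 1, 3, 2)

print(np.squeeze(a).shape)           # (3, 2)
print(np.squeeze(a, axis=0).shape)   # (1, 3, 2): only the first axis is removed

try:
    np.squeeze(a, axis=2)            # axis 2 has length 3
except ValueError as err:
    print("ValueError:", err)
```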

<a id='Adding_Dimensions'></a>
## Adding Dimensions

<a id='Extending'></a>
### Extending

We can add new dimensions to an array with `np.newaxis`

In [345]:
one_dim_arr = np.arange(5)
one_dim_arr, one_dim_arr.shape

(array([0, 1, 2, 3, 4]), (5,))

In [346]:
two_dim_arr = one_dim_arr[np.newaxis, :]
two_dim_arr, two_dim_arr.shape

(array([[0, 1, 2, 3, 4]]), (1, 5))

In [347]:
two_dim_arr = one_dim_arr[:, np.newaxis]
two_dim_arr, two_dim_arr.shape

(array([[0],
        [1],
        [2],
        [3],
        [4]]),
 (5, 1))

Instead of `np.newaxis`, `None` can be used. Both are exactly equivalent, except that `np.newaxis` is more explicit.

In [348]:
print(np.newaxis is None)
two_dim_arr = one_dim_arr[:, None]
two_dim_arr, two_dim_arr.shape

True


(array([[0],
        [1],
        [2],
        [3],
        [4]]),
 (5, 1))
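`np.expand_dims` does the same as a function call, taking the position of the new axis as an argument. `reshape` works too:

```python
import numpy as np

one_dim_arr = np.arange(5)

row = np.expand_dims(one_dim_arr, axis=0)   # like one_dim_arr[np.newaxis, :]
col = np.expand_dims(one_dim_arr, axis=1)   # like one_dim_arr[:, np.newaxis]
col_reshaped = one_dim_arr.reshape((-1, 1))

print(row.shape, col.shape, col_reshaped.shape)   # (1, 5) (5, 1) (5, 1)
```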

Adding a new dimension can be useful, for example, when a model (e.g. in Tensorflow) expects a batch of inputs but you want to pass a single data point:

In [349]:
minibatch = one_dim_arr[None, :]

# Tensorflow will internally iterate over all data of the batch, without the new axis there would be an error
for datum in minibatch:
    print("tensorflow_model.fit(datum)")

tensorflow_model.fit(datum)


<a id='Combining_Arrays'></a>
### Combining Arrays

There are many ways to combine existing arrays, like `np.append`, `np.concatenate` and `np.stack`. However, each of these operations creates a new array and copies all the data into it. Therefore, if you know the final size in advance, it often makes more sense to allocate an array of that size up front and then just fill in the respective parts.
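A small sketch of the two approaches. The sizes are arbitrary illustration values. Both versions produce the same result, but the first one copies the whole array on every `np.append` call:

```python
import numpy as np

n_chunks, chunk_size = 4, 3

# Growing: every np.append allocates a new array and copies all existing data
grown = np.empty(0, dtype=int)
for i in range(n_chunks):
    grown = np.append(grown, np.full(chunk_size, i))

# Preallocating: allocate once, then fill slices in place
prealloc = np.empty(n_chunks * chunk_size, dtype=int)
for i in range(n_chunks):
    prealloc[i * chunk_size:(i + 1) * chunk_size] = i

print(prealloc)   # [0 0 0 1 1 1 2 2 2 3 3 3]
```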

`np.concatenate` joins a sequence of arrays along an **existing** axis.

In [350]:
a = np.arange(10)
b = np.arange(10)[::-1]
c = np.array([10])
# Needs a sequence (in this case a tuple) of array-likes
np.concatenate((a,c,b))

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10,  9,  8,  7,  6,  5,  4,
        3,  2,  1,  0])

`np.append` uses concatenation internally and is used to append one array to another:

In [351]:
# Needs exactly two array-likes
np.append(a, np.append(c, b))

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10,  9,  8,  7,  6,  5,  4,
        3,  2,  1,  0])
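Without an `axis` argument, `np.append` flattens its inputs first. To append rows to a 2-D array you have to give the `axis`, and the shapes must then match along all other axes:

```python
import numpy as np

m = np.zeros((2, 2))
new_row = np.array([[5, 5]])      # shape (1, 2)

print(np.append(m, new_row).shape)          # (6,): everything is flattened
print(np.append(m, new_row, axis=0).shape)  # (3, 2): a row is added
```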

`np.insert` can be used to insert values along a given axis before the given indices.

In [352]:
array = np.concatenate((a,b))
print(array)
array = np.insert(array,10, c)
print(array)

[0 1 2 3 4 5 6 7 8 9 9 8 7 6 5 4 3 2 1 0]
[ 0  1  2  3  4  5  6  7  8  9 10  9  8  7  6  5  4  3  2  1  0]


For higher-dimensional arrays, other functions are useful.

`np.stack` for example joins a sequence of arrays along a **new** axis.

In [353]:
np.stack((a, b))

array([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
       [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]])

In [354]:
np.stack((a, b), axis=1)

array([[0, 9],
       [1, 8],
       [2, 7],
       [3, 6],
       [4, 5],
       [5, 4],
       [6, 3],
       [7, 2],
       [8, 1],
       [9, 0]])

There are also the functions `np.vstack` (vertical/row-wise-stacking) and `np.hstack` (horizontal/column-wise-stacking):

- `hstack` is equivalent to concatenation along the second axis, except for 1-D arrays, where it concatenates along the first axis
- `vstack` is equivalent to concatenation along the first axis after 1-D arrays of shape `(N,)` have been reshaped to `(1, N)`
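For the 1-D arrays `a` and `b` from above, `hstack` simply chains them. `vstack` places them as rows on top of each other, the same as `np.stack` with its default `axis=0`:

```python
import numpy as np

a = np.arange(10)
b = np.arange(10)[::-1]

print(np.hstack((a, b)).shape)   # (20,)
print(np.vstack((a, b)).shape)   # (2, 10)
```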

In [355]:
two_dim_arr = np.arange(16).reshape(4, -1)
two_dim_arr_2 = np.arange(16).reshape(4, -1) + 16
two_dim_arr, two_dim_arr_2

(array([[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11],
        [12, 13, 14, 15]]),
 array([[16, 17, 18, 19],
        [20, 21, 22, 23],
        [24, 25, 26, 27],
        [28, 29, 30, 31]]))

In [356]:
np.stack((two_dim_arr, two_dim_arr_2), axis=1)

array([[[ 0,  1,  2,  3],
        [16, 17, 18, 19]],

       [[ 4,  5,  6,  7],
        [20, 21, 22, 23]],

       [[ 8,  9, 10, 11],
        [24, 25, 26, 27]],

       [[12, 13, 14, 15],
        [28, 29, 30, 31]]])

In [357]:
np.hstack((two_dim_arr, two_dim_arr_2))

array([[ 0,  1,  2,  3, 16, 17, 18, 19],
       [ 4,  5,  6,  7, 20, 21, 22, 23],
       [ 8,  9, 10, 11, 24, 25, 26, 27],
       [12, 13, 14, 15, 28, 29, 30, 31]])

In [358]:
np.concatenate((two_dim_arr, two_dim_arr_2), axis=1)

array([[ 0,  1,  2,  3, 16, 17, 18, 19],
       [ 4,  5,  6,  7, 20, 21, 22, 23],
       [ 8,  9, 10, 11, 24, 25, 26, 27],
       [12, 13, 14, 15, 28, 29, 30, 31]])

In [359]:
np.vstack((two_dim_arr, two_dim_arr_2))

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15],
       [16, 17, 18, 19],
       [20, 21, 22, 23],
       [24, 25, 26, 27],
       [28, 29, 30, 31]])

In [360]:
np.concatenate((two_dim_arr, two_dim_arr_2), axis=0)

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15],
       [16, 17, 18, 19],
       [20, 21, 22, 23],
       [24, 25, 26, 27],
       [28, 29, 30, 31]])

A quick and easy way to combine scalars and arrays is using `np.r_`, with the desired arrays, lists, or numbers in square brackets:

In [361]:
np.r_[2, 2, 2, np.arange(10), c, np.arange(10)[::-1], [0, 1, 2]]

array([ 2,  2,  2,  0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10,  9,  8,  7,
        6,  5,  4,  3,  2,  1,  0,  0,  1,  2])
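Inside `np.r_`, slice notation expands like `np.arange`. This makes it handy for building index arrays:

```python
import numpy as np

idx = np.r_[0:5, 10, 20:23]
print(idx)   # [ 0  1  2  3  4 10 20 21 22]

data = np.arange(100) * 10
print(data[idx])   # select several non-contiguous ranges at once
```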

<a id='Repeating_Arrays'></a>
### Repeating Arrays

`np.repeat` repeats elements of an array:

In [362]:
np.repeat(3, 5)

array([3, 3, 3, 3, 3])

In [363]:
%%timeit -n100000
np.array([3]*500)

24.8 µs ± 87.4 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)


In [364]:
%%timeit
np.repeat(3, 500)

5 µs ± 28.7 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)


In [365]:
np.repeat([[1,2],[3,4]], 2)

array([1, 1, 2, 2, 3, 3, 4, 4])
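As the output shows, `np.repeat` flattens multi-dimensional input unless you give an `axis`. You can also specify the number of repetitions per element:

```python
import numpy as np

m = np.array([[1, 2], [3, 4]])

print(np.repeat(m, 2, axis=0))
# [[1 2]
#  [1 2]
#  [3 4]
#  [3 4]]

print(np.repeat(m, 2, axis=1))
# [[1 1 2 2]
#  [3 3 4 4]]

print(np.repeat([1, 2, 3], [1, 2, 3]))   # [1 2 2 3 3 3]
```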

`np.tile` is another way of repeating values using NumPy.<br>
Instead of repeating each element separately, it repeats the whole array.

In [366]:
print('Repeat:', np.repeat([1, 2, 3], 3))
print('Tile:', np.tile([1, 2, 3], 3))

Repeat: [1 1 1 2 2 2 3 3 3]
Tile: [1 2 3 1 2 3 1 2 3]
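`np.tile` also accepts a tuple of repetitions, one per dimension, which builds a 2-D pattern:

```python
import numpy as np

pattern = np.tile([1, 2, 3], (2, 2))
print(pattern)
# [[1 2 3 1 2 3]
#  [1 2 3 1 2 3]]
```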


<a id='Aggregation'></a>
## Aggregation

Aggregation functions are functions that reduce the dimensionality of an array. They provide an `axis` argument to specify which dimension to reduce.

In [367]:
# np.random will be explained in another section
np.random.seed(1)
two_dim_arr = np.random.randint(0, high=20, size=(3, 4))
two_dim_arr

array([[ 5, 11, 12,  8],
       [ 9, 11,  5, 15],
       [ 0, 16,  1, 12]])

`np.min` returns the minimum of an array. If just the array is passed, the aggregation operation is performed over the whole array.

In [368]:
np.min(two_dim_arr)

0

The optional `axis` argument allows us to specify which dimension should be aggregated. You can think of it as the operation being applied to all entries that are obtained by keeping the indices in all dimensions fixed except for the `axis` dimension. Let's look at the result of the minimum operation with `axis=1`:

In [369]:
np.min(two_dim_arr, axis=1)

array([5, 5, 0])

The axis concept extends to arrays with more than two dimensions.

In [370]:
np.random.seed(1)
three_dim_arr = np.random.randint(0, high=20, size=(2, 3, 4))
three_dim_arr

array([[[ 5, 11, 12,  8],
        [ 9, 11,  5, 15],
        [ 0, 16,  1, 12]],

       [[ 7, 13,  6, 18],
        [ 5, 18, 11, 10],
        [14, 18,  4,  9]]])

In [371]:
np.min(three_dim_arr, axis=1)

array([[ 0, 11,  1,  8],
       [ 5, 13,  4,  9]])

In the array from the cell output above, the entry at index `[0, 0]` is the minimum of the following values:

In [372]:
for i in range(3):
    print(three_dim_arr[0, i, 0])

5
9
0


Let's demonstrate all axes again with another three-dimensional array:

In [373]:
a = np.array([[[2,4],[6,9]],[[3,1],[7,8]],[[4,5],[9, 0]]])
a, a.shape

(array([[[2, 4],
         [6, 9]],
 
        [[3, 1],
         [7, 8]],
 
        [[4, 5],
         [9, 0]]]),
 (3, 2, 2))

In [374]:
np.min(a)

0

In [375]:
np.min(a, axis=0)

array([[2, 1],
       [6, 0]])

For `axis=0`, this is the same as going through all combinations of indices of the *other* axes (here axes 1 and 2) in turn and computing the aggregate along axis 0 for each combination:

In [376]:
for i in range(a.shape[1]):
    for j in range(a.shape[2]):
        print(a[:, i, j])

[2 3 4]
[4 1 5]
[6 7 9]
[9 8 0]


For `axis=1`, we loop through axes 0 and 2:

In [377]:
a

array([[[2, 4],
        [6, 9]],

       [[3, 1],
        [7, 8]],

       [[4, 5],
        [9, 0]]])

In [378]:
np.min(a, axis=1)

array([[2, 4],
       [3, 1],
       [4, 0]])

In [379]:
for i in range(a.shape[0]):
    for j in range(a.shape[2]):
        print(a[i, :, j])

[2 6]
[4 9]
[3 7]
[1 8]
[4 9]
[5 0]


...and finally, for `axis=2` we loop through axes 0 and 1:

In [380]:
a

array([[[2, 4],
        [6, 9]],

       [[3, 1],
        [7, 8]],

       [[4, 5],
        [9, 0]]])

In [381]:
np.min(a, axis=2)

array([[2, 6],
       [1, 7],
       [4, 0]])

In [382]:
for i in range(a.shape[0]):
    for j in range(a.shape[1]):
        print(a[i, j, :])

[2 4]
[6 9]
[3 1]
[7 8]
[4 5]
[9 0]


The shape of the resulting array is simply the shape of the original array, leaving the specified axis out:

In [383]:
mins = []
for i in range(a.shape[0]):
    for j in range(a.shape[1]):
        mins.append(min(a[i,j,:]))
np.array(mins).reshape([a.shape[0], a.shape[1]])

array([[2, 6],
       [1, 7],
       [4, 0]])

...however, of course, using numpy is much faster than looping over the array:

In [384]:
def find_min_manual(arr):
    # Minimum over the last axis, computed with explicit Python loops
    mins = []
    for i in range(arr.shape[0]):
        for j in range(arr.shape[1]):
            mins.append(min(arr[i,j,:]))
    return np.array(mins).reshape([arr.shape[0], arr.shape[1]])

%timeit find_min_manual(a)
%timeit np.min(a, axis=2)

11.1 µs ± 46.9 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
5.76 µs ± 7.81 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)


Other useful aggregation functions include:

- `np.sum`: Compute sum of elements
- `np.prod`: Compute product of elements
- `np.mean`: Compute mean of elements
- `np.std`: Compute standard deviation
- `np.var`: Compute variance
- `np.min`: Find minimum value
- `np.max`: Find maximum value
- `np.argmin`: Find index of minimum value
- `np.argmax`: Find index of maximum value
- `np.median`: Compute median of elements
- `np.percentile`: Compute rank-based statistics of elements
- `np.any`: Evaluate whether any elements are true
- `np.all`: Evaluate whether all elements are true
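Here are a few of these functions applied to the 2-D array from above. The array is written out explicitly so the result does not depend on the random seed. `np.argmin` and `np.argmax` return positions rather than values. Without an `axis` they index the flattened array, and `np.unravel_index` converts that position back to a row/column pair:

```python
import numpy as np

arr = np.array([[ 5, 11, 12,  8],
                [ 9, 11,  5, 15],
                [ 0, 16,  1, 12]])

col_sums = np.sum(arr, axis=0)     # [14 38 18 35]
row_means = np.mean(arr, axis=1)   # row means 9.0, 10.0 and 7.25
row_argmin = np.argmin(arr, axis=1)  # [0 2 0]: column index of each row's minimum

flat_pos = np.argmax(arr)                         # 9: position in the flattened array
row, col = np.unravel_index(flat_pos, arr.shape)  # (2, 1): where the 16 is
print(col_sums, row_means, row_argmin, row, col)
```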

<a id='Broadcasting'></a>
## Broadcasting

What happens if you try to add arrays of different shapes? Numpy will try to expand the arrays according to three rules and try to make their shapes match, so the operation can be applied elementwise.

1. Rule: If the arrays have different numbers of dimensions, the smaller shape is padded with ones on its left side.<br>
Example: (5 x 3) + (3) -> (5 x 3) + (**1** x 3)
2. Rule: If the number of the dimensions matches, but the size of a dimension does not, dimensions with the size of 1 are expanded.<br>
Example: (5 x 3) + (1 x 3) -> (5 x 3) + (**5** x 3)
3. Rule: If the shapes of the arrays still differ after applying Rules 1 and 2, a broadcasting error is raised.<br>
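`np.broadcast_shapes` (available since NumPy 1.20) applies these rules to shapes alone. It is a quick way to check whether arrays are compatible before combining them:

```python
import numpy as np

print(np.broadcast_shapes((5, 3), (3,)))             # (5, 3): rules 1 and 2
print(np.broadcast_shapes((5, 1), (1, 3)))           # (5, 3): both shapes are expanded
print(np.broadcast_shapes((8, 1, 6, 1), (7, 1, 5)))  # (8, 7, 6, 5)

try:
    np.broadcast_shapes((5, 3), (4,))                # rule 3: 3 vs 4
except ValueError as err:
    print("ValueError:", err)
```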

The figure below gives an illustration (source https://jakevdp.github.io/PythonDataScienceHandbook/02.05-computation-on-arrays-broadcasting.html)

<img src="images/broadcasting.png" width="500"/>

The Numpy documentation gives further insights https://numpy.org/doc/stable/user/basics.broadcasting.html

Here you can see a demonstration similar to the 2nd case in the above image:

In [385]:
a = np.arange(15).reshape(5, 3)
a

array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11],
       [12, 13, 14]])

In [386]:
b = np.arange(3)
b

array([0, 1, 2])

In [387]:
print(a.shape)
print(b.shape)

a + b

(5, 3)
(3,)


array([[ 0,  2,  4],
       [ 3,  5,  7],
       [ 6,  8, 10],
       [ 9, 11, 13],
       [12, 14, 16]])

This is, conceptually, what numpy does with the shapes (internally, the expanded dimension is not actually copied in memory):

In [388]:
b = np.arange(3)
print(b.shape)
b = b[np.newaxis,:] # Rule 1: pad the shape with a 1 on the left
print(b.shape)
b = np.repeat(b,5,axis=0) # Rule 2: expand the dimension of size 1
print(b.shape)
a + b

(3,)
(1, 3)
(5, 3)


array([[ 0,  2,  4],
       [ 3,  5,  7],
       [ 6,  8, 10],
       [ 9, 11, 13],
       [12, 14, 16]])
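NumPy does not actually make repeated copies the way `np.repeat` does in the cell above. `np.broadcast_to` exposes the broadcast version of an array. The expanded axis has a stride of 0 bytes, so every row points at the same memory. The `int64` dtype is set explicitly so that the strides are the same on every platform:

```python
import numpy as np

b = np.arange(3, dtype=np.int64)
b_broadcast = np.broadcast_to(b, (5, 3))

print(b_broadcast.shape)    # (5, 3)
print(b_broadcast.strides)  # (0, 8): moving down a row moves 0 bytes in memory
print(np.shares_memory(b, b_broadcast))  # True, no data was copied
print(b_broadcast.flags.writeable)       # False, the result is read-only
```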

Here is a case in which broadcasting fails. Rule 1 pads the shape of `c` from (4,) to (1, 4), and Rule 2 expands it to (5, 4). The last dimensions (3 and 4) still differ and neither is 1, so an error is raised (Rule 3):

In [389]:
c = np.arange(4)
print(a)
print(c)
a + c

[[ 0  1  2]
 [ 3  4  5]
 [ 6  7  8]
 [ 9 10 11]
 [12 13 14]]
[0 1 2 3]


ValueError: operands could not be broadcast together with shapes (5,3) (4,) 
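You can also use broadcasting on purpose. Turn one of two 1-D arrays into a column with `np.newaxis` (shape `(n, 1)`) and combine it with the other one (shape `(m,)`). Both are expanded, and the result contains every pairwise combination:

```python
import numpy as np

x = np.arange(5)    # shape (5,)
y = np.arange(3)    # shape (3,)

grid = x[:, np.newaxis] + y   # (5, 1) + (3,) -> (5, 1) + (1, 3) -> (5, 3)
print(grid)
# [[0 1 2]
#  [1 2 3]
#  [2 3 4]
#  [3 4 5]
#  [4 5 6]]
```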

<br>

<a id='Miscellaneous'></a>
# Miscellaneous

<br>

<a id='Random'></a>
## Random Number Generation

NumPy's `numpy.random` module provides functions for generating pseudo-random numbers, for instance by sampling from various probability distributions.

Below we show some of them with examples.

For more information, please refer to the documentation: https://numpy.org/doc/stable/reference/random/index.html#random-sampling-numpy-random
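The functions shown below belong to NumPy's legacy interface, which relies on a single global random state. For new code, the NumPy documentation recommends creating a `Generator` with `np.random.default_rng` and calling its methods instead. A minimal sketch (the seed values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(seed=42)   # independent generator with its own state

ints = rng.integers(0, 10, size=3)     # counterpart of np.random.randint
floats = rng.uniform(0, 5, size=10)    # same parameters as np.random.uniform
normals = rng.normal(0, 1, size=10)    # same parameters as np.random.normal
picks = rng.choice(["pizza", "burger", "shakes"], 2, replace=False)

# Two generators created with the same seed produce the same sequence
rng_a = np.random.default_rng(seed=7)
rng_b = np.random.default_rng(seed=7)
same = np.array_equal(rng_a.random(5), rng_b.random(5))
print(same)   # True
```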

Generates a **random sample** from a given **1-D** array:

- `random.choice(a, size=None, replace=True, p=None)`

In [390]:
import numpy as np

select = ["pizza", "burger", "shakes", "fries", "salad"]
np.random.choice(select, 3, replace=False)

array(['salad', 'pizza', 'shakes'], dtype='<U6')
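The `p` argument assigns a probability to each entry, and the values must sum to 1. With `replace=False`, the sample size cannot exceed the number of entries. The weights below are illustrative values only:

```python
import numpy as np

select = ["pizza", "burger", "shakes", "fries", "salad"]
weights = [0.5, 0.2, 0.1, 0.1, 0.1]   # hypothetical probabilities, pizza is favored

orders = np.random.choice(select, size=10, p=weights)   # with replacement (default)
print(orders)

try:
    np.random.choice(select, 6, replace=False)          # only 5 entries available
except ValueError as err:
    print("ValueError:", err)
```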

Return **random integers** from low (inclusive) to high (exclusive).

- `np.random.randint(low, high=None, size=None, dtype=int)`

In [391]:
np.random.randint(0,10,3)

array([6, 9, 9])

Draw samples from a **uniform distribution**.

- `random.uniform(low=0.0, high=1.0, size=None)`

In [392]:
np.random.uniform(0, 5, size=10)  

array([4.01378752, 0.46400404, 2.59076274, 4.32510126, 4.14573454,
       4.1480168 , 1.36524987, 0.29621601, 3.3526402 , 2.96532759])

Draw **random samples** from a normal (Gaussian) distribution.

- `random.normal(loc=0.0, scale=1.0, size=None)`

In [393]:
np.random.normal(0, 1, size=10)

array([-0.89191376,  1.73559653, -0.63110833, -0.90734336,  0.37475197,
       -0.47338275, -0.45452082, -0.08533806,  1.50318838,  1.16064112])

If we need a number of **random values that sum up to 1** we can draw samples from a Dirichlet distribution.
- `random.dirichlet(alpha, size=None)`

In [394]:
# Each of the 10 rows sums up to 1 (up to floating-point rounding), and the expected value of each column is 0.5
np.random.dirichlet((1, 1), 10)

array([[0.37501826, 0.62498174],
       [0.76726182, 0.23273818],
       [0.86234396, 0.13765604],
       [0.94835706, 0.05164294],
       [0.89495653, 0.10504347],
       [0.50758189, 0.49241811],
       [0.28799958, 0.71200042],
       [0.46718013, 0.53281987],
       [0.9120397 , 0.0879603 ],
       [0.25625963, 0.74374037]])
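You can check the row sums directly. `np.allclose` is used because the sums equal 1 only up to floating-point rounding:

```python
import numpy as np

samples = np.random.dirichlet((1, 1), 1000)
print(np.allclose(samples.sum(axis=1), 1.0))   # True
print(samples.mean(axis=0))                     # both values close to 0.5
```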

In some cases, you may want the generated random numbers to be the same every time you run your program. In that case, you can set a seed like this before generating.

In [395]:
# Seeding with a fixed value before producing random numbers
# makes the generated sequence the same on every run.
np.random.seed(1000)
np.random.uniform(0, 1, size=10)

array([0.65358959, 0.11500694, 0.95028286, 0.4821914 , 0.87247454,
       0.21233268, 0.04070962, 0.39719446, 0.2331322 , 0.84174072])
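Re-seeding with the same value restarts the same sequence, which you can verify directly:

```python
import numpy as np

np.random.seed(1000)
first = np.random.uniform(0, 1, size=10)

np.random.seed(1000)
second = np.random.uniform(0, 1, size=10)

print(np.array_equal(first, second))   # True
```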

It is also possible to shuffle an array in place with `np.random.shuffle`.

If we want a shuffled copy of the array instead, `np.random.permutation` gets the job done!

In [396]:
array1 = np.random.randint(0,20,10)
print("the original array:")
print(array1)

np.random.shuffle(array1)
print("the original array after shuffling:")
print(array1)

the original array:
[10  9  4 18  9 18 13 18 10 14]
the original array after shuffling:
[18 13 14  9  4 10 10 18  9 18]


In [397]:
array1 = np.random.randint(0,20,10)
print("the original array:")
print(array1)

array_2 = np.random.permutation(array1)
print("the original array after permutation:")
print(array1)

print("the shuffled array:")
print(array_2)

the original array:
[ 7 16 11 17  0 13 18  1 11 14]
the original array after permutation:
[ 7 16 11 17  0 13 18  1 11 14]
the shuffled array:
[ 0 11 11  7 17 14 18 13 16  1]
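For multi-dimensional arrays, both `np.random.shuffle` and `np.random.permutation` shuffle only along the first axis. Whole rows are reordered while each row stays intact. Passing an integer `n` to `np.random.permutation` returns a shuffled `np.arange(n)`:

```python
import numpy as np

m = np.arange(12).reshape((4, 3))
np.random.shuffle(m)     # in place: rows are reordered, each row stays intact
print(m)

order = np.random.permutation(5)   # a shuffled np.arange(5)
print(order)
```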
