<div class="licence">
<span>Licence CC BY-NC-ND</span>
<span>Valérie Roy</span>
<span><img src="../media/ensmp-25-alpha.png" /></span>
</div>

In [1]:
import numpy as np

# aggregating array values

## *numpy* functions can **combine** elements along the specified axis

| function | behavior|
|------|-----|
| *numpy.sum* | sums elements over an axis|
| *numpy.min* | returns the smallest element|
| *numpy.argmin* | returns the index of the smallest element|
| *numpy.mean*| computes the mean of the elements|
| *numpy.std*  | computes the standard deviation|
| .../...| .../... |
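as a quick illustration of the table, here is a minimal sketch on a small fixed matrix (the name `m` and its values are chosen only for this example), called without an axis so that each function works on all the elements:

```python
import numpy as np

# a small fixed matrix so that the results are easy to check by hand
m = np.array([[4, 1, 7],
              [2, 8, 3]])

total = np.sum(m)              # sum of all the elements
smallest = np.min(m)           # smallest element
where_smallest = np.argmin(m)  # its index in the flattened array
average = np.mean(m)           # mean of all the elements
spread = np.std(m)             # standard deviation of all the elements

print(total, smallest, where_smallest)
print(average, spread)
```

each of these functions also accepts an `axis` argument, which is what the following sections explore.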

## summing all the elements of a matrix

we create a **matrix** with *3* rows and *4* columns

In [3]:
a = np.random.randint(0, 10, size=(3, 4))
a

array([[2, 9, 6, 4],
       [6, 3, 5, 7],
       [0, 7, 5, 4]])

summing without specifying an axis returns the global sum

In [4]:
a.sum()

58

## summing elements along the rows axis
   - it will be the same for the other operations (product, division, ...)   

In [11]:
# we create a matrix with 3 rows and 4 columns
a = np.random.randint(0, 10, size=(3, 4))
a

array([[1, 9, 6, 2],
       [4, 0, 6, 0],
       [9, 2, 2, 1]])

summing along the **rows** axis (0 in our example) adds the **rows** together, giving one sum per column

In [12]:
a.sum(axis=0)

array([14, 11, 14,  3])

## summing elements along the columns axis

In [15]:
# we create a matrix with 3 rows and 4 columns
a = np.random.randint(0, 10, size=(3, 4))
a

array([[9, 1, 3, 2],
       [9, 1, 5, 0],
       [2, 6, 3, 9]])

summing along the **columns** axis (1 in our example) adds the **columns** together, giving one sum per row

In [16]:
np.sum(a, axis=1)

array([15, 15, 20])
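to make the two axes concrete, the sketch below compares `sum(axis=...)` with the explicit addition of rows or columns; the matrix `m` is fixed by hand for the example so that the results can be checked:

```python
import numpy as np

m = np.array([[1, 9, 6, 2],
              [4, 0, 6, 0],
              [9, 2, 2, 1]])

# axis=0: the 3 rows are added together, one result per column
by_rows = m.sum(axis=0)
same_by_rows = m[0] + m[1] + m[2]

# axis=1: the 4 columns are added together, one result per row
by_cols = m.sum(axis=1)
same_by_cols = m[:, 0] + m[:, 1] + m[:, 2] + m[:, 3]

print(by_rows, np.array_equal(by_rows, same_by_rows))
print(by_cols, np.array_equal(by_cols, same_by_cols))
```

the result along axis 0 has 4 elements (the number of columns) and the result along axis 1 has 3 elements (the number of rows).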

## summing matrices (axis 0)

In [23]:
# we create two matrices of 3 rows and 4 columns
a = np.random.randint(0, 50, size=(2, 3, 4))
a

array([[[14,  0, 12,  2],
        [13, 43, 23,  6],
        [40, 17, 13, 18]],

       [[11, 45, 18, 48],
        [31, 48, 49, 18],
        [18, 18,  7, 38]]])

summing along **axis** *0* is **summing** the two arrays together 

In [28]:
np.sum(a, axis=0) # we sum the arrays

array([[25, 45, 30, 50],
       [44, 91, 72, 24],
       [58, 35, 20, 56]])
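a minimal check, using the values of the array displayed above (copied by hand into a variable named `t` for this example), that summing along axis 0 amounts to adding the two matrices element by element:

```python
import numpy as np

t = np.array([[[14,  0, 12,  2],
               [13, 43, 23,  6],
               [40, 17, 13, 18]],

              [[11, 45, 18, 48],
               [31, 48, 49, 18],
               [18, 18,  7, 38]]])

stacked_sum = np.sum(t, axis=0)  # a single matrix of 3 rows and 4 columns
pairwise = t[0] + t[1]           # explicit addition of the two matrices

print(stacked_sum)
print(np.array_equal(stacked_sum, pairwise))
```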


## summing along axis 1 with several matrices

In [56]:
# we create two matrices of size (2 x 2)
a = np.random.randint(0, 16,
                      size=(2, 2, 2))
a

array([[[ 7, 14],
        [ 3,  2]],

       [[10,  7],
        [ 2, 14]]])

summing along **axis** *1* adds the **rows** of each matrix together
   - i.e. we obtain one **row** per matrix
   - these rows form a new array

In [57]:
b = np.sum(a, axis=1)
b

array([[10, 16],
       [12, 21]])

## summing along axis 2 with several matrices

In [64]:
# we create two matrices of size (2 x 3)
a = np.random.randint(0, 16,
                      size=(2, 2, 3))
a

array([[[ 2,  8,  2],
        [ 6,  0,  3]],

       [[14,  1,  5],
        [ 5, 13, 15]]])

summing along **axis** *2* adds the **columns** of each matrix together
   - i.e. we obtain one **column** per matrix
   - they form a new array

In [65]:
b = np.sum(a, axis=2)
b

array([[12,  9],
       [20, 33]])
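one way to keep track of the three cases is to look at shapes: the axis that is summed over disappears from the shape of the result. a minimal sketch, on an array `t` built with *numpy.arange* for this example only:

```python
import numpy as np

# two matrices of 3 rows and 4 columns, filled with 0..23
t = np.arange(24).reshape(2, 3, 4)

# the summed axis is removed from the shape
shapes = {axis: t.sum(axis=axis).shape for axis in range(t.ndim)}
print(shapes)

# keepdims=True keeps the summed axis, with length 1
kept = t.sum(axis=1, keepdims=True)
print(kept.shape)
```

`keepdims=True` can be convenient when the result has to be broadcast back against the original array.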

## summing without axis when we have several matrices

In [68]:
a = np.random.randint(0, 50, size=(2, 3, 4))
a

array([[[ 2, 32,  0, 34],
        [ 0, 24, 30, 17],
        [13,  7,  7,  9]],

       [[14, 15,  2,  0],
        [ 6, 27, 13, 22],
        [37, 48, 48, 27]]])

without an axis, *numpy.sum* adds up all the elements of both matrices

In [69]:
np.sum(a)

434

## summing in presence of *numpy.nan* values
   - it will be the same for the other functions

**classic functions** have their **NaN-safe** counterpart:

   - *numpy.nansum*, *numpy.nanprod*, ...
   - *numpy.nanmean*, *numpy.nanstd*, *numpy.nanvar*, ...
   - *numpy.nanmin*, *numpy.nanmax*, ...
   - *numpy.nanmedian*, *numpy.nanpercentile*, ...
   
they **ignore** *numpy.nan* values (for example *numpy.nansum* treats them as *0*) to avoid NaN contagion
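a minimal side-by-side sketch of a few classic functions and their NaN-safe counterparts, on a tiny vector `v` made up for this example:

```python
import numpy as np

v = np.array([1.0, np.nan, 3.0])

classic = [np.sum(v), np.mean(v), np.max(v), np.prod(v)]
nan_safe = [np.nansum(v), np.nanmean(v), np.nanmax(v), np.nanprod(v)]

print(classic)   # every classic result is nan
print(nan_safe)  # the NaN is ignored: 4.0, 2.0, 3.0, 3.0
```

note that *numpy.nanmean* divides by the number of non-NaN values (2 here), not by the length of the vector.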

### NaN is dominant in classic operations
   - remember that **only** **float** values can be **NaN**

In [75]:
# we create two matrices of type float
a = np.random.randint(0, 50,
                      size=(2, 3, 4))
a = a.astype(float)
a[0]

array([[11., 14., 25., 22.],
       [29.,  3., 31.,  1.],
       [34.,  0., 36., 13.]])

In [76]:
# we insert nan values
a[0, 1, 0] = np.nan
a[0, 2, 2] = np.nan
a[1, 0, 0] = np.nan
a[1, 1, 3] = np.nan
a[1]

array([[nan,  5., 40., 35.],
       [14., 11., 21., nan],
       [37., 24., 39., 34.]])

you can see that *numpy.nan* is **dominant** (**contagious**)

In [77]:
np.sum(a) # the result is NaN

nan
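the reminder that only float values can be NaN explains the `astype(float)` call above. a minimal sketch (the names `ints` and `floats` are illustrative) of what happens when we try to store NaN in an integer array:

```python
import numpy as np

ints = np.array([1, 2, 3])

# an integer array cannot hold NaN: the assignment raises ValueError
refused = False
try:
    ints[0] = np.nan
except ValueError as exc:
    refused = True
    print("refused:", exc)

# after conversion to float the assignment works
floats = ints.astype(float)
floats[0] = np.nan

# NaN is not equal to itself, so we count NaN values with numpy.isnan
nb_nan = np.isnan(floats).sum()
print(floats, nb_nan)
```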

## NaN-safe function (matrices axis)
   - *numpy.nansum* treats *numpy.nan* values as *0*

In [118]:
a

array([[[11., 14., 25., 22.],
        [nan,  3., 31.,  1.],
        [34.,  0., nan, 13.]],

       [[nan,  5., 40., 35.],
        [14., 11., 21., nan],
        [37., 24., 39., 34.]]])

In [119]:
np.nansum(a) # no axis

414.0

on **axis=0** the two matrices are **added** together

In [120]:
np.nansum(a, axis=0)

array([[11., 19., 65., 57.],
       [14., 14., 52.,  1.],
       [71., 24., 39., 47.]])

## NaN-safe function (rows axis)
   - **NaN** is treated as **zero**

In [125]:
a

array([[[11., 14., 25., 22.],
        [nan,  3., 31.,  1.],
        [34.,  0., nan, 13.]],

       [[nan,  5., 40., 35.],
        [14., 11., 21., nan],
        [37., 24., 39., 34.]]])

on **axis=1** the **rows** of each matrix are **added**

In [126]:
np.nansum(a, axis=1)

array([[ 45.,  17.,  56.,  36.],
       [ 51.,  40., 100.,  69.]])

## NaN-safe function (columns axis)

In [127]:
a

array([[[11., 14., 25., 22.],
        [nan,  3., 31.,  1.],
        [34.,  0., nan, 13.]],

       [[nan,  5., 40., 35.],
        [14., 11., 21., nan],
        [37., 24., 39., 34.]]])

on **axis=2** the **columns** of each matrix are **added**


In [128]:
np.nansum(a, axis=2)

array([[ 72.,  35.,  47.],
       [ 80.,  46., 134.]])
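the idea that NaN is treated as zero can be checked by hand: the sketch below replaces NaN by *0* with *numpy.where* and compares the plain sum with *numpy.nansum* on every axis. the array `t` is a copy of the values displayed above and the name `zeroed` is chosen for this example:

```python
import numpy as np

t = np.array([[[11., 14., 25., 22.],
               [np.nan, 3., 31., 1.],
               [34., 0., np.nan, 13.]],

              [[np.nan, 5., 40., 35.],
               [14., 11., 21., np.nan],
               [37., 24., 39., 34.]]])

# explicit replacement of NaN by 0
zeroed = np.where(np.isnan(t), 0.0, t)

# same result as nansum, with or without an axis
checks = [np.array_equal(np.nansum(t, axis=axis), zeroed.sum(axis=axis))
          for axis in (None, 0, 1, 2)]
print(checks)

# this shortcut only holds for sums: nanmean ignores NaN values,
# it does not count them as zeros
print(np.nanmean(t), zeroed.mean())
```

the two means differ because *numpy.nanmean* divides by the 20 non-NaN values while the mean of `zeroed` divides by all 24 elements.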

## index of min and max values
   - *numpy.argmax*, *numpy.argmin*

In [157]:
a = np.random.randint(0, 100, 15).reshape(3, 5)
a

array([[90,  3, 59, 81, 71],
       [28, 37, 60, 54, 51],
       [62, 96, 19, 79, 79]])

the index refers to the **flattened** array

In [158]:
print(np.argmin(a))   # the index of the min
a.flatten().argmin()  # the same

1


1
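since the index refers to the flattened array, it usually has to be converted back to a (row, column) pair. a minimal sketch using *numpy.unravel_index*, on a copy `m` of the matrix displayed above; the `axis` argument is also shown:

```python
import numpy as np

m = np.array([[90,  3, 59, 81, 71],
              [28, 37, 60, 54, 51],
              [62, 96, 19, 79, 79]])

flat = np.argmin(m)                         # index in the flattened array
row, col = np.unravel_index(flat, m.shape)  # back to 2D coordinates
print(flat, row, col, m[row, col])

# with an axis, we get one index per column (axis=0) or per row (axis=1)
min_row_of_each_col = np.argmin(m, axis=0)
max_col_of_each_row = np.argmax(m, axis=1)
print(min_row_of_each_col, max_col_of_each_row)
```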

## tests on all values
   - *numpy.all* returns *True* if **all** values are *True*
   - *numpy.any* returns *True* if **any** value is *True*
   - *np.where(cond, x, y)* returns *x* or *y* depending on the condition
   - (https://docs.scipy.org/doc/numpy/reference/generated/numpy.where.html)
   
   - they have no NaN-safe counterpart
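because there is no NaN-safe variant, it helps to know how these tests see NaN: it is not equal to zero, so it counts as *True*. a minimal sketch on a made-up vector `v`, with one possible way of excluding NaN values by hand:

```python
import numpy as np

v = np.array([np.nan, 0.0, 2.0])

# NaN is nonzero, hence truthy
nan_is_true = np.all(np.array([np.nan]))
print(nan_is_true)

# comparisons with NaN are always False
print(np.any(v > 1), np.all(v >= 0))

# one option: test only the non-NaN values
valid = v[~np.isnan(v)]
all_valid_positive = np.all(valid >= 0)
print(all_valid_positive)
```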

### all or any values

In [171]:
a = np.random.randint(0, 100,
                      15).reshape(3, 5)
a

array([[52, 78, 58, 34, 54],
       [ 3,  5, 38,  9,  8],
       [62,  6, 56, 10, 16]])

In [176]:
# a boolean mask
a <= 50

array([[False, False, False,  True, False],
       [ True,  True,  True,  True,  True],
       [False,  True, False,  True,  True]])

In [177]:
np.any(a <= 50)

True

In [178]:
np.all(a <= 100)

True
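like the other aggregating functions, *numpy.any* and *numpy.all* accept an `axis` argument. a minimal sketch on a copy `m` of the matrix displayed above:

```python
import numpy as np

m = np.array([[52, 78, 58, 34, 54],
              [ 3,  5, 38,  9,  8],
              [62,  6, 56, 10, 16]])

mask = m <= 50

any_per_row = np.any(mask, axis=1)  # does each row contain a value <= 50 ?
all_per_row = np.all(mask, axis=1)  # is each row entirely <= 50 ?
all_per_col = np.all(mask, axis=0)  # is each column entirely <= 50 ?
print(any_per_row, all_per_row, all_per_col)

# True counts as 1, so summing the mask counts the matching values
count = mask.sum()
print(count)
```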

### where

In [179]:
a = np.random.randint(0, 100, 15).reshape(3, 5)
a

array([[95, 77, 12, 43, 34],
       [76, 49, 66, 69, 26],
       [40, 57, 33, 32, 29]])

In [181]:
np.where(a<50, -2*a, 3*a)

array([[285, 231, -24, -86, -68],
       [228, -98, 198, 207, -52],
       [-80, 171, -66, -64, -58]])
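to see what *numpy.where* does element by element, the sketch below rebuilds the same result with a boolean mask, on a smaller matrix `m` made up for this example. it also shows the one-argument form, which returns the indices where the condition holds:

```python
import numpy as np

m = np.array([[95, 77, 12],
              [76, 49, 66]])

# three-argument form: pick -2*m where m < 50, and 3*m elsewhere
picked = np.where(m < 50, -2 * m, 3 * m)

# the same computation written with a boolean mask
manual = 3 * m
manual[m < 50] = -2 * m[m < 50]
print(picked, np.array_equal(picked, manual))

# one-argument form: the row and column indices of the True values
rows, cols = np.where(m < 50)
print(rows, cols)
```

note that both `-2 * m` and `3 * m` are computed entirely before *numpy.where* selects between them.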