# Aggregations: Min, Max, and Everything In Between

- np.sum
- np.max, np.min
- a list of commonly used operators
- the axis parameter for multidimensional arrays
- NaN-safe operators



In [1]:
import numpy as np

In [3]:
L = np.random.random(100)

## Summing the Values in an Array
Python's built-in ``sum`` function has a syntax quite similar to that of NumPy's ``np.sum`` function, and in the simplest case the result is the same (up to floating-point rounding):

In [6]:
# Python's built-in sum
sum(L)

48.05124096976659

In [8]:
# NumPy's sum, which is much faster
np.sum(L)

48.05124096976655

Be careful, though: the built-in ``sum`` function and the ``np.sum`` function are not identical, which can sometimes lead to confusion!
In particular, their optional arguments have different meanings, and ``np.sum`` is aware of multiple array dimensions, as we will see in the following section.
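To make the difference in optional arguments concrete, here is a minimal sketch on a small hand-made array (the array `x` and the variable names are illustrative assumptions, not part of the original notebook). The second positional argument is a start value for the built-in ``sum``, but an axis for ``np.sum``:

```python
import numpy as np

x = np.array([[1, 2],
              [3, 4]])

# Built-in sum: the second argument is a *start* value.
# Iterating over a 2-D array yields its rows, so this computes
# 1 + [1, 2] + [3, 4] using NumPy's element-wise addition.
builtin_result = sum(x, 1)

# np.sum: the second argument is the *axis* to sum along.
numpy_result = np.sum(x, 1)

print(builtin_result)   # [5 7]
print(numpy_result)     # [3 7]
```

The same call signature therefore gives two different answers, which is why mixing the two functions on multidimensional arrays is easy to get wrong.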

## Minimum and Maximum

In [9]:
# Python has built-in ``min`` and ``max`` functions
min(L), max(L)

(0.021602586383004385, 0.9968274618333506)

In [11]:
# NumPy's corresponding functions have similar syntax, and again operate much more quickly:
np.min(L), np.max(L)

(0.021602586383004385, 0.9968274618333506)

In [14]:
# a shorter syntax is to use methods of the array object itself
L.min(), L.max(), L.sum()

(0.021602586383004385, 0.9968274618333506, 48.05124096976655)

Whenever possible, make sure that you are using the NumPy version of these aggregates when operating on NumPy arrays!
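One way to check the speed claim on your own machine is a rough timing sketch like the following. It uses the standard-library ``timeit`` module; the array size, the seed, and the variable names are arbitrary choices made for illustration, and the absolute timings will vary from system to system:

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)      # seed chosen only for reproducibility
big_array = rng.random(1_000_000)

# Run each version a few times and report the total elapsed time.
t_builtin = timeit.timeit(lambda: sum(big_array), number=3)
t_numpy = timeit.timeit(lambda: np.sum(big_array), number=3)

print(f"built-in sum: {t_builtin:.4f} s")
print(f"np.sum:       {t_numpy:.4f} s")
```

The built-in version loops over the array element by element in the Python interpreter, while ``np.sum`` runs in compiled code, so the NumPy timing is typically far smaller.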

### Multidimensional aggregates

One common type of aggregation operation is an aggregate along a row or column.
Say you have some data stored in a two-dimensional array:

In [15]:
M = np.random.random((3, 4))
M

array([[0.37064414, 0.20460817, 0.34480001, 0.38189218],
       [0.74132566, 0.95678387, 0.28139537, 0.69742444],
       [0.35525932, 0.51609444, 0.6873333 , 0.30839767]])

In [16]:
# By default, each NumPy aggregation function will return the aggregate over the entire array:
M.sum()

5.845958569198249

In [21]:
# Aggregation functions take an additional argument specifying the *axis* along which the aggregate is computed.
# axis=0 collapses the rows, giving the minimum value within each column:
M.min(axis=0)

array([0.35525932, 0.20460817, 0.28139537, 0.30839767])

In [20]:
# axis=1 collapses the columns, giving the maximum value within each row:
M.max(axis=1)

array([0.38189218, 0.95678387, 0.6873333 ])

The way the axis is specified here can be confusing to users coming from other languages.
The ``axis`` keyword specifies the *dimension of the array that will be collapsed*, rather than the dimension that will be returned.
So specifying ``axis=0`` means that the first axis will be collapsed: for two-dimensional arrays, this means that values within each column will be aggregated.
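The "collapsed dimension" rule is easier to verify on an array with known values. The following sketch uses a small integer array `A` (an illustrative assumption, not data from the notebook) so that the results can be checked by hand:

```python
import numpy as np

A = np.array([[1,  2,  3,  4],
              [5,  6,  7,  8],
              [9, 10, 11, 12]])    # shape (3, 4)

# axis=0 collapses the first axis (the 3 rows): one value per column.
col_sums = A.sum(axis=0)

# axis=1 collapses the second axis (the 4 columns): one value per row.
row_sums = A.sum(axis=1)

print(col_sums, col_sums.shape)    # [15 18 21 24] (4,)
print(row_sums, row_sums.shape)    # [10 26 42] (3,)
```

In both cases the axis named in the call is the one that disappears from the result's shape.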

In [22]:
M = np.random.random((3, 4, 5))
M

array([[[0.28018268, 0.26398575, 0.49850171, 0.81347134, 0.61918661],
        [0.1316305 , 0.75412693, 0.89338259, 0.3731961 , 0.2714668 ],
        [0.15762812, 0.87787034, 0.35555938, 0.93384908, 0.13256703],
        [0.41133287, 0.77819104, 0.44874782, 0.20420692, 0.52240942]],

       [[0.79043629, 0.27672625, 0.11009067, 0.44593767, 0.25287065],
        [0.90181313, 0.7823179 , 0.41303956, 0.5341588 , 0.46826841],
        [0.58558087, 0.52280177, 0.63099564, 0.94541878, 0.6785837 ],
        [0.67754742, 0.62064587, 0.29521517, 0.87010937, 0.56516432]],

       [[0.4390283 , 0.48985159, 0.36898575, 0.46279144, 0.88682328],
        [0.60216921, 0.32293493, 0.21183113, 0.39589281, 0.85290958],
        [0.21901199, 0.12622211, 0.28230671, 0.86773126, 0.80104081],
        [0.7417255 , 0.39396194, 0.21979067, 0.46686985, 0.44798182]]])

In [25]:
M.min(axis=0).shape

(4, 5)

In [26]:
M.min(axis=(0, 1)).shape

(5,)
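The same rule extends to any combination of axes: every axis listed is removed from the shape of the result. As a rough sketch, the loop below records the result shape for several axis choices on a (3, 4, 5) array; the array `T` and the dictionary `shapes` are names introduced only for this illustration:

```python
import numpy as np

T = np.arange(60).reshape(3, 4, 5)

# Map each axis specification to the shape of the aggregated result.
shapes = {}
for axis in (0, 1, 2, (0, 1), (0, 2), (1, 2)):
    shapes[axis] = T.min(axis=axis).shape

for axis, shape in shapes.items():
    print(f"axis={axis!s:<7} -> shape {shape}")
```

For example, ``axis=1`` leaves a (3, 5) result and ``axis=(1, 2)`` leaves a (3,) result.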

### Other aggregation functions

NumPy provides many other aggregation functions, but we won't discuss them in detail here.
Additionally, most aggregates have a ``NaN``-safe counterpart that computes the result while ignoring missing values, which are marked by the special IEEE floating-point ``NaN`` value (for a fuller discussion of missing data, see [Handling Missing Data](03.04-Missing-Values.ipynb)).
Some of these ``NaN``-safe functions were not added until NumPy 1.8, so they will not be available in older NumPy versions.
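As a minimal sketch of what "``NaN``-safe" means in practice, consider an array with one missing entry (the array `values` is a made-up example). The ordinary aggregate propagates the ``NaN``, while the ``NaN``-safe counterpart skips it:

```python
import numpy as np

values = np.array([1.0, 2.0, np.nan, 4.0])

plain_sum = np.sum(values)       # NaN propagates through the sum
safe_sum = np.nansum(values)     # NaN is ignored: 1 + 2 + 4
safe_mean = np.nanmean(values)   # mean of the three valid entries
safe_max = np.nanmax(values)     # maximum of the valid entries

print(plain_sum)   # nan
print(safe_sum)    # 7.0
print(safe_mean)   # roughly 2.33
print(safe_max)    # 4.0
```

Note that ``np.nanmean`` divides by the number of valid entries (three here), not by the full length of the array.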

The following table provides a list of useful aggregation functions available in NumPy:

|Function Name      |   NaN-safe Version  | Description                                   |
|-------------------|---------------------|-----------------------------------------------|
| ``np.sum``        | ``np.nansum``       | Compute sum of elements                       |
| ``np.prod``       | ``np.nanprod``      | Compute product of elements                   |
| ``np.mean``       | ``np.nanmean``      | Compute mean of elements                      |
| ``np.std``        | ``np.nanstd``       | Compute standard deviation                    |
| ``np.var``        | ``np.nanvar``       | Compute variance                              |
| ``np.min``        | ``np.nanmin``       | Find minimum value                            |
| ``np.max``        | ``np.nanmax``       | Find maximum value                            |
| ``np.argmin``     | ``np.nanargmin``    | Find index of minimum value                   |
| ``np.argmax``     | ``np.nanargmax``    | Find index of maximum value                   |
| ``np.median``     | ``np.nanmedian``    | Compute median of elements                    |
| ``np.percentile`` | ``np.nanpercentile``| Compute rank-based statistics of elements     |
| ``np.any``        | N/A                 | Evaluate whether any elements are true        |
| ``np.all``        | N/A                 | Evaluate whether all elements are true        |

We will see these aggregates often throughout the rest of the book.
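For a quick feel of some of the less familiar entries in the table, here is a short sketch on a small integer array (the array `data` and the variable names are illustrative assumptions):

```python
import numpy as np

data = np.array([3, 1, 4, 1, 5, 9, 2, 6])

idx_min = np.argmin(data)          # index of the first occurrence of the minimum
idx_max = np.argmax(data)          # index of the maximum
med = np.median(data)              # middle value of the sorted data
p50 = np.percentile(data, 50)      # the 50th percentile equals the median
any_large = np.any(data > 8)       # is at least one element greater than 8?
all_positive = np.all(data > 0)    # are all elements greater than 0?

print(idx_min, idx_max)            # 1 5
print(med, p50)                    # 3.5 3.5
print(any_large, all_positive)     # True True
```

Like the other aggregates, most of these also accept an ``axis`` argument when applied to multidimensional arrays.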