--------------
## Statistical functions
-----------------------

NumPy provides many useful statistical functions for finding the minimum, maximum, percentiles, standard deviation, variance, etc. of the elements in an array.

#### Order

In [1]:
import numpy as np 

`numpy.amin()`

`numpy.amin()` returns the minimum of the elements in an array, or the minima along a specified axis. Its counterpart, `numpy.amax()`, returns the maximum in the same way.

In [2]:
arr = np.array([11, 12, 13, 14, 15, 16, 17, 15, 11, 12, 14, 15, 16, 17])

In [3]:
np.amin(arr)

11
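For reference, `np.amin()` gives the same result as `np.min()` and the `ndarray.min()` method, and the `amax`/`max` family mirrors it for the maximum. A quick check, reusing the values of `arr` from above:

```python
import numpy as np

arr = np.array([11, 12, 13, 14, 15, 16, 17, 15, 11, 12, 14, 15, 16, 17])

# Three equivalent spellings of "smallest element"
print(np.amin(arr), np.min(arr), arr.min())  # 11 11 11

# The matching maximum functions
print(np.amax(arr), np.max(arr), arr.max())  # 17 17 17
```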

Find the index (or indices) of the minimum value:

In [5]:
result = np.where(arr == np.amin(arr))

print('Returned tuple of arrays :', result)
print('List of Indices of minimum element :', result[0])

Returned tuple of arrays : (array([0, 8], dtype=int64),)
List of Indices of minimum element : [0 8]
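If only one position is needed, `np.argmin()` is the usual shortcut, but it returns just the *first* occurrence. The sketch below contrasts it with `np.flatnonzero()`, which (like `np.where(...)[0]` for a 1D array) returns every matching index:

```python
import numpy as np

arr = np.array([11, 12, 13, 14, 15, 16, 17, 15, 11, 12, 14, 15, 16, 17])

# argmin stops at the FIRST index holding the minimum
first = np.argmin(arr)
print(first)  # 0

# flatnonzero returns all indices where the condition is True
all_idx = np.flatnonzero(arr == arr.min())
print(all_idx)  # [0 8]
```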


`Minimum and maximum of a 2D NumPy array`

In [6]:
a = np.array([[3,7,5], 
              [8,4,3], 
              [2,4,9]]) 
a

array([[3, 7, 5],
       [8, 4, 3],
       [2, 4, 9]])

In [7]:
np.amin(a)

2

In [8]:
np.amin(a, axis=1)

array([3, 3, 2])

In [9]:
np.amin(a, axis=1, keepdims=True) 

array([[3],
       [3],
       [2]])

In [10]:
np.amin(a, axis=0, keepdims=True) 

array([[2, 4, 3]])

In [11]:
np.amin(a, keepdims=True) 

array([[2]])
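`keepdims=True` matters mostly for broadcasting. A small illustration, assuming we want to shift every row of `a` so that its smallest element becomes 0:

```python
import numpy as np

a = np.array([[3, 7, 5],
              [8, 4, 3],
              [2, 4, 9]])

# Row minima with shape (3, 1) instead of (3,)
row_min = np.amin(a, axis=1, keepdims=True)

# The (3, 1) column broadcasts across each row, so every row is shifted by its own minimum
shifted = a - row_min
print(shifted)
# [[0 4 2]
#  [5 1 0]
#  [0 2 7]]
```

Without `keepdims=True`, `np.amin(a, axis=1)` has shape `(3,)`, and broadcasting would line it up with the columns instead of the rows, silently producing a different result.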

In [12]:
np.amax(a, axis = 0, keepdims=True)

array([[8, 7, 9]])

In [13]:
np.amax(a, keepdims=True)

array([[9]])
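To find *where* the minimum sits in a 2D array, one common approach (shown here as a sketch) is `np.argmin()`, which works on the flattened array, combined with `np.unravel_index()` to turn the flat index back into `(row, column)` coordinates. `np.argwhere()` covers the case of ties:

```python
import numpy as np

a = np.array([[3, 7, 5],
              [8, 4, 3],
              [2, 4, 9]])

# argmin treats the array as flattened and returns one flat index
flat_idx = np.argmin(a)
print(flat_idx)  # 6

# Convert the flat index into (row, column) coordinates
row, col = np.unravel_index(flat_idx, a.shape)
print(row, col)  # 2 0

# Every position holding the minimum (useful when there are ties)
print(np.argwhere(a == a.min()))  # [[2 0]]
```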

`numpy.amin() & NaN`

- `numpy.amin()` propagates NaN values: if the array contains a NaN, `numpy.amin()` will `return NaN as the minimum` value. For example:

In [14]:
arr    = np.array([11, 12, 13, 14, 15], dtype=float)
arr[3] = np.nan  # np.NaN was removed in NumPy 2.0; np.nan works in all versions

print('min element from Numpy Array : ', np.amin(arr))

min element from Numpy Array :  nan


`numpy.nanmin`

- Returns the minimum of an array or the minimum along an axis, `ignoring any NaNs`.
- If a slice contains only NaNs, a `RuntimeWarning` is issued and NaN is returned for that slice.

In [15]:
a = np.array([[1, 2], 
              [3, np.nan]])
np.nanmin(a)

1.0

In [16]:
np.nanmin(a, axis=0)

array([1., 2.])

In [17]:
np.nanmin(a, axis=1)

array([1., 3.])
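The all-NaN case from the second bullet can be observed directly. This sketch uses a hypothetical array `b` whose second row contains only NaNs, and records the warning with the standard `warnings` module instead of letting it print:

```python
import warnings
import numpy as np

b = np.array([[1.0, np.nan],
              [np.nan, np.nan]])  # second row is all NaN

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # make sure the warning is recorded
    row_min = np.nanmin(b, axis=1)

# First row -> 1.0 (NaN ignored); second row -> nan (nothing left to compare)
print(row_min)
print([w.category.__name__ for w in caught])  # includes 'RuntimeWarning'
```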

When positive infinity and negative infinity are present:

In [18]:
np.nanmin([1, 2, np.nan, np.inf])

1.0

In [19]:
np.nanmin([1, 2, np.nan, -np.inf])  # np.NINF was removed in NumPy 2.0

-inf

`numpy.nanmax`
- Returns the maximum of an array or the maximum along an axis, `ignoring any NaNs`.
- If a slice contains only NaNs, a `RuntimeWarning` is issued and NaN is returned for that slice.

In [20]:
a = np.array([[1, 2], 
              [3, np.nan]])
np.nanmax(a)

3.0

In [21]:
np.nanmax(a, axis=0)

array([3., 2.])

In [22]:
np.nanmax(a, axis=1)

array([2., 3.])

When positive infinity and negative infinity are present:

In [23]:
np.nanmax([1, 2, np.nan, -np.inf])  # np.NINF was removed in NumPy 2.0

2.0

In [24]:
np.nanmax([1, 2, np.nan, np.inf])

inf

`numpy.ptp()`

- The name of the function is an abbreviation of ‘peak to peak’.
- `numpy.ptp()` returns the `range` (maximum minus minimum) of values along an axis.

In [28]:
x = np.array([[4, 9, 2, 10],
              [6, 9, 7, 12]])

In [29]:
np.ptp(x, keepdims=True) 

array([[10]])

In [30]:
np.ptp(x, axis = 1, keepdims=True) 

array([[8],
       [6]])

In [31]:
np.ptp(x, axis = 0, keepdims=True) 

array([[2, 0, 5, 2]])
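Since `ptp` is simply maximum minus minimum, it can be cross-checked against `max()` and `min()` along the same axis. Note that recent NumPy releases (2.0 onwards) no longer provide an `ndarray.ptp()` method, so the function form `np.ptp()` is the portable spelling:

```python
import numpy as np

x = np.array([[4, 9, 2, 10],
              [6, 9, 7, 12]])

col_range = np.ptp(x, axis=0)            # peak-to-peak per column
manual = x.max(axis=0) - x.min(axis=0)   # the same quantity spelled out
print(col_range)                          # [2 0 5 2]
print(np.array_equal(col_range, manual))  # True
```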

`numpy.percentile()`


A percentile (or centile) is a statistical measure giving the value below which a given percentage of the observations in a group falls.

The function `numpy.percentile()` takes the following main arguments:

- `numpy.percentile(a, q, axis)`
    - `a`: input array
    - `q`: percentile (or sequence of percentiles) to compute, between 0 and 100 inclusive
    - `axis`: axis along which the percentile is computed; by default the array is flattened

It returns the q-th percentile(s) of the array elements. The examples below also pass `keepdims=True`, which keeps the reduced axes as dimensions of size one.

In [38]:
a = np.array([[3,7,5], 
              [8,4,3], 
              [2,4,9]]) 
a

array([[3, 7, 5],
       [8, 4, 3],
       [2, 4, 9]])

In [39]:
np.percentile(a, 50, keepdims=True) 

array([[4.]])

In [41]:
np.percentile(a, 20, axis = 1, keepdims=True) 

array([[3.8],
       [3.4],
       [2.8]])

In [43]:
np.percentile(a, 25, axis=0, keepdims=True)

array([[2.5, 4. , 4. ]])
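Results such as `3.8` for the first row come from interpolation. The sketch below re-implements NumPy's default `'linear'` method for a 1D sequence: sort the data, compute the fractional position `q/100 * (n - 1)`, and interpolate between the two neighbouring values. The helper `linear_percentile` is hypothetical, written only for illustration:

```python
import math
import numpy as np

def linear_percentile(values, q):
    """Hypothetical helper: sketch of NumPy's default 'linear' percentile for 1D data."""
    s = sorted(values)
    pos = (q / 100) * (len(s) - 1)   # fractional position in the sorted data
    lo = math.floor(pos)
    hi = min(lo + 1, len(s) - 1)     # clamp at the last element (q = 100)
    frac = pos - lo
    return s[lo] + frac * (s[hi] - s[lo])

row = [3, 7, 5]                       # first row of `a`
print(linear_percentile(row, 20))     # approximately 3.8
print(np.percentile(row, 20))         # approximately 3.8
```

For `[3, 7, 5]` the sorted data is `[3, 5, 7]`, the position is `0.2 * 2 = 0.4`, and the result is `3 + 0.4 * (5 - 3) = 3.8`.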

`numpy.nanpercentile`

- Computes the q-th percentile of the data along the specified axis, while `ignoring NaN` values.
- Returns the q-th percentile(s) of the array elements.

In [44]:
a = np.array([[10., 7., 4.], [3., 2., 1.]])
a[0, 1] = np.nan
a

array([[10., nan,  4.],
       [ 3.,  2.,  1.]])

In [45]:
np.percentile(a, 50)

nan

In [46]:
np.nanpercentile(a, 50)

3.0

In [47]:
np.nanpercentile(a, 50, axis=0)

array([6.5, 2. , 2.5])

In [48]:
np.nanpercentile(a, 50, axis=1, keepdims=True)

array([[7.],
       [2.]])
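For the whole-array case, `np.nanpercentile()` gives the same answer as dropping the NaNs with a boolean mask and calling `np.percentile()` on what remains. This is a sketch of that equivalence only; masking flattens the array, so for per-axis results `np.nanpercentile(..., axis=...)` is still the tool to use:

```python
import numpy as np

a = np.array([[10., np.nan, 4.],
              [3., 2., 1.]])

# Boolean masking keeps only the non-NaN values (as a flat 1D array)
valid = a[~np.isnan(a)]
print(valid)                      # [10.  4.  3.  2.  1.]
print(np.percentile(valid, 50))   # 3.0
print(np.nanpercentile(a, 50))    # 3.0
```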

`numpy.quantile`

- Computes the q-th quantile of the data along the specified axis. Here `q` is a fraction between 0 and 1, whereas `numpy.percentile()` takes a value between 0 and 100.

Quantiles that divide a distribution into equal parts have special names:

- `Quartiles` divide it into 4 equal parts,
- `deciles` divide it into 10 equal parts, and
- `percentiles` divide it into 100 equal parts.

The median is the 0.5 quantile: half of the observations (proportion 0.5) lie below it and half lie above it.

This way of defining quantiles is useful when you want to locate a particular quantile (such as the median) in a data set. The position of the i-th observation can be estimated with:

$$ i = q\,(n + 1) $$

where

- $q$ is the quantile, i.e. the proportion of values below the i-th value you are looking for, and
- $n$ is the number of items in the data set.

**Sample question**: Find the number in the following set of data where 20 percent of values fall below it, and 80 percent fall above:
1 3 5 6 9 11 12 13 19 21 22 32 35 36 45 44 55 68 79 80 81 88 90 91 92 100 112 113 114 120 121 132 145 146 149 150 155 180 189 190

- Step 1: Order the data from `smallest to largest`.
- Step 2: Count how many observations are in the data set. This data set has 40 items.
- Step 3: Convert the percentage to a decimal for q. We are looking for the number below which 20 percent of the values fall, so q = 0.2.
- Step 4: Insert the values into the formula:

$$ i = q\,(n + 1) = 0.2\,(40 + 1) = 8.2 $$

**Answer**: The i-th observation is at position 8.2, so we round down to 8 (remember that this formula is only an estimate). The 8th number in the ordered set is 13, so roughly 20 percent of the values fall below 13.
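The four steps can be reproduced in NumPy as a sketch. For comparison it also calls `np.quantile()`, which interpolates between neighbouring values instead of rounding down, so it does not return 13. The `method="weibull"` option (available from NumPy 1.22) uses the same q(n + 1) positions as the formula above but interpolates; the default `'linear'` method uses a different convention. None of these is the single correct answer; they are different estimators:

```python
import numpy as np

data = [1, 3, 5, 6, 9, 11, 12, 13, 19, 21, 22, 32, 35, 36, 45, 44, 55, 68, 79, 80,
        81, 88, 90, 91, 92, 100, 112, 113, 114, 120, 121, 132, 145, 146, 149, 150,
        155, 180, 189, 190]

values = np.sort(np.array(data))  # Step 1: order from smallest to largest
n = values.size                   # Step 2: n = 40
q = 0.2                           # Step 3: 20 percent -> 0.2

i = q * (n + 1)                   # Step 4: estimated position, about 8.2
estimate = values[int(i) - 1]     # round down to 8; -1 because Python counts from 0
print(round(i, 2), estimate)      # 8.2 13

# NumPy's interpolating estimators, for comparison
print(np.quantile(values, q))                    # default 'linear' method, about 17.8
print(np.quantile(values, q, method="weibull"))  # q(n + 1) positions, about 14.2
```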

In [50]:
a = np.array([[10, 7, 4], 
              [3, 2, 1]])
a

array([[10,  7,  4],
       [ 3,  2,  1]])

In [51]:
np.quantile(a, 0.5)

3.5

In [52]:
np.quantile(a, 0.5, axis=0)

array([6.5, 4.5, 2.5])

In [53]:
np.quantile(a, 0.5, axis=1, keepdims=True)

array([[7.],
       [2.]])
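The relationship between quantiles and percentiles, and the named quantiles listed earlier, can be checked directly. A short sketch using the values of the 3×3 array from the percentile section (flattened into a hypothetical 1D array `data`):

```python
import numpy as np

data = np.array([3, 7, 5, 8, 4, 3, 2, 4, 9])

# A quantile q is the same as the percentile 100*q
print(np.quantile(data, 0.25), np.percentile(data, 25))   # 3.0 3.0

# Quartiles: the 0.25, 0.5 and 0.75 quantiles in one call
q1, median, q3 = np.quantile(data, [0.25, 0.5, 0.75])
print(q1, median, q3)                                     # 3.0 4.0 7.0

# Deciles: the 0.1, 0.2, ..., 0.9 quantiles (nine cut points)
deciles = np.quantile(data, np.arange(1, 10) / 10)
print(deciles.shape)                                      # (9,)
```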

`numpy.nanquantile`

- Computes the q-th quantile of the data along the specified axis, while `ignoring NaN values`. Returns the q-th quantile(s) of the array elements.

In [54]:
a = np.array([[10., 7., 4.], [3., 2., 1.]])
a[0, 1] = np.nan
a

array([[10., nan,  4.],
       [ 3.,  2.,  1.]])

In [55]:
np.quantile(a, 0.5)

nan

In [56]:
np.nanquantile(a, 0.5)

3.0

In [57]:
np.nanquantile(a, 0.5, axis=1, keepdims=True)

array([[7.],
       [2.]])