# 7 Statistics with NumPy
## 7_3 Statistical Order Functions with NumPy

#### numpy.ptp(a, axis=None, out=None, keepdims=<no value>)
- Range of values (maximum - minimum) along an axis.
- The name of the function comes from the acronym for ‘peak to peak’.

#### numpy.percentile(a, q, axis=None, out=None, overwrite_input=False, method='linear', keepdims=False, *, weights=None, interpolation=None)
- Compute the q-th percentile of the data along the specified axis.
- Returns the q-th percentile(s) of the array elements.
- A percentile is a measure of position used in statistics: once the data are sorted in ascending order, it is the value below which a given percentage of the observations in a group falls.

#### numpy.quantile(a, q, axis=None, out=None, overwrite_input=False, method='linear', keepdims=False, *, weights=None, interpolation=None)
- Compute the q-th quantile of the data along the specified axis.

In [54]:
import numpy as np
np.__version__

'2.1.1'

In [55]:
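# Helper functions

def show_attr(arrnm: str) -> str:
    """Return a one-line summary of the array that the expression `arrnm` evaluates to."""
    strout = f' {arrnm}: '

    # eval() evaluates the expression string, so only pass trusted
    # names or expressions that are visible from this function
    for attr in ('shape', 'ndim', 'size', 'dtype'):     #, 'itemsize'):
        arrnm_attr = arrnm + '.' + attr
        strout += f'| {attr}: {eval(arrnm_attr)} '

    return strout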

In [56]:
A = np.array([[1,0,0,3,1],
              [3,6,6,2,9],
              [4,5,3,8,0]])

print(show_attr('A'))
A

 A: | shape: (3, 5) | ndim: 2 | size: 15 | dtype: int64 


array([[1, 0, 0, 3, 1],
       [3, 6, 6, 2, 9],
       [4, 5, 3, 8, 0]])

In [57]:
# np.ptp() is equivalent to max() - min()
#   Convenient: one function call instead of two
display(np.ptp(A))          # Whole matrix
display(np.ptp(A[0,:]))     # 1st row
display(np.ptp(A, axis=1))  # .ptp() for each row
display(np.ptp(A, axis=0))  # .ptp() for each column
show_attr('np.ptp(A, axis=0)')

np.int64(9)

np.int64(3)

array([3, 7, 8])

array([3, 6, 6, 6, 9])

' np.ptp(A, axis=0): | shape: (5,) | ndim: 1 | size: 5 | dtype: int64 '

In [58]:
ptp_rows = np.ptp(A, axis=1)
print(show_attr('ptp_rows') + '\n')
ptp_rows

 ptp_rows: | shape: (3,) | ndim: 1 | size: 3 | dtype: int64 



array([3, 7, 8])
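As a quick cross-check of the "maximum - minimum" definition, the sketch below compares `np.ptp()` against an explicit `max() - min()` on the same array. The names `manual_rows` and `ptp_cols_2d` are introduced here for illustration only.

```python
import numpy as np

A = np.array([[1, 0, 0, 3, 1],
              [3, 6, 6, 2, 9],
              [4, 5, 3, 8, 0]])

# Range per row, computed two ways
ptp_rows = np.ptp(A, axis=1)
manual_rows = A.max(axis=1) - A.min(axis=1)
print(ptp_rows, manual_rows)

# keepdims=True keeps the reduced axis with length 1,
# so the result still broadcasts against A
ptp_cols_2d = np.ptp(A, axis=0, keepdims=True)
print(ptp_cols_2d.shape)
```

Both row results should agree, and the `keepdims=True` result has shape `(1, 5)` instead of `(5,)`.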

In [59]:
# Percentile: a value that is greater than the corresponding % of
# the dataset. E.g. the 70-th percentile is greater than 70% of
# the data. Put differently, we are 70% of the way through a sorted
# version of the array (in increasing order)

print(show_attr('A'))               # size = num of elements
A_sorted = np.sort(A, axis=None)    # Sort the flattened array (returns a 1-D copy)
A_sorted

 A: | shape: (3, 5) | ndim: 2 | size: 15 | dtype: int64 


array([0, 0, 0, 1, 1, 2, 3, 3, 3, 4, 5, 6, 6, 8, 9])

In [60]:
# Calculate the 70-th percentile
np.percentile(A, 70)

np.float64(4.799999999999999)

In [61]:
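# Rough check: we have 15 elements, so 70% of the way through
# is about position 15 * 70 / 100
position_70th = 15 * 70 / 100
print(position_70th)                # 10.5

# Somewhere between the 10th and 11th elements of the sorted, flattened
# array. (The default 'linear' method places the exact spot at 0-based
# index (15 - 1) * 70 / 100 = 9.8, which is why the result is 4.8.)
print(A_sorted[9], A_sorted[10])    # Elements in pos 10 and 11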

10.5
4 5
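The 4.8 result can be reproduced by hand. The sketch below assumes the default `'linear'` method, which places the q-th percentile at the 0-based fractional index `(n - 1) * q / 100` of the sorted data and interpolates between the two neighbouring elements. The variable names (`virtual_index`, `fraction`, `manual`) are illustrative, not NumPy's.

```python
import numpy as np

A = np.array([[1, 0, 0, 3, 1],
              [3, 6, 6, 2, 9],
              [4, 5, 3, 8, 0]])
A_sorted = np.sort(A, axis=None)

q = 70
# Fractional 0-based position in the sorted array (about 9.8 here)
virtual_index = (A_sorted.size - 1) * q / 100
lower = int(np.floor(virtual_index))    # index of the element just below
fraction = virtual_index - lower        # how far we are toward the next element

# Linear interpolation between the two neighbours (4 and 5)
manual = A_sorted[lower] + fraction * (A_sorted[lower + 1] - A_sorted[lower])

print(virtual_index, manual, np.percentile(A, q))
```

Up to floating-point rounding, `manual` matches `np.percentile(A, 70)`. Note that this simple version would index out of bounds for `q = 100`; it is only meant to illustrate the interior case.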


In [62]:
# The percentile is somewhere in between. In cases like this, where
# the percentile falls between two values of the dataset, it's up to
# us to decide how to compute it. The default for this function is
# linear interpolation, which returns a number that is some fraction
# of the way between (in this case) 4 and 5.

# Outcome halfway between the two values
# (`method` replaces the `interpolation` keyword, deprecated since NumPy 1.22):
np.percentile(A, 70, method='midpoint')

np.float64(4.5)

In [63]:
# Choosing a value that exists in the dataset
display(np.percentile(A, 70, method='lower'))
np.percentile(A, 70, method='higher')

np.int64(4)

np.int64(5)

In [64]:
# Closest existing value in the dataset
np.percentile(A, 70, method='nearest')
# Recommended: the value exists in the dataset and the result keeps the same dtype

np.int64(5)
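The choices discussed above can be compared side by side. This is a minimal sketch that assumes NumPy 1.22 or later, where the keyword is called `method`; the `results` dictionary is just a convenience for this example.

```python
import numpy as np

A = np.array([[1, 0, 0, 3, 1],
              [3, 6, 6, 2, 9],
              [4, 5, 3, 8, 0]])

# 70-th percentile under each of the methods used in this section
results = {}
for method in ('linear', 'lower', 'higher', 'midpoint', 'nearest'):
    results[method] = np.percentile(A, 70, method=method)
    print(f'{method:>8}: {results[method]!r}')
```

`'lower'`, `'higher'` and `'nearest'` return elements of the dataset, while `'linear'` and `'midpoint'` may return values that do not appear in it.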

In [65]:
# 50-th percentile is the median of the set.
# Median: value positioned in the middle of a sorted dataset
display(np.percentile(A, 50))
display(np.median(A))
np.percentile(A, 50, method='nearest')

np.float64(3.0)

np.float64(3.0)

np.int64(3)

In [66]:
# 0-th and 100-th percentiles: the minimum and the maximum, respectively.
display(np.percentile(A, 0, method='nearest'))
display(np.min(A))
display(np.percentile(A, 100, method='nearest'))
display(np.max(A))

np.int64(0)

np.int64(0)

np.int64(9)

np.int64(9)
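Since `q` may also be a sequence, the minimum, median and maximum can be requested in a single call, and the `axis` argument works as it does for `np.ptp()`. A short sketch (the names `summary` and `per_row` are only for this example):

```python
import numpy as np

A = np.array([[1, 0, 0, 3, 1],
              [3, 6, 6, 2, 9],
              [4, 5, 3, 8, 0]])

# Several percentiles of the flattened array at once
summary = np.percentile(A, [0, 50, 100])
print(summary)

# Median of each row (one value per row)
per_row = np.percentile(A, 50, axis=1)
print(per_row)
```

With the default `'linear'` method the results are floats, even when they coincide with integer elements of the array.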

In [67]:
# Quantiles express what fraction (0 to 1) of the sorted set we've covered
# A value that is greater than the corresponding fraction of the dataset
display(np.percentile(A, 70))
display(np.quantile(A, 0.7))
display(np.percentile(A, 70, method='nearest'))
display(np.quantile(A, 0.7, method='nearest'))

np.float64(4.799999999999999)

np.float64(4.799999999999999)

np.int64(5)

np.int64(5)
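The relationship between the two functions is simply a change of scale: `np.quantile(a, q)` corresponds to `np.percentile(a, 100 * q)`. The sketch below checks this for a few arbitrarily chosen values.

```python
import numpy as np

A = np.array([[1, 0, 0, 3, 1],
              [3, 6, 6, 2, 9],
              [4, 5, 3, 8, 0]])

percents = np.array([25, 50, 70, 90])           # arbitrary example values
by_percentile = np.percentile(A, percents)      # q given in [0, 100]
by_quantile = np.quantile(A, percents / 100)    # q given in [0, 1]

print(by_percentile)
print(by_quantile)
print(np.allclose(by_percentile, by_quantile))
```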

#### Statistics - Order statistics
- ptp(a[, axis, out, keepdims]): Range of values (maximum - minimum) along an axis.
- percentile(a, q[, axis, out, ...]): Compute the q-th percentile of the data along the specified axis.
- nanpercentile(a, q[, axis, out, ...]): Compute the q-th percentile of the data along the specified axis, while ignoring nan values.
- quantile(a, q[, axis, out, overwrite_input, ...]): Compute the q-th quantile of the data along the specified axis.
- nanquantile(a, q[, axis, out, ...]): Compute the q-th quantile of the data along the specified axis, while ignoring nan values.
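The nan-aware variants are not exercised in the cells above, so here is a minimal sketch of how they differ. It assumes a float copy of `A` (called `B` here, a name made up for this example) with one element replaced by `np.nan`.

```python
import numpy as np

A = np.array([[1, 0, 0, 3, 1],
              [3, 6, 6, 2, 9],
              [4, 5, 3, 8, 0]])

# NaN needs a float dtype, so work on a float copy
B = A.astype(float)
B[0, 0] = np.nan

plain = np.percentile(B, 50)            # NaN propagates to the result
ignoring = np.nanpercentile(B, 50)      # NaN is skipped
ignoring_q = np.nanquantile(B, 0.5)     # same, with q as a fraction

print(plain, ignoring, ignoring_q)
```

`np.percentile()` returns `nan` as soon as the input contains one, while `np.nanpercentile()` and `np.nanquantile()` compute the statistic over the remaining 14 values.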