In [1]:
%run ../../common/import_all.py

from common.setup_notebook import set_css_style, setup_matplotlib, config_ipython
config_ipython()
setup_matplotlib()
set_css_style()

# Quantiles and mode of a distribution

## Quantiles

Quantiles are the values that divide a probability distribution into sets of equal probability (for a sample, equally populated sets of observations); how many sets is up to you. Some special types of quantiles are

* *deciles*: 10 sets, so the first decile is the value such that 10% of the observations are smaller, and the ninth decile is the value such that 90% of the observations are smaller
* *quartiles*: 4 sets, so the first quartile is the value such that 25% of the observations are smaller
* *percentiles*: 100 sets, so the first percentile is the value such that 1% of the observations are smaller

The second quartile, which is also the fifth decile and the fiftieth percentile, is special enough to have its own name: the *median*.

This means you can express all of them through percentiles, the most fine-grained of the three: the k-th quartile is the (25k)-th percentile and the k-th decile is the (10k)-th percentile. This is how NumPy's `np.percentile` works: you ask for any quantile by its percentile rank, as we'll see below.
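As a quick check of these relations, here is a sketch that computes quartiles and deciles through `np.percentile` and compares them with `np.quantile` (which takes fractions in [0, 1] instead of percentages) and `np.median`. The Gaussian sample and the seed are arbitrary choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=1000)

# Quartiles are the 25th, 50th and 75th percentiles
quartiles = np.percentile(x, q=[25, 50, 75])
# Deciles are the 10th, 20th, ..., 90th percentiles
deciles = np.percentile(x, q=np.arange(10, 100, 10))

# np.quantile asks for the same positions as fractions in [0, 1]
assert np.allclose(quartiles, np.quantile(x, [0.25, 0.5, 0.75]))

# The median is the 2nd quartile, the 5th decile and the 50th percentile
assert np.isclose(np.median(x), quartiles[1])
assert np.isclose(np.median(x), deciles[4])
```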

### Trying them out

Let's draw 1000 numbers from a given distribution and compute the quartiles with `numpy.percentile(array, q=[0, 25, 50, 75, 100])`. Note that percentile 0 and percentile 100 correspond respectively to the minimum and the maximum of the data.
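When the requested percentile falls between two observations, NumPy interpolates. A minimal sketch of its default rule (linear interpolation, `method="linear"`): sort the data and place the q-th percentile at fractional position (n - 1) q / 100. The helper `percentile_linear` is a hypothetical name introduced here, not a NumPy function. The same rule makes percentiles 0 and 100 land exactly on the minimum and the maximum.

```python
import numpy as np

def percentile_linear(data, q):
    """Hypothetical helper: q-th percentile by linear interpolation
    between neighbouring order statistics (NumPy's default rule)."""
    s = np.sort(np.asarray(data, dtype=float))
    pos = (len(s) - 1) * q / 100.0      # fractional index into the sorted data
    lo = int(np.floor(pos))
    hi = min(lo + 1, len(s) - 1)        # clamp at the last element for q = 100
    frac = pos - lo
    return s[lo] + frac * (s[hi] - s[lo])

rng = np.random.default_rng(1)
u = rng.uniform(size=1000)

ours = [percentile_linear(u, q) for q in (0, 25, 50, 75, 100)]
print(np.allclose(ours, np.percentile(u, q=[0, 25, 50, 75, 100])))  # True
print(ours[0] == u.min(), ours[-1] == u.max())  # True True
```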

#### On a uniform distribution, between 0 and 1

In [11]:
u = np.random.uniform(size=1000)

np.percentile(u, q=[0, 25, 50, 75, 100])
# percentiles 0 and 100 coincide with the minimum and the maximum
min(u), max(u)

array([ 0.00525287,  0.27590627,  0.52032294,  0.74847448,  0.99746279])

(0.0052528727651390827, 0.99746278984270464)

#### On a standard gaussian (mean 0, std 1)

For a symmetric distribution like this one, the median coincides with the mean, that is, 0. The sample median won't be exactly 0, because of finite-size effects: we only have 1000 draws.

In [12]:
g = np.random.normal(size=1000)

np.percentile(g, q=[0, 25, 50, 75, 100])

array([-3.23114861, -0.61061122,  0.01119369,  0.75210898,  3.94790946])
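To judge how far these sample quartiles are from the exact ones, a sketch using the standard library's `statistics.NormalDist` (Python 3.8+): the theoretical quartiles are the inverse CDF evaluated at 0.25, 0.5 and 0.75. The finite-size effect on the median can be quantified with the large-sample standard error of the median of n normal draws, sqrt(pi / (2n)) * sigma, about 0.04 for n = 1000. The sample drawn at the end uses an arbitrary seed, so its numbers will differ from the output above.

```python
import math
from statistics import NormalDist

import numpy as np

std_normal = NormalDist(mu=0.0, sigma=1.0)
# Theoretical quartiles: inverse CDF at 0.25, 0.5, 0.75
theory = [std_normal.inv_cdf(p) for p in (0.25, 0.5, 0.75)]
print(theory)  # approximately [-0.674, 0.0, 0.674]

# Asymptotic standard error of the sample median for normal data (sigma = 1)
n = 1000
median_se = math.sqrt(math.pi / (2 * n))
print(round(median_se, 3))  # 0.04

rng = np.random.default_rng(2)
g = rng.normal(size=n)
print(np.percentile(g, q=[25, 50, 75]))
```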

#### On a power law with exponent -0.3

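`np.random.power(a)` samples the density a x^(a-1) on [0, 1], so a = 0.7 gives exponent -0.3. The quantiles span several orders of magnitude, from about 1e-5 for the minimum to about 1 for the maximum.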

In [13]:
p = np.random.power(0.7, size=1000)

np.percentile(p, q=[0, 25, 50, 75, 100])

array([  1.66067839e-05,   1.53384509e-01,   3.90738210e-01,
         6.68819197e-01,   9.97947347e-01])
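For this distribution the quantiles can also be derived in closed form. The CDF of the density a x^(a-1) on [0, 1] is F(x) = x^a, so the quantile function is F^(-1)(p) = p^(1/a). A sketch comparing these with a fresh sample (arbitrary seed); the 1/n quantile is used only as a rough indication of the order of magnitude of the smallest of n draws, which explains why the minimum is so much smaller than the other quartiles.

```python
import numpy as np

a = 0.7  # NumPy's shape parameter; density a * x**(a - 1) = 0.7 * x**(-0.3)
probs = np.array([0.25, 0.5, 0.75])

# CDF is F(x) = x**a on [0, 1], so the quantile function is p**(1/a)
theory = probs ** (1 / a)
print(theory)  # about 0.14, 0.37, 0.66

rng = np.random.default_rng(3)
p = rng.power(a, size=1000)
print(np.percentile(p, q=[25, 50, 75]))

# Rough scale of the minimum of n draws: the 1/n quantile
n = 1000
print((1 / n) ** (1 / a))  # about 5e-05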

## Mode

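The mode of a distribution is its most probable value: for a discrete distribution, the value with the highest probability (in a sample, the most frequent value); for a continuous distribution, the value at which the probability density reaches its maximum.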
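A minimal sketch of computing the mode of a sample with NumPy. For discrete data, count occurrences with `np.unique(..., return_counts=True)` and take the most frequent value. For continuous data, where repeated values essentially never occur, a crude estimate is the centre of the tallest histogram bin. The bin count (50) and the Gaussian test sample centred at 2 are arbitrary choices; the estimate depends on the binning, and kernel density estimates are a smoother alternative.

```python
import numpy as np

# Discrete data: the mode is the most frequent value
data = np.array([1, 2, 2, 3, 3, 3, 4])
values, counts = np.unique(data, return_counts=True)
discrete_mode = values[np.argmax(counts)]
print(discrete_mode)  # 3

# Continuous data: estimate the peak of the density from a histogram
rng = np.random.default_rng(4)
x = rng.normal(loc=2.0, scale=1.0, size=100_000)
hist_counts, edges = np.histogram(x, bins=50)
i = np.argmax(hist_counts)                 # index of the fullest bin
hist_mode = 0.5 * (edges[i] + edges[i + 1])  # centre of that bin
print(hist_mode)  # close to the true mode, 2.0
```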