%run ../common/import_all.py

from common.setup_notebook import set_css_style, setup_matplotlib, config_ipython
config_ipython()
setup_matplotlib()
set_css_style()

# (Some of) the most famous distributions

## Bernoulli

Let's consider a binary variable $X \in \{0,1\}$, which can take the value 1 (which we'll call the *success*) or 0 (which we'll call the *failure*). The prototype of this is the result of flipping a coin. Let's also call $\mu$ the probability of success so that, by definition

$$
P(X=1) = \mu \ ; P(X=0) = 1 - \mu \ ,
$$

so that the [pmf](probfunctions-histogram.ipynb#The-PMF) (it is a discrete variable) can be expressed as

$$
p(x;\mu) = \mu^x(1-\mu)^{1-x}
$$

because when we have $x=1$ we are left with $\mu$ and when we have $x=0$ we are left with $1-\mu.$

Such a distribution has expected value

$$
\mathbb{E}[X] = \sum_{x \in \{0,1\}} x \mu^x(1-\mu)^{1-x} = 0 + 1\mu(1-\mu)^0 = \mu
$$

and variance

$$
Var[X] =  \sum_{x \in \{0,1\}} x^2 \mu^x(1-\mu)^{1-x} - \mu^2 = \mu - \mu^2 = \mu(1-\mu)
$$
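As a quick sanity check, the two sums above can be evaluated directly. This is only a minimal sketch: the function name `bernoulli_pmf` and the value $\mu = 0.3$ are illustrative choices, not part of the text.

```python
def bernoulli_pmf(x, mu):
    """p(x; mu) = mu^x (1 - mu)^(1 - x) for x in {0, 1}."""
    if x not in (0, 1):
        raise ValueError("x must be 0 or 1")
    return mu ** x * (1 - mu) ** (1 - x)


mu = 0.3  # illustrative success probability (an assumption)

# Expected value and variance computed as explicit sums over the two outcomes
mean = sum(x * bernoulli_pmf(x, mu) for x in (0, 1))
var = sum(x ** 2 * bernoulli_pmf(x, mu) for x in (0, 1)) - mean ** 2

print(mean, var)  # should agree with mu and mu * (1 - mu)
```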

The Bernoulli distribution is a special case of a binomial distribution for a single observation, see below!

## Binomial

The binomial distribution describes the probability of observing $k$ occurrences of $x=1$ in a set of $n$ independent samples from a Bernoulli distribution, where $\mu$ is the probability of observing $x=1$ in each sample. The pmf is then

$$
p(k; n, \mu) = {{n}\choose{k}} \mu^k (1-\mu)^{n-k} \ ,
$$

because we have ${{n}\choose{k}}$ ways of creating groups of $k$ from $n$ values and because each extraction is a Bernoulli.

The expected value is

$$
\mathbb{E}[X] = n \mu
$$

and the variance is

$$
Var[X] =  n \mu (1- \mu)
$$

Head to [Wikipedia](https://en.wikipedia.org/wiki/Binomial_distribution) for the proofs.
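Short of the full proofs, we can at least verify the expected value and variance numerically by summing over all possible $k$. The sketch below uses only the standard library; `binomial_pmf` and the values $n = 10$, $\mu = 0.3$ are illustrative assumptions.

```python
from math import comb


def binomial_pmf(k, n, mu):
    """Probability of k successes in n independent Bernoulli(mu) trials."""
    return comb(n, k) * mu ** k * (1 - mu) ** (n - k)


n, mu = 10, 0.3  # illustrative parameters (assumptions)

probs = [binomial_pmf(k, n, mu) for k in range(n + 1)]
total = sum(probs)                                   # normalisation check
mean = sum(k * p for k, p in enumerate(probs))       # compare with n * mu
var = sum(k ** 2 * p for k, p in enumerate(probs)) - mean ** 2  # n * mu * (1 - mu)

print(total, mean, var)
```

With $n = 1$ the same function gives back the Bernoulli pmf, in line with the remark in the previous section.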

## Multinomial

It is a multivariate generalisation of the binomial: for a discrete variable with $k$ possible states, it gives the distribution over the counts $m_i$ of observations falling in state $i$ (with $i = 1, \ldots, k$), given a total number of observations $n = \sum_i m_i$ and state probabilities $\mu_i$ with $\sum_i \mu_i = 1$.

An example is the extraction of $n$ balls of $k$ different colours from a bag, replacing the extracted ball after each draw. The pmf reads

$$
p(m_1, m_2, \ldots, m_k; \mu_1, \mu_2, \ldots, \mu_k, n) =  {{n}\choose{m_1 \, m_2 \ldots m_k}}  \mu_1^{m_1} \mu_2^{m_2} \ldots \mu_k^{m_k} \ ,
$$

where ${{n}\choose{m_1 \, m_2 \ldots m_k}} = \frac{n!}{m_1! \, m_2! \ldots m_k!}$ is the multinomial coefficient, and we have

$$
\mathbb{E}[m_i]  = n \mu_i \ ,
$$

$$
Var[m_i] = n \mu_i(1-\mu_i) \ .
$$
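To make the pmf concrete, here is a minimal sketch that evaluates it for a given vector of counts. The name `multinomial_pmf` and the numbers in the example (three colours with probabilities 0.5, 0.3 and 0.2) are assumptions made for illustration.

```python
from math import factorial, prod


def multinomial_pmf(counts, mus):
    """Probability of observing the given per-state counts.

    counts: number of observations in each state (they sum to n)
    mus:    probability of each state (they should sum to 1)
    """
    n = sum(counts)
    # Multinomial coefficient n! / (m_1! m_2! ... m_k!), kept as an exact integer
    coeff = factorial(n)
    for m in counts:
        coeff //= factorial(m)
    return coeff * prod(mu ** m for mu, m in zip(mus, counts))


# Four draws with replacement from a bag with three colours (illustrative)
p = multinomial_pmf([2, 1, 1], [0.5, 0.3, 0.2])
print(p)
```

With only two states the function reduces to the binomial pmf, which is a handy check on the implementation.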

## Uniform

Given a continuous variable $X$ taking values in the interval $[a,b]$, a uniform distribution is one where every possible value is equally likely, that is, the probability density is constant. Its pdf is simply

$$
p(x) = \frac{1}{b-a} \ ,
$$

because you have 1 case over the total possible cases, which is the width of the interval.

The expected value is 

$$
\mathbb{E}[X] = \int_a^b \text{d} x \ \frac{x}{b-a} = \frac{b^2 - a^2}{2(b-a)} = \frac{b+a}{2} \ ,
$$

which, as expected (!), corresponds to the midpoint of the interval: since every point is equally likely, this is where we land by averaging the values.

The variance is 

$$
\begin{align}
Var[X] &= \mathbb{E}[X^2] - \mathbb{E}^2[X] \\
&= \int_a^b \text{d} x \ \frac{x^2}{b-a} - \Big(\frac{b+a}{2}\Big)^2 \\
&= \frac{b^3 - a^3}{3(b-a)} - \frac{(b+a)^2}{4} \\
&= \frac{b^2 + ab + a^2}{3} - \frac{b^2 + 2ab + a^2}{4} \\
&= \frac{b^2 - 2ab + a^2}{12} = \frac{(b-a)^2}{12} \ .
\end{align}
$$
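The two integrals can also be checked numerically. The sketch below approximates them with a midpoint Riemann sum; the helper `uniform_moments`, the number of steps and the interval $[2, 5]$ are illustrative assumptions.

```python
def uniform_moments(a, b, steps=100_000):
    """Approximate E[X] and Var[X] for a uniform on [a, b] (midpoint rule)."""
    dx = (b - a) / steps
    density = 1 / (b - a)  # the constant pdf
    xs = [a + (i + 0.5) * dx for i in range(steps)]
    mean = sum(x * density * dx for x in xs)
    var = sum((x - mean) ** 2 * density * dx for x in xs)
    return mean, var


a, b = 2.0, 5.0  # illustrative interval (an assumption)
mean, var = uniform_moments(a, b)

print(mean, (a + b) / 2)        # numerical vs closed form
print(var, (b - a) ** 2 / 12)   # numerical vs closed form
```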

## Gaussian

The Gaussian distribution (after C. F. Gauss) is also called a normal distribution or, in some cases, bell curve (from its shape). Let $\mu$ be the expected value and $\sigma$ the standard deviation; its pdf is

$$
p(x; \mu, \sigma) =  \frac{1}{\sqrt{2 \pi \sigma^2}} e^{-\frac{1}{2 \sigma^2} (x-\mu)^2}
$$

It is usually indicated as $\mathcal N(\mu, \sigma^2)$, where $\mathcal N$ stands for "normal".
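To see that $\mu$ and $\sigma^2$ really are the mean and variance of this density, we can integrate it numerically. This is a rough sketch with the standard library only; `gaussian_pdf`, the parameter values and the integration range of $\pm 8\sigma$ are assumptions made for the example.

```python
from math import exp, pi, sqrt


def gaussian_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at x."""
    return exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / sqrt(2 * pi * sigma ** 2)


mu, sigma = 1.0, 2.0  # illustrative parameters (assumptions)

# Midpoint rule on [mu - 8 sigma, mu + 8 sigma]; the tails beyond are negligible
lo, hi = mu - 8 * sigma, mu + 8 * sigma
steps = 20_000
dx = (hi - lo) / steps
xs = [lo + (i + 0.5) * dx for i in range(steps)]

total = sum(gaussian_pdf(x, mu, sigma) * dx for x in xs)
mean = sum(x * gaussian_pdf(x, mu, sigma) * dx for x in xs)
var = sum((x - mean) ** 2 * gaussian_pdf(x, mu, sigma) * dx for x in xs)

print(total, mean, var)  # should be close to 1, mu and sigma^2
```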

## Beta

Given a continuous variable $x \in [0,1]$, the beta distribution is parametrized by $\alpha, \beta > 0$, which define its shape:

$$
p(x; \alpha, \beta) =  \mathcal{N} x^{\alpha-1}(1-x)^{\beta-1} \ ,
$$

where $\mathcal{N}$ is the normalisation constant:

$$
\mathcal{N} = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha) \Gamma(\beta)} = \frac{1}{\int_0^1 \text{d} u \ u^{\alpha -1}(1-u)^{\beta-1}} \ ,
$$

with $\Gamma$ the gamma function (extension of the factorial to real and complex numbers), defined as

$$
\Gamma(t) = \int_0^\infty x^{t-1} e^{-x} dx
$$

and 

$$
\frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha) \Gamma(\beta)} = \frac{1}{B(\alpha, \beta)} \ ,
$$

$B$ being the  beta function.

The beta distribution is the [conjugate prior](conjugate-dist.ipynb) of the Bernoulli distribution, for which $\alpha$ and $\beta$ play the role of prior numbers of observations of $x=1$ and $x=0$, respectively. When $\alpha=\beta=1$, it reduces to a uniform distribution.
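The pdf is easy to evaluate with the gamma function from the standard library. In the sketch below, `beta_pdf` and the parameter values are illustrative assumptions; we check the normalisation numerically and the reduction to the uniform for $\alpha=\beta=1$.

```python
from math import gamma


def beta_pdf(x, alpha, beta):
    """Beta density with the normalisation written through gamma functions."""
    norm = gamma(alpha + beta) / (gamma(alpha) * gamma(beta))
    return norm * x ** (alpha - 1) * (1 - x) ** (beta - 1)


# Normalisation check with a midpoint Riemann sum on [0, 1]
alpha, beta = 2.0, 5.0  # illustrative shape parameters (assumptions)
steps = 10_000
dx = 1 / steps
total = sum(beta_pdf((i + 0.5) * dx, alpha, beta) * dx for i in range(steps))
print(total)  # should be close to 1

# With alpha = beta = 1 the density is flat, i.e. uniform on [0, 1]
flat = [beta_pdf(x, 1.0, 1.0) for x in (0.1, 0.5, 0.9)]
print(flat)
```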

## Student's t

*Student* was the pseudonym of W Gosset. 

This distribution arises when estimating the mean of a normally distributed population in situations where the sample size is small and the population standard deviation is unknown. Hence, it describes a sample extracted from said population: the larger the sample, the more the distribution resembles the normal.

$$
p(x; \nu) = \frac{\Gamma(\frac{\nu+1}{2})}{\sqrt{\nu \pi} \Gamma(\frac{\nu}{2})}  \Big(1 + \frac{x^2}{\nu}\Big)^{-\frac{\nu+1}{2}}
$$

$\nu$ is the number of degrees of freedom. For $\nu=1$, the distribution reduces to the  Cauchy distribution.
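Both limiting behaviours can be checked numerically. In this sketch the function names are illustrative, and the standard Cauchy density $\frac{1}{\pi(1+x^2)}$ and the standard normal density are taken as known results rather than derived in the text.

```python
from math import exp, gamma, pi, sqrt


def student_t_pdf(x, nu):
    """Student's t density with nu degrees of freedom."""
    norm = gamma((nu + 1) / 2) / (sqrt(nu * pi) * gamma(nu / 2))
    return norm * (1 + x ** 2 / nu) ** (-(nu + 1) / 2)


def cauchy_pdf(x):
    """Standard Cauchy density (assumed known, not derived here)."""
    return 1 / (pi * (1 + x ** 2))


def std_normal_pdf(x):
    """Standard normal density, i.e. N(0, 1)."""
    return exp(-x ** 2 / 2) / sqrt(2 * pi)


x = 0.7  # arbitrary evaluation point
print(student_t_pdf(x, 1), cauchy_pdf(x))        # nu = 1: the two coincide
print(student_t_pdf(x, 100), std_normal_pdf(x))  # large nu: close to the normal
```

Note that `math.gamma` overflows for large arguments, so this naive implementation is only meant for moderate values of $\nu$.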

## Chi-squared, $\chi^2$

It is the distribution (with $k$ degrees of freedom) of the sum of the squares of $k$ independent standardised normal variables $z_i$ (that is, normal variables standardised to have mean 0 and standard deviation 1). It is a special case of the gamma distribution. Given

$$
Q = \sum_{i=1}^k z_i^2 \ ,
$$

we have

$$
Q \sim \chi^2(k) \ ,
$$

so the distribution depends only on the number of degrees of freedom.
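The definition translates directly into a simulation: draw $k$ standard normal values, square them and add them up. The sketch below does this many times and compares the sample mean and variance with $k$ and $2k$, which are the known mean and variance of a $\chi^2(k)$ (a standard result not derived in this text). The seed, $k=4$ and the number of samples are arbitrary choices.

```python
import random


def sample_chi_squared(k, rng):
    """One draw of Q: the sum of squares of k independent N(0, 1) variables."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k))


rng = random.Random(0)  # fixed seed so that the run is reproducible
k = 4                   # degrees of freedom (illustrative)
samples = [sample_chi_squared(k, rng) for _ in range(50_000)]

sample_mean = sum(samples) / len(samples)
sample_var = sum((q - sample_mean) ** 2 for q in samples) / (len(samples) - 1)

print(sample_mean, sample_var)  # should be close to k and 2 * k
```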

## Poisson

It is a discrete probability distribution and describes the probability that a given number of events occurs in a fixed interval of time and/or space if they are known to occur with a certain (known) average rate and independently of the time and/or distance of the last event.

*An example*: the mail you receive per day. Suppose on average you receive 4 mails per day. Assuming that the events "mail arriving" are independent, it is reasonable to assume that the number of mails received each day follows a Poissonian.

*Another example*: the number of people in a queue at a given time of the day.

*Another example*: the number of goals scored in a World Cup match.

$$
P(k) = \frac{\lambda^k e^{-\lambda}}{k!} \ ,
$$

where $k = 0, 1, 2, \ldots$ is the number of events in an interval and $\lambda$ the average number of such events in the same interval.

The expected value is 

$$
\mathbb{E}[k] = \sum_{k \geq 0}  k \frac{\lambda^k e^{-\lambda}}{k!} = \sum_{k \geq 1} \lambda \frac{\lambda^{k-1}}{(k-1)!} e^{-\lambda} = \lambda e^{-\lambda} e^\lambda = \lambda
$$

and the variance is

$$
\begin{align*}
Var[k] &= \mathbb{E}[k^2] - \mathbb{E}^2[k] \\
         &= \sum_{k \geq 0} k^2 \frac{\lambda^k e^{-\lambda}}{k!} - \lambda^2 \\
         &= \lambda e^{-\lambda} \sum_{k \geq 1} k \frac{\lambda^{k-1}}{(k-1)!} - \lambda^2 \\
         &= \lambda e^{-\lambda} \Big[ \sum_{k \geq 1} (k-1) \frac{\lambda^{k-1}}{(k-1)!} + \sum_{k \geq 1} \frac{\lambda^{k-1}}{(k-1)!} \Big] - \lambda^2 \\
         &= \lambda e^{-\lambda} \Big[ \lambda \sum_{k \geq 2} \frac{1}{(k-2)!} \lambda^{k-2} + \sum_{k \geq 1} \frac{1}{(k-1)!} \lambda^{k-1} \Big] - \lambda^2 \\
         &= \lambda e^{-\lambda} \Big[ \lambda \sum_{i \geq 0} \frac{1}{i!}\lambda^i + \sum_{j \geq 0} \frac{1}{j!} \lambda^j \Big] - \lambda^2 \\
         &= \lambda e^{- \lambda} [\lambda e^\lambda  + e^\lambda] - \lambda^2 = \lambda^2 + \lambda - \lambda^2 = \lambda
\end{align*}
$$

So expected value and variance are the same and equal to the average rate of occurrence.
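We can confirm this numerically by truncating the sums at a value of $k$ large enough for the remaining tail to be negligible. This is a minimal sketch: `poisson_pmf`, the rate $\lambda = 4$ (as in the mail example) and the cutoff at $k = 60$ are illustrative assumptions.

```python
from math import exp, factorial


def poisson_pmf(k, lam):
    """P(k) = lam^k e^(-lam) / k! for k = 0, 1, 2, ..."""
    return lam ** k * exp(-lam) / factorial(k)


lam = 4.0        # average rate (illustrative)
ks = range(60)   # truncation point; the tail beyond it is negligible for lam = 4

total = sum(poisson_pmf(k, lam) for k in ks)
mean = sum(k * poisson_pmf(k, lam) for k in ks)
var = sum(k ** 2 * poisson_pmf(k, lam) for k in ks) - mean ** 2

print(total, mean, var)  # should be close to 1, lam and lam
```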

The Poisson distribution is appropriate if 

* the events are independent, _i.e._, the occurrence of one of them does not affect the probability that a second one occurs;
* the rate at which events occur is constant;
* two events cannot occur at the same time;
* the probability of an occurrence of an event in an interval is proportional to the length of the interval.

**Example**

Knowing from historical data that the average number of goals scored in a World Cup football match is 2.5, and assuming the phenomenon can be described by a Poissonian, we have

$$
P(k \text{ goals in a match}) = \frac{2.5^k e^{-2.5}}{k!} \ ,
$$

and we can calculate the expected value and the variance as above.
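For instance, the formula gives the probability of each scoreline total directly. The sketch below tabulates the first few values for $\lambda = 2.5$; the function name and the derived quantities (`most_likely`, `p_at_least_three`) are illustrative choices, not part of the text.

```python
from math import exp, factorial


def poisson_pmf(k, lam):
    """P(k) = lam^k e^(-lam) / k! for k = 0, 1, 2, ..."""
    return lam ** k * exp(-lam) / factorial(k)


lam = 2.5  # average number of goals per match, as given in the text

# Probability of observing 0, 1, ..., 6 goals in a match
probs = {k: poisson_pmf(k, lam) for k in range(7)}
for k, p in probs.items():
    print(k, round(p, 4))

# Most likely number of goals among those tabulated, and P(at least 3 goals)
most_likely = max(probs, key=probs.get)
p_at_least_three = 1 - sum(probs[k] for k in range(3))
print(most_likely, p_at_least_three)
```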

An example of a phenomenon which violates the Poissonian assumptions would be the number of students arriving at the student union: the rate is not constant (as it is low during class time and high between class times) and events are co-occurring (students tend to come in groups).

## Dirichlet

It is a continuous multivariate distribution and the generalisation of the beta distribution, typically denoted as $\text{Dir}(\alpha)$, $\alpha$ being the parametrising vector such that $\alpha = (\alpha_i), \alpha_i \in \mathbb{R}, \alpha_i > 0 \ \forall i$. It is commonly used as a prior in Bayesian statistics as it is the conjugate prior of the multinomial distribution.

A Dirichlet distribution of order $k \geq 2$ with parameters $\alpha_i$ has the probability density function

$$
f(x_1, \ldots, x_k; \alpha_1, \ldots, \alpha_k) = \frac{1}{B(\alpha)} \prod_{i=1}^k x_i^{\alpha_i - 1}
$$

with $B$ being the multivariate beta function, $B(\alpha) = \frac{\prod_{i=1}^k \Gamma(\alpha_i)}{\Gamma\big(\sum_{i=1}^k \alpha_i\big)}$, and $x$ living on the open $(k-1)$-dimensional simplex: $x_1, \ldots, x_k > 0$, $x_1 + \ldots + x_{k-1} < 1$, $x_k = 1 - x_1 - \ldots - x_{k-1}$.
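As a small illustration, the density can be evaluated using the standard expression of the multivariate beta function through gamma functions, $B(\alpha) = \prod_i \Gamma(\alpha_i) / \Gamma(\sum_i \alpha_i)$. The function name `dirichlet_pdf` and the example values are assumptions; the sketch does not check that the point actually lies on the simplex.

```python
from math import gamma, prod


def dirichlet_pdf(xs, alphas):
    """Dirichlet density at xs (assumed to lie on the simplex) with parameters alphas."""
    # Multivariate beta function written through gamma functions
    b = prod(gamma(a) for a in alphas) / gamma(sum(alphas))
    return prod(x ** (a - 1) for x, a in zip(xs, alphas)) / b


# Order k = 2: the density coincides with a beta pdf with alpha = 2, beta = 5
two_state = dirichlet_pdf([0.3, 0.7], [2.0, 5.0])
beta_value = gamma(7.0) / (gamma(2.0) * gamma(5.0)) * 0.3 ** 1 * 0.7 ** 4
print(two_state, beta_value)

# All alphas equal to 1: the density is constant over the simplex
flat = dirichlet_pdf([0.2, 0.3, 0.5], [1.0, 1.0, 1.0])
print(flat)
```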