# The Bernoulli family of distributions

A **Bernoulli trial** is a random experiment with exactly two possible outcomes, conventionally labeled "success" and "failure". The probability of success is denoted by $p$.

## [Binomial](https://en.wikipedia.org/wiki/Binomial_distribution)
$$
\newcommand{\given}{\;\lvert\;}
p(y \given n, p) = \binom{n}{y} p^y (1 - p)^{n - y}
$$
* A discrete distribution on the number of successes $y \in \{0, 1, \ldots, n\}$ in $n$ Bernoulli trials, where each trial has probability $p$ of success.
* Parameterized by $n$ (the number of trials), and $p$ (the probability of success of each trial).
* The conjugate prior on $p$ is Beta$(\alpha, \beta)$. Given new data of $y$ successes in $n$ trials, the parameters update by
$$\begin{align}
\alpha &\mapsto \alpha + y \\
\beta &\mapsto \beta + n - y
\end{align}$$
* The prior predictive distribution is
$$
p(y \given n, \alpha, \beta) = \binom{n}{y} \frac{\prod_{i=0}^{y-1} (\alpha + i) \prod_{j=0}^{n-y-1} (\beta + j)}
   {\prod_{k=0}^{n - 1} (\alpha + \beta + k) }
$$
which in the simple case of $n = 1, y = 1$ reduces to
$$
p( y=1 \given n=1, \alpha, \beta) = \frac{\alpha}{\alpha + \beta}
$$
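
The update rule and the product form of the prior predictive can be checked numerically. The following is a minimal sketch using only the Python standard library; the function names and the prior values are illustrative assumptions, not part of any library.

```python
from math import comb, prod


def beta_update(alpha, beta, y, n):
    """Posterior Beta parameters after observing y successes in n trials."""
    return alpha + y, beta + n - y


def prior_predictive(y, n, alpha, beta):
    """Probability of y successes in n trials, using the product form above."""
    numerator = prod(alpha + i for i in range(y)) * prod(beta + j for j in range(n - y))
    denominator = prod(alpha + beta + k for k in range(n))
    return comb(n, y) * numerator / denominator


alpha, beta = 2.0, 3.0  # illustrative prior, chosen arbitrarily

print(beta_update(alpha, beta, y=7, n=10))  # (9.0, 6.0)
print(prior_predictive(1, 1, alpha, beta))  # 0.4, i.e. alpha / (alpha + beta)

# The predictive probabilities over y = 0..n should sum to (approximately) 1.
print(sum(prior_predictive(y, 5, alpha, beta) for y in range(6)))
```

Empty products evaluate to 1 in `math.prod`, which is what makes the $y = 0$ and $y = n$ cases work without special handling.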

## [Geometric](https://en.wikipedia.org/wiki/Geometric_distribution)
$$
p(y \given p) = p(1 - p)^{y - 1}
$$
* A discrete distribution on the number of trials $y \in \{1, 2, \ldots\}$ required for the first success.
* Parameterized by $p$ (the probability of success of each trial).

## [Negative Binomial](https://en.wikipedia.org/wiki/Negative_binomial_distribution)
$$
p(y \given p, r)= \binom{y - 1}{r - 1} p^{r} (1 - p)^{y - r}
$$
* A discrete distribution on the number of trials $y \in \{r, r + 1, \ldots\}$ required for $r$ successes.
* Parameterized by $p$ and $r$ (the number of required successes), though note this is well defined for non-integer $r > 0$, by using $\Gamma$ functions instead of the binomial coefficient.
* The $r=1$ case is the geometric distribution.
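
As a quick sanity check of the $r = 1$ reduction, the two mass functions can be compared directly. This is a sketch for integer $r$ only, with both distributions counting trials (so the first possible value is $y = r$); the function names and the value of `p` are illustrative assumptions.

```python
from math import comb


def geometric_pmf(y, p):
    """Probability that the first success occurs on trial y (y = 1, 2, ...)."""
    return p * (1 - p) ** (y - 1)


def negative_binomial_pmf(y, p, r):
    """Probability that the r-th success occurs on trial y (y = r, r + 1, ...)."""
    return comb(y - 1, r - 1) * p**r * (1 - p) ** (y - r)


p = 0.3  # illustrative success probability

# With r = 1 the binomial coefficient is 1 and the two formulas coincide.
for y in range(1, 6):
    assert abs(negative_binomial_pmf(y, p, r=1) - geometric_pmf(y, p)) < 1e-15

# A truncated sum over the support y >= r should be close to 1.
print(sum(negative_binomial_pmf(y, p, r=3) for y in range(3, 400)))
```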

## [Beta](https://en.wikipedia.org/wiki/Beta_distribution)
$$
p(p \given \alpha, \beta) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha) \Gamma(\beta)} p^{\alpha - 1} (1 - p)^{\beta - 1}
$$
* A continuous distribution on the parameter $p \in [0, 1]$ of Bernoulli trials.
* Parameterized by $\alpha, \beta > 0$. 

## [Multinomial](https://en.wikipedia.org/wiki/Multinomial_distribution)
$$
p(y_1, \ldots, y_k \given p_1, \dots, p_k) = \left(\sum_{i=1}^k y_i\right)! \prod_{i=1}^{k} \frac{p_i^{y_i}}{y_i!}
$$
* Related to an extension of Bernoulli trials, where every trial has $k$ possible outcomes.
* A discrete-vector distribution on the counts $(y_1, \ldots, y_k)$ of trials that fall into the respective bins.
* Parameterized by $p_1, \dots, p_k$, the probability of each bin (these must sum to 1).
* Specializes to the binomial distribution in the case of exactly two bins.
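
The two-bin special case can be verified numerically. The sketch below implements the multinomial mass function as written above and compares it with the binomial formula; the function names and the values of `n` and `p` are illustrative assumptions.

```python
from math import comb, factorial, prod


def multinomial_pmf(counts, probs):
    """Multinomial probability of the given bin counts; probs must sum to 1."""
    n = sum(counts)
    return factorial(n) * prod(p**y / factorial(y) for p, y in zip(probs, counts))


def binomial_pmf(y, n, p):
    """Binomial probability of y successes in n trials."""
    return comb(n, y) * p**y * (1 - p) ** (n - y)


n, p = 6, 0.25  # illustrative values

# With two bins (success, failure) the multinomial matches the binomial.
for y in range(n + 1):
    assert abs(multinomial_pmf([y, n - y], [p, 1 - p]) - binomial_pmf(y, n, p)) < 1e-12
```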

## [Dirichlet](https://en.wikipedia.org/wiki/Dirichlet_distribution)
$$
p(p_1, \dots, p_k \given \alpha_1, \dots, \alpha_k) = \Gamma\left(\sum_{i=1}^k \alpha_i \right) \prod_{i=1}^k 
                                    \frac{p_i^{\alpha_i - 1}}{\Gamma(\alpha_i)}
$$
* A continuous vector distribution on parameters $p_1, \dots, p_k$ of multinomial trials.
* Parameterized by $\alpha_1, \dots, \alpha_k > 0$.
* The conjugate prior to the multinomial distribution. Given observed counts $(y_1, \ldots, y_k)$, the update is analogous to that of the beta distribution: $\alpha_i \mapsto \alpha_i + y_i$.
* Reduces to the beta distribution when $k = 2$.
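
The conjugate update adds each observed bin count to the matching parameter, $\alpha_i \mapsto \alpha_i + y_i$. A minimal sketch follows; the function names and the example prior and counts are illustrative assumptions, and the mean formula $\alpha_i / \sum_j \alpha_j$ is the standard Dirichlet mean rather than something stated above.

```python
def dirichlet_update(alphas, counts):
    """Posterior Dirichlet parameters after observing multinomial counts."""
    return [a + y for a, y in zip(alphas, counts)]


def dirichlet_mean(alphas):
    """Mean of each p_i under Dirichlet(alphas)."""
    total = sum(alphas)
    return [a / total for a in alphas]


prior = [1.0, 1.0, 1.0]  # illustrative uniform prior over three bins
posterior = dirichlet_update(prior, [5, 2, 0])

print(posterior)                  # [6.0, 3.0, 1.0]
print(dirichlet_mean(posterior))  # [0.6, 0.3, 0.1]
```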


# The Poisson family of distributions
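A **Poisson process** is one in which events occur independently of one another at a constant average rate, so that the number of events in a time interval depends only on the length of that interval.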

## [Poisson](https://en.wikipedia.org/wiki/Poisson_distribution)
$$
p(y \given \lambda) = \frac{\lambda^y e^{-\lambda}}{y!}
$$
* A discrete distribution on the count $y \in \mathbb{Z}_{\ge 0}$ of events occurring in a unit interval.
* Parameterized by $\lambda > 0$, the rate at which events occur (in expectation).
* The conjugate prior on $\lambda$ is Gamma$(\alpha, \beta)$. Given new data of $y$ events in total across $n$ unit intervals, the parameters update by
$$\begin{align}
\alpha &\mapsto \alpha + y \\
\beta &\mapsto \beta + n
\end{align}$$
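* The prior predictive distribution is negative binomial, here in the form that counts the failures $y$ before the $\alpha$-th success (rather than the total number of trials, as in the negative binomial section above), with success probability $\frac{\beta}{\beta + 1}$:
$$
p(y \given \alpha, \beta) = \frac{\Gamma(\alpha + y)}{\Gamma(\alpha) \, y!} \left(\frac{\beta}{\beta + 1}\right)^{\alpha} \left(\frac{1}{\beta + 1}\right)^{y}
$$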
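
The Gamma update and the negative binomial predictive can be sketched in a few lines. This assumes the negative binomial is taken in the form that counts failures, with $r = \alpha$ and success probability $\frac{\beta}{\beta + 1}$, which is what integrating the Poisson likelihood against the Gamma prior yields. Function names and prior values are illustrative.

```python
from math import exp, lgamma, log


def gamma_poisson_update(alpha, beta, y, n):
    """Posterior Gamma parameters after y total events in n unit intervals."""
    return alpha + y, beta + n


def prior_predictive(y, alpha, beta):
    """Negative binomial predictive probability of y events in one unit interval."""
    log_pmf = (
        lgamma(alpha + y) - lgamma(alpha) - lgamma(y + 1)
        + alpha * log(beta / (beta + 1))
        - y * log(beta + 1)
    )
    return exp(log_pmf)


alpha, beta = 3.0, 2.0  # illustrative prior, chosen arbitrarily

print(gamma_poisson_update(alpha, beta, y=12, n=4))  # (15.0, 6.0)

# Truncated sums: total mass should be close to 1, the mean close to alpha / beta.
ys = range(200)
total = sum(prior_predictive(y, alpha, beta) for y in ys)
mean = sum(y * prior_predictive(y, alpha, beta) for y in ys)
print(total, mean)
```

Working in log space with `lgamma` keeps the computation stable for non-integer $\alpha$ and for large counts.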

## [Exponential](https://en.wikipedia.org/wiki/Exponential_distribution)
$$
p(t \given \lambda) = \lambda e^{-\lambda t}
$$
* A continuous distribution on the time $t \ge 0$ between consecutive events of a Poisson process.
* Parameterized by $\lambda > 0$, the rate at which events occur (in expectation).
* The conjugate prior on $\lambda$ is Gamma$(\alpha, \beta)$. Given new data of $n$ events in total time $t$, the parameters update by
$$\begin{align}
\alpha &\mapsto \alpha + n \\
\beta &\mapsto \beta + t
\end{align}$$
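
A short derivation may help show where this update comes from. Assume the data are $n$ independent waiting times $t_1, \ldots, t_n$ with total $t = \sum_{i=1}^n t_i$ (the symbols $t_i$ are introduced here only for illustration). Multiplying the likelihood by the Gamma prior gives
$$\begin{align}
p(\lambda \given t_1, \ldots, t_n, \alpha, \beta) &\propto \left(\prod_{i=1}^n \lambda e^{-\lambda t_i}\right) \lambda^{\alpha - 1} e^{-\beta \lambda} \\
&= \lambda^{\alpha + n - 1} e^{-(\beta + t) \lambda}
\end{align}$$
which is the kernel of Gamma$(\alpha + n, \beta + t)$.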

## [Gamma](https://en.wikipedia.org/wiki/Gamma_distribution)
$$
p(t \given \alpha, \beta) = \frac{\beta^\alpha}{\Gamma(\alpha)}t^{\alpha - 1} e^{-\beta t}
$$
* A continuous distribution on the time $t \ge 0$ it takes for $\alpha$ events to occur (an interpretation that applies when $\alpha$ is a positive integer).
* Parameterized by $\alpha > 0$ and $\beta > 0$ (the rate of events, i.e., $\lambda$).
* Another common parameterization uses the shape $k = \alpha > 0$ and the scale $\theta = \frac{1}{\beta} > 0$.
* Gamma(1, $\lambda$) = Exponential($\lambda$)
* Gamma$\left(\frac{\nu}{2}, \frac{1}{2}\right) = \chi^2(\nu)$
* If $x \sim $ Gamma$(\alpha, \beta)$, then $\frac{1}{x} \sim$ Inv-Gamma$(\alpha, \beta)$.
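
The parameterization and special-case identities above can be spot-checked by evaluating the densities at a few points. This is a sketch with the exponential, chi-squared, and shape/scale densities written out independently; the function names and test values are illustrative assumptions.

```python
from math import exp, gamma


def gamma_pdf(t, alpha, beta):
    """Gamma density in the shape/rate parameterization used above."""
    return beta**alpha / gamma(alpha) * t ** (alpha - 1) * exp(-beta * t)


def gamma_pdf_scale(t, k, theta):
    """Gamma density in the shape/scale parameterization."""
    return t ** (k - 1) * exp(-t / theta) / (gamma(k) * theta**k)


def exponential_pdf(t, lam):
    """Exponential density with rate lam."""
    return lam * exp(-lam * t)


def chi2_pdf(x, nu):
    """Chi-squared density with nu degrees of freedom."""
    return x ** (nu / 2 - 1) * exp(-x / 2) / (2 ** (nu / 2) * gamma(nu / 2))


for t in (0.1, 1.0, 2.5):
    # Rate beta corresponds to scale theta = 1 / beta.
    assert abs(gamma_pdf(t, 2.0, 4.0) - gamma_pdf_scale(t, 2.0, 0.25)) < 1e-12
    # Gamma(1, lambda) = Exponential(lambda)
    assert abs(gamma_pdf(t, 1, 0.7) - exponential_pdf(t, 0.7)) < 1e-12
    # Gamma(nu / 2, 1 / 2) = chi-squared(nu), here with nu = 3
    assert abs(gamma_pdf(t, 3 / 2, 1 / 2) - chi2_pdf(t, 3)) < 1e-12
```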

## [Inverse Gamma](https://en.wikipedia.org/wiki/Inverse-gamma_distribution)
$$
p(y \given \alpha, \beta) = \frac{\beta^\alpha}{\Gamma(\alpha)}y^{-\alpha - 1} e^{-\frac{\beta}{y}}
$$
* If $x \sim $ Gamma$(\alpha, \beta)$, then $\frac{1}{x} \sim$ Inv-Gamma$(\alpha, \beta)$.
* Inv-Gamma$\left(\alpha, \frac{1}{2}\right)$ = Inv-$\chi^2 (2 \alpha)$
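
The relationship between the Gamma and inverse Gamma densities is a change of variables: with $x = \frac{1}{y}$, the Jacobian factor is $\left|\frac{dx}{dy}\right| = \frac{1}{y^2}$. The sketch below checks this numerically at a few points; the function names and parameter values are illustrative assumptions.

```python
from math import exp, gamma


def gamma_pdf(x, alpha, beta):
    """Gamma density in the shape/rate parameterization."""
    return beta**alpha / gamma(alpha) * x ** (alpha - 1) * exp(-beta * x)


def inv_gamma_pdf(y, alpha, beta):
    """Inverse Gamma density as written above."""
    return beta**alpha / gamma(alpha) * y ** (-alpha - 1) * exp(-beta / y)


alpha, beta = 2.5, 1.5  # illustrative values

for y in (0.2, 1.0, 4.0):
    # Density of y = 1 / x equals the Gamma density at 1 / y times 1 / y**2.
    transformed = gamma_pdf(1 / y, alpha, beta) / y**2
    assert abs(inv_gamma_pdf(y, alpha, beta) - transformed) < 1e-12
```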