![QMUL](Images/QMUL-logo.jpg)

# Introduction to Approximate Bayesian Computation in population genetics

## Prior distributions

### Intended Learning Outcomes

At the end of this part you will be able to:
* describe the pros and cons of using different priors (e.g. elicited, conjugate, ...),
* evaluate the interplay between prior and posterior distributions,
* calculate several quantities of interest from posterior distributions.

Prior distributions can
* be derived from past information or personal opinions from experts;
* be distributed as familiar distribution functions;
* bear little information.

### Elicited priors

The simplest approach to specifying $\pi(\theta)$ is to list the values of $\theta$ that are considered possible.

One can then assign a probability to each of these values, making sure that the probabilities sum to $1$.

If $\theta$ is discrete, this looks like a natural approach.
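As a minimal sketch of the discrete case, the snippet below turns a set of elicited weights into a proper prior by rescaling them to sum to $1$. The parameter values, the weights, and the helper name `normalise` are hypothetical and chosen only for illustration.

```python
# Hypothetical elicited weights for a discrete parameter theta.
# Keys are the possible values of theta; values are relative beliefs
# on an arbitrary scale (illustrative numbers only).
raw_weights = {0: 2.0, 1: 5.0, 2: 2.0, 3: 1.0}


def normalise(weights):
    """Rescale non-negative weights so that they sum to 1."""
    if any(w < 0 for w in weights.values()):
        raise ValueError("weights must be non-negative")
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return {theta: w / total for theta, w in weights.items()}


prior = normalise(raw_weights)
for theta, prob in prior.items():
    print(f"pi(theta={theta}) = {prob:.2f}")
```

Working with relative weights first and normalising afterwards is one convenient option, since the expert only needs to judge how plausible each value is compared with the others.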

Alternatively, we may assume that the prior distribution for $\theta$ belongs to a parametric distributional family $\pi(\theta|\nu)$.

Here we choose $\nu$ so that $\pi(\theta|\nu)$ closely matches our elicited beliefs.

This approach has several advantages:

* it reduces the effort required of the elicitee (there is no need to decide on a probability for each value that $\theta$ can take);
* it overcomes the finite support problem (which arises, for instance, when the prior is specified as a histogram);
* it may lead to simplifications in the computation of the posterior (as we will see later on).

As a rule of thumb, for elicited priors, it is recommended to focus on quantiles close to the middle of the distribution (e.g. the $50^{th}$, $25^{th}$ and $75^{th}$) rather than extreme quantiles (e.g. the $95^{th}$ and $5^{th}$).
You should also assess the symmetry of your prior.
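The quantile-matching idea can be sketched as follows, assuming (purely for illustration) that the parametric family is Normal, so that $\nu = (\mu, \sigma)$. The elicited median fixes $\mu$ and the interquartile range fixes $\sigma$. The function names and the elicited values are hypothetical.

```python
from statistics import NormalDist


def normal_from_quartiles(q25, q50, q75):
    """Return (mu, sigma) of a Normal prior matched to elicited quartiles.

    The median sets mu; the interquartile range sets sigma.
    """
    if not q25 < q50 < q75:
        raise ValueError("quartiles must be strictly increasing")
    z75 = NormalDist().inv_cdf(0.75)  # 75th percentile of the standard Normal
    mu = q50
    sigma = (q75 - q25) / (2 * z75)
    return mu, sigma


def asymmetry(q25, q50, q75):
    """Ratio of the upper to the lower half of the interquartile range.

    A value close to 1 suggests a symmetric prior is a reasonable choice.
    """
    return (q75 - q50) / (q50 - q25)


# Hypothetical elicited quartiles for theta (illustrative numbers only).
q25, q50, q75 = 8.0, 10.0, 12.0
mu, sigma = normal_from_quartiles(q25, q50, q75)
prior = NormalDist(mu, sigma)

print(f"mu = {mu:.3f}, sigma = {sigma:.3f}")
print(f"fitted quartiles: {prior.inv_cdf(0.25):.3f}, {prior.inv_cdf(0.75):.3f}")
print(f"asymmetry ratio: {asymmetry(q25, q50, q75):.3f}")
```

If the asymmetry ratio is far from $1$, a symmetric family such as the Normal is probably a poor match for the elicited beliefs and a skewed family may be preferable.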

Elicited priors can be updated and reassessed as new information becomes available.

They are very useful for experimental design, where some knowledge of the nature of the studied system is provided as input.

### Conjugate priors

When choosing a prior distribution $\pi(\theta|\nu)$, some distributional families make the calculation of the posterior distribution more convenient than others.

It is possible to select a member of that family that is _conjugate_ with the likelihood $f(y|\theta)$, so that the posterior distribution $p(\theta|y)$ belongs to the same distributional family as the prior.
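A standard example of conjugacy, used here only as an illustration, is a Beta prior combined with a Binomial likelihood: if $\theta \sim \text{Beta}(a, b)$ and we observe $k$ successes in $n$ trials, the posterior is $\text{Beta}(a + k, b + n - k)$. The sketch below assumes this pairing; $\theta$ could be, for instance, an allele frequency, and the counts are hypothetical.

```python
def beta_binomial_update(a, b, k, n):
    """Posterior Beta parameters after observing k successes in n trials.

    Prior: theta ~ Beta(a, b). Likelihood: k ~ Binomial(n, theta).
    Posterior: theta | k ~ Beta(a + k, b + n - k).
    """
    if a <= 0 or b <= 0:
        raise ValueError("a and b must be positive")
    if not 0 <= k <= n:
        raise ValueError("k must satisfy 0 <= k <= n")
    return a + k, b + (n - k)


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


# Hypothetical data: 7 copies of an allele among 20 sampled chromosomes,
# with a Beta(1, 1) prior (uniform on [0, 1]).
a_prior, b_prior = 1.0, 1.0
k, n = 7, 20

a_post, b_post = beta_binomial_update(a_prior, b_prior, k, n)
print(f"posterior: Beta({a_post}, {b_post})")
print(f"prior mean = {beta_mean(a_prior, b_prior):.3f}")
print(f"posterior mean = {beta_mean(a_post, b_post):.3f}")
```

No integration is needed: because the posterior stays in the Beta family, updating reduces to adding the observed counts to the prior parameters.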

### Noninformative priors

If no reliable prior information on ${\theta}$ is available, can we still employ a Bayesian approach?

A Bayesian approach is still appropriate if we can find a distribution $\pi({\theta})$ that contains "no information" about ${\theta}$, in the sense that it does not favour one value over another.

We refer to such a distribution as a _noninformative prior_ for ${\theta}$.

All the information in the posterior will arise from the data.
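To illustrate, the sketch below assumes a Binomial likelihood and takes a flat prior over a grid of values in $[0, 1]$ as the noninformative choice (one possible choice among others). With such a prior the posterior is proportional to the likelihood, so its mode coincides with the maximum likelihood estimate $k/n$. The data and grid resolution are hypothetical.

```python
from math import comb


def binomial_likelihood(theta, k, n):
    """Probability of k successes in n trials given success probability theta."""
    return comb(n, k) * theta**k * (1 - theta) ** (n - k)


# Hypothetical data (illustrative numbers only).
k, n = 7, 20

# Grid of candidate values for theta and a flat prior over that grid.
grid = [i / 100 for i in range(101)]
prior = [1.0 / len(grid)] * len(grid)

# Posterior is proportional to likelihood times prior, then normalised.
unnormalised = [binomial_likelihood(t, k, n) * p for t, p in zip(grid, prior)]
evidence = sum(unnormalised)
posterior = [u / evidence for u in unnormalised]

# Index of the largest posterior probability gives the posterior mode.
mode_index = max(range(len(grid)), key=posterior.__getitem__)
mode = grid[mode_index]

print(f"posterior mode = {mode:.2f}")
print(f"maximum likelihood estimate k/n = {k / n:.2f}")
```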

### Hierarchical modelling

A posterior distribution is typically obtained in two stages: one for $f({y}|{\theta})$, the likelihood of the data, and one for $\pi({\theta}|{\nu})$, the prior distribution of ${\theta}$ given a vector of _hyperparameters_ ${\nu}$.

If we are uncertain about the values of ${\nu}$, we need an additional stage, a _hyperprior_, which defines the probability distribution of the hyperparameters.

If we denote this distribution as $h({\nu})$, then the posterior distribution is
\begin{equation}
p({\theta}|{y}) = \frac{ \int f({y}|{\theta})\pi({\theta}|{\nu})
               h({\nu})d{\nu} }{ \int \int f({y}|{\theta})\pi({\theta}|
              {\nu})h({\nu})d{\nu}d{\theta} }.
\end{equation}
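The hierarchical posterior can be approximated numerically on a grid. The sketch below is only an illustration under several assumptions that are not part of the text above: a Binomial likelihood, a symmetric $\text{Beta}(\nu, \nu)$ prior for $\theta$, and a discrete hyperprior $h(\nu)$ on four hypothetical values, so that the integral over $\nu$ becomes a sum.

```python
from math import comb, gamma


def beta_pdf(theta, a, b):
    """Density of Beta(a, b) evaluated at theta in (0, 1)."""
    norm = gamma(a + b) / (gamma(a) * gamma(b))
    return norm * theta ** (a - 1) * (1 - theta) ** (b - 1)


def binomial_likelihood(theta, k, n):
    """f(y | theta): probability of k successes in n trials."""
    return comb(n, k) * theta**k * (1 - theta) ** (n - k)


# Hypothetical discrete hyperprior h(nu); probabilities sum to 1.
hyperprior = {0.5: 0.25, 1.0: 0.25, 2.0: 0.25, 5.0: 0.25}

# Hypothetical data (illustrative numbers only).
k, n = 7, 20

# Midpoint grid over theta; midpoints avoid evaluating at exactly 0 or 1.
m = 200
width = 1.0 / m
grid = [(i + 0.5) * width for i in range(m)]


def numerator(theta):
    """f(y | theta) times the sum over nu of pi(theta | nu) h(nu)."""
    marginal_prior = sum(
        beta_pdf(theta, nu, nu) * h for nu, h in hyperprior.items()
    )
    return binomial_likelihood(theta, k, n) * marginal_prior


unnormalised = [numerator(t) for t in grid]
evidence = sum(unnormalised) * width           # denominator of the posterior
posterior = [u / evidence for u in unnormalised]  # posterior density on the grid

posterior_mean = sum(t * p for t, p in zip(grid, posterior)) * width
print(f"posterior mean of theta = {posterior_mean:.3f}")
```

The numerator marginalises the prior over $\nu$ before multiplying by the likelihood, and the denominator is the same quantity integrated over $\theta$, mirroring the two integrals in the equation.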