In [1]:
%run ../../common/import_all.py

from common.setup_notebook import set_css_style, setup_matplotlib, config_ipython
config_ipython()
setup_matplotlib()
set_css_style()

# The Maximum a Posteriori Estimation

## What it is and how it works

The Maximum a Posteriori (MAP) estimation method uses the [mode](../distribution-measures/quantiles-mode.ipynb#Mode) of the posterior distribution to estimate the unknown population parameters.

From [Bayes' theorem](../distribution-measures/bayes.ipynb), the posterior is expressed as 

$$
P(\theta \ | \ x) = \frac{P(x \ | \ \theta) P(\theta)}{\int d \theta' P(x \ | \ \theta') P(\theta')} \ ,
$$

with $\theta$ being the parameters of the statistical model and $x$ the observed data. The MAP method estimates $\theta$ as the value that maximises the posterior. Since the denominator does not depend on $\theta$ (it is just a normalisation factor), it can be dropped from the maximisation:

$$
\hat{\theta}_{MAP}(x) = \arg \max_\theta P(\theta \ | \ x) = \arg \max_\theta P(x \ | \ \theta) P(\theta) \ .
$$

This is exactly the mode of the posterior distribution.

With a uniform prior, the MAP estimate coincides with the [ML estimate](mle.ipynb): the prior is constant in $\theta$, so maximising the posterior amounts to maximising the likelihood. For the computation, [conjugate priors](../distribution-measures/conjugate-dist.ipynb) are particularly handy, since the posterior then belongs to the same family as the prior and its mode is often available in closed form.
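As an illustration of both points, here is a minimal sketch for a Bernoulli likelihood with a Beta prior, a standard conjugate pair. The function names and the data (7 successes in 10 trials) are made up for the example. With the uniform prior Beta(1, 1) the MAP estimate should coincide with the ML one, while a prior concentrated around 0.5 pulls the estimate towards 0.5.

```python
def bernoulli_mle(successes, trials):
    """Maximum likelihood estimate of the success probability."""
    return successes / trials


def beta_bernoulli_map(successes, trials, alpha, beta):
    """MAP estimate of the success probability under a Beta(alpha, beta) prior.

    By conjugacy the posterior is Beta(alpha + successes, beta + failures);
    its mode has the closed form below when both posterior parameters exceed 1.
    """
    a_post = alpha + successes
    b_post = beta + (trials - successes)
    if a_post <= 1 or b_post <= 1:
        raise ValueError("the posterior has no interior mode for these values")
    return (a_post - 1) / (a_post + b_post - 2)


# Hypothetical data, chosen only for illustration
successes, trials = 7, 10

mle = bernoulli_mle(successes, trials)
map_uniform = beta_bernoulli_map(successes, trials, alpha=1, beta=1)      # uniform prior
map_informative = beta_bernoulli_map(successes, trials, alpha=5, beta=5)  # prior peaked at 0.5

print("ML estimate:               ", mle)
print("MAP, uniform prior:        ", map_uniform)
print("MAP, Beta(5, 5) prior:     ", map_informative)
```

The Beta(5, 5) prior behaves like a handful of extra observations centred on 0.5, which is why the corresponding MAP estimate lands between 0.5 and the ML estimate.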

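As in the case of the MLE, in practice we maximise the logarithm of the posterior rather than the posterior itself. The logarithm is monotonically increasing, so the maximiser is the same, and the term $\log P(x)$ coming from the denominator can again be dropped because it does not depend on $\theta$: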

$$
\hat{\theta}_{MAP}(x) = \arg \max_{\theta} \log P(\theta \ | \ x) = \arg \max_{\theta} [\log P(x \ | \ \theta) + \log P(\theta)] \ .
$$
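When no closed form is available, the sum of log-likelihood and log-prior can be maximised numerically. The following is a rough sketch that does so with a plain grid search, assuming a Bernoulli likelihood and a Beta prior; the function names, the data and the grid resolution are all illustrative choices, and a proper optimiser would normally replace the grid in real use.

```python
import math


def log_posterior_unnormalised(theta, successes, trials, alpha, beta):
    """log P(x | theta) + log P(theta), dropping terms that do not depend on theta."""
    failures = trials - successes
    log_likelihood = successes * math.log(theta) + failures * math.log(1 - theta)
    log_prior = (alpha - 1) * math.log(theta) + (beta - 1) * math.log(1 - theta)
    return log_likelihood + log_prior


def grid_map(successes, trials, alpha, beta, n_points=9999):
    """Approximate the MAP estimate by a grid search over the open interval (0, 1)."""
    grid = [(i + 1) / (n_points + 1) for i in range(n_points)]
    return max(
        grid,
        key=lambda t: log_posterior_unnormalised(t, successes, trials, alpha, beta),
    )


# Hypothetical data: 7 successes in 10 trials, Beta(5, 5) prior
theta_hat = grid_map(successes=7, trials=10, alpha=5, beta=5)

# Closed-form posterior mode for this conjugate pair, used as a check
theta_exact = (5 + 7 - 1) / (5 + 5 + 10 - 2)

print("grid search:", theta_hat)
print("closed form:", theta_exact)
```

Working with sums of logarithms rather than products of probabilities also avoids numerical underflow when the number of observations is large.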

## MAP and ML

In the last equation, if we only had the first term to maximise, we would be doing an ML estimation. The second term accounts for the presence of a prior. This is why the MAP method can be seen as a regularised ML: the log-prior acts as a penalty term that factors prior knowledge into the computation.
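To make the regularisation reading concrete, here is a small sketch for the mean of Gaussian data with known standard deviation and a Gaussian prior on the mean. The model, the function names and the numbers are assumptions made for the example. In this setting the log-prior is a quadratic penalty on the distance from the prior mean, and the maximiser is a precision-weighted average of the sample mean and the prior mean.

```python
def gaussian_mean_mle(data):
    """ML estimate of the mean: the sample average."""
    return sum(data) / len(data)


def gaussian_mean_map(data, sigma, prior_mean, prior_std):
    """MAP estimate of the mean of N(mu, sigma^2) data under a N(prior_mean, prior_std^2) prior.

    Maximises  -sum((x - mu)^2) / (2 sigma^2)  -  (mu - prior_mean)^2 / (2 prior_std^2),
    i.e. the log-likelihood plus a quadratic penalty coming from the log-prior.
    """
    data_precision = len(data) / sigma**2
    prior_precision = 1 / prior_std**2
    weighted_sum = sum(data) / sigma**2 + prior_mean * prior_precision
    return weighted_sum / (data_precision + prior_precision)


# Hypothetical observations, chosen only for illustration
data = [2.1, 1.9, 2.4, 2.0]

mle = gaussian_mean_mle(data)
map_tight = gaussian_mean_map(data, sigma=1.0, prior_mean=0.0, prior_std=0.5)   # strong prior
map_wide = gaussian_mean_map(data, sigma=1.0, prior_mean=0.0, prior_std=100.0)  # weak prior

print("ML estimate:      ", mle)
print("MAP, tight prior: ", map_tight)
print("MAP, wide prior:  ", map_wide)
```

A tight prior shrinks the estimate strongly towards the prior mean, whereas a very wide prior leaves it almost equal to the ML estimate, which mirrors the uniform-prior limit.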

While the ML method can be seen as following a frequentist approach, the MAP method follows a Bayesian one.

## References

1. [An assignment on the method from Carnegie Mellon](http://www.cs.cmu.edu/~aarti/Class/10601/homeworks/hw2Solutions.pdf)