In [1]:
%run ../../common/import_all.py

from common.setup_notebook import set_css_style, setup_matplotlib, config_ipython
config_ipython()
setup_matplotlib()
set_css_style()

# Bayes' Theorem

## What is it

Bayes' theorem, formulated by the minister Thomas Bayes and published posthumously in 1763, relates the probability of a hypothesis to prior knowledge about it.

We'll call $E$ the evidence (what we see happening, the measurement) and $H$ the hypothesis, whose probability will be affected by gathering the evidence. Bayes' theorem states that the probability of $H$ after observing $E$ is the probability of $E$ given $H$, multiplied by the probability of $H$ and divided by the probability of $E$:

$$
P(H|E) = \frac{P(E|H)P(H)}{P(E)} \ .
$$

Specifically,

* $P(H|E)$ is the *posterior*: the probability of the hypothesis given the evidence, that is, after the data is collected;
* $P(E|H)$ is the probability of observing the evidence given the hypothesis; viewed as a function of $H$ with $E$ fixed, it is the *likelihood*;
* $P(H)$ is the *prior*, that is, the probability of the hypothesis before gathering the evidence; it is one's estimate that $H$ is true before observing the data.

The ratio $\frac{P(E|H)}{P(E)}$ quantifies the impact of $E$ on the probability of $H$. Its denominator, $P(E)$, is called the *marginal likelihood* or *model evidence*.

The denominator, which is the probability of observing what we observe, can be written as the sum (or the integral, for a continuous set of hypotheses) over all the possible hypotheses $H'$ of the terms $P(E|H')P(H')$, so that the theorem can be rewritten as

$$
P(H|E) = \frac{P(E|H)P(H)}{\sum_{H'}P(E|H')P(H')} \ .
$$
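
As a minimal sketch of the formula above for a finite set of hypotheses, the function below normalises the products $P(E|H')P(H')$ by their sum. The function name `posterior`, the labels `H1` and `H2`, and the numbers are assumptions chosen only for illustration.

```python
def posterior(priors, likelihoods):
    """Return P(H|E) for each hypothesis H, given P(H) and P(E|H)."""
    # Numerator terms P(E|H) P(H), one per hypothesis
    joint = {h: likelihoods[h] * priors[h] for h in priors}
    # Denominator: P(E) = sum over H' of P(E|H') P(H')
    evidence = sum(joint.values())
    if evidence == 0:
        raise ValueError("The evidence has zero probability under every hypothesis.")
    return {h: joint[h] / evidence for h in joint}


# Hypothetical numbers, not taken from any real data
priors = {"H1": 0.3, "H2": 0.7}
likelihoods = {"H1": 0.9, "H2": 0.2}

post = posterior(priors, likelihoods)
print(post)
```

Because every term is divided by the same $P(E)$, the posteriors sum to one by construction.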

Note that in practice the hypothesis is encoded as a statistical model, that is, through the values of its parameters.

## Proof

<img src="../../imgs/sets-intersection.pdf" width="300" align="left" style="margin:0px 50px"/>

The theorem can be derived from the definition of [conditional probability](joint-marg-conditional-prob.ipynb):

$$
P(A | B) =  \frac{P(A \cap B)}{P(B)}  \ \ \ \text{if} \ \ \ P(B)\neq 0
$$

and 

$$
P(B | A) =  \frac{P(A \cap B)}{P(A)}  \ \ \ \text{if} \ \ \ P(A)\neq 0
$$

Now, the intersection is commutative, that is, $P(A \cap B) = P(B \cap A)$, so

$$
P(A \cap B) = P(A | B)P(B) = P(B | A)P(A) \ , 
$$

which leads to 

$$
P(A | B) = \frac{P(B | A) P(A)}{P(B)} \ .
$$
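
The steps of the derivation can be checked numerically on a small finite sample space. The sketch below assumes a single roll of a fair die and two arbitrarily chosen events; exact fractions are used so that the equalities hold without rounding.

```python
from fractions import Fraction

# Sample space: one roll of a fair six-sided die (an illustrative assumption)
omega = set(range(1, 7))
A = {2, 4, 6}  # event A: the outcome is even
B = {4, 5, 6}  # event B: the outcome is greater than 3


def prob(event):
    """Probability of an event under the uniform distribution on omega."""
    return Fraction(len(event & omega), len(omega))


# Definition of conditional probability
p_a_given_b = prob(A & B) / prob(B)
p_b_given_a = prob(A & B) / prob(A)

# Both products recover the probability of the intersection
assert p_a_given_b * prob(B) == p_b_given_a * prob(A) == prob(A & B)

# Bayes' theorem: P(A|B) = P(B|A) P(A) / P(B)
bayes_rhs = p_b_given_a * prob(A) / prob(B)
print(p_a_given_b, bayes_rhs)  # prints: 2/3 2/3
```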

## Examples

### Example: flipping a (fair) coin

For a fair coin, the prior probability of heads (or of tails) is $50\%$. Flipping the coin several times and recording the outcomes updates this degree of belief: if the coin were in fact unfair, the posterior would move away from the prior as evidence accumulates.
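
One way to make this concrete is to compare two hypotheses about the coin and update the belief in each after every flip. The sketch below is an illustration under assumptions that are not in the text: the alternative hypothesis is a coin with a heads probability of 0.8, both hypotheses start with equal prior weight, and the sequence of flips is made up.

```python
def update(prior, likelihood):
    """One Bayesian update over a discrete set of hypotheses."""
    unnormalised = {h: likelihood[h] * prior[h] for h in prior}
    evidence = sum(unnormalised.values())
    return {h: v / evidence for h, v in unnormalised.items()}


# Two competing hypotheses about P(heads); the 0.8 bias is a made-up value
p_heads = {"fair": 0.5, "biased": 0.8}
belief = {"fair": 0.5, "biased": 0.5}  # assumed prior over the two hypotheses

flips = "HHTHHHHT"  # hypothetical observed sequence
for flip in flips:
    # Likelihood of this single flip under each hypothesis
    likelihood = {h: (p if flip == "H" else 1 - p) for h, p in p_heads.items()}
    # The posterior after this flip becomes the prior for the next one
    belief = update(belief, likelihood)

print(belief)
```

With six heads out of eight flips, the belief shifts towards the biased hypothesis; a more balanced sequence would shift it back towards the fair one.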

### Example: Testing for a disease

Suppose we have a test for a disease that is correct $99\%$ of the time, whether or not you have the disease.

This means that if you have the disease the test outputs YES with probability $99\%$, and if you don't have the disease it outputs NO with probability $99\%$. These are the likelihoods. Suppose we also know that the disease occurs in one in $10^4$ people in the general population. This is the prior.

The question is: if you take the test and it comes back positive, what is the probability that you actually have the disease?

Here the evidence $E$ is the positive test result. The denominator $P(E)$ can be written as a sum, over all the possible hypotheses, of the joint probability of the evidence and each hypothesis:

* Event $A$: you have the disease;
* Event $B$: you don't have the disease.

$$
P(E) = P(E|A)P(A) + P(E|B)P(B) = \sum_{H'} P(E|H') P(H')
$$

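Plugging in the numbers, with $P(E|A) = 0.99$, $P(E|B) = 0.01$, $P(A) = 10^{-4}$ and $P(B) = 1 - 10^{-4}$, gives

$$
P(A|E) = \frac{0.99 \times 10^{-4}}{0.99 \times 10^{-4} + 0.01 \times 0.9999} \approx 0.0098 \ ,
$$

so the desired posterior is only about $1\%$: because the disease is so rare, most positive results are false positives.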
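
The arithmetic for the disease example can be checked with a short script. The numbers are the ones stated in the example; the variable names are illustrative choices only.

```python
sensitivity = 0.99  # P(positive | disease), from the example
specificity = 0.99  # P(negative | no disease), from the example
prevalence = 1e-4   # P(disease), the prior

p_pos_given_disease = sensitivity
p_pos_given_healthy = 1 - specificity  # false positive rate

# P(E) = P(E|A) P(A) + P(E|B) P(B)
p_positive = (p_pos_given_disease * prevalence
              + p_pos_given_healthy * (1 - prevalence))

# Posterior: P(A|E) = P(E|A) P(A) / P(E)
p_disease_given_positive = p_pos_given_disease * prevalence / p_positive
print(f"{p_disease_given_positive:.4f}")  # prints: 0.0098
```

Raising `prevalence` in this sketch shows how strongly the posterior depends on the prior: the same test gives a much higher posterior for a common disease.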

## References

1. <a name="1"></a> [Some more examples on Wikipedia](https://en.wikipedia.org/wiki/Bayes'_theorem#Examples)