## Bayes' Theorem

Bayes' Theorem is a simple rule about conditional probabilities that has profound consequences.

$$ \mathbb{P}[X=x \mid Y=y] = \frac{\mathbb{P}[Y=y \mid X=x] \cdot \mathbb{P}[X=x]}{\mathbb{P}[Y=y]} $$

Mathematically, it's simply derived from the fact that
$$ \mathbb{P}[X=x\mid Y=y] \cdot \mathbb{P}[Y=y] = \mathbb{P}[X=x, Y=y] = \mathbb{P}[Y=y \mid X=x] \cdot \mathbb{P}[X=x] \,.$$

While it seems trivial, it tells us how to update what we know about one variable ($X$) once we observe another ($Y$).
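As a quick numerical sanity check, the sketch below verifies the identity on a small discrete example. The joint probability table is made up purely for illustration; any non-negative table summing to 1 would do.

```python
import numpy as np

# Hypothetical joint distribution P[X=x, Y=y]; rows index x, columns index y.
joint = np.array([[0.10, 0.30],
                  [0.20, 0.40]])

p_x = joint.sum(axis=1)  # marginal P[X=x]
p_y = joint.sum(axis=0)  # marginal P[Y=y]

# Conditionals computed directly from the joint distribution.
p_x_given_y = joint / p_y            # entry [x, y] is P[X=x | Y=y]
p_y_given_x = joint / p_x[:, None]   # entry [x, y] is P[Y=y | X=x]

# Right-hand side of Bayes' Theorem: P[Y=y | X=x] * P[X=x] / P[Y=y].
bayes_rhs = p_y_given_x * p_x[:, None] / p_y

print(np.allclose(p_x_given_y, bayes_rhs))
```

Each column of `p_x_given_y` sums to 1, as a conditional distribution over $X$ should.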

**Example:** Assume we have a biased Bernoulli random variable $X$ that takes the value $1$ with probability $P$.  We don't know $P$, so we treat it as a random variable as well, and assume it is beta distributed with parameters $\alpha$ and $\beta$ (denoted $B(\alpha, \beta)$).  Intuitively, observing flips of $X$ should tell us something about the value of $P$.

Using Bayes' Theorem, the distribution of $P$ after observing $X=1$ satisfies (reading $\mathbb{P}[P=p]$ as a density, since $P$ is continuous)
$$ 
\begin{align}
\mathbb{P}[P=p \mid X=1] \propto\,\, & \mathbb{P}[X=1 \mid P=p] \cdot \mathbb{P}[P=p] \\
 \propto\,\, & p \cdot p^{\alpha-1} (1-p)^{\beta-1} \\
 \propto\,\, & p^{\alpha} (1-p)^{\beta-1}
\end{align}
$$
This is the unnormalized density of a beta distribution, so it's not hard to show that
$$ P \mid (X=1) \sim B(\alpha+1, \beta)\,. $$
Similarly, since $\mathbb{P}[X=0 \mid P=p] = 1-p$,
$$ P \mid (X=0) \sim B(\alpha, \beta+1)\,. $$
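To spell out the normalization step for the $X=1$ case, the sketch below writes the normalizing integral with the Gamma function $\Gamma$ (a symbol not used elsewhere in this section, chosen to avoid a clash with the $B(\alpha, \beta)$ notation for the distribution):
$$
\begin{align}
\int_0^1 q^{\alpha} (1-q)^{\beta-1} \, dq = \,\, & \frac{\Gamma(\alpha+1)\,\Gamma(\beta)}{\Gamma(\alpha+\beta+1)} \\
\mathbb{P}[P=p \mid X=1] = \,\, & \frac{\Gamma(\alpha+\beta+1)}{\Gamma(\alpha+1)\,\Gamma(\beta)} \, p^{\alpha} (1-p)^{\beta-1}
\end{align}
$$
The right-hand side is exactly the density of a beta distribution with parameters $\alpha+1$ and $\beta$. The $X=0$ case follows the same way with the likelihood $1-p$.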

Stated differently, if we have $P$ distributed as $B(\alpha, \beta)$ (the **prior**) and we observe $X=1$, we update our estimate of the distribution of $P$ (the **posterior**) to $B(\alpha+1, \beta)$; if we observe $X=0$, we update the posterior to $B(\alpha, \beta+1)$.  Because the posterior stays in the same family as the prior, so that Bayes' Theorem can be solved in closed form, the beta distribution is called a **conjugate prior** for the Bernoulli likelihood.

**Example:** We serve ads to individuals, and our prior belief about the probability $P$ that a given individual clicks on an ad is $B(\alpha, \beta)$.  If we see them click on the ad, then our posterior for $P$ is $B(\alpha+1, \beta)$, and if they do not, then our posterior is $B(\alpha, \beta+1)$.

Continuing on, if we show another ad to the same individual, the current posterior becomes the new prior and we apply the same update again.  So if they click on 2 of the 10 ads we show them, then our posterior for $P$ is $B(\alpha+2, \beta+8)$.
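The repeated update can be sketched in a few lines. The helper name `update_beta`, the prior $B(2, 2)$, and the particular click sequence are illustrative assumptions, not part of the original example.

```python
from scipy import stats


def update_beta(alpha, beta, outcomes):
    """Return the beta parameters after observing a sequence of 0/1 outcomes."""
    for x in outcomes:
        if x == 1:
            alpha += 1  # a click moves B(alpha, beta) to B(alpha + 1, beta)
        else:
            beta += 1   # a non-click moves B(alpha, beta) to B(alpha, beta + 1)
    return alpha, beta


# Hypothetical prior B(2, 2) and 10 ad impressions, 2 of which were clicked.
alpha_0, beta_0 = 2, 2
outcomes = [0, 1, 0, 0, 0, 0, 1, 0, 0, 0]

alpha_n, beta_n = update_beta(alpha_0, beta_0, outcomes)
posterior = stats.beta(a=alpha_n, b=beta_n)

print(alpha_n, beta_n)   # parameters of the posterior B(alpha + 2, beta + 8)
print(posterior.mean())  # posterior mean alpha_n / (alpha_n + beta_n)
```

Only the counts of 1s and 0s matter here, so the order in which the clicks arrive does not change the result.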

In fact, a common interpretation of $B(\alpha, \beta)$ is that it encodes our belief about the probability of some $X$ being successful, with $\alpha$ and $\beta$ acting as counts of the 1s and 0s we have already seen.
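One way to see the "previously seen 1s and 0s" reading: the larger $\alpha + \beta$ is, the more past data the prior claims to have seen, and the less the same new observations shift it. The sketch below uses made-up priors that are all centered on $0.5$ and updates each with 2 clicks out of 10 ads.

```python
from scipy import stats

clicks, misses = 2, 8  # new observations: 2 clicks out of 10 ads

# Hypothetical priors, all with mean 0.5 but backed by increasing amounts of "seen" data.
priors = [(1, 1), (10, 10), (100, 100)]

posterior_means = []
for alpha, beta in priors:
    posterior = stats.beta(a=alpha + clicks, b=beta + misses)
    posterior_means.append(posterior.mean())
    print(f"prior B({alpha}, {beta}) -> posterior mean {posterior.mean():.3f}")
```

The posterior mean moves most for $B(1, 1)$ and stays close to $0.5$ for $B(100, 100)$.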

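Below is Bayes' Theorem in action: we draw samples of $P$ from the prior, simulate binomial outcomes for each sample, and keep only the samples whose simulated count matches the observed one.  The histogram of the surviving samples should match the analytic posterior.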

In [2]:
%matplotlib inline
import matplotlib
import seaborn as sns
matplotlib.rcParams['savefig.dpi'] = 144

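import numpy as np
import scipy as sp
import scipy.stats  # submodules must be imported explicitly for sp.stats to resolve
import pandas as pd
import matplotlib.pyplot as plt

# The beta distribution is the conjugate prior for the binomial likelihood

def plot_hist_dist(rvs, dist, title, ax):
    # Minimal stand-in for a helper that is not defined in this section
    # (assumption): histogram of the samples with the pdf of dist overlaid.
    ax.hist(rvs, bins=50, density=True, alpha=0.5)
    x = np.linspace(0, 1, 200)
    ax.plot(x, dist.pdf(x))
    ax.set_title(title)

def binomial_bayes(a_0, b_0, a_1, b_1):
    # Analytic prior, and analytic posterior after a_1 successes and b_1 failures
    prior = sp.stats.beta(a=a_0, b=b_0)
    posterior = sp.stats.beta(a=a_0+a_1, b=b_0+b_1)

    # Rejection sampling: draw p from the prior, simulate a_1+b_1 flips for each p,
    # and keep only the draws that reproduce the observed number of successes
    prior_rvs = prior.rvs(size=10000)
    variates = sp.stats.binom(n=a_1+b_1, p=prior_rvs).rvs()
    posterior_rvs = prior_rvs[variates == a_1]

    # Compare the sampled histograms against the analytic densities
    ax1 = plt.subplot(2,1,1)
    plot_hist_dist(prior_rvs, prior, title="Prior Beta(a={a}, b={b})".format(**prior.kwds), ax=ax1)
    ax2 = plt.subplot(2,1,2, sharex=ax1)
    plot_hist_dist(posterior_rvs, posterior, title="Posterior Beta(a={a}, b={b})".format(**posterior.kwds), ax=ax2)

    plt.tight_layout()

binomial_bayes(a_0=2, b_0=2, a_1=6, b_1=3)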