# Bayesian Framework

### Concepts

* (probabilistic) model
* data/sample
* likelihood
* maximum likelihood
* prior
* posterior
* maximum a posteriori (MAP)
* predictive distribution
* conjugate priors

### example: independent coin toss

* $P(\text{heads}) = p$
* then $P(HHT) = p \times p \times (1 - p)$

let 

$x_i = \begin{cases}
    1 & \text{if } i^{th} \text{ toss is heads} \\
    0 & \text{otherwise}
\end{cases}$

consider 3 cases

1. 300 heads, 200 tails
2. 3 heads, 2 tails
3. 5 heads, 0 tails

### maximum likelihood principle

pick the model (and parameters) that has the highest likelihood given the sample/data

$L(\theta) = P(\text{data} \mid \theta)$  
$\hat{\theta} = \arg\max_\theta P(\text{data} \mid \theta)$

let $\ell(\theta) = \log L(\theta)$. Since $\log$ is strictly increasing, $\arg\max_\theta L(\theta) = \arg\max_{\theta} \ell(\theta)$
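
a minimal sketch of why maximizing $\ell$ instead of $L$ is safe, assuming the coin from the example above with 3 heads and 2 tails in a fixed order, so that the likelihood is $p^3 (1-p)^2$. The grid search and the function names are illustrative choices, not part of the notes:

```python
import math

def likelihood(p, heads, tails):
    """Probability of one particular sequence with the given counts."""
    return p**heads * (1 - p)**tails

def log_likelihood(p, heads, tails):
    """Log of the same quantity; a sum of logs instead of a product."""
    return heads * math.log(p) + tails * math.log(1 - p)

# Candidate values of p strictly inside (0, 1), so the logs are defined.
grid = [i / 1000 for i in range(1, 1000)]

heads, tails = 3, 2
best_L = max(grid, key=lambda p: likelihood(p, heads, tails))
best_ell = max(grid, key=lambda p: log_likelihood(p, heads, tails))

print(best_L, best_ell)  # both maximizers land on the same grid point
```

for long sequences the product form can underflow in floating point, which is one practical reason to prefer the log form.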

### back to the coin toss example

$L(p) = \binom{n}{x} p^x (1-p)^{n-x}$  
$\ell(p) = \log \binom{n}{x} + x \log p + (n-x) \log (1-p)$

where $x = \sum_i x_i$ is the number of heads and $n$ is the total number of coin tosses

To maximize w.r.t. $p$, take the derivative of $\ell(p)$ and set it to 0:

$0 = \frac{x}{p} - \frac{n-x}{1-p}$  
$\implies \hat{p} = \frac{x}{n}$
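
as a quick check, here is a small sketch that applies $\hat{p} = x / n$ to the three cases listed earlier. The dictionary keys and the helper name `mle` are made up for illustration:

```python
def mle(heads, tails):
    """Maximum likelihood estimate of P(heads): x / n."""
    return heads / (heads + tails)

# The three cases from the notes, as (heads, tails) pairs.
cases = {
    "300H/200T": (300, 200),
    "3H/2T": (3, 2),
    "5H/0T": (5, 0),
}

estimates = {name: mle(h, t) for name, (h, t) in cases.items()}
for name, p_hat in estimates.items():
    print(f"{name}: p_hat = {p_hat:.3f}")
```

the first two cases give the same estimate even though one rests on 100 times more data, and the third case says tails can never happen after only five tosses. Both observations motivate bringing in a prior.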

alternatively, suppose we hold a belief about $p$ before seeing any data. Consider the prior:

* $P(p=.5) = .9$
* $P(p=.6) = .1$
* $P(p \not\in \{.5, .6\}) = 0$

Another prior:

* probability density function $f(p) \propto p^2 (1-p)^2$ when $p \in (0, 1)$ and 0 otherwise
* since $\int_0^1 p^2 (1-p)^2 \, dp = \frac{1}{30}$, normalizing gives $f(p) = 30 \, p^2 (1-p)^2$

given a prior and no data, we can compute the probability of heads:  
$P(H) = \sum_\theta P(H \mid \theta) P(\theta)$  
(for a continuous prior, the sum becomes an integral against the density)

for the first prior, we have $.9 \times .5 + .1 \times .6 = .51$
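
a sketch of the same computation in code, for both priors. For the continuous prior the sum is replaced by a numerical integral; the midpoint rule, the step count, and the function names are assumptions made for this illustration. The density is passed unnormalized, and the normalizing constant is divided out numerically:

```python
def predictive_heads_discrete(prior):
    """P(H) = sum over theta of P(H | theta) P(theta), with P(H | theta) = theta."""
    return sum(p * weight for p, weight in prior.items())

def predictive_heads_continuous(unnormalized_density, steps=10_000):
    """P(H) as a ratio of two midpoint-rule integrals over (0, 1)."""
    h = 1.0 / steps
    midpoints = [(i + 0.5) * h for i in range(steps)]
    numerator = sum(p * unnormalized_density(p) for p in midpoints) * h
    denominator = sum(unnormalized_density(p) for p in midpoints) * h
    return numerator / denominator

first_prior = {0.5: 0.9, 0.6: 0.1}
p_heads_first = predictive_heads_discrete(first_prior)

# Second prior: density proportional to p^2 (1 - p)^2 on (0, 1).
p_heads_second = predictive_heads_continuous(lambda p: p**2 * (1 - p)**2)

print(p_heads_first, p_heads_second)
```

the second prior is symmetric around $p = .5$, so its predictive probability of heads should come out at one half.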

### posterior distribution

if we have data, then we can update our prior belief

$P(\theta \mid \text{data})$

To compute, we use ...

### Bayes' rule

$P(\theta \mid x) = \frac{P(x \mid \theta) P(\theta)}{P(x)}$

note that $P(x \mid \theta) = L(\theta)$, the likelihood

to compute the denominator:

$P(x) = \sum_\theta P(x \mid \theta) P(\theta)$

### back to the coin toss example ...

using the first prior, we can compute 
$P(p=.5 \mid x) = \frac{.9 \times .5^x .5^{n-x}}{.9 \times .5^x .5^{n-x} + .1 \times .6^x .4^{n-x}}$
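
a small sketch of this update for the two-point prior, applied to the three cases from earlier. The helper name `posterior_discrete` is hypothetical; the binomial coefficient is left out because it cancels between numerator and denominator:

```python
def posterior_discrete(prior, heads, tails):
    """Bayes' rule on a finite set of candidate values for p."""
    # Unnormalized posterior: prior weight times likelihood p^x (1-p)^(n-x).
    weights = {p: w * p**heads * (1 - p)**tails for p, w in prior.items()}
    total = sum(weights.values())  # this is P(x), up to the binomial coefficient
    return {p: w / total for p, w in weights.items()}

prior = {0.5: 0.9, 0.6: 0.1}

post_large = posterior_discrete(prior, 300, 200)
post_small = posterior_discrete(prior, 3, 2)
post_streak = posterior_discrete(prior, 5, 0)

for label, post in [("300H/200T", post_large), ("3H/2T", post_small), ("5H/0T", post_streak)]:
    print(label, {p: round(w, 4) for p, w in post.items()})
```

with only five tosses the posterior stays close to the prior, while 500 tosses at a 60% heads rate move almost all of the mass to $p = .6$.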

for the second prior ...  
$f(p \mid x) = \frac{f(p) p^x (1-p)^{n-x}}{\int_0^1 f(q) q^x (1-q)^{n-x} \, dq}$  

we can avoid doing the integral in the denominator by noting that it does not depend on $p$, so  
$f(p \mid x) \propto p^{x+2} (1-p)^{n-x+2}$  
and then recovering the constant from the fact that probabilities must sum up to or integrate to 1
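
to make the last step concrete: the missing constant is one over $\int_0^1 p^{x+2} (1-p)^{n-x+2} \, dp$, and integrals of this form have the closed form $\int_0^1 p^{a-1} (1-p)^{b-1} \, dp = \frac{\Gamma(a) \Gamma(b)}{\Gamma(a+b)}$. The sketch below compares that closed form with a brute-force midpoint rule for the 3 heads, 2 tails case; the function names and step count are choices made for this illustration:

```python
import math

def kernel_integral_numeric(a, b, steps=20_000):
    """Midpoint-rule estimate of the integral of p^(a-1) (1-p)^(b-1) over (0, 1)."""
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        p = (i + 0.5) * h
        total += p ** (a - 1) * (1 - p) ** (b - 1)
    return total * h

def kernel_integral_closed_form(a, b):
    """Gamma(a) Gamma(b) / Gamma(a + b), computed through log-gamma."""
    return math.exp(math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b))

x, n = 3, 5
a, b = x + 3, n - x + 3  # exponents x+2 and n-x+2, each plus one

numeric = kernel_integral_numeric(a, b)
closed = kernel_integral_closed_form(a, b)
print(numeric, closed)  # the two values should agree closely
```

dividing $p^{x+2} (1-p)^{n-x+2}$ by this value gives a density that integrates to 1.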

### maximum a posteriori principle

choose the model that maximizes the posterior

note that this often doesn't require normalizing the posterior distribution (just need to compute the argmax)

### back to the coin toss example ...

using the prior $f(p) \propto p^2 (1-p)^2$, we have  
$f(p \mid x) \propto p^{x+2} (1-p)^{n-x+2}$  
and taking the derivative w.r.t. $p$ and setting to 0, we get  
$\hat{p} = \frac{x+2}{n+4}$
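
the derivative step, written on the log scale, is $\frac{x+2}{p} - \frac{n-x+2}{1-p} = 0$. A short sketch comparing the resulting MAP estimate with the maximum likelihood estimate $x / n$ on the three cases from earlier; the helper names are illustrative:

```python
def mle(heads, n):
    """Maximum likelihood estimate: x / n."""
    return heads / n

def map_estimate(heads, n):
    """MAP estimate under the prior f(p) proportional to p^2 (1-p)^2: (x + 2) / (n + 4)."""
    return (heads + 2) / (n + 4)

# (heads, total tosses) for the three cases in the notes.
cases = {"300H/200T": (300, 500), "3H/2T": (3, 5), "5H/0T": (5, 5)}

results = {name: (mle(x, n), map_estimate(x, n)) for name, (x, n) in cases.items()}
for name, (p_mle, p_map) in results.items():
    print(f"{name}: MLE = {p_mle:.4f}, MAP = {p_map:.4f}")
```

the prior acts like 2 extra heads and 2 extra tails: it barely matters with 500 tosses, but with 5 tosses it pulls the estimate toward $.5$ and keeps it away from 1 in the all-heads case.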

### conjugate prior

a prior distribution such that the posterior distribution is in the same family as the prior

### back to the coin toss example ...

we started with

* $x \mid p \sim \text{Binomial}(n, p)$
* $p \sim \text{Beta}(3, 3)$, i.e. $f(p) \propto p^{3-1} (1-p)^{3-1}$

then we get

* $p \mid x \sim \text{Beta}(x + 3, n - x + 3)$
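
a sketch of what conjugacy buys in practice, assuming the usual parametrization in which a $\text{Beta}(a, b)$ density is proportional to $p^{a-1} (1-p)^{b-1}$, so the prior $f(p) \propto p^2 (1-p)^2$ corresponds to $a = b = 3$. The update is just adding counts. The posterior mean $\frac{a}{a+b}$ is also the predictive probability that the next toss is heads. All function names are made up for this illustration:

```python
def beta_update(a, b, heads, tails):
    """Posterior Beta parameters after observing the given counts."""
    return a + heads, b + tails

def beta_mean(a, b):
    """Mean of Beta(a, b); equals P(next toss is heads) under this model."""
    return a / (a + b)

def beta_mode(a, b):
    """Mode of Beta(a, b), valid for a > 1 and b > 1; this is the MAP estimate."""
    return (a - 1) / (a + b - 2)

prior = (3, 3)

# Update on 5 heads and 0 tails.
posterior = beta_update(*prior, 5, 0)
print(posterior, beta_mean(*posterior), beta_mode(*posterior))

# Updating in two steps gives the same answer as one batch update.
step_one = beta_update(*prior, 3, 2)
sequential = beta_update(*step_one, 5, 0)
batch = beta_update(*prior, 8, 2)
print(sequential == batch)
```

because the posterior is again a Beta distribution, it can serve as the prior for the next batch of tosses without redoing any integrals.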