# Bayesian inference

A statistical inference method that uses Bayes' theorem to update the probability of a hypothesis as evidence (observed data) becomes available.

Bayes' theorem:
$$p(z|x) = \frac{p(x|z)p(z)}{p(x)}$$

where $x$ is the data or evidence, $z$ is the hypothesis, $p(x|z)$ is the likelihood, $p(x)$ is the model evidence or marginal likelihood, $p(z)$ is the prior probability of the hypothesis, and $p(z|x)$ is the posterior, i.e. the probability of the hypothesis given the observed evidence.
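A minimal sketch of Bayes' theorem for a finite set of hypotheses, where the evidence $p(x)$ reduces to a sum over $z$. The function name `posterior` and the hypotheses `"h1"`, `"h2"` with their numbers are hypothetical, chosen only for illustration.

```python
def posterior(prior, likelihood):
    """Discrete Bayes' theorem: p(z|x) = p(x|z) p(z) / p(x).

    prior:      dict mapping each hypothesis z to p(z)
    likelihood: dict mapping each hypothesis z to p(x|z) for the observed x
    """
    joint = {z: likelihood[z] * prior[z] for z in prior}  # p(x, z)
    evidence = sum(joint.values())                          # p(x) = sum_z p(x, z)
    return {z: j / evidence for z, j in joint.items()}


# Hypothetical example: two equally likely hypotheses, the data favour "h2"
post = posterior(prior={"h1": 0.5, "h2": 0.5}, likelihood={"h1": 0.2, "h2": 0.6})
print(post)  # approximately {'h1': 0.25, 'h2': 0.75}
```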




# Frequentist vs Bayesian picture

- In a frequentist approach to inference, unknown parameters are often, but not always, treated as having fixed but unknown values, so no probability distribution is assigned to them.

- In contrast, a Bayesian approach to inference does allow probabilities to be associated with unknown parameters. These probabilities can then be interpreted as representing the scientist's degree of belief.


(e.g.) There exist two kinds of coins: a loaded coin (70% heads, 30% tails) and a fair coin (50% heads, 50% tails). Given a coin, you are asked to determine whether it is loaded or fair by tossing it 4 times. The 4 tosses give 2 heads.

- Frequentist approach (using maximum likelihood estimation, MLE):
  - $L(\mathrm{fair}|2) =  \,_4C _2 \times (0.5)^4 = 0.375$
  - $L(\mathrm{loaded}|2) =  \,_4C _2 \times (0.7)^2 \times (0.3)^2 = 0.265$
  - Therefore the fair coin is the most likely explanation. Note that this is a *point estimate*: it does not tell you how confident you should be in that answer.


- Bayesian approach, using $p(z|x) = p(x|z)p(z)/p(x)$:
   
  - $p(\mathrm{fair}|2) \propto  0.375\times p(\mathrm{fair})$
  - $p(\mathrm{loaded}|2) \propto  0.265\times p(\mathrm{loaded})$
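  - Here the prior $p(z)$ encodes one's belief about the coin before tossing it. Normalizing by $p(x) = 0.375\,p(\mathrm{fair}) + 0.265\,p(\mathrm{loaded})$ turns these into posterior probabilities, which quantify how sure we are.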
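The two approaches above can be reproduced numerically. This is a sketch: the helper names `likelihood` and `coin_posterior` are hypothetical, and the priors passed to `coin_posterior` (a uniform prior, and one that strongly favours the loaded coin) are assumptions chosen only to show how the prior shifts the answer.

```python
from math import comb


def likelihood(p_head, k=2, n=4):
    # p(x|z): probability of k heads in n tosses for a coin with P(head) = p_head
    return comb(n, k) * p_head**k * (1 - p_head) ** (n - k)


lik = {"fair": likelihood(0.5), "loaded": likelihood(0.7)}

# Frequentist (MLE): pick the hypothesis with the largest likelihood
mle = max(lik, key=lik.get)


def coin_posterior(p_fair):
    # Bayesian: posterior is likelihood times prior, normalized by p(x)
    prior = {"fair": p_fair, "loaded": 1 - p_fair}
    joint = {z: lik[z] * prior[z] for z in lik}
    evidence = sum(joint.values())  # p(x) = sum over hypotheses of p(x|z) p(z)
    return {z: joint[z] / evidence for z in joint}


print(lik, mle)             # fair: 0.375, loaded: about 0.2646, MLE picks "fair"
print(coin_posterior(0.5))  # uniform prior: fair about 0.586, loaded about 0.414
print(coin_posterior(0.1))  # prior favouring "loaded": loaded about 0.864
```

With a uniform prior the posterior agrees with the MLE ranking but also reports roughly 59% versus 41%, which is the "how sure" answer the point estimate lacks. A strong enough prior belief in the loaded coin reverses the conclusion even after seeing 2 heads.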

# Variational Bayesian method

$$
p( z \mid  x) = \frac{p( x \mid  z)p( z)}{p( x)} = \frac{p( x \mid  z)p( z)}{\int_{ z} p( x, z) \,d z}
$$

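The marginalization over $z$ is typically intractable, e.g. when $z$ is high-dimensional or the integral has no closed form.
Instead, by introducing an approximate posterior $q(z|x) \simeq p(z|x)$, Bayesian inference becomes an optimization problem: maximizing the evidence lower bound (ELBO) over $q$.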
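To see why the evidence integral is the bottleneck, here is a sketch on a hypothetical 1-D model (assumed for illustration, not from the text above): prior $z \sim \mathcal{N}(0, 1)$ and likelihood $x \mid z \sim \mathcal{N}(z, 1)$, for which the exact evidence is $p(x) = \mathcal{N}(x; 0, 2)$. A grid of `n` points handles one dimension easily, but the same approach in $d$ dimensions needs `n**d` evaluations.

```python
import math


def normal_pdf(x, mean, var):
    # Density of N(mean, var) evaluated at x
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def evidence_grid(x, lo=-10.0, hi=10.0, n=2001):
    # p(x) = integral of p(x|z) p(z) dz, approximated with the trapezoidal rule
    # on an n-point grid over z (hypothetical Gaussian prior and likelihood)
    h = (hi - lo) / (n - 1)
    total = 0.0
    for i in range(n):
        z = lo + i * h
        weight = 0.5 if i in (0, n - 1) else 1.0
        total += weight * normal_pdf(x, z, 1.0) * normal_pdf(z, 0.0, 1.0)
    return total * h


x = 1.0
print(evidence_grid(x))         # numerical estimate of p(x)
print(normal_pdf(x, 0.0, 2.0))  # closed form for this conjugate model
```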

# Evidence lower bound

$$
\log p(\mathbf{x}) =
D_{\mathrm{KL}}\left(q(\mathbf{z}|\mathbf{x}) \parallel p(\mathbf{z}|\mathbf{x})\right) - \mathbb{E}_{q(\mathbf{z}|\mathbf{x}) } \left[ \log q(\mathbf{z}|\mathbf{x}) -  \log p(\mathbf{z},\mathbf{x}) \right] = D_{\mathrm{KL}}\left(q(\mathbf{z}|\mathbf{x}) \parallel p(\mathbf{z}|\mathbf{x})\right) + \mathcal{L}(q)
$$

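where $D_{\mathrm{KL}} \ge 0$ is the [KL divergence](concept_KLdiv.ipynb) and $\mathcal{L}(q) = \mathbb{E}_{q(\mathbf{z}|\mathbf{x})}\left[\log p(\mathbf{z},\mathbf{x}) - \log q(\mathbf{z}|\mathbf{x})\right]$ is the evidence lower bound. Because the KL term is non-negative, $\mathcal{L}(q) \le \log p(\mathbf{x})$. Since $\log p(\mathbf{x})$ does not depend on $q$, maximizing $\mathcal{L}(q)$ is equivalent to minimizing $D_{\mathrm{KL}}\left(q(\mathbf{z}|\mathbf{x}) \parallel p(\mathbf{z}|\mathbf{x})\right)$.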
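The decomposition can be checked numerically on the coin example, where $z$ takes only two values and everything is exact. This sketch assumes a uniform prior over {fair, loaded}; the helpers `elbo` and `kl` are hypothetical names. It verifies $\log p(x) = D_{\mathrm{KL}} + \mathcal{L}(q)$ for an arbitrary $q$, then maximizes $\mathcal{L}(q)$ by grid search over $q(\mathrm{fair})$.

```python
import math
from math import comb

# Joint p(x, z) for x = 2 heads in 4 tosses, assuming a uniform prior over z
joint = {
    "fair": 0.5 * comb(4, 2) * 0.5**4,
    "loaded": 0.5 * comb(4, 2) * 0.7**2 * 0.3**2,
}
evidence = sum(joint.values())
log_evidence = math.log(evidence)                       # log p(x)
posterior = {z: j / evidence for z, j in joint.items()}  # exact p(z|x)


def elbo(q):
    # L(q) = E_q[log p(z, x) - log q(z|x)]
    return sum(q[z] * (math.log(joint[z]) - math.log(q[z])) for z in q if q[z] > 0)


def kl(q, p):
    # D_KL(q || p) = sum_z q(z) log(q(z) / p(z))
    return sum(q[z] * math.log(q[z] / p[z]) for z in q if q[z] > 0)


q = {"fair": 0.3, "loaded": 0.7}  # an arbitrary approximate posterior
print(log_evidence, kl(q, posterior) + elbo(q))  # equal up to rounding

# Maximizing the ELBO over q(fair) on a grid recovers the exact posterior
best = max((i / 1000 for i in range(1, 1000)),
           key=lambda a: elbo({"fair": a, "loaded": 1 - a}))
print(best, posterior["fair"])
```

Here the optimization is trivial because $q$ can represent the exact posterior; in practice $q$ is restricted to a tractable family, so the KL gap usually stays positive.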
