https://www.ritchievink.com/blog/2019/06/10/bayesian-inference-how-we-are-able-to-chase-the-posterior/

# Bayesian inference: How we are able to chase the Posterior
## Bayes' Formula
Bayes' Formula is a way to reverse a conditional probability:
$$
P(\theta | D) = \frac{P(D | \theta) P(\theta)}{P(D)}
$$
Suppose we have a model architecture and want to find the most likely parameters $\theta$ (a large vector containing all parameters of the model), given a set of observed data points $D$. The quantity we are interested in is the *posterior* $P(\theta|D)$. We often have some prior belief about the parameters of our model: the *prior* $P(\theta)$. For any given value of $\theta$, we can compute the probability of observing the data points. This is the *likelihood* $P(D|\theta)$. The last term is the *evidence* $P(D)$. This term is hard to compute, since it is the marginal likelihood obtained by integrating over all parameters:
$$
P(D) = \int_{\theta} P(D|\theta) P(\theta) d\theta
$$
Even for a moderately high-dimensional $\theta$, the number of numerical operations needed to evaluate this integral explodes.
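To get a feeling for how quickly the cost grows, here is a minimal sketch that counts the likelihood evaluations needed if the integral were approximated on a dense grid. The function name `grid_evaluations` and the resolution of 50 points per parameter are assumptions chosen for illustration only.

```python
def grid_evaluations(n_params, points_per_dim=50):
    """Number of grid points (likelihood evaluations) for a dense grid
    with `points_per_dim` values along each of `n_params` parameters."""
    return points_per_dim ** n_params


# Assumed parameter counts, picked only to show the trend.
for n_params in (1, 2, 5, 10):
    print(f"{n_params:>2} parameters -> {grid_evaluations(n_params):,} evaluations")
```

The count is exponential in the number of parameters, which is why a brute-force grid is only an option for very small models.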

## A Simple Example
Let’s base this post on a comprehensible example. We will first do a full Bayesian analysis in Python by computing the posterior directly. Later we will pretend that we cannot do this, and approximate that same posterior with MCMC and Variational Inference.

Assume we have observed two data points: $D = \{195, 182\}$. Both are the observed lengths (in cm) of men in a basketball competition.

In [1]:
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

lengths = np.array([195, 182])

### 1.1 Likelihood function

We'll assume that the true lengths follow a Gaussian distribution. A Gaussian is parametrized by a mean $\mu$ and a standard deviation $\sigma$ (variance $\sigma^2$). For a reasonable domain of these parameters $\theta = \{\mu, \sigma \}$, we can compute the likelihood $P(D|\theta) = P(D|\mu, \sigma)$.
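Before evaluating the likelihood on a whole grid, it can help to evaluate it at a single parameter setting. The sketch below assumes the two observations are independent draws from the same Gaussian, so the likelihood is the product of the per-point densities. The helper name `likelihood` and the example values of `mu` and `sigma` are hypothetical and only serve as illustration.

```python
import numpy as np
from scipy import stats

lengths = np.array([195, 182])  # the two observations from the text


def likelihood(mu, sigma, data=lengths):
    """P(D | mu, sigma), assuming i.i.d. Gaussian data:
    the product of the Gaussian density at each data point."""
    return stats.norm(loc=mu, scale=sigma).pdf(data).prod()


# Assumed example parameters: one mean close to the data, one far away.
print(likelihood(188.5, 7.0))
print(likelihood(160.0, 7.0))
```

A mean close to the observations yields a much higher likelihood than one far away, which is the structure the grid evaluation makes visible across the whole parameter domain.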

In [2]:
# computation domain

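# let's create a grid of our two parameters
mu = np.linspace(150, 250)  # 50 candidate means (np.linspace defaults to 50 points)
sigma = np.linspace(0, 15)[::-1]  # 50 candidate standard deviations, descending; sigma = 0 is a degenerate Gaussian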

mm, ss = np.meshgrid(mu, sigma)  # just broadcasted parameters

# the likelihood