# Posterior Approximation

## The Law of Large Numbers

What if there is no well-defined PDF for the product of our likelihood and prior(s)? Then we cannot use `scipy.stats.norm` and its convenient `mean()`, `pdf()`, `cdf()`, or `ppf()` functions to answer probabilistic questions about the posterior. Can such questions be answered in a different way? Assume we have a sample $y_1,\ldots,y_n$ from an _unknown_ distribution:

In [None]:
import numpy as np

# The sample is drawn from a normal distribution, but we pretend not to know the true distribution
y = np.random.normal(loc=130, scale=10, size=10)

The distribution is not defined by a PDF, but by a **sample**. According to the [Law of Large Numbers](https://en.wikipedia.org/wiki/Law_of_large_numbers), the sample mean approximates the distribution mean when the sample is large enough. Is $n=10$ good enough?

In [None]:
np.mean(y)

In [None]:
np.std(y)
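To get a feeling for how the sample size affects the approximation, the following minimal sketch draws samples of increasing size from the same (pretend-unknown) normal distribution and prints each sample mean. The seed, the list of sample sizes, and the names `rng`, `sample_sizes`, and `sample_means` are assumptions made for illustration only.

```python
import numpy as np

# Seeded generator so the sketch is reproducible; the seed value is arbitrary
rng = np.random.default_rng(seed=42)

sample_sizes = [10, 100, 1_000, 10_000, 100_000]  # illustrative choice
sample_means = {}
for n in sample_sizes:
    # Draw n values from the distribution we pretend not to know
    sample = rng.normal(loc=130, scale=10, size=n)
    sample_means[n] = sample.mean()
    print(f"n={n:>7}: sample mean = {sample_means[n]:.3f}")
```

With growing $n$, the printed means should settle ever closer to the true mean of 130, while small samples such as $n=10$ can still be off by several units.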

As [mentioned before](./notebooks/01_bayes_rule_intro.ipynb#Continuous:), the probability of $Y$ being between e.g. 110 and 130 is

$$Pr(110 \le Y \le 130) = \int_{110}^{130}p(y)\,dy$$

However, $p(y)$ is undefined and the only thing we have is a sample. But the sample can be used to approximate the integral:

In [None]:
def sample_cdf(x, sample):
    return (sample <= x).sum() / sample.size  # "integrating" by counting
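Written as a formula, the counting in `sample_cdf` corresponds to the empirical CDF. This is a sketch in which the indicator $\mathbb{1}[\cdot]$ (1 if the condition holds, 0 otherwise) and the symbol $\hat{F}_n$ are notation introduced here for illustration:

$$\hat{F}_n(x) = \frac{1}{n}\sum_{i=1}^{n}\mathbb{1}[y_i \le x], \qquad Pr(a < Y \le b) \approx \hat{F}_n(b) - \hat{F}_n(a)$$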

In [None]:
sample_cdf(130, y) - sample_cdf(110, y)

This can be compared with the probability obtained from the true normal distribution:

In [None]:
from scipy.stats import norm

unknown_norm = norm(loc=130, scale=10)
unknown_norm.cdf(130) - unknown_norm.cdf(110)
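How close the counting estimate gets to the true probability depends on the sample size. The sketch below repeats the comparison for a few sample sizes; `interval_probability` is a hypothetical helper that counts the fraction of sample values in $(110, 130]$, which is what the difference of two `sample_cdf` calls computes. The seed and the sample sizes are arbitrary assumptions.

```python
import numpy as np
from scipy.stats import norm


def interval_probability(low, high, sample):
    """Fraction of sample values in the half-open interval (low, high]."""
    sample = np.asarray(sample)
    return np.mean((sample > low) & (sample <= high))


rng = np.random.default_rng(seed=0)  # arbitrary seed for reproducibility

# Probability from the true distribution, which we normally would not know
true_dist = norm(loc=130, scale=10)
true_probability = true_dist.cdf(130) - true_dist.cdf(110)

errors = {}
for n in [10, 1_000, 100_000]:
    sample = rng.normal(loc=130, scale=10, size=n)
    estimate = interval_probability(110, 130, sample)
    errors[n] = abs(estimate - true_probability)
    print(f"n={n:>7}: estimate = {estimate:.4f}, absolute error = {errors[n]:.4f}")
```

For the largest sample, the absolute error should be small compared with the estimate itself; for $n=10$ the estimate can only take values in steps of 0.1, so it is necessarily coarse.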

Samples also allow us to compute quantiles:

In [None]:
def sample_ppf(p, sample):
    p_index = int(np.round((sample.size - 1) * p))
    return np.sort(sample)[p_index]

So the central interval that contains 80% of the probability mass of $Y$, i.e. the range between the 10% and 90% quantiles, is:

In [None]:
f'[{sample_ppf(0.1, y)} - {sample_ppf(0.9, y)}]'

For comparison, the corresponding interval from the true normal distribution is:

In [None]:
f'[{unknown_norm.ppf(0.1)} - {unknown_norm.ppf(0.9)}]'
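With only 10 values, the sample quantiles can differ noticeably from the true ones. As a rough sketch of how this improves with more data, the code below uses NumPy's `np.quantile` (which interpolates linearly between sorted values by default, unlike the nearest-index lookup in `sample_ppf`) on a much larger sample. The seed, the sample size, and the variable names are assumptions for illustration.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(seed=1)  # arbitrary seed for reproducibility
large_sample = rng.normal(loc=130, scale=10, size=100_000)

# 10% and 90% quantiles estimated from the sample
sample_low, sample_high = np.quantile(large_sample, [0.1, 0.9])

# The same quantiles from the true distribution, for comparison only
true_low, true_high = norm(loc=130, scale=10).ppf([0.1, 0.9])

print(f"sample: [{sample_low:.2f}, {sample_high:.2f}]")
print(f"true:   [{true_low:.2f}, {true_high:.2f}]")
```

The two intervals should agree to within a fraction of a unit at this sample size.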