In [None]:
%load_ext autoreload
%autoreload 2
%matplotlib inline
%config InlineBackend.figure_format = 'retina'

## Introduction

What happens when we hit the Inference Button (tm)? Is it gradient descent that is happening underneath the hood? What is this "sampler" we speak about, and what exactly is it doing?

As we take a detour away from PyMC3 for a moment,
those fundamental questions are the questions
that we are going to go through in this chapter.
It's my hope that you'll enjoy peeling back the covers
on _some_ of what happens underneath the hood.

## In the beginning...

First off, we must remember that with Bayesian statistical inference,
we are most concerned with computing the posterior distribution of parameters
conditioned on data, $P(H|D)$.
Here, $H$ refers to the parameter set and the model,
while $D$ refers to the data.

Because it is a conditional distribution,
we can invoke the product rule of probability,
which factorizes the joint distribution in two ways:

$$P(H,D)=P(H|D)P(D)=P(D|H)P(H)$$

If you treat each of the $P$s as algebraic elements,
then a simple rearrangement gives us:

$$P(H|D)=\frac{P(D|H)P(H)}{P(D)}$$

This, then, is Bayes' rule as applied to joint distributions
of data and model parameters.

One hiccup shows up here, though:
we usually cannot calculate $P(D)$ analytically.
The reason is that we don't have an analytical form
for the probability distribution of how the data could have been configured;
obtaining it means integrating $P(D|H)P(H)$ over every possible value of $H$.
In practice, we treat $P(D)$ as a normalizing constant,
since philosophically, data are considered fixed
while parameters are random variables.
Hence, our posterior distribution
is calculated up to a proportionality constant:

$$P(H|D) \propto P(D|H)P(H)$$
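
To see what dropping $P(D)$ means in practice, here is a minimal sketch on a one-parameter toy problem (not the model used later in this chapter). The data values, the prior, the fixed $\sigma=1$, and the grid are all made up for illustration. On a grid we can approximate $P(D)$ numerically, and dividing by it rescales the curve without moving its peak.

```python
import numpy as np
from scipy.stats import norm

# Hypothetical toy setup: three observations, known sigma = 1, unknown mean H.
toy_data = np.array([-1.2, -0.8, -1.0])
grid = np.linspace(-5, 5, 1001)  # candidate values of H
dx = grid[1] - grid[0]

prior = norm(loc=0, scale=3).pdf(grid)  # P(H), an assumed prior
# P(D|H): product of the per-observation densities at each candidate H.
likelihood = norm(loc=grid[:, None], scale=1).pdf(toy_data).prod(axis=1)

unnormalized = likelihood * prior  # proportional to P(H|D)
evidence = unnormalized.sum() * dx  # Riemann-sum approximation of P(D)
posterior = unnormalized / evidence  # now integrates to roughly 1 over the grid

print(grid[np.argmax(unnormalized)], grid[np.argmax(posterior)])
```

Both printed values are the same grid point, which is why a sampler can usually get away with knowing the posterior only up to a constant. A grid like this stops being practical once there are more than a handful of parameters.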

### Let's look at an illustration

To make everything a bit more concrete, let's look at what I call
a "simplest complex example",
one that is not too hard to "grok" (geek slang for _understand_),
but one that is complex enough to be interesting.

We're going to inspect this particular model:


$$\mu_d \sim N(\mu=3, \sigma=1)$$
$$\sigma_d \sim Exp(\lambda=13)$$
$$d \sim N(\mu=\mu_d, \sigma=\sigma_d)$$

We have Gaussian-distributed data,
where the mean of the data distribution, $\mu_d$,
is a Gaussian-distributed random variable
whose configuration specifies our prior belief about it
before having seen any data,
while the standard deviation of the data distribution, $\sigma_d$,
is an Exponentially-distributed random variable,
also configured to specify our prior belief before having seen any data.
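
One way to read this model is as a recipe for generating data: draw $\mu_d$ and $\sigma_d$ from their priors, then draw $d$ given those values. The sketch below does exactly that with NumPy. It assumes that $\lambda$ in $Exp(\lambda=13)$ is a rate, so that NumPy's `scale` argument is $1/13$; the seed and the number of draws are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(42)
n_draws = 1000

# Prior on the mean: N(mu=3, sigma=1).
mu_d = rng.normal(loc=3, scale=1, size=n_draws)
# Assumption: Exp(lambda=13) is a rate parameterization, so scale = 1 / 13.
sigma_d = rng.exponential(scale=1 / 13, size=n_draws)
# One simulated observation per (mu_d, sigma_d) pair.
d = rng.normal(loc=mu_d, scale=sigma_d)

print(mu_d.mean(), sigma_d.mean(), d.mean())
```

Under this reading the prior on $\sigma_d$ is concentrated on small values, so the simulated $d$ stays close to the sampled $\mu_d$.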

The model's probabilistic graphical model (PGM) looks something like this:

In [None]:
from daft import PGM

G = PGM()
G.add_node("mu", content=r"$\mu$")
G.add_node("sigma", content=r"$\sigma$", x=1)
G.add_node("d", content="d", x=0.5, y=-1)

G.add_edge("mu", "d")
G.add_edge("sigma", "d")
G.show()

### Exercise: Joint log-likelihood

Write down the joint log-likelihood between data and the model parameters under the pre-specified priors.

In [None]:
from scipy.stats import norm, expon
def log_like(mu, sigma, data):
    # Priors as written in the model specification above.
    mu_like = norm(loc=3, scale=1).logpdf(mu)
    sigma_like = expon(scale=1 / 13).logpdf(sigma)  # Exp(lambda=13): scale = 1 / lambda
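    # Put the observations on a leading axis so every (mu, sigma) pair sees all of them.
    obs = np.asarray(data).reshape((-1,) + (1,) * np.ndim(mu + sigma))
    data_like = norm(loc=mu, scale=sigma).logpdf(obs).sum(axis=0)  # sum over observations only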
    return mu_like + sigma_like + data_like

Now, I'm going to give you some _actual_ data.

In [None]:
data = norm(-1, 1).rvs(100).reshape(-1, 1)
# data

Now, I'd like you to propose a $\mu$ and a $\sigma$,
and then evaluate their joint log-likelihood with the data.

In [None]:
import numpy as np

log_like(-1, 1, data)
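
A single number is hard to interpret on its own, so it helps to compare several proposals side by side. The sketch below is self-contained: `joint_logp` is a hypothetical stand-in for the exercise function, using the priors as written in the model specification (with $\lambda$ assumed to be a rate), and `toy_data` is freshly simulated rather than the `data` above.

```python
import numpy as np
from scipy.stats import norm, expon


def joint_logp(mu, sigma, data):
    """Hypothetical helper: log prior of mu and sigma plus log-likelihood of data."""
    log_prior = norm(loc=3, scale=1).logpdf(mu) + expon(scale=1 / 13).logpdf(sigma)
    return log_prior + norm(loc=mu, scale=sigma).logpdf(data).sum()


rng = np.random.default_rng(0)
toy_data = rng.normal(loc=-1, scale=1, size=100)  # stand-in for the real data

# A few hand-picked (mu, sigma) proposals to compare.
proposals = [(-1.0, 1.0), (0.0, 1.0), (3.0, 1.0), (-1.0, 5.0)]
scores = {p: joint_logp(*p, toy_data) for p in proposals}
for (mu, sigma), score in scores.items():
    print(f"mu={mu:+.1f}, sigma={sigma:.1f}: {score:.1f}")

best = max(scores, key=scores.get)
```

Higher (less negative) values mean the proposal is more compatible with both the priors and the data. Here `best` should be the proposal closest to the values used to simulate `toy_data`.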

Let's plot how the log-likelihood varies with $\mu$ and $\sigma$.
Because `log_like` includes the priors,
it equals the log posterior up to an additive constant,
so this gives us a great way to visualize the posterior distribution space.

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns
from ipywidgets import interact, FloatSlider

mu = FloatSlider(min=-3, max=5, value=0, step=0.1)
sigma = FloatSlider(min=0.1, max=10, value=1, step=0.1)

@interact(mu=mu, sigma=sigma)
def plot_univariate_posterior(mu, sigma):
    mu_range = np.linspace(-3, 5)
    sigma_range = np.linspace(0.1, 10)
    # Vary one parameter at a time while holding the other at its slider value.
    ll_sigma = log_like(mu, sigma_range, data)
    ll_mu = log_like(mu_range, sigma, data)
    fig, ax = plt.subplots(figsize=(8,4), nrows=1, ncols=2)
    ax[0].plot(sigma_range, ll_sigma)
    ax[0].set_xlabel(r"$\sigma$")
    ax[1].plot(mu_range, ll_mu)
    ax[1].set_xlabel(r"$\mu$")
    sns.despine()
    plt.tight_layout()

In [None]:
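import pandas as pd
import seaborn as sns

# `points` is not defined earlier in this notebook. As an assumption,
# evaluate log_like at random (mu, sigma) proposals drawn from the slider ranges.
rng = np.random.default_rng(42)
points = [
    {"mu": m, "sigma": s, "ll": float(np.squeeze(log_like(m, s, data)))}
    for m, s in zip(rng.uniform(-3, 5, 500), rng.uniform(0.1, 10, 500))
]
lldf = pd.DataFrame(points)

sns.scatterplot(data=lldf, x="mu", y="sigma", hue="ll")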

In [None]:
mus = np.linspace(0, 6, 100)
sigmas = np.linspace(0.1, 20, 100)

MUS, SIGMAS = np.meshgrid(mus, sigmas)

In [None]:
import matplotlib.pyplot as plt

ll = log_like(MUS, SIGMAS, data)
plt.contour(MUS, SIGMAS, ll, levels=100)
plt.xlabel(r"$\mu$")
plt.ylabel(r"$\sigma$")
plt.colorbar(label="joint log-likelihood")
ll
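
Once the log-likelihood has been evaluated on a grid, the highest point of the surface can be read off directly. The sketch below is self-contained and makes several assumptions: it simulates its own `toy_data`, uses the priors as written in the model specification (with $\lambda$ treated as a rate), and picks a grid that covers the mean used to simulate the data. The key detail is the broadcasting: the observations sit on a leading axis and are summed away, leaving one value per grid cell.

```python
import numpy as np
from scipy.stats import norm, expon

rng = np.random.default_rng(0)
toy_data = rng.normal(loc=-1, scale=1, size=100)  # stand-in for the real data

# Assumed grid; chosen so that it contains the region where the data live.
mus = np.linspace(-3, 3, 121)
sigmas = np.linspace(0.1, 5, 99)
MUS, SIGMAS = np.meshgrid(mus, sigmas)

# Shape (100, 1, 1) against (99, 121) broadcasts to (100, 99, 121);
# summing over axis 0 adds up the per-observation log-densities.
data_ll = norm(loc=MUS, scale=SIGMAS).logpdf(toy_data[:, None, None]).sum(axis=0)
log_prior = norm(loc=3, scale=1).logpdf(MUS) + expon(scale=1 / 13).logpdf(SIGMAS)
ll = data_ll + log_prior

# Convert the flat index of the maximum back into (row, column) grid indices.
row, col = np.unravel_index(np.argmax(ll), ll.shape)
mu_best, sigma_best = MUS[row, col], SIGMAS[row, col]
print(mu_best, sigma_best)
```

The grid maximum is only as precise as the grid spacing, and evaluating every cell becomes expensive quickly as the number of parameters grows, which is part of the motivation for using a sampler instead.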