In [None]:
%load_ext autoreload
%autoreload 2
%matplotlib inline
%config InlineBackend.figure_format = 'retina'

## Learning Objectives

In this notebook, we're going to get some practice writing data generating processes,
and calculating joint likelihoods between our data and model,
using the SciPy statistics library.

## Simulating coin flips (again!)

We're going to stick with coin flip simulations, because they are:

1. incredibly simple,
2. incredibly informative,
3. incredibly extensible.

This time, though, we're going to construct a model of coin flips
that no longer involves a fixed/known $p$,
but instead involves a $p$ that is not precisely known.

### Algorithmic protocol

If we have a $p$ that is not precisely known, we can set it up by instantiating a probability distribution for it, rather than a fixed value.

How do we decide what distribution to use?
Primarily, the criterion that should guide us is the _support_ of the distribution,
that is, the range of values for which the probability distribution is valid.

$p$ must be a value that is bounded between 0 and 1.
As such, the choice of probability distribution for $p$ is most intuitively the Beta distribution,
which provides a probability distribution over the interval $[0, 1]$.

Taking that value drawn from the Beta, we can pass it into the Bernoulli distribution,
and then draw an outcome (either 1 or 0).
In doing so, we now have the makings of a __generative model__ for our coin flip data!
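
Written in distributional notation, the two-step protocol above looks roughly like this. The symbol $y$ for a single flip outcome is introduced here for illustration, and $\alpha$ and $\beta$ stand for the Beta distribution's shape parameters, whose values have not been chosen yet:

$$
p \sim \text{Beta}(\alpha, \beta), \qquad y \mid p \sim \text{Bernoulli}(p)
$$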

### Generating in code

Let's see the algorithmic protocol above implemented in code!

In [None]:
from scipy import stats as sts
import numpy as np

def coin_flip_generator() -> np.ndarray:
    """
    Coin flip generator for a `p` that is not precisely known.
    """
    p = sts.beta(a=10, b=10).rvs(1)
    result = sts.bernoulli(p=p).rvs(1)
    return result

coin_flip_generator()

## Prior Information

The astute eyes amongst you will notice 
that the Beta distribution has parameters of its own,
so how do we instantiate that?
Well, one thing we can do is bring in some _prior information_ to the problem.

Is our mental model of this coin that it behaves like billions of other coins in circulation,
in that it will generate outcomes with basically equal probability?
Turns out, the Beta distribution can assign credibility in this highly opinionated fashion!
And by doing so, we are injecting _prior information_
by instantiating a Beta _prior distribution_.

In [None]:
from ipywidgets import FloatSlider, interact, Checkbox
import matplotlib.pyplot as plt
import numpy as np


alpha = FloatSlider(value=2, min=1.0, max=100, step=1, description=r'$\alpha$')
beta = FloatSlider(value=2, min=1.0, max=100, step=1, description=r'$\beta$')
equal = Checkbox(value=False, description=r"set $\beta$ to be equal to $\alpha$")

@interact(alpha=alpha, beta=beta, equal=equal)
def visualize_beta_distribution(alpha, beta, equal):
    if equal:
        beta = alpha
    dist = sts.beta(a=alpha, b=beta)
    xs = np.linspace(0, 1, 100)
    ys = dist.pdf(xs)
    plt.plot(xs, ys)
    plt.title(fr"$\alpha$={alpha}, $\beta$={beta}")


As you play around with the sliders, notice how when you increase both $\alpha$ and $\beta$,
the width of the probability distribution decreases,
while the height of the maximum value increases,
thus reflecting greater _certainty_ in what values for $p$ get drawn.
Using this _prior distribution_ on $p$, we can express what we think is reasonable
given _prior knowledge_ of our system.
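
To put numbers on the narrowing, here is a minimal sketch that prints the standard deviation of a symmetric Beta distribution as both shape parameters grow. The values in `shape_values` are arbitrary choices for illustration, not recommendations.

```python
from scipy import stats as sts

# Arbitrary illustrative values; alpha and beta are set equal,
# so every distribution below is centred on 0.5.
shape_values = [1, 2, 10, 100]

spreads = []
for shape in shape_values:
    dist = sts.beta(a=shape, b=shape)
    spreads.append(dist.std())
    print(f"alpha = beta = {shape:>3}: mean = {dist.mean():.2f}, std = {dist.std():.3f}")
```

The mean stays at 0.5 throughout, while the standard deviation shrinks as the shape parameters increase.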

### Justifying priors

Some of you, at this point, might be wondering - is there an algorithmic protocol for justifying our priors too?
Can we somehow "just pass our priors into a machine and have it tell us if we're right or wrong"?

It's a great wish, but remains just that: wishful thinking.
It is much like the "Aye Eye Drug", for which a disease is plugged in
and the target and molecule are spat out.
(I also find it to not be an inspiring goal,
as the fun of discovery is removed.)

Rather, as with all modelling exercises,
I advocate for human debate about the model.
After all, humans are the ones taking action based on, and being affected by, the modelling exercise.
There are a few questions we can ask to help us decide:

- Are the prior assumptions something a _reasonable_ person would make?
- Is there evidence that lies outside of our problem that can help us justify these priors?
- Is there a _practical_ difference between two different priors?
- In the limit of infinite data, do various priors converge? (We will see later how this convergence can happen.)

## Exercises

It's time for some exercises.

### Exercise: Control prior distribution

In this first exercise, I would like you to modify the `coin_flip_generator` function
such that it allows a user to control what the prior distribution on $p$ should look like
before returning outcomes drawn from the Bernoulli.

Be sure to check that the values of `alpha` and `beta` are valid values, i.e. floats greater than zero.

In [None]:
def coin_flip_generator_v2(alpha: float, beta: float) -> np.ndarray:
    """
    Coin flip generator for a `p` that is not precisely known.
    """
    # Beta shape parameters must be strictly positive, so zero is rejected too.
    if alpha <= 0:
        raise ValueError(f"alpha must be positive, but you passed in {alpha}.")
    if beta <= 0:
        raise ValueError(f"beta must be positive, but you passed in {beta}.")
    p = sts.beta(a=alpha, b=beta).rvs(1)
    result = sts.bernoulli(p=p).rvs(1)
    return result

### Exercise: Simulate data

Now, simulate data generated from your new coin flip generator.

In [None]:
from typing import List


def generate_data(n_draws: int, alpha: float, beta: float) -> np.ndarray:
    """
    Generate n draws from the coin flip generator.
    """
    data = [coin_flip_generator_v2(alpha, beta) for _ in range(n_draws)]
    return np.array(data).flatten()

generate_data(50, alpha=5, beta=1)

With that written, we now have a "data generating" function!
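
One quick sanity check on a data generating function is to compare the fraction of heads it produces with the prior mean $\alpha / (\alpha + \beta)$, which is the expected value of each flip. The sketch below uses a hypothetical vectorized helper, `simulate_flips`, built on NumPy's random generator rather than on `coin_flip_generator_v2`, so that it runs on its own. The seed and sample size are arbitrary.

```python
import numpy as np


def simulate_flips(n_draws: int, alpha: float, beta: float, seed: int = 42) -> np.ndarray:
    """Hypothetical helper: draw a fresh p from Beta(alpha, beta) for every flip."""
    rng = np.random.default_rng(seed)
    ps = rng.beta(alpha, beta, size=n_draws)  # one p per flip
    return rng.binomial(n=1, p=ps)            # one Bernoulli outcome per p


flips = simulate_flips(10_000, alpha=5, beta=1)
prior_mean = 5 / (5 + 1)
print(f"fraction of heads: {flips.mean():.3f}, prior mean: {prior_mean:.3f}")
```

With enough draws, the two numbers should land close to each other; a large gap would suggest a bug in the generator.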

## Joint likelihood

Remember back in the first notebook how we wrote about evaluating the joint likelihood of multiple coin flips
against an assumed Bernoulli model?

We wrote a function that looked something like the following:

```python
import numpy as np
from scipy import stats as sts
from typing import List

def likelihood(data: List[int]):
    c = sts.bernoulli(p=0.5)
    # np.prod replaces np.product, which was removed in NumPy 2.0
    return np.prod(c.pmf(data))
```

Now, if $p$ is something that is not precisely known,
then any "guesses" of $p$ will have to be subject to the Likelihood principle too,
which means that we need to jointly evaluate the likelihood of $p$ and our data.
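
Written out for a Beta(10, 10) prior and $N$ observed flips $y_1, \dots, y_N$ (the symbols $y_i$, $N$, and $L$ are introduced here for illustration), the joint likelihood is the prior density at $p$ multiplied by the Bernoulli probability of each flip:

$$
L(p, y_1, \dots, y_N) = \text{Beta}(p \mid 10, 10) \prod_{i=1}^{N} p^{y_i} (1 - p)^{1 - y_i}
$$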

Let's see that in code:

In [None]:
def joint_likelihood(data: List[int], p: float) -> float:
    p_like = sts.beta(a=10, b=10).pdf(p)  # our priors are stated here
    data_like = sts.bernoulli(p=p).pmf(data)
    
    # np.prod replaces np.product, which was removed in NumPy 2.0
    return np.prod(data_like) * np.prod(p_like)

joint_likelihood([1, 1, 0, 1], 0.3)

## Joint _log_-likelihood

Because each likelihood term is typically a number smaller than 1,
multiplying many of them together
can give a result too small for floating-point numbers to represent (underflow).
As such, we often take the log of the likelihood.
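
To see the underflow concretely, here is a minimal sketch comparing a raw product of likelihoods with the sum of their logs. The choice of 2000 fair-coin flips, each with likelihood 0.5, is an arbitrary illustration.

```python
import numpy as np

# 2000 fair-coin flips, each with likelihood 0.5 (an arbitrary illustrative size).
likelihoods = np.full(2000, 0.5)

product = np.prod(likelihoods)         # 0.5 ** 2000 is too small for a 64-bit float
log_sum = np.sum(np.log(likelihoods))  # the same quantity on the log scale

print(product)
print(log_sum)
```

The product collapses to zero, so every candidate model would look equally impossible, while the log-scale sum remains an ordinary finite number that can still be compared across models.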

### Exercise: Implementing joint _log_-likelihood

Doing this means we can use summations on our likelihood calculations,
rather than products:

In [None]:
def joint_loglike(data: List[int], p: float) -> float:
    p_loglike = sts.beta(a=10, b=10).logpdf(p)  # our priors are stated here
    data_loglike = sts.bernoulli(p=p).logpmf(data)
    
    return np.sum(data_loglike) + np.sum(p_loglike)


joint_loglike([1, 1, 0, 1], 0.3)

### Exercise: Confirm equality

Now confirm that the joint log-likelihood has the same value as the log of the joint likelihood,
up to floating-point precision.

In [None]:
np.log(joint_likelihood([1, 1, 0, 1], 0.3))
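
One way to make the comparison explicit is `np.isclose`, which tests equality up to a floating-point tolerance. The sketch below restates both quantities inline, for the same data and the same `p`, so that it runs on its own; it assumes the default tolerances of `np.isclose` are adequate for this check.

```python
import numpy as np
from scipy import stats as sts

data, p = [1, 1, 0, 1], 0.3
prior = sts.beta(a=10, b=10)  # same Beta(10, 10) prior as in the functions above
flips = sts.bernoulli(p=p)

# Log of the joint likelihood: multiply first, then take the log.
log_of_joint = np.log(prior.pdf(p) * np.prod(flips.pmf(data)))
# Joint log-likelihood: take logs first, then add.
joint_log = prior.logpdf(p) + np.sum(flips.logpmf(data))

print(log_of_joint, joint_log)
print(np.isclose(log_of_joint, joint_log))
```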