In [None]:
%load_ext autoreload
%autoreload 2
%matplotlib inline
%config InlineBackend.figure_format = 'retina'

## Introduction

In this chapter, we are going to build off the knowledge you've learned in the previous chapters,
and introduce you to the wonderful world of probabilistic inference using PyMC3!

## Recap

Let's recap what you've learned so far.

Thus far, you have encountered:

- The basics of probability, and how joint probability links to joint modelling between data and parameters.
- How to simulate data generating processes, and evaluate the joint likelihood of data and parameter "guesses".

In short, you have learned all about the so-called "forward" pass of modelling.
(Take the term "forward" with a grain of salt; it is meant to be an idea,
not an official term that belongs to the discipline of statistics.)
We introduced you to the term **prior belief** as well.
In this chapter, we will be performing _inference_, or in other words,
taking principled guesses at which "set" of parameter values best explains our data,
conditioned on our original hypotheses about what their values should be.
Put differently, we will be calculating posterior beliefs having seen the data.

## Probabilistic Modelling

In simulating the data generating process with probability distributions,
you wrote a **"probabilistic"** model.
Other names for this include a "stochastic" model, or a "non-deterministic" model.

In the spirit of sticking with simple examples, we are going to continue exploring the classic coin flip model.
Though it might seem like the example has been beaten to death, stick with me,
as it's a very useful pedagogical tool.
Once we've graduated from the classic coin flip,
you'll be equipped with the right abstractions to handle other models easily!

Let's take that coin flip model with Beta-distributed $p$, and implement it in PyMC3 code.

In mathy syntax, the model can be expressed as follows:

$$ p \sim Beta(\alpha=2, \beta=2)$$
$$ Y \sim Bernoulli(p=p)$$

where:

- $Y$ is the random variable modelling coin flip results,
- $p$ is the key parameter of the Bernoulli distribution likelihood, which is used to model the space of possibilities for $Y$,
- and $\alpha$ and $\beta$ are the key parameters of the Beta distribution, which is used to model the space of possibilities for $p$. Having them set to $2$ each is a modelling decision that I have taken.
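
Read top to bottom, the two lines of math are a recipe for simulating data: first draw $p$, then draw coin flips given that $p$. Here is a minimal sketch of that recipe using only Python's standard library. The function name `simulate_coin_flips` and its arguments are made up for illustration and are not part of the tutorial's code.

```python
import random


def simulate_coin_flips(n_flips, alpha=2.0, beta=2.0, seed=None):
    """Draw p from the Beta prior, then n_flips Bernoulli(p) outcomes."""
    rng = random.Random(seed)
    # p ~ Beta(alpha, beta)
    p = rng.betavariate(alpha, beta)
    # Y ~ Bernoulli(p): 1 is heads, 0 is tails
    flips = [1 if rng.random() < p else 0 for _ in range(n_flips)]
    return p, flips


p, flips = simulate_coin_flips(n_flips=12, seed=42)
print(f"p = {p:.3f}, heads = {sum(flips)} of {len(flips)}")
```

Every call with a different seed gives a different $p$ and a different set of flips, which is exactly what "a distribution over possible data" means.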

To jog your memory, here's how we wrote the data generating process
in the previous notebook using `scipy.stats`:

In [None]:
from bayes_tutorial.solutions.simulation import coin_flip_generator_v2
from inspect import getsource

coin_flip_generator_v2??

Before we move on, we should also see what the state of our priors looks like, having not seen any data.

### Exercise

Plot the PDF of the prior distribution of $p$. 

Hints to help you along:

- You'll need to access the `.pdf()` method of the Beta distribution object.
- Remember that the `.pdf()` function takes in `x`, on which you evaluate the PDF.

In [None]:
import matplotlib.pyplot as plt
from scipy.stats import beta
import numpy as np
from bayes_tutorial.solutions.inference import plot_betadist_pdf

# This is the answer below
plot_betadist_pdf(2, 2)

# Your answer below

This probability density function describes what we believe about the likelihood of $p$,
having not seen the data.
It is centered on 0.5, which means we generally ascribe highest likelihood to a "fair coin",
but it is also wide, with quite a bit of density
outside of the vicinity of 0.5.
We ascribe very little likelihood at 0 and 1.
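
To put rough numbers on those statements, we can evaluate the Beta density by hand at a few points. This is a small standard-library sketch, not the plotting solution to the exercise; `beta_pdf` is a helper name I made up, and it assumes both shape parameters are at least 1.

```python
import math


def beta_pdf(x, a, b):
    """Density of Beta(a, b) at x, for 0 <= x <= 1 and a, b >= 1."""
    norm = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return norm * x ** (a - 1) * (1 - x) ** (b - 1)


for x in (0.0, 0.1, 0.25, 0.5, 0.75, 0.9, 1.0):
    print(f"x = {x:.2f}  pdf = {beta_pdf(x, 2, 2):.3f}")
```

For $\alpha = \beta = 2$ the density simplifies to $6x(1-x)$, so it peaks at $x = 0.5$ with a value of 1.5 and falls to exactly 0 at both ends.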

## Coin Flip in PyMC3

Let's now see how we can convert this coin flip model into a PyMC3 model.

In [None]:
import pymc3 as pm

with pm.Model() as model:
    p = pm.Beta("p", alpha=2, beta=2)
    data = pm.Bernoulli("data", p=p)

Just like that, we have specified a probabilistic model for coin flips!
Notice how, first of all, the syntax matches very closely
to how the probabilistic model is written in traditional math syntax,
as well as the `scipy.stats` syntax.
In particular,

- `p` is the random variable that models the possible space of Bernoulli parameter $p$s,
- `data` is the random variable that models the possible space of data that we could generate.

It should be clear that by expressing our model using the language of probability distributions,
we gain the ability to concisely write down statistical models.

Now, how does data come into play here?

Well, what we do is to "condition" the model on observed data
by passing in data to the random variable. Let's see this in action:

In [None]:
from bayes_tutorial.solutions.inference import coin_flip_data

data = coin_flip_data()  # 12 flips, comprising 8 heads and 4 tails.

with pm.Model() as model:
    p = pm.Beta("p", alpha=2, beta=2)
    data = pm.Bernoulli("data", p=p, observed=data)

## Inferential Procedure

Now we come to the point you've all been waiting for, after all of the basics:
How do we perform inference on the key parameter $p$?
What should we believe about the parameter $p$,
conditioned on our priors and the data we have observed?

To do this, we're going to follow an inferential procedure
that you will see over and over in this tutorial.

### The Inference Button! (tm)

First off, we're going to hit the inference button below.

In [None]:
with model:
    trace = pm.sample(2000)

If there are any warnings that show up, we can ignore them for a moment.

### What exactly is the inference button doing?

Thomas Wiecki, one of the core maintainers of the PyMC3 library,
coined the term "The Inference Button" to describe the spirit of the PyMC library.

Given a probabilistic model, in which we jointly model our parameters and data,
the posterior distribution is given by a single equation, Bayes' rule.
Calculating the posterior exactly is, in the vast majority of cases, intractable,
so we leverage Markov Chain Monte Carlo sampling
to help us figure out what the shape of the posterior looks like.
(You got a taste of Monte Carlo sampling in the last notebook, right at the end!)
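
For a model with a single parameter, like our coin flip, we can still apply Bayes' rule by brute force: evaluate prior times likelihood on a fine grid of $p$ values and normalize. The sketch below does this with the standard library only, assuming the 8 heads and 4 tails used in this notebook. `grid_posterior` is a name I made up. Grids stop being practical once a model has more than a handful of parameters, which is where MCMC earns its keep.

```python
import math


def grid_posterior(heads, tails, alpha=2.0, beta=2.0, n_grid=1001):
    """Approximate the posterior of p on an evenly spaced grid over (0, 1)."""
    # Use cell midpoints so that we never take log(0).
    grid = [(i + 0.5) / n_grid for i in range(n_grid)]
    unnormalized = []
    for p in grid:
        log_prior = (alpha - 1) * math.log(p) + (beta - 1) * math.log(1 - p)
        log_like = heads * math.log(p) + tails * math.log(1 - p)
        unnormalized.append(math.exp(log_prior + log_like))
    total = sum(unnormalized)
    # Normalizing plays the role of the denominator in Bayes' rule.
    weights = [u / total for u in unnormalized]
    return grid, weights


grid, weights = grid_posterior(heads=8, tails=4)
post_mean = sum(p * w for p, w in zip(grid, weights))
print(f"posterior mean of p is roughly {post_mean:.3f}")
```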

PyMC3's `pm.sample(n_steps)` function does the MCMC sampling for us.
Along the way, it abstracts away and automates a bunch of steps
that we would otherwise have to do on our own.
_Are you curious to know more about what happens behind-the-scenes?_
_Check out [this introductory explainer that I wrote][essays]!_

For readers and learners who can deal with a bit more jargon,
here's a bit more detail, written such that the intuition is conveyed
(without the math).

Firstly, the MCMC sampler gets initialized in an arbitrary region of the posterior distribution space.
Then, the sampler "warms up" and tries to work its way
to the ["typical set"][typicalset] of the posterior distribution.
Finally, it begins sampling around the typical set,
in this way simulating/calculating the posterior distribution.
(Warning, the "typical set" Wikipedia page linked above contains a ton of math.)
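
To make the initialize, warm up, then sample story concrete, here is a toy random-walk Metropolis sampler for the coin flip posterior, written with the standard library only. Please read it as an illustration of the idea rather than a description of PyMC3's internals: for continuous parameters PyMC3 defaults to the more sophisticated NUTS sampler. The function names, the starting point, and the step size are all choices I made up for this sketch.

```python
import math
import random


def log_posterior(p, heads=8, tails=4, alpha=2.0, beta=2.0):
    """Unnormalized log posterior of p (Beta prior times Bernoulli likelihood)."""
    if not 0.0 < p < 1.0:
        return -math.inf
    return (alpha - 1 + heads) * math.log(p) + (beta - 1 + tails) * math.log(1 - p)


def metropolis(n_draws=2000, n_warmup=1000, start=0.02, step=0.2, seed=0):
    """Random-walk Metropolis: returns n_draws samples kept after warm-up."""
    rng = random.Random(seed)
    # Deliberately start in an unlikely region of the posterior.
    current, current_lp = start, log_posterior(start)
    draws = []
    for i in range(n_warmup + n_draws):
        proposal = current + rng.gauss(0.0, step)
        proposal_lp = log_posterior(proposal)
        # Accept with probability min(1, posterior(proposal) / posterior(current)).
        if math.log(1.0 - rng.random()) < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
        # Warm-up steps are discarded; only later steps are kept.
        if i >= n_warmup:
            draws.append(current)
    return draws


draws = metropolis()
print(f"mean of sampled p: {sum(draws) / len(draws):.3f}")
```

Notice that the sampler never needs the normalizing constant of the posterior, only ratios of unnormalized densities.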

[essays]: https://ericmjl.github.io/essays-on-data-science/machine-learning/computational-bayesian-stats/
[typicalset]: https://en.wikipedia.org/wiki/Typical_set

### ArviZ

Now that inference has completed, we will obtain a `trace` object, which will contain samples from the posterior distribution.
Together, those samples form an approximation of the true posterior.

To visualize the posterior, we are going to bring in a companion tool called [ArviZ][arviz],
which provides an API that facilitates the visual exploration of Bayesian model outputs.
The output of Bayesian inferential protocols is a rich, multi-dimensional data structure,
and the ArviZ devs have spent countless hours getting the core data structure right
so that the API built around it can be intuitive and helpful.

To get started, we have to convert the PyMC3 trace object into an ArviZ `InferenceData` object.

[arviz]: https://arviz-devs.github.io/arviz/

In [None]:
import arviz as az

with model:
    trace = az.from_pymc3(trace)

Let's now inspect what the trace `InferenceData` object looks like.

In [None]:
trace

Thanks to the beautiful HTML representation of the InferenceData object, we can interactively explore it.

There are a few things to look at.

- `posterior`: Holds the posterior distribution objects. This is what we will be most commonly interacting with.
- `log_likelihood`: Holds the log-likelihood calculations that happened at each step of sampling. In the vast majority of applied cases, we will not need to dig into this.
- `sample_stats`: Holds information about the MCMC sampler at each sampling step. In the vast majority of applied cases, we will not need to dig into this.
- `observed_data`: As per the name. Can be handy when debugging. But as with the previous two, in the vast majority of applied cases, we will not need to dig into this.

### Visualize your posterior... distribution

Now, let's get a feel for the tools that are used for visualization of the posterior distribution.

The first visualization tool that we can use to get a handle over our posterior distribution is the `az.plot_posterior` function.

In [None]:
az.plot_posterior(trace);

From this plot, we can tell that once we condition our model on our data,
we believe that our parameter `p` should be centered around 0.62,
with 94% of our credibility points being allotted between the values 0.41 and 0.84.
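
For this particular model we can sanity-check the sampler against an exact answer. The Beta prior is conjugate to the Bernoulli likelihood, so the posterior is again a Beta distribution, with the counts of heads and tails added to $\alpha$ and $\beta$. The helper name below is made up for illustration; the counts are the 8 heads and 4 tails used above.

```python
def beta_bernoulli_update(alpha, beta, heads, tails):
    """Exact posterior parameters for a Beta prior and Bernoulli data."""
    return alpha + heads, beta + tails


post_alpha, post_beta = beta_bernoulli_update(alpha=2, beta=2, heads=8, tails=4)
post_mean = post_alpha / (post_alpha + post_beta)
print(f"posterior: Beta({post_alpha}, {post_beta}), mean = {post_mean:.3f}")
# prints "posterior: Beta(10, 6), mean = 0.625"
```

The exact posterior mean of 0.625 agrees with the value of roughly 0.62 read off the plot. Most models do not have such a shortcut, which is why we lean on sampling in general.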

### Credible vs. Confidence Interval

I feel compelled to interject a point here:
the interval that holds 94% of our credibility points is the "highest density interval",
or "94% credible interval".
This interval has a very direct and simple interpretation:
having conditioned our prior belief on data,
we believe with 94% probability that the true parameter value
lives within this interval range.
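
If you are curious how such an interval can be computed from posterior samples, one simple approach is to sort the samples and find the narrowest window that contains 94% of them. The sketch below is a standard-library illustration, not ArviZ's implementation (ArviZ ships its own `az.hdi`). To stay self-contained it draws samples from Beta(10, 6), which is the exact posterior for this model given 8 heads and 4 tails, instead of reusing the MCMC trace.

```python
import random


def hdi(samples, prob=0.94):
    """Narrowest interval containing roughly `prob` of the samples (prob < 1)."""
    ordered = sorted(samples)
    n = len(ordered)
    k = int(prob * n)  # how many positions the window spans
    # Slide the window across the sorted samples and keep the narrowest one.
    best = min(range(n - k), key=lambda i: ordered[i + k] - ordered[i])
    return ordered[best], ordered[best + k]


rng = random.Random(0)
posterior_samples = [rng.betavariate(10, 6) for _ in range(20000)]
low, high = hdi(posterior_samples, prob=0.94)
print(f"94% HDI: ({low:.2f}, {high:.2f})")
```

The result should land close to the interval shown in the plot above, up to Monte Carlo noise.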

**The credible interval has nothing to do with the "confidence interval"**
that you may have learned in classical statistics.
When you calculate a confidence interval,
the interpretation is that
_in the limit of large number of trials $N$_,
we will calculate $N$ $\alpha \%$ confidence intervals,
and $\alpha \%$ of them will contain the true value.
Does that sound convoluted to you, because of the "large number of trials",
and "large number of confidence intervals"?
If so, you're not alone. We think that's convoluted too :).
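
The "large number of trials" interpretation is easier to see in a simulation. The sketch below repeats a made-up experiment many times (30 draws from a normal distribution with a known standard deviation), builds a 95% confidence interval for the mean each time, and counts how often the interval contains the true mean. All the numbers and names here are my own assumptions, chosen only for illustration.

```python
import math
import random


def ci_coverage(true_mean=0.0, sigma=1.0, n_obs=30, n_trials=2000, z=1.96, seed=0):
    """Fraction of repeated-experiment 95% intervals that contain the true mean."""
    rng = random.Random(seed)
    # Known-sigma z-interval: sample mean plus or minus z * sigma / sqrt(n).
    half_width = z * sigma / math.sqrt(n_obs)
    hits = 0
    for _ in range(n_trials):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n_obs)]
        sample_mean = sum(sample) / n_obs
        if sample_mean - half_width <= true_mean <= sample_mean + half_width:
            hits += 1
    return hits / n_trials


coverage = ci_coverage()
print(f"fraction of intervals containing the true mean: {coverage:.3f}")
```

The 95% is a statement about the procedure across many repeated experiments, not a probability statement about the parameter given the one dataset we actually have.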

### Inspecting the chain

Whenever we do MCMC sampling, we must always inspect the sampling trace.
The sampling trace records every single value that was _accepted_ in MCMC sampling,
which gives us the ability to inspect whether the MCMC sampling went well.

To inspect the chain, we call upon the `az.plot_trace(trace)` function.

In [None]:
az.plot_trace(trace);

What are these plots and how do we interpret them?

The left plot shows a kernel density estimate (KDE) of the posterior distribution.
The x-axis is the posterior distribution support,
while the height is the KDE likelihood estimate.

The right plot shows the trace values.
The x-axis is sampling step (we drew 2000 samples per chain above),
and the y-axis is the sampled value.

The left plot is essentially the right plot collapsed across the time axis.

As a matter of practice, the trace should look like a "hairy caterpillar",
and should show no trends (i.e. moving upwards or downwards) anywhere.
Trends indicate that the MCMC sampler has not fully "warmed up",
and is still trying to find its way to the typical set of the posterior distribution.
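
Eyeballing caterpillars can be complemented with a number. One classic diagnostic is the Gelman-Rubin statistic, often written $\hat{R}$, which compares the variance between chains to the variance within chains; values close to 1 suggest the chains agree. Below is a standard-library sketch of the classic, non-split version, using simulated chains that I made up. ArviZ provides `az.rhat`, which uses a more refined variant, so treat this only as intuition. Note that this simple version flags chains that disagree with each other; it would not catch a trend shared identically by all chains.

```python
import random
from statistics import mean, variance


def gelman_rubin(chains):
    """Classic (non-split) R-hat for a list of equal-length chains."""
    n = len(chains[0])
    chain_means = [mean(chain) for chain in chains]
    between = n * variance(chain_means)                   # B
    within = mean([variance(chain) for chain in chains])  # W
    pooled = (n - 1) / n * within + between / n
    return (pooled / within) ** 0.5


rng = random.Random(0)
# Four chains exploring the same region: what we hope to see.
good_chains = [[rng.gauss(0.6, 0.1) for _ in range(500)] for _ in range(4)]
# Four chains stuck in different regions: a sign that sampling went wrong.
stuck_chains = [
    [rng.gauss(center, 0.1) for _ in range(500)] for center in (0.3, 0.5, 0.7, 0.9)
]

print(f"R-hat, well-mixed chains: {gelman_rubin(good_chains):.3f}")
print(f"R-hat, stuck chains:      {gelman_rubin(stuck_chains):.3f}")
```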

## Recapping thus far

Up till this point, we have done the following:

- Specified a joint probabilistic model for coin flips and its key parameter $p$, the probability of heads, and performed inference,
- Visualized the posterior distribution of $p$

## Further Exercises

To help you get further familiarity with the basics of PyMC3, here are a few more exercises to work through.

### Exercise: Estimate rate of car crashes

Step 1: Build the probabilistic model for car crashes in PyMC3

- Car crashes, which are integer counts of things that happen with a given rate, generally follow a [Poisson] distribution.
- The Poisson distribution has a "rate" parameter `mu`, which must be positive, so its prior should only place probability on positive values. The [Exponential] distribution is a pragmatic choice here.

[Poisson]: https://docs.pymc.io/api/distributions/discrete.html#pymc3.distributions.discrete.Poisson

[Exponential]: https://docs.pymc.io/api/distributions/continuous.html#pymc3.distributions.continuous.Exponential
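
If you would like something to check your MCMC result against, this pairing also has a closed-form posterior. An Exponential prior with rate $\lambda$ is a Gamma distribution with shape 1 and rate $\lambda$, and the Gamma is conjugate to the Poisson, so the posterior is Gamma with shape $1 + \sum_i y_i$ and rate $\lambda + n$. The sketch below uses made-up weekly counts and an assumed prior rate of 1; it is not the tutorial's data and not the solution model.

```python
def exponential_poisson_posterior(counts, prior_rate=1.0):
    """Gamma(shape, rate) posterior for a Poisson rate with an Exponential prior."""
    shape = 1 + sum(counts)
    rate = prior_rate + len(counts)
    return shape, rate


# Hypothetical weekly crash counts, invented for this example.
hypothetical_counts = [3, 1, 4, 2, 5, 2, 3]
shape, rate = exponential_poisson_posterior(hypothetical_counts, prior_rate=1.0)
print(f"posterior: Gamma(shape={shape}, rate={rate}), mean = {shape / rate:.3f}")
```

With your own prior rate and the real data plugged in, the posterior mean from your PyMC3 model should come out close to `shape / rate`.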

In [None]:
from bayes_tutorial.solutions.inference import car_crash_data, car_crash_model_generator

data = car_crash_data()

# This is one of an infinite set of correct answers.
car_crash_model = car_crash_model_generator()

# Comment out the line above and specify your model below


Step 2: Perform inference

In [None]:
from bayes_tutorial.solutions.inference import model_inference_answer

# This is the correct answer
trace = model_inference_answer(car_crash_model)

# Comment out the line above and write your answer below


Step 3: Inspect model trace and posteriors

In [None]:
from bayes_tutorial.solutions.inference import model_trace_answer

model_trace_answer(trace)

# Your answer below


In [None]:
from bayes_tutorial.solutions.inference import model_posterior_answer

model_posterior_answer(trace)

Having seen the data, what do we believe about the rate of car crashes per week?

In [None]:
from bayes_tutorial.solutions.inference import car_crash_interpretation

print(car_crash_interpretation())

### Exercise: Estimate finch beaks mean and variance

Step 1: Build the probabilistic model for finch beaks in PyMC3

In [None]:
from bayes_tutorial.solutions.inference import finch_beak_data

data = finch_beak_data()
data

Step 2: Perform inference

In [None]:
from bayes_tutorial.solutions.inference import finch_beak_model_generator

# This is one of an infinite set of "correct" answers:
finch_beak_model = finch_beak_model_generator()

# Your answer below:

In [None]:
# This is the "correct" answer:
trace = model_inference_answer(finch_beak_model)

# Your answer below:


Step 3: Inspect model trace and posteriors

In [None]:
# This is the "correct" answer:
model_trace_answer(trace)

# Your answer below:


In [None]:
# This is the "correct" answer:
model_posterior_answer(trace)

# Your answer below:


What do we believe about the:

- Expected beak length of a finch, and
- Intrinsic variance in beak lengths across all finches

having seen the data?

In [None]:
from bayes_tutorial.solutions.inference import finch_beak_interpretation

print(finch_beak_interpretation())

## Conclusion

This notebook only gave you an introduction to the basics.
In particular, you learned:

- How to build a probabilistic model with PyMC3.
- How to use ArviZ's basic tooling to visualize posterior distributions.
- Basic wording for reporting on posterior distributions.

Believe it or not, things get more complex, and hence more exciting, beyond here!

Estimation is a core activity in statistics,
and when done in a Bayesian form,
we automatically obtain uncertainties that we can _report_.
(How we use them is a different story, but stick with us to learn more!)

We're now going to continue on to the next chapter,
which is on extending the estimation model to support "multiple groups"!

In [None]:
from bayes_tutorial.solutions import inference

inference??