In [None]:
# ! pip install --upgrade pip
# ! pip install --user numpy scipy matplotlib pyhf iminuit jax corner pymc arviz
# ! pip install git+https://github.com/malin-horstmann/bayesian_pyhf.git

In [None]:
import numpy as np
import pyhf
import matplotlib.pyplot as plt
from bayesian_pyhf import infer, prepare_inference, plotting
import pymc as pm
import corner
import arviz as az

# Bayesian analysis with `pyhf`

The main job of `pyhf` is to supply a HistFactory likelihood template,

$$
        L(\boldsymbol{n}, \boldsymbol{a} \mid \boldsymbol{\eta}, \boldsymbol{\chi})=\underbrace{\operatorname{Pois}\left(
        \boldsymbol{n} \mid
        \boldsymbol{\nu}(\boldsymbol{\eta}, \boldsymbol{\chi})
        \right)}_{\text{data likelihood}}
        \underbrace{c\left(\boldsymbol{a} \mid \boldsymbol{\chi} \right)}_{\text{constraint likelihood}}.
$$

We have also seen further features built around it, such as minimizing the negative log-likelihood to fit the model.

The likelihood is a very general object, used for both frequentist and Bayesian analyses.
The constraint likelihood is the frequentist way of injecting prior knowledge for certain parameters into the full likelihood.
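To make the two factors concrete, here is a minimal hand-written sketch of such a likelihood for a single bin with one Gaussian-constrained nuisance parameter. The yields `s` and `b`, the 10% background uncertainty, and the function names are illustrative assumptions and not part of `pyhf`; the sketch only mirrors the structure of the formula above.

```python
import math

def log_poisson(n, nu):
    """log Pois(n | nu) for an integer count n and expected rate nu > 0."""
    return n * math.log(nu) - nu - math.lgamma(n + 1)

def log_normal(x, mean, sigma):
    """log of a normal density with the given mean and width, evaluated at x."""
    return -0.5 * ((x - mean) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))

def log_likelihood(n, a, mu, chi, s=50.0, b=500.0, rel_unc=0.1):
    """Toy single-bin log L(n, a | mu, chi).

    Expected rate (assumed toy model): nu = mu * s + b * (1 + rel_unc * chi)
    Constraint (auxiliary measurement): a is normally distributed around chi with width 1
    """
    nu = mu * s + b * (1.0 + rel_unc * chi)
    data_term = log_poisson(n, nu)
    constraint_term = log_normal(a, chi, 1.0)
    return data_term + constraint_term

# Moving chi away from the auxiliary measurement a = 0 is penalised by the
# constraint term (and here also by the data term, since nu moves away from n).
print(log_likelihood(n=550, a=0.0, mu=1.0, chi=0.0))
print(log_likelihood(n=550, a=0.0, mu=1.0, chi=3.0))
```

Working with the logarithm turns the product of the data and constraint likelihoods into a sum, which is also what a minimizer would typically operate on.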

Due to limited computational power in the past, HEP has widely committed to frequentist methods, but there are situations where the commonly used tools are not the most convenient. For example, dealing with multiple parameters of interest leads to non-trivial behaviour, and the standard asymptotic formulae do not apply. Physically bounded parameters also cause problems for the expected asymptotic behaviour.

Bayesian methods take a slightly different approach to statistical inference. For example, they offer a more convenient way of dealing with multiple parameters of interest and rely less on asymptotics for more complicated inference. The main object of a Bayesian analysis is the posterior distribution for a set of parameters $\boldsymbol{\theta}$, given some data $\boldsymbol{x}$,

$$
p(\boldsymbol{\theta} \mid \boldsymbol{x}) \propto p(\boldsymbol{x} \mid \boldsymbol{\theta}) p(\boldsymbol{\theta}),
$$

where $p(\boldsymbol{x} \mid \boldsymbol{\theta})$ is the likelihood and $p(\boldsymbol{\theta})$ is the prior for our parameters. Any further statistical quantity can be derived from the posterior.
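As a small illustration of "likelihood times prior", the following sketch evaluates a one-parameter posterior on a grid. The counting experiment (observed count, signal and background yields) and the flat prior range are made-up assumptions for illustration only.

```python
import math

def log_poisson(n, nu):
    """log Pois(n | nu)."""
    return n * math.log(nu) - nu - math.lgamma(n + 1)

# Toy counting experiment (assumed numbers): n_obs observed events,
# expected rate nu = mu * s + b with known s and b.
n_obs, s, b = 120, 50.0, 100.0

# Grid over the single parameter mu with a flat prior on [0, 2)
mu_grid = [i * 0.01 for i in range(200)]
prior = [1.0 for _ in mu_grid]

# Unnormalised posterior = likelihood * prior, evaluated point by point
unnorm = [math.exp(log_poisson(n_obs, mu * s + b)) * p for mu, p in zip(mu_grid, prior)]

# Normalise so that the posterior sums to one on the grid
norm = sum(unnorm)
posterior = [u / norm for u in unnorm]

# Two quantities derived from the posterior: its mode and its mean
mode = mu_grid[max(range(len(mu_grid)), key=posterior.__getitem__)]
mean = sum(mu * p for mu, p in zip(mu_grid, posterior))
print(mode, mean)
```

A grid works for one or two parameters, but the number of grid points grows exponentially with the number of parameters, which is why sampling methods are used for realistic models.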

Using `pyhf` HistFactory likelihoods to obtain a posterior is as simple as translating the constraint likelihood into priors and introducing additional priors for the unconstrained parameters,

$$
    p\left( \boldsymbol{\eta}, \boldsymbol{\chi} \vert \boldsymbol{n}, \boldsymbol{a} \right) \propto
    \underbrace{\operatorname{Pois}\left(
    \boldsymbol{n} \mid
    \boldsymbol{\nu}(\boldsymbol{\eta}, \boldsymbol{\chi})
    \right)}_{\text{data likelihood}}
    \quad
    \underbrace{p\left( \boldsymbol{\chi} | \boldsymbol{a} \right)}_{\text{constraint prior}}
    \quad
    \underbrace{p\left( \boldsymbol{\eta} \right)}_{\text{unconstrained prior}}.
$$

<span style="color:red">
How would you construct the constraint prior?
</span>

A tool for this translation exists: [bayesian_pyhf](https://github.com/malin-horstmann/bayesian_pyhf), which is explained in detail [here](https://arxiv.org/pdf/2309.17005). We will explore it on a simple example.

In [None]:
model = pyhf.simplemodels.correlated_background(
            signal=[50, 100], bkg=[500, 600], bkg_down=[490, 580], bkg_up=[510, 620]
        )

data = [600, 800]

In [None]:
# model = pyhf.simplemodels.uncorrelated_background(
#             signal=[50, 100], bkg=[500, 600], bkg_uncertainty=[20, 20]
#         )

# data = [600, 800]

In [None]:
model.spec

## 1. Fit `pyhf` model

Before evaluating the next cell: What result do you expect?

In [None]:
pyhf.set_backend("jax", pyhf.optimize.minuit_optimizer())
best_fit = pyhf.infer.mle.fit(data+model.config.auxdata, model, return_uncertainties=True)
best_fit.tolist()

## 2. Bayesian inference

One of the most controversial choices in a Bayesian analysis is the choice of a good prior. There are as many opinions on priors as there are Bayesian analysts (see e.g. [here](https://github.com/stan-dev/stan/wiki/Prior-Choice-Recommendations)).

<span style="color:red">
What prior would you choose for our signal strength?
</span>

In [None]:
unconstr_priors = {
    'mu': {'type': 'Normal', 'mu': [1.], 'sigma': [5.]},
    # 'mu': {'type': 'Uniform_Unconstrained', 'lower': [1.], 'upper': [3.]},
}

The cool thing is that the auxiliary data used to constrain parameters in the HistFactory likelihood can also be used to build the constraint prior
$$
p(\chi \mid a) \propto p(a \mid \chi) p(\chi).
$$

For one Gaussian distributed auxiliary measurement,
$$
p(a \mid \chi) = \mathcal{N}(a, \sigma_{\mathrm{aux}}), \qquad
p(\chi) = \mathcal{N}(\mu_0, \sigma_0).
$$
where $\mu_0$ and $\sigma_0$ are free choices for the ur-prior, i.e. the prior $p(\chi)$ on $\chi$ before the auxiliary measurement is taken into account.

The constraint prior can then be derived analytically as
$$
p(\chi \mid a) = \mathcal{N}(\mu', \sigma'),
$$
where
$$
\mu^{\prime}=\frac{\sigma_{\mathrm{aux}}^2 \sigma_0^2}{\sigma_{\mathrm{aux}}^2+\sigma_0^2}\left(\frac{\mu_0}{\sigma_0^2}+\frac{a}{\sigma_{\mathrm{aux}}^2}\right), \quad \sigma^{\prime 2}=\frac{\sigma_{\mathrm{aux}}^2 \sigma_0^2}{\sigma_{\mathrm{aux}}^2+\sigma_0^2}.
$$

The freedom to choose these compatible ur-priors is justified because the resulting priors will be dominated by the auxiliary measurements. In particular,
$$
\sigma_0 \gg \sigma_{\mathrm{aux}} \quad \longrightarrow \quad \mu^{\prime} \rightarrow a, \quad \sigma^{\prime} \rightarrow \sigma_{\mathrm{aux}}.
$$

For more details, and the equivalent derivation for the Poisson (Gamma) distribution, see the [bayesian_pyhf paper](https://arxiv.org/pdf/2309.17005) or the [original paper](https://www.cs.ubc.ca/~murphyk/Papers/bayesGauss.pdf).
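The Gaussian update above can be sketched as a small helper. The function name and the example numbers are illustrative assumptions, not the `bayesian_pyhf` implementation. Note that the sketch computes the combined variance $\sigma_{\mathrm{aux}}^2 \sigma_0^2 / (\sigma_{\mathrm{aux}}^2+\sigma_0^2)$ and returns its square root as the width of the constraint prior.

```python
import math

def gaussian_constraint_prior(a, sigma_aux, mu_0, sigma_0):
    """Combine a Gaussian ur-prior (mu_0, sigma_0) with one auxiliary
    measurement a of width sigma_aux.

    Returns (mu_prime, sigma_prime): mean and standard deviation of p(chi | a).
    """
    var_aux, var_0 = sigma_aux**2, sigma_0**2
    var_prime = var_aux * var_0 / (var_aux + var_0)
    mu_prime = var_prime * (mu_0 / var_0 + a / var_aux)
    return mu_prime, math.sqrt(var_prime)

# Illustrative numbers (not the defaults of any library)
print(gaussian_constraint_prior(a=1.0, sigma_aux=1.0, mu_0=0.0, sigma_0=3.0))

# A very wide ur-prior: the result approaches (a, sigma_aux)
print(gaussian_constraint_prior(a=1.0, sigma_aux=1.0, mu_0=0.0, sigma_0=1e3))
```

The second call illustrates the limit discussed above: the wider the ur-prior, the less it influences the constraint prior.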

Let us look at the priors of our simple model:


In [None]:
priorDict = prepare_inference.build_priorDict(model, unconstr_priors)
priorDict

<span style="color:red">
Check this for our background uncertainty:

Hint: in `bayesian_pyhf` $\mu_0= 0, \sigma_0=2$
</span>

In [None]:
# defaults set in bayesian_pyhf
mu_0 = 0
sigma_0 = 2

aux_obs = model.config.auxdata[0]
sigma_aux = float(model.constraint_model.constraints_gaussian.sigmas[0])

sigma_prime = ...
mu_prime = ...

mu_prime, sigma_prime

The sampling process is by far the most advanced step in a Bayesian analysis. For a long time, Bayesian analyses were limited by computational power. Nowadays, a number of libraries implement very advanced sampling algorithms ([see here for a good overview]). Here we use [pymc](https://www.pymc.io/welcome.html), which makes use of Hamiltonian Monte Carlo sampling ([more details here](https://arxiv.org/pdf/1701.02434)).

First we find the mode of the posterior, using a Maximum A Posteriori (MAP) estimate. This can be compared to the best fit point of a frequentist analysis.

Then we run a minimal sampling with 4 chains, each with 1000 tuning samples followed by 10000 draws. The chains are automatically distributed across different cores (unless otherwise specified).

<span style="color:red">

How many samples are enough?

Why are multiple chains important?

</span>
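One way to approach these questions is to compare chains started from different points. The sketch below uses a plain random-walk Metropolis sampler, which is much simpler than the Hamiltonian sampler used by `pymc`, on a made-up one-parameter posterior. It then computes a basic Gelman-Rubin $\hat{R}$ statistic. All function names, the toy posterior and the step size are assumptions for illustration.

```python
import math
import random

def log_post(mu):
    """Unnormalised toy log-posterior: Poisson(120 | 50 * mu + 100), flat prior."""
    nu = 50.0 * mu + 100.0
    if nu <= 0:
        return -math.inf
    return 120 * math.log(nu) - nu

def metropolis_chain(start, n_draws, n_tune, step, rng):
    """Random-walk Metropolis; the first n_tune samples are discarded."""
    x, lp = start, log_post(start)
    samples = []
    for i in range(n_tune + n_draws):
        prop = x + rng.gauss(0.0, step)
        lp_prop = log_post(prop)
        # Accept with probability min(1, p(prop) / p(x))
        if rng.random() < math.exp(min(0.0, lp_prop - lp)):
            x, lp = prop, lp_prop
        if i >= n_tune:
            samples.append(x)
    return samples

def r_hat(chains):
    """Basic Gelman-Rubin statistic from several equally long chains."""
    m, n = len(chains), len(chains[0])
    means = [sum(c) / n for c in chains]
    grand = sum(means) / m
    between = n / (m - 1) * sum((mu - grand) ** 2 for mu in means)
    within = sum(sum((x - mu) ** 2 for x in c) / (n - 1) for c, mu in zip(chains, means)) / m
    var_est = (n - 1) / n * within + between / n
    return math.sqrt(var_est / within)

rng = random.Random(0)
starts = [-1.0, 0.0, 1.0, 2.0]  # deliberately spread-out starting points
chains = [metropolis_chain(s0, 5000, 1000, 0.3, rng) for s0 in starts]
print(r_hat(chains))
```

If all chains have converged to the same distribution, the between-chain and within-chain variances agree and $\hat{R}$ is close to 1. A single chain cannot provide this check. `az.summary` reports a more refined version of this diagnostic in its `r_hat` column.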

In [None]:
with infer.model(model, unconstr_priors, data):
    MAP = pm.find_MAP()
    post_data = pm.sample(draws=10000, tune=1000, chains=4)


<span style="color:red">
Does the mode of the posterior agree with the best fit point we found earlier?
</span>

In [None]:
...

## Visualizing the posterior

A common way to visualize a posterior is through **corner plots**. Usually, one only plots the marginal posterior,
$$
p\left( \boldsymbol{\eta} \vert \boldsymbol{x} \right) = \int d\boldsymbol{\chi} ~  \ p\left( \boldsymbol{\eta}, \boldsymbol{\chi} \vert \boldsymbol{x} \right)
$$
where we integrate out all nuisance parameters and keep only the parameters of interest. Here we only have two parameters, so we plot both.
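With posterior samples, the integral over the nuisance parameters never has to be computed explicitly: one simply ignores the corresponding columns of the samples. The following sketch shows this on made-up, anti-correlated samples for $(\eta, \chi)$. The toy distribution is an assumption for illustration and is unrelated to the model above.

```python
import random
import statistics

rng = random.Random(1)

# Toy joint "posterior" samples for (eta, chi): chi is standard normal and
# eta = 1 - 0.5 * chi + noise, so the two parameters are anti-correlated.
samples = []
for _ in range(20000):
    chi = rng.gauss(0.0, 1.0)
    eta = 1.0 - 0.5 * chi + rng.gauss(0.0, 0.2)
    samples.append((eta, chi))

# Marginalising over chi = ignoring the chi column of the samples
eta_marginal = [eta for eta, _ in samples]

# The marginal width of eta includes the spread induced by chi and is therefore
# larger than the width of eta at fixed chi (0.2 in this toy setup).
print(statistics.mean(eta_marginal), statistics.stdev(eta_marginal))
```

The diagonal panels of a corner plot are histograms of exactly such single columns, while the off-diagonal panels show pairs of columns.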

<span style="color:red">

Where does our best fit point lie on this plot?

Are the parameters correlated? If so, do you understand the correlation?

</span>

In [None]:
corner.corner(post_data.posterior);

## Comparing uncertainties

The posterior is the central object of any Bayesian analysis. Conventions are slightly different compared to frequentist analyses. **Iff** the posterior distribution is Gaussian, it makes sense to quote results as $\mu \pm \sigma$.
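For non-Gaussian posteriors, a common alternative is to quote a highest density interval (HDI), the shortest interval containing a given probability. The sketch below compares $\mu \pm \sigma$ with a 68% HDI for a symmetric and for a skewed set of toy samples. The `hdi` helper and the toy distributions are illustrative assumptions, not the `arviz` implementation.

```python
import random
import statistics

def hdi(samples, prob=0.68):
    """Shortest interval containing a fraction `prob` of the samples."""
    xs = sorted(samples)
    n_in = int(round(prob * len(xs)))
    # Slide a window of n_in consecutive sorted samples and keep the narrowest one
    widths = [xs[i + n_in - 1] - xs[i] for i in range(len(xs) - n_in + 1)]
    i_min = min(range(len(widths)), key=widths.__getitem__)
    return xs[i_min], xs[i_min + n_in - 1]

rng = random.Random(2)
gaussian = [rng.gauss(1.0, 0.5) for _ in range(20000)]
skewed = [rng.gammavariate(2.0, 0.5) for _ in range(20000)]  # asymmetric toy posterior

for name, s in [("gaussian", gaussian), ("skewed", skewed)]:
    mean, std = statistics.mean(s), statistics.stdev(s)
    lo, hi = hdi(s)
    print(name, (mean - std, mean + std), (lo, hi))
```

For the Gaussian samples the two intervals nearly coincide, whereas for the skewed samples the HDI is shifted towards the bulk of the distribution.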

Below is a summary of results derived from the posterior samples.

<span style="color:red">

Can you read off $\mu$ and $\sigma$, derived from the posterior?

Is the mode of a marginal posterior the same as the peak of the full posterior? If not, why not?

Does the distribution of $\mu$ deviate from a Gaussian distribution?

</span>

In [None]:
az.summary(post_data.posterior, hdi_prob=0.68)

In [None]:
best_fit

In [None]:
MAP