In [1]:
%matplotlib inline

In [2]:
%run notebook_setup


# Sampling

`pymc3-ext` comes with functions that make sampling more flexible and that improve on the default parameter choices for the types of problems encountered in astrophysics.
These features are accessed through the `pymc3_ext.sample` function, which behaves mostly like the `pymc3.sample` function but takes a few different arguments.
The two main differences for all users are that `pymc3_ext.sample` defaults to a target acceptance fraction of `0.9` (which will be better for many models in astrophysics) and to adapting a full dense mass matrix (instead of a diagonal one).
Therefore, if there are covariances between parameters, this method will generally perform better than the PyMC3 defaults.
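
To make this concrete, here is a minimal sketch (an illustration only, not part of `pymc3-ext`) that compares how well a diagonal and a dense rescaling condition a toy two-dimensional Gaussian. The covariance values are made up for the example, and the condition number is used as a rough proxy for how hard the target is for a gradient-based sampler.

```python
import numpy as np

# Toy covariance (hypothetical values): very different scales, strong correlation
cov = np.array([[1.0, 9.5], [9.5, 100.0]])

# A diagonal mass matrix can only rescale each parameter by its own
# standard deviation, which leaves the correlation untouched
scale = np.sqrt(np.diag(cov))
cov_diag = cov / np.outer(scale, scale)

# A dense mass matrix can also rotate: whiten with the Cholesky factor
chol = np.linalg.cholesky(cov)
chol_inv = np.linalg.inv(chol)
cov_dense = chol_inv @ cov @ chol_inv.T

cond_raw = np.linalg.cond(cov)
cond_diag = np.linalg.cond(cov_diag)
cond_dense = np.linalg.cond(cov_dense)
print(cond_raw, cond_diag, cond_dense)
```

The diagonal rescaling removes the difference in scales but not the correlation, so its condition number stays well above 1, while the dense rescaling brings it to about 1, which is the isotropic target that a single step size handles well.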

## Correlated parameters

A thorough discussion of this [can be found elsewhere online](https://dfm.io/posts/pymc3-mass-matrix/), but here is a simple demo where we sample a covariant Gaussian using `pymc3_ext.sample`.

First, we generate a random positive definite covariance matrix for the Gaussian:

In [3]:
import numpy as np

ndim = 5
np.random.seed(42)
L = np.random.randn(ndim, ndim)
L[np.diag_indices_from(L)] = 0.1 * np.exp(L[np.diag_indices_from(L)])
L[np.triu_indices_from(L, 1)] = 0.0
cov = np.dot(L, L.T)
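
As a quick sanity check (not part of the original notebook), the sketch below rebuilds the same matrices and confirms that `cov` is positive definite. It also computes the true marginal standard deviations, which are useful later as a reference for the `sd` column of the sampling summaries.

```python
import numpy as np

# Rebuild the same matrices as in the cell above
ndim = 5
np.random.seed(42)
L = np.random.randn(ndim, ndim)
L[np.diag_indices_from(L)] = 0.1 * np.exp(L[np.diag_indices_from(L)])
L[np.triu_indices_from(L, 1)] = 0.0
cov = np.dot(L, L.T)

# A symmetric matrix is positive definite if all eigenvalues are positive
eigvals = np.linalg.eigvalsh(cov)
is_positive_definite = bool(np.all(eigvals > 0))

# The Cholesky factor with a positive diagonal is unique, so it recovers L
recovers_L = bool(np.allclose(np.linalg.cholesky(cov), L))

# Marginal standard deviations that a well-mixed trace should reproduce
true_sd = np.sqrt(np.diag(cov))
print(is_positive_definite, recovers_L, true_sd)
```

Positive definiteness is guaranteed by construction here: `L` is lower triangular with a strictly positive diagonal, so `L @ L.T` is a valid Cholesky decomposition.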

And then we can set up this model using PyMC3:

In [4]:
import pymc3 as pm

with pm.Model() as model:
    pm.MvNormal("x", mu=np.zeros(ndim), chol=L, shape=ndim)

If we sample this using PyMC3's default sampling method, things don't go so well (we're only running a small number of steps because we don't want it to take forever, but things don't get better if you run for longer!):

In [5]:
with model:
    trace = pm.sample(tune=500, draws=500, chains=2, cores=2)

Auto-assigning NUTS sampler...


Initializing NUTS using jitter+adapt_diag...


Multiprocess sampling (2 chains in 2 jobs)


NUTS: [x]


Sampling 2 chains for 500 tune and 500 draw iterations (1_000 + 1_000 draws total) took 83 seconds.


The chain reached the maximum tree depth. Increase max_treedepth, increase target_accept or reparameterize.


There were 104 divergences after tuning. Increase `target_accept` or reparameterize.


The acceptance probability does not match the target. It is 0.5659769070760404, but should be close to 0.8. Try to increase the number of tuning steps.


The rhat statistic is larger than 1.4 for some parameters. The sampler did not converge.


The estimated number of effective samples is smaller than 200 for some parameters.


But we can use `pymc3_ext.sample` as a drop-in replacement to get much better performance:

In [6]:
import pymc3_ext as pmx

with model:
    tracex = pmx.sample(tune=1000, draws=1000, chains=2, cores=2)

Multiprocess sampling (2 chains in 2 jobs)


NUTS: [x]


Sampling 2 chains for 1_000 tune and 1_000 draw iterations (2_000 + 2_000 draws total) took 5 seconds.


As you can see, this is substantially faster (5 seconds instead of 83), even though we generated twice as many samples.

We can compare the sampling summaries to confirm that the default method did not produce reliable results in this case, while the `pymc3_ext` version did:

In [7]:
import arviz as az

az.summary(trace).head()

Got error No model on context stack. trying to find log_likelihood in translation.


| | mean | sd | hdi_3% | hdi_97% | mcse_mean | mcse_sd | ess_bulk | ess_tail | r_hat |
|---|---|---|---|---|---|---|---|---|---|
| x[0] | 0.086 | 0.126 | -0.228 | 0.241 | 0.044 | 0.032 | 8.0 | 114.0 | 1.95 |
| x[1] | -0.087 | 0.427 | -0.819 | 0.945 | 0.06 | 0.09 | 38.0 | 78.0 | 1.94 |
| x[2] | -0.329 | 0.485 | -1.053 | 0.81 | 0.155 | 0.113 | 9.0 | 126.0 | 1.96 |
| x[3] | -0.503 | 0.875 | -1.731 | 1.72 | 0.231 | 0.167 | 20.0 | 117.0 | 1.79 |
| x[4] | 0.923 | 1.491 | -2.136 | 2.904 | 0.653 | 0.49 | 7.0 | 61.0 | 1.89 |


In [8]:
az.summary(tracex).head()

Got error No model on context stack. trying to find log_likelihood in translation.


| | mean | sd | hdi_3% | hdi_97% | mcse_mean | mcse_sd | ess_bulk | ess_tail | r_hat |
|---|---|---|---|---|---|---|---|---|---|
| x[0] | -0.003 | 0.167 | -0.314 | 0.301 | 0.004 | 0.004 | 2125.0 | 1367.0 | 1.0 |
| x[1] | -0.002 | 0.553 | -1.114 | 0.986 | 0.012 | 0.013 | 2165.0 | 1384.0 | 1.0 |
| x[2] | 0.017 | 0.665 | -1.291 | 1.176 | 0.014 | 0.014 | 2144.0 | 1600.0 | 1.0 |
| x[3] | 0.031 | 1.193 | -2.124 | 2.325 | 0.026 | 0.025 | 2104.0 | 1611.0 | 1.0 |
| x[4] | 0.001 | 2.105 | -3.811 | 3.935 | 0.044 | 0.048 | 2256.0 | 1402.0 | 1.0 |


In this particular case, you could get similar performance using the `init="adapt_full"` argument to the `sample` function in PyMC3, but the implementation in `pymc3-ext` is somewhat more flexible.
Specifically, `pymc3_ext` implements a tuning procedure that is more similar to [the one implemented by the Stan project](https://mc-stan.org/docs/2_24/reference-manual/hmc-algorithm-parameters.html).
The relevant parameters are:

- `warmup_window`: The length of the initial "fast" window. This is called "initial buffer" in the Stan docs.
- `adapt_window`: The length of the initial "slow" window. This is called "window" in the Stan docs.
- `cooldown_window`: The length of the final "fast" window. This is called "term buffer" in the Stan docs.
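
To illustrate how these three windows divide up the tuning phase, here is a rough sketch of the schedule described in the Stan documentation, where the "slow" windows double in length and the last one is stretched to reach the final "fast" window. The function name `slow_windows` is hypothetical, the default values are Stan's documented defaults (75, 25, and 50) rather than those of `pymc3-ext`, and the exact windowing used by `pymc3-ext` may differ.

```python
def slow_windows(tune, warmup_window=75, adapt_window=25, cooldown_window=50):
    """Return (start, stop) step ranges of the "slow" adaptation windows.

    Hypothetical helper that follows the Stan-style doubling schedule;
    the defaults are Stan's, not necessarily those of pymc3-ext.
    """
    start = warmup_window
    end = tune - cooldown_window
    if end <= start:
        raise ValueError("not enough tuning steps for the requested windows")

    size = adapt_window
    windows = []
    while start < end:
        stop = start + size
        # If the next (doubled) window would not fit, stretch this one
        # so that it ends where the final "fast" window begins
        if stop + 2 * size > end:
            stop = end
        windows.append((start, stop))
        start = stop
        size *= 2
    return windows


for begin, finish in slow_windows(1000):
    print(begin, finish, finish - begin)
```

In this scheme the mass matrix estimate is refreshed at the end of each slow window, so later windows use more samples and give a more stable estimate.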

Unlike the Stan implementation, `pymc3_ext` supports updating the mass matrix estimate every `recompute_interval` steps, based on the previous window and all the steps in the current window so far.
This can improve warm-up performance substantially, so the default value is `1`, but recomputing this often might be intractable for high-dimensional models.
To only recompute the estimate at the end of each window, set `recompute_interval=0`.
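
The idea of refreshing a covariance estimate as draws arrive can be sketched with a standard running (Welford-style) update. This is only an illustration of the general technique: the class name `RunningCovariance` is hypothetical, and this is not a description of how `pymc3-ext` stores or updates its estimate internally.

```python
import numpy as np


class RunningCovariance:
    """Hypothetical helper: online estimate of a sample covariance matrix."""

    def __init__(self, ndim):
        self.n = 0
        self.mean = np.zeros(ndim)
        self.m2 = np.zeros((ndim, ndim))  # running sum of outer products

    def update(self, x):
        # Welford-style update: uses the mean before and after adding x
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += np.outer(delta, x - self.mean)

    def covariance(self):
        if self.n < 2:
            raise ValueError("need at least two samples")
        return self.m2 / (self.n - 1)


rng = np.random.default_rng(0)
samples = rng.normal(size=(200, 3))

est = RunningCovariance(3)
for x in samples:
    est.update(x)

# The online estimate agrees with the batch estimate over the same samples
matches_batch = bool(np.allclose(est.covariance(), np.cov(samples, rowvar=False)))
print(matches_batch)
```

Each update costs on the order of the number of parameters squared, and turning a dense estimate into a usable mass matrix typically needs a factorization that scales with the cube of the dimension, which is presumably why frequent recomputation becomes expensive for large models.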

If you run into numerical issues, you can try increasing `adapt_window` or use the `regularization_steps` and `regularization_variance` parameters to regularize the mass matrix estimator.
The `regularization_steps` parameter sets the effective number of steps that are used for regularization and `regularization_variance` is the effective variance for those steps.
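
One common way to implement this kind of regularization, used for example in Stan's adaptation, is to shrink the sample covariance toward a scaled identity matrix as if a few extra draws with a fixed variance had been observed. The sketch below assumes that the two parameters act in this way; the function name `regularize_covariance` is hypothetical, and the default values shown are Stan's (5 and 1e-3), not confirmed defaults of `pymc3-ext`.

```python
import numpy as np


def regularize_covariance(cov, n_samples, regularization_steps=5,
                          regularization_variance=1e-3):
    """Shrink a sample covariance toward regularization_variance * identity.

    Hypothetical helper; assumes n_samples + regularization_steps > 0.
    """
    n, k = n_samples, regularization_steps
    # Weight the data by n / (n + k) and the prior variance by k / (n + k)
    shrunk = (n / (n + k)) * np.asarray(cov, dtype=float)
    shrunk[np.diag_indices_from(shrunk)] += regularization_variance * k / (n + k)
    return shrunk


# A rank-one estimate, as might arise from very few draws, is singular
v = np.array([1.0, 2.0, 3.0])
cov_hat = np.outer(v, v)
reg = regularize_covariance(cov_hat, n_samples=3)

min_eig_raw = np.linalg.eigvalsh(cov_hat).min()
min_eig_reg = np.linalg.eigvalsh(reg).min()
print(min_eig_raw, min_eig_reg)
```

With this form, the regularization dominates when only a few samples are available and fades as the number of samples grows, which matches the description of `regularization_steps` as an effective number of steps.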

## Parameter groups

If you are fitting a model with a large number of parameters, it might not be computationally or numerically tractable to estimate the full dense mass matrix.
But, sometimes you might know something about the covariance structure of the problem that you can exploit.
Perhaps some parameters are correlated with each other, but not with others.
In this case, you can use the `parameter_groups` argument to exploit this structure.

Here is an example where `x`, `y`, and `z` are independent of each other but have different covariance structures.
We can take advantage of this structure using `pmx.ParameterGroup` specifications in the `parameter_groups` argument.
Note that by default each group will internally estimate a dense mass matrix, but here we specifically only estimate a diagonal mass matrix for `z`.

In [9]:
with pm.Model():
    x = pm.MvNormal("x", mu=np.zeros(ndim), chol=L, shape=ndim)
    y = pm.MvNormal("y", mu=np.zeros(ndim), chol=L, shape=ndim)
    z = pm.Normal("z", shape=ndim)  # Uncorrelated

    tracex2 = pmx.sample(
        tune=1000,
        draws=1000,
        chains=2,
        cores=2,
        parameter_groups=[
            [x],
            [y],
            pmx.ParameterGroup([z], "diag"),
        ],
    )

Multiprocess sampling (2 chains in 2 jobs)


NUTS: [z, y, x]


Sampling 2 chains for 1_000 tune and 1_000 draw iterations (2_000 + 2_000 draws total) took 8 seconds.
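
To get a feel for what the grouping saves, the following sketch counts the free elements that have to be estimated for a single dense mass matrix over all 15 parameters versus the grouped structure used above. The helper `mass_matrix_size` is hypothetical and only does bookkeeping; it does not call `pymc3-ext`.

```python
def mass_matrix_size(groups):
    """Count free mass matrix elements for a list of (ndim, kind) groups.

    Hypothetical helper; kind is "dense" (symmetric block) or "diag".
    """
    total = 0
    for ndim, kind in groups:
        if kind == "dense":
            # A symmetric ndim x ndim block has ndim * (ndim + 1) / 2 free elements
            total += ndim * (ndim + 1) // 2
        elif kind == "diag":
            total += ndim
        else:
            raise ValueError(f"unknown group kind: {kind!r}")
    return total


ndim = 5
full = mass_matrix_size([(3 * ndim, "dense")])
grouped = mass_matrix_size([(ndim, "dense"), (ndim, "dense"), (ndim, "diag")])
print(full, grouped)
```

For this small model the difference is modest (120 elements versus 35), but the dense count grows quadratically with the total number of parameters, while the grouped count grows only with the sizes of the individual blocks.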
