This is a Cython port of Jesse Windle's code at https://github.com/jwindle/BayesLogit. It provides a Python interface for efficiently sampling Pólya-gamma random variates. Install with:
pip install pypolyagamma
Please open issues if you have any trouble!
Pólya-gamma augmentation is a method of performing fast and simple Bayesian inference in models with Gaussian latent variables and count observations. While such models are non-conjugate, if it has the right form (specifically, if it is a Bernoulli, binomial, negative binomial, or multinomial with a logistic link function), we can introduce a set of Pólya-gamma auxiliary variables that render it conditionally conjugate. This facilitates fast Gibbs sampling algorithms on an extended space of Gaussian latent variables and Pólya-gamma auxiliary variables, where integrating out the auxiliary variables leaves the original model intact.
Given the auxiliary variables, the latent Gaussian variables have a Gaussian conditional distribution. Likewise, given the Gaussian latent variables and the observed count data, the auxiliary variables have a Pólya-gamma conditional distribution. Thus, to implement the Gibbs sampling algorithm, we must be able to efficiently sample Pólya-gamma random variates. This library provides code to do exactly that.
The augmented density, the non-Gaussian marginal, and the Gaussian conditionals are illustrated in the figure below. In this case, the posterior is from a simple binomial model. Next, we'll show how to perform Gibbs sampling for such a model.
See below for more references and links.
For convenience, we have created classes for simple count regression models, like the Bernoulli, binomial, negative binomial, and multinomial (with stick breaking) observation models. For example, you can fit a Bernoulli regression as follows:
from pypolyagamma import BernoulliRegression D_out = 1 # Output dimension D_in = 2 # Input dimension reg = BernoulliRegression(D_out, D_in) # Given X, an NxD_in array of real-valued inputs, # and Y, and NxD_out array of binary observations. Fit # the linear model y_n ~ Bern(sigma(A x_n + b)), # where sigma is the logistic function. A is D_out x D_in, # b is D_out x 1, and the entries in y_n are cond. indep. samples =  for _ in range(100): reg.resample((X,Y)) samples.append((reg.A, reg.b))
Under the hood, this will instantiate Pólya-gamma auxiliary variables and perform conditionally-conjugate Gibbs sampling. We can visualize the inferred parameters in terms of the implied probability for each point in the input space.
Here's how you can manually perform inference in a simple binomial model
N=10 counts and probability
a standard normal prior on
First, sample a count from the model:
from pypolyagamma import logistic, PyPolyaGamma # Consider a simple binomial model with unknown probability # Model the probability as the logistic of a scalar Gaussian. N = 10 mu = 0.0 sigmasq = 1.0 x_true = npr.normal(mu, np.sqrt(sigmasq)) p_true = logistic(x_true) y = npr.binomial(N, p_true)
Now we can run a Gibbs sampler to estimate the posterior
# Gibbs sample the posterior distribution p(x | y) # Introduce PG(N,0) auxiliary variables to render # the model conjugate. First, initialize the PG # sampler and the model parameters. N_samples = 10000 pg = PyPolyaGamma(seed=0) xs = np.zeros(N_samples) omegas = np.ones(N_samples) # Now run the Gibbs sampler for i in range(1, N_samples): # Sample omega given x, y from its PG conditional omegas[i] = pg.pgdraw(N, xs[i-1]) # Sample x given omega, y from its Gaussian conditional sigmasq_hat = 1./(1. / sigmasq + omegas[i]) mu_hat = sigmasq_hat * (mu / sigmasq + (y - N / 2.)) xs[i] = npr.normal(mu_hat, np.sqrt(sigmasq_hat))
If you're a developer, you can also install from source:
git clone firstname.lastname@example.org:slinderman/pypolyagamma.git cd pypolyagamma pip install -e .
To check if it worked, run:
If all the tests pass then you're good to go!
Under the hood, the installer will download
untar it, and place it in
deps/gsl. It will then configure GSL and
compile the Pólya-gamma code along with the required GSL source files.
This way, you don't need GSL to be installed and available on your
Parallel sampling with OpenMP
By default, the simple installation above will not support
parallel sampling. If you are compiling with GNU
you can enable OpenMP support with the flag:
USE_OPENMP=True pip install -e .
Mac users: you can install
g++ with Homebrew. Just
make sure that they are your default compilers, e.g. by setting
the environment variables
CXX to point to the GNU versions
g++, respectively. With Homebrew, these versions
will be in
/usr/local/bin by default.
To sample in parallel, call the
n = 10 # Number of variates to sample b = np.ones(n) # Vector of shape parameters c = np.zeros(n) # Vector of tilting parameters out = np.empty(n) # Outputs # Construct a set of PolyaGamma objects for sampling nthreads = 8 seeds = np.random.randint(2**16, size=nthreads) ppgs = [pypolyagamma.PyPolyaGamma(seed) for seed in seeds] # Sample in parallel pypolyagamma.pgdrawvpar(ppgs, b, c, out)
If you haven't installed with OpenMP, this function will revert to the serial sampler.
Polson, Nicholas G., James G. Scott, and Jesse Windle. "Bayesian inference for logistic models using Pólya–Gamma latent variables." Journal of the American statistical Association 108.504 (2013): 1339-1349.
Linderman, Scott, Matthew Johnson, and Ryan P. Adams. "Dependent Multinomial Models Made Easy: Stick-Breaking with the Polya-gamma Augmentation." Advances in Neural Information Processing Systems (NIPS). 2015. Check out our github repo, pgmult
Linderman, Scott W., Ryan P. Adams, and Jonathan W. Pillow. "Bayesian latent structure discovery from multi-neuron recordings." Advances in Neural Information Processing Systems (NIPS) 2016. Check out our github repo, pyglm)
Linderman, Scott W., Matthew Johnson, Andrew C. Miller, Ryan P. Adams, David M. Blei, and Liam Paninski. "Bayesian learning and inference in recurrent switching linear dynamical systems." Artificial Intelligence and Statistics (AISTATS) 2017.