Bayesian Regression Models
This is an attempt to implement a brms-like library in Python.
It allows Bayesian regression models to be specified using (a subset of) the lme4 syntax. Given such a description and a pandas data frame, the library generates model code and design matrices, targeting either Pyro or NumPyro.
The system can handle formulae with the following features:
- Interaction between variables
- No correlation between group coefficients
- Grouping by multiple factors (untested)
- Combinations of the above
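As a rough illustration of what a design matrix for a formula with an interaction term contains, here is a minimal sketch in plain Python. It is not the library's code: the `design_matrix` helper and the list-of-dicts data layout are hypothetical, and it only covers numeric columns, where an interaction such as `x1:x2` is the elementwise product of the two columns. Categorical factors need a coding step that is not shown.

```python
def design_matrix(rows, terms):
    """Build a design matrix (a list of rows) for numeric columns.

    rows  -- list of dicts mapping column name to a number
    terms -- list of terms: "1" (intercept), "x" (a column),
             or "x1:x2" (an interaction between columns)
    """
    matrix = []
    for row in rows:
        out = []
        for term in terms:
            if term == "1":
                # The intercept contributes a constant column of ones.
                out.append(1.0)
            else:
                # A single column is a product with one factor; an
                # interaction of numeric columns multiplies them together.
                value = 1.0
                for name in term.split(":"):
                    value *= float(row[name])
                out.append(value)
        matrix.append(out)
    return matrix


# Hypothetical data standing in for two rows of a data frame.
data = [
    {"x1": 1.0, "x2": 2.0},
    {"x1": 3.0, "x2": 4.0},
]

# Terms of the formula y ~ 1 + x1 + x2 + x1:x2
X = design_matrix(data, ["1", "x1", "x2", "x1:x2"])
```

Each row of `X` holds the intercept, the two columns, and their product, in the order the terms were given.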
Custom priors can be specified at various levels of granularity. For example, users can specify:
- A prior to be used for every population-level coefficient.
- A prior to be used for a particular population-level coefficient. (The system is aware of the coding used for categorical columns/factors in the data frame, which allows priors to be assigned to the coefficient corresponding to a particular level of a factor.)
- A prior to be used for all columns of the standard deviation vector in every group.
- A prior to be used for all columns of the standard deviation vector in a particular group.
- A prior to be used for a particular coefficient of the standard deviation vector in a particular group.
Users can give multiple such specifications and they combine in a sensible way.
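The exact combination rule is not spelled out here. One plausible reading is that the most specific matching specification wins, and the sketch below illustrates that reading only. The `resolve_prior` helper, the tuple-based parameter paths, and the prior strings are all hypothetical and are not the library's API.

```python
def resolve_prior(specs, path, default):
    """Return the prior for `path`, preferring the most specific match.

    specs   -- list of (prefix, prior) pairs, where prefix is a tuple
               such as ("b",) or ("sd", "g1", "x1")
    path    -- tuple naming one parameter, e.g. ("sd", "g1", "intercept")
    default -- prior to use when no specification matches
    """
    best_prefix, best_prior = None, default
    for prefix, prior in specs:
        matches = path[:len(prefix)] == prefix
        # A longer prefix is more specific; on ties the later spec wins.
        if matches and (best_prefix is None or len(prefix) >= len(best_prefix)):
            best_prefix, best_prior = prefix, prior
    return best_prior


# Hypothetical specifications at the five levels of granularity listed above.
specs = [
    (("b",), "Normal(0, 10)"),                 # every population-level coefficient
    (("b", "x1"), "Normal(0, 1)"),             # one population-level coefficient
    (("sd",), "HalfCauchy(3)"),                # standard deviations in every group
    (("sd", "g1"), "HalfCauchy(1)"),           # standard deviations in group g1
    (("sd", "g1", "x1"), "HalfCauchy(0.5)"),   # one coefficient in group g1
]

prior_for_x1 = resolve_prior(specs, ("b", "x1"), "default")
prior_for_x2 = resolve_prior(specs, ("b", "x2"), "default")
```

Under this rule `x1` picks up its own prior while `x2` falls back to the prior given for all population-level coefficients.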
The library supports models with either (univariate) Gaussian or Binomial (including Bernoulli) distributed responses.
The Pyro backend supports both NUTS and SVI for inference. The NumPyro backend supports only NUTS.
The library includes the following functions for working with posteriors:
- `marginals(...)`: Produces a model summary similar to that obtained by running `fit <- brm(...) ; fit$fit` in brms.
- `fitted(...)`: Implements some of the functionality of its analogue in brms.
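To give a feel for the kind of per-parameter summary involved, here is a minimal sketch that computes a mean, standard deviation, and an approximate 95% interval from posterior draws. It is an illustration under stated assumptions and not the library's `marginals` implementation: the `summarize` and `quantile` helpers and the dict-of-lists sample layout are hypothetical, and convergence diagnostics are omitted.

```python
import statistics


def quantile(ordered, q):
    """Return an approximate q-quantile of an already sorted list."""
    # Pick the element nearest to position q along the sorted draws.
    index = int(round(q * (len(ordered) - 1)))
    return ordered[index]


def summarize(samples):
    """Summarize posterior draws given as {parameter name: list of floats}."""
    summary = {}
    for name, draws in samples.items():
        ordered = sorted(draws)
        summary[name] = {
            "mean": statistics.mean(draws),
            "sd": statistics.stdev(draws),  # sample standard deviation
            "2.5%": quantile(ordered, 0.025),
            "97.5%": quantile(ordered, 0.975),
        }
    return summary


# Hypothetical draws for a single coefficient.
draws = {"b_x": [0.0, 1.0, 2.0, 3.0, 4.0]}
table = summarize(draws)
```

With real MCMC output there would be many more draws per parameter, and the interval endpoints would be correspondingly more meaningful.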
The library currently has the following limitations:
- All formula terms must be column names. Expressions such as `I(x1*x2)` are not supported.
- The `*` operator is not supported. (Though the model `y ~ 1 + x1*x2` can be specified with the formula `y ~ 1 + x1 + x2 + x1:x2`.)
- The `/` operator is not supported. (Though the model `y ~ ... | g1/g2` can be specified with the formula `y ~ (... | g1) + (... | g1:g2)`.)
- The syntax for removing columns is not supported, e.g. `y ~ x - 1`.
- The response is always univariate.
- Parameters of the response distribution cannot take their values from the data. For example, the number of trials parameter of Binomial can only be set to a constant and cannot vary across rows of the data.
- Only a limited number of response families are supported. In particular, Categorical responses (beyond the binary case) are not supported.
- Some priors used in the generated code don't match those generated by brms. For example, there is no Half Student-t distribution, and setting prior parameters based on the data isn't supported.
- The centering data transform, performed by brms to improve sampling efficiency, is not implemented.
- This doesn't include any of the fancy stuff brms does, such as its extensions to the lme4 grouping syntax, splines, monotonic effects, GP terms, etc.
- The `fitted` function does not implement all of the functionality of its analogue in brms.
- There are no tools to help with MCMC diagnostics, posterior checks, hypothesis testing, etc.
- Lots more, probably...
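The workarounds given above for the unsupported `*` and `/` operators are mechanical rewrites, so they can be generated rather than typed by hand. The following is a minimal sketch; `expand_star` and `expand_nested_groups` are hypothetical helpers and not part of the library, and they build formula fragments from named pieces rather than parsing an existing formula.

```python
def expand_star(a, b):
    """Rewrite the unsupported `a*b` as main effects plus an interaction."""
    return f"{a} + {b} + {a}:{b}"


def expand_nested_groups(coefs, outer, inner):
    """Rewrite the unsupported `(coefs | outer/inner)` as two group terms."""
    return f"({coefs} | {outer}) + ({coefs} | {outer}:{inner})"


# y ~ 1 + x1*x2 written in the supported subset of the syntax.
formula = "y ~ 1 + " + expand_star("x1", "x2")

# y ~ (1 + x | g1/g2) written in the supported subset of the syntax.
nested = "y ~ " + expand_nested_groups("1 + x", "g1", "g2")
```

The resulting strings use only `+`, `:` and `|`, which are within the subset of the syntax described in this document.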