An R package to perform projection predictive variable selection for generalized linear models. It is compatible with rstanarm and brms, but other reference models can also be used.
The method is described in detail in Piironen et al. (2018) and evaluated in comparison to many other methods in Piironen and Vehtari (2017).
Currently, the supported models (family objects in R) include the Gaussian, binomial and Poisson families. See the quickstart vignette for examples.
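For readers unfamiliar with family objects, the following minimal sketch uses only base R (the `stats` package) to show what the three supported families look like. It does not call projpred, and the helper name `family_summary` is hypothetical.

```r
# Family objects come from base R's stats package; nothing here calls projpred.
supported <- list(gaussian(), binomial(), poisson())

# Hypothetical helper: extract the family name and its default link function.
family_summary <- function(fam) {
  c(family = fam$family, link = fam$link)
}

# One row per family, with columns "family" and "link".
fam_table <- t(sapply(supported, family_summary))
print(fam_table)
```

The same constructors are what the `family` argument of a model-fitting function such as `stan_glm` receives, for example `family = gaussian()`.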
- mc-stan.org/projpred (online documentation, vignettes)
- Ask a question (Stan Forums on Discourse)
- Open an issue (GitHub issues for bug reports, feature requests)
- Install the latest release from CRAN:
install.packages('projpred')
- Install the latest development version from GitHub (requires the devtools package):
if (!require(devtools)) {
  install.packages("devtools")
  library(devtools)
}
devtools::install_github('stan-dev/projpred', build_vignettes = TRUE)
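After either installation route, it can be useful to confirm which version ended up in the library. The sketch below uses only base R; the helper name `installed_version` is hypothetical and not part of projpred.

```r
# Hypothetical helper: return the installed version of a package as a string,
# or NA if the package is not installed.
installed_version <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    return(NA_character_)
  }
  as.character(utils::packageVersion(pkg))
}

print(installed_version("projpred"))
```

An `NA` result means the package could not be found in any library on the current search path.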
rm(list=ls())
library(projpred)
library(rstanarm)
options(mc.cores = parallel::detectCores())
set.seed(1)
# Gaussian and Binomial examples from the glmnet-package
data('df_gaussian', package = 'projpred')
#data('df_binom', package = 'projpred')
# fit the full model with a sparsifying prior
fit <- stan_glm(y ~ x, family = gaussian(), data = df_gaussian,
                prior = hs(df = 1, global_scale = 0.01), iter = 500, seed = 1)
#fit <- stan_glm(y ~ x, family = binomial(), data = df_binom,
#                prior = hs(df = 1, global_scale = 0.01), iter = 500, seed = 1)
# perform the variable selection
vs <- varsel(fit)
# print the results
varsel_stats(vs)
# project the parameters for model sizes nv = 3 and nv = 5 variables
projs <- project(vs, nv = c(3, 5))
# predict using only the 5 most relevant variables
pred <- proj_linpred(vs, xnew = df_gaussian$x, nv = 5, integrated = TRUE)
# perform cross-validation for the variable selection
cvs <- cv_varsel(fit, cv_method='LOO')
# plot the validation results
varsel_plot(cvs)
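To give some intuition for what a projection does in the Gaussian case, here is a minimal base-R sketch. It is not projpred's implementation. It assumes a single posterior draw from the reference model with mean prediction `mu_ref` and noise variance `sigma2_ref`, and a submodel without an intercept. Under those assumptions the Kullback–Leibler projection onto the submodel reduces to a least-squares fit of `mu_ref` on the submodel's covariates, with the unexplained part of `mu_ref` added to the noise variance. The simulated data and all names (`project_gaussian`, `mu_ref`, `sigma2_ref`, `beta_ref`) are hypothetical.

```r
set.seed(1)
n <- 100
x <- matrix(rnorm(n * 4), n, 4)    # four simulated covariates
beta_ref <- c(2, -1, 0, 0)         # one hypothetical reference-model draw
sigma2_ref <- 0.5^2                # noise variance of that draw
mu_ref <- drop(x %*% beta_ref)     # reference mean prediction

# Project one Gaussian draw onto the submodel that uses columns `idx` of x.
project_gaussian <- function(mu_ref, sigma2_ref, x, idx) {
  x_sub <- x[, idx, drop = FALSE]
  # Least-squares fit of the reference prediction on the submodel covariates.
  beta_sub <- drop(solve(crossprod(x_sub), crossprod(x_sub, mu_ref)))
  mu_sub <- drop(x_sub %*% beta_sub)
  # The projected variance absorbs the signal the submodel cannot represent.
  sigma2_sub <- sigma2_ref + mean((mu_ref - mu_sub)^2)
  list(beta = beta_sub, sigma2 = sigma2_sub)
}

proj_full <- project_gaussian(mu_ref, sigma2_ref, x, 1:4)
proj_one  <- project_gaussian(mu_ref, sigma2_ref, x, 1)
print(proj_full$sigma2)
print(proj_one$sigma2)
```

Projecting onto all four columns recovers the reference draw, while the one-variable submodel ends up with a larger projected variance because it cannot reproduce the contribution of the second covariate. In practice a projection is applied across many posterior draws (or clusters of draws) rather than a single one.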
Dupuis, J. A. and Robert, C. P. (2003). Variable selection in qualitative models via an entropic explanatory power. Journal of Statistical Planning and Inference, 111(1–2):77–94.
Goutis, C. and Robert, C. P. (1998). Model choice in generalised linear models: a Bayesian approach via Kullback–Leibler projections. Biometrika, 85(1):29–37.
Piironen, J. and Vehtari, A. (2017). Comparison of Bayesian predictive methods for model selection. Statistics and Computing, 27(3):711–735. doi:10.1007/s11222-016-9649-y.
Piironen, J., Paasiniemi, M. and Vehtari, A. (2018). Projective inference in high-dimensional problems: prediction and feature selection. Preprint.