Welcome to version 2.0 of the flexBART package! flexBART (>= 2.0.0) is a new implementation of BART and VCBART that is designed to fit flexible varying coefficient models using ensembles of binary regression trees. In addition to the flexible priors for categorical decision rules introduced in earlier versions, this new version introduces a formula interface and automates much of the data pre-processing, which (hopefully) makes it easier than ever to fit BART models.
We highly recommend installing R version 4.0.0 or later. Before installing flexBART, ensure that you have set up an appropriate C++ toolchain for your system.
- For macOS: we recommend using the macrtools package
- For Windows: we recommend using Rtools, which can be downloaded here. Please make sure you download the version of Rtools that corresponds to your R version (e.g., Rtools45 for R version 4.5.x)
- For Linux: we recommend following these instructions from the Stan development team.
Once your C++ toolchain is configured, you can install flexBART using devtools::install_github:
devtools::install_github(repo = "skdeshpande91/flexBART")
Starting in version 2.0.0, flexBART features a formula interface and allows users to pass their data as data.frame or tibble objects.
So, given a data frame train_data containing named columns for an outcome (e.g., Y) and predictors, you can fit a simple BART model to predict Y using all the predictors by running
flexBART(formula = Y ~ bart(.), train_data = train_data)
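For a self-contained illustration, here is a sketch that simulates a small training set and fits the default model. The column names (Y, X1, X2) are placeholders, not required by the package.

```r
# A minimal sketch: simulate training data and fit a BART model with
# the formula interface. All variable names here are illustrative.
set.seed(99)
n <- 500
train_data <- data.frame(
  X1 = runif(n, -1, 1),
  X2 = factor(sample(c("a", "b", "c"), size = n, replace = TRUE))
)
train_data$Y <- sin(pi * train_data$X1) +
  as.numeric(train_data$X2 == "b") + rnorm(n, sd = 0.5)

# Fit a BART model whose trees may split on every predictor
fit <- flexBART(formula = Y ~ bart(.), train_data = train_data)
```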
flexBART also supports fitting VCBART models of the form Y ~ bart(.) + Z1 * bart(.) + Z2 * bart(.), which includes a separate bart() ensemble for each coefficient function (here, the intercept and the coefficients on Z1 and Z2).
The formula interface also provides fine control over the predictor variables used in each ensemble.
To allow an ensemble to split on only a few variables (e.g., X1, X2, and X3), you would specify bart(X1 + X2 + X3). To allow an ensemble to split on all variables except X1 and X2, you would specify bart(. - X1 - X2).
Note that when it detects multiple ensembles in the formula, flexBART will not include any of the linearly-entering covariates (e.g., Z1 and Z2) in the . shorthand.
So, to include, say, a piecewise linear function of X1, you would include the term X1 * bart(X1) in the formula argument.
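To make these formula variants concrete, here is a sketch; all variable names are illustrative, and train_data is assumed to be a data frame containing the named columns.

```r
# Ensemble may split only on X1, X2, and X3
flexBART(formula = Y ~ bart(X1 + X2 + X3), train_data = train_data)

# Ensemble may split on everything except X1 and X2
flexBART(formula = Y ~ bart(. - X1 - X2), train_data = train_data)

# Piecewise linear function of X1: the coefficient on X1 is itself
# a tree ensemble that is only allowed to split on X1
flexBART(formula = Y ~ bart(.) + X1 * bart(X1), train_data = train_data)
```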
By default, flexBART simulates 4 Markov chains with 2,000 iterations each and discards the first 1,000 iterations as "burn-in."
The numbers of chains, burn-in iterations, and post-burn-in iterations can be adjusted using the optional arguments n.chains, burn, and nd.
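Using the argument names above, a longer run might look like the following sketch (the particular values are illustrative):

```r
# Run 2 chains of 5,000 iterations each, discarding the first 2,500
# iterations of each chain as burn-in and keeping the remaining 2,500
fit <- flexBART(formula = Y ~ bart(.), train_data = train_data,
                n.chains = 2, burn = 2500, nd = 2500)
```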
Like earlier versions (e.g., 1.2.0 and earlier), the latest version of flexBART assumes that all continuous predictors have been re-scaled to the interval [-1,1] and represents the distinct values of categorical predictors with non-negative integers. But unlike those earlier versions, which required users to perform the re-scaling and conversion themselves, flexBART now automates this pre-processing.
Internally, flexBART treats all predictors passed as a factor or character as categorical.
It then checks whether each numerical predictor is discrete (e.g., age measured in years) or continuous by examining the pairwise differences between consecutive unique values.
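The exact check is internal to the package, but the basic idea can be sketched in a few lines of R; this is not the package's actual implementation, just an illustration of the kind of heuristic involved.

```r
# Illustrative only: flag a numerical vector as "discrete" when its
# sorted unique values sit on an evenly spaced grid, i.e., when all
# consecutive gaps are identical.
looks_discrete <- function(x) {
  gaps <- diff(sort(unique(x)))
  length(unique(gaps)) == 1L
}

looks_discrete(0:10)        # TRUE: evenly spaced integer grid
looks_discrete(runif(100))  # FALSE: gaps are (almost surely) all distinct
```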
Decision rules based on numerical predictors take the form {X < c}.
If flexBART detects that a numerical predictor is continuous, it draws the cutpoint c from a continuous range of available values.
If, on the other hand, flexBART determines that the predictor is discrete, it restricts the cutpoint c to the set of values observed in the data.
In flexBART ensembles, decision rules based on categorical predictors take the form {X ∈ C}, where C is a subset of the levels of the predictor.
Internally, flexBART determines the set of available values of a categorical predictor from the levels() of the corresponding factor() variable.
As a result, flexBART is able to make predictions at new values of a categorical predictor not present in the training data so long as these values are included as levels of that predictor.
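For example, declaring an extra level up front (using base R's factor()) lets a fitted model later be queried at that level even though it never appears in the training data; the variable name region is illustrative.

```r
# Declare "d" as a level of `region` even though it is absent from the
# training data; a model fit with this factor can then make predictions
# for observations with region == "d".
train_data$region <- factor(train_data$region,
                            levels = c("a", "b", "c", "d"))
```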
flexBART also includes support for network-structured categorical predictors (e.g., spatial areas with known adjacency structure).
To force the decision rules to respect this network structure (so that each "cutset" of levels forms a connected subnetwork), supply the adjacency_list argument.
This argument should be a named list with one element per network-structured predictor.
Each element should be a binary or weighted adjacency matrix whose row and column names correspond to the levels of the predictor.
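Following that description, here is a sketch of building and passing such a list; the predictor name region and the three-area network are made up for illustration.

```r
# Binary adjacency matrix for a 3-area "path" network a - b - c, with
# row and column names matching the levels of the predictor `region`
A <- matrix(c(0, 1, 0,
              1, 0, 1,
              0, 1, 0),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("a", "b", "c"), c("a", "b", "c")))

# One named element per network-structured predictor
flexBART(formula = Y ~ bart(.), train_data = train_data,
         adjacency_list = list(region = A))
```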
flexBART implements four different priors over decision rules for network-structured predictors. See the documentation and Section 3.2 of Deshpande (2024) for details about these priors.