# 8.3.5 Bayesian Additive Regression Trees

In [1]:
library(ISLR2)
set.seed(1)
train <- sample(1:nrow(Boston), nrow(Boston) / 2)

In this section we use the `BART` package, and within it the `gbart()` function, to fit a Bayesian additive regression tree model to the `Boston` housing data set. The `gbart()` function is designed for quantitative outcome variables. For binary outcomes, `lbart()` and `pbart()` are available.
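For a binary outcome, a call to `pbart()` follows the same pattern as `gbart()`. The sketch below is an illustration only: the 0/1 response `high` (whether `medv` exceeds its training-set median) is a hypothetical variable invented here and is not part of the lab, and `printevery` is raised only to shorten the log.

```r
library(ISLR2)
library(BART)

set.seed(1)
train <- sample(1:nrow(Boston), nrow(Boston) / 2)
x <- Boston[, 1:12]

# Hypothetical binary response: 1 if medv is above the training-set median
cutoff <- median(Boston$medv[train])
high <- as.numeric(Boston$medv > cutoff)

# Probit BART for a 0/1 outcome
set.seed(1)
pfit <- pbart(x[train, ], high[train], x.test = x[-train, ],
              printevery = 1000L)

# Posterior mean probability that high == 1 for each test observation
phat <- pfit$prob.test.mean

# Test misclassification rate when thresholding at 0.5
mean((phat > 0.5) != high[-train])
```

The fitted object stores probabilities (`prob.test.mean`) rather than the `yhat.test.mean` used for quantitative outcomes.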

To run the `gbart()` function, we must first create matrices of predictors for the training and test data. We run BART with default settings.

In [2]:
library(BART)
x <- Boston[, 1:12]
y <- Boston[, "medv"]
xtrain <- x[train,]
ytrain <- y[train]
xtest <- x[-train,]
ytest <- y[-train]
set.seed(1)
bartfit <- gbart(xtrain, ytrain, x.test = xtest)

Loading required package: nlme

Loading required package: nnet

Loading required package: survival



*****Calling gbart: type=1
*****Data:
data:n,p,np: 253, 12, 253
y1,yn: 0.213439, -5.486561
x1,x[n*p]: 0.109590, 20.080000
xp1,xp[np*p]: 0.027310, 7.880000
*****Number of Trees: 200
*****Number of Cut Points: 100 ... 100
*****burn,nd,thin: 100,1000,1
*****Prior:beta,alpha,tau,nu,lambda,offset: 2,0.95,0.795495,3,3.71636,21.7866
*****sigma: 4.367914
*****w (weights): 1.000000 ... 1.000000
*****Dirichlet:sparse,theta,omega,a,b,rho,augment: 0,0,1,0.5,1,12,0
*****printevery: 100

MCMC
done 0 (out of 1100)
done 100 (out of 1100)
done 200 (out of 1100)
done 300 (out of 1100)
done 400 (out of 1100)
done 500 (out of 1100)
done 600 (out of 1100)
done 700 (out of 1100)
done 800 (out of 1100)
done 900 (out of 1100)
done 1000 (out of 1100)
time: 3s
trcnt,tecnt: 1000,1000
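
The log above reports the defaults that were used: 200 trees, 100 burn-in iterations, and 1000 retained posterior draws, for 1100 MCMC iterations in total. As a rough sketch of how these could be changed, the call below passes `ntree`, `nskip`, and `ndpost` explicitly. The particular values are arbitrary choices for illustration, not recommendations, and `bartfit2` is a name introduced here.

```r
library(ISLR2)
library(BART)

set.seed(1)
train <- sample(1:nrow(Boston), nrow(Boston) / 2)
x <- Boston[, 1:12]
y <- Boston[, "medv"]

# Illustrative non-default settings (values chosen arbitrarily for this sketch)
set.seed(1)
bartfit2 <- gbart(x[train, ], y[train], x.test = x[-train, ],
                  ntree = 50L,        # trees in the ensemble (default 200)
                  nskip = 200L,       # burn-in iterations (default 100)
                  ndpost = 500L,      # posterior draws kept (default 1000)
                  printevery = 1000L) # print progress less often

# One row per retained draw, one column per test observation
dim(bartfit2$yhat.test)
```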


Next we compute the test error, here the mean squared error of the posterior mean predictions on the test set.

In [3]:
yhat.bart <- bartfit$yhat.test.mean
mean((ytest - yhat.bart)^2)

On this data set, the test error of BART is lower than the test error of random forests and boosting.
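
Because BART is fit by MCMC, the fitted object also keeps the individual posterior draws in `yhat.test`, not just their mean. The following sketch, which refits the model so that it runs on its own, summarizes those draws as 95% credible intervals for the regression function at each test point. These intervals do not include the noise term, so they are not prediction intervals for `medv` itself. The names `draws`, `ci`, and `yhat` are introduced here for illustration.

```r
library(ISLR2)
library(BART)

set.seed(1)
train <- sample(1:nrow(Boston), nrow(Boston) / 2)
x <- Boston[, 1:12]
y <- Boston[, "medv"]
ytest <- y[-train]

set.seed(1)
bartfit <- gbart(x[train, ], y[train], x.test = x[-train, ],
                 printevery = 1000L)

# Posterior draws: one row per MCMC draw, one column per test observation
draws <- bartfit$yhat.test

# Pointwise 2.5% and 97.5% posterior quantiles for each test observation
ci <- apply(draws, 2, quantile, probs = c(0.025, 0.975))
head(t(ci))

# Averaging the draws column by column gives the posterior mean prediction,
# which should agree with bartfit$yhat.test.mean
yhat <- colMeans(draws)
mean((ytest - yhat)^2)
```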

Now we can check how many times each variable appeared in the collection of trees, averaged over the posterior draws.

In [4]:
ord <- order(bartfit$varcount.mean, decreasing = TRUE)
bartfit$varcount.mean[ord]
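
Raw counts depend on the number of trees, so it can be easier to read them as shares. The sketch below, which refits the model so that it is self-contained, divides each variable's average count by the total. The name `share` is introduced here, and treating these shares as a measure of variable importance is an informal reading rather than something the lab claims.

```r
library(ISLR2)
library(BART)

set.seed(1)
train <- sample(1:nrow(Boston), nrow(Boston) / 2)
x <- Boston[, 1:12]
y <- Boston[, "medv"]

set.seed(1)
bartfit <- gbart(x[train, ], y[train], x.test = x[-train, ],
                 printevery = 1000L)

# Average number of splitting rules per variable, as a share of all rules
share <- bartfit$varcount.mean / sum(bartfit$varcount.mean)

# Variables ordered from most to least frequently used
round(sort(share, decreasing = TRUE), 3)
```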