wildclusterboot
is a package designed to implement both clustered standard errors and the wild cluster bootstrap. This includes an implemention similar to the cluster
option available in Stata, the standard wild cluster boostrap, based on Cameron, Gelbach, and Miller (2008) along with "multi-way" versions of each.
The core of this package is implemented in C++, providing decent performance, especially if boostrapping large datasets.
If you already have R and Rstudio installed, the easiest way to install this package is via devtools
. To install devtools
, run the following code in R:
install.packages('devtools')
Once devtools
is installed, run the following code to install the package:
devtools::install_github(repo = 'mattdwebb/wildclusterboot')
The library provides two main functions. The first is clustered_se
, which can be used to get the clustered standard errors out of an lm
object. This includes both the standard, "one-way" cluster-robust standard errors, as well as "multi-way" standard errors, due to Cameron, Gelbach and Miller (2011).
An example for one-way standard errors:
#Create dataframe of random normal variables
test_data <- data.frame(Y = rnorm(10), X = rnorm(10), G = c(1,1,1,2,2,2,3,3,3,3))
#Fit model for test data
test_model <- lm(data = test_data, formula = 'Y ~ X')
#Get a vector of clustered standard errors for model
SEs <- clustered_se(model = test_model, clusterby = 'G')
And an example for multi-way standard errors:
#Create dataframe of random normal variables
test_data <- data.frame(Y = rnorm(10),
X = rnorm(10),
G = c(1,1,1,2,2,2,3,3,3,3),
H = (1, 2, 3, 1, 2, 3, 1, 2, 3))
#Fit model for test data
test_model <- lm(data = test_data, formula = 'Y ~ X')
# In this case, supply a formula containing all dimension names to cluster by both G and H
SEs <- clustered_se(model = test_model, clusterby = ~ G + H)
The second function is wild_cluster_boot
, which can be used to perform a wild clustered bootstrap on an lm object and a set of data for a given independent variable of interest. This includes the standard wild cluster boostrap, as below:
p_val <- wild_cluster_boot(data = test_data,
model = test_model,
x_interest = 'X',
clusterby = 'G',
boot_reps = 399)
It also provides a number of extensions to the original wild cluster boostrap. This includes a multi-way version:
p_val <- wild_cluster_boot(data = test_data,
model = test_model,
x_interest = 'X',
clusterby = ~ G + H,
bootby = 'G',
boot_reps = 399)
Note that the bootby
variable must be specified separately in this case, or an error will be thrown.