Skip to content
/ rART Public

R Package for Approximate Randomization Tests with a Small Number of Clusters

License

Notifications You must be signed in to change notification settings

mwt/rART

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

56 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

rART

The rART package provides functions for Approximate Randomization Tests with a Small Number of Clusters introduced in “Randomization tests under an approximate symmetry assumption” (Canay et al. 2017) and further described in “A User’s Guide to Approximate Randomization Tests with a Small Number of Clusters” (Cai et al. 2022).

Installation

You can install rART using the following command:

devtools::install_github("mwt/rART")

Example

The package is centered around a single function artlm which runs the main regression. Several companion functions are provided in order to use the method. In order to demonstrate this, we generate some data.

cn <- 100 # cluster size
nc <- 10   # number of clusters

x1 <- rnorm(cn * nc, sd = 2)
x2 <- rnorm(cn * nc, mean = x1)
x3 <- runif(cn * nc)

eps <- matrix(runif(cn * nc), cn, nc)

# make some correlation within groups
eps <- as.vector(eps + eps[sample(cn),] + eps[sample(cn),])
group = factor(rep(1:nc, each = cn))

# true model
y <- 1 + 10*x1 + x2/20 + eps

df <- data.frame(y, x1, x2, x3, group)

head(df)
#>           y         x1          x2        x3 group
#> 1 -8.990301 -1.1209513 -2.11675002 0.3044642     1
#> 2 -1.943071 -0.4603550 -1.50031002 0.8328188     1
#> 3 34.619987  3.1174166  3.09943639 0.5936475     1
#> 4  3.566540  0.1410168  0.00884165 0.8071966     1
#> 5  4.400315  0.2585755 -2.29076730 0.2940508     1
#> 6 37.156422  3.4301300  4.47070343 0.1410852     1

Regression

The linear ART regression is artlm. It is exactly the same as lm. However, it requires a cluster variable or vector to be specified.

(artlm1 <- artlm(y ~ x1 + x2, cluster=group, data=df))
#> 
#> Call:
#> artlm(formula = y ~ x1 + x2, data = df, cluster = group)
#> 
#> Coefficients:
#> (Intercept)           x1           x2  
#>     2.51695     10.02792      0.03328

It supports all the features that lm supports. For example, you can add fixed effects.

(artlm2 <- artlm(y ~ x1 + x2 + group - 1, cluster=group, data=df))
#> 
#> Call:
#> artlm(formula = y ~ x1 + x2 + group - 1, data = df, cluster = group)
#> 
#> Coefficients:
#>       x1        x2    group1    group2    group3    group4    group5    group6  
#> 10.02793   0.03229   2.58429   2.53483   2.43459   2.43422   2.51628   2.59488  
#>   group7    group8    group9   group10  
#>  2.36401   2.55289   2.53291   2.62133

You can also specify that you are only interested in a subset of the variables. For example, suppose I am only interested in x1.

(artlm3 <- artlm(y ~ x1 + x2 + group - 1, cluster=group, select = "x1", data=df))
#> 
#> Call:
#> artlm(formula = y ~ x1 + x2 + group - 1, data = df, cluster = group, 
#>     select = "x1")
#> 
#> Coefficients:
#>       x1        x2    group1    group2    group3    group4    group5    group6  
#> 10.02793   0.03229   2.58429   2.53483   2.43459   2.43422   2.51628   2.59488  
#>   group7    group8    group9   group10  
#>  2.36401   2.55289   2.53291   2.62133

Including the select option allows you to choose the parameters you are interested in. The result will not display in the regression object itself. However, chosen variables will not de displayed in summary etc.

Summarizing models

You can run the simplest form of ART by running summary on any regression object. For example, we can apply it to our weighted fixed effects regression.

summary(artlm2)
#> 
#> Call:
#> artlm(formula = y ~ x1 + x2 + group - 1, data = df, cluster = group)
#> 
#> Residuals:
#>      Min       1Q   Median       3Q      Max 
#> -1.49058 -0.38165  0.01422  0.37043  1.30953 
#> 
#> Coefficients:
#>    Estimate Crit. value t value Pr(>|t|)   
#> x1 10.02793    19.03458  31.700  0.00195 **
#> x2  0.03229     0.11896   0.113  0.07031 . 
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 0.504 on 988 degrees of freedom
#> Multiple R-squared:  0.9994, Adjusted R-squared:  0.9994 
#> F-statistic: 1.332e+05 on 12 and 988 DF,  p-value: < 2.2e-16

This gives us the OLS estimates, the t-statistic, the p-value, and the critical value of the t-test for 95% confidence.

If we summarize the regression where we selected to view only x1, then we will only see this one variable in the summary. Note that no selection is required to ignore the group fixed effects because ART cannot be applied to these coefficients.

summary(artlm3)
#> 
#> Call:
#> artlm(formula = y ~ x1 + x2 + group - 1, data = df, cluster = group, 
#>     select = "x1")
#> 
#> Residuals:
#>      Min       1Q   Median       3Q      Max 
#> -1.49058 -0.38165  0.01422  0.37043  1.30953 
#> 
#> Coefficients:
#>    Estimate Crit. value t value Pr(>|t|)   
#> x1    10.03       19.03    31.7  0.00195 **
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 0.504 on 988 degrees of freedom
#> Multiple R-squared:  0.9994, Adjusted R-squared:  0.9994 
#> F-statistic: 1.332e+05 on 12 and 988 DF,  p-value: < 2.2e-16

Confidence intervals

Confidence intervals are computed using the confint command as usual.

confint(artlm2)
#>           2.5 %      97.5 %
#> x1  9.987563195 10.06423501
#> x2 -0.003333396  0.07527805

You can adjust the level of significance using the level parameter.

confint(artlm2, level = 0.98)
#>            1 %        99 %
#> x1  9.97521696 10.07359180
#> x2 -0.01446102  0.08521455

The command will only display selected variables. So, if we use artlm3 where we selected only x1, then we will only see the confidence interval for this one term. More importantly, the function will not compute intervals for unselected parameters.

confint(artlm3)
#>     2.5 %    97.5 % 
#>  9.987563 10.064235

It is also possible to select variables in confint instead of in the regression.

confint(artlm2, parm = "x2")
#>        2.5 %       97.5 % 
#> -0.003333396  0.075278053

Linear tests

You can conduct arbitrary linear tests using ARTHypothesis. For example, suppose I wanted to test to see if β1 = β2. Then, I can run.

ARTHypothesis(artlm2, "x1 = x2")
#> Linear ART hypothesis test
#> 
#> Hypothesis:
#> x1 - x2 = 0
#> 
#>   Crit. value t value Pr(>|t|)   
#> 1      18.986  31.587 0.001953 **
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

You can also supply the constraint vector and constant value manually like so.

ARTHypothesis(artlm2, c(1,-1), 0)
#> Linear ART hypothesis test
#> 
#> Hypothesis:
#> x1 - x2 = 0
#> 
#>   Crit. value t value Pr(>|t|)   
#> 1      18.986  31.587 0.001953 **
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

You cannot construct a linear test using a parameter that was not selected in the original regression.

About

R Package for Approximate Randomization Tests with a Small Number of Clusters

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages