simpr
provides a general, simple, and tidyverse-friendly framework for
generating simulated data, fitting models on simulations, and tidying
model results. The full workflow can happen in a single tidy pipeline
without creating external functions, global values, or using loops. It’s
useful for power analysis, design analysis, simulation studies, and for
teaching statistics.
-
Easily readable simulation specifications. You can specify simulations in a few lines, including referring to other simulation variables and to simulation parameters that you’re varying (such as sample size).
simpr
takes care of all the details of generating your simulation across varying parameters. -
Sensibly handle errors.
simpr
has various options to keep going even when simulation or model-fitting hits errors, so that you don’t need to start over if a single iteration hits fatal numerical issues. -
Reproducible workflows. Individual simulations can be reproduced exactly without needed to perform the whole simulation again.
-
Easy-to-use parallel processing. Building on
furrr
, parallel processing forsimpr
can usually be turned on with a couple lines of code.
The hardest part of any simulation is designing the data-generating
process and deciding what values of parameters you want to explore.
simpr
takes care of the rest so you can focus on these central issues.
## Install stable CRAN version
install.packages("simpr")
## Install latest development version
remotes::install_github("statisfactions/simpr")
library(simpr)
The simpr
workflow, inspired by the
infer
package, distills a simulation
study into five primary steps:
-
specify()
your data-generating process -
define()
parameters that you want to systematically vary across your simulation design (e.g. n, effect size) -
generate()
the simulation data -
fit()
models to your data (e.g.lm()
) -
tidy_fits()
for further processing usingbroom::tidy()
, such as computing power or Type I Error rates
simpr
makes no assumptions about your data and is not specialized to
any particular type of data generating process or model. If R can
generate it and if R can fit models, you can use simpr
to run your
simulation. (The tidying step is limited by the models supported
broom::tidy()
, although you can also supply your own tidying function
or expression.)
Suppose we are calculating the power for a two-sample t-test where the
data is log-normally distributed, which can be generated by
stats::rlnorm()
.
set.seed(100)
## Data-generating mechanism
specify(a = ~ rlnorm(n, mean = 0),
b = ~ rlnorm(n, mean = 0.5)) %>%
## Vary n from 30 to 100
define(n = seq(30, 100, by = 10)) %>%
## 100 repetitions
generate(100) %>%
## fit t-tests
fit(t_test = ~ t.test(a, b)) %>%
## bring model results into a tidy tibble
tidy_fits()
#> # A tibble: 800 × 14
#> .sim_id n rep Source estimate estimate1 estimate2 statistic p.value
#> <int> <dbl> <int> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1 30 1 t_test -0.953 1.73 2.68 -1.60 0.117
#> 2 2 40 1 t_test -0.249 1.64 1.89 -0.581 0.563
#> 3 3 50 1 t_test -0.616 1.67 2.29 -1.19 0.237
#> 4 4 60 1 t_test -1.75 1.28 3.03 -3.30 0.00146
#> 5 5 70 1 t_test -0.876 1.61 2.48 -1.96 0.0525
#> 6 6 80 1 t_test -0.780 1.71 2.49 -2.13 0.0352
#> 7 7 90 1 t_test -0.818 1.60 2.42 -2.51 0.0129
#> 8 8 100 1 t_test -0.878 1.51 2.38 -2.61 0.00988
#> 9 9 30 2 t_test -0.487 1.96 2.44 -0.713 0.479
#> 10 10 40 2 t_test -2.29 1.37 3.66 -1.76 0.0851
#> # … with 790 more rows, and 5 more variables: parameter <dbl>, conf.low <dbl>,
#> # conf.high <dbl>, method <chr>, alternative <chr>
specify()
creates two variables a
and b
that are distributed
lognormally (any R expression that generates data can be used here). The
specify
expressions refer to the sample size, n
. define()
clarifies that n
varies between 30 and 100 by 10s. generate()
actually does the data generation, with 100 simulated datasets for each
possible value of define()
. fit()
applies an arbitrary R expression
to each simulated dataset, and tidy_fits()
brings things together in a
tidy tibble that can be easily aggregated and plotted to calculate bias,
power, etc.
See vignette("simpr")
to get started on using the package, or view the
simpr
showcase
for several applied examples.