mverse is an extension to multiverse package (Sarma et al. 2021) which allows users create explorable multiverse analysis (Steegen et al. 2016) in R. This extension provides user friendly abstraction and a set of examples for researchers, educators, and students in statistics.
You can install the released version of mverse from CRAN with:
install.packages("mverse")
You can install the development version from GitHub with:
# install.packages("devtools")
devtools::install_github("mverseanalysis/mverse", build_vignettes = TRUE)
The following demonstration performs a multiverse analysis using
hurricane
dataset (Jung et al. 2014) included in the library. We first
create 6 universes as described in Figure 1. A filter branch with 2
options and a mutate branch with 3 options results in 6
universes in total. We then fit a Poisson regression model across the
multiverse and inspect a coefficient estimate. See
vignette("hurricane")
for a detailed analysis as well as the
terminologies used.
#> Warning: Using alpha for a discrete variable is not advised.
#> Warning: Using the `size` aesthetic in this geom was deprecated in ggplot2 3.4.0.
#> ℹ Please use `linewidth` in the `default_aes` field and elsewhere instead.
First, we start by loading the library and defining a mverse
object
with the dataset of interest.
library(mverse)
mv <- mverse(hurricane)
We use the *_branch()
methods to define branches. filter_branch()
defines filtering operations using dplyr::filter()
with different
options for the filter.
outliers <- filter_branch(
! Name %in% c("Katrina"),
! Name %in% c("Katrina", "Audrey")
)
mutate_branch()
multiplexes dplyr::mutate()
to add a new column in
the dataset.
strength <- mutate_branch(
NDAM, HighestWindSpeed, Minpressure_Updated_2014)
In order to fit a Poisson regression, we need to specify the model using
R’s formula syntax and the underlying distribution using family
. In
mverse
, we provide the specifications using formula_branch()
and
family_branch()
. In this demonstration, we only define a single option
for both formula and family but it is possible to provide multiple
options for them as well.
model <- formula_branch(alldeaths ~ strength * MasFem)
distribution <- family_branch(poisson)
After defining the branches, we can add the branch objects to the
mverse
object using add_*_branch()
methods.
mv <- mv %>%
add_filter_branch(outliers) %>%
add_mutate_branch(strength) %>%
add_formula_branch(model) %>%
add_family_branch(distribution)
glm_mverse()
multiplexes stats::glm()
function and fits a GLM in
each universe according to the specifications provided by
add_fomula_branch()
and add_family_branch()
.
mv <- mv %>% glm_mverse()
After completing the analysis, we can extract the results using
summary()
. The method returns a table with branching options,
estimates, 95% confidence intervals for all regression terms across the
multiverse.
res <- summary(mv)
res
#> # A tibble: 24 × 16
#> universe outliers_br…¹ stren…² model…³ distr…⁴ term estimate std.e…⁵ stati…⁶
#> <fct> <fct> <fct> <fct> <fct> <chr> <dbl> <dbl> <dbl>
#> 1 1 outliers_1 streng… model_1 distri… (Int… 2.13e+0 8.04e-2 26.5
#> 2 1 outliers_1 streng… model_1 distri… stre… 3.02e-5 2.63e-6 11.5
#> 3 1 outliers_1 streng… model_1 distri… MasF… 6.23e-2 1.01e-2 6.19
#> 4 1 outliers_1 streng… model_1 distri… stre… 7.96e-7 3.20e-7 2.49
#> 5 2 outliers_1 streng… model_1 distri… (Int… -8.59e-2 2.65e-1 -0.324
#> 6 2 outliers_1 streng… model_1 distri… stre… 2.35e-2 2.17e-3 10.8
#> 7 2 outliers_1 streng… model_1 distri… MasF… 5.31e-2 3.20e-2 1.66
#> 8 2 outliers_1 streng… model_1 distri… stre… 3.32e-4 2.60e-4 1.28
#> 9 3 outliers_1 streng… model_1 distri… (Int… 4.74e+1 3.17e+0 15.0
#> 10 3 outliers_1 streng… model_1 distri… stre… -4.69e-2 3.34e-3 -14.0
#> # … with 14 more rows, 7 more variables: p.value <dbl>, conf.low <dbl>,
#> # conf.high <dbl>, outliers_branch_code <fct>, strength_branch_code <fct>,
#> # model_branch_code <fct>, distribution_branch_code <fct>, and abbreviated
#> # variable names ¹outliers_branch, ²strength_branch, ³model_branch,
#> # ⁴distribution_branch, ⁵std.error, ⁶statistic
The resulting data is a tibble
object and we can use regular
tidyverse
grammar to manipulate the data. In the code below, we
specifically focus on the estimated coefficient for MasFem
and its
confidence intervals.
library(dplyr)
res %>%
filter(term == "MasFem") %>%
select(outliers_branch, strength_branch, term, estimate, conf.low, conf.high)
#> # A tibble: 6 × 6
#> outliers_branch strength_branch term estimate conf.low conf.high
#> <fct> <fct> <chr> <dbl> <dbl> <dbl>
#> 1 outliers_1 strength_1 MasFem 0.0623 0.0427 0.0822
#> 2 outliers_1 strength_2 MasFem 0.0531 -0.00942 0.116
#> 3 outliers_1 strength_3 MasFem -0.845 -1.59 -0.103
#> 4 outliers_2 strength_1 MasFem 0.0623 0.0427 0.0822
#> 5 outliers_2 strength_2 MasFem 0.0956 0.0301 0.161
#> 6 outliers_2 strength_3 MasFem -1.02 -1.81 -0.247
We can also inspect the result graphically using spec_curve()
. The
method builds a specification curve (Simonsohn, Simmons, and Nelson
2020) for a term in the regression model specified by var
. The method
also allows multiple ways of sorting the estimates. See ?spec_curve
for details.
spec_summary(mv, var = "MasFem") %>%
spec_curve(spec_matrix_spacing = 4) +
labs(colour = "Significant at 0.05")
Jung, Kiju, Sharon Shavitt, Madhu Viswanathan, and Joseph M. Hilbe. 2014. “Female Hurricanes Are Deadlier Than Male Hurricanes” 111 (24): 8782–87. https://doi.org/10.1073/pnas.1402786111.
Sarma, Abhraneel, Alex Kale, Michael Moon, Nathan Taback, Fanny Chevalier, Jessica Hullman, and Matthew Kay. 2021. “Multiverse: Multiplexing Alternative Data Analyses in R Notebooks (Version 0.5.0).” https://github.com/MUCollective/multiverse.
Simonsohn, Uri, Joseph P. Simmons, and Leif D. Nelson. 2020. “Specification Curve Analysis” 4 (July): 1208–14. https://doi.org/10.1038/s41562-020-0912-z.
Steegen, Sara, Francis Tuerlinckx, Andrew Gelman, and Wolf Vanpaemel. 2016. “Increasing Transparency Through a Multiverse Analysis” 11 (5): 702–12. https://doi.org/10.1177/1745691616658637.