Reproducibility Diagnostics for Statistical Modeling
ReproStat helps you answer a practical question that standard model summaries do not answer well:
If the data changed a little, would the substantive result still look the same?
The package repeatedly perturbs a dataset, refits the model, and measures how stable the outputs remain. It summarizes that behavior through:
- coefficient stability
- p-value stability
- selection stability
- prediction stability
- a composite Reproducibility Index (RI) on a 0-100 scale
- cross-validation ranking stability for comparing multiple candidate models
This makes ReproStat useful when you want to move beyond single-fit inference and assess how sensitive your modeling conclusions are to ordinary data variation.
ReproStat is designed for analysts who want to know whether a model result is:
- stable under bootstrap resampling
- sensitive to sample composition
- sensitive to small measurement noise
- robust across competing modeling choices
Typical use cases include:
- regression diagnostics for research analyses
- checking whether selected predictors remain important across perturbations
- comparing candidate models by how consistently they rank best in repeated CV
- producing a transparent reproducibility summary for reports, theses, or papers
ReproStat does not claim to prove scientific reproducibility on its own. It quantifies the stability of model outputs under controlled perturbation schemes so you can assess how fragile or dependable the modeling result appears.
If the package is available on CRAN:

```r
install.packages("ReproStat")
```

To install the development version from GitHub:

```r
remotes::install_github("ntiGideon/ReproStat")
```

ReproStat uses the following packages for optional backends and plotting:

| Package | Purpose |
|---|---|
| MASS | Robust M-estimation backend via `backend = "rlm"` |
| glmnet | Penalized regression backend via `backend = "glmnet"` |
| ggplot2 | ggplot-based plotting helpers |
The package follows a simple workflow:
```
original data
  -> perturb the data many times
  -> refit the model each time
  -> measure how much the results change
  -> summarize the stability of those changes
```
If coefficient signs, significance decisions, selected variables, predictions, and model rankings remain similar across perturbations, the analysis is more reproducible in the sense ReproStat measures. If they vary substantially, the result may be fragile.
```r
library(ReproStat)
set.seed(1)

diag_obj <- run_diagnostics(
  mpg ~ wt + hp + disp,
  data = mtcars,
  B = 200,
  method = "bootstrap"
)

print(diag_obj)
coef_stability(diag_obj)
pvalue_stability(diag_obj)
selection_stability(diag_obj)
prediction_stability(diag_obj)
reproducibility_index(diag_obj)
ri_confidence_interval(diag_obj, R = 500, seed = 1)
```

`run_diagnostics()` is the main entry point. It fits the original model, perturbs the data `B` times, refits the model each time, and stores the results for downstream summaries.
```r
diag_obj <- run_diagnostics(
  mpg ~ wt + hp + disp,
  data = mtcars,
  B = 200,
  method = "bootstrap"
)
```

```r
coef_stability(diag_obj)
pvalue_stability(diag_obj)
selection_stability(diag_obj)
prediction_stability(diag_obj)
```

These functions answer different questions:

- `coef_stability()`: Do coefficient magnitudes fluctuate a lot?
- `pvalue_stability()`: Do significance decisions stay consistent?
- `selection_stability()`: Do predictors keep the same sign or selection pattern?
- `prediction_stability()`: Do predictions stay similar across perturbations?
```r
ri <- reproducibility_index(diag_obj)
ri
```

The RI is a compact summary of multiple stability components, scaled to 0-100.
```r
oldpar <- par(mfrow = c(2, 2))
plot_stability(diag_obj, "coefficient")
plot_stability(diag_obj, "pvalue")
plot_stability(diag_obj, "selection")
plot_stability(diag_obj, "prediction")
par(oldpar)
```

| Method | What it tests | Good when you want to assess |
|---|---|---|
| `"bootstrap"` | resampling variability | ordinary data-driven stability |
| `"subsample"` | sample composition sensitivity | robustness to who enters the sample |
| `"noise"` | measurement perturbation | sensitivity to noisy recorded values |
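Switching schemes requires only changing `method`; everything downstream works the same. A minimal sketch, assuming each scheme uses the package's default tuning (only `method = "bootstrap"` appears in the examples above; the `"subsample"` and `"noise"` values come from the table):

```r
library(ReproStat)
set.seed(1)

# Hypothetical helper: same model and data, varying only the scheme
fit_with <- function(m) {
  run_diagnostics(mpg ~ wt + hp, data = mtcars, B = 100, method = m)
}

diag_boot  <- fit_with("bootstrap")
diag_sub   <- fit_with("subsample")
diag_noise <- fit_with("noise")

# Agreement across schemes is stronger evidence of stability
# than any single scheme alone.
reproducibility_index(diag_boot)
reproducibility_index(diag_sub)
reproducibility_index(diag_noise)
```

Because each scheme perturbs the data differently, comparing the three indices side by side can reveal whether fragility comes from resampling variability, sample composition, or measurement noise.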
The RI is best treated as an interpretive summary, not as a hard universal threshold.
| RI range | Interpretation |
|---|---|
| 90-100 | Highly stable under the chosen perturbation design |
| 70-89 | Moderately stable; overall pattern is fairly dependable |
| 50-69 | Mixed stability; some conclusions may be sensitive |
| < 50 | Low stability; investigate model dependence and data fragility |
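For reporting, the bands in this table can be encoded directly in base R. This is illustrative only: the short labels stand in for the interpretations above, and `ri$index` is assumed to be a single numeric score as in the examples elsewhere in this README.

```r
# Map an RI score to the interpretation bands above
ri_band <- function(index) {
  cut(index,
      breaks = c(-Inf, 50, 70, 90, Inf),
      labels = c("low", "mixed", "moderately stable", "highly stable"),
      right  = FALSE)  # left-closed, so 90 falls in the 90-100 band
}

as.character(ri_band(c(95, 83, 60, 42)))
# "highly stable" "moderately stable" "mixed" "low"
```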
Two important cautions:
- The RI depends on your perturbation scheme, backend, and tuning choices.
- RI values are not directly comparable across all backends, especially `glmnet`, because not all components are defined the same way.
ReproStat supports four model-fitting backends through the same interface:
| Backend | Model family | Notes |
|---|---|---|
| `"lm"` | ordinary least squares | default |
| `"glm"` | generalized linear models | use `family = ...` |
| `"rlm"` | robust regression | requires MASS |
| `"glmnet"` | penalized regression | requires glmnet |
Examples:

```r
# Logistic regression
diag_glm <- run_diagnostics(
  am ~ wt + hp + qsec,
  data = mtcars,
  B = 100,
  family = stats::binomial()
)

# Robust regression
if (requireNamespace("MASS", quietly = TRUE)) {
  diag_rlm <- run_diagnostics(
    mpg ~ wt + hp,
    data = mtcars,
    B = 100,
    backend = "rlm"
  )
}

# Penalized regression
if (requireNamespace("glmnet", quietly = TRUE)) {
  diag_lasso <- run_diagnostics(
    mpg ~ wt + hp + disp + qsec,
    data = mtcars,
    B = 100,
    backend = "glmnet",
    en_alpha = 1
  )
}
```

ReproStat also helps evaluate model-selection stability, not just single-model stability.
`cv_ranking_stability()` repeatedly runs cross-validation across competing formulas and records how often each model ranks best.

```r
models <- list(
  baseline = mpg ~ wt + hp,
  medium   = mpg ~ wt + hp + disp,
  full     = mpg ~ wt + hp + disp + qsec
)

cv_obj <- cv_ranking_stability(models, mtcars, v = 5, R = 50)
cv_obj$summary

plot_cv_stability(cv_obj, metric = "top1_frequency")
plot_cv_stability(cv_obj, metric = "mean_rank")
```

This is useful when the lowest average error and the most consistently top-ranked model are not the same thing.
A complete end-to-end example:

```r
library(ReproStat)
set.seed(42)

diag_obj <- run_diagnostics(
  mpg ~ wt + hp + disp,
  data = mtcars,
  B = 200,
  method = "bootstrap"
)

# Individual summaries
coef_stability(diag_obj)
pvalue_stability(diag_obj)
selection_stability(diag_obj)
prediction_stability(diag_obj)$mean_variance

# Composite summary
ri <- reproducibility_index(diag_obj)
cat(sprintf("RI = %.1f\n", ri$index))
ri_confidence_interval(diag_obj, R = 500, seed = 42)

# Visuals
oldpar <- par(mfrow = c(2, 2))
plot_stability(diag_obj, "coefficient")
plot_stability(diag_obj, "pvalue")
plot_stability(diag_obj, "selection")
plot_stability(diag_obj, "prediction")
par(oldpar)
```

The pkgdown site is organized for different user needs:
- Get started: a narrative introduction and first analysis
- Articles: interpretation guidance, backend choices, and workflow patterns
- Reference: individual function documentation
- Changelog: package evolution over releases
Key pages:
- `vignette("ReproStat-intro")`
- the interpretation article
- the backend guide
- the workflow patterns article
If you use ReproStat in published work, cite the package and any associated manuscript or software record that accompanies the release you used.
GPL (>= 3)