-
Notifications
You must be signed in to change notification settings - Fork 1
/
models.Rmd
86 lines (64 loc) · 6.74 KB
/
models.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
---
title: "Models and model comparisons"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{Models and model comparisons}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)
```
# Implemented model types
The risks package implements all major regression models that have been proposed for relative risks and risk differences. By default (`approach = "auto"`), `riskratio()` and `riskdiff()` estimate the most efficient valid model that converges; in more numerically challenging cases, they default to marginal standardization, which does not return parameters for covariates.
The following models are implemented in the risks package:
#^1^ | `approach =`| RR | RD | Model | Reference
--|-------------|-----|-----|-----------------------------|----------------------------------
1 | `glm` | `riskratio` | `riskdiff` | Binomial model with a log or identity link | Wacholder S. Binomial regression in GLIM: Estimating risk ratios and risk differences. [Am J Epidemiol 1986;123:174-184](https://pubmed.ncbi.nlm.nih.gov/3509965).
2 | `glm_startp` | `riskratio` | `riskdiff` | Binomial model with a log or identity link, convergence-assisted by starting values from Poisson model | Spiegelman D, Hertzmark E. Easy SAS calculations for risk or prevalence ratios and differences. [Am J Epidemiol 2005;162:199-200](https://pubmed.ncbi.nlm.nih.gov/15987728).
3 | `margstd_delta` | `riskratio` | `riskdiff` | Marginally standardized estimates using binomial models with a logit link (logistic model) with standard errors calculated via the delta method. | This package.
4 | `margstd_boot` | `riskratio` | `riskdiff` | Marginally standardized estimates using binomial models with a logit link (logistic model) with bias-corrected accelerated (BC~a~) confidence intervals from parametric bootstrapping (see [Marginal standardization](margstd.html)). | This package. \ For marginal standardization with *nonparametric* bootstrapping, see: Localio AR, Margolis DJ, Berlin JA. Relative risks and confidence intervals were easily computed indirectly from multivariable logistic regression. [J Clin Epidemiol 2007;60(9):874-82](https://pubmed.ncbi.nlm.nih.gov/17689803).
-- | `glm_cem` | `riskratio` | --- | Binomial model with log-link fitted via combinatorial expectation maximization instead of Fisher scoring | Donoghoe MW, Marschner IC. logbin: An R Package for Relative Risk Regression Using the Log-Binomial Model. [J Stat Softw 2018;86(9)](http://dx.doi.org/10.18637/jss.v086.i09).
-- | `glm_cem` | --- | `riskdiff` | Additive binomial model (identity link) fitted via combinatorial expectation maximization instead of Fisher scoring | Donoghoe MW, Marschner IC. Stable computational methods for additive binomial models with application to adjusted risk differences. [Comput Stat Data Anal 2014;80:184-96](https://doi.org/10.1016/j.csda.2014.06.019).
--| `robpoisson` | `riskratio` | `riskdiff` | Log-linear (Poisson) model with robust/sandwich/empirical standard errors | Zou G. A modified Poisson regression approach to prospective studies with binary data. [Am J Epidemiol 2004;159(7):702-6](https://pubmed.ncbi.nlm.nih.gov/15033648)
--| `duplicate` | `riskratio` | -- | Case-duplication approach, fitting a logistic model with cluster-robust standard errors | Schouten EG, Dekker JM, Kok FJ, Le Cessie S, Van Houwelingen HC, Pool J, Vandenbroucke JP. Risk ratio and rate ratio estimation in case-cohort designs: hypertension and cardiovascular mortality. [Stat Med 1993;12:1733–45](https://pubmed.ncbi.nlm.nih.gov/8248665).
--| `glm_startd` | `riskratio` | --- | Binomial model with a log link, convergence-assisted by starting values from case-duplication logistic model | This package.
--| `logistic` | `riskratio`, for comparison only | --- | Binomial model with logit link (*i.e.*, the logistic model), returning odds ratios | Included for comparison purposes only.
^1^ Indicates the priority with which the legacy modelling strategy (`approach = "legacy"`) attempts model fitting (`glm_startp`: only for RR).
Which model was fitted is always indicated in the first line of the output of `summary()` and in the `model` column of `tidy()`. In methods sections of manuscripts, the approach can be described in detail as follows: "Risk ratios (or risk differences) were obtained via (method listed in the first line of model `summary.risks(...)`) using the `risks` R package (reference to this package and/or the article listed in the column 'reference')."
For example: "Risk ratios were obtained from binomial models with a log link, convergence-assisted by Poisson models (ref. Spiegelman and Hertzmark, AJE 2005), using the `risks` R package (https://stopsack.github.io/risks/)."
# Model choice
By default, automatic model fitting (`approach = "auto"`) reports results from marginal standardization using a logistic model with delta method standard errors (equivalent to `approach = "margstd_delta"`). An exception is made if interaction terms between exposure and confounders are included. This case, confidence intervals are calculated using bootstrapping (equivalent to requesting `approach = "margstd_boot"`). Alternatively, any of the options listed under `approach =` in the table can be requested directly. However, unlike with `approach = "auto"` (the default) or `approach = "legacy"`, the selected model may not converge.
We load the same example data as in the [Get Started vignette](risks.html#an-example-cohort-study).
```{r load, message = FALSE}
library(risks) # provides riskratio(), riskdiff(), postestimation functions
library(dplyr) # For data handling
library(broom) # For tidy() model summaries
data(breastcancer)
```
We then select a binomial model with starting values from the Poisson model:
```{r selectapproach}
riskratio(formula = death ~ stage + receptor,
data = breastcancer,
approach = "glm_startp")
```
\
However, the binomial model without starting values (`approach = "glm"`) does not converge, as expected.
# Model comparisons
With `approach = "all"`, all model types listed in the tables are fitted. The fitted object, *e.g.*, `fit`, is one of the converged models. A summary of the convergence status of all models is displayed at the beginning of `summary(fit)`:
```{r allmodels}
fit_all <- riskratio(formula = death ~ stage + receptor,
data = breastcancer,
approach = "all")
summary(fit_all)
```
\
Individual models can be accessed as `fit$all_models[[1]]` through `fit$all_models[[6]]` (or `[[7]]` if fitting a risk ratio model). `tidy()` shows coefficients and confidence intervals from all models that converged:
```{r allmodels2}
tidy(fit_all) %>%
select(-statistic, -p.value) %>%
print(n = 50)
```