vignettes/models.Rmd

---
title: "Models and model comparisons"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Models and model comparisons}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

# Implemented model types

The risks package implements all major regression models that have been proposed for relative risks and risk differences. By default (`approach = "auto"`), `riskratio()` and `riskdiff()` estimate the most efficient valid model that converges; in more numerically challenging cases, they default to marginal standardization, which does not return parameters for covariates.

The following models are implemented in the risks package:

#^1^ | `approach =`| RR  | RD  | Model                       | Reference
--|-------------|-----|-----|-----------------------------|----------------------------------
1 | `glm`       | `riskratio` | `riskdiff` | Binomial model with a log or identity link | Wacholder S. Binomial regression in GLIM: Estimating risk ratios and risk differences. [Am J Epidemiol 1986;123:174-184](https://pubmed.ncbi.nlm.nih.gov/3509965).
2 | `glm_startp` | `riskratio` | `riskdiff` | Binomial model with a log or identity link, convergence-assisted by starting values from Poisson model | Spiegelman D, Hertzmark E. Easy SAS calculations for risk or prevalence ratios and differences. [Am J Epidemiol 2005;162:199-200](https://pubmed.ncbi.nlm.nih.gov/15987728).
3 | `margstd_delta` | `riskratio` | `riskdiff` | Marginally standardized estimates using binomial models with a logit link (logistic model) with standard errors calculated via the delta method. | This package.
4 | `margstd_boot`   | `riskratio` | `riskdiff` | Marginally standardized estimates using binomial models with a logit link (logistic model) with bias-corrected accelerated (BC~a~) confidence intervals from parametric bootstrapping (see [Marginal standardization](margstd.html)). | This package. \ For marginal standardization with *nonparametric* bootstrapping, see: Localio AR, Margolis DJ, Berlin JA. Relative risks and confidence intervals were easily computed indirectly from multivariable logistic regression. [J Clin Epidemiol 2007;60(9):874-82](https://pubmed.ncbi.nlm.nih.gov/17689803).
-- | `glm_cem`   | `riskratio` | --- | Binomial model with log-link fitted via combinatorial expectation maximization instead of Fisher scoring | Donoghoe MW, Marschner IC. logbin: An R Package for Relative Risk Regression Using the Log-Binomial Model. [J Stat Softw 2018;86(9)](http://dx.doi.org/10.18637/jss.v086.i09).
-- | `glm_cem`   | --- | `riskdiff` | Additive binomial model (identity link) fitted via combinatorial expectation maximization instead of Fisher scoring | Donoghoe MW, Marschner IC. Stable computational methods for additive binomial models with application to adjusted risk differences. [Comput Stat Data Anal 2014;80:184-96](https://doi.org/10.1016/j.csda.2014.06.019).
--| `robpoisson` | `riskratio` | `riskdiff` | Log-linear (Poisson) model with robust/sandwich/empirical standard errors | Zou G. A modified Poisson regression approach to prospective studies with binary data. [Am J Epidemiol 2004;159(7):702-6](https://pubmed.ncbi.nlm.nih.gov/15033648)
--| `duplicate` | `riskratio` | -- | Case-duplication approach, fitting a logistic model with cluster-robust standard errors | Schouten EG, Dekker JM, Kok FJ, Le Cessie S, Van Houwelingen HC, Pool J, Vandenbroucke JP. Risk ratio and rate ratio estimation in case-cohort designs: hypertension and cardiovascular mortality. [Stat Med 1993;12:1733–45](https://pubmed.ncbi.nlm.nih.gov/8248665).
--| `glm_startd` | `riskratio` | --- | Binomial model with a log link, convergence-assisted by starting values from case-duplication logistic model | This package.
--| `logistic`   | `riskratio`, for comparison only  | --- | Binomial model with logit link (*i.e.*, the logistic model), returning odds ratios | Included for comparison purposes only.
^1^  Indicates the priority with which the legacy modelling strategy (`approach = "legacy"`) attempts  model fitting (`glm_startp`: only for RR).

Which model was fitted is always indicated in the first line of the output of `summary()` and in the `model` column of `tidy()`. In methods sections of manuscripts, the approach can be described in detail as follows: "Risk ratios (or risk differences) were obtained via (method listed in the first line of model `summary.risks(...)`) using the `risks` R package (reference to this package and/or the article listed in the column 'reference')."

For example: "Risk ratios were obtained from binomial models with a log link, convergence-assisted by Poisson models (ref. Spiegelman and Hertzmark, AJE 2005), using the `risks` R package (https://stopsack.github.io/risks/)."


# Model choice

By default, automatic model fitting (`approach = "auto"`) reports results from marginal standardization using a logistic model with delta method standard errors (equivalent to `approach = "margstd_delta"`). An exception is made if interaction terms between exposure and confounders are included. This case, confidence intervals are calculated using bootstrapping (equivalent to requesting `approach = "margstd_boot"`). Alternatively, any of the options listed under `approach =` in the table can be requested directly. However, unlike with `approach = "auto"` (the default) or `approach = "legacy"`, the selected model may not converge.


We load the same example data as in the [Get Started vignette](risks.html#an-example-cohort-study).

```{r load, message = FALSE}
library(risks)  # provides riskratio(), riskdiff(), postestimation functions
library(dplyr)  # For data handling
library(broom)  # For tidy() model summaries
data(breastcancer)
```

We then select a binomial model with starting values from the Poisson model:

```{r selectapproach}
riskratio(formula = death ~ stage + receptor, 
          data = breastcancer, 
          approach = "glm_startp")
```

\
However, the binomial model without starting values (`approach = "glm"`) does not converge, as expected.


# Model comparisons

With `approach = "all"`, all model types listed in the tables are fitted. The fitted object, *e.g.*, `fit`, is one of the converged models. A summary of the convergence status of all models is displayed at the beginning of `summary(fit)`:

```{r allmodels}
fit_all <- riskratio(formula = death ~ stage + receptor, 
                     data = breastcancer, 
                     approach = "all")
summary(fit_all)
```

\
Individual models can be accessed as `fit$all_models[[1]]` through `fit$all_models[[6]]` (or `[[7]]` if fitting a risk ratio model). `tidy()` shows coefficients and confidence intervals from all models that converged:

```{r allmodels2}
tidy(fit_all) %>%
  select(-statistic, -p.value) %>%
  print(n = 50)
```