-
Notifications
You must be signed in to change notification settings - Fork 4
/
quickstart.Rmd
413 lines (336 loc) · 16.3 KB
/
quickstart.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
---
title: "rbmi: Quickstart"
author: Craig Gower-Page, Alessandro Noci, and Marcel Wolbers
output:
bookdown::html_document2:
toc: true
toc_depth: 4
number_sections: true
citation_package: natbib
base_format: rmarkdown::html_vignette
bibliography: "references.bib"
link-citations: true
linkcolor: blue
pkgdown:
as_is: true
vignette: >
%\VignetteIndexEntry{rbmi: Quickstart}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)
```
# Introduction
The purpose of this vignette is to provide a 15 minute quickstart guide to the core functions of the rbmi package.
The rbmi package consists of 4 core functions (plus several helper functions) which are typically called in sequence:
- `draws()` - fits the imputation models and stores their parameters
- `impute()` - creates multiple imputed datasets
- `analyse()` - analyses each of the multiple imputed datasets
- `pool()` - combines the analysis results across imputed datasets into a single statistic
# The Data
We use a publicly available example dataset from an antidepressant clinical trial of an active drug versus placebo. The relevant endpoint is the Hamilton 17-item depression rating scale (HAMD17) which was assessed at baseline and at weeks 1, 2, 4, and 6. Study drug discontinuation occurred in 24% of subjects from the active drug and 26% of subjects from placebo. All data after study drug discontinuation are missing and there is a single additional intermittent missing observation.
```{r}
library(rbmi)
library(dplyr)
data("antidepressant_data")
dat <- antidepressant_data
```
We consider an imputation model with the mean change from baseline in the HAMD17 score as the outcome (variable `CHANGE` in the dataset). The following covariates are included in the imputation model: the treatment group (`THERAPY`), the (categorical) visit (`VISIT`), treatment-by-visit interactions, the baseline HAMD17 score (`BASVAL`), and baseline HAMD17 score-by-visit interactions. A common unstructured covariance matrix structure is assumed for both groups. The analysis model is an ANCOVA model with the treatment group as the primary factor and adjustment for the baseline HAMD17 score.
`rbmi` expects its input dataset to be complete; that is, there must be one row per subject for each visit. Missing outcome values should be coded as `NA`, while missing covariate values are not allowed. If the dataset is incomplete, then the `expand_locf()` helper function can be used to add any missing rows, using LOCF imputation to carry forward the observed baseline covariate values to visits with missing outcomes. Rows corresponding to missing outcomes are not present in the antidepressant trial dataset. To address this we will therefore use the `expand_locf()` function as follows:
```{r message=FALSE}
# Use expand_locf to add rows corresponding to visits with missing outcomes to the dataset
dat <- expand_locf(
dat,
PATIENT = levels(dat$PATIENT), # expand by PATIENT and VISIT
VISIT = levels(dat$VISIT),
vars = c("BASVAL", "THERAPY"), # fill with LOCF BASVAL and THERAPY
group = c("PATIENT"),
order = c("PATIENT", "VISIT")
)
```
# Draws
The `draws()` function fits the imputation models and stores the corresponding parameter estimates or Bayesian posterior parameter draws.
The three main inputs to the `draws()` function are:
- `data` - The primary longitudinal data.frame containing the outcome variable and all covariates.
- `data_ice` - A data.frame which specifies the first visit affected by an intercurrent event (ICE) and the imputation strategy for handling missing outcome data after the ICE. At most one ICE which is to be imputed by a non-MAR strategy is allowed per subject.
- `method` - The statistical method used to fit the imputation models and to create imputed datasets.
For the antidepressant trial data, the dataset `data_ice` is not provided. However, it can be derived because, in this dataset,
the subject's first visit affected by the ICE "study drug discontinuation" corresponds to the first terminal missing observation.
We first derive the dateset `data_ice` and then create 150 Bayesian posterior draws of the imputation model parameters.
For this example, we assume that the imputation strategy after the ICE is Jump To Reference (JR) for all subjects
and that 150 multiple imputed datasets using Bayesian posterior draws from the imputation model are to be created.
```{r}
# create data_ice and set the imputation strategy to JR for
# each patient with at least one missing observation
dat_ice <- dat %>%
arrange(PATIENT, VISIT) %>%
filter(is.na(CHANGE)) %>%
group_by(PATIENT) %>%
slice(1) %>%
ungroup() %>%
select(PATIENT, VISIT) %>%
mutate(strategy = "JR")
# In this dataset, subject 3618 has an intermittent missing values which does not correspond
# to a study drug discontinuation. We therefore remove this subject from `dat_ice`.
# (In the later imputation step, it will automatically be imputed under the default MAR assumption.)
dat_ice <- dat_ice[-which(dat_ice$PATIENT == 3618),]
dat_ice
# Define the names of key variables in our dataset and
# the covariates included in the imputation model using `set_vars()`
# Note that the covariates argument can also include interaction terms
vars <- set_vars(
outcome = "CHANGE",
visit = "VISIT",
subjid = "PATIENT",
group = "THERAPY",
covariates = c("BASVAL*VISIT", "THERAPY*VISIT")
)
# Define which imputation method to use (here: Bayesian multiple imputation with 150 imputed datsets)
method <- method_bayes(
burn_in = 200,
burn_between = 5,
n_samples = 150,
seed = 675442751
)
# Create samples for the imputation parameters by running the draws() function
set.seed(987)
drawObj <- draws(
data = dat,
data_ice = dat_ice,
vars = vars,
method = method,
quiet = TRUE
)
drawObj
```
Note the use of `set_vars()` which specifies the names of the key variables
within the dataset and the imputation model. Additionally, note that whilst `vars$group` and `vars$visit`
are added as terms to the imputation model by default, their interaction is not,
thus the inclusion of `group * visit` in the list of covariates.
Available imputation methods include:
- Bayesian multiple imputation - `method_bayes()`
- Approximate Bayesian multiple imputation - `method_approxbayes()`
- Conditional mean imputation (bootstrap) - `method_condmean(type = "bootstrap")`
- Conditional mean imputation (jackknife) - `method_condmean(type = "jackknife")`
- Bootstrapped multiple imputation - `method = method_bmlmi()`
For a comparison of these methods, we refer to the `stat_specs` vignette (Section 3.10).
"statistical specifications" vignette (Section 3.10): `vignette("stat_specs",package="rbmi")`.
Available imputation strategies include:
- Missing At Random - `"MAR"`
- Jump to Reference - `"JR"`
- Copy Reference - `"CR"`
- Copy Increments from Reference - `"CIR"`
- Last Mean Carried Forward - `"LMCF"`
# Impute
The next step is to use the parameters from the imputation model to generate the imputed datasets. This is
done via the `impute()` function. The function only has two key inputs: the imputation
model output from `draws()` and the reference groups relevant to reference-based imputation methods. It's usage is thus:
```{r}
imputeObj <- impute(
drawObj,
references = c("DRUG" = "PLACEBO", "PLACEBO" = "PLACEBO")
)
imputeObj
```
In this instance, we are specifying that the `PLACEBO` group should be the reference group for itself as well as for the `DRUG` group (as is standard for imputation using reference-based methods).
Generally speaking, there is no need to see or directly interact with the imputed
datasets. However, if you do wish to inspect them, they can be extracted from the imputation
object using the `extract_imputed_dfs()` helper function, i.e.:
```{r}
imputed_dfs <- extract_imputed_dfs(imputeObj)
head(imputed_dfs[[10]], 12) # first 12 rows of 10th imputed dataset
```
Note that in the case of `method_bayes()` or `method_approxbayes()`, all imputed datasets correspond to random imputations on the original dataset.
For `method_condmean()`, the first imputed dataset will always correspond to the completed original dataset containing all subjects.
For `method_condmean(type="jackknife")`, the remaining datasets correspond to conditional mean imputations on leave-one-subject-out datasets,
whereas for `method_condmean(type="bootstrap")`, each subsequent dataset corresponds to a conditional mean imputation on a bootstrapped datasets.
For `method_bmlmi()`, all the imputed datasets correspond to sets of random imputations on bootstrapped datasets.
# Analyse
The next step is to run the analysis model on each imputed dataset. This is done by defining
an analysis function and then calling `analyse()` to apply this function to each
imputed dataset. For this vignette we use the `ancova()` function provided by the rbmi
package which fits a separate ANCOVA model for the outcomes from each visit and returns a treatment
effect estimate and corresponding least square means for each group per visit.
```{r}
anaObj <- analyse(
imputeObj,
ancova,
vars = set_vars(
subjid = "PATIENT",
outcome = "CHANGE",
visit = "VISIT",
group = "THERAPY",
covariates = c("BASVAL")
)
)
anaObj
```
Note that, similar to `draws()`, the `ancova()` function uses the `set_vars()`
function which determines the names of the key variables within the data and the covariates
(in addition to the treatment group) for which the analysis model will be adjusted.
Please also note that the names of the analysis estimates contain "ref" and "alt" to refer to the two treatment arms. In particular "ref" refers to the first factor level of `vars$group` which does not necessarily
coincide with the control arm. In this example, since `levels(dat[[vars$group]]) = c("DRUG", PLACEBO`), the results associated with "ref" correspond to the intervention arm, while those associated with "alt" correspond to the control arm.
Additionally, we can use the `delta` argument of `analyse()` to perform a delta adjustments of the imputed datasets prior to the analysis.
In brief, this is implemented by specifying a data.frame that contains the amount
of adjustment to be added to each longitudinal outcome for each subject and visit, i.e.
the data.frame must contain the columns `subjid`, `visit`, and `delta`.
It is appreciated that carrying out this procedure is potentially tedious, therefore the
`delta_template()` helper function has been provided to simplify it. In particular,
`delta_template()` returns a shell `data.frame` where the delta-adjustment is set to 0 for all
patients. Additionally `delta_template()` adds several meta-variables onto the shell
`data.frame` which can be used for manual derivation or manipulation of the delta-adjustment.
For example lets say we want to add a delta-value of 5 to all imputed values (i.e. those values
which were missing in the original dataset) in the drug arm. That could then be implemented as follows:
```{r}
# For reference show the additional meta variables provided
delta_template(imputeObj) %>% as_tibble()
delta_df <- delta_template(imputeObj) %>%
as_tibble() %>%
mutate(delta = if_else(THERAPY == "DRUG" & is_missing , 5, 0)) %>%
select(PATIENT, VISIT, delta)
delta_df
anaObj_delta <- analyse(
imputeObj,
ancova,
delta = delta_df,
vars = set_vars(
subjid = "PATIENT",
outcome = "CHANGE",
visit = "VISIT",
group = "THERAPY",
covariates = c("BASVAL")
)
)
```
# Pool
Finally, the `pool()` function can be used to summarise the analysis results across multiple
imputed datasets to provide an overall statistic with a standard error, confidence intervals and a p-value for
the hypothesis test of the null hypothesis that the effect is equal to 0.
Note that the pooling method is automatically derived based on the method that was specified
in the original call to `draws()`:
- For `method_bayes()` or `method_approxbayes()` pooling and inference are based on Rubin's rules.
- For `method_condmean(type = "bootstrap") ` inference is either based on a normal approximation using the bootstrap standard error (`pool(..., type = "normal")`) or on the bootstrap percentiles (`pool(..., type = "percentile")`).
- For `method_condmean(type = "jackknife")` inference is based on a normal approximation using the jackknife estimate of the standard error.
- For `method = method_bmlmi()` inference is according to the methods described by von Hippel and Bartlett (see the `stat_specs` vignette for details)
Since we have used Bayesian multiple imputation in this vignette, the `pool()` function will automatically use Rubin's rules.
```{r}
poolObj <- pool(
anaObj,
conf.level = 0.95,
alternative = "two.sided"
)
poolObj
```
The table of values shown in the print message for `poolObj` can also be extracted using the `as.data.frame()` function:
```{r}
as.data.frame(poolObj)
```
These outputs gives an estimated difference of
`r paste(formatC(poolObj$pars$trt_7$est,format="f",digits=3)," (95% CI ",
formatC(poolObj$pars$trt_7$ci[1],format="f",digits=3), " to ",
formatC(poolObj$pars$trt_7$ci[2],format="f",digits=3),")",sep="")`
between the two groups at the last visit with an associated p-value of `r formatC(poolObj$pars$trt_7$pval,format="f",digits=3)`.
# Code
We report below all the code presented in this vignette.
```{r, eval=FALSE}
library(rbmi)
library(dplyr)
data("antidepressant_data")
dat <- antidepressant_data
# Use expand_locf to add rows corresponding to visits with missing outcomes to the dataset
dat <- expand_locf(
dat,
PATIENT = levels(dat$PATIENT), # expand by PATIENT and VISIT
VISIT = levels(dat$VISIT),
vars = c("BASVAL", "THERAPY"), # fill with LOCF BASVAL and THERAPY
group = c("PATIENT"),
order = c("PATIENT", "VISIT")
)
# Create data_ice and set the imputation strategy to JR for
# each patient with at least one missing observation
dat_ice <- dat %>%
arrange(PATIENT, VISIT) %>%
filter(is.na(CHANGE)) %>%
group_by(PATIENT) %>%
slice(1) %>%
ungroup() %>%
select(PATIENT, VISIT) %>%
mutate(strategy = "JR")
# In this dataset, subject 3618 has an intermittent missing values which does not correspond
# to a study drug discontinuation. We therefore remove this subject from `dat_ice`.
# (In the later imputation step, it will automatically be imputed under the default MAR assumption.)
dat_ice <- dat_ice[-which(dat_ice$PATIENT == 3618),]
# Define the names of key variables in our dataset using `set_vars()`
# and the covariates included in the imputation model
# Note that the covariates argument can also include interaction terms
vars <- set_vars(
outcome = "CHANGE",
visit = "VISIT",
subjid = "PATIENT",
group = "THERAPY",
covariates = c("BASVAL*VISIT", "THERAPY*VISIT")
)
# Define which imputation method to use (here: Bayesian multiple imputation with 150 imputed datsets)
method <- method_bayes(
burn_in = 200,
burn_between = 5,
n_samples = 150,
seed = 675442751
)
# Create samples for the imputation parameters by running the draws() function
set.seed(987)
drawObj <- draws(
data = dat,
data_ice = dat_ice,
vars = vars,
method = method,
quiet = TRUE
)
# Impute the data
imputeObj <- impute(
drawObj,
references = c("DRUG" = "PLACEBO", "PLACEBO" = "PLACEBO")
)
# Fit the analysis model on each imputed dataset
anaObj <- analyse(
imputeObj,
ancova,
vars = set_vars(
subjid = "PATIENT",
outcome = "CHANGE",
visit = "VISIT",
group = "THERAPY",
covariates = c("BASVAL")
)
)
# Apply a delta adjustment
# Add a delta-value of 5 to all imputed values (i.e. those values
# which were missing in the original dataset) in the drug arm.
delta_df <- delta_template(imputeObj) %>%
as_tibble() %>%
mutate(delta = if_else(THERAPY == "DRUG" & is_missing , 5, 0)) %>%
select(PATIENT, VISIT, delta)
# Repeat the analyses with the adjusted values
anaObj_delta <- analyse(
imputeObj,
ancova,
delta = delta_df,
vars = set_vars(
subjid = "PATIENT",
outcome = "CHANGE",
visit = "VISIT",
group = "THERAPY",
covariates = c("BASVAL")
)
)
# Pool the results
poolObj <- pool(
anaObj,
conf.level = 0.95,
alternative = "two.sided"
)
```