---
title: "Survival Analysis with visR"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Survival Analysis with visR}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---
```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```
## Introduction
This tutorial illustrates a typical use case in clinical development: the analysis of time to a certain event (e.g., death) in different groups. Typically, data obtained in randomized clinical trials (RCTs) are used to estimate the overall survival of patients in one group (e.g., treated with drug X) vs. another group (e.g., treated with drug Y) and thus to determine whether there is a treatment difference.
For a more thorough introduction to Survival Analysis, we recommend the following [tutorial](https://bioconnector.github.io/workshops/r-survival.html).
In this example, we will work with patient data from the NCCTG Lung Cancer dataset that is part of the `survival` package. Another vignette presents an example using a data set following the [CDISC ADaM standard](https://www.cdisc.org/standards/foundational/adam/adam-basic-data-structure-bds-time-event-tte-analyses-v1-0).
```{r imports, echo=TRUE, warning=FALSE, message=FALSE}
library(ggplot2)
library(visR)
```
## Global Document Setup
```{r globalSetup}
# Metadata Title
DATASET <- paste0("NCCTG Lung Cancer Dataset (from survival package ",
                  packageVersion("survival"), ")")

# Save original options()
old <- options()

# Global formatting options
options(digits = 3)

# Global ggplot settings
theme_set(theme_bw())

# Global table settings
options(DT.options = list(pageLength = 10,
                          language = list(search = 'Filter:'),
                          scrollX = TRUE))

lung_cohort <- survival::lung

# Change gender to be a factor and rename some variables to make output look nicer
lung_cohort <- lung_cohort %>%
  dplyr::mutate(sex = as.factor(ifelse(sex == 1, "Male", "Female"))) %>%
  dplyr::rename(Age = "age", Sex = "sex", Status = "status", Days = "time")

# Restore original options()
options(old)
```
## Cohort Overview (Table one)
Visualizing tables, such as a table one or a risk table, is a two-step process in visR. First, a data.frame (or tibble) is created by a `get_XXX()` function (e.g. `get_tableone()`). Second, the data.frame is displayed by calling the function `render()`. The advantage of this process is that data summaries can be created, used and adjusted throughout an analysis, while at every step they can be displayed or even downloaded.
Populations are usually displayed as a so-called table one. The function `get_tableone()` creates a tibble that contains population summaries.
```{r table1_get_default}
# Select variables of interest and change names to look nicer
lung_cohort_tab1 <- lung_cohort %>%
  dplyr::select(Age, Sex)
# Create a table one
tab1 <- visR::get_tableone(lung_cohort_tab1)
# Render the tableone
visR::render(tab1, title = "Overview over Lung Cancer patients", datasource = DATASET)
```
Function `render` nicely displays the tableone. Additionally, visR includes a wrapper function to create and display a `tableone` in only one function call.
```{r table1_render_default}
# Use wrapper functionality to create and display a tableone
visR::tableone(lung_cohort_tab1, title = "Overview over Lung Cancer patients", datasource = DATASET)
```
Creating and visualizing a tableone with default settings is very simple and can be done with one line of code. However, there are further customization options.
In both the get and the wrapper functions, a stratifier can be defined and the column displaying total information can be removed.
```{r table1_get_options}
# Create and render a tableone with a stratifier and without displaying the total
visR::tableone(lung_cohort_tab1, strata = "Sex", overall = FALSE,
               title = "Overview over Lung Cancer patients", datasource = DATASET)
```
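To make concrete what a stratified table one summarizes, the following is a minimal base-R sketch on the unmodified `survival::lung` data. It only illustrates the idea (counts and summary statistics per stratum); it is not how visR computes its table, and the choice of mean and standard deviation is an assumption made for the illustration.

```r
# Sketch only: base-R summaries in the spirit of a stratified table one.
# This does not reproduce visR's implementation or its exact statistics.
lung <- survival::lung
lung$sex <- factor(lung$sex, levels = c(1, 2), labels = c("Male", "Female"))

# Number of patients per stratum
n_by_sex <- table(lung$sex)

# Mean and standard deviation of age within each stratum
age_by_sex <- aggregate(
  age ~ sex,
  data = lung,
  FUN = function(x) c(mean = mean(x), sd = sd(x))
)

n_by_sex
age_by_sex
```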
visR's `render` supports three different rendering engines to be as flexible as possible. By default, `render` uses `gt`. Additional engines are `datatable` (`dt`) to include easy downloading options...
```{r table1_render_options_dt}
# Create and render a tableone with dt as an engine
visR::tableone(lung_cohort_tab1, strata = "Sex", overall = FALSE,
               title = "Overview over Lung Cancer patients", datasource = DATASET,
               engine = "dt")
```
...and `kable` for flexible displaying in various output formats (`html` by default, `latex` supported).
```{r table1_render_options_kable}
# Create and render a tableone with kable as an engine and html as output format
visR::tableone(lung_cohort_tab1, strata = "Sex", overall = FALSE,
               title = "Overview over Lung Cancer patients", datasource = DATASET,
               engine = "kable", output_format = "html")
```
Called with `html` as the output format, an `html` view is displayed; called with `latex`, a string containing LaTeX code is printed.
## Time-to-event analysis
### Survival estimation
visR provides a wrapper function to estimate a Kaplan-Meier curve and several functions to visualize the results. This wrapper function is compatible with `%>%` and `purrr::map()` functions without losing traceability of the dataset name.
```{r km_est}
# Select variables of interest
lung_cohort_survival <- lung_cohort %>%
  dplyr::select(Age, Sex, Status, Days)

# survival::lung codes status as 1 = censored, 2 = dead.
# estimate_KM() expects a CDISC-style censoring flag (CNSR: 1 = censored, 0 = event),
# so recode the status accordingly.
lung_cohort_survival$Status <- 2 - lung_cohort_survival$Status

# Estimate the survival curve
lung_suvival_object <- lung_cohort_survival %>%
  visR::estimate_KM(strata = "Sex", CNSR = "Status", AVAL = "Days")
lung_suvival_object
```
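For orientation, a comparable Kaplan-Meier estimate can be obtained directly with the `survival` package. The sketch below works on the unmodified `survival::lung` data and is independent of visR; it is not a description of the internals of `estimate_KM()`. Note that `Surv()` takes an event indicator, which is why the sketch uses `status == 2`.

```r
library(survival)

# In survival::lung, status is coded 1 = censored and 2 = dead,
# so `status == 2` is TRUE exactly when the event (death) was observed.
fit <- survfit(Surv(time, status == 2) ~ sex, data = lung)

# One row per stratum (sex = 1 is male, sex = 2 is female):
# number of patients, number of deaths, and median survival time in days
km_table <- summary(fit)$table[, c("records", "events", "median")]
km_table
```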
### Survival visualization
There are two frequently used ways to present time-to-event data: as a risk table and as a Kaplan-Meier curve. visR can display a risk table and a Kaplan-Meier curve separately, or both together in one plot.
#### Displaying the risktable
Creating and visualizing a risk table separately works in exactly the same way as for the tableone (above): First, `get_risktable()` creates a tibble with risk information that can still be changed. Second, the risk table is rendered for display.
```{r km_tab}
# Create a risktable
rt <- visR::get_risktable(lung_suvival_object)
# Display the risktable
visR::render(rt, title = "Overview over survival rates of Lung Cancer patients", datasource = DATASET)
```
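As a rough illustration of what a risk table contains, the number of patients still at risk can be read off a `survfit` object at chosen time points. This sketch uses only the `survival` package on the unmodified `lung` data; the time grid is arbitrary and the layout is not meant to match the output of `get_risktable()`.

```r
library(survival)

# Kaplan-Meier fit by sex (status == 2 marks an observed death in lung)
fit <- survfit(Surv(time, status == 2) ~ sex, data = lung)

# Evaluate the fitted curves on a fixed grid of days (grid chosen arbitrarily)
at <- summary(fit, times = c(0, 250, 500, 750))

# Long-format risk table: one row per stratum and time point
risk <- data.frame(
  strata = at$strata,
  time   = at$time,
  n.risk = at$n.risk
)
risk
```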
The risktable is only one piece of information that can be extracted from a survival object. Other `get_XXX()` functions extract further summaries, which can then be rendered in the same way.
```{r km_tab_options_1}
# Display a summary of the survival estimate
visR::render(lung_suvival_object %>% visR::get_summary(), title = "Summary", datasource = DATASET)
```
```{r km_tab_options_2}
# Display test statistics associated with the survival estimate
visR::render(lung_suvival_object %>% visR::get_pvalue(), title = "P-values", datasource = DATASET)
```
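One common test for comparing survival curves between groups is the log-rank test. The sketch below computes it with `survival::survdiff()` on the unmodified `lung` data; it is shown for orientation only, and which tests `get_pvalue()` reports is not assumed here.

```r
library(survival)

# Log-rank test for a difference in survival between the two sexes
lr <- survdiff(Surv(time, status == 2) ~ sex, data = lung)

# The statistic is chi-squared with (number of groups - 1) degrees of freedom
p_value <- pchisq(lr$chisq, df = length(lr$n) - 1, lower.tail = FALSE)

lr$chisq
p_value
```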
```{r km_tab_options_3}
# Display quantile information of the survival estimate
visR::render(lung_suvival_object %>% visR::get_quantile(), title = "Quantile Information", datasource = DATASET)
```
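Quantiles of the survival time, such as the median, can also be read directly from a `survfit` object. This is a sketch with the `survival` package on the unmodified `lung` data; the chosen probabilities are an assumption for illustration and need not match the defaults of `get_quantile()`.

```r
library(survival)

fit <- survfit(Surv(time, status == 2) ~ sex, data = lung)

# Time (in days) by which 25%, 50% and 75% of patients in each stratum have died;
# conf.int = FALSE returns only the point estimates as a strata-by-quantile matrix
q <- quantile(fit, probs = c(0.25, 0.5, 0.75), conf.int = FALSE)
q
```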
```{r km_tab_options_4}
# Display a cox model estimate associated with the survival estimate
visR::render(lung_suvival_object %>% visR::get_COX_HR(), title = "COX estimate", datasource = DATASET)
```
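A hazard ratio of the kind reported by a Cox model can be sketched with `survival::coxph()`. The example uses the unmodified `lung` data with sex as the only covariate and males as the reference level; this model specification is an assumption for illustration and is not taken from the internals of `get_COX_HR()`.

```r
library(survival)

lung_cox <- lung
lung_cox$sex <- factor(lung_cox$sex, levels = c(1, 2), labels = c("Male", "Female"))

# Cox proportional hazards model with sex as the only covariate
cox <- coxph(Surv(time, status == 2) ~ sex, data = lung_cox)

# Hazard ratio (Female vs. Male) and its 95% confidence interval
hr <- exp(coef(cox))
ci <- exp(confint(cox))

hr
ci
```

A hazard ratio below 1 indicates a lower hazard of death for female patients relative to male patients in this dataset.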
#### Plotting the Kaplan-Meier
Alternatively, the survival data can be plotted as a Kaplan-Meier curve. In `visR`, a plot is in most cases a ggplot object and adapting the plot follows the general principle of creating a plot and then adding visual contents step-by-step.
```{r km_plot_1}
# Create and display a Kaplan-Meier from the survival object
gg <- visR::visr(lung_suvival_object)
gg
```
```{r km_plot_2}
# Add a confidence interval to the Kaplan-Meier and display the plot
gg %>% visR::add_CI()
```
```{r km_plot_3}
# Add a confidence interval and the censor ticks to the Kaplan-Meier and display the plot
gg %>% visR::add_CI() %>% visR::add_CNSR(shape = 3, size = 2)
```
visR includes a wrapper function to create a risktable and then add it directly to a Kaplan-Meier plot.
```{r km_add}
# Add a confidence interval and the censor ticks and a risktable to the Kaplan-Meier and display the plot
gg %>% visR::add_CI() %>% visR::add_CNSR(shape = 3, size = 2) %>% visR::add_risktable()
```
## Competing Risks
In addition to classic right-censored data, the {visR} package supports the estimation of time-to-event outcomes in the presence of competing events.
The package wraps the [{tidycmprsk}](https://mskcc-epi-bio.github.io/tidycmprsk/) package, and exports functions for cumulative incidence estimation and visualization.
The function `estimate_cuminc()` estimates the cumulative incidence of the competing event or outcome of interest.
The syntax is nearly identical to `estimate_KM()`; however, the outcome status variable (passed to the `CNSR=` argument) must be a factor where the first level indicates censoring, the second level the competing event of interest, and subsequent levels are the other competing events. Visualization functions, `visr()`, `add_CI()`, `add_CNSR()`, and `add_risktable()` share the same syntax as the Kaplan-Meier variants.
```{r cuminc_1}
visR::estimate_cuminc(
  tidycmprsk::trial,
  strata = "trt",
  CNSR = "death_cr",
  AVAL = "ttdeath"
) %>%
  visR::visr(
    legend_position = "bottom",
    x_label = "Months from Treatment",
    y_label = "Risk of Death"
  ) %>%
  visR::add_CI() %>%
  visR::add_risktable(statlist = c("n.risk", "cum.event"))
```
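When preparing your own data for `estimate_cuminc()`, the status variable has to be a factor whose first level is censoring, as described above. The following base-R sketch shows one way to build such a factor; the numeric raw coding and the level labels are hypothetical and chosen only for illustration.

```r
# Hypothetical raw coding: 0 = censored, 1 = event of interest, 2 = competing event
raw_status <- c(0, 1, 2, 1, 0, 2)

# Fix the level order explicitly so that censoring comes first,
# the event of interest second, and the competing event last
status <- factor(
  raw_status,
  levels = c(0, 1, 2),
  labels = c("censor", "death from cancer", "death other causes")
)

levels(status)
table(status)
```

Setting `levels` explicitly avoids relying on alphabetical or numeric ordering, which could otherwise place an event level before the censoring level.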