-
Notifications
You must be signed in to change notification settings - Fork 7
/
tidytlg.Rmd
422 lines (301 loc) · 19.7 KB
/
tidytlg.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
---
title: "Get Started"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{Get Started}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "figures"
)
working_dir <- tempdir()
```
## Overview of TLG Programming
`tidytlg` provides a framework of creating TLG outputs for clinical study report. The TLG programming workflow includes the following steps:
* Prep environment: set up the R environment for the I/O paths.
* Process data: filter analysis data, perform data manipulation (e.g. convert character variable to factor), and define column variable.
* Generate results: create analysis rows of summary statistics (for tables) or plots (for graphs).
* Output results: output analysis results in designated format such as rtf or html.
We will illustrate the above steps by creating a demographic table first, and then follow by examples of creating listing and graph.
## Prep environment
To set up the R environment, you can set the path objects of the input folder and output folder consistently for all TLG programs. The analysis datasets and other required inputs such as the titles file and column metadata file are placed in the input folder, while the output folder will be used to store the output files. The [envsetup](https://pharmaverse.github.io/envsetup/main/) package can be used to set up the R environment for TLG programming.
### titles and footnotes
The information for titles and footnotes for each TLG can be stored in an excel file called titles.xls (see below snapshot), which will be used later to create the outputs.
```{r, echo=FALSE, out.width = "600px"}
knitr::include_graphics("titles.PNG")
```
### column metadata
Column metadata provides the column structure of the table layout and includes the following variables:
* tbltype: identifier used to group a table column layout
* coldef: distinct variable values used, typically numeric and typically a treatment variable, think TRT01PN
* decode: decode of coldef that will display as a column header in the table
* span1: spanning header to display across multiple columns (the lowest level)
* span2: spanning header to display across multiple columns, second level
* span3: spanning header to display across multiple columns, third level
Please see below for a snapshot of column_metadata.xlsx.
```{r, echo=FALSE, out.width = "500px"}
knitr::include_graphics("column_metadata.PNG")
```
Different types of column layouts identified by different `tbltype` can be stored in an excel file called column_metadata.xlsx. Within each `tbltype`, the `coldef` variable defines the order of the column based on the column variable used for creating the output (typically the numeric treatment variable, TRT01PN, is used as the column variable). For example, there are 3 columns for `tbltype` = "type1" in the above snapshot and the column layout is defined as follows: the first column of summary statistics represents the treatment group of `TRT01PN = 0` with the column header of Placebo defined by `decode`, the second and third columns represent the Low Dose and High Dose groups respectively with the spanning header of Xanomeline defined by the `span1` variable.
Users can also include the column that is derived from combination of individual columns. For example, the `tbltype` of type3 include the 4th column of combined Low Dose and High Dose as well as the 5th column of total group. Please see below for the snapshot of column headers defined by type3.
```{r, echo=FALSE, out.width = "600px", out.height = "60px"}
knitr::include_graphics("column_header.PNG")
```
We will use the adsl data from the [PHUSE Test Data Factory](https://github.com/phuse-org/TestDataFactory/tree/main/Updated/TDF_ADaM) to illustrate the creation of a demographic table.
```{r, message=FALSE}
# Prep Environment -------------------------------------------------------------------------------------
library(dplyr)
library(haven)
library(tidytlg)
# read adsl from PhUSE test data factory
testdata <- "https://github.com/phuse-org/TestDataFactory/raw/main/Updated/TDF_ADaM/"
adsl <- read_xpt(url(paste0(testdata,"adsl.xpt")))
```
## Process data
Before generating analysis summary, the analysis data need to be processed first as shown in the code below.
```{r, message=FALSE}
# Process Data -----------------------------------------------------------------------------------------
adsl <- adsl %>%
filter(ITTFL == "Y") %>%
mutate(SEX = factor(SEX, levels = c("M", "F", "U"), labels = c("Male", "Female", "Unknown"))) %>%
tlgsetup(var = "TRT01PN",
column_metadata_file = system.file("extdata/column_metadata.xlsx", package = "tidytlg"),
tbltype = "type3")
```
The above code perform the tasks below:
* filtering analysis population
* convert the `SEX` variable from character type to factor type. So all the factor levels of `SEX` will be displayed in the analysis summary even there are no records for the factor level of "Unknown".
* create the column variable, `colnbr`, through the `tlgsetup` function call: `tlgsetup` is using the numeric treatment variable (e.g. `TRT01PN`) to match with `coldef` in column metadata defined by `column_metadata_file` and `tbltype` to create the column variable, `colnbr`, in `adsl` for reflecting the column layout. Please see the `vignette("tlgsetup")` for more details. The column variable, `colnbr`, will be used in the subsequent analysis function calls for creating analysis results.
If you need multiple analysis datasets for creating TLG, `tlgsetup` will need to be applied to each dataset. Therefore, you will have a consistent column variable of `colnbr` for creating analysis summary.
## Generate results
`tidytlg` provides 3 functions, `univar`, `freq`, and `nested_freq`, to generate analysis summary of descriptive statistics (univariate statistics and count (percentages)). For more details, please see the frequency analysis `vignette("freq")` and the univariate statistical analysis `vignette("univar")`.
```{r, message=FALSE}
# Generate Results -------------------------------------------------------------------------------------
## Analysis set row
t1 <- adsl %>%
freq(colvar = "colnbr",
rowvar = "ITTFL",
statlist = statlist("n"),
subset = ITTFL == "Y",
rowtext = "Analysis set: ITT")
## Univariate summary for AGE
t2 <- adsl %>%
univar(colvar = "colnbr",
rowvar = "AGE",
statlist = statlist(c("N", "MEANSD", "MEDIAN", "RANGE", "IQRANGE")),
decimal = 0,
row_header = "Age, years")
## Count (percentages) for SEX
t3 <- adsl %>%
freq(colvar = "colnbr",
rowvar = "SEX",
statlist = statlist(c("N","n (x.x%)")),
row_header = "Gender")
```
The above function calls generate the requested analysis rows for the table output sequentially and store the results in individual objects (i.e. `t1`, `t2`, `t3`). The next step is to combine analysis results into a single `tbl` dataframe through the `bind_table` function call.
```{r}
# Format Results ---------------------------------------------------------------------------------------
tbl <- bind_table(t1, t2, t3,
column_metadata_file = system.file("extdata/column_metadata.xlsx", package = "tidytlg"),
tbltype = "type3")
```
The above `bind_table` function call performs the following tasks:
* bind the analysis rows from `t1`, `t2`, `t3` into `tbl`
* add formatting variables (`indentme`, `newrows`, `newpage`), which will be used in the `gentlg` function call below for creating the output.
* attach the column metadata specified by `column_metadata_file` and `tbltype` as an attribute of the `tbl`. So the column header and the spanning headers (i.e. decode, span1, span2, span3) defined in the column metadata can be used automatically in the `gentlg` function call.
## Output results
The `tbl` data frame is the main input to the gentlg function for creating the RTF/HTML outputs.
The basic structure of tbl includes label, col1, col2, ..., coln, where
* label: row text displayed on the 1st column of the table
* col1: statistic results displayed on the 2nd column of the table
* col2: statistic results displayed on the 3rd column of the table.
All other columns contain formatting instructions to create the RTF/HTML outputs. For tweaking the formatting variables to customize the table layout, please see the `vignette("tbl_manipulation")` for more details.
```{r}
knitr::kable(tbl)
```
The `gentlg` function call below will create the rtf output using the `tblid` as the file name in the folder defined by the `opath` argument. Please ensure that the `titles.xls` file contains the records of titles and footnotes for the specified `tblid`.
```{r}
tblid <- "Table01"
gentlg(huxme = tbl,
opath = file.path(working_dir),
file = tblid,
orientation = "landscape",
title_file = system.file("extdata/titles.xls", package = "tidytlg"))
```
To create the html output, users need to specify the `format` argument as "HTML" and `print.hux` argument as FALSE in the `gentlg` call.
```{r}
gentlg(huxme = tbl,
format = "HTML",
print.hux = FALSE,
file = tblid,
orientation = "landscape",
title_file = system.file("extdata/titles.xls", package = "tidytlg"))
```
Users can also include superscripts, subscripts, or line breaks via unicode. Please see the `vignette("symbols")` for more details. Besides using `univar`, `freq`, and `nested_freq` functions to create the `tbl` dataframe, users can use other R packages to create analysis results and perform data wrangling to fit the `tbl` structure, which can be passed into the `gentlg` function call for generating the desired outputs.
## Listing programming
The above workflow can also be used to create listings. Users need to prepare the data and assign it to `tbl`. In the `gentlg` function, users need to pay attention to:
* specify the `tlf` argument to `Listing` (i.e. `tlf = "Listing"`)
* specify the `idvars` argument for identifying variables (such as treatment variable and USUBJID) where repeated values will be removed
* specify the `colheader` (column header) argument; if not specified, the column labels will be used as the column headers. For the below example, if `colheader` argument is not specified, some column headers will use variable names since these columns are newly created without labels.
* user has the option to control the column width by passing a vector of column width to the `wcol` argument. Please ensure that the length of the column width vector is the same as the number of columns in your data. For the example below, there are 8 columns in the data and users can specify customized column width such as `c(0.15, 0.10, 0.05, 0.15, 0.20, 0.15, 0.05, 0.05)` for the `wcol` argument to create the rtf output. However, for the html output shown here, the `wcol` argument can only take a single number as the column width and apply to every column.
```{r}
# Prep Environment ---------------------------------------------------------------------------------------
library(dplyr)
library(haven)
library(tidytlg)
adsl <- cdisc_adsl
adae <- cdisc_adae
# Process Data --------------------------------------------------------------------------------------------
adsl <- adsl %>%
filter(SAFFL == "Y") %>%
select(USUBJID, SAFFL, TRT01AN, TRT01A)
adae <- adae %>%
filter(SAFFL == "Y" & TRTEMFL == "Y") %>%
mutate(BSPT = paste(AEBODSYS, "[", AEDECOD, "]"),
SAEFL = if_else(AESER == "Y", "Yes", "No"),
DTHFL = if_else(AEOUT == "FATAL", "Yes", "No")) %>%
select(USUBJID, ASTDY, TRTA, BSPT, AETERM, SAEFL, DTHFL)
tbl <- inner_join(adsl, adae, by = "USUBJID") %>%
arrange(TRT01AN, USUBJID, ASTDY) %>%
select(TRT01A, USUBJID, ASTDY, TRTA, BSPT, AETERM, SAEFL, DTHFL) %>%
filter(USUBJID %in% c("01-701-1015", "01-701-1023"))
# Output Results ------------------------------------------------------------------------------------------
gentlg(huxme = tbl,
tlf = "l",
format = "HTML",
print.hux = FALSE,
orientation = "landscape",
file = "Listing01",
title = "Listing of Adverse Events",
idvars = c("TRT01A", "USUBJID"),
wcol = 0.15,
colheader = c("Treatment Group",
"Subject ID",
"Study Day of AE",
"Treatment Period",
"Body System [Preferred Term]",
"Verbatim Term",
"Serious",
"Fatal"))
```
## Graph programming
To create the graph output, `tidytlg` provides a framework of integrating the png file with titles and footnotes for producing the rtf or html output.
In the `gentlg` function, users need to:
* specify the `tlf` argument to `g` for graph
* specify the `plotnames` argument with the full path of the png file
* define the `plotwidth` and `plotheight`: it's advised here that users keep the aspect ratio of plot width and height approximately the same as the png image.
The code below will create the rtf output of the plot.
```{r}
# Prep Environment ---------------------------------------------------------------------------------------
library(dplyr)
library(haven)
library(ggplot2)
library(tidytlg)
# read adsl from PhUSE test data factory
testdata <- "https://github.com/phuse-org/TestDataFactory/raw/main/Updated/TDF_ADaM/"
adsl <- read_xpt(url(paste0(testdata,"adsl.xpt")))
tblid <- "Graph01"
# Process Data --------------------------------------------------------------------------------------------
adsl <- adsl %>%
filter(ITTFL == "Y") %>%
select(USUBJID, ITTFL, TRT01PN, TRT01P, AGE, SEX, HEIGHTBL, WEIGHTBL) %>%
mutate(SEX = factor(SEX, levels = c("M", "F"), labels = c("Male", "Female")))
# Generate Results ----------------------------------------------------------------------------------------
plot <- ggplot(data = adsl, aes(x = HEIGHTBL, y = WEIGHTBL)) +
geom_point() +
labs(x = "Baseline Height (cm)",
y = "Baseline Weight (kg)") +
facet_wrap(~SEX, nrow=1)
# create png file
png(file.path(working_dir, paste0(tblid,".png")), width=2800, height=1300, res=300, type = "cairo")
plot
dev.off()
# Output Results ------------------------------------------------------------------------------------------
gentlg(tlf = "g",
plotnames = file.path(system.file("extdata", package = "tidytlg"), paste0(tblid,".png")),
plotwidth = 10,
plotheight = 5,
orientation = "landscape",
opath = file.path(working_dir),,
file = tblid,
title_file = system.file("extdata/titles.xls", package = "tidytlg"))
```
```{r, echo=FALSE, out.width = "750px"}
knitr::include_graphics("graph01.PNG")
```
## Metadata method
Besides building the table section-by-section as shown above, we can use the table metadata approach as an efficient alternative for generating outputs. Table metadata is a data frame describing the data, functions and arguments needed to produce your table results. The table metadata shown below can be used to create the same table output as above. Each row in the table metadata describes how a `tbl` chunk will be created by the function defined in the `func` column. The rest of the columns defines the arguments (i.e. `df`, `colvar`, `rowvar`, `statlist`, `rowtext`, `row_header`) that will be passed into the function.
Once table metadata is defined, users just need to call the `generate_results` function with the column metadata define in the `column_metadata_file` and `tbltype` arguments to create the `tbl` dataframe. In the processing data step, users don't need to call `tlgsetp`, since `tlgsetup` is embedded within the `generate_results` function. That's why we need to specify the column metadata in the `generate_results` call.
```{r}
library(dplyr)
library(haven)
library(tidytlg)
# read adsl from PhUSE test data factory
testdata <- "https://github.com/phuse-org/TestDataFactory/raw/main/Updated/TDF_ADaM/"
adsl <- read_xpt(url(paste0(testdata,"adsl.xpt")))
# Process data
adsl <- adsl %>%
filter(ITTFL == "Y") %>%
mutate(SEX = factor(SEX, levels = c("M", "F", "U"), labels = c("Male", "Female", "Unknown")))
# define table metadata
table_metadata <- tibble::tribble(
~func, ~df, ~rowvar, ~decimal, ~rowtext, ~row_header, ~statlist, ~subset,
"freq", "adsl", "ITTFL", NA, "Analysis set: ITT", NA, statlist("n"), "ITTFL == 'Y'",
"univar", "adsl", "AGE", 0, NA, "Age (Years)", NA, NA,
"freq", "adsl", "SEX", NA, NA, "Gender", statlist(c("N", "n (x.x%)")), NA
) %>%
mutate(colvar = "TRT01PN")
# Generate results
tbl <- generate_results(table_metadata,
column_metadata_file = system.file("extdata/column_metadata.xlsx", package = "tidytlg"),
tbltype = "type3")
# Output results
tblid <- "Table01"
gentlg(huxme = tbl,
format = "HTML",
print.hux = FALSE,
file = tblid,
orientation = "landscape",
title_file = system.file("extdata/titles.xls", package = "tidytlg"))
```
## By processing
There are two types of by-processing that `tidytlg` functions can provide:
* `rowbyvar`: split the summary statistics of `rowvar` by other variable(s) specified in `rowbyvar`. Please see the by processing section of frequency analysis `vignette("freq")` for further details. The `rowbyvar` argument can also be used in the `univar` function. A typical use case is to summarize the lab values by parameter and analysis visit, where we can call the `univar` function with `rowvar = AVAL` and `rowbyvar = c("PARAM","AVISIT")`. It is advised in this use case to turn on the `.ord` argument (i.e. `.ord = TRUE`) in the `univar` function call. So the numeric sorting columns associated with the by variables (`PARAM_ord` and `AVISIT_ord`) can be created and used for sorting the interleaved summary results of `AVAL` and `CHG`.
* `tablebyvar`: the argument of `tablebyvar` is designed to facilitate the sub-group analysis, which repeats a table summary by the sub-group variable. A typical use case is the summary of demographics table by country. For creating the sub-group analysis version of the same table, we just need to add the argument of `tablebyvar` with the sub-group variable in each function call.
The code below provides an example of summarizing age and race by gender using `tablebyvar`.
```{r, message=FALSE}
library(dplyr)
library(haven)
library(tidytlg)
# read adsl from PhUSE test data factory
testdata <- "https://github.com/phuse-org/TestDataFactory/raw/main/Updated/TDF_ADaM/"
adsl <- read_xpt(url(paste0(testdata,"adsl.xpt")))
# Process data
adsl <- adsl %>%
filter(ITTFL == "Y") %>%
mutate(SEX = factor(SEX, levels = c("M", "F"), labels = c("Male", "Female")))
# define table metadata
table_metadata <- tibble::tribble(
~func, ~df, ~rowvar, ~decimal, ~rowtext, ~row_header, ~statlist, ~subset, ~tablebyvar,
"univar", "adsl", "AGE", 0, NA, "Age (Years)", NA, NA, "SEX",
"freq", "adsl", "RACE", NA, NA, "Race", statlist(c("N", "n (x.x%)")), NA, "SEX"
) %>%
mutate(colvar = "TRT01PN")
# Generate results
tbl <- generate_results(table_metadata,
column_metadata_file = system.file("extdata/column_metadata.xlsx", package = "tidytlg"),
tbltype = "type3")
# Output results
tblid <- "Table01"
gentlg(huxme = tbl,
format = "HTML",
print.hux = FALSE,
file = tblid,
orientation = "landscape",
title_file = system.file("extdata/titles.xls", package = "tidytlg"))
```
In summary, `rowbyvar` is used to create the by-variable summary for one `rowvar` in a single function call. To perform sub-group analysis, users need to specify `tablebyvar` in every function calls except the analysis population row.