-
Notifications
You must be signed in to change notification settings - Fork 48
/
format_precedence.Rmd
391 lines (324 loc) · 11.6 KB
/
format_precedence.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
---
title: "Format Precedence and NA Handling"
author: "Wojciech Wójciak and Gabriel Becker"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{Format Precedence and NA Handling}
%\VignetteEncoding{UTF-8}
%\VignetteEngine{knitr::rmarkdown}
editor_options:
chunk_output_type: console
---
```{r, echo=FALSE}
knitr::opts_chunk$set(comment = "#")
```
```{css, echo=FALSE}
.reveal .r code {
white-space: pre;
}
```
## Formats Precedence
Users of the `rtables` package can specify the format in which the
numbers in the reporting tables are printed. Formatting functionality
is provided by the
[`formatters`](https://insightsengineering.github.io/formatters/) R
package. See `formatters::list_valid_format_labels()` for a list of
all available formats. The format can be
specified by the user in a few different places. It may happen that,
for a single table layout, the format is specified in more than one
place. In such a case, the final format that will be applied depends
on format precedence rules defined by `rtables`. In this vignette, we
describe the basic rules of `rtables` format precedence.
The examples shown in this vignette utilize the example `ADSL`
dataset, a demographic table that summarizes the variables
content for different population subsets (encoded in the columns).
```{r, message=FALSE}
library(rtables)
ADSL <- ex_adsl
```
Note that all `ex_*` data which is currently attached to the `rtables`
package is provided by the
[`formatters`](https://insightsengineering.github.io/formatters/)
package and was created using the publicly available
[`random.cdisc.data`](https://insightsengineering.github.io/random.cdisc.data/)
R package.
### Format Precedence and Inheritance Rules
The format in which numbers are printed can be specified by the user
in a few different places. In the context of precedence, it is
important which level of the split hierarchy formats are specified
at. In general, there are two such levels: the **cell** level
and the so-called **parent table** level. The concept of the cell and
the parent table results from the way in which the `rtables` package
stores resulting tables. It models the resulting tables as
hierarchical, tree-like objects with the cells (as leaves) containing
multiple values. Particularly noteworthy in this context is the fact
that the actual table splitting occurs in a row-dominant way (even if
column splitting is present in the layout). `rtables` provides
user-end function `table_structure()` that prints the structure of a
given table object.
For a simple illustration, consider the following example:
```{r}
lyt <- basic_table() %>%
split_cols_by("ARM") %>%
split_rows_by("SEX") %>%
analyze(vars = "AGE", afun = mean)
adsl_analyzed <- build_table(lyt, ADSL)
adsl_analyzed
table_structure(adsl_analyzed)
```
In this table, there are 4 sub-tables under the `SEX` table.
These are: `F`, `M`, `U`, and `UNDIFFERENTIATED`. Each of these
sub-tables has one sub-table `AGE`. For example, for the first
`AGE` sub-table, its parent table is `F`.
The concept of hierarchical, tree-like representations of resulting
tables translates directly to format precedence and inheritance
rules. As a general principle, the format being finally applied for
the cell is the one that is the most specific, that is, the one which
is the closest to the cell in a given path in the tree. Hence, the
precedence-inheritance chain looks like the following:
```
parent_table -> parent_table -> ... -> parent_table -> cell
```
In such a chain, the outermost `parent_table` is the least specific
place to specify the format, while the `cell` is the most specific
one. In cases where the format is specified by the user in more than
one place, the one which is most specific will be applied in the cell.
If no specific format has been selected by the user for the split, then
the default format will be applied. The default format is `"xx"` and it
yields the same formatting as the `as.character()` function. In the
following sections of this vignette, we will illustrate the format
precedence rules with a few examples.
### Standard Format
Below is a simple layout that does not explicitly set a format for the
output of the analysis function. In such a case, the default format
is applied.
```{r}
lyt0 <- basic_table() %>%
split_cols_by("ARM") %>%
analyze(vars = "AGE", afun = mean)
build_table(lyt0, ADSL)
```
### Cell Format
The format of a cell can be explicitly specified via the `rcell()`
or `in_rows()` functions. The former is essentially a collection of
data objects while the latter is a collection of `rcell()` objects. As
previously mentioned, this is the most specific place where the format
can be specified by the user.
```{r}
lyt1 <- basic_table() %>%
split_cols_by("ARM") %>%
analyze(vars = "AGE", afun = function(x) {
rcell(mean(x), format = "xx.xx", label = "Mean")
})
build_table(lyt1, ADSL)
lyt1a <- basic_table() %>%
split_cols_by("ARM") %>%
analyze(vars = "AGE", afun = function(x) {
in_rows(
"Mean" = rcell(mean(x)),
.formats = "xx.xx"
)
})
build_table(lyt1a, ADSL)
```
If the format is specified in both of these places at the same time,
the one specified via `in_rows()` takes highest precedence.
Technically, in this case, the format defined in `rcell()` will simply
be overwritten by the one defined in `in_rows()`. This is
because the format specified in `in_rows()` is applied to the cells
not the rows (overriding the previously specified cell-specific
values), which indicates that the precedence rules described above are
still in place.
```{r}
lyt2 <- basic_table() %>%
split_cols_by("ARM") %>%
analyze(vars = "AGE", afun = function(x) {
in_rows(
"Mean" = rcell(mean(x), format = "xx.xxx"),
.formats = "xx.xx"
)
})
build_table(lyt2, ADSL)
```
### Parent Table Format and Inheritance
In addition to the cell level, the format can be specified at the
parent table level. If no format has been set by the user for a cell,
the most specific format for that cell is the one defined at its
innermost parent table split (if any).
```{r}
lyt3 <- basic_table() %>%
split_cols_by("ARM") %>%
analyze(vars = "AGE", mean, format = "xx.x")
build_table(lyt3, ADSL)
```
If the cell format is also specified for a cell, then the parent table
format is ignored for this cell since the cell format is more specific
and therefore takes precedence.
```{r}
lyt4 <- basic_table() %>%
split_cols_by("ARM") %>%
analyze(
vars = "AGE", afun = function(x) {
rcell(mean(x), format = "xx.xx", label = "Mean")
},
format = "xx.x"
)
build_table(lyt4, ADSL)
lyt4a <- basic_table() %>%
split_cols_by("ARM") %>%
analyze(
vars = "AGE", afun = function(x) {
in_rows(
"Mean" = rcell(mean(x)),
"SD" = rcell(sd(x)),
.formats = "xx.xx"
)
},
format = "xx.x"
)
build_table(lyt4a, ADSL)
```
In the following, slightly more complicated, example, we can observe
partial inheritance. That is, only `SD` cells inherit the parent
table's format while the `Mean` cells do not.
```{r}
lyt5 <- basic_table() %>%
split_cols_by("ARM") %>%
analyze(
vars = "AGE", afun = function(x) {
in_rows(
"Mean" = rcell(mean(x), format = "xx.xx"),
"SD" = rcell(sd(x))
)
},
format = "xx.x"
)
build_table(lyt5, ADSL)
```
## `NA` Handling
Consider the following layout and the resulting table created:
```{r}
lyt6 <- basic_table() %>%
split_cols_by("ARM") %>%
split_rows_by("SEX") %>%
analyze(vars = "AGE", afun = mean, format = "xx.xx")
build_table(lyt6, ADSL)
```
In the output the cell corresponding to the `UNDIFFERENTIATED` level
of `SEX` and the `B: Placebo` level of `ARM` is displayed as `NA`.
This occurs because there were no non-`NA` values under this facet
that could be used to compute the mean. `rtables` allows the user to
specify a string to display when cell values are `NA`. Similar to
formats for numbers, the user can specify a string to replace `NA`
with the parameter `format_na_str` or `.format_na_str`. This can be
specified at the cell or parent table level. `NA` string
precedence and inheritance rules are the same as those for number
format precedence, described in the previous section of this
vignette. We will illustrate this with a few examples.
### Replacing `NA` Values at the Cell Level
At the cell level, it is possible to replace `NA` values with a custom
string by means of the `format_na_str` parameter in `rcell()` or
`.format_na_str` parameter in `in_rows()`.
```{r}
lyt7 <- basic_table() %>%
split_cols_by("ARM") %>%
split_rows_by("SEX") %>%
analyze(vars = "AGE", afun = function(x) {
rcell(mean(x), format = "xx.xx", label = "Mean", format_na_str = "<missing>")
})
build_table(lyt7, ADSL)
lyt7a <- basic_table() %>%
split_cols_by("ARM") %>%
split_rows_by("SEX") %>%
analyze(vars = "AGE", afun = function(x) {
in_rows(
"Mean" = rcell(mean(x), format = "xx.xx"),
.format_na_strs = "<MISSING>"
)
})
build_table(lyt7a, ADSL)
```
If the `NA` string is specified in both of these places at the same
time, the one specified with `in_rows()` takes precedence.
Technically, in this case the `NA` replacement string
defined in `rcell()` will simply be overwritten by the one defined in
`in_rows()`. This is because the `NA` string specified in `in_rows()`
is applied to the cells, not the rows (overriding the previously
specified cell specific values), which means that the precedence rules
described above are still in place.
```{r}
lyt8 <- basic_table() %>%
split_cols_by("ARM") %>%
split_rows_by("SEX") %>%
analyze(vars = "AGE", afun = function(x) {
in_rows(
"Mean" = rcell(mean(x), format = "xx.xx", format_na_str = "<missing>"),
.format_na_strs = "<MISSING>"
)
})
build_table(lyt8, ADSL)
```
### Parent Table Replacement of `NA` Values and Inheritance Principles
In addition to the cell level, the string replacement for `NA` values
can be specified at the parent table level. If no replacement
string has been specified by the user for a cell, the most specific
`NA` string for that cell is the one defined at its innermost parent
table split (if any).
```{r}
lyt9 <- basic_table() %>%
split_cols_by("ARM") %>%
split_rows_by("SEX") %>%
analyze(vars = "AGE", mean, format = "xx.xx", na_str = "not available")
build_table(lyt9, ADSL)
```
If an `NA` value replacement string was also specified at the
cell level, then the one set at the parent table level is ignored for
this cell as the cell level format is more specific and therefore
takes precedence.
```{r}
lyt10 <- basic_table() %>%
split_cols_by("ARM") %>%
split_rows_by("SEX") %>%
analyze(
vars = "AGE", afun = function(x) {
rcell(mean(x), format = "xx.xx", label = "Mean", format_na_str = "<missing>")
},
na_str = "not available"
)
build_table(lyt10, ADSL)
lyt10a <- basic_table() %>%
split_cols_by("ARM") %>%
split_rows_by("SEX") %>%
analyze(
vars = "AGE", afun = function(x) {
in_rows(
"Mean" = rcell(mean(x)),
"SD" = rcell(sd(x)),
.formats = "xx.xx",
.format_na_strs = "<missing>"
)
},
na_str = "not available"
)
build_table(lyt10a, ADSL)
```
In the following, slightly more complicated example, we can observe
partial inheritance of NA strings. That is, only `SD` cells inherit
the parent table's `NA` string, while the `Mean` cells do not.
```{r}
lyt11 <- basic_table() %>%
split_cols_by("ARM") %>%
split_rows_by("SEX") %>%
analyze(
vars = "AGE", afun = function(x) {
in_rows(
"Mean" = rcell(mean(x), format_na_str = "<missing>"),
"SD" = rcell(sd(x))
)
},
format = "xx.xx",
na_str = "not available"
)
build_table(lyt11, ADSL)
```