vignettes/advanced_usage.Rmd

---
title: "{rtables} Advanced Usage"
author: "Gabriel Becker"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{{rtables} Advanced Usage}
  %\VignetteEncoding{UTF-8}
  %\VignetteEngine{knitr::rmarkdown}
editor_options: 
  chunk_output_type: console
---

## NOTE

This vignette is currently under development. Any code or prose which
appears in a version of this vignette on the `main` branch of the
repository will work/be correct, but they likely are not in their
final form.

Initialization
```{r, message=FALSE}
library(rtables)
```

## Control splitting with provided function (limited customization)

rtables provides an array of functions to control the splitting logic without creating an entirely new split functions. By default `split_*_by` facets data based on categorical variable.

```{r}
d1 <- subset(ex_adsl, AGE < 25)
d1$AGE <- as.factor(d1$AGE)
lyt1 <- basic_table() %>%
  split_cols_by("AGE") %>%
  analyze("SEX")

build_table(lyt1, d1)
```

For continuous variables, the `split_*_by_cutfun` can be leveraged to create categories and the corresponding faceting, when the break points are dependent from the data.

```{r}
sd_cutfun <- function(x) {
  cutpoints <- c(
    min(x),
    mean(x) - sd(x),
    mean(x) + sd(x),
    max(x)
  )

  names(cutpoints) <- c("", "Low", "Medium", "High")
  cutpoints
}

lyt1 <- basic_table() %>%
  split_cols_by_cutfun("AGE", cutfun = sd_cutfun) %>%
  analyze("SEX")

build_table(lyt1, ex_adsl)
```

Alternatively, `split_*_by_cuts` can be used when breakpoints are predefined and `split_*_by_quartiles` when the data should be faceted by quantile.

```{r}
lyt1 <- basic_table() %>%
  split_cols_by_cuts(
    "AGE",
    cuts = c(0, 30, 60, 100),
    cutlabels = c("0-30 y.o.", "30-60 y.o.", "60-100 y.o.")
  ) %>%
  analyze("SEX")

build_table(lyt1, ex_adsl)
```


## Custom Split Functions

### Adding an Overall Column Only When The Split Would Already Define 2+ Facets

Our custom split functions can do anything, including conditionally
applying one or more other existing custom split functions.

Here we define a function constructor which accepts the variable name
we want to check, and then return a custom split function that has the
behavior you want using functions provided by rtables for both cases:

```{r}
picky_splitter <- function(var) {
  function(df, spl, vals, labels, trim) {
    orig_vals <- vals
    if (is.null(vals)) {
      vec <- df[[var]]
      vals <- if (is.factor(vec)) levels(vec) else unique(vec)
    }
    if (length(vals) == 1) {
      do_base_split(spl = spl, df = df, vals = vals, labels = labels, trim = trim)
    } else {
      add_overall_level(
        "Overall",
        label = "All Obs", first = FALSE
      )(df = df, spl = spl, vals = orig_vals, trim = trim)
    }
  }
}


d1 <- subset(ex_adsl, ARM == "A: Drug X")
d1$ARM <- factor(d1$ARM)

lyt1 <- basic_table() %>%
  split_cols_by("ARM", split_fun = picky_splitter("ARM")) %>%
  analyze("AGE")
```

This gives us the desired behavior in both the one column corner case:
```{r}
build_table(lyt1, d1)
```

and the standard multi-column case:

```{r}
build_table(lyt1, ex_adsl)
```

Notice we use add_overall_level which is itself a function
constructor, and then immediately call the constructed function in the
more-than-one-columns case.

## Leveraging `.spl_context`

### What Is `.spl_context`?

`.spl_context` (see `?spl_context`) is a mechanism by which the
`rtables` tabulation machinery gives custom split, analysis or content
(row-group summary) functions information about the overarching
facet-structure the splits or cells they generate will reside in.

In particular `.spl_context` ensures that your functions know (and
thus do computations based on) the following types of information:

- 


### Different Formats For Different Values Within A Row-Split

```{r}
dta_test <- data.frame(
  USUBJID = rep(1:6, each = 3),
  PARAMCD = rep("lab", 6 * 3),
  AVISIT = rep(paste0("V", 1:3), 6),
  ARM = rep(LETTERS[1:3], rep(6, 3)),
  AVAL = c(9:1, rep(NA, 9)),
  CHG = c(1:9, rep(NA, 9))
)

my_afun <- function(x, .spl_context) {
  n <- sum(!is.na(x))
  meanval <- mean(x, na.rm = TRUE)
  sdval <- sd(x, na.rm = TRUE)

  ## get the split value of the most recent parent
  ## (row) split above this analyze
  val <- .spl_context[nrow(.spl_context), "value"]
  ## do a silly thing to decide the different format precisiosn
  ## your real logic would go here
  valnum <- min(2L, as.integer(gsub("[^[:digit:]]*", "", val)))
  fstringpt <- paste0("xx.", strrep("x", valnum))
  fmt_mnsd <- sprintf("%s (%s)", fstringpt, fstringpt)
  in_rows(
    n = n,
    "Mean, SD" = c(meanval, sdval),
    .formats = c(n = "xx", "Mean, SD" = fmt_mnsd)
  )
}

lyt <- basic_table() %>%
  split_cols_by("ARM") %>%
  split_rows_by("AVISIT") %>%
  split_cols_by_multivar(vars = c("AVAL", "CHG")) %>%
  analyze_colvars(my_afun)

build_table(lyt, dta_test)
```

### Simulating 'Baseline Comparison' In Row Space


```{r}
my_afun <- function(x, .var, .spl_context) {
  n <- sum(!is.na(x))
  meanval <- mean(x, na.rm = TRUE)
  sdval <- sd(x, na.rm = TRUE)

  ## get the split value of the most recent parent
  ## (row) split above this analyze
  val <- .spl_context[nrow(.spl_context), "value"]
  ## we show it if its not a CHG within V1
  show_it <- val != "V1" || .var != "CHG"
  ## do a silly thing to decide the different format precisiosn
  ## your real logic would go here
  valnum <- min(2L, as.integer(gsub("[^[:digit:]]*", "", val)))
  fstringpt <- paste0("xx.", strrep("x", valnum))
  fmt_mnsd <- if (show_it) sprintf("%s (%s)", fstringpt, fstringpt) else "xx"
  in_rows(
    n = if (show_it) n, ## NULL otherwise
    "Mean, SD" = if (show_it) c(meanval, sdval), ## NULL otherwise
    .formats = c(n = "xx", "Mean, SD" = fmt_mnsd)
  )
}

lyt <- basic_table() %>%
  split_cols_by("ARM") %>%
  split_rows_by("AVISIT") %>%
  split_cols_by_multivar(vars = c("AVAL", "CHG")) %>%
  analyze_colvars(my_afun)

build_table(lyt, dta_test)
```


We can further simulate the formal modeling of reference row(s) using the `extra_args` machinery


```{r}
my_afun <- function(x, .var, ref_rowgroup, .spl_context) {
  n <- sum(!is.na(x))
  meanval <- mean(x, na.rm = TRUE)
  sdval <- sd(x, na.rm = TRUE)

  ## get the split value of the most recent parent
  ## (row) split above this analyze
  val <- .spl_context[nrow(.spl_context), "value"]
  ## we show it if its not a CHG within V1
  show_it <- val != ref_rowgroup || .var != "CHG"
  fmt_mnsd <- if (show_it) "xx.x (xx.x)" else "xx"
  in_rows(
    n = if (show_it) n, ## NULL otherwise
    "Mean, SD" = if (show_it) c(meanval, sdval), ## NULL otherwise
    .formats = c(n = "xx", "Mean, SD" = fmt_mnsd)
  )
}

lyt2 <- basic_table() %>%
  split_cols_by("ARM") %>%
  split_rows_by("AVISIT") %>%
  split_cols_by_multivar(vars = c("AVAL", "CHG")) %>%
  analyze_colvars(my_afun, extra_args = list(ref_rowgroup = "V1"))

build_table(lyt2, dta_test)
```