New split function - a multidimensional generalization of current split family functions #203

Closed
ghost opened this issue May 12, 2021 · 17 comments

Comments

@ghost

ghost commented May 12, 2021

Hi @gmbecker ( @danielinteractive - FYI )

The issue I am raising here is quite similar to recent #109 (trim_levels_by_map), but it is more generic.

Sometimes, when splitting rows with split_rows_by on a certain variable, we may want to keep/drop/remove levels not only of the split variable but also of other variables in each data split. As an example of the 'keep' scenario, consider the simple data set defined below:

> d <- data.frame(V1 = factor(rep(c("V1L1", "V1L2"), each = 2), levels = c("V1L1", "V1L2")),
                  V2 = factor(c("V2L1", "V2L2", "V2L3", "V2L4"), levels = c("V2L1", "V2L2", "V2L3", "V2L4")))
> d
#     V1   V2
# 1 V1L1 V2L1
# 2 V1L1 V2L2
# 3 V1L2 V2L3
# 4 V1L2 V2L4

Let's say I split the rows of d with split_rows_by("V1"). After the split, I would like to keep only the following levels:

  • V1L1 for V1 and V2L1, V2L2 for V2 for the first data split (rows 1-2)
  • V1L2 for V1 and V2L3, V2L4 for V2 for the second data split (rows 3-4).

Similar examples can be constructed for drop/remove scenarios. So this is a kind of multidimensional generalization of current split functions for variables other than just split variable.
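The 'keep' scenario described above can be sketched in base R, outside of rtables. This is only an illustration of the intended result; the `wanted_v2` list is a hypothetical stand-in for the map and is not part of any rtables API.

```r
d <- data.frame(
  V1 = factor(rep(c("V1L1", "V1L2"), each = 2), levels = c("V1L1", "V1L2")),
  V2 = factor(c("V2L1", "V2L2", "V2L3", "V2L4"),
              levels = c("V2L1", "V2L2", "V2L3", "V2L4"))
)

# Hypothetical stand-in for the map: V2 levels to keep within each V1 level.
wanted_v2 <- list(V1L1 = c("V2L1", "V2L2"), V1L2 = c("V2L3", "V2L4"))

# Split on V1, then restrict each piece to the levels wanted for that piece.
pieces <- split(d, d$V1)
kept <- lapply(names(pieces), function(l) {
  piece <- pieces[[l]]
  piece <- piece[piece$V2 %in% wanted_v2[[l]], , drop = FALSE]
  piece$V1 <- factor(as.character(piece$V1), levels = l)
  piece$V2 <- factor(as.character(piece$V2), levels = wanted_v2[[l]])
  piece
})
names(kept) <- names(pieces)

levels(kept$V1L1$V2)  # "V2L1" "V2L2"
levels(kept$V1L2$V2)  # "V2L3" "V2L4"
```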

I wrote the function below for the 'keep' scenario, as an example. It is far from perfect, but it works for at least the basic scenarios I was able to create.

# Custom split function to use with split_rows_by. 
# It is a generalization of rtables::keep_split_levels.
# It not only keeps specific levels of the splitting variable but also 
# allows keeping levels of other variables after the split, 
# conditionally on the level of the split variable, as specified by map
keep_split_levels_of_vars <- function(map = NULL) {
  
  if (is.null(map))
    stop("No map dataframe was provided.")
  
  myfun <- function(df, spl, vals = NULL, labels = NULL, trim = FALSE) {
    
    splvar <- rtables:::spl_payload(spl)
    spllev <- df[[splvar]]
    spllevkeep <- as.character(na.omit(unique(map[[splvar]]))) # levels of split variable to keep 
    
    if (is.factor(spllev) && !all(spllevkeep %in% levels(spllev))) 
      stop("Attempted to keep invalid factor level(s) in split ", setdiff(spllevkeep, levels(spllev)))
    
    df2 <- df[spllev %in% spllevkeep, ]
    rtables:::spl_child_order(spl) <- spllevkeep
    
    ret <- rtables:::.apply_split_inner(spl, df2, vals = spllevkeep, labels = labels, trim = trim)
    
    # keep non-split variables levels
    vars <- setdiff(colnames(map), splvar)  # variable names except split variable
    ret$datasplit <- lapply(ret$values, function(l) {
      
      df3 <- ret$datasplit[[l]]
      
      for (v in vars) {
        lev <- df3[[v]]
        levkeep <- as.character(na.omit(unique(map[map[, splvar] == l, v])))
        if (is.factor(lev) && !all(levkeep %in% levels(lev))) 
          stop("Attempted to keep invalid factor level(s) in split ", setdiff(levkeep, levels(lev)))
        
        df3 <- df3[lev %in% levkeep, , drop = FALSE]
        
        if (is.factor(lev))
          df3[[v]] <- factor(as.character(df3[[v]]), levels = levkeep)
      }
      
      df3
    })
    names(ret$datasplit) <- ret$values
    ret
  }
  
  myfun
}

I would be grateful if you could share your thoughts on this idea, especially on whether the same result can be achieved with existing rtables functionality or, if not, how feasible it would be to add this functionality to rtables.

Thank you.

W.

@gmbecker
Collaborator

Hi @w-wojtek, thanks for reaching out and for digging into how splitting works in rtables. I must admit to being a bit confused. From what I can see, this is a straightforward application of the trim_levels_by_map split function constructor I posted in a comment in #109. This does act as a good reminder, though, that I need to add that function to the package; it seems I never did so, which was my mistake.

The function (copied from #109 ) is:

#' @rdname split_funcs
#' @param outervar character(1). Parent split variable to trim \code{innervar} levels within. Must appear in map
#' @param map data.frame. Data frame mapping \code{outervar} values  to allowable \code{innervar} values. If no map exists a-priori, use
#' @export
trim_levels_by_map = function(innervar, outervar, map = NULL) {
    if(is.null(map))
        stop("no map dataframe was provided. Use trim_levels_in_group to trim combinations present in the data being tabulated.")
    myfun = function(df, spl, vals = NULL, labels = NULL, trim = FALSE) {
        ret = .apply_split_inner(spl, df, vals = vals, labels = labels, trim = trim)

        outval <- unique(as.character(df[[outervar]]))
        oldlevs <- spl_child_order(spl)
        newlevs <- oldlevs[oldlevs %in% map[as.character(map[[outervar]]) == outval, innervar, drop =TRUE]]

        keep <- ret$values %in% newlevs
        ret <- lapply(ret, function(x) x[keep])
        ret$datasplit <- lapply(ret$datasplit, function(df) {
            df[[innervar]] <- factor(as.character(df[[innervar]]), levels = newlevs)
            df
        })
        ret$labels <- as.character(ret$labels) # TODO
        ret
    }
    myfun
}

With that function, however, you'd simply do

splfun <- trim_levels_by_map(innervar = "V2", outervar = "V1", map = d)
lyt <- basic_table() %>%
  split_rows_by("V1") %>%
  split_rows_by("V2", split_fun = splfun)
...

I have added trim_levels_by_map to my local version and so it will be available in the next push that happens.

Please let me know if you have use cases beyond those covered by this function

@ghost
Author

ghost commented May 13, 2021

Hello @gmbecker

Thank you very much for your prompt answer. In my case the split must remain at the V1 level and cannot go any further, i.e. I must not split further by V2. The reason is that after split_rows_by("V1") I pass each data split to a counting function ( tern::count_occurrences() with argument .stats = "count_fraction" ) for subsequent computations. The key requirement is that records (patients) within the same level of V1 must share the same denominator across the different levels of V2. Therefore I cannot split the data any further after V1. On the other hand, I need to make sure that V2 spans only selected levels (even when there are no records for some of these levels), conditionally on the level of V1; hence the need for a mapping.
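A minimal base R sketch of the two requirements, a shared denominator within one V1 group and a V2 level that must be reported even though it has no records. The data here are made up for illustration and tern is not used.

```r
# One V1 group in which level "V2L2" has no records but must still be reported.
v2_in_group <- factor(c("V2L1", "V2L1", "V2L1"), levels = c("V2L1", "V2L2"))

denom <- length(v2_in_group)   # one denominator shared by every V2 level
counts <- table(v2_in_group)   # table() keeps factor levels with zero records
fractions <- counts / denom

counts     # V2L1: 3, V2L2: 0
fractions  # V2L1: 1, V2L2: 0
```

Splitting again by V2 would instead give each V2 level its own subset, and therefore its own denominator, which is what the requirement rules out.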

Please let me know if you would like to get more details about it.
Thanks.

@anajens
Contributor

anajens commented May 13, 2021

@gmbecker just want to add some examples.

Using two split variables and trim_levels_by_map we do have the right structure in the table but it's not possible to get the right denominator.

d <- data.frame(
  V1 = factor(c("V1L1", "V1L1", "V1L2", "V1L2", "V1L2"), levels = c("V1L1", "V1L2")),
  V2 = factor(c("V2L1", "V2L2", "V2L3", "V2L4", "V2L5"), levels = c("V2L1", "V2L2", "V2L3", "V2L4", "V2L5"))
)

cfun <- function(x, labelstr) {
  denom <- length(x)
  num <- length(unique(x))
  
  in_rows(
    c(num, denom),
    .formats = "xx / xx",
    .labels = sprintf("%s", labelstr)
  )
}

basic_table() %>%
  split_rows_by("V1") %>%
  summarize_row_groups(format = "xx") %>%
  split_rows_by("V2", split_fun = trim_levels_by_map(innervar = "V2", outervar = "V1", map = d)) %>%
  summarize_row_groups("V2", cfun = cfun) %>%
  build_table(d)

We want the denominators to be based on outer variable (V1 in this case) but due to the second row split it's not possible.

         all obs
----------------
V1L1        2   
  V2L1    1 / 1 
  V2L2    1 / 1 
V1L2        3   
  V2L3    1 / 1 
  V2L4    1 / 1 
  V2L5    1 / 1 

Using the function proposed by @w-wojtek this works:

afun <- function(x) {
  denom <- length(x)
  num <- table(unique(x))
  result <- lapply(num, `c`, denom)
  
  do.call(in_rows, lapply(result, rcell, format = "xx / xx"))
}

basic_table() %>%
  split_rows_by("V1", split_fun = keep_split_levels_of_vars(map = d)) %>%
  summarize_row_groups(format = "xx") %>%
  analyze("V2", afun = afun) %>%
  build_table(d)
         all obs
----------------
V1L1        2   
  V2L1    1 / 2 
  V2L2    1 / 2 
V1L2        3   
  V2L3    1 / 3 
  V2L4    1 / 3 
  V2L5    1 / 3 

@gmbecker
Collaborator

Hi @anajens @w-wojtek,

I do see now what you're getting at here. And upon further inspection it is conceptually a pretty straightforward extension of the existing trim_levels_in_group which does this already, but only for the case where you want to restrict a single "inner variable" within each of the parent groups to exactly those levels of the inner variable observed in each group.

I'm considering ways of specifying the above special case within a map, in which case we wouldn't need two separate functions (factories) - possibly using NAs - but I'm not set on that yet. If that doesn't happen, I do think I'll want to change the name of this function, but naming things is (NP-)hard so we'll see if I can come up with anything more succinct.

That said, I am still a bit surprised that this isn't a data-preprocessing task, i.e. if the restriction on groupings/chains of levels of different variables is known at layouting time (a requirement for this approach), then I would have expected the data to simply be restricted to the rows that meet those requirements before being passed to build_table. The reason I would have expected this is that I would think you would want, e.g., the column counts etc. to reflect that restriction of the data, and when this is happening for row splits, that won't be the case with this approach.

Also, @w-wojtek your function is good work, but there's a trick (that I would not have expected you to know, mind you) that makes it much simpler, and will likely be useful to you again someday as a power-user of the framework, which is that you can call another splitting function directly with your custom splitting function and then massage the results from there.

The beginning of your custom function is recreating the work done by the existing keep_split_levels split function factory, so we can just use that. Note this also nearly gets rid of the need to use ::: in your user-level function (I will export spl_payload as well as some other things; it is a good point that that will often be needed when writing advanced custom splitting functions)

keep_split_levels_of_vars2 <- function(map = NULL) {

  if (is.null(map))
    stop("No map dataframe was provided.")

    myfun <- function(df, spl, vals = NULL, labels = NULL, trim = FALSE) {
        splvar <- rtables:::spl_payload(spl)
        nondup <- !duplicated(map[[splvar]])
        ksl_fun <- keep_split_levels(only = map[[splvar]][nondup], reorder = TRUE)
        ret <- ksl_fun(df, spl, vals, labels, trim = trim)

        ## keep non-split variables levels
        vars <- setdiff(colnames(map), splvar)  # variable names except split variable
        ret$datasplit <- lapply(ret$values, function(l) {

            df3 <- ret$datasplit[[l]]
         
            for (v in vars) {
                lev <- df3[[v]]
                levkeep <- as.character(na.omit(unique(map[map[, splvar] == l, v])))
                if (is.factor(lev) && !all(levkeep %in% levels(lev)))
                    stop("Attempted to keep invalid factor level(s) in split ", setdiff(levkeep, levels(lev)))

                df3 <- df3[lev %in% levkeep, , drop = FALSE]

                if (is.factor(lev))
                    df3[[v]] <- factor(as.character(df3[[v]]), levels = levkeep)
            }

            df3
        })
        names(ret$datasplit) <- ret$values
        ret
    }

    myfun
}

which lets us do:

d <- data.frame(
  V1 = factor(c("V1L1", "V1L1", "V1L2", "V1L2", "V1L2"), levels = c("V1L1", "V1L2")),
  V2 = factor(c("V2L1", "V2L2", "V2L3", "V2L4", "V2L5"), levels = c("V2L1", "V2L2", "V2L3", "V2L4", "V2L5"))
)

## sample separately so we have some that will need removal
## because they don't match the map
dat <- data.frame(V1 = sample(d$V1, 200, replace = TRUE),
                  V2 = sample(d$V2, 200, replace = TRUE))


afun <- function(x) {
  denom <- length(x)
  num <- table(x)
  result <- lapply(num, `c`, denom)
  in_rows(.list = result,
          .formats = setNames(rep("xx / xx", length(num)), names(num)))
}

basic_table() %>%
  split_rows_by("V1", split_fun = keep_split_levels_of_vars2(d)) %>%
  summarize_row_groups(format = "xx") %>%
  analyze("V2", afun = afun) %>%
  build_table(dat)

and get


         all obs
----------------
V1L1       25   
  V2L1   13 / 25
  V2L2   12 / 25
V1L2       75   
  V2L3   23 / 75
  V2L4   21 / 75
  V2L5   31 / 75

Do please note, however, that with col-counts turned on the column count is 200, which is "incorrect", as far as I can see, as I alluded to above.

@gmbecker gmbecker added this to ToDo in May - Jul 2021 via automation May 17, 2021
@gmbecker gmbecker moved this from ToDo to In Progress in May - Jul 2021 May 17, 2021
@ghost
Author

ghost commented May 18, 2021

Thank you @gmbecker (@anajens ) for the comprehensive answer.

I had looked at trim_levels_in_group before I raised this issue, and again today. While I see that this task is conceptually a pretty straightforward extension of the existing trim_levels_in_group, I do not see how that function does this already. To emphasize: we want to keep certain levels of the inner variable, conditionally on the levels of the split variable, regardless of whether the split data portion contains any rows (patients) for the required levels of the inner variable. So conceptually, this is nothing more than a generalization of the existing keep_split_levels to more than one dimension.

About pre-processing: I may be missing something, but I do not see how this can be achieved at the pre-processing stage without some esoteric and sophisticated tricks (probably even more complex than the custom split function discussed here). You cannot do much before the final data split, as the levels of the inner variable to be kept in a split data portion depend on the level of the split variable. Also, we do not want to add one more split by the current inner variable (V2 in our example), as we would lose the grouping needed by the counting function at the next step.

Thank you very much indeed for sharing the details and further tricks about the implementation. I was thinking about using keep_split_levels in the body of this new function, but was not sure how to handle it in detail, so I decided to keep this prototype very straightforward, just to confirm the idea with you first.

Please let us know if you decide on the way you want to specify such an extension.
Thank you!

@gmbecker
Collaborator

> I had looked at trim_levels_in_group before I raised this issue, and again today. While I see that this task is conceptually a pretty straightforward extension of the existing trim_levels_in_group, I do not see how that function does this already. To emphasize: we want to keep certain levels of the inner variable, conditionally on the levels of the split variable, regardless of whether the split data portion contains any rows (patients) for the required levels of the inner variable. So conceptually, this is nothing more than a generalization of the existing keep_split_levels to more than one dimension.

I don't think keep_split_levels is the correct thing to conceptually abstract from here. It doesn't matter too much in practice, but for the sake of clarity I'd say that this function generalizes trim_levels_in_group rather than keep_split_levels, because trim_levels_in_group differentially keeps levels of a second (inner) variable within each level of a split, while keep_split_levels simply drops some of the possible levels of a variable when splitting on it.

> About pre-processing: I may be missing something, but I do not see how this can be achieved at the pre-processing stage without some esoteric and sophisticated tricks (probably even more complex than the custom split function discussed here). You cannot do much before the final data split, as the levels of the inner variable to be kept in a split data portion depend on the level of the split variable. Also, we do not want to add one more split by the current inner variable (V2 in our example), as we would lose the grouping needed by the counting function at the next step.

While it is somewhat non-trivial to write a conceptually similar filter_to_dict function, which accepts a map/dictionary df and a data df and does what we have keep_split_levels_of_vars doing, that doesn't change the fact that I think it's probably what you ultimately want. Keep in mind that even if you used the splitting function, you'd need to process the data that way anyway in order to get the correct column counts to override the automatic (and, as I pointed out, incorrect) ones.
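As a rough illustration of that pre-processing idea, here is a minimal base R sketch of such a filter. The thread only names filter_to_dict and gives no implementation, so the body below is an assumption, not rtables code. It keeps only the data rows whose combination of values in the map's columns also appears in the map.

```r
# Hypothetical helper (not part of rtables): keep only rows of `dat` whose
# combination of values in the map's columns also occurs in `map`.
filter_to_dict <- function(dat, map) {
  vars <- colnames(map)
  stopifnot(all(vars %in% colnames(dat)))
  # Build one key per row by pasting the values of the mapped variables.
  key <- function(x) {
    do.call(paste, c(unname(lapply(x[vars], as.character)), sep = "\r"))
  }
  dat[key(dat) %in% key(map), , drop = FALSE]
}

map <- data.frame(
  V1 = c("V1L1", "V1L1", "V1L2"),
  V2 = c("V2L1", "V2L2", "V2L3"),
  stringsAsFactors = FALSE
)
dat <- data.frame(
  V1 = c("V1L1", "V1L1", "V1L2", "V1L2"),
  V2 = c("V2L1", "V2L3", "V2L3", "V2L1")
)

kept <- filter_to_dict(dat, map)
nrow(kept)  # 2: rows (V1L1, V2L1) and (V1L2, V2L3) match the map
```

Filtering rows this way makes totals computed on the filtered data reflect the restriction, but it does not by itself add zero-count levels; factor levels would still need to be set per group.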

@wwojciech
Contributor

wwojciech commented May 19, 2021

FYI: I am switching to my old GitHub account and have deleted the previous one used in this discussion.

@wwojciech
Contributor

Hi @gmbecker

Regarding @anajens's request, I am attaching my proposed unit test functions.

unit_tests_new_split_fun.txt

@anajens
Contributor

anajens commented Jun 4, 2021

Hi @gmbecker ,

@wwojciech has proposed a very nice extension to the function above which in fact makes it even more general, so that it covers the use case for which trim_levels_in_group was designed. (I'm posting on his behalf to share this with you right away, given the time zone difference.) So our proposal right now is to update trim_levels_in_group with the function below.

Here's Wojciech's new design for this function:

[image: diagram of the proposed design]

The main idea is that in the metadata map we can have n different outer variables (OV), one specific split variable (SV) and m different inner variables (IV).

The SV is conditional on all the OVs.
Each IV is conditional on all the OVs and SV.

Here's a practical example of this function in use.

map <- data.frame(
  LBCAT = c("CHEMISTRY", "CHEMISTRY", "CHEMISTRY", "IMMUNOLOGY"),
  PARAMCD = c("ALT", "CRP", "CRP", "IGA"),
  ANRIND = c("LOW", "LOW", "HIGH", "HIGH"),
  stringsAsFactors = FALSE
)

Our map has one outer variable LBCAT, the split variable is PARAMCD and we have only one inner variable ANRIND.
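The conditioning rules can be read directly off the map with base R subsetting. This sketch is only an illustration of the lookup, not the proposed split function itself.

```r
map <- data.frame(
  LBCAT = c("CHEMISTRY", "CHEMISTRY", "CHEMISTRY", "IMMUNOLOGY"),
  PARAMCD = c("ALT", "CRP", "CRP", "IGA"),
  ANRIND = c("LOW", "LOW", "HIGH", "HIGH"),
  stringsAsFactors = FALSE
)

# Split-variable values allowed given the outer variable value.
sv_allowed <- unique(map$PARAMCD[map$LBCAT == "CHEMISTRY"])
sv_allowed  # "ALT" "CRP"

# Inner-variable values allowed given the outer value and the split value.
iv_allowed <- unique(map$ANRIND[map$LBCAT == "CHEMISTRY" & map$PARAMCD == "CRP"])
iv_allowed  # "LOW" "HIGH"
```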

basic_table() %>%
  split_rows_by("LBCAT") %>%
  split_rows_by("PARAMCD", split_fun = trim_levels_in_groups_new(map = map)) %>%
  analyze("ANRIND") %>%
  build_table(ex_adlb)

Result

             all obs
--------------------
CHEMISTRY           
  ALT               
    LOW        260  
  CRP               
    LOW        271  
    HIGH       293  
IMMUNOLOGY          
  IGA               
    HIGH       278 

Let's chat offline if you have any further questions. We can also provide any additional unit tests.

Function code:

trim_levels_in_groups_new <- function(map = NULL) {
  
  if (is.null(map) || any(sapply(map, class) != "character"))
    stop("No map dataframe was provided or not all of the columns are of type character.")
  
  myfun <- function(df, spl, vals = NULL, labels = NULL, trim = FALSE) {
    
    allvars <- colnames(map)
    splvar <- rtables:::spl_payload(spl)
    outvars <- allvars[-(which(allvars == splvar):length(allvars))]
    ## handle conditioning on outer variables (if any), 
    ## subset the map depending on where we are in the split tree
    if (length(outvars) > 0) {
      outvars_levs <- unique(df[, outvars, drop = FALSE])
      if (nrow(outvars_levs) != 1)
        stop("Outer variables were not specified correctly in the map.")
      ## keep only the map rows that match the current outer-variable values
      keep <- rep(TRUE, nrow(map))
      for (ov in outvars)
        keep <- keep & map[[ov]] == as.character(outvars_levs[[ov]])
      map <- unique(map[keep, setdiff(allvars, outvars), drop = FALSE])
    }
    ## handle split variable
    nondup <- !duplicated(map[[splvar]])
    ksl_fun <- keep_split_levels(only = map[[splvar]][nondup], reorder = TRUE)
    ret <- ksl_fun(df, spl, vals, labels, trim = trim)
    
    ## keep non-split (inner) variables levels
    ret$datasplit <- lapply(ret$values, function(splvar_lev) {
      df3 <- ret$datasplit[[splvar_lev]]
      
      # loop through inner variables 
      for (iv in setdiff(colnames(map), splvar)) { 
        iv_lev <- df3[[iv]]
        levkeep <- as.character(na.omit(unique(map[map[, splvar] == splvar_lev, iv])))
        if (is.factor(iv_lev) && !all(levkeep %in% levels(iv_lev)))
          stop("Attempted to keep invalid factor level(s) in split ", setdiff(levkeep, levels(iv_lev)))
        
        df3 <- df3[iv_lev %in% levkeep, , drop = FALSE]
        
        if (is.factor(iv_lev))
          df3[[iv]] <- factor(as.character(df3[[iv]]), levels = levkeep)
      }
      
      df3
    })
    names(ret$datasplit) <- ret$values
    ret
  }
  
  myfun
}

@wwojciech
Contributor

Thank you @anajens .

@anajens
Contributor

anajens commented Jun 8, 2021

Hi @gmbecker ,

Here are the requirements for the new split function along with some examples. As we discussed, the function trim_levels_in_group is not sufficient for our analysis needs, and ideally one function would cover the types of layouts shown below.

Requirements

  1. Due to the structure of clinical data, we can end up with rows that are not possible at all when splitting across multiple variables. A new split function is needed so that tables can display rows with 0's that are theoretically possible but would be trimmed away by split functions like drop_split_levels if they don't occur in the data.

  2. It should be possible to control more than 2 levels of splits with metadata (via single map or by combining multiple maps).

  3. The metadata should control multiple analysis variables.

Example 1: Sample dataset with 1 record per subject (single analysis var)

df_1 <- data.frame(
  arm = c("a", "b", "c", "a", "b", "c"),
  dcstud = factor(c("disc", "comp", "comp", "disc", "disc", "comp"),
                  levels = c("disc", "comp")),
  discont = factor(c("dropout", "none", "none", "adverse", "moving", "none"),
                   levels = c("none", "dropout", "moving", "notime", "adverse", "death", "ill", "reaction")),
  discont_group = factor(c("other", "none", "none", "safety", "other", "none"))
)

map_1 <- data.frame(
  dcstud = rep("disc", 4),
  discont_group = c("safety", "safety", "other",  "other"),
  discont = c("adverse", "death", "dropout", "moving")
)

# Full layout
basic_table() %>%
  split_rows_by("dcstud") %>%
  split_rows_by("discont_group") %>%
  analyze("discont") %>%
  build_table(df_1)

Desired output:

               all obs
----------------------
disc                  
  safety              
    adverse       1   
    death         0   
  other               
    dropout       1   
    moving        1   
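The counts in the desired output for Example 1 can be reproduced in base R by taking, for each group in the map, the matching data rows and tabulating discont over the levels the map allows for that group. This is a sketch of the target numbers only, assuming the map's first column is named dcstud to match the data; it does not use rtables.

```r
df_1 <- data.frame(
  arm = c("a", "b", "c", "a", "b", "c"),
  dcstud = factor(c("disc", "comp", "comp", "disc", "disc", "comp"),
                  levels = c("disc", "comp")),
  discont = factor(c("dropout", "none", "none", "adverse", "moving", "none"),
                   levels = c("none", "dropout", "moving", "notime",
                              "adverse", "death", "ill", "reaction")),
  discont_group = factor(c("other", "none", "none", "safety", "other", "none"))
)

map_1 <- data.frame(
  dcstud = rep("disc", 4),
  discont_group = c("safety", "safety", "other", "other"),
  discont = c("adverse", "death", "dropout", "moving"),
  stringsAsFactors = FALSE
)

# One entry per (dcstud, discont_group) combination present in the map.
groups <- unique(map_1[c("dcstud", "discont_group")])
counts <- lapply(seq_len(nrow(groups)), function(i) {
  g <- groups[i, ]
  in_data <- df_1$dcstud == g$dcstud & df_1$discont_group == g$discont_group
  in_map <- map_1$dcstud == g$dcstud & map_1$discont_group == g$discont_group
  # Levels come from the map, so allowed-but-unobserved levels count as 0.
  table(factor(as.character(df_1$discont[in_data]),
               levels = map_1$discont[in_map]))
})
names(counts) <- groups$discont_group

counts$safety  # adverse: 1, death: 0
counts$other   # dropout: 1, moving: 1
```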

Example 2: Long dataset with repeated sections (single analysis var)

df_2 <- data.frame(
  visit = factor(c(rep("W1", 4), rep("W2", 4)), levels = c("W1", "W2")),
  paramcd = factor(rep(c("ALT", "ALT", "IGA", "IGA"), 2), levels = c("ALT", "IGA")),
  avalc = factor(rep(c("A1", "A2", "I1", "I2"), 2), levels = c("A1", "A2", "A3", "I1", "I2"))
)

# Visit is not in map because we want all map sections repeated for all visits in data
map_2 <- data.frame(
  paramcd = c("ALT", "ALT", "ALT", "IGA", "IGA"),
  avalc = c("A1", "A2", "A3", "I1", "I2")
)

basic_table() %>%
  split_rows_by("visit") %>%
  split_rows_by("paramcd") %>%
  analyze("avalc") %>%
  build_table(df_2)

Desired output:

         all obs
----------------
W1              
  ALT           
    A1      1   
    A2      1   
    A3      0   
  IGA           
    I1      1   
    I2      1   
W2              
  ALT           
    A1      1   
    A2      1   
    A3      0   
  IGA           
    I1      1   
    I2      1   

Example 3: Long dataset (single analysis var)

This is the example from @wwojciech's design proposal:

df_3 <- data.frame(
  LBCAT = factor(c("CHEMISTRY", "CHEMISTRY", "CHEMISTRY", "IMMUNOLOGY"), levels = c("CHEMISTRY", "IMMUNOLOGY")),
  PARAMCD = factor(c("ALT", "CRP", "CRP", "IGA"), levels = c("ALT", "CRP", "IGA")),
  ANRIND = factor(c("LOW", "LOW", "HIGH", "HIGH"), levels = c("LOW", "HIGH"))
)

map_3 <- data.frame(
  LBCAT = c("CHEMISTRY", "CHEMISTRY", "CHEMISTRY", "IMMUNOLOGY"),
  PARAMCD = c("ALT", "CRP", "CRP", "IGA"),
  ANRIND = c("LOW", "LOW", "HIGH", "HIGH")
)

basic_table() %>%
  split_rows_by("LBCAT") %>%
  split_rows_by("PARAMCD") %>%
  analyze("ANRIND") %>%
  build_table(df_3)

Desired table:

             all obs
--------------------
CHEMISTRY           
  ALT               
    LOW        1  
  CRP               
    HIGH       1  
    LOW        1  
IMMUNOLOGY          
  IGA               
    HIGH       1  

Example 4: Long dataset (multiple analysis vars)

df_4 <- data.frame(
  LBCAT = factor(c("CHEMISTRY", "CHEMISTRY", "CHEMISTRY", "IMMUNOLOGY"), levels = c("CHEMISTRY", "IMMUNOLOGY")),
  PARAMCD = factor(c("ALT", "CRP", "CRP", "IGA"), levels = c("ALT", "CRP", "IGA")),
  ANRIND = factor(c("LOW", "LOW", "HIGH", "HIGH"), levels = c("LOW", "HIGH")),
  BNRIND = factor(c("LOW", "LOW", "HIGH", "HIGH"), levels = c("LOW", "HIGH"))
)

map_4 <- data.frame(
  LBCAT = c("CHEMISTRY", "CHEMISTRY", "CHEMISTRY", "IMMUNOLOGY"),
  PARAMCD = c("ALT", "CRP", "CRP", "IGA"),
  ANRIND = c("LOW", "LOW", "HIGH", "HIGH"),
  BNRIND = c("LOW", "LOW", "HIGH", "HIGH")
)

basic_table() %>%
  split_rows_by("LBCAT") %>%
  split_rows_by("PARAMCD") %>%
  analyze(c("ANRIND", "BNRIND")) %>%
  build_table(df_4)

Desired output:

             all obs
--------------------
CHEMISTRY           
  ALT               
    ANRIND          
      LOW       1   
    BNRIND          
      LOW       1   
  CRP               
    ANRIND          
      LOW       1   
      HIGH      1   
    BNRIND          
      LOW       1   
      HIGH      1   
IMMUNOLOGY          
  IGA               
    ANRIND          
      HIGH      1   
    BNRIND          
      HIGH      1  

@wwojciech
Contributor

wwojciech commented Jun 8, 2021

Hi @anajens

I think we could also add LBT09 here, as it is a bit different from all of the examples above. I can add it once I am back in the office.
Just as a side note: in my prototype implementation of the trim_levels_in_groups_new custom split function, I assumed the map data.frame contains columns of type character, not factor. So, for consistency, it would be good to add stringsAsFactors = FALSE to data.frame() whenever the map is created in the above examples.
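A small base R sketch of that point: a map built with factor columns does not pass an all-columns-are-character check like the one in the prototype, and it can be converted before use. The helper name `is_chr_map` is made up for this illustration.

```r
# Hypothetical helper mirroring the prototype's all-character requirement.
is_chr_map <- function(m) all(vapply(m, is.character, logical(1)))

map_fct <- data.frame(
  PARAMCD = c("ALT", "CRP"),
  ANRIND = c("LOW", "HIGH"),
  stringsAsFactors = TRUE
)
is_chr_map(map_fct)  # FALSE: both columns are factors

# Convert every column to character while keeping the data.frame structure.
map_chr <- map_fct
map_chr[] <- lapply(map_fct, as.character)
is_chr_map(map_chr)  # TRUE
```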

Thank you!

@gmbecker
Collaborator

@wwojciech @anajens

I've just added (in gabe_tabletree_work only for now) experimental support for both analysis/content functions and split functions to accept a new .prev_splvals argument, which in the former case replaces support for receiving .parent_splval. .prev_splvals will be populated with a named list, with the names corresponding to the splits and the values corresponding to the specific child/group, for all ancestor splits of the current location.
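To illustrate why such an argument helps, here is a base R sketch of how a map could be narrowed using a named collection of ancestor split values. The helper `subset_map` is hypothetical, and a plain named list stands in for whatever rtables would actually pass as .prev_splvals.

```r
# Hypothetical helper: narrow a map to the rows consistent with the values of
# the ancestor splits. Names that do not occur in the map are ignored.
subset_map <- function(map, prev_splvals) {
  for (nm in intersect(names(prev_splvals), colnames(map))) {
    map <- map[map[[nm]] == as.character(prev_splvals[[nm]]), , drop = FALSE]
  }
  map
}

map_3 <- data.frame(
  LBCAT = c("CHEMISTRY", "CHEMISTRY", "CHEMISTRY", "IMMUNOLOGY"),
  PARAMCD = c("ALT", "CRP", "CRP", "IGA"),
  ANRIND = c("LOW", "LOW", "HIGH", "HIGH"),
  stringsAsFactors = FALSE
)

sub <- subset_map(map_3, list(LBCAT = "CHEMISTRY"))
unique(sub$PARAMCD)  # "ALT" "CRP"
```

With the ancestor values available by name, no string-built filter expression is needed to work out where in the split tree the function is being called.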

This will allow us to write a much safer and less complex version(s) of the multi-level split function in plenty of time for the next release.

Ugly "proof of life":

library(rtables)
Loading required package: magrittr
> 
> splfun <- function(df, spl, vals= NULL, labels = NULL, trim = FALSE, .prev_splvals) {
+     print(.prev_splvals)
+     spl@split_fun <- NULL
+     rtables:::do_split(df = df, spl = spl, vals = vals, labels = labels, trim = trim, prev_splvals = prev_splvals)
+ }
> 
> 
> lyt_ov <- basic_table() %>%
+   split_cols_by("ARM", split_fun = splfun) %>%
+   add_colcounts() %>%
+     add_overall_col("ALL") %>%
+     split_rows_by("STRATA1", split_fun = splfun) %>%
+     split_rows_by("RACE", split_fun = splfun) %>%
+     split_rows_by("SEX", split_fun = splfun) %>%
+     analyze(c("AGE"))
> 
> result_overall <- build_table(lyt_ov, ex_adsl)

named list()
named list()
NULL
NULL
STRATA1 
    "A" 
STRATA1 
    "A" 
STRATA1    RACE 
    "A" "ASIAN" 
STRATA1    RACE 
    "A" "ASIAN" 
                    STRATA1                        RACE 
                        "A" "BLACK OR AFRICAN AMERICAN" 
                    STRATA1                        RACE 
                        "A" "BLACK OR AFRICAN AMERICAN" 
STRATA1    RACE 
    "A" "WHITE" 
STRATA1    RACE 
    "A" "WHITE" 
                           STRATA1                               RACE 
                               "A" "AMERICAN INDIAN OR ALASKA NATIVE" 
                           STRATA1                               RACE 
                               "A" "AMERICAN INDIAN OR ALASKA NATIVE" 
   STRATA1       RACE 
       "A" "MULTIPLE" 
   STRATA1       RACE 
       "A" "MULTIPLE" 
                                    STRATA1 
                                        "A" 
                                       RACE 
"NATIVE HAWAIIAN OR OTHER PACIFIC ISLANDER" 
                                    STRATA1 
                                        "A" 
                                       RACE 
"NATIVE HAWAIIAN OR OTHER PACIFIC ISLANDER" 
STRATA1    RACE 
    "A" "OTHER" 
STRATA1    RACE 
    "A" "OTHER" 
  STRATA1      RACE 
      "A" "UNKNOWN" 
  STRATA1      RACE 
      "A" "UNKNOWN" 
STRATA1 
    "B" 
STRATA1 
    "B" 
STRATA1    RACE 
    "B" "ASIAN" 
STRATA1    RACE 
    "B" "ASIAN" 
                    STRATA1                        RACE 
                        "B" "BLACK OR AFRICAN AMERICAN" 
                    STRATA1                        RACE 
                        "B" "BLACK OR AFRICAN AMERICAN" 
STRATA1    RACE 
    "B" "WHITE" 
STRATA1    RACE 
    "B" "WHITE" 
                           STRATA1                               RACE 
                               "B" "AMERICAN INDIAN OR ALASKA NATIVE" 
                           STRATA1                               RACE 
                               "B" "AMERICAN INDIAN OR ALASKA NATIVE" 
   STRATA1       RACE 
       "B" "MULTIPLE" 
   STRATA1       RACE 
       "B" "MULTIPLE" 
                                    STRATA1 
                                        "B" 
                                       RACE 
"NATIVE HAWAIIAN OR OTHER PACIFIC ISLANDER" 
                                    STRATA1 
                                        "B" 
                                       RACE 
"NATIVE HAWAIIAN OR OTHER PACIFIC ISLANDER" 
STRATA1    RACE 
    "B" "OTHER" 
STRATA1    RACE 
    "B" "OTHER" 
  STRATA1      RACE 
      "B" "UNKNOWN" 
  STRATA1      RACE 
      "B" "UNKNOWN" 
STRATA1 
    "C" 
STRATA1 
    "C" 
STRATA1    RACE 
    "C" "ASIAN" 
STRATA1    RACE 
    "C" "ASIAN" 
                    STRATA1                        RACE 
                        "C" "BLACK OR AFRICAN AMERICAN" 
                    STRATA1                        RACE 
                        "C" "BLACK OR AFRICAN AMERICAN" 
STRATA1    RACE 
    "C" "WHITE" 
STRATA1    RACE 
    "C" "WHITE" 
                           STRATA1                               RACE 
                               "C" "AMERICAN INDIAN OR ALASKA NATIVE" 
                           STRATA1                               RACE 
                               "C" "AMERICAN INDIAN OR ALASKA NATIVE" 
   STRATA1       RACE 
       "C" "MULTIPLE" 
   STRATA1       RACE 
       "C" "MULTIPLE" 
                                    STRATA1 
                                        "C" 
                                       RACE 
"NATIVE HAWAIIAN OR OTHER PACIFIC ISLANDER" 
                                    STRATA1 
                                        "C" 
                                       RACE 
"NATIVE HAWAIIAN OR OTHER PACIFIC ISLANDER" 
STRATA1    RACE 
    "C" "OTHER" 
STRATA1    RACE 
    "C" "OTHER" 
  STRATA1      RACE 
      "C" "UNKNOWN" 
  STRATA1      RACE 
      "C" "UNKNOWN" 
> 
> 
> sessionInfo()

R version 4.1.0 (2021-05-18)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Mojave 10.14.6

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRblas.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] rtables_0.3.7.0009 magrittr_2.0.1    

loaded via a namespace (and not attached):
[1] compiler_4.1.0    htmltools_0.5.1.1 digest_0.6.27     rlang_0.4.11     

@wwojciech
Contributor

wwojciech commented Jul 9, 2021

Hi @gmbecker

Thank you for sharing this information. I would like to make sure I understand your last post correctly. I conclude from it that, with this new update, we can use the new parameter .prev_splvals in our custom split functions. Should we then update our proposed split functions to use it? Is the function I wrote still under consideration, or are you thinking about some other, more strategic approach that would let rtables users do the conditioning described in the examples above? I am asking because I understood from @anajens that you see the option I proposed as too risky.

Thank you

@gmbecker
Collaborator

gmbecker commented Jul 9, 2021

Hi @wwojciech, the plan is for us to refactor your function to use .prev_splvals (which, yes, will be available for use in split functions) and then, after testing, add it as one of the provided split function factories. I will be getting back to work on all this stuff starting next week.

@gmbecker
Collaborator

@wwojciech @anajens I have added trim_levels_to_map as an exported split function constructor, with unit tests based on the original example.

One thing to note is that tabulation will fail (with an informative message) if you get into a situation where there are no valid children according to the map. In practice this means that either:

a) all upstream combinations of levels need to have at least one value in the map at the level you're splitting on, or
b) upstream splits need to also be trimmed, e.g., by the same map.

We may someday be able to declare a "global splitting map", but that is not the case now; you'll need to call trim_levels_to_map at each level if you want some higher-level terms to not be supported at all.

It is supported in both row-splitting and column-splitting contexts. Please give it a try in gabe_tabletree_work when you get a chance and we'll merge it into main soon, barring any problems.
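
As a rough illustration of the behaviour described above, here is a minimal sketch that applies trim_levels_to_map to the toy data frame `d` from the top of this issue. It assumes an rtables version that exports trim_levels_to_map and that the map is a data frame of character (not factor) columns named after the variables; the `map` object itself is made up for this example.

```r
library(rtables)

# Toy data from the original post: V2 levels are nested within V1 levels.
d <- data.frame(
  V1 = factor(rep(c("V1L1", "V1L2"), each = 2), levels = c("V1L1", "V1L2")),
  V2 = factor(c("V2L1", "V2L2", "V2L3", "V2L4"),
              levels = c("V2L1", "V2L2", "V2L3", "V2L4"))
)

# Illustrative map (an assumption for this sketch): one row per allowed
# V1/V2 combination, stored as character columns.
map <- data.frame(
  V1 = c("V1L1", "V1L1", "V1L2", "V1L2"),
  V2 = c("V2L1", "V2L2", "V2L3", "V2L4"),
  stringsAsFactors = FALSE
)

# Split rows on V1 and let the map trim the levels of V2 within each split.
lyt <- basic_table()
lyt <- split_rows_by(lyt, "V1", split_fun = trim_levels_to_map(map))
lyt <- analyze(lyt, "V2")

tbl <- build_table(lyt, d)
print(tbl)
```

With this map, each V1 group should list only the V2 levels paired with it, rather than all four levels under both groups. If a further split on V2 were added beneath V1, it would need its own trim_levels_to_map call, per point b) above.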

May - Jul 2021 automation moved this from In Progress to Done Aug 19, 2021
@anajens
Contributor

anajens commented Aug 19, 2021

Thank you very much @gmbecker! Looking forward to trying it out and will have some feedback for you next week.

gmbecker added a commit that referenced this issue Aug 27, 2021
* Exp allow a/cfuns + splfuns to accept .prev_splvals arg. #203 dev vbump

* Fix bug where names weren't showing up for .prev_splvals. #203 dev vbump

* Fix off-by-one error in pagination, sep in txt export.  Fixes #213

* add experimental fnotes_at_path function. Needs tests. #219. vbump

* Run GH actions for all branches

* Working fntes_at_path with tests. Closes #219. dev vbump

* col ref footnote support. related to #219. Closes #187. dev vbump

* Support and tests for trim_levels_to_map. closes #203. Devel vbump.

* cell_values and value_at methods for Row objects. closes #210. dev vbump

* Trim outer levels to trim_levels_in_groups by deflt. #236 dev vbump

* Cleanup, additional tests, and fix bugs uncovered by new tests.

* Add NEWS entries, prepare for merge into main

Co-authored-by: dinakar29 <26552821+dinakar29@users.noreply.github.com>