Patch for bug report 17770 #139

paocorrales · 2023-08-31T13:12:06Z

The initial report was addressed by Martin Maechler. The following is related to comment 3.

I went through the code for xtabs() to understand the behavior noted by Thomas Soeiro. I believe there is no bug in the code, but the documentation could include a clarification to cover this case.

When the example is executed:

x <- data.frame(A = c("Y", "Y", "Z", "Z"),
                B = c(NA, TRUE, FALSE, TRUE),
                C = c(TRUE, TRUE, NA, FALSE))

xtabs(formula = cbind(B, C) ~ A,
      data = x,
      na.action = na.omit)

what is passed to the model.frame() call inside xtabs() is

stats::model.frame(formula = cbind(B, C) ~ A, data = x, na.action = na.omit)

with data being

     A    B     C
1    Y   NA  TRUE
2    Y TRUE  TRUE
3    Z FALSE    NA
4    Z TRUE FALSE

na.omit removes every row that contains an NA, so both the A-B and the A-C combination in that row are dropped, even if only one of them is missing. This produces the output shown in comment 3:

> na.omit(data)
  A    B     C
2 Y TRUE  TRUE
4 Z TRUE FALSE
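The row-wise removal can be checked directly on the model frame. The following is a minimal sketch using only base R and the example data from the report; the name `mf` is introduced here for illustration only.

```r
# Example data from the bug report
x <- data.frame(A = c("Y", "Y", "Z", "Z"),
                B = c(NA, TRUE, FALSE, TRUE),
                C = c(TRUE, TRUE, NA, FALSE))

# Same call that xtabs() builds internally (per the description above)
mf <- stats::model.frame(cbind(B, C) ~ A, data = x, na.action = na.omit)

# Rows 1 and 3 each contain one NA, so both are dropped as whole rows
nrow(mf)
rownames(mf)
```

Only the original rows 2 and 4 should remain, which matches the na.omit(data) output above.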

To avoid this, a user should not use cbind(B, C) and should instead reshape the data to long format first, for example:

long_df <- tidyr::pivot_longer(x, cols = B:C)

xtabs(value ~ A + name, data = long_df)

With this approach, the table that goes into model.frame() and has na.omit applied to it is

data
# A tibble: 8 × 3
  A     name  value
  <chr> <chr> <lgl>
1 Y     B     NA   
2 Y     C     TRUE 
3 Y     B     TRUE 
4 Y     C     TRUE 
5 Z     B     FALSE
6 Z     C     NA   
7 Z     B     TRUE 
8 Z     C     FALSE

and, after na.omit is applied, it becomes

  A     name  value
  <chr> <chr> <lgl>
1 Y     C     TRUE 
2 Y     B     TRUE 
3 Y     C     TRUE 
4 Z     B     FALSE
5 Z     B     TRUE 
6 Z     C     FALSE

Now only the individual A-B or A-C combinations that contain an NA are filtered out.
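The same long-format workaround can be sketched without tidyr, which may be convenient when checking the behavior in a plain R session. This is an illustrative base R equivalent, not part of the proposed patch; `long_df` is built by hand and `tab` is a name introduced here. na.action is passed explicitly so that the result does not depend on the session's options.

```r
# Example data from the bug report
x <- data.frame(A = c("Y", "Y", "Z", "Z"),
                B = c(NA, TRUE, FALSE, TRUE),
                C = c(TRUE, TRUE, NA, FALSE))

# Hand-built long format: one row per (A, name) pair
# (row order differs from pivot_longer(), which does not affect the table)
long_df <- data.frame(A     = rep(x$A, times = 2),
                      name  = rep(c("B", "C"), each = nrow(x)),
                      value = c(x$B, x$C))

# Only the two rows whose value is NA are removed before counting
tab <- xtabs(value ~ A + name, data = long_df, na.action = na.omit)
tab
```

In this version the Y-C cell keeps both of its TRUE values, whereas the cbind(B, C) form loses one of them because the whole first row is dropped.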

In summary, na.action is applied to data (the original data frame) instead of the result of model.frame(). This argument also affects how NAs are treated inside sum(): if na.action = na.pass, then na.rm = FALSE inside sum(); otherwise it will be TRUE.
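The effect of na.rm on the aggregation step can be illustrated in isolation. The sketch below does not reproduce the internals of xtabs(); it only assumes, as described above, that counts are summed per group, and uses tapply() with hypothetical vectors `value` and `group` to show the difference.

```r
# Column B of the example, grouped by A
value <- c(NA, TRUE, FALSE, TRUE)
group <- c("Y", "Y", "Z", "Z")

# na.rm = FALSE (the na.pass case): an NA in a group makes its sum NA
keep_na <- tapply(value, group, sum)

# na.rm = TRUE: the NA is ignored and the remaining values are summed
drop_na <- tapply(value, group, sum, na.rm = TRUE)

keep_na
drop_na
```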

I propose a patch that adds a sentence to the details section to make this behavior clearer.

Argument section

  \item{na.action}{a \code{\link{function}} which indicates what should happen when
    \code{data} contains \code{\link{NA}}s.  If unspecified, and
    \code{addNA} is true, this is set to \code{\link{na.pass}}.
    \code{na.action} also affects how \code{NA}s are treated inside
    \code{\link{sum}()}.  If \code{na.action = na.pass} and \code{formula}
    has a left hand side (with counts), \code{\link{sum}(*)} is used; if it
    is set to \code{NULL}, \code{getOption("na.action", default = na.omit)}
    is used; otherwise \code{\link{sum}(*, na.rm = TRUE)} is used.}

Description

Also note that \code{na.action} is applied to \code{data}, and this may result in the loss of counts, because a complete row is omitted if an \code{NA} is present in any column.

src/library/stats/man/xtabs.Rd

Patch for bug report 17770

c95cc26

paocorrales commented Sep 1, 2023

paocorrales added 4 commits September 1, 2023 08:47

Update src/library/stats/man/xtabs.Rd

579711e

Update src/library/stats/man/xtabs.Rd

6efd6b2

Update src/library/stats/man/xtabs.Rd

3d8920e

New change to xtabs() doc for bug 17770

fa9f37d
