rowwise() giving incorrect result in some situations #1448

rmscriven · 2015-10-12T22:51:46Z

Hi guys. Granted that this is not generally a row-wise operation, I still think it is worth bringing to your attention. The issue came up on Stack Overflow, where rowwise() followed by mutate() produced an incorrect value in the resulting data frame.

http://stackoverflow.com/questions/33090745/dplyrrowwise-mutate-and-na-error

A minimal example is as follows:

data.frame(k = c(-1, 1, 1)) %>% 
    rowwise() %>% 
    mutate(l = ifelse(k > 0, 1, NA))
Source: local data frame [3 x 2]
Groups: <by row>

      k     l
  (dbl) (dbl)
1    -1    NA
2     1     1
3     1    NA

We believe that row 3, column l should be 1, not NA as shown.

If you run the following code several times, you will find that the incorrect NA appears only intermittently.

data.frame(k = rnorm(10)) %>% 
    rowwise() %>% 
    mutate(l = ifelse(k > 0, 1L, NA_integer_))
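
Since the failure is intermittent, a loop makes it easier to catch. The sketch below compares the rowwise result against a plain vectorized `ifelse()` over several random draws. The helper name `count_mismatches` and the number of runs are only illustrative assumptions; on a version without the bug it should report zero mismatches.

```r
library(dplyr)

# Hypothetical helper: count how many random draws give a rowwise result
# that differs from the plain vectorized ifelse() on the same column.
count_mismatches <- function(n_runs = 20, n_rows = 10) {
  mismatches <- 0L
  for (i in seq_len(n_runs)) {
    df <- data.frame(k = rnorm(n_rows))
    # Reference answer computed without rowwise()
    expected <- ifelse(df$k > 0, 1L, NA_integer_)
    actual <- df %>%
      rowwise() %>%
      mutate(l = ifelse(k > 0, 1L, NA_integer_))
    if (!identical(actual$l, expected)) {
      mismatches <- mismatches + 1L
    }
  }
  mismatches
}

count_mismatches()
```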


jeremycg · 2015-10-13T17:06:24Z

A similar, probably related bug: https://stackoverflow.com/questions/33107956/dplyrmutate-gives-x-y-na-summarise-gives-x-y-real-number

Pass <- data.frame(P2 = c(0,3,2), F2 = c(0,2,0), id = 1:3)
# these two both fail
Pass %>% group_by(id) %>% mutate(pass2 = P2/(P2 + F2))
Pass %>% rowwise %>% mutate(pass2 = P2/(P2 + F2))

Both give an NA in the last row of pass2:

Source: local data frame [3 x 4]
Groups: <by row>

     P2    F2    id pass2
  (dbl) (dbl) (int) (dbl)
1     0     0     1    NA
2     3     2     2   0.6
3     2     0     3    NA

Whereas without rowwise or grouping, it works as expected:

Pass %>% mutate(pass2 = P2/(P2 + F2))
  P2 F2 id pass2
1  0  0  1   NaN
2  3  2  2   0.6
3  2  0  3   1.0
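
A rough way to check this programmatically is to compare the grouped result against the ungrouped one. Note that `all.equal()` alone is not enough, because it treats NA and NaN positions alike, so the sketch also compares `is.nan()`. The variable names are my own; on a fixed version both checks should come out TRUE.

```r
library(dplyr)

Pass <- data.frame(P2 = c(0, 3, 2), F2 = c(0, 2, 0), id = 1:3)

# Reference: ungrouped mutate, which behaves as expected
ungrouped <- Pass %>% mutate(pass2 = P2 / (P2 + F2))
# Grouped mutate, which is where the NA shows up
grouped <- Pass %>% group_by(id) %>% mutate(pass2 = P2 / (P2 + F2))

# all.equal() only checks that missing values sit in the same positions;
# it does not tell NA and NaN apart
same_values <- isTRUE(all.equal(ungrouped$pass2, grouped$pass2))
# is.nan() is TRUE for NaN only, so this catches NaN turned into NA
same_nan <- identical(is.nan(ungrouped$pass2), is.nan(grouped$pass2))

c(same_values = same_values, same_nan = same_nan)
```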

oppemaniac · 2015-10-15T21:08:50Z

I had the same issue in an unbalanced dataset where I needed grouping! There, too, it was only the third group id that had NAs (one for every row of that id). Using

Pass %>% group_by(id) %>% plyr::mutate(pass2 = P2/(P2 + F2))

works!

See my answer in the Stack Overflow discussion above. But it seems strange that it is always the third group where the NAs appear:


> Pass <- structure(list(P1 = c(2L, 0L, 10L,8L, 9L), 
+ F1 = c(0L, 2L, 0L, 4L,3L), 
+ P2 = c(0L, 3L, 2L, 2L, 2L), 
+ F2 = c(0L, 2L, 0L, 1L,1L), 
+ id = c(1,2,4,4,5)), 
+ .Names = c("P1", "F1", "P2", "F2", "id"), 
+ class = c("tbl_df", "data.frame"), 
+ row.names = c(NA, -5L))
> Pass %>%
+   group_by(id) %>%
+     dplyr::mutate(pass_rate = (P1 + P2) / (P1 + P2 + F1 + F2) * 100,
+            pass_rate1 = P1 / (P1 + F1) * 100,
+            pass_rate2 = P2 / (P2 + F2) * 100)
Source: local data frame [5 x 8]
Groups: id [4]
     P1    F1    P2    F2    id pass_rate pass_rate1 pass_rate2
  (int) (int) (int) (int) (dbl)     (dbl)      (dbl)      (dbl)
1     2     0     0     0     1 100.00000  100.00000         NA
2     0     2     3     2     2  42.85714    0.00000   60.00000
3    10     0     2     0     4 100.00000  100.00000         NA
4     8     4     2     1     4  66.66667   66.66667         NA
5     9     3     2     1     5  73.33333   75.00000   66.66667
> Pass %>%
+   group_by(id) %>%
+     plyr::mutate(pass_rate = (P1 + P2) / (P1 + P2 + F1 + F2) * 100,
+            pass_rate1 = P1 / (P1 + F1) * 100,
+            pass_rate2 = P2 / (P2 + F2) * 100)
Source: local data frame [5 x 8]
Groups: id [4]
     P1    F1    P2    F2    id pass_rate pass_rate1 pass_rate2
  (int) (int) (int) (int) (dbl)     (dbl)      (dbl)      (dbl)
1     2     0     0     0     1 100.00000  100.00000        NaN
2     0     2     3     2     2  42.85714    0.00000   60.00000
3    10     0     2     0     4 100.00000  100.00000  100.00000
4     8     4     2     1     4  66.66667   66.66667   66.66667
5     9     3     2     1     5  73.33333   75.00000   66.66667

Looks like a little bug...
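
As far as I can tell, plyr::mutate works here because it ignores the dplyr grouping and evaluates the expression on the whole data frame. If that is right, the same workaround is possible without plyr for element-wise expressions like these: compute on the ungrouped data, then regroup. This is only a sketch under that assumption, using the same values as the data above.

```r
library(dplyr)

Pass <- data.frame(
  P1 = c(2L, 0L, 10L, 8L, 9L),
  F1 = c(0L, 2L, 0L, 4L, 3L),
  P2 = c(0L, 3L, 2L, 2L, 2L),
  F2 = c(0L, 2L, 0L, 1L, 1L),
  id = c(1, 2, 4, 4, 5)
)

# The ratio is element-wise, so grouping does not change the answer;
# computing it ungrouped sidesteps the grouped code path entirely.
result <- Pass %>%
  ungroup() %>%
  mutate(pass_rate2 = P2 / (P2 + F2) * 100) %>%
  group_by(id)

result
```

This does not help for expressions that really depend on the groups (for example anything using `sum()` or `mean()` per id).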

hadley · 2015-10-21T22:01:42Z

I suspect these are three separate bugs.

romainfrancois · 2015-10-29T12:04:03Z

The first problem has been otherwise taken care of, but I've added a regression test anyway. We are now consistently getting:

> data.frame(k = c(-1, 1, 1)) %>%
+     rowwise() %>%
+     mutate(l = ifelse(k > 0, 1, NA))
Source: local data frame [3 x 2]
Groups: <by row>

      k     l
  (dbl) (dbl)
1    -1    NA
2     1     1
3     1     1

romainfrancois · 2015-10-29T12:08:38Z

For the second problem, we get NA instead of NaN:

> Pass %>% group_by(id) %>% mutate(pass2 = P2/(P2 + F2))
Source: local data frame [3 x 4]
Groups: id [3]

     P2    F2    id pass2
  (dbl) (dbl) (int) (dbl)
1     0     0     1    NA
2     3     2     2   0.6
3     2     0     3   1.0
> 0/ (0+0)
[1] NaN

I think I know what this is about.

romainfrancois · 2015-10-29T12:24:02Z

Yep. This was because Rcpp's is_na also considers NaN to be NA for some reason.

> cppFunction("LogicalVector test( NumericVector x){ return is_na(x); }")
> test( c(NA, NaN, 1.0) )
[1]  TRUE  TRUE FALSE

ping @kevinushey
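
For comparison, base R's own `is.na()` also returns TRUE for NaN, so telling the two apart needs `is.nan()` as well. A minimal sketch in plain R follows; the helper name `is_true_na` is made up for illustration and is not part of dplyr or Rcpp.

```r
x <- c(NA, NaN, 1.0)

is.na(x)   # TRUE TRUE FALSE: NaN counts as missing here too
is.nan(x)  # FALSE TRUE FALSE: only the NaN

# Hypothetical helper: TRUE only for NA, not for NaN
is_true_na <- function(x) is.na(x) & !is.nan(x)

is_true_na(x)  # TRUE FALSE FALSE
```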


hadley added the bug (an unexpected problem or unintended behavior) and data frame labels Oct 21, 2015

hadley added this to the 0.5 milestone Oct 21, 2015

hadley assigned romainfrancois Oct 21, 2015

romainfrancois added a commit that referenced this issue Oct 29, 2015

added regression test for rowwise mutate and the special NA case. #1448

9c8e475

romainfrancois added a commit that referenced this issue Oct 29, 2015

more regression test for #1448

83f1715

romainfrancois added a commit that referenced this issue Oct 29, 2015

more regression test for #1448

0727ac3

romainfrancois added a commit that referenced this issue Oct 29, 2015

grouped and rowwise mutate disabiguates NA and NaN. #1448

c583127

romainfrancois added a commit that referenced this issue Oct 30, 2015

additional test for #1448

abfb411

romainfrancois closed this as completed Oct 30, 2015

krlmlr pushed a commit to krlmlr/dplyr that referenced this issue Mar 2, 2016

added regression test for rowwise mutate and the special NA case. tid…

4887e4d

…yverse#1448

lock bot locked as resolved and limited conversation to collaborators Jun 9, 2018
