rowwise() giving incorrect result in some situations #1448

Closed
rmscriven opened this issue Oct 12, 2015 · 6 comments
@rmscriven

Hi guys. Granted, this is not generally a row-wise operation, but I still think it is worth bringing to your attention. The issue came up on Stack Overflow, where an error was discovered in the resulting data frame.

http://stackoverflow.com/questions/33090745/dplyrrowwise-mutate-and-na-error

A minimal example is as follows:

data.frame(k = c(-1, 1, 1)) %>% 
    rowwise() %>% 
    mutate(l = ifelse(k > 0, 1, NA))
Source: local data frame [3 x 2]
Groups: <by row>

      k     l
  (dbl) (dbl)
1    -1    NA
2     1     1
3     1    NA

We believe that row 3, column l should be 1, not NA as shown.

If you run the following code several separate times, you will find that the problem above occurs only intermittently.

data.frame(k = rnorm(10)) %>% 
    rowwise() %>% 
    mutate(l = ifelse(k > 0, 1L, NA_integer_))
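For reference, the vectorised base R call gives the value each row should have, whichever sample is drawn. This is only a sketch for comparison; the seed and the name `expected` are choices made here for illustration and are not part of the original report.

```r
set.seed(1)                      # arbitrary seed, chosen only for repeatability
k <- rnorm(10)

# Base R reference: 1L where k is positive, NA otherwise
expected <- ifelse(k > 0, 1L, NA_integer_)

# Every positive k should map to 1L and every non-positive k to NA
all(expected[k > 0] == 1L)
all(is.na(expected[k <= 0]))
```

Any row of the rowwise() result that disagrees with `expected` is an instance of the problem described above.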
@jeremycg

A similar, probably related bug: https://stackoverflow.com/questions/33107956/dplyrmutate-gives-x-y-na-summarise-gives-x-y-real-number

Pass <- data.frame(P2 = c(0,3,2), F2 = c(0,2,0), id = 1:3)
#these two both fail
Pass %>% group_by(id) %>% mutate(pass2 = P2/(P2 + F2))
Pass %>% rowwise %>% mutate(pass2 = P2/(P2 + F2))

Both give an NA in the last row of pass2:

Source: local data frame [3 x 4]
Groups: <by row>

     P2    F2    id pass2
  (dbl) (dbl) (int) (dbl)
1     0     0     1    NA
2     3     2     2   0.6
3     2     0     3    NA

Whereas without rowwise or grouping, it works as expected:

Pass %>% mutate(pass2 = P2/(P2 + F2))
  P2 F2 id pass2
1  0  0  1   NaN
2  3  2  2   0.6
3  2  0  3   1.0
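As a cross-check that does not depend on dplyr at all, the same ratio can be computed in base R. This is a minimal sketch; the name `expected` is introduced here for illustration only.

```r
Pass <- data.frame(P2 = c(0, 3, 2), F2 = c(0, 2, 0), id = 1:3)

# Plain vectorised division: 0/0 yields NaN (not NA), and 2/(2 + 0) yields 1
expected <- with(Pass, P2 / (P2 + F2))

is.nan(expected[1])   # the first row is NaN
expected[2:3]         # the remaining rows are ordinary numbers
```

Since every row here is its own group, the grouped and rowwise results would be expected to match these values row by row.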

@oppemaniac

I had the same issue in an unbalanced dataset where I needed grouping! And again it was only the third group id that had NAs (as many as that id had rows)! Using

Pass %>% group_by(id) %>% plyr::mutate(pass2 = P2/(P2 + F2))
works!

See my answer in the Stack Overflow discussion linked above. It seems strange, though, that the NAs always appear in the third group:


> Pass <- structure(list(P1 = c(2L, 0L, 10L,8L, 9L), 
+ F1 = c(0L, 2L, 0L, 4L,3L), 
+ P2 = c(0L, 3L, 2L, 2L, 2L), 
+ F2 = c(0L, 2L, 0L, 1L,1L), 
+ id = c(1,2,4,4,5)), 
+ .Names = c("P1", "F1", "P2", "F2", "id"), 
+ class = c("tbl_df", "data.frame"), 
+ row.names = c(NA, -5L))
> Pass %>%
+   group_by(id) %>%
+     dplyr::mutate(pass_rate = (P1 + P2) / (P1 + P2 + F1 + F2) * 100,
+            pass_rate1 = P1 / (P1 + F1) * 100,
+            pass_rate2 = P2 / (P2 + F2) * 100)
Source: local data frame [5 x 8]
Groups: id [4]
     P1    F1    P2    F2    id pass_rate pass_rate1 pass_rate2
  (int) (int) (int) (int) (dbl)     (dbl)      (dbl)      (dbl)
1     2     0     0     0     1 100.00000  100.00000         NA
2     0     2     3     2     2  42.85714    0.00000   60.00000
3    10     0     2     0     4 100.00000  100.00000         NA
4     8     4     2     1     4  66.66667   66.66667         NA
5     9     3     2     1     5  73.33333   75.00000   66.66667
> Pass %>%
+   group_by(id) %>%
+     plyr::mutate(pass_rate = (P1 + P2) / (P1 + P2 + F1 + F2) * 100,
+            pass_rate1 = P1 / (P1 + F1) * 100,
+            pass_rate2 = P2 / (P2 + F2) * 100)
Source: local data frame [5 x 8]
Groups: id [4]
     P1    F1    P2    F2    id pass_rate pass_rate1 pass_rate2
  (int) (int) (int) (int) (dbl)     (dbl)      (dbl)      (dbl)
1     2     0     0     0     1 100.00000  100.00000        NaN
2     0     2     3     2     2  42.85714    0.00000   60.00000
3    10     0     2     0     4 100.00000  100.00000  100.00000
4     8     4     2     1     4  66.66667   66.66667   66.66667
5     9     3     2     1     5  73.33333   75.00000   66.66667

Looks like a little bug...

@hadley
Member

hadley commented Oct 21, 2015

I suspect these are three separate bugs.

@hadley added the bug (an unexpected problem or unintended behavior) and data frame labels Oct 21, 2015
@hadley added this to the 0.5 milestone Oct 21, 2015
@romainfrancois
Member

The first problem has been otherwise taken care of, but I've added a regression test anyway. We are now consistently getting:

> data.frame(k = c(-1, 1, 1)) %>%
+     rowwise() %>%
+     mutate(l = ifelse(k > 0, 1, NA))
Source: local data frame [3 x 2]
Groups: <by row>

      k     l
  (dbl) (dbl)
1    -1    NA
2     1     1
3     1     1

@romainfrancois
Member

For the second problem, we get NA instead of NaN:

> Pass %>% group_by(id) %>% mutate(pass2 = P2/(P2 + F2))
Source: local data frame [3 x 4]
Groups: id [3]

     P2    F2    id pass2
  (dbl) (dbl) (int) (dbl)
1     0     0     1    NA
2     3     2     2   0.6
3     2     0     3   1.0
> 0/ (0+0)
[1] NaN

I think I know what this is about.

@romainfrancois
Member

Yep. This was because Rcpp's is_na also considers NaN to be NA for some reason.

> cppFunction("LogicalVector test( NumericVector x){ return is_na(x); }")
> test( c(NA, NaN, 1.0) )
[1]  TRUE  TRUE FALSE

ping @kevinushey
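
For comparison, base R's own is.na() also returns TRUE for NaN, so code that needs to tell the two apart has to combine it with is.nan(). The sketch below illustrates that distinction only; `is_strict_na` is a hypothetical helper named here for illustration, not a function from dplyr or Rcpp, and it is not a description of the actual fix.

```r
x <- c(NA, NaN, 1.0)

is.na(x)    # TRUE for both NA and NaN
is.nan(x)   # TRUE only for NaN

# Hypothetical helper: TRUE only for a genuine NA, FALSE for NaN
is_strict_na <- function(v) is.na(v) & !is.nan(v)
is_strict_na(x)
```

A check along these lines keeps NaN results such as 0/0 from being treated as missing values.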

romainfrancois added a commit that referenced this issue Oct 30, 2015
@lock bot locked as resolved and limited conversation to collaborators Jun 9, 2018