Change in behavior using summarise with max(. ,na.rm=TRUE) from 0.4.2 -> 0.4.3 #1455

Closed
felasa opened this issue Oct 15, 2015 · 2 comments


felasa commented Oct 15, 2015

Example:

toy data

library(dplyr)

foo <- 
  data.frame(id = c(1,1,2,2,3,3), 
             date_of_x = as.Date(c("2015-01-01", "2015-01-02", NA, "2015-02-02",
                           NA,NA)),
             type = c("A","B","A","B","A","B")) %>% tbl_df

foo
Source: local data frame [6 x 3]

     id  date_of_x   type
  (dbl)     (date) (fctr)
1     1 2015-01-01      A
2     1 2015-01-02      B
3     2       <NA>      A
4     2 2015-02-02      B
5     3       <NA>      A
6     3       <NA>      B

In 0.4.2:

foo %>% group_by(id) %>% 
  summarise(most_recent = max(date_of_x, na.rm=TRUE),
            confirmed = !is.na(most_recent))

Source: local data frame [3 x 3]

  id most_recent confirmed
1  1  2015-01-02      TRUE
2  2  2015-02-02      TRUE
3  3        <NA>     FALSE

In 0.4.3:

foo %>% group_by(id) %>% 
  summarise(most_recent = max(date_of_x, na.rm=TRUE),
            confirmed = !is.na(most_recent))

Source: local data frame [3 x 3]

     id most_recent confirmed
  (dbl)      (date)     (lgl)
1     1  2015-01-02      TRUE
2     2  2015-02-02      TRUE
3     3        <NA>      TRUE
Warning message:
In max.default(c(NA_real_, NA_real_), na.rm = TRUE) :
 no non-missing arguments to max; returning -Inf

In 0.4.3 the all-NA group returns -Inf (a Date that prints as <NA> but is not NA to is.na), whereas in 0.4.2 it returned NA.

The change is consistent with the behavior of max, I guess, but it broke some workflows (maybe just mine, but I am unsure whether I could just replace is.na with is.infinite and fix everything).

It may be working as it should, although I'm confused about why it changed at all (i.e. why the previous version worked).
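
One way to sidestep the is.na versus is.infinite question is to never produce -Inf in the first place. The sketch below is only an illustration in base R: `max_or_na` is a hypothetical helper (it is not part of dplyr or base R), and the grouping is emulated with `split()` so the example runs without dplyr.

```r
# Hypothetical helper (not part of dplyr or base R): behaves like
# max(x, na.rm = TRUE), but returns a missing value of the same class as x
# when there is no non-missing element to compare.
max_or_na <- function(x) {
  if (all(is.na(x))) {
    x[NA_integer_]        # a single NA that keeps the class of x (e.g. Date)
  } else {
    max(x, na.rm = TRUE)
  }
}

# Same values as the toy data above.
id    <- c(1, 1, 2, 2, 3, 3)
dates <- as.Date(c("2015-01-01", "2015-01-02", NA, "2015-02-02", NA, NA))

# Base R stand-in for group_by(id) %>% summarise(...): one result per id.
most_recent <- lapply(split(dates, id), max_or_na)
confirmed   <- !vapply(most_recent, is.na, logical(1))
```

With dplyr, the same helper could presumably be used as `summarise(most_recent = max_or_na(date_of_x), confirmed = !is.na(most_recent))`, though that has not been checked against 0.4.3. The alternative of keeping `max()` and testing `is.finite(most_recent)` instead of `!is.na(most_recent)` should also cover both the NA and the -Inf case.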

felasa changed the title from "Change in behavior using mutate with max(. ,na.rm=TRUE) from 0.4.2 -> 0.4.3" to "Change in behavior using summarise with max(. ,na.rm=TRUE) from 0.4.2 -> 0.4.3" on Oct 15, 2015

hadley (Member) commented Oct 21, 2015

This seems like a bug in max.Date() in R. I'll post to R-devel
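
For reference, the base R side of this can be reproduced without dplyr. This is a minimal sketch, not the report that was posted to R-devel; the variable names are made up for illustration, and `suppressWarnings()` is only there to hide the "no non-missing arguments to max" warning shown above.

```r
# An all-NA Date vector, like the date_of_x values for id == 3.
all_missing <- as.Date(c(NA_character_, NA_character_))

# max() with na.rm = TRUE has nothing left to compare and falls back to -Inf,
# which is then wrapped back into a Date.
res <- suppressWarnings(max(all_missing, na.rm = TRUE))

class(res)        # "Date"
is.na(res)        # FALSE
is.infinite(res)  # TRUE
```

So the value is an infinite Date rather than a missing one, which is why `!is.na(most_recent)` comes out TRUE in the 0.4.3 output.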

hadley closed this as completed Oct 21, 2015

felasa (Author) commented Nov 5, 2015

However bizarre the base R methods for handling dates are, this doesn't account for the difference in output between dplyr versions (since nothing changed in base R in between).

The issue I was pointing out is the difference in the 'confirmed' columns from the examples above.

(I'm having some issues readapting code for 0.4.3, so I'm venting here.)

lock bot locked this as resolved and limited conversation to collaborators Jun 9, 2018