summarize/mutate get operator precedence of + over ! wrong #381

Closed
smcinerney opened this issue Apr 11, 2014 · 1 comment

@smcinerney

summarize/mutate gets the precedence of + over ! wrong

WRONG: summarize(nna_count = sum(!is.na(Field1) + !is.na(Field2) + !is.na(Field3) + !is.na(Field4) + !is.na(Field5)))

# WORKAROUND1: override precedence of +,!
df %.% group_by(id,date) %.%
 mutate(nna_count =
   (!is.na(Field1)) + (!is.na(Field2)) + (!is.na(Field3)) + (!is.na(Field4)) + (!is.na(Field5)) 
 ) 

# WORKAROUND2: pass one single vectorized expression
df %.% regroup(list(quote(id),quote(date))) %.%
    summarize(nna_count = sum(!is.na(c(Field1,Field2,Field3,Field4,Field5))))

I'm trying to summarize (or mutate) in dplyr by the count of non-NAs in each row, but it keeps giving the wrong answer.
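For a plain per-row count, one base-R sketch that avoids chaining `!` and `+` altogether is to build the logical matrix once and sum across its rows. The small data frame `toy` below is made up for illustration; it is not the data from this issue.

```r
# Hypothetical toy data, not the data frame from this issue
toy <- data.frame(
  Field1 = c(NA, 22.3, 18.6),
  Field2 = c(NA, NA, 7.3),
  Field3 = c(NA, NA, 12.3)
)

# is.na() on a data frame returns a logical matrix; negate it once,
# then count the TRUE values in each row
toy$nna_count <- rowSums(!is.na(toy[, c("Field1", "Field2", "Field3")]))

print(toy$nna_count)
```

Because `!` is applied exactly once to the whole matrix, operator precedence between `!` and `+` never comes into play.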

Arithmetic on booleans like sum(FALSE + TRUE + FALSE + TRUE + TRUE) does indeed add up to 3, so where is the problem? And why does dplyr not catch the error?

N = 9
set.seed(1234)
df <- data.frame(id=c(1,1,1,2,2,2,3,3,3), date=c('2005','2006','2007'),
                 Field1 = ifelse(runif(N)>.5, runif(N, 5,30), NA),
                 Field2 = ifelse(runif(N)>.5, runif(N, 4,22), NA),
                 Field3 = ifelse(runif(N)>.5, runif(N, 7,18), NA),
                 Field4 = ifelse(runif(N)>.5, runif(N, 9,25), NA),
                 Field5 = ifelse(runif(N)>.5, runif(N, 3,30), NA) )

# > df
# id date   Field1    Field2    Field3    Field4    Field5
#1  1 2005       NA        NA        NA        NA        NA
#2  1 2006 22.33978        NA        NA 12.824412  6.850614
#3  1 2007 18.62437        NA 12.334904        NA        NA
#4  2 2005 12.06834        NA  9.683217 13.929516  8.296716
#5  2 2006 28.08584        NA 15.420058        NA        NA
#6  2 2007 12.30790        NA  7.811579  9.826346        NA
#7  3 2005       NA        NA        NA 18.033117        NA
#8  3 2006       NA  7.259732 14.889989        NA  7.320774
#9  3 2007 11.67052 17.674071        NA        NA 27.197018


# Trying to summarize by the count of non-NAs in each row...!
df %.% regroup(list(quote(id),quote(date))) %.%
    summarize(nna_count = sum(!is.na(Field1) + !is.na(Field2) + !is.na(Field3) + !is.na(Field4) + !is.na(Field5)))

# TOTALLY WRONG?!

# Source: local data frame [9 x 3]
# Groups: id
# 
# id date nna_count
#1  1 2005        0
#2  1 2006        1
#3  1 2007        1
#4  2 2005        1
#5  2 2006        1
#6  2 2007        1
#7  3 2005        0
#8  3 2006        0
#9  3 2007        0

By debugging with binary weights (16, 8, 4, 2, 1) on each term, I see all the !is.na()s acting weird except for Field1:

mutate(na_count = sum(16*!is.na(Field1) + 8*!is.na(Field2) + 4*!is.na(Field3) + 2*!is.na(Field4) + !is.na(Field5))) 

only ever gives 16 or 0
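A likely explanation for the 16-or-0 pattern, assuming standard R parsing: unary `!` binds less tightly than `*` and `+`, so the first `!` swallows everything to its right and the whole expression reduces to `16 * !(...)`. The sketch below uses made-up scalars `f1` to `f5` (not columns from this issue) to compare the unparenthesized and parenthesized forms.

```r
# Hypothetical values: f1, f3, f5 are present; f2 and f4 are missing
f1 <- 1; f2 <- NA; f3 <- 3; f4 <- NA; f5 <- 5

# Parses as 16 * !(is.na(f1) + 8 * !(is.na(f2) + 4 * !(...)))
unparenthesized <- 16 * !is.na(f1) + 8 * !is.na(f2) + 4 * !is.na(f3) +
  2 * !is.na(f4) + !is.na(f5)

# Each negation is applied to a single is.na() result before weighting
parenthesized <- 16 * (!is.na(f1)) + 8 * (!is.na(f2)) + 4 * (!is.na(f3)) +
  2 * (!is.na(f4)) + (!is.na(f5))

print(unparenthesized)  # 16
print(parenthesized)    # 21
```

With parentheses, the result is the intended bit pattern 10101 (16 + 4 + 1 = 21); without them, only the outermost `16 * !(...)` survives.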

@smcinerney changed the title from "summarize/mutate get operator precedence wrong with sum(!is.na(Field1) + !is.na(Field2) ..." to "summarize/mutate get operator precedence of + over ! wrong" on Apr 11, 2014
@smcinerney

Apparently this is generic R behavior, not a dplyr bug: unary ! has lower precedence than +, so !a + !b parses as !(a + !b).
http://stackoverflow.com/questions/17651687/behavior-of-summing-is-na-results
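The parse can be checked directly in base R by inspecting the quoted expression; the precedence table itself is documented under `?Syntax`. This is a minimal sketch with made-up vectors `a` and `b`, independent of dplyr.

```r
# Quote both forms so the parse tree can be inspected without evaluating
expr_plain <- quote(!a + !b)
expr_paren <- quote((!a) + (!b))

# The first element of a call is the function at the root of the parse tree
top_plain <- as.character(expr_plain[[1]])
top_paren <- as.character(expr_paren[[1]])
print(top_plain)  # "!"
print(top_paren)  # "+"

# Hypothetical inputs for illustration
a <- c(TRUE, FALSE, TRUE)
b <- c(FALSE, FALSE, TRUE)

value_plain <- eval(expr_plain)  # evaluated as !(a + !b), a logical vector
value_paren <- eval(expr_paren)  # element-wise count of FALSE values in a and b
print(value_plain)
print(value_paren)
```

The root of the unparenthesized expression is `!`, which is why the result is a logical vector rather than a count.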

@hadley closed this as completed on Apr 11, 2014
@lock (bot) locked as resolved and limited conversation to collaborators on Jun 10, 2018