I'm trying to summarize (or mutate) in dplyr by the count of non-NAs in each row, but it keeps giving the wrong answer.
Arithmetic on booleans like sum(FALSE + TRUE + FALSE + TRUE + TRUE) does indeed add up to 3, so where is the problem? And why does dplyr not catch the error?
N = 9
set.seed(1234)
df <- data.frame(id=c(1,1,1,2,2,2,3,3,3), date=c('2005','2006','2007'),
                 Field1 = ifelse(runif(N)>.5, runif(N, 5,30), NA),
                 Field2 = ifelse(runif(N)>.5, runif(N, 4,22), NA),
                 Field3 = ifelse(runif(N)>.5, runif(N, 7,18), NA),
                 Field4 = ifelse(runif(N)>.5, runif(N, 9,25), NA),
                 Field5 = ifelse(runif(N)>.5, runif(N, 3,30), NA) )
# > df
#  id date   Field1    Field2    Field3    Field4    Field5
#1  1 2005       NA        NA        NA        NA        NA
#2  1 2006 22.33978        NA        NA 12.824412  6.850614
#3  1 2007 18.62437        NA 12.334904        NA        NA
#4  2 2005 12.06834        NA  9.683217 13.929516  8.296716
#5  2 2006 28.08584        NA 15.420058        NA        NA
#6  2 2007 12.30790        NA  7.811579  9.826346        NA
#7  3 2005       NA        NA        NA 18.033117        NA
#8  3 2006       NA  7.259732 14.889989        NA  7.320774
#9  3 2007 11.67052 17.674071        NA        NA 27.197018
# Trying to summarize by the count of non-NAs in each row...!
df %.% regroup(list(quote(id),quote(date))) %.%
summarize(nna_count = sum(!is.na(Field1) + !is.na(Field2) + !is.na(Field3) + !is.na(Field4) + !is.na(Field5)))
# TOTALLY WRONG?!
# Source: local data frame [9 x 3]
# Groups: id
#
#  id date nna_count
#1  1 2005         0
#2  1 2006         1
#3  1 2007         1
#4  2 2005         1
#5  2 2006         1
#6  2 2007         1
#7  3 2005         0
#8  3 2006         0
#9  3 2007         0
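A likely explanation for the 0/1 results, sketched on hypothetical toy data rather than the `df` from this report: in R's own grammar, unary `!` has lower precedence than `+`, so `!a + !b` is parsed as `!(a + (!b))`. That yields a single logical per row, and `sum()` of one logical can only be 0 or 1. If that is what is happening here, the parsing comes from base R rather than from dplyr, which would also explain why dplyr raises no error. The names `toy`, `unparenthesized`, `explicit_parse`, `parenthesized`, and `via_rowsums` are illustrative only.

```r
# Toy data (hypothetical, not the df from the report): 3 rows, 2 fields.
toy <- data.frame(
  Field1 = c(1.5, NA, 2.0),
  Field2 = c(3.0, NA, NA)
)

# Written the way the report writes it: `!` swallows everything to its right.
unparenthesized <- with(toy, !is.na(Field1) + !is.na(Field2))

# The same expression with R's implicit grouping spelled out.
explicit_parse <- with(toy, !(is.na(Field1) + (!is.na(Field2))))

# Parenthesizing each term gives the intended per-row count of non-NA values.
parenthesized <- with(toy, (!is.na(Field1)) + (!is.na(Field2)))

# Base-R shortcut for the same per-row count.
via_rowsums <- rowSums(!is.na(toy[, c("Field1", "Field2")]))

print(unparenthesized)      # FALSE FALSE TRUE
print(parenthesized)        # 2 0 1
print(unname(via_rowsums))  # 2 0 1
```

Wrapping each `!is.na(FieldN)` term in parentheses inside `summarize()` should apply the same fix to the grouped call, though that is untested against the dplyr version used in this report.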
smcinerney changed the title from "summarize/mutate get operator precedence wrong with sum(!is.na(Field1) + !is.na(Field2) ..." to "summarize/mutate get operator precedence of + over ! wrong" on Apr 11, 2014.
summarize/mutate gets the precedence of + over ! wrong
By debugging with a Gray code, I see all the !is.na() terms acting weird except the one for Field1: the expression only ever gives 16 or 0.
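The debugging expression itself is not shown in this report, so the following is only a guess at its shape: a sketch assuming each `!is.na()` term was multiplied by a power-of-two weight, with 16 on Field1. Under R's precedence rules the first `!` would then negate everything to its right, leaving only `16 * TRUE` or `16 * FALSE`, which is consistent with seeing nothing but 16 or 0. The vectors `f1` and `f2`, the two weights, and the variable names are all hypothetical.

```r
# Hypothetical inputs and weights (assumptions; the original debugging code is not shown).
f1 <- c(1.5, NA, 2.0)
f2 <- c(3.0, NA, NA)

# Parsed by R as 16 * !(is.na(f1) + 8 * !is.na(f2)),
# so every element is either 16 * TRUE or 16 * FALSE.
weighted_unparenthesized <- 16 * !is.na(f1) + 8 * !is.na(f2)

# With explicit parentheses each field contributes its own weight.
weighted_parenthesized <- 16 * (!is.na(f1)) + 8 * (!is.na(f2))

print(weighted_unparenthesized)  # 0 0 16
print(weighted_parenthesized)    # 24 0 16
```

In the parenthesized form the weighted sum identifies exactly which fields are non-NA in each row, which is presumably what the weighting was meant to reveal.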