group_by: equal R float values not grouped together (machine dependent) #482

Closed
sebschub opened this issue Jul 4, 2014 · 1 comment

sebschub commented Jul 4, 2014

This report is based on a discussion on Stack Overflow.
Here is a minimal example that produces different output on different machines, all running R 3.1.0 and dplyr 0.2 (more details below):

library(dplyr)

df <- data.frame(value=seq(1,10), height=c(rep(1,5),rep(2,5)))

# height is not a factor
dfs <- df %>% group_by(height) %>% summarize(m=mean(value))
dfs$height==dfs$height[1]

# height is a factor
df$height <- as.factor(df$height)
dfs <- df %>% group_by(height) %>% summarize(m=mean(value))

The faulty output below comes from one machine. The data frame

   value height
1      1      1
2      2      1
3      3      1
4      4      1
5      5      1
6      6      2
7      7      2
8      8      2
9      9      2
10    10      2

is summarized by height. When df$height is not a factor, the result looks like this:

  height        m
1      1 4.500000
2      1 3.000000
3      1 2.000000
4      1 1.000000
5      2 9.000000
6      2 6.000000
7      2 8.333333

while if df$height is a factor, the result is what I want:

  height m
1      1 3
2      2 8

Why does something like this happen? My first guess was that it is a numerical issue and the non-factor heights differ very slightly. However, the test above shows that all rows printed with height 1 compare equal to the first one:

> dfs$height==dfs$height[1]
[1]  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE
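A rough way to take this check one step further is to compare the stored bytes of the values, since `==` can report TRUE for doubles whose bit patterns differ (0 and -0 are the textbook case). This is only a diagnostic sketch, not a claim about what causes the behaviour above; `bits` is a made-up helper name and is not part of dplyr or base R.

```r
# Hypothetical helper: hex string of the 8 raw bytes behind each double.
bits <- function(x) {
  vapply(
    x,
    function(v) paste(as.character(writeBin(v, raw())), collapse = ""),
    character(1)
  )
}

# Same construction as in the example above
height <- c(rep(1, 5), rep(2, 5))

# Do the values compare equal with == ?
all(height[1:5] == height[1])

# How many distinct bit patterns are behind those five values?
length(unique(bits(height[1:5])))

# Illustration that == and bitwise equality are different questions:
0 == -0              # TRUE
bits(0) == bits(-0)  # FALSE, the sign bit differs
```

If the count of distinct bit patterns were larger than 1 on the affected machine, that would point towards the inputs; if it is 1, the inputs themselves are bitwise identical and the difference has to arise somewhere else.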

The faulty output is from an older system, SUSE Linux Enterprise Server 11 (x86_64), with e.g. gcc 4.3.2. On all more recent systems I checked, the issue did not occur.

Is this a bug, or is grouping by float just stupid and proper results cannot be guaranteed?
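For what it is worth, one possible way to avoid depending on float keys at all is to group on an integer key derived from the column. This is a sketch that assumes the heights are meant to be whole numbers, as in the example; `height_key` is a column name invented here, and nothing in this thread confirms that it avoids the problem on the affected machine.

```r
library(dplyr)

df <- data.frame(value = seq(1, 10), height = c(rep(1, 5), rep(2, 5)))

# Assumption: heights are whole numbers, so rounding to integer loses nothing.
dfs <- df %>%
  mutate(height_key = as.integer(round(height))) %>%
  group_by(height_key) %>%
  summarize(m = mean(value))

dfs
```

Converting to a factor, as in the second half of the example, serves the same purpose when the values are not whole numbers.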


hadley commented Jul 28, 2014

I can't reproduce this problem. Any thoughts @romainfrancois ?

hadley closed this as completed Aug 1, 2014
lock bot locked as resolved and limited conversation to collaborators Jun 10, 2018