`vec_unique()` makes a distinction between +0 and -0 while `unique()` does not #637

cmuyehara · 2019-10-24T23:56:54Z

Hi,

I noticed that `vec_unique()` distinguishes between +0 and -0, while most R functions do not. This can affect `pivot_wider()` when both +0 and -0 are present in `id_cols`, causing it to generate two rows instead of one.

This seems to come from how the underlying C function `vctrs_unique_loc()` works, but that's as far as I got.

library(tidyverse)
library(vctrs)

posZero = 0
negZero = ceiling(-0.9)

unique(c(posZero, negZero))
#> [1] 0
vec_unique(c(posZero, negZero))
#> [1] 0 0

# using `abs()` to convert -0 to a +0 fixes the problem
vec_unique(abs(c(posZero, negZero)))
#> [1] 0

# Similarly, `identical()` normally treats these as equivalent
identical(posZero, negZero, num.eq = TRUE)
#> [1] TRUE

# However, setting num.eq = FALSE does a bitwise comparison and differentiates between +0 and -0
identical(posZero, negZero, num.eq = FALSE)
#> [1] FALSE

# This can cause `pivot_wider()` to generate extra rows
data.frame(id = c(posZero, negZero),
           names = c('a', 'b'),
           vals = c(1, 2)) %>%
  pivot_wider(id_cols = id, names_from = names, values_from = vals)
#> # A tibble: 2 x 3
#>      id     a     b
#>   <dbl> <dbl> <dbl>
#> 1     0     1    NA
#> 2     0    NA     2

# using `abs()` to convert the -0 to a +0 again fixes the problem
data.frame(id = c(posZero, negZero),
           names = c('a', 'b'),
           vals = c(1, 2)) %>%
  mutate(id = abs(id)) %>%
  pivot_wider(id_cols = id, names_from = names, values_from = vals)
#> # A tibble: 1 x 3
#>      id     a     b
#>   <dbl> <dbl> <dbl>
#> 1     0     1     2

Created on 2019-10-24 by the reprex package (v0.3.0)

DavisVaughan · 2019-10-25T10:00:41Z

Base R's `unique()` explicitly handles 0 when hashing doubles to account for this, so +0 and -0 end up with the same hash. Both are equal to 0 using `==`. https://github.com/wch/r-source/blob/800fa450bca0bef95d53ce705cbc70f6075fdec0/src/main/unique.c#L115
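To illustrate the idea, here is a minimal base R sketch of treating the two zeros as one value before comparing them bitwise. It is not the C code linked above, nor the change made in vctrs; `normalise_zero()` is a hypothetical helper introduced only for this example.

```r
pos_zero <- 0
neg_zero <- ceiling(-0.9)   # produces -0, as in the reprex above

# `==` reports the two zeros as equal ...
zeros_equal <- pos_zero == neg_zero

# ... but the sign bit is still there, which division by zero exposes.
signs <- 1 / c(pos_zero, neg_zero)

# Hypothetical helper (an assumption for illustration): overwrite every
# zero with +0 so that bitwise comparison or hashing sees a single value.
# Unlike abs(), this leaves non-zero negative values untouched.
normalise_zero <- function(x) {
  x[!is.na(x) & x == 0] <- 0
  x
}

normalised <- normalise_zero(c(pos_zero, neg_zero, -2, NA))

# After normalisation the two zeros are bitwise identical.
same_bits <- identical(normalised[1], normalised[2], num.eq = FALSE)
```

Applying a helper like this to an id column before `pivot_wider()` would be a sign-preserving alternative to the `abs()` workaround, which also flips genuinely negative ids.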

DavisVaughan mentioned this issue Oct 25, 2019

Treat positive and negative 0 as equivalent when hashing #638

Merged

DavisVaughan closed this as completed in #638 Oct 29, 2019
