vec_unique() is sensitive to the marked encoding, where base::unique() is not #553
I'd imagine the fix would go here Line 144 in f5fe6e0
I could have sworn I looked at the encoding of these column names. But in any case, thanks all, and here is the most minimal reprex:
marked <- unknown <- "Max Temp (\u00B0C)"
Encoding(unknown) <- "unknown"
(x <- c(marked, unknown))
#> [1] "Max Temp (°C)" "Max Temp (°C)"
Encoding(x)
#> [1] "UTF-8" "unknown"
unique(x)
#> [1] "Max Temp (°C)"
x[[1]] == x[[2]]
#> [1] TRUE
vctrs::vec_unique(x)
#> [1] "Max Temp (°C)" "Max Temp (°C)"
Created on 2019-08-30 by the reprex package (v0.3.0.9000)
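
One way to confirm that the two strings in this reprex differ only in their declared encoding, and not in their bytes, is to compare them with `charToRaw()`. The sketch below uses base R only. Re-marking with `Encoding<-` is shown as a possible user-side workaround, not as the fix adopted in vctrs, and it assumes the unmarked bytes really are valid UTF-8.

```r
marked <- unknown <- "Max Temp (\u00B0C)"
Encoding(unknown) <- "unknown"
x <- c(marked, unknown)

# The declared encodings differ between the two elements ...
Encoding(x)

# ... but the underlying bytes are the same.
same_bytes <- identical(charToRaw(x[[1]]), charToRaw(x[[2]]))

# Assumption: the unmarked bytes are valid UTF-8, so declaring them
# as UTF-8 changes only the mark and leaves the bytes untouched.
fixed <- x
Encoding(fixed) <- "UTF-8"
Encoding(fixed)
```

If the unmarked strings could be in some other native encoding, re-marking them as UTF-8 would be wrong, and a converting function such as `enc2utf8()` would be the safer choice.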
After more research, Line 106 in 480165c
See below for a more minimal reprex.
Long backstory
The troublesome object comes from https://github.com/tidyverse/tidyr/issues/722. It presents as a problem with `tidyr::unnest()` but I've narrowed it down to a very weird phenomenon with `vctrs::vec_rbind()`. I've pulled out the relevant list-column here, as just a list of tibbles.
The only apparent difference is in the attributes of the tibble components, i.e. the presence of `flag_info`.
In particular, the sub-tibble names appear to be the same.
But we get a different result from `vec_rbind()`. The columns with a special character in the name aren’t correctly “matched up” with `result_bad` and we get two copies.
Stripping the `flag_info` attribute doesn’t rescue this. Seems irrelevant.
The problem goes away with less challenging names, even without removing the `flag_info` attribute.
Directly assigning the exact same names fixes it.
Re-assigning the same names this way does not fix it.
BUT … re-assigning the same names with one level of indirection DOES fix it.
I can’t see any differences in these names with `rawToChar()`.
Putting some distinguishing data in makes it easier to see that the column names aren’t being correctly “matched up”. Also indicates that the problem isn't due to the first tibble having zero rows.
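
Because the names print identically and show no byte-level difference, the declared encoding of each column name is the remaining thing to inspect. The following is a minimal sketch in base R. The two data frames are hypothetical stand-ins for the sub-tibbles, not the original data.

```r
nm_marked <- nm_unknown <- "Max Temp (\u00B0C)"
Encoding(nm_unknown) <- "unknown"

# Hypothetical stand-ins for two sub-tibbles whose column names
# print the same but carry different encoding marks.
dfs <- list(
  setNames(data.frame(1), nm_marked),
  setNames(data.frame(2), nm_unknown)
)

# Report the declared encoding of the column names of each element.
name_encodings <- lapply(dfs, function(df) Encoding(names(df)))
name_encodings
```

Running the same `lapply()` call over the real list of tibbles would show whether any element carries names with a different mark from the others.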
Created on 2019-08-30 by the reprex package (v0.3.0.9000)