na_equal = FALSE not respected in presence of non-ASCII strings? #1291

Closed
hadley opened this issue Nov 17, 2020 · 2 comments · Fixed by #1292

Comments

@hadley
Member

hadley commented Nov 17, 2020

df1 <- tibble::tibble(x = c(NA, "e"))
key <- vctrs::vec_group_loc(df1)
vctrs::vec_match(key$key, df1, na_equal = FALSE)
#> [1] NA  2

df1 <- tibble::tibble(x = c(NA, "é"))
key <- vctrs::vec_group_loc(df1)
vctrs::vec_match(key$key, df1, na_equal = FALSE)
#> [1] 1 2

Created on 2020-11-17 by the reprex package (v0.3.0.9001)

From tidyverse/dplyr#5568

@DavisVaughan
Member

This looks fixed in the dev version!

df1 <- tibble::tibble(x = c(NA, "e"))
key <- vctrs::vec_group_loc(df1)
vctrs::vec_match(key$key, df1, na_equal = FALSE)
#> [1] NA  2

df1 <- tibble::tibble(x = c(NA, "é"))
key <- vctrs::vec_group_loc(df1)
vctrs::vec_match(key$key, df1, na_equal = FALSE)
#> [1] NA  2

Created on 2020-11-17 by the reprex package (v0.3.0.9001)

Probably something to do with the switch to vctrs:::vec_normalize_encoding()

Worth adding this as a test case though; I'll do that.
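
For illustration only, a regression test along those lines might look like the sketch below. It assumes testthat and a vctrs version that includes the fix; the test name and the use of a plain data.frame instead of a tibble are choices made here, and the test actually added in #1292 may differ.

```r
library(testthat)
library(vctrs)

# Hypothetical regression test: missing values must not match each other
# when na_equal = FALSE, whether or not the column contains non-ASCII text.
test_that("vec_match() respects na_equal = FALSE with non-ASCII strings", {
  ascii <- data.frame(x = c(NA, "e"), stringsAsFactors = FALSE)
  utf8 <- data.frame(x = c(NA, "\u00e9"), stringsAsFactors = FALSE)

  # Same steps as the reprex: take the unique keys, then match them back.
  ascii_key <- vec_group_loc(ascii)$key
  utf8_key <- vec_group_loc(utf8)$key

  expect_identical(vec_match(ascii_key, ascii, na_equal = FALSE), c(NA, 2L))
  expect_identical(vec_match(utf8_key, utf8, na_equal = FALSE), c(NA, 2L))
})
```

Writing the non-ASCII character as the escape "\u00e9" keeps the test file itself ASCII, so the result does not depend on the encoding the file is read in.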

@DavisVaughan
Member

For some reason the old obj_maybe_translate_encoding() was turning NA_character_ into "NA".

x <- c(NA, "é")
x
#> [1] NA  "é"

vctrs:::obj_maybe_translate_encoding(x)
#> [1] "NA" "é"

Created on 2020-11-17 by the reprex package (v0.3.0.9001)

The new vec_normalize_encoding() doesn't do that:

x <- c(NA, "é")

Encoding(x)
#> [1] "unknown" "UTF-8"

vctrs:::vec_normalize_encoding(x)
#> [1] NA  "é"

y <- x
y <- iconv(y, "UTF-8", "latin1")

Encoding(y)
#> [1] "unknown" "latin1"

vctrs:::vec_normalize_encoding(y)
#> [1] NA  "é"
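
The property being checked above can also be written as assertions. The sketch below uses base R's enc2utf8() purely as a stand-in for the internal vctrs:::vec_normalize_encoding(), which is an assumption made here for illustration, not a claim about how vctrs implements it. The point is only that re-encoding to UTF-8 should leave NA_character_ missing instead of producing the string "NA".

```r
# Build a vector with a missing value and a latin1-encoded string.
x <- c(NA, "\u00e9")
y <- iconv(x, "UTF-8", "latin1")

# Stand-in for encoding normalization (assumption: base R, not vctrs internals).
normalized <- enc2utf8(y)

# The missing value must stay missing, not become the literal string "NA".
stopifnot(is.na(normalized[1]))
stopifnot(!identical(normalized[1], "NA"))

# The non-missing element should compare equal to the original UTF-8 string.
stopifnot(identical(normalized[2], x[2]))
```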
