Encoding, std::string, and Rcpp::String #56
Comments
Ah, that's interesting! I hadn't picked up on this issue because I've been testing on macOS and Ubuntu, where it works. Am I right in understanding in the link to the Rcpp discussion where it suggests using If that's the case I've just created a branch where I've changed all instances of
devtools::install_github("SymbolixAU/jsonify", ref = "issue56")
Thanks for the effort, but unfortunately it didn't work. I've never used
typedef Vector<STRSXP> CharacterVector ;
typedef Vector<STRSXP> StringVector ;
As far as I can tell,
#include <Rcpp.h>
// [[Rcpp::export]]
Rcpp::List test_encoding() {
std::string std_string("回收站");
Rcpp::List out(3);
out[0] = std_string;
out[1] = Rcpp::String(std_string);
out[2] = Rcpp::String(std_string, CE_UTF8);
out.attr("names") = Rcpp::CharacterVector::create(
"auto wrapped", "no encoding", "explicit encoding"
);
return out;
}
/*** R
test_encoding()
#>$`auto wrapped`
#>[1] "回收站"
#>
#>$`no encoding`
#>[1] "回收站"
#>
#>$`explicit encoding`
#>[1] "回收站"
*/
This discussion is also relevant: RcppCore/Rcpp#263. Based on that, I'm wondering if there is indeed a macro floating around that I'm missing, so that's what I'm looking for next.
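For readers wondering why the encoding mark matters at all, here is a standalone C++ sketch (no Rcpp involved) of what happens when the UTF-8 bytes held in a std::string are read as if they were in a single-byte native code page. The helper names latin1_to_utf8 and utf8_length are made up for this illustration, and Latin-1 is used as a simplified stand-in for Windows-1252, which maps bytes 0x80 to 0x9F differently. It is not a description of Rcpp's or R's internals, only of the byte-level effect.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: re-encode a byte string as UTF-8 while treating every
// input byte as one Latin-1 character. This mimics a consumer that assumes a
// single-byte native encoding (assumption: Latin-1 instead of Windows-1252).
std::string latin1_to_utf8(const std::string& bytes) {
    std::string out;
    for (unsigned char b : bytes) {
        if (b < 0x80) {
            // ASCII bytes are identical in Latin-1 and UTF-8.
            out.push_back(static_cast<char>(b));
        } else {
            // Code points U+0080..U+00FF take two bytes in UTF-8.
            out.push_back(static_cast<char>(0xC0 | (b >> 6)));
            out.push_back(static_cast<char>(0x80 | (b & 0x3F)));
        }
    }
    return out;
}

// Hypothetical helper: count code points in a UTF-8 string by counting the
// bytes that are not continuation bytes (10xxxxxx).
std::size_t utf8_length(const std::string& s) {
    std::size_t n = 0;
    for (unsigned char b : s) {
        if ((b & 0xC0) != 0x80) {
            ++n;
        }
    }
    return n;
}
```

Feeding it the nine UTF-8 bytes of "回收站" (written as "\xE5\x9B\x9E\xE6\x94\xB6\xE7\xAB\x99" to avoid depending on the source file encoding) gives 3 code points before the conversion and 18 bytes holding 9 unrelated code points after it. The bytes themselves never change; only the assumption about their encoding does, which is why an explicit UTF-8 mark on the R side is what the thread is after.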
@ChrisMuir I don't suppose you or @knapply are able to make a fix for this? I don't have access to a Windows machine, so I won't get a chance to test and fix it in the near future. Happy to help find the relevant pieces of code though.
I have an old PC that I haven't touched in years. If it will boot, I can take a stab at this tonight. I'll update my progress here.
I didn't notice anything that they wouldn't play along with, so I wrapped all the string-related operations that eventually return to R in That seems to be all that was needed; PR inbound: #57
example_json <- '{"name":"回收站","arabic_alphabet":"غ ظ ض ذ خ ث ت ش ر ق ص ف ع س ن م ل ك ي ط ح ز و ه د ج ب أ"}'
jsonify::from_json(example_json)
#> $name
#> [1] "回收站"
#>
#> $arabic_alphabet
#> [1] "غ ظ ض ذ خ ث ت ش ر ق ص ف ع س ن م ل ك ي ط ح ز و ه د ج ب أ"
jsonify::from_json(example_json, simplify = FALSE)
#> $name
#> [1] "回收站"
#>
#> $arabic_alphabet
#> [1] "غ ظ ض ذ خ ث ت ش ر ق ص ف ع س ن م ل ك ي ط ح ز و ه د ج ب أ"
temp_file <- tempfile(fileext = ".json")
readr::write_file(example_json, temp_file)
jsonify::from_json(temp_file)
#> $name
#> [1] "回收站"
#>
#> $arabic_alphabet
#> [1] "غ ظ ض ذ خ ث ت ش ر ق ص ف ع س ن م ل ك ي ط ح ز و ه د ج ب أ"
sessionInfo()
#> R version 3.6.1 (2019-07-05)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 18362)
#>
#> Matrix products: default
#>
#> locale:
#> [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252
#> [4] LC_NUMERIC=C LC_TIME=English_United States.1252
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> loaded via a namespace (and not attached):
#> [1] jsonify_1.0.0 readr_1.3.1 compiler_3.6.1 backports_1.1.5 R6_2.4.0
#> [6] hms_0.5.1 tools_3.6.1 pillar_1.4.2.9001 rstudioapi_0.10.0-9000 tibble_2.1.3
#> [11] crayon_1.3.4 Rcpp_1.0.2 vctrs_0.2.0 zeallot_0.1.0 packrat_0.5.0
#> [16] pkgconfig_2.0.3 rlang_0.4.1
Closed in #57.
Thank you so much for working on this.
I took the package for a test drive and noticed it encounters an issue I haven't been able to solve. Perhaps you have some ideas.
My understanding is that by using std::string, strings will automatically use native encoding when they are returned to R (at least on Windows). The result is that strings can be irreparably mangled. I haven't found a way to avoid this with std::string and have stuck to Rcpp::String myself.
I posed the question on StackOverflow with examples at the C++ level: https://stackoverflow.com/questions/58126425/rcppstring-keep-utf-8-encoding-but-stdstring-does-not
and here's an example of the behavior with {jsonify}: