Encoding, std::string, and Rcpp::String #56
Ah, that's interesting! I hadn't picked up on this issue because I've been testing on macOS and Ubuntu, where it works.

Am I right in understanding, from the linked Rcpp discussion, that it suggests using …? If that's the case, I've just created a branch where I've changed all instances of …, which you can install with:
devtools::install_github("SymbolixAU/jsonify", ref = "issue56")

Thanks for the effort, but unfortunately it didn't work. I've never used …
typedef Vector<STRSXP> CharacterVector;
typedef Vector<STRSXP> StringVector;
As far as I can tell, …
#include <Rcpp.h>
// [[Rcpp::export]]
Rcpp::List test_encoding() {
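std::string std_string("回收站");             // raw bytes as saved in this source file; std::string stores no encoding
Rcpp::List out(3);
out[0] = std_string;                          // converted by Rcpp's automatic wrapping
out[1] = Rcpp::String(std_string);            // Rcpp::String, no encoding argument passed
out[2] = Rcpp::String(std_string, CE_UTF8);   // Rcpp::String explicitly marked as UTF-8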
out.attr("names") = Rcpp::CharacterVector::create(
"auto wrapped", "no encoding", "explicit encoding"
);
return out;
}
/*** R
test_encoding()
#>$`auto wrapped`
#>[1] "回收站"
#>
#>$`no encoding`
#>[1] "回收站"
#>
#>$`explicit encoding`
#>[1] "回收站"
*/
This discussion is also relevant: RcppCore/Rcpp#263. Based on that, I'm wondering whether there is a macro floating around that I'm missing, so that's what I'll look for next.

@ChrisMuir I don't suppose you or @knapply would be able to make a fix for this? I don't have access to a Windows machine, so I won't get a chance to test and fix it in the near future. I'm happy to help find the relevant pieces of code, though.

I have an old PC that I haven't touched in years. If it boots, I can take a stab at this tonight. I'll post my progress here.

I didn't notice anything that they wouldn't play along with, so I wrapped all the string-related operations that eventually return to R in …; that (seems) to be all that was needed. PR inbound: #57
example_json <- '{"name":"回收站","arabic_alphabet":"غ ظ ض ذ خ ث ت ش ر ق ص ف ع س ن م ل ك ي ط ح ز و ه د ج ب أ"}'
jsonify::from_json(example_json)
#> $name
#> [1] "回收站"
#>
#> $arabic_alphabet
#> [1] "غ ظ ض ذ خ ث ت ش ر ق ص ف ع س ن م ل ك ي ط ح ز و ه د ج ب أ"
jsonify::from_json(example_json, simplify = FALSE)
#> $name
#> [1] "回收站"
#>
#> $arabic_alphabet
#> [1] "غ ظ ض ذ خ ث ت ش ر ق ص ف ع س ن م ل ك ي ط ح ز و ه د ج ب أ"
temp_file <- tempfile(fileext = ".json")
readr::write_file(example_json, temp_file)
jsonify::from_json(temp_file)
#> $name
#> [1] "回收站"
#>
#> $arabic_alphabet
#> [1] "غ ظ ض ذ خ ث ت ش ر ق ص ف ع س ن م ل ك ي ط ح ز و ه د ج ب أ"
sessionInfo()
#> R version 3.6.1 (2019-07-05)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 18362)
#>
#> Matrix products: default
#>
#> locale:
#> [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252
#> [4] LC_NUMERIC=C LC_TIME=English_United States.1252
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> loaded via a namespace (and not attached):
#> [1] jsonify_1.0.0 readr_1.3.1 compiler_3.6.1 backports_1.1.5 R6_2.4.0
#> [6] hms_0.5.1 tools_3.6.1 pillar_1.4.2.9001 rstudioapi_0.10.0-9000 tibble_2.1.3
#> [11] crayon_1.3.4 Rcpp_1.0.2 vctrs_0.2.0 zeallot_0.1.0 packrat_0.5.0
#> [16] pkgconfig_2.0.3 rlang_0.4.1

Closed in #57.

Thank you so much for working on this.
I took the package for a test drive and noticed that it runs into an issue I haven't been able to solve. Perhaps you have some ideas.
My understanding is that when std::string is used, strings are automatically given the native encoding when they are returned to R (at least on Windows). The result is that strings can be irreparably mangled. I haven't found a way to avoid this with std::string, so I have stuck to Rcpp::String myself. I posed the question on StackOverflow, with examples at the C++ level: https://stackoverflow.com/questions/58126425/rcppstring-keep-utf-8-encoding-but-stdstring-does-not
Here's an example of the behavior with {jsonify}: