Encodings #9
Yeah I think that makes sense.
The rule of thumb for this in Ruby, as I understand it, is to transcode into |
The situation with Duktape strings seems a bit odd, but there's a reason for that:
Implications for mapping into Duktape strings:
Implications for mapping from Duktape strings:
Note that I'm not necessarily suggesting doing something as complicated as above, but just wanted to describe the various alternatives :)
Alright, I've got the basics in. @svaarala thanks for the explanation! For mapping into Duktape, we'll always transcode into UTF-8. For returned strings, I think we should just always tag them as UTF-8 regardless of internal encoding. We'll probably want to do something about the surrogate pair encoding, but maybe in another PR. Here's how Ruby's JSON handles those cases: https://github.com/ruby/ruby/blob/trunk/ext/json/parser/parser.c#L1304-L1372
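For illustration, the policy described above could look roughly like the following at the Ruby level. This is only a sketch: `to_duktape_string` and `from_duktape_string` are hypothetical names, not part of the binding's API, and the real conversion presumably happens in the C extension. It also assumes the bytes coming back from Duktape are valid UTF-8, which will not hold for strings containing CESU-8 encoded surrogate pairs.

```ruby
# Hypothetical helpers for illustration; these names are assumptions and
# do not come from the binding itself.

# Ruby -> Duktape: transcode from whatever encoding the String carries
# into UTF-8. Raises Encoding::UndefinedConversionError for binary
# (ASCII-8BIT) strings that contain non-ASCII bytes.
def to_duktape_string(str)
  str.encode(Encoding::UTF_8)
end

# Duktape -> Ruby: assume the bytes are already UTF-8 and only tag them.
# No bytes are changed; force_encoding just sets the encoding label.
def from_duktape_string(bytes)
  bytes.dup.force_encoding(Encoding::UTF_8)
end

# A Latin-1 string: "caf" followed by byte 0xE9 (e with acute accent).
latin1 = "caf\xE9".b.force_encoding(Encoding::ISO_8859_1)

utf8   = to_duktape_string(latin1)   # 0xE9 becomes the two bytes 0xC3 0xA9
raw    = utf8.b                      # untagged bytes, as a C extension might return them
tagged = from_duktape_string(raw)    # same bytes, now labelled UTF-8
```

Tagging instead of transcoding on the way out keeps the return path cheap, at the cost of possibly handing back a String for which `valid_encoding?` is false.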
We need to decide on an encoding policy. Right now all the returned strings are untagged.
I can't say I fully understand CESU-8, but it sounds like we should always be transcoding Ruby Strings to UTF-8 before handing them off to Duktape. As for the returned objects, they should either be tagged as UTF-8 or transcoded back to the default internal/external encoding (whichever one). I'd be fine with always having UTF-8 strings.
/cc @judofyr @brianmario
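To make the two options for returned strings concrete, here is a rough sketch. `to_ruby_string` is a hypothetical helper, not existing code, and it assumes the bytes handed back by Duktape are valid UTF-8. With no target encoding it only tags the result as UTF-8; with a target it transcodes, the way it would if `Encoding.default_internal` were configured.

```ruby
# Hypothetical helper for illustration; not part of the binding's API.
# `target` defaults to Encoding.default_internal, which is nil unless the
# application has configured it.
def to_ruby_string(bytes, target: Encoding.default_internal)
  # Option 1: label the bytes as UTF-8 without changing them.
  str = bytes.dup.force_encoding(Encoding::UTF_8)
  return str if target.nil? || target == Encoding::UTF_8

  # Option 2: transcode from UTF-8 into the requested encoding.
  str.encode(target)
end

raw = "caf\xC3\xA9".b  # UTF-8 bytes for "cafe" with an acute accent, untagged

always_utf8 = to_ruby_string(raw, target: nil)
latin1      = to_ruby_string(raw, target: Encoding::ISO_8859_1)
```

The first option never raises, while the second can raise `Encoding::UndefinedConversionError` when the target encoding cannot represent a character, which is one argument for always returning UTF-8.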