JSON encodes >0xff to utf8 but needs to encode to ucs2 #845

Closed
rurban opened this Issue Sep 26, 2012 · 1 comment

rurban commented Sep 26, 2012

pge2pir.tg:

unicode:
    $I0 = $P1.'to_int'(16)
    $S0 = chr $I0           # this encodes to utf8!

e.g. \u0080 is encoded as 0xC2 0x80 (UTF-8) but needs to be 0x00 0x80 (UTF-16, or better UCS-2).
UCS-2 is always exactly 2 bytes per character; UTF-16 can be longer. A \u escape carries only 2 bytes (4 hex digits).
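
To make the byte difference concrete, here is a small sketch in Python rather than PIR. It is only an illustration and says nothing about how Parrot's chr op is implemented. It parses the four hex digits of a \u escape and returns the UTF-8 and big-endian UTF-16 bytes for the same codepoint. The helper name `escape_bytes` is hypothetical.

```python
def escape_bytes(hex_digits):
    """Return (utf8_bytes, utf16be_bytes) for the codepoint of a \\uXXXX escape."""
    codepoint = int(hex_digits, 16)   # comparable to $P1.'to_int'(16) in the snippet above
    char = chr(codepoint)             # an abstract codepoint, no byte layout chosen yet
    return char.encode("utf-8"), char.encode("utf-16-be")

utf8, utf16 = escape_bytes("0080")
print(utf8.hex(), utf16.hex())        # c280 0080
```

The same codepoint yields two different byte sequences, which is the mismatch described in this report.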

This broke the mime_base64 tests. See GH #813

rurban pushed a commit that referenced this issue Sep 28, 2012

Reini Urban
[GH #845] Fix data_json \unnnn encoding
\u needs to be encoded as utf16 char, not utf8 because they
are binary different. E.g. \u00a2 is 0xc2a2 in utf8.

Add utf16_chr vtable method for the Parrot_utf16_encoding.
Add unicode tests to data_json.

@ghost ghost assigned rurban Sep 28, 2012

rurban pushed a commit that referenced this issue Sep 28, 2012

rurban pushed a commit that referenced this issue Sep 28, 2012

rurban pushed a commit that referenced this issue Sep 30, 2012

Reini Urban
[GH #845] Add more unicode tests to data_json.
Add more documentation to the chr op.

rurban commented Oct 1, 2012

Invalid ticket.
Since we rely on integer codepoints in ord/chr in all our string handling functions (which are subject to native endianness) and not on the binary representation, changing to a "better" binary representation will break more.

The only problem is the binary encoding and the MIME::Base64 tests, which have to work around these issues.
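
A rough Python sketch of the distinction drawn here (again an illustration, not Parrot code): the codepoint is the same whichever encoding stores it, but anything that works on raw bytes, such as Base64, sees different input. The choice of U+0080 and big-endian UTF-16 is an assumption made for the example.

```python
import base64

char = chr(0x80)
utf8_bytes = char.encode("utf-8")        # C2 80
utf16_bytes = char.encode("utf-16-be")   # 00 80

# Codepoint view: identical after decoding, regardless of the stored encoding.
same_codepoint = ord(utf8_bytes.decode("utf-8")) == ord(utf16_bytes.decode("utf-16-be"))

# Byte view: Base64 operates on the raw bytes, so the results differ.
b64_utf8 = base64.b64encode(utf8_bytes).decode("ascii")     # "woA="
b64_utf16 = base64.b64encode(utf16_bytes).decode("ascii")   # "AIA="

print(same_codepoint, b64_utf8, b64_utf16)
```

This is why code that only uses ord/chr is unaffected, while a test that compares Base64 output depends on which byte encoding the string carries.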

@rurban rurban closed this Oct 1, 2012
