Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle UTF-8 #22

Merged
merged 1 commit into from Aug 24, 2016

Conversation

@laanwj
Copy link
Contributor

laanwj commented Apr 18, 2016

This adds full UTF-8 support both on input and output.

Input: read and validate full UTF-8, both Basic Multilingual Plane and extended characters.
Collate surrogate pairs as specified in RFC4627.
This ensures that UTF-8 strings that reach the application are always valid, and that invalid UTF-8 fails parsing.

Output: Assume UTF-8 strings provided for output are valid.
The escaping was broken, fix this by not encoding UTF-8 characters with \u.
Writing them to the output stream as-is is the right thing to do.
See https://www.ietf.org/rfc/rfc4627.txt:

"JSON text SHALL be encoded in Unicode.  The default encoding is UTF-8."

Also add tests for the new functionality.
Fixes #16.
Sister PR in the bitcoin core repository, with examples: Add full UTF-8 support to RPC #7892

@MarcoFalke

This comment has been minimized.

Copy link
Contributor

MarcoFalke commented Apr 18, 2016

You didn't adjust src/univalue/Makefile.am?

@laanwj

This comment has been minimized.

Copy link
Contributor Author

laanwj commented Apr 18, 2016

Ah yes, thanks for noticing, this needs the build system changes

Edit: done

This adds full UTF-8 support both on input and output.

Input: read and validate full UTF-8, both Basic Multilingual Plane
and extended characters.
Collate surrogate pairs as specified in RFC4627.
This ensures that UTF-8 strings that reach the application are
always valid, and that invalid UTF-8 fails parsing.

Output: Assume UTF-8 strings provided for output are valid.
The escaping was broken, fix this by not encoding UTF-8 characters with \u.
Writing them to the output stream as-is is the right thing to do.
See https://www.ietf.org/rfc/rfc4627.txt:

    "JSON text SHALL be encoded in Unicode.  The default encoding is UTF-8."

Also add tests for the new functionality.
Fixes #16.
@laanwj laanwj force-pushed the laanwj:2016_04_unicode branch from 81c55da to c9a716c Apr 19, 2016
@MarcoFalke

This comment has been minimized.

Copy link
Contributor

MarcoFalke commented May 27, 2016

Concept ACK

laanwj added a commit to bitcoin-core/univalue that referenced this pull request Jun 10, 2016
Merge UTF-8 support
See jgarzik#22
@luke-jr

This comment has been minimized.

Copy link

luke-jr commented Aug 16, 2016

This seems fairly well-tested in Core. Would be nice to have it in an official release of the lib too...

@MarcoFalke

This comment has been minimized.

Copy link
Contributor

MarcoFalke commented Aug 24, 2016

@jgarzik Anything holding this back?

@jgarzik jgarzik merged commit 09a2693 into jgarzik:master Aug 24, 2016
1 check passed
1 check passed
continuous-integration/travis-ci/pr The Travis CI build passed
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants
You can’t perform that action at this time.