Support UTF-8 for JSON #7

Merged
merged 2 commits into from
Jun 6, 2013

nscavell
Contributor

Went ahead and took a stab at supporting UTF-8 for both parsing and displaying JSON. I left the readFromString method alone (still using US-ASCII).
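
To illustrate why the charset matters here, the following is a minimal sketch and not the project's code. The class and method names are hypothetical; it only uses the standard `String(byte[], Charset)` constructor to show what happens when UTF-8 bytes are decoded as US-ASCII.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class; not part of the library under review.
final class CharsetDecodeSketch {
    private CharsetDecodeSketch() {}

    /** Decodes raw bytes with the given charset, replacing undecodable bytes. */
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    /** Encodes text as UTF-8, the encoding this PR expects for JSON input. */
    static byte[] utf8Bytes(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }
}
```

Decoding the two UTF-8 bytes of U+00E9 (0xC3 0xA9) with `StandardCharsets.UTF_8` returns the original character, while decoding them with `StandardCharsets.US_ASCII` yields two U+FFFD replacement characters.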

I looked at a few libraries (org.json, json-smart, Jackson, and Jettison), and they all behave slightly differently when it comes to escaping Unicode characters.

Jettison seemed to be the most accurate, and it meets the GateIn team's localization requirements. It follows http://www.ietf.org/rfc/rfc4627.txt closely in that it only escapes Unicode characters \u0000 through \u001F. It does not, however, escape the forward slash.
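
As a rough sketch of that escaping policy (this is neither Jettison's code nor the code in this PR, and `JsonEscapeSketch.quote` is a hypothetical name), the helper below escapes only what RFC 4627 section 2.5 requires: the quotation mark, the backslash, and the control characters U+0000 through U+001F. The forward slash and all non-ASCII characters are written as-is.

```java
// Hypothetical illustration class; assumes the minimal RFC 4627 escaping rules.
final class JsonEscapeSketch {
    private JsonEscapeSketch() {}

    /** Returns s as a quoted JSON string with minimal escaping. */
    static String quote(String s) {
        StringBuilder out = new StringBuilder(s.length() + 2);
        out.append('"');
        for (int i = 0; i < s.length(); i++) {
            char ch = s.charAt(i);
            switch (ch) {
                case '"':  out.append("\\\""); break;
                case '\\': out.append("\\\\"); break;
                case '\b': out.append("\\b"); break;
                case '\f': out.append("\\f"); break;
                case '\n': out.append("\\n"); break;
                case '\r': out.append("\\r"); break;
                case '\t': out.append("\\t"); break;
                default:
                    if (ch < 0x20) {
                        // Remaining control characters: the two high hex digits are always 00.
                        out.append("\\u00")
                           .append(Character.forDigit(ch >> 4, 16))
                           .append(Character.forDigit(ch & 0xF, 16));
                    } else {
                        // '/', surrogates, and every other character pass through unescaped.
                        out.append(ch);
                    }
            }
        }
        out.append('"');
        return out.toString();
    }
}
```

Because surrogate halves pass through unchanged, a character outside the BMP survives this loop intact even though it iterates by `char`.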

@nscavell
Contributor Author

Sorry, I wanted this to be a separate PR since DMR-6 is an actual bug fix. I would at least like to see DMR-6 make it into the next release. Let me know if you want individual PRs for each issue.

-    for (int i = 1; i < length - 1; i = yyText.offsetByCodePoints(i, 1)) {
-        int ch = yyText.codePointAt(i);
+    for (int i = 1; i < length - 1; i++) {
+        char ch = yyText.charAt(i);
Contributor

Why change these to not use code points?

Contributor Author

I don't think we need it. I believe that characters outside the BMP (which is my understanding of code points, but I am a novice in this area) need to be represented by two Unicode characters in JSON.

Contributor Author

I guess it depends on the encoding in which the parser reads the content. I'm ok leaving it in if you think it solves a problem. A test to prove this code wrong would be ideal :)

Contributor

Since you changed it to expect input in UTF-8, I think we should assume that the text might contain any code point.
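
The difference between the two loop styles discussed in this thread can be sketched as follows. This is an illustration only, with hypothetical class and method names; it relies on the standard `String.charAt`, `String.codePointAt`, and `String.offsetByCodePoints` methods.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; not the lexer code from this PR.
final class CodePointWalkSketch {
    private CodePointWalkSketch() {}

    /** Walks s one UTF-16 char at a time; a supplementary character appears as two surrogates. */
    static List<Integer> byChar(String s) {
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < s.length(); i++) {
            out.add((int) s.charAt(i));
        }
        return out;
    }

    /** Walks s one code point at a time; a surrogate pair is combined into a single value. */
    static List<Integer> byCodePoint(String s) {
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < s.length(); i = s.offsetByCodePoints(i, 1)) {
            out.add(s.codePointAt(i));
        }
        return out;
    }
}
```

For the string `"a\uD83D\uDE00b"` (the letter a, U+1F600, the letter b), `byChar` returns four values, including the surrogates 0xD83D and 0xDE00, while `byCodePoint` returns three, with 0x1F600 in the middle. Which loop is appropriate depends on whether the per-character logic ever needs to see a whole supplementary code point.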

@dmlloyd
Contributor

dmlloyd commented May 16, 2013

Looks good.

@nscavell
Contributor Author

nscavell commented Jun 3, 2013

Any update or status on this? It would be nice to include it in a release in the not-so-distant future :)

@dmlloyd dmlloyd merged commit 1693191 into jbossas:master Jun 6, 2013