Guidance on JSON escaping for non-ascii characters. #46
Comments
@tomchristie great question, and one I don't think we've run across in the wild, so I haven't given it much thought just yet. At a glance, I suppose I might lean toward (assuming both are valid) the one that involves the least processing/change, i.e. if we don't have a good reason to force an encode on our side (and the resultant decode on the client), it seems easier to save ourselves the trouble of doing it (as well as remembering that it is required). That said, I haven't run into this before, so I fully suspect there are nuances I'm missing. Could you elaborate a bit on the pro-encoding side of things and/or let me know what you think about the leave-it-be argument I've roughly set forward? Thanks!
That'd be a valid option. What that actually means might depend on which frameworks and/or json encoding libs you're using for your various services. For example see differences between Rails 3.2.13 vs Rails 4.0.
The option that requires least thought is clearly to use escaped characters. However, it's nicer to users if the API presents un-escaped characters, since command line tools then display the text readably. It's not clear to me what further subtleties there might be around un-escaped characters, though. For example, it's probably still a good idea to escape control characters in that case, as per this example. If leaving as UTF-8, what ranges would still need encoding? The current set as used in Rails might be an okay choice, but it's not obvious. (Note: I'm only using Rails as an example here as it's the one place I've noticed where there's actually been some kind of conscious design decision.)
Yeah, I think it is definitely helpful to reference places where some effort already went into making this decision. Un-escaped does seem likely to work in the most places (without extra work), at least at a guess. Control characters are an odd case, but perhaps there would be cases where an API would want to include them for curl or something? It would be pretty weird, but maybe possible. You could argue that in most cases you probably shouldn't be including these in API responses in the first place, I suppose. Anyway, it seems like we are still leaning toward unescaped/raw, if I'm not mistaken. I'd maybe even say that we could leave it as that generic recommendation and defer the question of whether some narrower character set should be escaped until somebody more explicitly runs up against it, so we have clearer examples/inputs to work from. What do you think?
Coming back to this shortly, but in the meantime referencing Python's behaviour with The standard JSON escape chars are escaped, using their shortforms... (Ie not the hex version)
The following control characters are escaped to the hex notation:
Everything else is regular unicode. (Linking to |
Noting that JSON requires the control characters \x00 - \x1F to always be escaped inside strings.
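For future reference, a minimal sketch of the Python behaviour described above, using the standard library `json` module. It assumes the comparison is between the default `ensure_ascii=True` and `ensure_ascii=False`; the `payload` values are made up for illustration.

```python
import json

# Hypothetical payload: one non-ASCII character, one tab, one BEL (U+0007).
payload = {"name": "Zoë", "note": "tab\there", "bell": "\x07"}

# Default: every non-ASCII character is written as a \uXXXX escape.
escaped = json.dumps(payload)

# ensure_ascii=False: non-ASCII characters are left as-is, but control
# characters are still escaped (short form where one exists, hex otherwise).
unescaped = json.dumps(payload, ensure_ascii=False)

print(escaped)    # {"name": "Zo\u00eb", "note": "tab\there", "bell": "\u0007"}
print(unescaped)  # {"name": "Zoë", "note": "tab\there", "bell": "\u0007"}
```

Both strings decode to the same value, so the choice only affects how the response reads on the wire.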
Relevant link on u2028 u2029... http://stackoverflow.com/questions/2965293/javascript-parse-error-on-u2028-unicode-character Do shout me down if I'm being too verbose :) It seems best to put these things down for future reference.
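A rough sketch of how the U+2028 / U+2029 case could be handled when emitting unescaped output from Python. The helper name `dumps_js_safe` is hypothetical, and the approach (post-processing the encoder output) is just one option, not something this thread settled on.

```python
import json

def dumps_js_safe(value):
    """Serialize with raw non-ASCII output, but escape U+2028 and U+2029.

    In valid JSON these two characters can only occur inside strings,
    so a plain text replacement on the encoder output is enough.
    """
    text = json.dumps(value, ensure_ascii=False)
    return text.replace("\u2028", "\\u2028").replace("\u2029", "\\u2029")

sample = {"text": "line\u2028separator"}

raw = json.dumps(sample, ensure_ascii=False)   # contains a literal U+2028
safe = dumps_js_safe(sample)                   # contains the six-character escape

print("\u2028" in raw, "\u2028" in safe)
```

The escaped form still round-trips to the same string, so clients are unaffected either way.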
Not sure if relevant or not, but JSLint on 'unsafe' chars that should be escaped (in the context of a browser)... http://www.jslint.com/lint.html#unsafe
Okay, my thoughts after all that: I guess I'd probably recommend either as okay, but unicode as preferred, due to being more user friendly when displayed. It would probably be okay to underspecify the required escapes, perhaps simply noting that control characters do need escaping, as per the JSON spec. Alternatively, consider this as out-of-scope and offer no guidance one way or the other. (Also not unreasonable.)
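To illustrate the point about control characters, here is a small check using Python's standard `json` parser in its default strict mode. The sample documents and the `parses` helper are made up for this sketch; other parsers may be more lenient.

```python
import json

# The same value written two ways: a literal U+0001 inside the JSON string,
# and the escaped form of the same character.
raw_control = '{"value": "a\x01b"}'
escaped_control = '{"value": "a\\u0001b"}'

def parses(text):
    """Return True if the text is accepted by json.loads, False otherwise."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False

print(parses(raw_control))      # False: raw control character is rejected
print(parses(escaped_control))  # True
```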
I'd be up for recommending unicode + escaped control characters. I think that seems reasonable and a good thing to note (thanks for the detailed references/notes). Would you be up for a pull request with the related verbiage? Thanks!
Sure thing, consider it on my todo list.
@tomchristie no worries, certainly no hurry here. In the meantime we can definitely refer people to this discussion if it comes up. It will just be nice to polish it up and get it in there when you have a moment. Thanks!
JSON allows for both escaped and non-escaped non-ASCII characters.
It'd be useful for this document to include guidance on which style is preferred, or if there is no preference.
For example, the following is valid JSON:
As is the unescaped variant:
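As a hypothetical illustration of the two styles (the sample value here is an assumption, not taken from the issue), both documents below are valid JSON and decode to the same string in Python:

```python
import json

# Escaped style: non-ASCII characters written as \uXXXX sequences.
escaped_doc = '{"greeting": "\\u4f60\\u597d"}'

# Unescaped style: the same characters written directly (sent as UTF-8).
unescaped_doc = '{"greeting": "你好"}'

same = json.loads(escaped_doc) == json.loads(unescaped_doc)
print(same)

# The wire sizes differ, which is one of the trade-offs between the styles.
print(len(escaped_doc.encode("utf-8")), len(unescaped_doc.encode("utf-8")))
```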
I could see valid arguments for either case.
Happy of course if you consider this out-of-scope, but I know it's something I'd value knowing another team's design preferences.