Worth noting: No newlines in JSON strings #9

Open
snej opened this issue May 17, 2017 · 1 comment

@snej

snej commented May 17, 2017

If the history of RSS is any guide, people are going to be writing code that generates JSON feeds by ad-hoc string concatenation or template substitution, without going through a real JSON encoder. And they're going to make mistakes that result in invalid JSON, most likely when writing article bodies.

JSON parsers will generally barf on these, which should mean that most of these mistakes get caught in casual testing before being released into the wild, but the different parsers vary in strictness, so it's possible someone will test with a more lenient parser and then their feed(s) will fail for others. Or the mistakes might only occur in some cases that aren't hit during testing.

There are two things I think are worth calling out in the spec:

1. JSON strings cannot contain raw newlines or tabs; they must be escaped as \n or \t. (The RFC requires that all control characters be escaped.) Some parsers seem not to mind if this is violated, but some do.

2. JSON has some very specific rules for how to escape Unicode characters. If someone uses a different library to do the encoding, the results may work most of the time but not always; for example, Latin characters might make it through OK but non-Latin ones might not. Again, this might slip past the kind of rudimentary testing that a lot of web-devs do (I'm talking about you, PHP kiddies.) For example, I've found that JSON-encoding NSStrings is tricky because NSString's "characters" are not Unicode codepoints but rather UTF-16 code units, and if you don't wrap your head around that, lots of higher-Unicode characters come out wrong. (Actually, the popularity of emoji is a real boon here, as emoji represent the most complex case of Unicode character encoding; so if you don't get the escaping correct, emoji tend to break, which is quickly apparent in real-world use.)
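To make the newline and tab point concrete, here is a minimal Python sketch. The `content_html` key and the sample article body are only illustrative assumptions, and the helper `is_valid_json` is hypothetical. It contrasts a feed item built by string concatenation with one produced by a real encoder, and shows how a lenient parser setting can hide the mistake.

```python
import json

# Sample article body containing a raw newline and a raw tab,
# as often happens inside <pre> or <code> blocks.
body = "<pre>line one\n\tline two</pre>"

# Ad-hoc string concatenation: the control characters end up
# unescaped inside the JSON string, which is invalid JSON.
naive = '{"content_html": "' + body + '"}'

# A real encoder escapes them as \n and \t.
encoded = json.dumps({"content_html": body})


def is_valid_json(text):
    """Hypothetical helper: True if Python's default (strict) parser accepts text."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


print(is_valid_json(naive))    # the strict parser rejects the hand-built feed
print(is_valid_json(encoded))  # the encoder's output parses fine

# Python's parser can be made lenient with strict=False; a feed tested
# only this way would appear to work and then fail for stricter readers.
print(json.loads(naive, strict=False)["content_html"] == body)
```

The takeaway, under these assumptions, is that the same hand-built document passes or fails depending on the parser's strictness, so routing article bodies through an encoder is the safer path.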

The best advice for escaping Unicode is probably "don't do it." The spec clearly says that only double-quote, backslash and control characters need to be escaped. Everything else can appear literally in a string.
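As a rough illustration of that advice, the following Python sketch encodes the same string both ways. The sample text is an assumption chosen to include one emoji outside the Basic Multilingual Plane. Leaving characters literal avoids the UTF-16 surrogate-pair escaping entirely, while a correct encoder that does escape must emit two `\u` escapes for the single emoji.

```python
import json

# "café" followed by U+1F600, an emoji outside the Basic Multilingual Plane.
text = "caf\u00e9 \U0001F600"

# Option 1: do not escape non-ASCII characters at all; they appear
# literally in the output, which should then be served as UTF-8.
literal = json.dumps(text, ensure_ascii=False)

# Option 2: escape everything non-ASCII (Python's default). The emoji
# must be written as a UTF-16 surrogate pair, i.e. two \u escapes.
escaped = json.dumps(text)

# The emoji is one code point but two UTF-16 code units, which is the
# detail that trips up hand-written escaping code.
utf16_units = len("\U0001F600".encode("utf-16-le")) // 2

print(literal)
print(escaped)
print(utf16_units)

# Both forms decode to the same string.
print(json.loads(literal) == json.loads(escaped) == text)
```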

@OlofT

OlofT commented May 28, 2019

This is a major show-stopper for JSON feed, and why I stopped using it. Whenever an author adds a newline wrongly, the whole feed breaks until that post isn't part of the feed anymore. Usually this happens when people are dealing with "pre" and "code" tags.

Prominent blogs like https://nshipster.com have been victims of this error, so much so that they don't even use JSON feed any more.

A side-note:
Please refrain from language-snark, as in "PHP kiddies".
