Unescaped tab, line feed, carriage return should not be accepted in strings #90

Closed
dtolnay opened this issue Jun 28, 2016 · 8 comments

@dtolnay
Member

dtolnay commented Jun 28, 2016

From JSON standard:

Insignificant whitespace is allowed before or after any token. The whitespace characters are: character tabulation (U+0009), line feed (U+000A), carriage return (U+000D), and space (U+0020). Whitespace is not allowed within any token, except that space is allowed in strings.
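To make the rule concrete, here is a minimal sketch of the check in Rust. It assumes the raw text between the quotes has already been isolated; `first_unescaped_control` is a hypothetical helper written for illustration, not a function from this crate. It relies on the JSON grammar's requirement that U+0000 through U+001F be escaped inside strings, which covers tab, line feed, and carriage return.

```rust
/// Returns the byte offset of the first character that JSON does not allow
/// to appear unescaped inside a string, or `None` if the body is clean.
///
/// `body` is the raw text between the quotes. This is a hypothetical helper
/// for illustration; it does not validate escape sequences themselves.
fn first_unescaped_control(body: &str) -> Option<usize> {
    body.char_indices()
        // U+0000..=U+001F must be written as escapes such as \t, \n, \r.
        .find(|&(_, c)| c <= '\u{1F}')
        .map(|(offset, _)| offset)
}
```

Under this sketch a literal tab inside the body is reported at its byte offset, while the two-character escape sequence `\t` (a backslash followed by `t`) and a plain space both pass.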

dtolnay added the bug label Jun 28, 2016
@dtolnay
Member Author

dtolnay commented Jun 28, 2016

These are the only checks from JSON_checker that we fail. Rustc-serialize correctly rejects unescaped whitespace in strings.

@oli-obk
Member

oli-obk commented Jun 28, 2016

Is it a problem if our parser/deserializer is more lenient than the standard, as long as our serializer produces correct JSON?

@StefanoD

StefanoD commented Jun 28, 2016

@oli-obk So, you want to guess what the sender wanted to send you? Can be dangerous...

@dtolnay
Member Author

dtolnay commented Jun 28, 2016

I think we should aim to accept valid JSON and reject invalid JSON. I would make one exception which is I think it is okay for us to accept types other than list and map at the root level.

@oli-obk
Member

oli-obk commented Jun 28, 2016

But if accepting valid JSON requires additional code and conditions, it's additional code we need to maintain and test + it slows down the regular path. If the correct way is faster/easier (like with forbidding trailing commas), then it's fine with me.

@maciejhirsz

Kind of related, I've been looking at control characters:

The control characters U+0000–U+001F and U+007F come from ASCII

0x7F is not marked as U in the LUT.

@maciejhirsz

maciejhirsz commented Jul 5, 2016

I've been roaming around, since I'm looking for a more universal test suite for myself.

@dtolnay

I would make one exception which is I think it is okay for us to accept types other than list and map at the root level.

That's not an exception: both ECMA-404 and RFC 7159 state that JSON text has to conform to the grammar of a JSON value, which permits strings, numbers, and the three literals as well as objects and arrays.
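As a rough illustration of what "any JSON value at the root" means, the sketch below classifies a text by its first non-whitespace byte. `RootKind` and `root_kind` are hypothetical names invented for this example. The function only looks at one byte and does not validate the rest of the input.

```rust
/// Kinds of value that may start a JSON text (hypothetical type for illustration).
#[derive(Debug, PartialEq)]
enum RootKind {
    Object,
    Array,
    Str,
    Number,
    Literal, // true, false, or null
}

/// Classifies a JSON text by its first non-whitespace byte.
/// Returns `None` for empty input or a byte that cannot start a value.
/// This is a classification sketch only; it does not parse or validate.
fn root_kind(text: &str) -> Option<RootKind> {
    // Skip the four insignificant whitespace characters.
    let first = text
        .bytes()
        .find(|&b| !matches!(b, b' ' | b'\t' | b'\n' | b'\r'))?;
    match first {
        b'{' => Some(RootKind::Object),
        b'[' => Some(RootKind::Array),
        b'"' => Some(RootKind::Str),
        b'-' | b'0'..=b'9' => Some(RootKind::Number),
        b't' | b'f' | b'n' => Some(RootKind::Literal),
        _ => None,
    }
}
```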

@oli-obk

it's additional code we need to maintain and test + it slows down the regular path.

I've done this with a LUT and it didn't slow down the regular path at all. The logic is pretty trivial, there isn't much to maintain or test.
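For readers unfamiliar with the approach, a lookup-table scan might look roughly like the sketch below. This is not the code from either project: the class names, `CLASS`, and `scan_string` are all assumptions made for illustration. The idea is that each byte costs one table load on the common path, and the control-character check adds entries to the table rather than an extra branch.

```rust
// Byte classes for scanning the inside of a JSON string (hypothetical names).
const OK: u8 = 0; // may appear unescaped
const QU: u8 = 1; // closing quote
const BS: u8 = 2; // backslash, starts an escape
const CT: u8 = 3; // control character U+0000..=U+001F, must be escaped

/// Builds the 256-entry classification table at compile time.
const fn build_table() -> [u8; 256] {
    let mut table = [OK; 256];
    let mut i = 0;
    while i < 0x20 {
        table[i] = CT;
        i += 1;
    }
    table[b'"' as usize] = QU;
    table[b'\\' as usize] = BS;
    // 0x7F stays OK: the JSON grammar only requires escaping below 0x20.
    table
}

static CLASS: [u8; 256] = build_table();

/// Scans the bytes that follow an opening quote.
/// Returns `Ok(index)` of the closing quote, or `Err(index)` of the first
/// unescaped control byte (or of the end of input if the string never closes).
/// Escape sequences are skipped, not validated, to keep the sketch short.
fn scan_string(bytes: &[u8]) -> Result<usize, usize> {
    let mut i = 0;
    while i < bytes.len() {
        match CLASS[bytes[i] as usize] {
            OK => i += 1,
            QU => return Ok(i),
            BS => i += 2, // skip the backslash and the byte it escapes
            _ => return Err(i),
        }
    }
    Err(bytes.len()) // unterminated string
}
```

For example, `scan_string(b"a\tb\"")` reports the raw tab at index 1, while the escaped form `b"a\\tb\""` scans through to the closing quote.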

@dtolnay
Member Author

dtolnay commented Jul 6, 2016

This was fixed in #98/#100.

dtolnay closed this as completed Jul 6, 2016