New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JSONP: Always escape U+2028 and U+2029 #37

Merged
merged 1 commit into from May 6, 2011

Conversation

Projects
None yet
6 participants
@judofyr
Contributor

judofyr commented May 6, 2011

I just discovered a "bug" in JSON:

JSON is not a true subset of JavaScript because of two tiny whitespace unicode characters: U+2028 and U+2029. In ECMA-262 they are defined as "Line Terminator Characters" (see 7.3 Line Terminator) and are therefore equivalent of \n and \r. That means they are not valid in the middle of a string.

According to JSON U+2029 and U+2029 are just two regular Unicode characters and are therefore valid in the middle of a string. This is usually not a problem as long as you use a proper JSON parser, but in the case of JSONP the browser is the JSON parser.

This pull request will simply escape any U+2028/9 (to \u2028/9) and fixes this issue.

For a "real-world" example of this issue, you can try to call the JSONP-API of GitHub to the YARD-repository (whose description happens to include U+2028):

<script src="https://github.com/api/v2/json/repos/show/lsegal/yard?callback=alert"></script>
@mtodd

This comment has been minimized.

Show comment
Hide comment
@mtodd

mtodd May 6, 2011

Member

Great find, Magnus. UTF is a tricky bitch! :)

So, we're basically still handing them the same values, just escaping them first? That sounds like a great solution, as opposed to incorrectly returning a 400 Bad Request.

Looks good, I'll merge it in and push it up.

Member

mtodd commented May 6, 2011

Great find, Magnus. UTF is a tricky bitch! :)

So, we're basically still handing them the same values, just escaping them first? That sounds like a great solution, as opposed to incorrectly returning a 400 Bad Request.

Looks good, I'll merge it in and push it up.

mtodd added a commit that referenced this pull request May 6, 2011

Merge pull request #37 from judofyr/jsonp_u2028
JSONP: Always escape U+2028 and U+2029

@mtodd mtodd merged commit 9f5bcdd into rack:master May 6, 2011

# causes a ParseError in the browser. We work around this issue by
# replacing them with the escaped version. This should be safe because
# according to the JSON spec, these characters are *only* valid inside
# a string and should therefore not be present any other places.

This comment has been minimized.

@mtodd

mtodd May 6, 2011

Member

Great job commenting this.

@mtodd

mtodd May 6, 2011

Member

Great job commenting this.

This comment has been minimized.

@judofyr

judofyr May 6, 2011

Contributor

I was just about to commit this (without comments) when I said to myself: "WTF is this? It doesn't make any sense at all? What's so special about U+2028/9?" Then I wrote the comment and all was good.

@judofyr

judofyr May 6, 2011

Contributor

I was just about to commit this (without comments) when I said to myself: "WTF is this? It doesn't make any sense at all? What's so special about U+2028/9?" Then I wrote the comment and all was good.

This comment has been minimized.

@mtodd

mtodd May 6, 2011

Member

Haha, and it clarified why escaping made the most sense (when my initial thought was to reject it instead). :)

@mtodd

mtodd May 6, 2011

Member

Haha, and it clarified why escaping made the most sense (when my initial thought was to reject it instead). :)

This comment has been minimized.

@judofyr

judofyr May 15, 2011

Contributor

Technically, it's not really a 400 Bad Request either. The request from the client is perfectly fine, it's the response from the server that's wrong…

@judofyr

judofyr May 15, 2011

Contributor

Technically, it's not really a 400 Bad Request either. The request from the client is perfectly fine, it's the response from the server that's wrong…

This comment has been minimized.

@mtodd

mtodd May 15, 2011

Member

Of course, but it's only obvious when those UTF-8 characters are explained :)

@mtodd

mtodd May 15, 2011

Member

Of course, but it's only obvious when those UTF-8 characters are explained :)

# replacing them with the escaped version. This should be safe because
# according to the JSON spec, these characters are *only* valid inside
# a string and should therefore not be present any other places.
body << s.to_s.gsub("\u2028", '\u2028').gsub("\u2029", '\u2029')

This comment has been minimized.

@jchris

jchris May 15, 2011

It would be cool to summarize why this gsub technique works (are the single and double quotes meaningful?) for folks who aren't deep rubyists. Also, I'm curious what the original was doing with response.each{ |s| body << s.to_s }

@jchris

jchris May 15, 2011

It would be cool to summarize why this gsub technique works (are the single and double quotes meaningful?) for folks who aren't deep rubyists. Also, I'm curious what the original was doing with response.each{ |s| body << s.to_s }

This comment has been minimized.

@judofyr

judofyr May 15, 2011

Contributor

It would be cool to summarize why this gsub technique works (are the single and double quotes meaningful?) for folks who aren't deep rubyists.

In Ruby "\u2028" is a string that contains a single character: U+2028. '\u2028' on the other hand contains 6 characters:

# Unpack the Unicode codepoints:
>> "\u2028".unpack("U*")
=> [8232]
>> '\u2028'.unpack("U*")
=> [92, 117, 50, 48, 50, 56]

JSON specifies that \u followed by 4 hex-digits is the same as a single, literal Unicode character.

Also, I'm curious what the original was doing with response.each{ |s| body << s.to_s }

In Ruby the response isn't actually a string; it's something you can iterate. The reason for this is so you can create lazy/streaming responses (so you don't have to put everything into memory at the same time). This code on the other hand will put everything in memory at the same time, but it's quite simple to change it to a streaming version. So far, nobody has requested that, and it only matters if you serve huge JSONP responses.

@judofyr

judofyr May 15, 2011

Contributor

It would be cool to summarize why this gsub technique works (are the single and double quotes meaningful?) for folks who aren't deep rubyists.

In Ruby "\u2028" is a string that contains a single character: U+2028. '\u2028' on the other hand contains 6 characters:

# Unpack the Unicode codepoints:
>> "\u2028".unpack("U*")
=> [8232]
>> '\u2028'.unpack("U*")
=> [92, 117, 50, 48, 50, 56]

JSON specifies that \u followed by 4 hex-digits is the same as a single, literal Unicode character.

Also, I'm curious what the original was doing with response.each{ |s| body << s.to_s }

In Ruby the response isn't actually a string; it's something you can iterate. The reason for this is so you can create lazy/streaming responses (so you don't have to put everything into memory at the same time). This code on the other hand will put everything in memory at the same time, but it's quite simple to change it to a streaming version. So far, nobody has requested that, and it only matters if you serve huge JSONP responses.

@benmanns

This comment has been minimized.

Show comment
Hide comment
@benmanns

benmanns Sep 26, 2011

Contributor

This needs to use octets to be compatible with 1.8. See #44.

Contributor

benmanns commented Sep 26, 2011

This needs to use octets to be compatible with 1.8. See #44.

@luckydrq

This comment has been minimized.

Show comment
Hide comment
@luckydrq

luckydrq Nov 7, 2014

@judofyr

Transforming "\u2028" to '\u2028' seems weird because '\u2028' is only a string of 6 which makes no sense IMO.

luckydrq commented Nov 7, 2014

@judofyr

Transforming "\u2028" to '\u2028' seems weird because '\u2028' is only a string of 6 which makes no sense IMO.

@xgqfrms-GitHub

This comment has been minimized.

Show comment
Hide comment
@xgqfrms-GitHub

xgqfrms-GitHub Sep 29, 2016

json-not-subset-of-js-error
It seems that your had error encoding!
I just test it using Chrome, it's OK!

xgqfrms-GitHub commented Sep 29, 2016

json-not-subset-of-js-error
It seems that your had error encoding!
I just test it using Chrome, it's OK!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment