JSONP: Always escape U+2028 and U+2029 #37

Merged
merged 1 commit into from May 6, 2011

Conversation

Projects
None yet
6 participants
Contributor

judofyr commented May 6, 2011

I just discovered a "bug" in JSON:

JSON is not a true subset of JavaScript because of two tiny whitespace unicode characters: U+2028 and U+2029. In ECMA-262 they are defined as "Line Terminator Characters" (see 7.3 Line Terminator) and are therefore equivalent of \n and \r. That means they are not valid in the middle of a string.

According to JSON U+2029 and U+2029 are just two regular Unicode characters and are therefore valid in the middle of a string. This is usually not a problem as long as you use a proper JSON parser, but in the case of JSONP the browser is the JSON parser.

This pull request will simply escape any U+2028/9 (to \u2028/9) and fixes this issue.

For a "real-world" example of this issue, you can try to call the JSONP-API of GitHub to the YARD-repository (whose description happens to include U+2028):

<script src="https://github.com/api/v2/json/repos/show/lsegal/yard?callback=alert"></script>
Member

mtodd commented May 6, 2011

Great find, Magnus. UTF is a tricky bitch! :)

So, we're basically still handing them the same values, just escaping them first? That sounds like a great solution, as opposed to incorrectly returning a 400 Bad Request.

Looks good, I'll merge it in and push it up.

mtodd merged commit 9f5bcdd into rack:master May 6, 2011

@mtodd mtodd commented on the diff May 6, 2011

lib/rack/contrib/jsonp.rb
@@ -75,7 +75,17 @@ module Rack
# since JSON is returned as a full string.
#
def pad(callback, response, body = "")
- response.each{ |s| body << s.to_s }
+ response.each do |s|
+ # U+2028 and U+2029 are allowed inside strings in JSON (as all literal
+ # Unicode characters) but JavaScript defines them as newline
+ # seperators. Because no literal newlines are allowed in a string, this
+ # causes a ParseError in the browser. We work around this issue by
+ # replacing them with the escaped version. This should be safe because
+ # according to the JSON spec, these characters are *only* valid inside
+ # a string and should therefore not be present any other places.
@mtodd

mtodd May 6, 2011

Member

Great job commenting this.

@judofyr

judofyr May 6, 2011

Contributor

I was just about to commit this (without comments) when I said to myself: "WTF is this? It doesn't make any sense at all? What's so special about U+2028/9?" Then I wrote the comment and all was good.

@mtodd

mtodd May 6, 2011

Member

Haha, and it clarified why escaping made the most sense (when my initial thought was to reject it instead). :)

@judofyr

judofyr May 15, 2011

Contributor

Technically, it's not really a 400 Bad Request either. The request from the client is perfectly fine, it's the response from the server that's wrong…

@mtodd

mtodd May 15, 2011

Member

Of course, but it's only obvious when those UTF-8 characters are explained :)

@jchris jchris commented on the diff May 15, 2011

lib/rack/contrib/jsonp.rb
@@ -75,7 +75,17 @@ module Rack
# since JSON is returned as a full string.
#
def pad(callback, response, body = "")
- response.each{ |s| body << s.to_s }
+ response.each do |s|
+ # U+2028 and U+2029 are allowed inside strings in JSON (as all literal
+ # Unicode characters) but JavaScript defines them as newline
+ # seperators. Because no literal newlines are allowed in a string, this
+ # causes a ParseError in the browser. We work around this issue by
+ # replacing them with the escaped version. This should be safe because
+ # according to the JSON spec, these characters are *only* valid inside
+ # a string and should therefore not be present any other places.
+ body << s.to_s.gsub("\u2028", '\u2028').gsub("\u2029", '\u2029')
@jchris

jchris May 15, 2011

It would be cool to summarize why this gsub technique works (are the single and double quotes meaningful?) for folks who aren't deep rubyists. Also, I'm curious what the original was doing with response.each{ |s| body << s.to_s }

@judofyr

judofyr May 15, 2011

Contributor

It would be cool to summarize why this gsub technique works (are the single and double quotes meaningful?) for folks who aren't deep rubyists.

In Ruby "\u2028" is a string that contains a single character: U+2028. '\u2028' on the other hand contains 6 characters:

# Unpack the Unicode codepoints:
>> "\u2028".unpack("U*")
=> [8232]
>> '\u2028'.unpack("U*")
=> [92, 117, 50, 48, 50, 56]

JSON specifies that \u followed by 4 hex-digits is the same as a single, literal Unicode character.

Also, I'm curious what the original was doing with response.each{ |s| body << s.to_s }

In Ruby the response isn't actually a string; it's something you can iterate. The reason for this is so you can create lazy/streaming responses (so you don't have to put everything into memory at the same time). This code on the other hand will put everything in memory at the same time, but it's quite simple to change it to a streaming version. So far, nobody has requested that, and it only matters if you serve huge JSONP responses.

Contributor

benmanns commented Sep 26, 2011

This needs to use octets to be compatible with 1.8. See #44.

rmunn referenced this pull request in gerritjvv/pjson Aug 31, 2014

Closed

asString doesn't escape U+2028 and U+2029 #1

luckydrq commented Nov 7, 2014

@judofyr

Transforming "\u2028" to '\u2028' seems weird because '\u2028' is only a string of 6 which makes no sense IMO.

mariusandra referenced this pull request in shakacode/react_on_rails Apr 27, 2016

Merged

\u2028 and \u2029 break rendering #375

json-not-subset-of-js-error
It seems that your had error encoding!
I just test it using Chrome, it's OK!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment