Escape multibyte line terminators in JSON encoding #10057

Merged
merged 1 commit into from

8 participants

zackham Krzysztof Knapik Steve Sloan Michael Saffitz Mario Caropreso Piotr Solnica Trevor Turk Rafael Mendonça França
zackham

Currently, json/encoding respects the JSON spec (as it should), which disallows raw \n and \r inside strings, and escapes them as expected.

Unfortunately, ECMA-262 (JavaScript) disallows not only \n and \r in strings but all "Line Terminators", which include U+2028 and U+2029. See here: http://bclary.com/2004/11/07/#a-7.3

This pull request adds U+2028 and U+2029 to be escaped.

Why? 

It's very common to see something like this in a Rails template:

<script type="text/javascript"> 
var posts = <%= @posts.to_json %>;
</script>

If U+2028 or U+2029 appears in any attribute output by the to_json call, you will end up with an exception. In Chrome: Uncaught SyntaxError: Unexpected token ILLEGAL.

In other words, if one of your users pastes something into a textarea that happens to include one of these Unicode line terminators, and you run to_json on that model and put the result in a template, that page is probably broken now.
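The failure mode described above can be reproduced outside Rails. The following is a minimal sketch, assuming Ruby's standard `json` library with its default generator options; `LineTerminatorSketch` and its `escape` method are hypothetical names used only for illustration and are not part of Rails or of this pull request.

```ruby
require "json"

# Hypothetical helper (not a Rails API): rewrites raw U+2028 / U+2029
# characters in an already generated JSON document as \uXXXX escapes.
module LineTerminatorSketch
  # Single-quoted values hold the six literal characters \u2028 and \u2029.
  ESCAPES = { "\u2028" => '\u2028', "\u2029" => '\u2029' }.freeze

  def self.escape(json)
    json.gsub(/[\u2028\u2029]/, ESCAPES)
  end
end

# Text a user might paste into a textarea, containing U+2028 LINE SEPARATOR.
body = "first line\u2028second line"

# With default options the generator emits the character raw. That is valid
# JSON, but the raw character is what trips the JavaScript parser when the
# output is inlined in a <script> tag.
raw_json = JSON.generate({ "body" => body })

# After escaping, the document is still valid JSON and parses back to the
# same string, but no raw line terminator reaches the page.
safe_json = LineTerminatorSketch.escape(raw_json)
round_trip = JSON.parse(safe_json)["body"]
```

Because `\u2028` is an ordinary JSON string escape, the escaped output stays within the JSON spec while also being safe to inline as a JavaScript literal.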

Why not?

This is JSON encoding, and the JSON spec is specific about how to encode strings. U+2028 and U+2029 don't get special treatment.

That being said, this is non-obvious, counterintuitive, and can be tough to debug (https://www.google.com/?q=u2028).

What do you do in your apps to deal with this? Is there a convention I'm missing?

zackham zackham Escape multibyte line terminators in JSON encoding
9b8ee8e
Krzysztof Knapik

+1

Mario Caropreso

The following error is raised in the activesupport's test suite:

/vagrant/patch1/rails/activesupport/lib/active_support/dependencies.rb:228:in `require': /vagrant/patch1/rails/activesupport/lib/active_support/json/encoding.rb:126: too short escaped multibyte character: /\xe2\x80(\xa8|\xa9)|[\x00-\x1F"\\><&]/ (SyntaxError)
too short escaped multibyte character: /\xe2\x80(\xa8|\xa9)|[\x00-\x1F"\\]/
    from /vagrant/patch1/rails/activesupport/lib/active_support/dependencies.rb:228:in `block in require'
    from /vagrant/patch1/rails/activesupport/lib/active_support/dependencies.rb:213:in `load_dependency'
    from /vagrant/patch1/rails/activesupport/lib/active_support/dependencies.rb:228:in `require'
    from /vagrant/patch1/rails/activesupport/lib/active_support/json.rb:2:in `<top (required)>'
    from /vagrant/patch1/rails/activesupport/lib/active_support/dependencies.rb:228:in `require'
    from /vagrant/patch1/rails/activesupport/lib/active_support/dependencies.rb:228:in `block in require'
    from /vagrant/patch1/rails/activesupport/lib/active_support/dependencies.rb:213:in `load_dependency'
    from /vagrant/patch1/rails/activesupport/lib/active_support/dependencies.rb:228:in `require'
    from /vagrant/patch1/rails/activesupport/test/core_ext/duration_test.rb:4:in `<top (required)>'
    from /vagrant/patch1/rails/activesupport/lib/active_support/dependencies.rb:228:in `require'
    from /vagrant/patch1/rails/activesupport/lib/active_support/dependencies.rb:228:in `block in require'
    from /vagrant/patch1/rails/activesupport/lib/active_support/dependencies.rb:213:in `load_dependency'
    from /vagrant/patch1/rails/activesupport/lib/active_support/dependencies.rb:228:in `require'
    from /home/vagrant/.rvm/gems/ruby-2.0.0-p0@global/gems/rake-10.0.4/lib/rake/rake_test_loader.rb:10:in `block (2 levels) in <main>'
    from /home/vagrant/.rvm/gems/ruby-2.0.0-p0@global/gems/rake-10.0.4/lib/rake/rake_test_loader.rb:9:in `each'
    from /home/vagrant/.rvm/gems/ruby-2.0.0-p0@global/gems/rake-10.0.4/lib/rake/rake_test_loader.rb:9:in `block in <main>'
    from /home/vagrant/.rvm/gems/ruby-2.0.0-p0@global/gems/rake-10.0.4/lib/rake/rake_test_loader.rb:4:in `select'
    from /home/vagrant/.rvm/gems/ruby-2.0.0-p0@global/gems/rake-10.0.4/lib/rake/rake_test_loader.rb:4:in `<main>'
rake aborted!
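One reading of this error, offered as an interpretation rather than something confirmed in the thread: `\xe2\x80` opens a three-byte UTF-8 sequence, and the `(` that follows cuts it short, so the regex compiler sees an incomplete character. Ruby 2.0 treats source files as UTF-8 by default, which presumably explains why the failure shows up on this setup. The sketch below builds the pattern at runtime so that a failure can be rescued, then shows one alternative that names the code points with `\u` escapes. The constant name `LINE_TERMINATORS_OR_CONTROL` is hypothetical.

```ruby
# Pattern text taken from the trace above. It is compiled at runtime so that a
# compile failure surfaces as a rescuable RegexpError instead of aborting the load.
split_sequence = '\xe2\x80(\xa8|\xa9)|[\x00-\x1F"\\\\]'

# Holds the error message if this Ruby rejects the pattern, nil otherwise.
compile_error =
  begin
    Regexp.new(split_sequence)
    nil
  rescue RegexpError => e
    e.message
  end

# Alternative sketch: refer to the two code points directly, so no multibyte
# sequence is split by a group. The second class covers control characters,
# the double quote, and the backslash.
LINE_TERMINATORS_OR_CONTROL = /[\u2028\u2029]|[\x00-\x1F"\\]/

matches = "a\u2028b\u2029c\n".scan(LINE_TERMINATORS_OR_CONTROL)
```

Writing each full three-byte sequence as its own alternative would be another way to keep every multibyte character intact.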
Piotr Solnica

+9001

Rafael Mendonça França rafaelfranca referenced this pull request from a commit
Rafael Mendonça França rafaelfranca Merge branch 'fix-json-encoding'
This is the combination of #10057 and #10534.

Closes #10320
e4ec944
Rafael Mendonça França rafaelfranca merged commit 9b8ee8e into from
Commits on Apr 2, 2013
  1. zackham

    Escape multibyte line terminators in JSON encoding

    zackham authored
Showing with 4 additions and 2 deletions.
  1. +4 −2 activesupport/lib/active_support/json/encoding.rb
@@ -98,6 +98,8 @@ def check_for_circular_references(value)
"\010" => '\b',
"\f" => '\f',
"\n" => '\n',
+ "\xe2\x80\xa8" => '\u2028',
+ "\xe2\x80\xa9" => '\u2029',
"\r" => '\r',
"\t" => '\t',
'"' => '\"',
@@ -121,9 +123,9 @@ class << self
def escape_html_entities_in_json=(value)
self.escape_regex = \
if @escape_html_entities_in_json = value
- /[\x00-\x1F"\\><&]/
+ /\xe2\x80(\xa8|\xa9)|[\x00-\x1F"\\><&]/
else
- /[\x00-\x1F"\\]/
+ /\xe2\x80(\xa8|\xa9)|[\x00-\x1F"\\]/
end
end
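The approach in this diff, a lookup table of replacements plus a regex that selects which characters to replace, can be sketched as a standalone module. This is an illustration under stated assumptions, not the ActiveSupport implementation: `JsonEscapeSketch`, `PLAIN_REGEX`, `HTML_REGEX`, and the `escape_html:` keyword are hypothetical names, and the patterns use `\u` escapes rather than the byte escapes from the diff.

```ruby
# Hypothetical standalone sketch of table-driven JSON string escaping.
module JsonEscapeSketch
  # Single-quoted values are the literal escape text written to the output.
  ESCAPED_CHARS = {
    "\b" => '\b', "\f" => '\f', "\n" => '\n', "\r" => '\r', "\t" => '\t',
    '"' => '\"', "\\" => '\\\\',
    "\u2028" => '\u2028', "\u2029" => '\u2029',
    ">" => '\u003E', "<" => '\u003C', "&" => '\u0026'
  }.freeze

  # Characters that must always be escaped, plus the two line terminators.
  PLAIN_REGEX = /[\u2028\u2029\x00-\x1F"\\]/
  # Same set, extended with the HTML-sensitive characters.
  HTML_REGEX = /[\u2028\u2029\x00-\x1F"\\><&]/

  # Returns the string as a quoted JSON string literal.
  def self.escape(string, escape_html: false)
    regex = escape_html ? HTML_REGEX : PLAIN_REGEX
    escaped = string.gsub(regex) do |char|
      # Control characters without a short form fall back to \u00XX.
      ESCAPED_CHARS.fetch(char) { format('\u%04X', char.ord) }
    end
    %("#{escaped}")
  end
end

plain = JsonEscapeSketch.escape("a\u2028b")
html  = JsonEscapeSketch.escape("<b>&</b>", escape_html: true)
```

Keeping the two regexes separate mirrors the branch on `escape_html_entities_in_json` in the diff: the line terminators are escaped in both modes, while `>`, `<`, and `&` are escaped only when HTML escaping is requested.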