Skip to content

HTTPS clone URL

Subversion checkout URL

You can clone with
or
.
Download ZIP

Loading…

Escape multibyte line terminators in JSON encoding #10057

Merged
merged 1 commit into from

8 participants

@zackham

Currently, json/encoding respects the JSON spec (as it should) which disallows \n and \r inside strings, escaping them as expected.

Unfortunately, ECMA-262 (Javascript) disallows not only \n and \r in strings, but "Line Terminators" which includes U+2028 and U+2029. See here: http://bclary.com/2004/11/07/#a-7.3

This pull request adds U+2028 and U+2029 to be escaped.

Why? 

It's very common to see something like this in a Rails template:

<script type="text/javascript"> 
var posts = <%= @posts.to_json %>;
</script>

If U+2028 or U+2029 are part of any attributes output in the to_json call, you will end up with an exception. In Chrome: Uncaught SyntaxError: Unexpected token ILLEGAL.

In other words, if one of your users pastes something into a textarea that happens to include these fancy unicode line terminators, and you run to_json on that model and stick it in a template, that page is probably broken now.

Why not?

This is JSON encoding, and the JSON spec is specific about how to encode strings. U+2028 and U+2029 don't get special treatment.

That being said, this is non-obvious, counterintuitive, and can be tough to debug (https://www.google.com/?q=u2028).

What do you do in your apps to deal with this? Is there a convention I'm missing?

@zackham zackham Escape multibyte line terminators in JSON encoding
Currently, json/encoding respects the JSON spec (as it should) which 
disallows \n and \r inside strings, escaping them as expected.

Unfortunately, ECMA-262 (Javascript) disallows not only \n and \r in 
strings, but "Line Terminators" which includes U+2028 and U+2029. 
See here: http://bclary.com/2004/11/07/#a-7.3

This pull request adds U+2028 and U+2029 to be escaped.

# Why? 

It's very common to see something like this in a Rails template:

<script type="text/javascript"> 
var posts = <%= @posts.to_json %>;
</script>

If U+2028 or U+2029 are part of any attributes output in the to_json
call, you will end up with an exception.
In Chrome: Uncaught SyntaxError: Unexpected token ILLEGAL 

# Why not?

This is JSON encoding, and the JSON spec is specific about how to 
encode strings. U+2028 and U+2029 don't get special treatment.

Just trying to start a discussion... what do you do in your apps
to deal with this? Is there a convention I'm missing?
9b8ee8e
@knapo

+1

@cmaruz

The following error is raised in the activesupport's test suite:

/vagrant/patch1/rails/activesupport/lib/active_support/dependencies.rb:228:in `require': /vagrant/patch1/rails/activesupport/lib/active_support/json/encoding.rb:126: too short escaped multibyte character: /\xe2\x80(\xa8|\xa9)|[\x00-\x1F"\\><&]/ (SyntaxError)
too short escaped multibyte character: /\xe2\x80(\xa8|\xa9)|[\x00-\x1F"\\]/
    from /vagrant/patch1/rails/activesupport/lib/active_support/dependencies.rb:228:in `block in require'
    from /vagrant/patch1/rails/activesupport/lib/active_support/dependencies.rb:213:in `load_dependency'
    from /vagrant/patch1/rails/activesupport/lib/active_support/dependencies.rb:228:in `require'
    from /vagrant/patch1/rails/activesupport/lib/active_support/json.rb:2:in `<top (required)>'
    from /vagrant/patch1/rails/activesupport/lib/active_support/dependencies.rb:228:in `require'
    from /vagrant/patch1/rails/activesupport/lib/active_support/dependencies.rb:228:in `block in require'
    from /vagrant/patch1/rails/activesupport/lib/active_support/dependencies.rb:213:in `load_dependency'
    from /vagrant/patch1/rails/activesupport/lib/active_support/dependencies.rb:228:in `require'
    from /vagrant/patch1/rails/activesupport/test/core_ext/duration_test.rb:4:in `<top (required)>'
    from /vagrant/patch1/rails/activesupport/lib/active_support/dependencies.rb:228:in `require'
    from /vagrant/patch1/rails/activesupport/lib/active_support/dependencies.rb:228:in `block in require'
    from /vagrant/patch1/rails/activesupport/lib/active_support/dependencies.rb:213:in `load_dependency'
    from /vagrant/patch1/rails/activesupport/lib/active_support/dependencies.rb:228:in `require'
    from /home/vagrant/.rvm/gems/ruby-2.0.0-p0@global/gems/rake-10.0.4/lib/rake/rake_test_loader.rb:10:in `block (2 levels) in <main>'
    from /home/vagrant/.rvm/gems/ruby-2.0.0-p0@global/gems/rake-10.0.4/lib/rake/rake_test_loader.rb:9:in `each'
    from /home/vagrant/.rvm/gems/ruby-2.0.0-p0@global/gems/rake-10.0.4/lib/rake/rake_test_loader.rb:9:in `block in <main>'
    from /home/vagrant/.rvm/gems/ruby-2.0.0-p0@global/gems/rake-10.0.4/lib/rake/rake_test_loader.rb:4:in `select'
    from /home/vagrant/.rvm/gems/ruby-2.0.0-p0@global/gems/rake-10.0.4/lib/rake/rake_test_loader.rb:4:in `<main>'
rake aborted!
@solnic

+9001

@rafaelfranca rafaelfranca referenced this pull request from a commit
@rafaelfranca rafaelfranca Merge branch 'fix-json-encoding'
This is the compination of #10057 and 10534.

Closes #10320
e4ec944
@rafaelfranca rafaelfranca merged commit 9b8ee8e into rails:master
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Commits on Apr 2, 2013
  1. @zackham

    Escape multibyte line terminators in JSON encoding

    zackham authored
    Currently, json/encoding respects the JSON spec (as it should) which 
    disallows \n and \r inside strings, escaping them as expected.
    
    Unfortunately, ECMA-262 (Javascript) disallows not only \n and \r in 
    strings, but "Line Terminators" which includes U+2028 and U+2029. 
    See here: http://bclary.com/2004/11/07/#a-7.3
    
    This pull request adds U+2028 and U+2029 to be escaped.
    
    # Why? 
    
    It's very common to see something like this in a Rails template:
    
    <script type="text/javascript"> 
    var posts = <%= @posts.to_json %>;
    </script>
    
    If U+2028 or U+2029 are part of any attributes output in the to_json
    call, you will end up with an exception.
    In Chrome: Uncaught SyntaxError: Unexpected token ILLEGAL 
    
    # Why not?
    
    This is JSON encoding, and the JSON spec is specific about how to 
    encode strings. U+2028 and U+2029 don't get special treatment.
    
    Just trying to start a discussion... what do you do in your apps
    to deal with this? Is there a convention I'm missing?
This page is out of date. Refresh to see the latest.
Showing with 4 additions and 2 deletions.
  1. +4 −2 activesupport/lib/active_support/json/encoding.rb
View
6 activesupport/lib/active_support/json/encoding.rb
@@ -98,6 +98,8 @@ def check_for_circular_references(value)
"\010" => '\b',
"\f" => '\f',
"\n" => '\n',
+ "\xe2\x80\xa8" => '\u2028',
+ "\xe2\x80\xa9" => '\u2029',
"\r" => '\r',
"\t" => '\t',
'"' => '\"',
@@ -121,9 +123,9 @@ class << self
def escape_html_entities_in_json=(value)
self.escape_regex = \
if @escape_html_entities_in_json = value
- /[\x00-\x1F"\\><&]/
+ /\xe2\x80(\xa8|\xa9)|[\x00-\x1F"\\><&]/
else
- /[\x00-\x1F"\\]/
+ /\xe2\x80(\xa8|\xa9)|[\x00-\x1F"\\]/
end
end
Something went wrong with that request. Please try again.