Remove unicode character encoding from ActiveSupport::JSON.encode
The encoding scheme (e.g. ☠ -> "\u2620") was broken for characters
not in the Basic Multilingual Plane. It is possible to escape them
for JSON using a twelve-character sequence representing the UTF-16
surrogate pair (e.g. '𠜎' -> "\ud841\udf0e"), but the escaping code
did not handle this properly. Since raw UTF-8 is allowed in JSON, it
was decided to simply pass the raw bytes through rather than attempt
to escape them.
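
For reference, the twelve-character escape mentioned above can be derived from a code point roughly as follows. This is a minimal sketch, not code from this commit: `json_unicode_escape` is a hypothetical helper name, and it only illustrates the standard UTF-16 surrogate-pair arithmetic.

```ruby
# Hypothetical helper (not part of ActiveSupport): JSON-escape one code point,
# using a UTF-16 surrogate pair for anything above U+FFFF.
def json_unicode_escape(codepoint)
  # Code points in the Basic Multilingual Plane fit in a single \uXXXX escape.
  return format('\u%04x', codepoint) if codepoint <= 0xFFFF

  offset = codepoint - 0x10000
  high = 0xD800 + (offset >> 10)    # top 10 bits -> high surrogate
  low  = 0xDC00 + (offset & 0x3FF)  # bottom 10 bits -> low surrogate
  format('\u%04x\u%04x', high, low)
end

puts json_unicode_escape(0x2620)   # U+2620, inside the BMP; prints \u2620
puts json_unicode_escape(0x2070E)  # U+2070E, outside the BMP; prints \ud841\udf0e
```

The commit avoids this arithmetic entirely by emitting the UTF-8 bytes unescaped.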
Brett Carter authored and steveklabnik committed Dec 14, 2012
1 parent f447240 commit 8f8397e
Showing 3 changed files with 23 additions and 10 deletions.
6 changes: 6 additions & 0 deletions activesupport/CHANGELOG.md
@@ -1,4 +1,10 @@
 ## Rails 4.0.0 (unreleased) ##
+* Remove surrogate unicode character encoding from ActiveSupport::JSON.encode
+The encoding scheme was broken for unicode characters outside the basic multilingual plane;
+since json is assumed to be UTF-8, and we already force the encoding to UTF-8 simply pass through
+the un-encoded characters.
+
+*Brett Carter*

 * Deprecate `Time.time_with_date_fallback`, `Time.utc_time` and `Time.local_time`.
 These methods were added to handle the limited range of Ruby's native Time
8 changes: 1 addition & 7 deletions activesupport/lib/active_support/json/encoding.rb
@@ -129,13 +129,7 @@ def escape_html_entities_in_json=(value)

   def escape(string)
     string = string.encode(::Encoding::UTF_8, :undef => :replace).force_encoding(::Encoding::BINARY)
-    json = string.
-      gsub(escape_regex) { |s| ESCAPED_CHARS[s] }.
-      gsub(/([\xC0-\xDF][\x80-\xBF]|
-      [\xE0-\xEF][\x80-\xBF]{2}|
-      [\xF0-\xF7][\x80-\xBF]{3})+/nx) { |s|
-      s.unpack("U*").pack("n*").unpack("H*")[0].gsub(/.{4}/n, '\\\\u\&')
-    }
+    json = string.gsub(escape_regex) { |s| ESCAPED_CHARS[s] }
     json = %("#{json}")
     json.force_encoding(::Encoding::UTF_8)
     json
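
To illustrate why the removed block mishandled characters outside the Basic Multilingual Plane, the sketch below reproduces only its `unpack`/`pack` chain. The method name `legacy_unicode_escape` is hypothetical; the point is that `pack("n*")` stores each code point in 16 bits, so the high bits of a larger code point are silently dropped.

```ruby
# Mirrors the core of the removed escaping block, for illustration only.
def legacy_unicode_escape(string)
  string.unpack("U*")      # UTF-8 string -> array of code points
        .pack("n*")        # each code point squeezed into 16 bits (high bits lost)
        .unpack("H*")[0]   # hex digits of the packed bytes
        .gsub(/.{4}/, '\\\\u\&')
end

puts legacy_unicode_escape("\u{2620}")   # U+2620, inside the BMP; prints \u2620
puts legacy_unicode_escape("\u{2070E}")  # U+2070E, outside the BMP; prints \u070e
```

The second result names U+070E, an unrelated character, rather than the surrogate pair that JSON requires for U+2070E.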
19 changes: 16 additions & 3 deletions activesupport/test/json/encoding_test.rb
@@ -112,21 +112,34 @@ def test_hash_encoding

   def test_utf8_string_encoded_properly
     result = ActiveSupport::JSON.encode('€2.99')
-    assert_equal '"\\u20ac2.99"', result
+    assert_equal '"€2.99"', result
     assert_equal(Encoding::UTF_8, result.encoding)

     result = ActiveSupport::JSON.encode('✎☺')
-    assert_equal '"\\u270e\\u263a"', result
+    assert_equal '"✎☺"', result
     assert_equal(Encoding::UTF_8, result.encoding)
   end

   def test_non_utf8_string_transcodes
     s = '二'.encode('Shift_JIS')
     result = ActiveSupport::JSON.encode(s)
-    assert_equal '"\\u4e8c"', result
+    assert_equal '"二"', result
     assert_equal Encoding::UTF_8, result.encoding
   end

+  def test_wide_utf8_chars
+    w = '𠜎'
+    result = ActiveSupport::JSON.encode(w)
+    assert_equal '"𠜎"', result
+  end
+
+  def test_wide_utf8_roundtrip
+    hash = { string: "𐒑" }
+    json = ActiveSupport::JSON.encode(hash)
+    decoded_hash = ActiveSupport::JSON.decode(json)
+    assert_equal "𐒑", decoded_hash['string']
+  end
+
   def test_exception_raised_when_encoding_circular_reference_in_array
     a = [1]
     a << a
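
The round-trip test relies on raw UTF-8 and an escaped surrogate pair being equivalent ways to write the same JSON string. A small sketch of that equivalence follows, using Ruby's standard `json` library rather than `ActiveSupport::JSON` (an assumption made so the example runs without Rails); `decode_first` and the two constants are hypothetical names.

```ruby
require 'json'

# Two JSON documents for the same one-element array containing U+2070E.
RAW_JSON     = %(["\u{2070E}"])    # the character itself, as raw UTF-8
ESCAPED_JSON = '["\ud841\udf0e"]'  # the same character as an escaped surrogate pair

# Parse a JSON array and return its first element.
def decode_first(json)
  JSON.parse(json).first
end

# Both spellings decode to the same Ruby string.
puts decode_first(RAW_JSON) == decode_first(ESCAPED_JSON)

# Re-encoding with the stdlib defaults emits the raw UTF-8 form.
puts JSON.generate([decode_first(ESCAPED_JSON)]) == RAW_JSON
```

Both lines should print `true`, which is why passing the bytes through unescaped loses no information for a conforming decoder.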

1 comment on commit 8f8397e

@steveklabnik

This closed #3727
