
Commit 8f8397e

Brett Carter authored and steveklabnik committed
Remove unicode character encoding from ActiveSupport::JSON.encode
The encoding scheme (e.g. ☠ -> "\u2620") was broken for characters outside the Basic Multilingual Plane. JSON can escape such characters as a twelve-character sequence representing the UTF-16 surrogate pair (e.g. '𠜎' -> "\ud841\udf0e"), but the escaping code did not handle this correctly. Since raw UTF-8 is allowed in JSON, it was decided to pass the raw bytes through rather than attempt to escape them.
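To make the failure concrete, here is a minimal sketch of the surrogate-pair arithmetic next to the pack/unpack chain from the removed code. The helper names `json_unicode_escape` and `legacy_escape` are hypothetical and exist only for this illustration; they are not part of ActiveSupport.

```ruby
# Hypothetical helper: the JSON \u escape for one code point, using a
# UTF-16 surrogate pair for code points above U+FFFF.
def json_unicode_escape(codepoint)
  if codepoint <= 0xFFFF
    format('\u%04x', codepoint)
  else
    offset = codepoint - 0x10000
    high = 0xD800 + (offset >> 10)    # upper 10 bits of the offset
    low  = 0xDC00 + (offset & 0x3FF)  # lower 10 bits of the offset
    format('\u%04x\u%04x', high, low)
  end
end

# Hypothetical wrapper around the chain the removed code used.
# pack("n*") writes 16-bit values, so only the low 16 bits of each
# code point survive.
def legacy_escape(string)
  string.unpack("U*").pack("n*").unpack("H*")[0].gsub(/.{4}/, '\\\\u\&')
end

puts json_unicode_escape(0x2620)   # prints \u2620
puts json_unicode_escape(0x2070E)  # prints \ud841\udf0e
puts legacy_escape("☠")            # prints \u2620
puts legacy_escape("𠜎")           # prints \u070e (a different character)
```

For characters inside the Basic Multilingual Plane the two agree, which is presumably why the problem went unnoticed; above U+FFFF the old chain silently truncates.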
1 parent f447240 commit 8f8397e

3 files changed

Lines changed: 23 additions & 10 deletions


activesupport/CHANGELOG.md

Lines changed: 6 additions & 0 deletions
@@ -1,4 +1,10 @@
 ## Rails 4.0.0 (unreleased) ##
+* Remove surrogate unicode character encoding from ActiveSupport::JSON.encode
+  The encoding scheme was broken for unicode characters outside the basic multilingual plane;
+  since json is assumed to be UTF-8, and we already force the encoding to UTF-8 simply pass through
+  the un-encoded characters.
+
+  *Brett Carter*
 
 * Deprecate `Time.time_with_date_fallback`, `Time.utc_time` and `Time.local_time`.
   These methods were added to handle the limited range of Ruby's native Time

activesupport/lib/active_support/json/encoding.rb

Lines changed: 1 addition & 7 deletions
@@ -129,13 +129,7 @@ def escape_html_entities_in_json=(value)
 
   def escape(string)
     string = string.encode(::Encoding::UTF_8, :undef => :replace).force_encoding(::Encoding::BINARY)
-    json = string.
-      gsub(escape_regex) { |s| ESCAPED_CHARS[s] }.
-      gsub(/([\xC0-\xDF][\x80-\xBF]|
-             [\xE0-\xEF][\x80-\xBF]{2}|
-             [\xF0-\xF7][\x80-\xBF]{3})+/nx) { |s|
-      s.unpack("U*").pack("n*").unpack("H*")[0].gsub(/.{4}/n, '\\\\u\&')
-    }
+    json = string.gsub(escape_regex) { |s| ESCAPED_CHARS[s] }
     json = %("#{json}")
     json.force_encoding(::Encoding::UTF_8)
     json
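The resulting behavior can be approximated outside Rails with a simplified sketch. `SKETCH_ESCAPED_CHARS`, `SKETCH_ESCAPE_REGEX` and `sketch_escape` are assumptions made for illustration: the table is only a small subset of what ActiveSupport's `ESCAPED_CHARS` covers, and the binary `force_encoding` round trip from the real method is omitted.

```ruby
# Illustrative subset only; not the real ActiveSupport escape table.
SKETCH_ESCAPED_CHARS = {
  "\x08" => '\b', "\f" => '\f', "\n" => '\n', "\r" => '\r', "\t" => '\t',
  '"' => '\"', '\\' => '\\\\'
}.freeze
SKETCH_ESCAPE_REGEX = /[\x08\f\n\r\t"\\]/

# Transcode to UTF-8, escape only the characters JSON requires,
# and pass every other character through untouched.
def sketch_escape(string)
  utf8 = string.encode(Encoding::UTF_8, undef: :replace)
  json = utf8.gsub(SKETCH_ESCAPE_REGEX) { |s| SKETCH_ESCAPED_CHARS[s] }
  %("#{json}")
end

puts sketch_escape('€2.99')                  # prints "€2.99"
puts sketch_escape('𠜎')                     # prints "𠜎"
puts sketch_escape('二'.encode('Shift_JIS')) # prints "二"
```

Because nothing outside the escape table is rewritten, characters above U+FFFF need no special handling at all.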

activesupport/test/json/encoding_test.rb

Lines changed: 16 additions & 3 deletions
@@ -112,21 +112,34 @@ def test_hash_encoding
 
   def test_utf8_string_encoded_properly
     result = ActiveSupport::JSON.encode('€2.99')
-    assert_equal '"\\u20ac2.99"', result
+    assert_equal '"€2.99"', result
     assert_equal(Encoding::UTF_8, result.encoding)
 
     result = ActiveSupport::JSON.encode('✎☺')
-    assert_equal '"\\u270e\\u263a"', result
+    assert_equal '"✎☺"', result
     assert_equal(Encoding::UTF_8, result.encoding)
   end
 
   def test_non_utf8_string_transcodes
     s = '二'.encode('Shift_JIS')
     result = ActiveSupport::JSON.encode(s)
-    assert_equal '"\\u4e8c"', result
+    assert_equal '"二"', result
     assert_equal Encoding::UTF_8, result.encoding
   end
 
+  def test_wide_utf8_chars
+    w = '𠜎'
+    result = ActiveSupport::JSON.encode(w)
+    assert_equal '"𠜎"', result
+  end
+
+  def test_wide_utf8_roundtrip
+    hash = { string: "𐒑" }
+    json = ActiveSupport::JSON.encode(hash)
+    decoded_hash = ActiveSupport::JSON.decode(json)
+    assert_equal "𐒑", decoded_hash['string']
+  end
+
   def test_exception_raised_when_encoding_circular_reference_in_array
     a = [1]
     a << a
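The round-trip test depends on a decoder treating raw UTF-8 and the surrogate-pair escape as the same character. A small check with Ruby's standard `json` library (used here as a stand-in for `ActiveSupport::JSON`, which is an assumption of this sketch) illustrates the equivalence:

```ruby
require 'json'

raw     = '{"string":"𐒑"}'             # raw UTF-8, as emitted after this commit
escaped = '{"string":"\ud801\udc91"}'  # the same character as a surrogate-pair escape

from_raw     = JSON.parse(raw)['string']
from_escaped = JSON.parse(escaped)['string']

puts from_raw == from_escaped                    # prints true
puts JSON.generate({ 'string' => '𐒑' }) == raw  # prints true
```

Both documents decode to the same one-character string, so consumers that parse the output should see no difference from the change; only the bytes on the wire differ.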
