encoding with ruby 1.9 #26

tomoyuki28jp · 2010-11-22T09:08:37Z

Is this a bug?


irb(main):012:0> utf8_string = "日本語"
=> "日本語"
irb(main):012:0> utf8_string.encoding
=> #<Encoding:UTF-8>
irb(main):013:0> escaped = Loofah.scrub_fragment(utf8_string, :escape).to_s
=> "&#26085;&#26412;&#35486;"
irb(main):015:0> escaped.encoding
=> #<Encoding:US-ASCII>
irb(main):016:0> escaped.encode('UTF-8')
=> "&#26085;&#26412;&#35486;"
irb(main):019:0> escaped.force_encoding('UTF-8')
=> "&#26085;&#26412;&#35486;"

Software versions:

Ubuntu 10.04
rvm 1.0.20
ruby 1.9.2p0
nokogiri (1.4.4)
loofah (1.0.0.beta.1.20101025234603)
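A minimal Ruby sketch of why the last two calls in the irb session return the same text. The escaped string holds only ASCII bytes, because the characters survive only as `&#NNNNN;` references. Relabeling it with `force_encoding` or transcoding it with `encode` therefore leaves the content unchanged, and recovering the characters requires decoding the references. `decode_decimal_ncrs` is a hypothetical helper written for illustration. It is not a Loofah or Nokogiri API, and it handles only decimal references.

```ruby
# The escaped output from the session: only ASCII bytes, labeled US-ASCII.
escaped = "&#26085;&#26412;&#35486;".encode(Encoding::US_ASCII)

# Relabeling and transcoding both leave the ASCII-only text unchanged.
relabeled  = escaped.dup.force_encoding(Encoding::UTF_8)
transcoded = escaped.encode(Encoding::UTF_8)
puts escaped.ascii_only?
puts relabeled
puts transcoded

# Hypothetical helper (not part of Loofah or Nokogiri): replaces each
# decimal numeric character reference with the UTF-8 character for that
# code point. Hex (&#x...;) and named entities are left untouched, and an
# invalid code point such as a surrogate makes Integer#chr raise RangeError.
def decode_decimal_ncrs(str)
  str.dup.force_encoding(Encoding::UTF_8).gsub(/&#(\d+);/) do
    Regexp.last_match(1).to_i.chr(Encoding::UTF_8)
  end
end

decoded = decode_decimal_ncrs(escaped)
puts decoded
puts decoded.encoding
```

Decoding on the caller's side is only a workaround sketch. The underlying problem reported here is that the library's output should already be the original UTF-8 text.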


tomoyuki28jp · 2010-11-22T10:33:05Z

It seems like non-ASCII UTF-8 characters are also escaped, as numeric character references. Maybe that's intended behavior?
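The escaped string holds the same three characters written as decimal numeric character references, one `&#<code point>;` per character, so the text itself is not lost. A small Ruby sketch of that mapping follows. `to_decimal_ncrs` is a hypothetical illustration, not the escaping code that Loofah or Nokogiri actually runs (the thread does not say which layer produces the references). Unlike the output in the report, this sketch's result is labeled UTF-8.

```ruby
# Hypothetical illustration: write every non-ASCII character as a decimal
# numeric character reference (&#<code point>;) and pass ASCII through.
# It does not escape markup characters such as < or &.
def to_decimal_ncrs(str)
  str.each_char.map { |ch| ch.ascii_only? ? ch : "&##{ch.ord};" }.join
end

original = "日本語"
puts to_decimal_ncrs(original)
p original.codepoints
```

The code points 26085, 26412 and 35486 are U+65E5, U+672C and U+8A9E, the same numbers that appear in the escaped output.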

flavorjones · 2010-11-22T13:38:04Z

Agreed, this looks like a bug:

utf8_string = "日本語"

doc = Nokogiri::HTML::DocumentFragment.parse utf8_string
doc.to_s # => "日本語"
doc.to_s.encoding # => #<Encoding:UTF-8>

doc = Loofah::HTML::DocumentFragment.parse utf8_string
doc.to_s # => "&#26085;&#26412;&#35486;"
doc.to_s.encoding # => #<Encoding:US-ASCII>

flavorjones · 2010-11-22T17:28:55Z

Fixed on master by 00e91cc and c3514cb.

thanks, @tenderlove!

This issue was closed.
