encoding with ruby 1.9 #26

Closed
tomoyuki28jp opened this issue Nov 22, 2010 · 3 comments
Comments

@tomoyuki28jp

Is this a bug?


irb(main):012:0> utf8_string = "日本語"
=> "日本語"
irb(main):012:0> utf8_string.encoding
=> #<Encoding:UTF-8>
irb(main):013:0> escaped = Loofah.scrub_fragment(utf8_string, :escape).to_s
=> "&#26085;&#26412;&#35486;"
irb(main):015:0> escaped.encoding
=> #<Encoding:US-ASCII>
irb(main):016:0> escaped.encode('UTF-8')
=> "&#26085;&#26412;&#35486;"
irb(main):019:0> escaped.force_encoding('UTF-8')
=> "&#26085;&#26412;&#35486;"

Software versions:

  • Ubuntu 10.04
  • rvm 1.0.20
  • ruby 1.9.2p0
  • nokogiri (1.4.4)
  • loofah (1.0.0.beta.1.20101025234603)
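
The irb session above shows `encode('UTF-8')` and `force_encoding('UTF-8')` both leaving the escaped string unchanged. A minimal sketch of why, using only core Ruby: the output is ASCII text that spells out numeric character references, so changing the encoding cannot turn it back into the original characters. The helper `decode_decimal_references` is hypothetical and for illustration only (it is not part of Loofah or Nokogiri, and it handles only decimal references).

```ruby
# Stand-in for the Loofah output reported above: plain ASCII text that
# spells out numeric character references.
escaped = "&#26085;&#26412;&#35486;".encode("US-ASCII")

# Transcoding only changes the encoding label here; every byte is already
# valid ASCII, so the text itself stays the same.
reencoded = escaped.encode("UTF-8")
puts reencoded.encoding
puts reencoded == "&#26085;&#26412;&#35486;"

# Hypothetical helper (an assumption for this sketch): turn decimal
# references such as "&#26085;" back into the characters they name.
def decode_decimal_references(text)
  text.encode("UTF-8").gsub(/&#(\d+);/) { [$1.to_i].pack("U") }
end

decoded = decode_decimal_references(escaped)
puts decoded
puts decoded.encoding
```

This only illustrates the symptom. The underlying question in this issue is why the fragment is serialized as US-ASCII in the first place.
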
@tomoyuki28jp
Author

It seems that UTF-8 characters are also escaped. Maybe this is intended behavior?

@flavorjones
Owner

Agree, this looks like a bug:

utf8_string = "日本語"

doc = Nokogiri::HTML::DocumentFragment.parse utf8_string
doc.to_s # => "日本語"
doc.to_s.encoding # => #<Encoding:UTF-8>

doc = Loofah::HTML::DocumentFragment.parse utf8_string
doc.to_s # => "&#26085;&#26412;&#35486;"
doc.to_s.encoding # => #<Encoding:US-ASCII>

@flavorjones
Owner

Fixed on master by 00e91cc and c3514cb.

thanks, @tenderlove!

This issue was closed.