
Encoding issues with 1.9.2 on Solaris #46

Closed
jdelStrother opened this issue Jun 29, 2011 · 2 comments

@jdelStrother

Hi,
I'm having some encoding problems with RDiscount 1.6.8. I'm not convinced that they're RDiscount's fault, but I'm not quite sure how to narrow it down any further.

Some strings in our database are producing invalid encodings when passed through RDiscount. Certain characters seem to cause trouble, for example the Tamil character ழ (U+0BB4).

  s = "\u0BB4\n"
  '\x%X\x%X\x%X' % s.each_byte.to_a #=> "\xE0\xAE\xB4"

  RDiscount.new(s).to_html #=> "<p>\xE0\xAE</p>\n"
  RDiscount.new(s).to_html.valid_encoding? # => false

So in the original string, that codepoint is represented with the bytes 0xE0,0xAE,0xB4, but after rdiscounting we end up with just 0xE0,0xAE.
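One way to make the byte-level difference easier to see is a small hex-dump helper. The following is only a sketch: `hex_bytes` is a hypothetical helper name, not part of RDiscount, and the `broken` string is built by hand from the output reported above rather than produced by calling RDiscount.

```ruby
# Hypothetical helper (not part of RDiscount): show each byte as two hex digits.
def hex_bytes(str)
  str.each_byte.map { |b| format('%02X', b) }.join(' ')
end

# The input from the report: U+0BB4 followed by a newline.
s = "\u0BB4\n"
puts hex_bytes(s)        # E0 AE B4 0A
puts s.valid_encoding?   # true

# Hand-built copy of the reported output, with the trailing 0xB4 byte missing.
# Assumption: this mirrors what RDiscount returned on the Solaris box.
broken = "<p>\xE0\xAE</p>\n".dup.force_encoding(Encoding::UTF_8)
puts hex_bytes(broken)       # 3C 70 3E E0 AE 3C 2F 70 3E 0A
puts broken.valid_encoding?  # false
```

Running the same helper on the real `RDiscount.new(s).to_html` result on both machines would show exactly which bytes differ between Solaris and OS X.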

This is on a Solaris box running ruby 1.9.2p180. An OS X box running the same ruby & rdiscount version works fine. What else can I look at to pinpoint the problem some more?

@davidfstr
Owner

Verified that this issue does not repro on Ruby 1.9.3p362 on OS X 10.7.5. I will need to spin up a Solaris VM to continue investigating.

@davidfstr
Owner

Looks like my latest Unicode fix on master takes care of this.
