OSX HTML::Pipeline::MarkdownFilter Fails on Right Double Quotation Mark around email address #173

Closed
ericgoodwin opened this issue Feb 5, 2015 · 8 comments

Comments

@ericgoodwin

When using HTML::Pipeline::MarkdownFilter on a string that contains a "Right Double Quotation Mark" (U+201D) next to an email address, the output HTML includes an invalid byte sequence when the filter tries to autolink the address as a mailto: link.

I'm only seeing this on OS X. I'm running 10.10.2.

To reproduce:

require "html/pipeline"
renderer = HTML::Pipeline.new([HTML::Pipeline::MarkdownFilter]).freeze
renderer.to_html("This is  an “test@example.com” example").split

This is really a bug in github-markdown, but I'm submitting it here because github-markdown doesn't seem to have a GitHub repository. I've also tried RedCloth, and it fails as well.

ruby 2.1.5p273 (2014-11-13 revision 48405) [x86_64-darwin14.0]
# Nokogiri (1.6.5)
    ---
    warnings: []
    nokogiri: 1.6.5
    ruby:
      version: 2.1.5
      platform: x86_64-darwin14.0
      description: ruby 2.1.5p273 (2014-11-13 revision 48405) [x86_64-darwin14.0]
      engine: ruby
    libxml:
      binding: extension
      source: packaged
      libxml2_path: "/Users/ericgoodwin/.rbenv/versions/2.1.5/lib/ruby/gems/2.1.0/gems/nokogiri-1.6.5/ports/x86_64-apple-darwin14.1.0/libxml2/2.9.2"
      libxslt_path: "/Users/ericgoodwin/.rbenv/versions/2.1.5/lib/ruby/gems/2.1.0/gems/nokogiri-1.6.5/ports/x86_64-apple-darwin14.1.0/libxslt/1.1.28"
      libxml2_patches:
      - 0001-Revert-Missing-initialization-for-the-catalog-module.patch
      - 0002-Fix-missing-entities-after-CVE-2014-3660-fix.patch
      libxslt_patches:
      - 0001-Adding-doc-update-related-to-1.1.28.patch
      - 0002-Fix-a-couple-of-places-where-f-printf-parameters-wer.patch
      - 0003-Initialize-pseudo-random-number-generator-with-curre.patch
      - 0004-EXSLT-function-str-replace-is-broken-as-is.patch
      - 0006-Fix-str-padding-to-work-with-UTF-8-strings.patch
      - 0007-Separate-function-for-predicate-matching-in-patterns.patch
      - 0008-Fix-direct-pattern-matching.patch
      - 0009-Fix-certain-patterns-with-predicates.patch
      - 0010-Fix-handling-of-UTF-8-strings-in-EXSLT-crypto-module.patch
      - 0013-Memory-leak-in-xsltCompileIdKeyPattern-error-path.patch
      - 0014-Fix-for-bug-436589.patch
      - 0015-Fix-mkdir-for-mingw.patch
      compiled: 2.9.2
      loaded: 2.9.2
@ericgoodwin
Author

cc/ @gjtorikian @mdiep

@ericgoodwin
Author

It looks like the autolinker treats the Right Double Quotation Mark as part of the email address. This could be related to vmg/redcarpet#388.

@jch
Contributor

jch commented Feb 5, 2015

@ericgoodwin do you have a backtrace of the error?

@ericgoodwin
Author

Not sure it's much help because there is only one line that isn't Pry, but here it is.

[2] pry(main)> renderer.to_html("“test@example.com”").split
ArgumentError: invalid byte sequence in UTF-8
from (pry):2:in `split'
[3] pry(main)> wtf
Exception: ArgumentError: invalid byte sequence in UTF-8
--
0: (pry):2:in `split'
1: (pry):2:in `__pry__'
2: /Users/ericgoodwin/.rbenv/versions/2.1.5/lib/ruby/gems/2.1.0/gems/pry-0.10.1/lib/pry/pry_instance.rb:355:in `eval'
3: /Users/ericgoodwin/.rbenv/versions/2.1.5/lib/ruby/gems/2.1.0/gems/pry-0.10.1/lib/pry/pry_instance.rb:355:in `evaluate_ruby'
4: /Users/ericgoodwin/.rbenv/versions/2.1.5/lib/ruby/gems/2.1.0/gems/pry-0.10.1/lib/pry/pry_instance.rb:323:in `handle_line'

The error can also be reproduced with github-markdown alone.

[7] pry(main)> GitHub::Markdown.to_html("“test@example.com”", :gfm).split
ArgumentError: invalid byte sequence in UTF-8
from (pry):4:in `split'
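
For anyone who wants to check a rendered string before calling `split` on it, here is a minimal sketch using only core Ruby (`String#valid_encoding?` and `String#scrub`, the latter available from Ruby 2.1). It does not call github-markdown. The `broken` string is an assumption: it simulates output in which the closing quote (U+201D, bytes E2 80 9D) was cut short, which may not be exactly what the renderer produces.

```ruby
# Assumption: the output ends with a truncated U+201D (bytes E2 80 9D).
# This string simulates that; it is not real github-markdown output.
broken = "test@example.com\xE2\x80".dup.force_encoding(Encoding::UTF_8)

# Detect the problem without raising.
valid_before = broken.valid_encoding?

# Calling split on the broken string is what raised in the trace above.
# Capture the message instead of letting the error propagate.
split_error = begin
  broken.split
  nil
rescue ArgumentError => e
  e.message
end

# Replace the invalid bytes so later string operations are safe.
scrubbed = broken.scrub("?")
valid_after = scrubbed.valid_encoding?
words = scrubbed.split

puts "valid before scrub: #{valid_before}"
puts "split error: #{split_error.inspect}"
puts "valid after scrub: #{valid_after}"
puts "words: #{words.inspect}"
```

Scrubbing only hides the symptom. The autolinked output would still contain a replacement character where the quotation mark should be.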

@ericgoodwin
Author

I just found out that the behavior also depends on the Ruby version.
Here is the same text ("“test@example.com”") pasted into irb on Ruby 1.9 and on Ruby 2.1.3:

Ruby 2.1.3

irb(main):001:0> "“test@example.com”"
=> "“test@example.com”"

Ruby 1.9

irb(main):001:0> "\U+FFE2test@example.com\U+FFE2"
=> "test@example.com"

It would appear that Ruby 1.9 is converting the characters automatically.
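
To see which characters a given Ruby or terminal actually received, it can help to print code points and bytes rather than rely on how irb echoes the string. A small sketch using only core `String` methods follows. The literal is written with `\u` escapes so it does not depend on how the characters are pasted, and the variable names are illustrative.

```ruby
# Left (U+201C) and right (U+201D) double quotation marks around the address,
# written as escapes so terminal input handling cannot alter them.
text = "\u201Ctest@example.com\u201D"

# Code points of the first and last characters.
quotes = [text[0], text[-1]]
codepoints = quotes.map { |ch| format("U+%04X", ch.ord) }

# UTF-8 bytes of the closing quotation mark.
closing_bytes = text[-1].bytes.map { |b| format("%02X", b) }.join(" ")

puts text.encoding          # UTF-8
puts codepoints.inspect     # ["U+201C", "U+201D"]
puts closing_bytes          # E2 80 9D
```

If a pasted string reports different code points, or none at all for the quotation marks, the conversion happened on input and not in the Markdown renderer.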

@jch
Contributor

jch commented Feb 6, 2015

@ericgoodwin 👍 thanks for digging in. I'm going to close this issue for now because I can't think of anything specific to take action on. If you find more details about the encoding, please continue to comment here. I'm happy to reopen if there is a fix that can be applied against the gem.

@jch closed this as completed Feb 6, 2015
@ericgoodwin
Author

@jch Ok. Wasn't quite sure where to post this in the first place. I might keep on adding a few comments on here if that's ok with you so anyone else who runs into the issue can find some information on it. The issue seems to be with the github-markdown binary.

@jch
Contributor

jch commented Feb 6, 2015

👍 Yep keep 'em coming
