DocumentFragment.parse of certain attribute values mangles document on JRuby #747

Closed
jmcnevin opened this Issue Aug 14, 2012 · 2 comments


Nokogiri 1.5.5
JRuby 1.6.7.2

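There seems to be an issue on JRuby with attribute values that begin with something resembling a URI scheme (a run of characters followed by a colon, such as embedded:). Such values cause the fragment to be parsed incorrectly: in the example below, the src and alt attributes of the img tag are lost and replaced by a bogus image1.png attribute.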

I created a small script to test this:

require 'nokogiri'

test_string = <<-EOF
<p>This is a sample document that has been created as an example of a link to a file that is not an .html document.</p>
<p>
  <img src="embedded:image1.png" alt="image1.png" />
</p>
EOF

doc = Nokogiri::HTML::DocumentFragment.parse(test_string)

puts doc.to_s

CRuby:

<p>This is a sample document that has been created as an example of a link to a file that is not an .html document.</p>
<p>
  <img src="embedded:image1.png" alt="image1.png"></p>

JRuby:

<p>This is a sample document that has been created as an example of a link to a file that is not an .html document.</p>
<p>
  <img image1.png="">
</p>
Owner

yokolet commented Aug 19, 2012

Hello!

I got the output below on Nokogiri master and JRuby 1.7.0.preview2:

<p>This is a sample document that has been created as an example of a link to a file that is not an .html document.</p>
<p>
  <img alt="image1.png" src="embedded:image1.png">
</p>

So it looks like the bug has already been fixed as a side effect of another bug fix.
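
Note that the attribute order in the output above (alt before src) differs from the CRuby output, so comparing the two serialized strings directly would not be a reliable check. Below is a minimal sketch of an order-independent comparison that uses only the Ruby standard library. The helper name img_attributes is hypothetical, and the regex is only meant for the short outputs quoted in this thread, not for HTML in general; a real regression test would more likely inspect the parsed node through Nokogiri itself.

```ruby
# Hypothetical helper: collect the attributes of the first <img> tag into a Hash.
# Rough regex-based sketch, intended only for the short outputs quoted in this issue.
def img_attributes(html)
  tag = html[/<img\b[^>]*>/]
  return {} if tag.nil?
  # Match name="value" pairs; names may contain dots (e.g. image1.png="").
  Hash[tag.scan(/([^\s=<>\/"]+)="([^"]*)"/)]
end

# The <img> lines copied from the three outputs in this thread.
cruby_output = '<img src="embedded:image1.png" alt="image1.png"></p>'
jruby_1_6    = '<img image1.png="">'
jruby_master = '<img alt="image1.png" src="embedded:image1.png">'

expected = img_attributes(cruby_output)

# Hash equality ignores insertion order, so differing attribute order is fine.
puts img_attributes(jruby_master) == expected  # prints true
# The mangled JRuby 1.6.7.2 output has lost src and alt, so it does not match.
puts img_attributes(jruby_1_6) == expected     # prints false
```

Comparing Hashes rather than strings keeps the check focused on what the bug actually broke (the src and alt attributes) and ignores serialization differences between the two backends.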

Member

jvshahid commented Nov 21, 2013

Fixed.

jvshahid closed this Nov 21, 2013
