nth-child broken for DocumentFragment #672

Open
maximkulkin opened this Issue May 7, 2012 · 2 comments

@maximkulkin

Task: get the last div (the one with the text "3").

require 'nokogiri'

d = Nokogiri::HTML::Document.parse('<html><body><div>1</div><div>2</div><div>3</div></body></html>')
p d.css('div:last-child')
# => [
#  #<Nokogiri::XML::Element:0x769abc name="div" children=[#<Nokogiri::XML::Text:0x769814 "3">]>
# ]

This works fine.
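For reference, CSS3 defines `:last-child` as an element that is the last child of its parent, and `:nth-child(n)` as an element preceded by n-1 element siblings; Selectors Level 3 also requires the element to have a parent. A minimal pure-Ruby model of that check, independent of Nokogiri (the `ToyNode` struct and the `toy`, `nth_child?`, `last_child?` and `each_element` helpers are hypothetical, for illustration only):

```ruby
# Toy element model for illustration only; this is NOT Nokogiri's API.
ToyNode = Struct.new(:name, :text, :children, :parent)

# Hypothetical builder that wires each child's parent pointer.
def toy(name, text = nil, children = [])
  node = ToyNode.new(name, text, children, nil)
  children.each { |child| child.parent = node }
  node
end

# :nth-child(n) for a plain 1-based n: the element is the nth element
# child of its parent. Selectors Level 3 also requires a parent to exist.
def nth_child?(node, n)
  return false if node.parent.nil?
  index = node.parent.children.index { |sibling| sibling.equal?(node) }
  index == n - 1
end

# :last-child: no element sibling follows this one.
def last_child?(node)
  return false if node.parent.nil?
  node.parent.children.last.equal?(node)
end

# Depth-first walk over root and all of its descendants.
def each_element(root, &block)
  yield root
  root.children.each { |child| each_element(child, &block) }
end

divs = %w[1 2 3].map { |t| toy("div", t) }
html = toy("html", nil, [toy("body", nil, divs)])

# Mirrors d.css('div:last-child') on the html/body document, in the toy model.
matches = []
each_element(html) { |el| matches << el.text if el.name == "div" && last_child?(el) }
p matches # => ["3"]
```

Because all three divs share the body as their parent, only the div with "3" has no following sibling, which matches the Nokogiri output above.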

d = Nokogiri::HTML::Document.parse('<div>1</div><div>2</div><div>3</div>')
p d.css('div:last-child')
# => [
#   #<Nokogiri::XML::Element:0x7972b4 name="div" children=[#<Nokogiri::XML::Text:0x797048 "3">]>
# ]

Surprisingly, this also works. Let's try the CSS3 :root selector:

d = Nokogiri::HTML::Document.parse('<div>1</div><div>2</div><div>3</div>')
p d.css(':root:last-child')
# => [
#   #<Nokogiri::XML::Element:0x96936c name="html" children=[
#     #<Nokogiri::XML::Element:0x96c274 name="body" children=[
#       #<Nokogiri::XML::Element:0x96bedc name="div" children=[#<Nokogiri::XML::Text:0x96bba8 "1">]>,
#       #<Nokogiri::XML::Element:0x970090 name="div" children=[#<Nokogiri::XML::Text:0x96fd70 "2">]>,
#       #<Nokogiri::XML::Element:0x96faa0 name="div" children=[#<Nokogiri::XML::Text:0x96f80c "3">]>
#     ]>
#   ]>
# ]

OK. So we can see that the parser implicitly wraps the content in html and body tags.

Now let's try the same thing with the DocumentFragment class:

d = Nokogiri::HTML::DocumentFragment.parse('<div>1</div><div>2</div><div>3</div>')
p d.css('div:last-child')
# => [
#   #<Nokogiri::XML::Element:0x97dca4 name="div" children=[#<Nokogiri::XML::Text:0x9db354 "1">]>,
#   #<Nokogiri::XML::Element:0x97dc68 name="div" children=[#<Nokogiri::XML::Text:0x9de414 "2">]>,
#   #<Nokogiri::XML::Element:0x97dc2c name="div" children=[#<Nokogiri::XML::Text:0x9dd9c4 "3">]>
# ]

Nope. It doesn't work: all three divs are returned.

d = Nokogiri::HTML::DocumentFragment.parse('<div>1</div><div>2</div><div>3</div>')
p d.css(':root:last-child')
# => [
#   #<Nokogiri::XML::Element:0x97dca4 name="div" children=[#<Nokogiri::XML::Text:0x9db354 "1">]>,
#   #<Nokogiri::XML::Element:0x97dc68 name="div" children=[#<Nokogiri::XML::Text:0x9de414 "2">]>,
#   #<Nokogiri::XML::Element:0x97dc2c name="div" children=[#<Nokogiri::XML::Text:0x9dd9c4 "3">]>
# ]

That doesn't work either. Let's try with the full html and body markup:

d = Nokogiri::HTML::DocumentFragment.parse('<html><body><div>1</div><div>2</div><div>3</div></body></html>')
p d.css('div:last-child')
# => [
#   #<Nokogiri::XML::Element:0x96eccc name="div" children=[#<Nokogiri::XML::Text:0x97c3e0 "1">]>,
#   #<Nokogiri::XML::Element:0x96ec90 name="div" children=[#<Nokogiri::XML::Text:0x97bbac "2">]>,
#   #<Nokogiri::XML::Element:0x96ec68 name="div" children=[#<Nokogiri::XML::Text:0x97e730 "3">]>
# ]

Nope. It seems that nth-child-style CSS3 selectors such as :last-child are broken for DocumentFragment.

@flavorjones
Member

Hello!

Thanks for asking this question, and for pointing out some obvious inconsistencies in Nokogiri's fragment handling.

First, though, let's tackle the ominous semantic implications here. DocumentFragment is supposed to, as closely as possible, represent a "document" of some sort that has multiple roots. Normal HTML and XML documents, as I'm sure you know, can only have one root node. But with a DocumentFragment, the sky is the limit!

We've got another open issue that raises similar semantic questions: #656, which deals with NodeSet (which again represents multiple nodes, though in a different semantic way).

Look at the implementation of DocumentFragment#css to see exactly what we've done: https://github.com/tenderlove/nokogiri/blob/master/lib/nokogiri/xml/document_fragment.rb#L76-82

Nokogiri iterates over each of the root nodes, and returns a NodeSet of the results.
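To illustrate one way per-root iteration can produce all three divs, here is a toy model in plain Ruby. It is not Nokogiri's implementation (that is at the link above); it assumes each root is tested in a context containing only itself, so its real siblings are never consulted. `ToyNode`, `last_among?`, `last_child_per_root` and `last_child_among_roots` are hypothetical names:

```ruby
# Toy model for illustration only; this is NOT Nokogiri's implementation.
ToyNode = Struct.new(:name, :text)

# :last-child evaluated against an explicit list of element siblings.
def last_among?(node, siblings)
  siblings.last.equal?(node)
end

# Per-root isolation: each root is tested in a context holding only itself,
# so the other roots are never consulted and every root counts as "last".
def last_child_per_root(roots, name)
  roots.select { |root| root.name == name && last_among?(root, [root]) }
end

# For contrast: test each root against the full list of roots as siblings.
def last_child_among_roots(roots, name)
  roots.select { |root| root.name == name && last_among?(root, roots) }
end

roots = %w[1 2 3].map { |t| ToyNode.new("div", t) }
p last_child_per_root(roots, "div").map(&:text)    # => ["1", "2", "3"]
p last_child_among_roots(roots, "div").map(&:text) # => ["3"]
```

The two functions differ only in which sibling list they consult, which is exactly the semantic choice in question for a multi-root fragment.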

So I guess what I'm asking is: when you ask for :nth-child in a document with multiple roots, how should it behave? Arguably, Nokogiri is doing the right thing above, though I admit it's not entirely obvious at first blush.

Second, though, let me ask the obvious question: why doesn't DocumentFragment#xpath behave the same way? Just like in the awesome "Coming to America" bit where the guy asks the waiter to taste the soup (see http://www.youtube.com/watch?v=CKsSvR5u-qk for details), I have to just say, "Ah ha. Ah ha."

This is most certainly a bug in Nokogiri: #css and #xpath should always have the same semantics, however arguably incorrect they may be.

To sum up: I'm nominating this one for the 2.0 roadmap along with #656.

@maximkulkin

Mike,

Given that DocumentFragment has a #parse() method, and that its name suggests it can represent part of a document (and so isn't bound by the normal HTML/XML rules), I understand DocumentFragment as an analogue of Document that can have multiple roots. I would expect to be able to reference any particular root element.
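Under this reading, the fragment's top-level elements act as siblings under an implicit parent, so `:nth-child(n)` and `:last-child` would index into the list of roots. A toy sketch of those expected semantics in plain Ruby (`ToyNode`, `ToyFragment` and its methods are hypothetical names, not Nokogiri's API, and this is not how Nokogiri behaved in the outputs shown above):

```ruby
# Hypothetical toy model of the expected semantics; not Nokogiri's API.
ToyNode = Struct.new(:name, :text)

# The fragment's roots are treated as element children of an implicit parent.
ToyFragment = Struct.new(:roots) do
  # name:nth-child(n), 1-based as in CSS; n < 1 matches nothing.
  def nth_child(name, n)
    return [] if n < 1
    root = roots[n - 1]
    root && root.name == name ? [root] : []
  end

  # name:last-child: the final root, if it has the requested tag name.
  def last_child(name)
    root = roots.last
    root && root.name == name ? [root] : []
  end
end

frag = ToyFragment.new(%w[1 2 3].map { |t| ToyNode.new("div", t) })
p frag.last_child("div").map(&:text)   # => ["3"]
p frag.nth_child("div", 2).map(&:text) # => ["2"]
```

Position is taken among all root elements first and the tag name is checked second, matching how CSS combines a type selector with :nth-child.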

@flavorjones flavorjones removed the 2.0 label Jan 2, 2015