Functions or expressions raises exception in chained xpath call #656

Closed
Fitzsimmons opened this Issue Apr 16, 2012 · 3 comments

Comments

Projects
None yet
2 participants
Contributor

Fitzsimmons commented Apr 16, 2012

I think this test really says it all.

require 'rubygems'
require 'nokogiri'

fail_xml = <<-XML
<root>
  <stuff>thing</stuff>
</root>
XML

fail_doc = Nokogiri::XML(fail_xml)

puts fail_doc.xpath("//root").xpath("./stuff/text()").inspect #works
puts fail_doc.xpath("//root/stuff/text() = 'thing'").inspect #works
puts fail_doc.xpath("//root").xpath("./stuff/text() = 'thing'").inspect #crashes

Stacktrace:

/home/justinf/.rvm/gems/ruby-1.9.3-p0/gems/nokogiri-1.5.2/lib/nokogiri/xml/node_set.rb:135:in `|': node_set must be a Nokogiri::XML::NodeSet (ArgumentError)
    from /home/justinf/.rvm/gems/ruby-1.9.3-p0/gems/nokogiri-1.5.2/lib/nokogiri/xml/node_set.rb:135:in `block in xpath'
    from /home/justinf/.rvm/gems/ruby-1.9.3-p0/gems/nokogiri-1.5.2/lib/nokogiri/xml/node_set.rb:239:in `block in each'
    from /home/justinf/.rvm/gems/ruby-1.9.3-p0/gems/nokogiri-1.5.2/lib/nokogiri/xml/node_set.rb:238:in `upto'
    from /home/justinf/.rvm/gems/ruby-1.9.3-p0/gems/nokogiri-1.5.2/lib/nokogiri/xml/node_set.rb:238:in `each'
    from /home/justinf/.rvm/gems/ruby-1.9.3-p0/gems/nokogiri-1.5.2/lib/nokogiri/xml/node_set.rb:134:in `xpath'
    from test_constraint.rb:14:in `<main>'
Owner

flavorjones commented Apr 16, 2012

Hi!

Thanks for reporting this issue. I'd like to point out, though, that this is not a crash. This is an exception being raised. Precise language is important, so I'm going to change the title of this ticket from "causes crash" to "raises exception".

I'll take a look later today.

Owner

flavorjones commented Apr 19, 2012

Howdy,

Thanks for your patience. Looking at this, there's a bit of a semantic issue with NodeSet. Let's start the explanation with a slightly more complicated example:

require 'rubygems'
require 'nokogiri'

fail_xml = <<-XML
<root>
  <stuff>thing</stuff>
  <stuff>thang</stuff>
</root>
XML

fail_doc = Nokogiri::XML(fail_xml)

# baseline. returns a NodeSet containing two Elements
puts fail_doc.xpath("//root/stuff").inspect
# => [
#      #<Nokogiri::XML::Element:0xcc8120 name="stuff" children=[
#        #<Nokogiri::XML::Text:0xcc7f18 "thing">
#      ]>,
#      #<Nokogiri::XML::Element:0xcc7d10 name="stuff" children=[
#        #<Nokogiri::XML::Text:0xcc7b08 "thang">
#      ]>
#    ]

# returns a NodeSet, which contains two Text nodes
puts fail_doc.xpath("//root/stuff/text()").inspect #works
# => [#<Nokogiri::XML::Text:0x605680 "thing">, #<Nokogiri::XML::Text:0x6055a4 "thang">]

# also returns a NodeSet, which contains two Text nodes. there's magic here.
puts fail_doc.xpath("//root/stuff").xpath("./text()").inspect
# => [#<Nokogiri::XML::Text:0x605680 "thing">, #<Nokogiri::XML::Text:0x6055a4 "thang">]

# returns a bare boolean, answering the question "is there a 'stuff' node which contains text == 'thing'?"
puts fail_doc.xpath("//root/stuff/text() = 'thing'").inspect
# => true

This example demonstrates that the result of Node#xpath can be a NodeSet, or it could be a boolean (TrueClass or FalseClass).

(The result could also be nil, a String, a Float or a Fixnum. See the method Nokogiri_marshal_xpath_funcall_and_return_values in xml_xpath_context.c on master for details.)

There's some unobvious magic (really, an API design choice) involved in the third query. We're calling NodeSet#xpath on the result of Node#xpath and getting back the same result that we did when calling Node#xpath on the composite query.

There's some coercion going on here to make sure that the resulting NodeSet is flattened. I think we could arguably have chosen to return this:

puts fail_doc.xpath("//root/stuff").xpath("./text()").inspect
# => [[#<Nokogiri::XML::Text:0x605680 "thing">], [#<Nokogiri::XML::Text:0x6055a4 "thang">]]

And it would have been arguably correct (perhaps more semantic).

So here's my question for you: if we invoke a boolean xpath query on a NodeSet, what should we get back?

# what should this return?
puts fail_doc.xpath("//root/stuff").xpath("./text() = 'thing'").inspect
# option A => true
# option B => [true, false]

I'd argue that option B is probably what you expect. But that breaks from both of the established API conventions:

  1. boolean xpath expressions are return bare (that is, not contained by a NodeSet)
  2. NodeSet#xpath returns a flattened set of collected results from each node in the NodeSet.

And then there's the real complexity here: currently, NodeSet is an Enumerable that only knows how to contain Nodes. That is, we can't return a NodeSet containing Booleans. There's some nontrivial work to make that happen, if we decide that's the right thing to do.

So, I'm inclined defer a decision for a bit. Let's talk and make a decision, but wait until 2.0 to address this particular wart.

@tenderlove - thoughts?

Owner

flavorjones commented Feb 17, 2016

Closing, captured this wart in ROADMAP.md.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment