Kernel.Array() method fails to wrap text node #679

Closed
mislav opened this Issue May 15, 2012 · 25 comments

4 participants

@mislav
require 'nokogiri'
doc = Nokogiri('<a>b</a>')
text_node = doc.at('/a/text()')
Array(text_node)
#=> []

I expected: [<text_node>]

# Nokogiri (1.5.2)
    ---
    warnings: []
    nokogiri: 1.5.2
    ruby:
      version: 1.9.3
      platform: x86_64-darwin11.2.0
      description: ruby 1.9.3p0 (2011-10-30 revision 33570) [x86_64-darwin11.2.0]
      engine: ruby
    libxml:
      binding: extension
      compiled: 2.7.8
      loaded: 2.7.8
@Serabe

@tenderlove maybe, moving the to_ary up to Nokogiri:: XML::Node would be best since now the behaviour is inconsistent between Nokogiri classes.

Though, I would rather suggest @mislav using Array[] since it doesn't rely on objects having a method and reverting this commit.

@tenderlove
Sparkle Motion member

Moving this implementation of to_ary up to Node seems wrong. Not all nodes are text nodes, and not all nodes should be treated as such when converting to an array.

The other thing we could do is make to_ary return [self], which would result in Array() returning an array with the text node inside. Either way, meh.

/cc @flavorjones @ender672

@tenderlove
Sparkle Motion member

Oops, forgot to cc @yokolet

@Serabe

I think the problem is using a core method in a way is not intended to.

Kernel#Array()
Returns arg as an Array. First tries to call arg.to_ary, then arg.to_a.

Array::[]
Returns a new array populated with the given objects.

to_ary and to_a does not make sense in a simple object just to use parenthesis instead of brackets.

@mislav

In case it isn't obvious, the only reason I'm using Array() in the first place is because I don't know if the passed object is a single node or an array of nodes.

Array(elements).each do |element|
  # ...
end
@Serabe

@mislav at always returns nil or a Node.

@mislav

@Serabe at is only for the purpose of demonstration of the bug. My user code is different. At several points, I want to avoid the check:

elements = [elements] unless Array === elements
elements.each ...
@Serabe

@mislav what I am saying is that either it is not a bug (which I believe) or that to_ary method should be moved to Nodesince a Text node is not that different from a simple Node.

@tenderlove
Sparkle Motion member

@Serabe The current behavior is obviously a bug. Why would you expect that Array(node) would return an empty list?

I'm reverting my patch for now, until we get more input. :-)

@tenderlove tenderlove reopened this May 15, 2012
@tenderlove tenderlove reopened this May 15, 2012
@mislav

That I have no opinion about. It's for you to discuss. The only thing I care about is that I don't end up with empty arrays when I use Array() with Nokogiri objects, since that breaks the principle of least surprise for me.

@Serabe

@tenderlove because if you expect to Array(any_node) to return [any_node] you should add to_a to Node, not to Text.

@tenderlove
Sparkle Motion member

@Serabe Node already implements to_a. Here we need to implement to_ary.

@Serabe

@tenderlove then docs are wrong in Kernel#Array

@tenderlove
Sparkle Motion member

@Serabe how so?

@Serabe

@tenderlove afaik, Kernel#Array should call to_a if object does not respond to_ary or returns nil. Let me check rubyspec.

@Serabe

@tenderlove the problem is that to_ain Node returns []. So it is a bug (my apologies about this).

@tenderlove
Sparkle Motion member
irb(main):001:0> require 'nokogiri'
=> true
irb(main):002:0> doc = Nokogiri('<a>b</a>')
=> #<Nokogiri::XML::Document:0x3fc0e5c48570 name="document" children=[#<Nokogiri::XML::Element:0x3fc0e5c481ec name="a" children=[#<Nokogiri::XML::Text:0x3fc0e5c35fb0 "b">]>]>
irb(main):003:0> text_node = doc.at('/a/text()')
=> #<Nokogiri::XML::Text:0x3fc0e5c35fb0 "b">
irb(main):004:0> text_node.to_a
=> []
irb(main):005:0>
@Serabe

@tenderlove and it is the expected behaviour since to_a comes from Enumerable and each goes through attributes. Then to_ary would go in Node, in which Enumerable is being included.

@tenderlove
Sparkle Motion member

@Serabe no problem! 😄

Our implementation of to_a returns a list of all the attributes for a particular node. to_a doesn't really make sense in the context of a text node since a text node can never have attributes. I think in this case we either need to remove to_a on the text node, or return [self] from to_ary so that Array() works.

@Serabe

@tenderlove then I would expect any node (since any node can come from a search) to respond to to_ary as [self] not just Text. Otherwise changing the query in @mislav's code could break it.

@flavorjones
Sparkle Motion member

@tenderlove, @serabe - sorry for jumping in so late.

My understanding is that, because XML::Node implements #each, and also mixes in Enumerable, then we get an implementation of #to_a. Ya?

I think we should reconsider a Node being Enumerable in Nokogiri 2.0. My strawman proposal is to remove it in favor of a better-defined Attributes class which would be Enumerable (and very Hash-y).

This would help us address some other lingering problems around, e.g., how SAX parsers handle node attributes; and the odd API we have around Node#attributes and XML::Attr.

Thoughts?

/cc @ender672 @yokolet

@Serabe
@tenderlove
Sparkle Motion member

I think we should reconsider a Node being Enumerable in Nokogiri 2.0. My strawman proposal is to remove it in favor of a better-defined Attributes class which would be Enumerable (and very Hash-y).

👍

I'd like to keep the hashy syntax for nodes, but removing enumerable seems fine.

@flavorjones
Sparkle Motion member

Slated for 2.0

@flavorjones flavorjones removed the 2.0 label Jan 2, 2015
@flavorjones
Sparkle Motion member

Closing, captured in ROADMAP.md for now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment