Pull request #9 changed the behavior of node name uniqueness, and this went out in release 0.9.3.
In the discussion on that pull request (#9 (comment)), it was pointed out "there is existing code which is using duplicate node names already, and this fix will break that code", which is true and broke my code which was using version 0.8.3.
From what I can tell, the reasoning behind the fix was twofold:
I propose that a tree that looks like this should be perfectly valid:
| +---> deep
The fact that there are two nodes with the name "deep" in this tree is irrelevant. They are not the same node, they just happen to have the same name.
Conceptually, when you retrieve nodes out of the tree, you do it like so:
If we expected all node names to be unique and used as a key within the entire tree, then we would expect this to work:
But it doesn't, and it shouldn't.
It's worth noting that a node's name is used as an implicit id in two ways:
Rather than fundamentally changing the behavior of the tree to match the documentation, I think the following changes should be made:
Fix for Issue #24. Allow duplicates in non-sibling nodes.
The current code was too restrictive about duplicate nodes in the tree, whereas
the implementation requires only sibling node names to be unique.
Note that the uniqueness is defined (currently) by the node name.
Thanks for highlighting this issue. Mea culpa. I had myself put in the comment about the possibility of code already using duplicates.
I have pushed out a fix with a test case (0.9.3pre, f7f6377) into the master branch now, which seems to be handling the scenario correctly.
This check does need to be configurable though, as there might be scenarios (ref #9) where a truly unique node set is required in the tree. Any ideas appreciated. I was thinking about a add_nodups method, but it will not allow the convenience of the << operation. Perhaps a decorator, or a class-level configurator of some sort …
While it should be possible to implement the comparison within the tree's context (i.e., taking into account the parent node, parentage/edges, etc.), it might make the system too restrictive (similar to what happened here).
The Enumerable.sort does return nodes based on the node name right now, which seems to be a good default, and in any case, Enumerable.sort_by can be used for any other ordering.
Any thoughts on the above?
@evolve75 Thanks for jumping on this so quickly.
I'm not convinced it does need to be configurable. PR #9 did not actually claim to have a use case for globally unique node names, he just said he was making the behavior consistent with the documentation.
In fact, I posit that configurable behavior is incorrect: the node names should not be globally unique. A node in a tree is defined by the combination of its name and its position in the tree: ie, its "path". "root->one->deep" simply has nothing to do with "root->two->deep" despite the same leaf node name.
Tree::TreeNode#<=> is the only place where #name is actually used as a unique identifier across the entire tree. So, for the sake of conceptual integrity, that should be fixed; while true that the reason to implement #<=> is to adhere to the Comparable interface, the intent of Comparable is to actually compare the nodes. The node name does not identify a node, so simply comparing on the name is an incomplete and inadequate comparison. As stated above, a node is defined by its path; as such, to correctly implement Comparable you should compare the entire path, not just the name.
And there is no scenario in which comparing an entire node that is not in a tree to a node that already is in a tree makes sense. Before a node has been added to the tree, it is not "equal" to a node in the tree that happens to have the same name. "root->one->deep" is not equal to "root->two->deep" OR to "someothertree->deep" OR to "deep".
And finally, I would say that having Enumerable#sort return nodes sorted based on the node name is actually a pretty crappy default. A "good" default would be, say, the same order as either #each or #breadth_each.
For example, say you have the tree we're talking about:
And you call #sort on it. The resulting [deep, deep, one, root, two] is completely meaningless. I, and I think anyone who thinks they're dealing with a tree data structure, would expect it to be either [root, one, deep, two, deep] or [root, one, two, deep, deep].
[deep, deep, one, root, two]
[root, one, deep, two, deep]
[root, one, two, deep, deep]
Thanks for the detailed feedback. I have released the rubytree-0.9.3pre gem. Can you please check if it addresses the issue with globally unique names ([sudo] gem install rubytree --pre)?
[sudo] gem install rubytree --pre
For the <=> operator, your points are valid, but we will need to think through on what other implications would this change result in.
Also, this will necessarily have a particular traversal mechanism imposed on the sort, and we will also need to traverse the tree (at least for the edges and parentage involved for the two nodes) to address arbitrary node compares.
I just fired up my old code which is using siblings with same name, is there any way to reenable this again?
So "root->one" is not the same as "root->one->one"
Could we get this "feature" back?
I have released R0.9.3 which fixes this issue. The node names are no longer required to be globally unique. They will need to be unique between siblings though.
Please update to the latest version and let me know if the behavior is as expected.
The new release takes care of the issue globally unique node names. I am working out the comparison operator logic for the nodes (<=>), and will update it in a subsequent release. Given the breakage the unique node _mis-_feature might have caused to existing code, I wanted to get this fixed quickly.