Skip to content

Never use rbtree to implemented SortedSet. #451

Open
wants to merge 1 commit into from

4 participants

@xaviershay

It is at least an order of magnitude slower on all operations. I guess Ruby 2 is much faster than its predecessors!

As a bonus, links to source code should now show up in RDoc.

Benchmarks: https://gist.github.com/xaviershay/7520037

Redmine: https://bugs.ruby-lang.org/issues/9121

@xaviershay xaviershay Never use rbtree to implemented SortedSet.
It is at least an order of magnitude slower on all operations.
c76a2f8
@zzak
Ruby Programming Language member
zzak commented Nov 18, 2013

@knu ping!

As maintainer of lib/set.rb, Akinori-san should be able to review and commit this.

Personally, I like this change! :clap: You had me at the RDoc bonus, but the simplicity is nice!

@xaviershay

What's missing here is I don't have a good hypothesis for why rbtree is so much slower, but if others can replicate the benchmark I think it's ok to figure that out later if anyone wants to try speeding up the pure ruby version again.

@zzak
Ruby Programming Language member
zzak commented Nov 18, 2013

Agreed, although last time I checked rbtree was broken on 2.1.0 so eliminating that dependency may help keep stdlib stable.

@knu
Ruby Programming Language member
knu commented Nov 18, 2013

What's the benchmark numbers you get with ruby 1.9.3? Is it Hash in Ruby 2.1 that got faster or else?

@xaviershay

1.9.3 is slower across the board, but has the same difference between rbtree and not: https://gist.github.com/xaviershay/7520037#file-gistfile2-txt

@vjoel
vjoel commented Nov 18, 2013

These benchmarks miss the point of using rbtree, which is to pay a small insertion cost to keep the structure sorted so that ordered lookups are fast. If you only need to perform lookups after the structure is built, then rbtree is a waste of time.

See https://gist.github.com/vjoel/7535917.

Also, this part of the benchmark is particularly misleading:

b.report("#to_a #{n} items") {
  10000.times { s.to_a }
}

because the non-rbtree implementation caches the #to_a output and will never drop that cache inside of the above loop.

@xaviershay

Thanks @vjoel , that's an obvious case I should have seen. I still think this change should be applied:

  1. Radically changing the performance characteristics of a structure depending on what gems the user has loaded seems unacceptable to me.
  2. The rbtree version is trivial to implement yourself if you want to use it.
  3. Having a dependency in the stdlib on an externally maintained gem seems bad. 3a. ... particularly if that dependency doesn't work on trunk (I don't know if that's true or not)
  4. Removes a fairly terrible setup method with string evals, puts the methods back into rdoc.
@vjoel
vjoel commented Nov 19, 2013

@xaviershay I agree, mostly. The two implementations are not implementations of the same concept.

Apparently, your use-case works better with lazy sorting (the stdlib when require 'rbtree' fails). There are definitely use-cases where eager sorting is needed. IMHO, most people will think of SortedSet as being "eager" in this sense. (For example, see the Java 7 docs http://docs.oracle.com/javase/7/docs/api/java/util/SortedSet.html--this interface has two implementations, based on trees and skiplists, which are both maintain sorted structures on insertion, rather than waiting for lookups.) Perhaps a lazily sorted set should be a separate class called LazilySortedSet.

I vote for keeping SortedSet as it is using rbtree and adding rbtree to the stdlib. But I've been saying that for years...

@xaviershay

I vote for keeping SortedSet as it is using rbtree and adding rbtree to the stdlib.

I could support that. Seems obvious that a language comes with an efficient tree structure.

@xaviershay

@knu , want to make call on this one way or the other so we can close it out?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Something went wrong with that request. Please try again.