Immutable TreeMap/TreeSet performance (SI-5331) #82

Merged
merged 37 commits into from Feb 16, 2012

Projects

None yet

6 participants

@erikrozendaal

As discussed on the scala-internals mailing list. There are some further improvements related to performance and code.

Note: This breaks test/files/run/t2873.scala, since it uses RedBlack#Empty for testing for the absence of a compiler bug. I don't know how to fix this test so that it still tests for the same compiler bug.

erikrozendaal added some commits Dec 17, 2011
@erikrozendaal erikrozendaal Use RedBlack.iterator to create iterators for TreeSet/TreeMap.
This turns iterator creation from an O(n) operation into an
O(log n) operation. Unfortunately, it halves actual iteration
speed (consuming the iterator fully), probably due to the many
by-name closures that are needed.
540ad02
@erikrozendaal erikrozendaal Use custom implementation for iterating over RedBlack trees. Raw
performance is much better than '++' based iterator.
88ed930
@erikrozendaal erikrozendaal Optimized implementations of head/headOption/last/lastOption for
TreeMap/TreeSet.
edcec03
@erikrozendaal erikrozendaal Optimized implementation of init/tail for TreeSet/TreeMap. 9cdede8
@erikrozendaal erikrozendaal RedBlack.scala: Change count from 'def' to 'val' in NonEmpty tree
to ensure TreeSet/TreeMap 'range' operations are O(log n) instead
of O(n).
b7e6714
@erikrozendaal erikrozendaal Implemented drop/take/slice/splitAt/dropRight/takeRight for
TreeMap/TreeSet by splitting the underlying RedBlack tree. This
makes the operation O(log n) instead of O(n) and allows more
structural sharing.
95cb7bc
@erikrozendaal erikrozendaal Implemented takeWhile/dropWhile/span to use tree splitting. This
changes the operation from O(n log n) to O(n) and allows for more
structural sharing.
8d67823
@erikrozendaal erikrozendaal Switched from isSmaller to ordering. 3f66061
@erikrozendaal erikrozendaal Moved from implicit ordering value to implicit parameter. 7ec9b0b
@erikrozendaal erikrozendaal Moved from Empty case object to case class in preparation of moving
type parameter A.
a02a815
@erikrozendaal erikrozendaal Moved type parameter A from RedBlack to Tree. 418adc6
@erikrozendaal erikrozendaal Changed abstract class RedBlack to singleton object. 6c0e036
@erikrozendaal erikrozendaal Use single shared Empty instance across all RedBlack trees. d2706db
@erikrozendaal erikrozendaal Make sure the redblack test compiles and runs. 6b95074
@erikrozendaal erikrozendaal Made RedBlack private to the scala.collection.immutable package.
Use ArrayStack instead of Stack in TreeIterator for slightly increased
performance.
b9699f9
@erikrozendaal erikrozendaal TreeMap/TreeSet no longer keep track of the size (the RedBlack tree
already does so).
32171c2
@erikrozendaal erikrozendaal Performance improvements for iteration (foreach and iterator). b421bba
@erikrozendaal erikrozendaal Improved performance of RedBlack.NonEmpty.nth (helps take/drop/split/…
…etc).
4a0c4bb
@erikrozendaal erikrozendaal Added some tests for TreeMap/TreeSet. ad0b09c
@erikrozendaal erikrozendaal Minimize number of calls to ordering. c51bdea
@paulp
paulp commented Dec 28, 2011

That's awesome. Have you signed the SCA? http://www.scala-lang.org/sites/default/files/contributor_agreement.pdf I can't merge this much work until that's in place.

@erikrozendaal

Thanks for reminding me. Just mailed the signed SCA to Danielle at EPFL according to the instructions.

@dcsobral

What is the gain of making it private? I could see having it private right from the beginning, but removing access to something that is public -- without even removing the code itself -- ought to bring some gain.

Sadly, it will make testing stuff on REPL a real pain.

Mainly to make it possible to further improve the implementation (performance) without causing more pain later, such as replacing the Empty singleton instance with null and relying less on polymorphism.

Right now seems a good time because RedBlack has changed so much anyway that it is definitely no longer source (or binary) compatible.

BTW, object these classes still leak out of TreeMap/TreeSet when serializing. Should we use custom serialization to avoid breaking this in the future? Is compatible serialization guaranteed by the Scala library at all?

I didn't realize -- or, rather, the coin didn't drop -- that the changes were that big on the source code compatibility front. If that's the case, I can only agree that it is a good time to make it private.

@dcsobral

Since you put the test itself on the package, why do you need to import its contents?

Hmm... yes, this could be moved into "object Test" instead...

@dcsobral dcsobral and 2 others commented on an outdated diff Dec 28, 2011
src/library/scala/collection/immutable/RedBlack.scala
def isEmpty: Boolean
def isBlack: Boolean
- def lookup(x: A): Tree[B]
- def update[B1 >: B](k: A, v: B1): Tree[B1] = blacken(upd(k, v))
- def delete(k: A): Tree[B] = blacken(del(k))
- def range(from: Option[A], until: Option[A]): Tree[B] = blacken(rng(from, until))
- def foreach[U](f: (A, B) => U)
- def toStream: Stream[(A,B)]
+ def lookup(x: A)(implicit ordering: Ordering[A]): Tree[A, B]
+ def update[B1 >: B](k: A, v: B1)(implicit ordering: Ordering[A]): Tree[A, B1] = blacken(upd(k, v))
+ def delete(k: A)(implicit ordering: Ordering[A]): Tree[A, B] = blacken(del(k))
+ def range(from: Option[A], until: Option[A])(implicit ordering: Ordering[A]): Tree[A, B] = blacken(rng(from, until))
+ def foreach[U](f: ((A, B)) => U)
@dcsobral
dcsobral Dec 28, 2011

This is wrong. By passing the Ordering on each method, you make it possible for the tree to be inconsistent. Sure, someone has to work at it, and having the class private helps prevent such misuse, but the very possibility indicates the ordering is in the wrong place. I really dislike this design.

Instead, the tree itself should have an Ordering (and getting A back), which kind of gets back to RedBlack being a class and its subclasses having a hidden pointer to it -- which would be required to access the Ordering. However, as long as TreeMap and TreeSet don't extend RedBlack, it shouldn't be a problem. RedBlack becomes just a store for the Ordering shared by the nodes.

@erikrozendaal
erikrozendaal Dec 29, 2011

This was another reason I feel RedBlack (as it currently is in this pull request) should be private[immutable]. This way TreeMap/TreeSet can ensure a consistent Ordering[A] is always used.

Making RedBlack a class again could help with this. The costs are additional instances of this class (including the Empty nested object) and the additional $$outer pointer in the Tree nodes (this may or may not cause additional overhead, depending on alignment, etc). At least on MacOS X JDK 1.6.0_29 with compressed oops there was additional memory usage per node.

So definitely a tradeoff between performance and potential correctness.

@viktorklang
viktorklang Dec 29, 2011

That's an understandable tradeoff. conserving memory is important for datastructures.

@dcsobral dcsobral and 2 others commented on an outdated diff Dec 29, 2011
src/library/scala/collection/immutable/RedBlack.scala
- def iterator: Iterator[(A, B)] =
- left.iterator ++ Iterator.single(Pair(key, value)) ++ right.iterator
+ override def foreach[U](f: ((A, B)) => U) {
+ if (left ne Empty.Instance) left foreach f
@dcsobral
dcsobral Dec 29, 2011

I'm pretty sure this is a pessimization. You introduced two test conditions that will be executed on every node except empty ones . Much better to have Empty do nothing.

@erikrozendaal
erikrozendaal Dec 29, 2011

Strangely enough this makes a huge different in foreach. Looping over a random TreeMap[Int, Int] size 10,000:

  • without guard conditions: ~3150 times per second
  • with guard conditions: ~7100 times per second

For a TreeSet[Int] of the same size:

  • without guard conditions: ~3000 times per second
  • with guard conditions: ~6000 times per second

(TreeSet is slower than TreeMap, I suspect due to unboxing of Ints when invoking the supplied foreach body)

See https://gist.github.com/1507127 for the basic benchmark used (there must be an easier way to do this though).

This is why I feel using plain nulls and less polymorpishm will give another speed boost. For example, the TreeSet "head" method is still about 3 times slower than the java.util.Map "first", probably due to the need to a needed method call in RedBlack.NonEmpty.smallest.

@viktorklang
viktorklang Dec 29, 2011

nullchecks are gigacheap

@dcsobral
dcsobral Dec 29, 2011

That's bizarre. Are you sure this was the only thing changed on the benchmark? On the other hand, if are keeping the method this way, you might as well move the whole foreach to Tree, put a self condition for the empty tree, and make it final. Experiments with Range have shown this (making the method final in the class it is called) gives the best performance possible.

@erikrozendaal
erikrozendaal Dec 29, 2011

I think pointer comparisons are just a lot faster than polymorphic calls. For a tree of size n there will be n+1 pointers to the empty node, so we save on half the calls to foreach. It may also help that the call sites to left.foreach() and right.foreach() are now monomorphic, maximizing the efficiency of the inline method call cache.

Moving the code to Tree and making it final is a bit harder though. Tree doesn't have access to key/value/left/right, so would need to do downcasting. Again, using null to represent the empty node and moving key/value/left/right to Tree would be optimal but is quite a complicated undertaking...

@erikrozendaal

I've updated the RedBlack implementation to use null for representing the empty tree. Also the iterators were further optimized. You can find the implementation at https://github.com/erikrozendaal/scala/blob/redblack-with-null/src/library/scala/collection/immutable/RedBlack.scala.

Below you can see a performance comparison between the tree from this pull request (up to commit erikrozendaal@c51bdea), the null based redblack tree without inlining the accessors to the Tree fields, null based with inlining, java.util.TreeMap and scala.collection.immutable.HashMap. All measurements are based on a map with 10,000 random integer keys/value pairs. Run with Java 1.6.0_29 with -d64 -server. Numbers are operations per second.

pull/82with nulls, use getterswith nulls, inlined gettersjava.util.TreeMapimmutable.HashMap
lookup15,331,92125,107,20825,288,54224,927,77246,726,788
add1,876,6832,768,5165,216,6536,415,887
add & remove630,9041,261,4972,225,2053,193,6982,933,961
head30,097,99941,659,02969,986,84266,009,21546,456,097
last25,671,17336,402,42163,042,95767,400,887
iterator (create)3,850,42326,174,51627,073,71158,404,392180,170,441
iterator (first)3,741,93425,085,60525,207,52142,033,23146,981,884
iterator (all)3,59411,83611,9259,3294,652
foreach8,09915,37515,2587,058
drop(5)269,805669,568732,468
take(5)1,529,7283,544,9436,643,459
splitAt(size / 2)204,768451,053567,126
range(-1e8, 1e8)385,453712,859901,27453,214,983
range(-1e8, 1e8),headOption379,835700,441885,70812,531,721

As you can see the null based implementation is 2-4x faster than the one with the Empty singleton node. When inlining the accessors it is also very close to performance of the java version. The iteration optimization make it even faster than the Java TreeMap or the immutable.HashMap. The main area where the java.util.TreeMap still wins is with the range related operations. In the Java version subMap creates a view (O(1)) while the Scala version constructs a new tree (O(log n)).

Questions:

  • Is this new code acceptable?
  • If yes, shall I update this pull request or create a new one after this one is merged?
@paulp
paulp commented Jan 5, 2012

It looks pretty good to me but I have to count on the others in this thread to help me spot issues as I have no time for the fine toothed comb treatment. I'm sorry you had to resort to "@(inline @getter)" -- if you can put together some evidence tht direct field access is significantly faster, maybe we can convince martin to make it less difficult to achieve.

@dcsobral
dcsobral commented Jan 6, 2012

On Thu, Jan 5, 2012 at 18:42, Erik Rozendaal
reply@reply.github.com
wrote:

I've updated the RedBlack implementation to use null for representing the empty tree. Also the iterators were further optimized. You can find the implementation at https://github.com/erikrozendaal/scala/blob/redblack-with-null/src/library/scala/collection/immutable/RedBlack.scala.

Below you can see a performance comparison between the tree from this pull request (up to commit erikrozendaal@c51bdea), the null based redblack tree without inlining the accessors to the Tree fields, null based with inlining, java.util.TreeMap and scala.collection.immutable.HashMap. All measurements are based on a map with 10,000 random integer keys/value pairs. Run with Java 1.6.0_29 with -d64 -server. Numbers are operations per second.

I'm wondering if you shouldn't leave the current public implementation
of rb tree alone, and provide yours as a new private implementation,
plus the changes to TreeSet and TreeMap. The public implementation can
be deprecated, and we then look at the migration impacts of not having
it as ancestor to TreeSet/TreeMap.

Daniel C. Sobral

I travel to the future all the time.

@erikrozendaal

I'm wondering if you shouldn't leave the current public implementation
of rb tree alone, and provide yours as a new private implementation,
plus the changes to TreeSet and TreeMap. The public implementation can
be deprecated, and we then look at the migration impacts of not having
it as ancestor to TreeSet/TreeMap.

Yes, I thought about that too. Shouldn't be a problem and it may help avoid some level of source/binary compatibility. I wonder how many people really use RedBlack directly though. It has "implementation detail" written all over it (few docs, API not consistent with standard collections, etc). But maybe it's better to be safe than sorry :)

@paulp
paulp commented Jan 6, 2012

I think that's a great idea. Yes, we don't really have the luxury of assuming something isn't much used, we have to suffer the same restrictions on everything. A new implementation makes my life much simpler.

@ijuma
ijuma commented Jan 6, 2012

I had a look at the benchmark and it seems to me that operations are being measured without using the result. Is this true or did I miss something? If that is true, then benchmark results are potentially inaccurate.

@ijuma
ijuma commented Jan 6, 2012

To give a concrete example:

bench("head", tree, count) { _.head }

def bench(name: String, tree: TreeSet[Int], count: Int)(block: TreeSet[Int] => Unit): Unit = {
  val s1 = time {
    for (_ <- 1 to count) {
      block(tree)
    }
  }
  println("%-25s: size: %7d: %7d iterations in %6.3f seconds (%12.2f per second)".format(name, tree.size, count, s1, count / s1))
}
@erikrozendaal

(Micro)benchmarking hotspot is indeed very hard, especially to get consistent results, even from run to run :(

I originally started with an implementation of bench that returned the last result. Later I switched to unit to reduce benchmark overhead. This didn't seem to change relative speeds between the various different benchmarks, so I'm pretty sure hotspot isn't totally optimizing away the code.

Using the following version of bench:

def bench[A, B, C](name: String, tree: TreeMap[A, B], count: Int)(block: TreeMap[A, B] => C): C = {
  var result: C = null.asInstanceOf[C]
  val s1 = time {
    for (_ <- 1 to count) {
      result = block(tree)
    }
  }
  println("%-25s: size: %7d: %7d iterations in %6.3f seconds (%12.2f per second)".format(name, tree.size, count, s1, count / s1))
  result
}

Changes the head benchmarks to 31.5M / 44M for getters vs inlined getters respectively. So although the numbers change I don't think it affects the conclusion.

Finally, it is hard to tell the affect of micro-benchmark results on real world programs...

erikrozendaal added some commits Jan 6, 2012
@erikrozendaal erikrozendaal Deprecate TreeMap.isSmaller and TreeSet.isSmaller.
These methods were used by the old RedBlack tree implementation,
but are no longer required and were not defined in any interface.
Use ordering or compare instead.
7e92b3c
@erikrozendaal erikrozendaal Restore old RedBlack class to maintain backwards compatibility.
The class is marked as deprecated and no longer used by the TreeMap/TreeSet
implementation but is restored in case it was used by anyone else (since
it was not marked as private to the Scala collection library).

Renamed RedBlack.{Tree,RedTree,BlackTree} to Node, RedNode, and BlackNode
to work around name clash with RedBlack class.
f656142
@erikrozendaal

I restored the old implementation of RedBlack class in the commit erikrozendaal@f656142 and marked the class as deprecated.

This required renaming the {Red,Black}Tree classes in the RedBlack object to avoid naming conflicts. I renamed them to {Red,Black}Node. Another option is to rename the RedBlack object to something else instead (so it is no longer a companion of the deprecated RedBlack class).

Let me know what you think and I'll adjust it and add it to this pull request.

@dcsobral

I think it would be preferable to avoid the existing public object altogether, so that it can be removed when the deprecation expires.

The two RedBlack implementations are totally independent (one is a class, the other an object) that happen to share the same name. But your comment convincend me that the updated implementation should have a new name. So I'll rename it to RedBlackTree and put it in its own file.

@erikrozendaal erikrozendaal Renamed object RedBlack to RedBlackTree.
This more clearly separates the new implementation from the now
deprecated abstract class RedBlack and avoids naming conflicts
for the member classes.
288874d
@erikrozendaal

I added the null based implementation to this pull request. The old RedBlack class has also been restored and the new implementation is named RedBlackTree. I wonder if it is worth the effort to rewrite the history so that all binary compatible changes are at the start and the new RedBlackTree based implementation is added later. This would make integration with the next 2.9.x release easier. I could also just make a separate pull request for 2.9.x that only contains the binary compatible changes (mainly improvements to head/last/iterator/toStream).

@ijuma
ijuma commented Jan 7, 2012

The fact that micro-benchmarks on HotSpot are hard is more reason not to break the rules. There is enough uncertainty as it is without adding more. Also, if you want to reduce overhead, do not use Range (use a while loop). At least until the optimization efforts on that are finished.

@erikrozendaal erikrozendaal Tests for takeWhile/dropWhile/span.
Also simplified implementation of span to just use splitAt.
e61075c
@erikrozendaal

Good point! I'll rerun the benchmarks using the following benchmark method:

  def bench[A, B](name: String, tree: TreeMap[A, B], count: Int)(block: TreeMap[A, B] => Int => Int): Unit = {
    print("%-25s: ".format(name))
    val f = block(tree)
    var result = 0
    val elapsed = time {
      var i = 0
      while (i < count) {
        result += f(i)
        i += 1
      }
    }
    println("size: %7d: %7d iterations in %6.3f seconds (%12.2f per second): result = %s".format(tree.size, count, elapsed, count / elapsed, result))
  }

    val benches = Map[String, (TreeMap[java.lang.Integer, java.lang.Integer] => Int => Int)](
      "lookup" -> { tree => n => if (tree.contains(rnd(n))) 1 else 0 },
      "add" -> { tree => n => tree.updated(rnd(n), n).size },
      "add-remove" -> { tree => n => (tree.updated(rnd(n), n) - rnd(n)).size },
      "head" -> { tree => n => tree.head._1 },
      "last" -> { tree => n => tree.last._1 },
      "tail" -> { tree => n => tree.tail.size },
      "init" -> { tree => n => tree.init.size },
      "iterator" -> { tree => n => if (tree.iterator.hasNext) 1 else 0 },
      "iterator.next" -> { tree => n => tree.iterator.next._1 },
      "iterator.size" -> { tree => n => tree.iterator.size },
      // etc...
  )

I'll also run one benchmark at a time (in a single JVM), this seems to give best/most consistent results.

@dcsobral
dcsobral commented Jan 7, 2012

You should use caliper to benchmark. Look at the fur foreach benchmark
project on my github as a basis. It's easy to compare multiple compiler
versions with it too.
Em 07/01/2012 16:50, "Erik Rozendaal" <
reply@reply.github.com>
escreveu:

Good point! I'll rerun the benchmarks using the following benchmark method:

 def bench[A, B](name: String, tree: TreeMap[A, B], count: Int)(block:
TreeMap[A, B] => Int => Int): Unit = {
   print("%-25s: ".format(name))
   val f = block(tree)
   var result = 0
   val elapsed = time {
     var i = 0
     while (i < count) {
       result += f(i)
       i += 1
     }
   }
   println("size: %7d: %7d iterations in %6.3f seconds (%12.2f per
second): result = %s".format(tree.size, count, elapsed, count / elapsed,
result))
 }

   val benches = Map[String, (TreeMap[java.lang.Integer,
java.lang.Integer] => Int => Int)](
     "lookup" -> { tree => n => if (tree.contains(rnd(n))) 1 else 0 },
     "add" -> { tree => n => tree.updated(rnd(n), n).size },
     "add-remove" -> { tree => n => (tree.updated(rnd(n), n) -
rnd(n)).size },
     "head" -> { tree => n => tree.head._1 },
     "last" -> { tree => n => tree.last._1 },
     "tail" -> { tree => n => tree.tail.size },
     "init" -> { tree => n => tree.init.size },
     "iterator" -> { tree => n => if (tree.iterator.hasNext) 1 else 0 },
     "iterator.next" -> { tree => n => tree.iterator.next._1 },
     "iterator.size" -> { tree => n => tree.iterator.size },
     // etc...
 )

I'll also run one benchmark at a time (in a single JVM), this seems to
give best/most consistent results.


Reply to this email directly or view it on GitHub:
#82 (comment)

@erikrozendaal

Here are the updated benchmark results. Complete results can be found at https://gist.github.com/1576263 (except for the java.util.TreeMap output, which accidentally overwrote). Numbers changed, sometimes radically, but I think the main conclusions are still the same.

#82with nulls, use getterswith nulls, inlined gettersjava.util.TreeMapimmutable.HashMap
lookup3,801,7797,362,1957,397,6777,572,08524,217,850
add1,192,2011,794,5932,427,5494,528,338
add & remove392,738867,4021,016,9173,528,6672,465,620
head28,917,10340,490,92861,891,42757,270,76041,516,223
last25,028,03632,569,79352,351,47656,938,347
iterator (create)4,667,90327,797,58427,981,03472,102,692366,207,914
iterator (consume first)4,567,21124,348,63424,282,58037,049,43242,535,377
foreach7,22812,54112,6949,231
iterator (consume all)3,58310,0529,8959,1476,202
drop(5)256,134632,225696,058
take(5)2,044,9544,463,0867,529,279
splitAt(size / 2)188,260367,508484,105
range(-1e8, 1e8).head382,497693,286880,98112,473,028
@erikrozendaal

Here are the results for OpenJDK build 1.7.0-u4-b05-20120106. It's nice to see the pattern match code got a decent speed boost (good news for more idiomatic Scala code with a lot of pattern matching). Also inlining of field access has less effect now. Still, the null based code is still the faster version of RedBlack.

PS Don't forget that the iterator based comparison is not fair between "#82" and the null based versions because additional algorithmic improvements were made in the meantime.
PPS All these numbers are running in 64-bit mode with compressed OOPS enabled, just like the previous 1.6.0_29 based results.

#82 (pattern matching, singleton empty)with nulls, use getterswith nulls, inlined gettersjava.util.TreeMapimmutable.HashMap
lookup4,507,1307,299,7007,240,0647,181,13829,847,773
add1,746,6362,534,2942,553,9124,696,720
add & remove756,1581,072,5701,060,5223,496,4552,608,751
head25,772,81543,935,45358,751,12259,795,62343,097,982
last22,547,78337,729,65352,286,15859,816,398
iterator (create)5,353,56428,918,38628,923,35473,329,891375,772,141
iterator (consume first)5,572,96423,698,29223,899,98535,495,58642,454,982
foreach8,05412,52312,4119,541
iterator (consume all)5,27411,10513,52110,5748,997
drop(5)394,074749,614698,918
take(5)3,680,5817,920,0217,778,135
splitAt(size / 2)351,272500,936527,641
range(-1e8, 1e8).head645,592945,271958,67525,801,491
@pavelpavlov

Erik,

Please take a look at 'to' method implementation.
From the 4 subrange methods defined in scala.collection.generic.Sorted three (from, range, until) are implemented as single call to rangeImpl unlike 'to' which require two calls to 'rangeImpl' + creating and twice advancing subtree iterator in the worst case.

I wonder if it's worth to reimplement this method in TreeMap/TreeSet for better performance?

Also, it would be interesting to benchmark these four methods, especially "to(5)" vs. "until(6)".
I believe fast subrange creation is one of important scenarios for use of TreeSet/TreeMap.

@erikrozendaal erikrozendaal Optimized implementation of TreeMap/TreeSet#to method.
Performance of `to` and `until` is now the same.
00b5cb8
@erikrozendaal

Hi Pavel,

I benchmarked the range operations and improved the implementation a bit such that to now has the same performance as until.

old tooptimized tojava.util.TreeMap
range(-1e8, 1e8).head984,096938,33825,801,491
from(1).head1,339,8411,214,76123,861,585
to(0).last551,2141,132,46728,157,299
until(1).last1,148,8341,137,26529,828,129

As you can see the java.util.TreeMap version is still much faster due to the use of views instead of building new trees. So to really get competitive with the Java version we need to support views (or ranged iterators, or something) and probably also implement methods like lower/floor/ceiling/higher, like java.util.NavigableMap.

@pavelpavlov

Hi Erik,

It's very cool that you could achieve such impressive performance improvements for your scenario.
However, I'm thinking about another performance-sensitive scenario: when we have zillion short-lived maps of moderate size each (say from 1 to 100 elements) instead of your scenario of having one map with zillion elements.

In that case another overheads come to the fore.
For example, looking into your last commit, you create 4 temporary objects on each call to range: two Options and two functions. Replacing option test here with null drops 2 of them and also gives faster calls to after and before in rng.

Also, you check ***Inclusive invariants each time when after or before is called. It's better to create specialized closure for each case in range.

In general, it would be great to benchmark the scenario I mentionned above and get the real numbers.

Regards,
Pavel

@erikrozendaal erikrozendaal Custom coded version of range/from/to/until.
This avoids unnecessary allocation of Option and Function objects,
mostly helping performance of small trees.
7824dbd
@paulp
paulp commented Jan 23, 2012

FYI since everyone is doing such a fine job shaking the tree, I figure I'll let that continue a while longer. (In other words, I haven't looked seriously at this patch and intend to let it mature as far as it can before I do.)

@erikrozendaal

Pavel,

I've some more improvements to range/from/to/until and take/drop/slice, basically by specializing RedBlackTree.rng for each of these cases. I'll try to get the benchmark numbers and patches posted here tonight CET.

erikrozendaal added some commits Jan 22, 2012
@erikrozendaal erikrozendaal Custom implementations of drop/take/slice.
This mainly helps performance when comparing keys is expensive.
78374f3
@erikrozendaal erikrozendaal Removed TODOs. 51667dc
@erikrozendaal

Here are the benchmark numbers related to the last few commits. I had to change the benchmarks somewhat to get decent numbers with small trees. So instead of getting the head or last element I just get the size of the resulting tree. For some benchmarks the random seed also had to be changed. All noted in the table below.

masterdefault to (f26f610)optimized to (00b5cb8)specialized from/to/until/range (7824dbd)specialized drop/take/slice (78374f3)
Size 10000, Seed 1234
range(-1e8, 1e8).head984,096938,3381,274,9071,274,907
from(1).head1,339,8411,214,7611,358,4721,358,472
to(0).last551,2141,132,4671,208,3181,208,318
until(1).last1,148,8341,137,2651,246,5511,246,551
Size 10000, Seed 1234
drop(5).size762,210761,143783,775825,173
take(5).size8,613,2326,491,0559,433,66914,746,586
splitAt(size / 2).size578,984504,674616,517654,141
slice(3, size - 3).size373,338348,044397,642414,725
Size 10, Seed 1234
range(-1e9, 1e9).size8,213,66111,389,1618,291,25014,930,02815,113,249
from(1).size8,630,65713,060,92711,276,41514,540,51214,371,185
to(0).size1,052,9893,840,60011,524,85614,487,38714,262,957
until(1).size7,775,26111,946,96411,365,58614,217,50214,295,698
Size 10, Seed 1235
drop(5).size281,20211,900,66510,404,72012,763,17614,397,612
take(5).size347,41611,831,71810,036,92715,386,88016,705,482
splitAt(size / 2).size587,8405,683,2904,796,2277,146,4728,209,405
slice(3, size - 3).size320,6066,529,8635,025,3849,531,22511,963,432

I think I'm pretty much done with performance optimizations. I think it is now more interesting to discuss how we can get this patch into 2.10 and also see if there are binary compatible parts that can be put into 2.9. Maybe at least deprecate RedBlack starting with 2.9.2?

@dcsobral

A migration warning on TreeSet/TreeMap is necessary as well. I don't think there's any need to rush deprecation to 2.9.2 -- it can be deprecated on 2.10.

@erikrozendaal

So the migration warning should go into 2.9.x branch? Or 2.10? Warning about RedBlack inheritance, isSmaller, etc?

@dcsobral

Actually, I've changed my mind. I was thinking about warning about the fact that TreeSet no longer extends RedBlack, but any dependency on it will result in a compile error, so there's no need for it.

@erikrozendaal

So, anything else that needs to be done before merging this into 2.10?

@paulp
paulp commented Feb 14, 2012

I will interpret the petering out of discussion as implied endorsement by the involved parties if someone doesn't say otherwise. (It's still possible I personally will discover objections, but let's hope not.)

@paulp paulp merged commit 51667dc into scala:master Feb 16, 2012
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment