Skip to content

RFE: Replace immutable.Vector with a faster implementation (src included) #3724

@scabug

Description

@scabug

The data structure used in collection.immutable.Vector is quite fast, but it's possible to do even better. I created a port of Clojure's PersistentVector?, reimplementing it from the ground up within the Scala 2.8 collections framework. The underlying data structure is a bitmapped dense trie, rather than the skip-list-like structure used by the Vector in the standard library. Because of the implementation as a dense trie, the locality of reference for Clojure's vector is substantially better than the Vector in the Scala standard library. It also lends itself nicely to a manual loop unrolling optimization which allows HotSpot? to inline almost all of the calls into O(6) array accesses.

I've done some extensive benchmarking, the results of which I've included below. It looks like my port of Clojure's PersistentVector? is around 50% faster than scala.collection.immutable.Vector on both random and sequential reads (index-based reads, not traversal). Performance is almost equivalent for writes. The port of Clojure's PersistentVector? uses almost 50% less memory on writes, and exactly the same memory for reads (i.e. none). The only operation which the stdlib Vector seems to do better than the port of Clojure's is the reverse operation. The raw results are included at the bottom of this ticket.

As I mentioned, this is a complete rewrite of Clojure's PersistentVector?. It retains the BSD license and source attribution just because I wanted to play it safe, but the source is now so far removed from the original that we can probably discard that (though I'd like to hear from Rich before we just delete the header).

It is worth noting that the stdlib Vector seems to support a fast prepend operation, whereas the dense trie data structure requires O(n) for this operation (append is still O(1)). In every other respect, the port of Clojure's PersistentVector? is either dramatically superior, or equivalent to the implementation in the stdlib. On top of it all, my port includes an update(Int, A) method, while this is glaringly absent from the stdlib API.

Is there any reason why we can't replace the stdlib Vector implementation with this one? APIs (such as updated, :+ and +:) could be added for backwards compatibility, but it seems to me that the performance characteristics of this port are so far beyond those of the stdlib that it really doesn't make any technical sense to stick with the current implementation.

Source and specs are attached.

Hardware: 2.66 Ghz Dual Core i7, 8 GB RAM, SATA2 SSD
OS:       MacOS X 10.6.4
JVM:      Java HotSpot(TM) 64-Bit Server VM (build 16.3-b01-279, mixed mode)
JVM Args: -Xmx2048m -Xms512m -server

Setup:    All tests run 12 times with the highest and lowest results dropped.
          System.gc() was called after each test run.  Runtime#freeMemory()
          was used to measure memory usage and System.currentTimeMillis() was
          used to measure temporal performance.

My port of Clojure's Vector is represented by 'Vector', while the 
scala.collection.immutable.Vector class is represented by 'DefVector'.

================================================================================
  Fill 100000 Sequential Indexes
--------------------------------------------------------------------------------
            Vector Time: 11.174800ms
       ArrayBuffer Time: 1.823500ms
          Vector Memory: 23856.264648 KB
     ArrayBuffer Memory: 6266.987305 KB

        Time Difference: 9.351300ms
  Percentage Difference: 512.821497%

      Memory Difference: 17589.277344 KB
--------------------------------------------------------------------------------
            Vector Time: 11.174800ms
         DefVector Time: 11.018000ms
          Vector Memory: 23856.264648 KB
       DefVector Memory: 41946.000000 KB

        Time Difference: 0.156800ms
  Percentage Difference: 1.423126%

      Memory Difference: -18089.735352 KB
--------------------------------------------------------------------------------
            Vector Time: 11.174800ms
            IntMap Time: 9.771800ms
          Vector Memory: 23856.264648 KB
          IntMap Memory: 39322.117188 KB

        Time Difference: 1.403000ms
  Percentage Difference: 14.357641%

      Memory Difference: -15465.852539 KB
--------------------------------------------------------------------------------
            Vector Time: 11.174800ms
       Map[Int, _] Time: 49.916800ms
          Vector Memory: 23856.264648 KB
     Map[Int, _] Memory: 104858.377930 KB

        Time Difference: -38.742000ms
  Percentage Difference: -77.613148%

      Memory Difference: -81002.113281 KB
================================================================================

================================================================================
  Read 100000 Sequential Indexes
--------------------------------------------------------------------------------
            Vector Time: 0.939000ms
       ArrayBuffer Time: 0.186600ms
          Vector Memory: 0.031250 KB
     ArrayBuffer Memory: 0.031250 KB

        Time Difference: 0.752400ms
  Percentage Difference: 403.215434%

      Memory Difference: 0.000000 KB
--------------------------------------------------------------------------------
            Vector Time: 0.939000ms
         DefVector Time: 1.932000ms
          Vector Memory: 0.031250 KB
       DefVector Memory: 0.031250 KB

        Time Difference: -0.993000ms
  Percentage Difference: -51.397516%

      Memory Difference: 0.000000 KB
--------------------------------------------------------------------------------
            Vector Time: 0.939000ms
            IntMap Time: 7.413300ms
          Vector Memory: 0.031250 KB
          IntMap Memory: 0.031250 KB

        Time Difference: -6.474300ms
  Percentage Difference: -87.333576%

      Memory Difference: 0.000000 KB
--------------------------------------------------------------------------------
            Vector Time: 0.939000ms
       Map[Int, _] Time: 30.118900ms
          Vector Memory: 0.031250 KB
     Map[Int, _] Memory: 5242.539063 KB

        Time Difference: -29.179900ms
  Percentage Difference: -96.882356%

      Memory Difference: -5242.507813 KB
================================================================================

================================================================================
  Read 100000 Random Indexes
--------------------------------------------------------------------------------
            Vector Time: 10.785900ms
       ArrayBuffer Time: 0.226100ms
          Vector Memory: 0.031250 KB
     ArrayBuffer Memory: 0.031250 KB

        Time Difference: 10.559800ms
  Percentage Difference: 4670.411322%

      Memory Difference: 0.000000 KB
--------------------------------------------------------------------------------
            Vector Time: 10.785900ms
         DefVector Time: 18.195500ms
          Vector Memory: 0.031250 KB
       DefVector Memory: 0.031250 KB

        Time Difference: -7.409600ms
  Percentage Difference: -40.722157%

      Memory Difference: 0.000000 KB
--------------------------------------------------------------------------------
            Vector Time: 10.785900ms
            IntMap Time: 39.081400ms
          Vector Memory: 0.031250 KB
          IntMap Memory: 0.031250 KB

        Time Difference: -28.295500ms
  Percentage Difference: -72.401449%

      Memory Difference: 0.000000 KB
--------------------------------------------------------------------------------
            Vector Time: 10.785900ms
       Map[Int, _] Time: 39.914500ms
          Vector Memory: 0.031250 KB
     Map[Int, _] Memory: 7832.765625 KB

        Time Difference: -29.128600ms
  Percentage Difference: -72.977489%

      Memory Difference: -7832.734375 KB
================================================================================

================================================================================
  Reverse of Length 100000
--------------------------------------------------------------------------------
            Vector Time: 16.455500ms
       ArrayBuffer Time: 1.142500ms
          Vector Memory: 23457.923828 KB
     ArrayBuffer Memory: 7832.945313 KB

        Time Difference: 15.313000ms
  Percentage Difference: 1340.306346%

      Memory Difference: 15624.978516 KB
--------------------------------------------------------------------------------
            Vector Time: 16.455500ms
         DefVector Time: 4.231400ms
          Vector Memory: 23457.923828 KB
       DefVector Memory: 7833.148438 KB

        Time Difference: 12.224100ms
  Percentage Difference: 288.890202%

      Memory Difference: 15624.775391 KB
================================================================================

================================================================================
  Compute Length (100000)
--------------------------------------------------------------------------------
            Vector Time: 0.006600ms
       ArrayBuffer Time: 0.017300ms
          Vector Memory: 0.031250 KB
     ArrayBuffer Memory: 0.031250 KB

        Time Difference: -0.010700ms
  Percentage Difference: -61.849711%

      Memory Difference: 0.000000 KB
--------------------------------------------------------------------------------
            Vector Time: 0.006600ms
         DefVector Time: 0.016400ms
          Vector Memory: 0.031250 KB
       DefVector Memory: 0.031250 KB

        Time Difference: -0.009800ms
  Percentage Difference: -59.756098%

      Memory Difference: 0.000000 KB
--------------------------------------------------------------------------------
            Vector Time: 0.006600ms
            IntMap Time: 2.059400ms
          Vector Memory: 0.031250 KB
          IntMap Memory: 0.031250 KB

        Time Difference: -2.052800ms
  Percentage Difference: -99.679518%

      Memory Difference: 0.000000 KB
--------------------------------------------------------------------------------
            Vector Time: 0.006600ms
       Map[Int, _] Time: 0.013500ms
          Vector Memory: 0.031250 KB
     Map[Int, _] Memory: 0.031250 KB

        Time Difference: -0.006900ms
  Percentage Difference: -51.111111%

      Memory Difference: 0.000000 KB
================================================================================

Metadata

Metadata

Assignees

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions