Some high-performance immutable data structures #214
Around a month ago I said some stuff about having unfinished immutable data structures that I thought might be useful. It so happened I got a bit busy and hadn't had time to finish them properly, until around a week ago.
My goals were very performance oriented and I ran into a few dead-ends in that regard, so I decided to implement them in C#. I ran some benchmarks and I'm quite happy with the results.
Note that I'm not sure how much the change of language affected the results. The C# compiler seems to optimize more, but more importantly, what you write in C# translates very closely (with some obvious exceptions) to IL code. What you write in F# doesn't, and it is thus very difficult to optimize when optimization boils down to things like minimizing method calls.
I'm not sure whether you'll want to use them because both the language and the interface are quite different from other the other collections you have (though an F# assembly includes some inline functions and operators). However, according to my benchmarks the Vector implementation performs several times better than FSharpx's Vector at random access (although it is implemented using a similar data structure) and the deque (called Sequence; it's not exactly a deque, to be honest. ) supports many fast operations, including get and set by index, subsequences and splits, concat, insert in the middle, and some more things. This is in addition to practically real-time access and add/drop from either end. It is implemented as 2-3 finger tree that is highly optimized for .NET
There is also a HashMap, implemented as an array mapped trie (similar to the Vector), which uses equality semantics instead of comparison semantics, and early benchmarks showed it performed significantly better than FSharp's own map for certain inputs, such as strings.
There is still some testing to be done and some modifications to make. They're pretty much completed though.
Link to the library: https://github.com/GregRos/Solid
Here are some benchmarks, in ms. tests were performed on 100k length collections added from the 'Last' end. https://docs.google.com/spreadsheet/ccc?key=0Aja21aci5kT6dGlxeFRkWG12N0pBVEZHZGppQnBZOXc#gid=0
The benchmarks were performed with all optimizations enabled, no debugging, and without the DEBUG compilation symbol. The latter is important because there are some conditional consistency checks that have a serious impact on performance that will get compiled.
The text was updated successfully, but these errors were encountered:
On a different note, does anyone know what I should call the
I'm also not sure whether the data structures should implement any particular interfaces, and if so, which ones.
@GregRos I've looked at Solid.Sequence<'T> (the C# version, which I accessed from F#) and more than ever I think "Segment", as in "line segment", is the best name for it. It looks like you can perform all the line segment operations of the integer number line, which is very exciting. I also think it would be a great addition to the linear functional data structures in the FSharpx.Collections namespace.
I think you have a better naming standard for your functions (and you should probably keep it for C#), but I feel even more strongly an F# library should follow the naming standard set by Microsoft.FSharp.Collections.List for linear functional data structures.
So my first suggestion is renaming almost all Segment's functions (except the ones that are completely new) to follow the other FSharpx.Collections linear functional data structures.
I also performed the following benchmarks against FSharpx.Collections.Deque and Vector. See https://github.com/jackfoxy/DS_Benchmark and http://jackfoxy.com/benchmarking-f-data-structures-introduction for more on the framework and methodology I used. I'd be happy to make the complete results spreadsheet available to you.
All in all Segment performs very well and since it is so versatile will be a great addition to FSharpx.Collections. I only benchmarked functions that are comparable to functions in Deque and Vector, but I'm looking forward to learning what the time complexity of the other functions are.
"Scale" refers to the number of elements in the structure benchmarked, or the size of the resulting structure. Every benchmark performed at scales 100 - 100,000 by power of 10.
initial data is always an F# int32 array
the following benchmarks compared Deque/Segment
Both performed nearly identically until scale 100,000, when Segment took over 400% as much time
same result as cons
same result as cons and conj, except the slowdown for Segment at scale 100,000 was only 300%
comparable times until scale 10,000 -- Segment took 300% as much time, and scale 100,000 -- Segment took 3,100% as much time
essentially the same performance at all scales; at scale 100,000 Segment took 133% as much time...hardly worth mentioning given the vagaries of benchmarking
Segment took 184% as much time at scale 10,000 and 360% as much time at scale 100,000
both structures have very fast O(1) reverse, even so, Segment smoked Deque on this benchmark: on a fast i7 -- 3 ticks vs. 250 ticks
The remaining benchmarks compare Segment against FSharpx.Collections.Vector
Vector beat Segment by almost 10X at the low range of scale to almost 20X at scale 100,000
Vector was only about twice as fast at scale 100 -- by scale 100,000 performance was nearly identical
I've today performed several commits that implement most of your suggestions.
There are two general reasons why I want to keep the instance interface C# inspired instead of F# inspired.
Incidentally, the main reason why certain members are
It is possible to rename the methods to better fit in with
Integration into FSharpx
I would like to see XList in FSharpx.Collections, but it should go into FSharpx.Collections.Experimental first, so it can go through a couple of rounds of breaking changes, if need be.
There's probably a good way to make xlist a full-fledged F# type (and capitalized to XList), I just don't know the right way:
It seems you are on the road to making XList a first-class F# citizen...care to reconsider function naming? :)
Your benchmarking is measuring simple function timings. That's useful. Many of my benchmarks incorporate the function in some simple iterative process that's a common pattern in functional programming. Of tail-recursion, tail-recursion with continuation passing, and non-tail-recursion, the only recursive wrapping process I benchmark with is tail recursion. The critical thing is like tests have like wrapping code and like code leading up to the benchmarked code so the GC state entering the benchmarks is consistent. Ultimately like benchmarks are still apples to apples comparisons.
This explains how you see Deque consistantly beating XList by 150-200%, while I am seeing better performance until the larger scales.
Yes, something went wrong with my xlist.Reverse() benchmark. It was a cut-n-paste bug. I created the xlist benchmark from the Deque benchmark, replacing "Deque" with "xlist". [Deque obj].rev is O(1) and doesn't take closed parens, but [xlist obj].reverse just returns the function and not the function's result. (I should have known something was wrong taking just 3 ticks, but I thought you had written some fast code!)
Between updates and correcting the reverse benchmark, some additional information
I didn't explain my benchmarking very well. Here are the kind of things I want to benchmark for data structures:
How long does it take to load sequential data of length X into the structure using ofSeq?
As to function naming, you can use the
@jackfoxy Those are actually the exact same these I've been measuring. I'm not measuring individual function calls, but the exact things you mentioned.
After playing around with the benchmarks I reached the conclusion that they were even more sensitive than I thought. In the end I got rid of all test infrastructure, including lambdas, wrappers, and extraneous method calls. I performed every benchmark many times, discarded the first few iterations, and summed up the remaining results. Then I iterated this process several more times, and averaged it out.
This is in addition to standard precautions such as garbage collection, etc.
I did all this because seemingly inconsequential things made the results unstable. With this sort of routine I believe I achieved stable results, at least where the testing environment is concerned.
My results were that in general,
This is due to
Changing the access pattern also changes the incidence of the worst case. In general, the least favorable sequence of operations is a long sequence of additions to one end. Other access patterns, such as adding items from different ends, generally provide more favorable results.
I'm not entirely sure how, given the circumstances, to empirically compare the performance of the two data structures. Especially when considering the worst-case performance of
As an example, a single call to
I'll look into this. The source code is in
...However, if I moved the implementation of the wrapper to the F# assembly I could indeed make use of the CompiledName attribute, as well as certain other F#-centric features. In fact, I don't believe there is anything related to the wrapper I could not implement in F#. I will see how much work this requires.
Benchmarking makes data analysis look like science...
I overlooked something in your benchmarks (the link at the top of this threaad). You are including Fsharpx.Collections.SkewBinaryRandomAccessList in your benchmarks. That structure does not exist in FSharpx.Collections. It does exist in FSharpx.Collections.Experimental (and in the obsolete FSharpx.DataSructures). That leads me to suspect you were comparing against Vector and Deque in Experimental as well. The Vector in FSharpx.Collections is slightly more performant than the one in Experimental, and the Deque in FSharpx.Collections is not even the same Deque as in Experimental.
By the way, the implementation of RandomAccessList now in Fsharpx.Collections is the Fsharpx.Collections.Vector inverted.
There's always going to be something less than ideal about any benchmark and many cycles of reasoning over the meaning of worst case scenarios, GC, etc. GC is a real issue with all data structures, so benchmarks that call GC during execution also make useful measurments.
And of course once you stick your neck out and publish a benchmark, the first things others see in it are the problems.
Two different approaches provides more information, which can only be good because ultimately benchmarking only serves to update our Bayesian priors.
I just committed https://github.com/jackfoxy/DS_Benchmark with the latest Solid benchmark code.
Unrelated, but this was throwing me for a while: there is a Vector in Microsoft.FSharp.Math within PowerPack and if you reference PowerPack in your project, even though you do not open it in your current file, the tooltips for that module will mix-up with the tooltips for Solid.Vector or FSharpx.Collections.Vector. Annoying.
I recenlty received "FlatList", from Don Syme. It's immutable, has a limited sub-set of the Vector functions, and some of it's functions are very fast. I put it in the benchmarks with the 2 vectors, where appropriate.
very close between Solid and FSharpx, with FSharpx gaining a small advantage from scale 10^4 - 10^6.
Solid and FSharpx are nearly identical until scale 10^5, where FSharpx becomes nearly 1:2 faster.
I even re-benchmarked FlatList with input data in a list (instead of an array), and results were nearly identical (it did take twice as long at 10^6, but that was still 43:1 faster than FSharpx using an array).
Close at smaller scales, but by 10^5 Solid is 1:3 faster than FSharpx. and at 10^6 is over 1:4.
Solid consistently outperforms FSharpx by about 5:8 accross all scales.
FSharpx consistantly beats Solid by 1:2 until 10^6, where it beats Solid 1:5.
FSharpx beat Solid consistantly by 2:3 across all scales.
Solid beats FSharpx at scale 10^2 by 2:3, and by scale 10^6 it is 2.5:11.1
There is definitely room for all sorts of human error on my part in any of these benchmarks. And I never verified Solid or FlatList actually function as expected.
Neither Solid or FSharpx is actually slow in any of these tests. FlatList performs so fast on certain tests because it implements a simple array internally.
Here are my recommendations:
Some of the results of the benchmarks don't exactly mirror mine. For example, this benchmark and a few more like it show that
The one feature
Another feature I intend to implement is efficient bulk loading that modifies arrays in-place and should perform a lot faster than iterating
Here is an explanation of how it's done: https://gist.github.com/GregRos/5271345
I recommend starting with the
Another thing I've been thinking about is the following paper: http://infoscience.epfl.ch/record/169879/files/RMTrees.pdf
However, these modifications are rather complicated and I am afraid the performance of the data structure will go down.
One type of bug F# does not protect against is cut-and-paste errors.
Here is the corrected benchmark for 10,000 random lookups (times in mill.)
flatlist 10^2 0.0
flatlist 10^3 0.0
flatlist 10^4 0.0
flatlist 10^5 0.0
flatlist 10^6 0.0
Whoops. I realized that while writing the explanation for the bulk loading I've already sorted out all the difficulties. I committed a version of
If you call
This is done by trying to find the collection's concrete type from a small list of alternatives (such as
If a type has a
As such, the performance depends on 1) if the type is convertible to one of the predefined types and 2) how well ToArray is implemented for that type. For example, if the input is convertible to
If we are forced to default,
These numbers are for comparison only. Performance scales upwards in general, meaning the speedup factor grows with the size of the input sequence and a little with the current size of the Vector.
It may be possible to improve performance slightly.
It hasn't been tested thoroughly yet, and there are likely some edge cases in which it may not work.
@GregRos I can also benchmark with strings, but from my past experience the relative comparison of structure A to structure B is usually pretty similar, so I just didn't think it was worth my while.
Your bulk load change to
flatlist 10^2 0.1
flatlist 10^3 0.1
flatlist 10^4 0.1
flatlist 10^5 0.2
flatlist 10^6 0.7
You should work on getting
Ugh. Rewriting the wrappers in F# and using the
I thought of another addition to
However, it will be a bit of time before I add this in, as most likely I will be inactive in the next days due to real life issues. The same goes for the interface changes.
Oh no, there is no deferred execution involved. All the work involved here is allocating and copying an array. It is no more difficult to copy all indices up to N than to copy the entire array.
That said, additional logic may need to be added to all the operations, but at no point will any operation actually be deferred.
Hey there. Sorry for the longish absence.
I'm going to add the F# wrappers today. I've also added a
One thing we haven't talked about is the
It has a few bugs. If you think it might be useful, I'll fix those bugs and release it. Otherwise, I will implement some performance changes and see where that gets me.
By the way, I haven't implemented the remove method for the vector I was talking about. However, even as it is now, the vector still has a
I have been thinking about a different revision that is more complicated and probably involves a redesign of the data structure. It does involve real deferred execution and hopefully it should make every operation as fast as a bulk operation. I'm not sure when I'll start working on this though.
On a completely different note, I scaled back the required framework version to 3.5 Client Profile to make the library more accessible and removed tons of unnecessary references. In addition, the reference to
It would be sweet if