HashMap performance issues #9

Open
2 of 3 tasks
forki opened this issue Dec 9, 2013 · 8 comments
Comments

@forki (Member) commented Dec 9, 2013

@buybackoff

Continuing the discussion from fslaborg/Deedle#52 (comment):

Recently I have read a lot about different persistent data structures and thought that Clojure's version of the hash array mapped trie is probably the best. Thanks to this implementation in F# I could now test it, and I have just benchmarked PersistentHashMap against FSharp.Core.Map.

Based on my draft/unchecked/approximate numbers, the current version of PHM is slower and consumes significantly more memory than the simple AVL tree. I would expect PHM to be faster given that its depth is much less than that of a binary tree, and memory usage should be lower given that there are fewer references from nodes. Your Vector implementation, based on a similar algorithm, is very fast, so the current results are somewhat unexpected.
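To put rough numbers on the depth argument (my own back-of-envelope sketch in Python for illustration, not part of the benchmark):

```python
import math

n = 1_000_000

# Expected search depth: a balanced binary tree does ~log2(n) node hops,
# while a 32-way HAMT does ~log32(n) hops (5 hash bits consumed per level).
binary_depth = math.log2(n)       # ~19.9 levels
hamt_depth = math.log(n, 32)      # ~4.0 levels

print(f"binary: {binary_depth:.1f} levels, HAMT: {hamt_depth:.1f} levels")
```

So at 1 Mn entries a HAMT touches roughly 4-5 nodes per lookup versus ~20 for a binary tree, which is why slower PHM numbers are surprising.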

The benchmark is just a simple iteration of 1 Mn inserts and lookups of <string, int64> and <int64, int64> (https://gist.github.com/buybackoff/ee57b2fe5919448b9b2b). The results from my machine are below. Interestingly, switching from string to int64 keys doubles AVL tree performance, but the numbers for PHM increase only slightly. That could indicate some constant overhead, not related to the data, in the current implementation of PHM.

PersistentHashMap

<string, int64>:

Inserts per sec: 151630
Lookups per sec: 1163829
Private set after 1M <int64.ToString(), int64> inserts: 324 Mb

<int64, int64>:

Inserts per sec: 173250
Lookups per sec: 1626016

MapTree (FSharp.Core Map internal implementation)

<string, int64>:

Inserts per sec: 245459
Lookups per sec: 1410437
Private set after 1M <int64.ToString(), int64> inserts: 105 Mb

<int64, int64>:

Inserts per sec: 515463
Lookups per sec: 3322259

For 10 Mn (int64, int64) inserts, PHM runs much longer than 10x the 1 Mn time (I killed the test rather than wait). Memory rises and falls in Task Manager - obviously the GC does a lot of work. For MapTree, performance drops by c.30% (near the expected log(1M)/log(10M) ratio).

P.S. As a separate note, I wonder why there are at least two places with different implementations of persistent collections: this repo (FSharpx) and ExtCore (https://github.com/jack-pappas/ExtCore/blob/master/ExtCore/Collections.HashMap.fs)? In the latter, a persistent map is already implemented using a Patricia trie vs the array-mapped trie here.

Some collections duplicate each other (e.g. IntMap); are they kept in sync?

@forki (Member Author) commented Dec 12, 2013

I ported the HashMap from Clojure, and honestly I don't know why it is so slow.
Maybe @jackfoxy has ideas.
Please try to benchmark the original Clojure implementation.

I think the port is functionally correct, but maybe different node types should be used on .NET.

@buybackoff

From the profiler:

Function Name | Exclusive Samples %
--- | ---:
Microsoft.FSharp.Collections.ArrayModule.Copy(!!0[]) | 32.41
JIT_TailCallReturnFromVSD@0 | 13.83
?JIT_New@@YIPAVObject@@PAUCORINFO_CLASS_STRUCT@@@Z | 10.48
Microsoft.FSharp.Core.LanguagePrimitives.HashCompare.GenericEqualityIntrinsic(!!0,!!0) | 5.13
FSharpx.Collections.ArrayNode.FSharpx-Collections-INode-assoc(int32,int32,object,object,class FSharpx.Collections.Box) | 4.92

50+% of the insert work is done in

interface INode with member this.assoc(shift, hashKey, key, value, addedLeaf) : INode =

For reads, the tail call is the greatest overhead.
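For illustration, here is a minimal Python sketch (my own simplification, not the FSharpx code; it ignores collision splitting and bitmap compression) of why `assoc` is dominated by array copying: a persistent insert must clone the child array of every node on the root-to-leaf path.

```python
def assoc(node, shift, hash_, key, value):
    """Persistent insert into a 32-way trie of list nodes.

    Returns a new root; untouched subtrees are shared, but every node
    on the path to the leaf gets its 32-slot array copied -- this is
    the ArrayModule.Copy hot spot in the profile above.
    """
    idx = (hash_ >> shift) & 0x1F        # consume 5 hash bits per level
    new_node = node.copy()               # the unavoidable path copy
    child = node[idx]
    if isinstance(child, list):          # interior node: recurse down
        new_node[idx] = assoc(child, shift + 5, hash_, key, value)
    else:                                # empty slot or existing leaf
        new_node[idx] = (key, value)     # (a real HAMT would split on collision)
    return new_node

root = [None] * 32
root2 = assoc(root, 0, 7, "a", 1)        # root is left untouched
```

Each insert therefore allocates one fresh array per level; over 1 Mn inserts that is millions of short-lived arrays, which matches the GC pressure observed earlier.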

@forki (Member Author) commented Dec 13, 2013

Can you fix it?

@buybackoff

Not sure there is an easy "fix": array copying is by design, and the recursive methods are on interfaces with two implementations that call each other, so I couldn't easily change this to a loop. I do not see any particular bug here.

I am not even sure it is slow in absolute terms. My expectations were based on Vector performance, and I expected this map to perform closer to Vector, at least on reads (and at least in the <int64, int64> case).

@jackfoxy jackfoxy changed the title Implement HashMap HashMap performance issues Jul 7, 2018
@jackfoxy (Contributor) commented Jul 7, 2018

Renaming to reflect that this issue has turned into a performance issue. No idea if this is still current. #110 should be used to shed light on this.

@buybackoff commented May 30, 2019

Same benchmark for PersistentHashMap as in dotnet/fsharp#5360

 Case                |    MOPS |  Elapsed |   GC0 |   GC1 |   GC2 |  Memory
----                 |--------:|---------:|------:|------:|------:|--------:
Get                  |    4.80 | 208 ms |  19.0 |   0.0 |   0.0 | 0.782 MB
Add                  |    0.46 | 2,179 ms |  26.0 |   9.0 |   3.0 | 223.584 MB

Asymptotically (big-O), the log32 depth is 5x smaller than the log2 depth of the AVL tree. The constant factor could be optimized to some extent (e.g. popcount via intrinsics, fewer virtual calls, etc.). But memory usage is still too big versus intuition.
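The popcount trick alluded to here is the core of a bitmap-indexed HAMT node: instead of 32 mostly empty slots, a node keeps a 32-bit bitmap plus a dense array of only the present children, and the popcount of the bitmap bits below a position gives the dense index. A small Python sketch of the idea (illustrative only; a hardware POPCNT intrinsic makes this a single instruction):

```python
def compressed_index(bitmap: int, pos: int) -> int:
    """Index into the dense child array for hash chunk `pos` (0..31)."""
    below = bitmap & ((1 << pos) - 1)    # keep only the bits set below `pos`
    return bin(below).count("1")         # popcount of those bits

# A node with children at positions 2, 5 and 7 stores only 3 slots:
bitmap = 0b1010_0100
assert compressed_index(bitmap, 2) == 0  # child for pos 2 lives in slot 0
assert compressed_index(bitmap, 5) == 1  # child for pos 5 lives in slot 1
assert compressed_index(bitmap, 7) == 2  # child for pos 7 lives in slot 2
```

With this layout, sparse nodes cost only as many slots as they have children, so full 32-slot nodes are the wasteful case rather than the norm.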

In the table above, memory is measured before a forced GC. If I force GC from inside the benchmark, the results are:

 Case                |    MOPS |  Elapsed |   GC0 |   GC1 |   GC2 |  Memory
----                 |--------:|---------:|------:|------:|------:|--------:
Get                  |    4.64 | 216 ms |  19.0 |   0.0 |   0.0 | 0.782 MB
Add                  |    0.41 | 2,458 ms |  26.0 |  10.0 |   5.0 | 156.055 MB

(only 5 runs, no warm-up; only the memory column is relevant)

Memory is still 3x more than the AVL tree.

What is the theoretical memory overhead of an ideal HAMT?
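Not an authoritative answer, but a rough floor can be estimated, assuming 64-bit CLR object sizes (a boxed int64 costs 24 bytes: 16-byte object header plus the 8-byte payload) and entries stored as two references in node object[] arrays. This is my own back-of-envelope, not a measurement:

```python
n = 1_000_000

boxed_int64 = 24        # assumed 64-bit CLR: 16-byte object header + 8-byte value
entry_refs = 2 * 8      # key and value references held in node arrays

per_entry = 2 * boxed_int64 + entry_refs   # 64 bytes, before node/array overhead
total_mb = n * per_entry / 2**20
print(f"~{total_mb:.0f} MB floor for 1M boxed <int64, int64> entries")
```

That suggests roughly 60 MB as a floor for 1M boxed <int64, int64> entries, so the measured 156 MB would imply around 2.5x structural overhead in the current nodes.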

@krauthaufen

Hey, I recently started implementing a persistent HashMap for F# and did some benchmarks (including the FSharpx PersistentHashMap) in my current sketch repo: https://github.com/krauthaufen/ImmutableHashCollections

I think the main reason the FSharpx implementation isn't that fast is boxing/unboxing (due to object arrays), and the memory issue might be caused by capturing the creating thread for each cell.
I will try to come up with a new implementation using the same techniques for comparison with our current implementation, but help would be greatly appreciated 😀

Also see discussion at fsprojects/FSharp.Data.Adaptive#17
