HashMap and HashSet performance #24

Open
rbuckton opened this issue Sep 6, 2021 · 15 comments

Comments

@rbuckton
Contributor

rbuckton commented Sep 6, 2021

The current implementations of HashMap and HashSet are orders of magnitude slower than native. I have been investigating performance improvements in hashing on Node.js using native bindings, but I still need to perform a thorough performance investigation.
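
A minimal sketch of the kind of comparison being described, for anyone wanting to reproduce the gap locally; the `@esfx/collections-hashmap` package name and its Map-like `set`/`get` API are assumptions based on the esfx repo layout, and the iteration counts are arbitrary:

```ts
import { HashMap } from "@esfx/collections-hashmap"; // assumed package/export

// Compare insert + lookup throughput against the native Map.
function bench(label: string, map: { set(k: number, v: number): unknown; get(k: number): unknown }) {
    const start = performance.now();
    for (let i = 0; i < 100_000; i++) map.set(i, i);
    let sum = 0;
    for (let i = 0; i < 100_000; i++) sum += map.get(i) as number;
    console.log(`${label}: ${(performance.now() - start).toFixed(1)}ms (checksum ${sum})`);
}

bench("native Map", new Map<number, number>());
bench("esfx HashMap", new HashMap<number, number>());
```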

@unicomp21

How do I get the absolute best performance when it comes to hash sets and hash maps? Would it make sense to use Rust/WASM with TypeScript bindings in order to get fast collections?

Or maybe I should be using DotNetJS?
https://github.com/Elringus/DotNetJS

Lol, or am I confused and this is simply a stale issue?

@unicomp21

@rbuckton ^

@rbuckton
Contributor Author

rbuckton commented Sep 23, 2022

This is partially complete as I now have native hashing on Node.js, but not in the browser. I can't currently rely on WASM since that would require an await somewhere before any hashing could be used.

@unicomp21

unicomp21 commented Sep 24, 2022

What's wrong with having an await? Lol, wouldn't the benefit be worth await-ing for? Maybe everything defaults to the slow path (i.e., no await), but we could have something like collection extensions, which require an await, in the browser?

@rbuckton
Contributor Author

An await would break CommonJS modules, though it could potentially work fine in an ES module at the top level. It's something I might consider for the web in the future, when I have a bit more time to work on it.
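
A sketch of the distinction: an ES module may await at the top level before any exports are consumed, while a CommonJS `require()` must return synchronously and has nowhere to put that await. The file names and the `hashString` export are illustrative only:

```ts
// hashers.mjs — top-level await is legal in an ES module, so the WASM
// hasher can be fully instantiated before any importer uses it:
const { instance } = await WebAssembly.instantiateStreaming(
    fetch(new URL("./xxhash64.wasm", import.meta.url)) // hypothetical binary
);
export const hashString = instance.exports.hashString; // hypothetical export

// hashers.cjs — no equivalent exists: require() is synchronous, so a
// CommonJS module could only export a promise, not a ready-to-use hasher.
```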

@unicomp21

For what it's worth, in frameworks like Babylon.js we are already accustomed to having to do the await thing when using deps like navmesh.

@unicomp21

unicomp21 commented Feb 11, 2023

I'm hoping this creates an argument for WebAssembly support :)

@unicomp21

unicomp21 commented Feb 11, 2023

Given you're smarter than me, I'd also be keenly interested in following your automation techniques for handling Emscripten/Rust, etc. in the build.

@rbuckton
Contributor Author

rbuckton commented Feb 11, 2023

There are already WASM versions of most of the hash functions as well. The native versions are still faster on Node, especially because I can leverage the built-in identity hash functions for objects in the V8 API.

https://github.com/esfx/esfx/tree/main/packages/equatable-native contains the native code package generator.

https://github.com/esfx/esfx/blob/main/packages/equatable/src/internal/hashers/xxhash64.wast contains the WASM sources for string hashing using XXHash64.

Also, it's possible to load WASM without using await if you embed the WASM binary as a Uint8Array directly in a JS file. I wouldn't recommend it for large binaries, but the hashing function is fairly small.
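
A minimal sketch of that technique; the bytes shown here are just the 8-byte header of a valid empty module, standing in for the full compiled hasher a build step would inline:

```ts
// WASM magic number ("\0asm") + version 1: a valid empty module, where a
// real build step would embed the compiled xxhash64 binary instead.
const wasmBytes = new Uint8Array([0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00]);

// Both of these constructors are synchronous, so no await is required:
const module = new WebAssembly.Module(wasmBytes);
const instance = new WebAssembly.Instance(module, /* imports */ {});

// instance.exports would then expose the embedded hash functions.
```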

@unicomp21

Does this mean murmur is slow, given it's not done in WASM?

https://github.com/esfx/esfx/blob/main/packages/equatable/src/internal/hashers/murmur3.ts

@rbuckton
Contributor Author

rbuckton commented Feb 11, 2023

I no longer use murmur; it is only present for the purpose of comparative benchmarks. For non-native, non-WASM hashing I now use either XXHash64 (if BigInt is available) or XXHash32. Both are faster and have fewer hash collisions than murmur3.
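
A sketch of the selection logic described here, with `xxhash64`/`xxhash32` as hypothetical stand-ins for the package's internal hashers:

```ts
// Hypothetical internal hashers (the real ones live under
// packages/equatable/src/internal/hashers in the esfx repo).
declare function xxhash64(data: Uint8Array, seed?: bigint): bigint;
declare function xxhash32(data: Uint8Array, seed?: number): number;

// Prefer the 64-bit hash when BigInt is supported, truncating the result
// to a 32-bit int; otherwise fall back to XXHash32.
const hashData: (data: Uint8Array) => number =
    typeof BigInt === "function"
        ? data => Number(xxhash64(data) & BigInt(0xffff_ffff)) | 0
        : data => xxhash32(data);
```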

@unicomp21

I'm a bit of a toddler when it comes to hashes; I guess this means XXHash64 is better than murmur3? For example, if I were to attempt implementing the Kafka protocol using esfx, XXHash64 would be the better choice? I "think" Kafka currently uses murmur3?

@rbuckton
Contributor Author

Based on the benchmarks at https://cyan4973.github.io/xxHash/, XXH32 outperforms murmur3 at more than double the speed for large data sets and is about 40% faster for small data sets, while XXH64 is 5x the speed of murmur3 for large data sets.

I misspoke about hash collisions though. All three seem comparable with respect to hash quality, collisions, and avalanche properties from what I can tell.

I can't speak to Kafka's use of murmur3, though.

@unicomp21

unicomp21 commented Feb 12, 2023

@rbuckton, last question, then I'll go silent for another month, lol.

I might be out in left field on this one. Per Gor's talk below, and the fact that we'll soon have WebAssembly tail calls when Chrome 113 gets released, this will mean, in a nutshell, that from a WebAssembly perspective using clang we can co_await cache line misses? Would there be implications here for many strands/coroutines accessing the same collection? Like reader/writer lock type stuff? I'm thinking that in the case of only readers, with no active writers present, if one reader hits a cache line miss when pulling a value out of the map, other readers of the map could be unsuspended when their cache lines have been loaded? Leading to significant throughput improvements?

I'm unsure how this would be surfaced at the TypeScript level. Maybe some sort of completion queue? Similar to I/O completion ports?

https://www.youtube.com/watch?v=_fu0gx-xseY&t=4s

The intention is to heavily utilize the "out of order" execution available on many CPUs today.

unicomp21 referenced this issue in GorNishanov/await Feb 12, 2023
@unicomp21

Correction: tentatively, Chrome 112 should have tail calls.

WebAssembly/tail-call#15 (comment)
