Skip to content
This repository has been archived by the owner on Dec 26, 2023. It is now read-only.

FYI: We build a library around your hash #110

Closed
bigerl opened this issue Jan 20, 2021 · 3 comments
Closed

FYI: We build a library around your hash #110

bigerl opened this issue Jan 20, 2021 · 3 comments

Comments

@bigerl
Copy link

bigerl commented Jan 20, 2021

Hi @martinus,
this seemed to me to be the easiest way to reach out to you.

We at the DICE research group found that the hash functions for strings and int you provide in this repo provide among the best performance with all hash maps/sets we tested. So we built a hashing library around it that provides predefined hashes for primitive types, many stl containers and an easy way to define hashes for user defined types.

Please check it out at https://github.com/dice-group/dice-hash

Best
Alex

@martinus
Copy link
Owner

Hi! I had a quick look at your code:

for dice_hash_invertible_combine you might want to have a look at https://www.preprints.org/manuscript/201710.0192/v1
The polynomial 3860031 + (h+y)*2779 + (h*y*2) looks good and fast enough

Note that the hashes I am using in robin_hood map are not really high quality, but they seem to be good enough (and fast!) for hashmaps. I wouldn't use them for anything else.

@bigerl
Copy link
Author

bigerl commented Jan 22, 2021

Thx for the hint regarding the invertible combine. We will definitely look into that.

To get you right on the hash quality: Are you sure that hashes are not of good quality, e.g. with respect to quality benchmarks like SMHasher or theoretical properties? Or was that simply never investigated (probably because it is not necessary for a hash map)?

@martinus
Copy link
Owner

I've spent quite some time especially on hash_int. E.g. one good test is to hash consecutive numbers (0, 1, 2, 3, ...), and put the result through PractRand. For a good high quality mixer, this should not fail quickly. For my hash_int version I am pretty sure this fails tests for randomness quite quickly. But as far as I can say this shouldn't be a big concern for hashmaps, it does not necessarily have to be of such a high quality.

I've got a separate project to find a good mixer here: https://github.com/martinus/better-faster-stronger-mixer

I'd say one of the highest quality mixer is NASAM: http://mostlymangling.blogspot.com/2020/01/nasam-not-another-strange-acronym-mixer.html Pelle Evensen's blog is a really good read for that.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants