Split test=Speed into SpeedBulk and SpeedSmall and report weighted average for Small key speed test #293

Open, wants to merge 2 commits into master
darkk (Contributor) commented Aug 30, 2024

I hope this also addresses the goal @wangyi-fudan had in #113, while being much more "stable" in computational terms.

It produces results like these:

$ ./SMHasher --test=SpeedSmall --extra SipHash
--- Testing SipHash "SipHash 2-4 - SSSE3 optimized" GOOD
...
Small key speed test -   64-byte keys -   231.26 cycles/hash
Average                                   151.602 cycles/hash
Average DNS query (| 99.8% of query.log)  123.685 cycles/hash
Average DNS name (| 99.4% of top-1m.csv)  118.621 cycles/hash
Average DNS name (| 99.6% of 200MiB.zone) 114.610 cycles/hash
Average UMASH (| 70.0% of startup-1M.bz2) 139.403 cycles/hash

or

$ SMHASHER_SMALLKEY_MAX=256 ./SMHasher --test=SpeedSmall --extra SipHash
--- Testing SipHash "SipHash 2-4 - SSSE3 optimized" GOOD
...
Small key speed test -  255-byte keys -   688.32 cycles/hash
Average                                   374.459 cycles/hash
Average DNS query (|100.0% of query.log)  124.329 cycles/hash
Average DNS name (|100.0% of top-1m.csv)  120.263 cycles/hash
Average DNS name (|100.0% of 200MiB.zone) 115.890 cycles/hash
Average UMASH (| 99.9% of startup-1M.bz2) 192.407 cycles/hash
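Each `Average <dataset> (| p% of file)` line can be read as a length-weighted mean of the per-length timings. Below is a minimal C++ sketch of that computation. It assumes the per-length cycles/hash and a tabulated key-length distribution are available as plain vectors, and that the percentage is the share of the distribution's mass inside the measured length range (the rise from 99.8% to 100.0% when SMHASHER_SMALLKEY_MAX is raised is consistent with that reading). The half-open range and the names `weighted_average` and `WeightedAverage` are hypothetical, not SMHasher's actual code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical result type for one "Average <dataset>" line.
struct WeightedAverage {
    double cycles;    // estimated mean cycles/hash over the measured lengths
    double coverage;  // share of the distribution's mass inside the measured range
};

// cycles[len]: measured cycles/hash for keys of length `len`.
// weight[len]: share of keys of length `len` in some dataset; may extend
//              past the measured range and need not be normalized.
// Assumption: the measured range is the half-open [min_len, max_len).
WeightedAverage weighted_average(const std::vector<double>& cycles,
                                 const std::vector<double>& weight,
                                 std::size_t min_len, std::size_t max_len) {
    assert(min_len <= max_len);
    double total = 0.0;  // mass of the whole distribution
    for (double w : weight) total += w;

    double covered = 0.0;  // mass inside the measured range
    double acc = 0.0;      // sum of weight * cycles inside the range
    for (std::size_t len = min_len;
         len < max_len && len < cycles.size() && len < weight.size(); ++len) {
        covered += weight[len];
        acc += weight[len] * cycles[len];
    }

    WeightedAverage r{0.0, 0.0};
    if (covered > 0.0) r.cycles = acc / covered;   // renormalize over covered mass
    if (total > 0.0) r.coverage = covered / total;  // e.g. 0.998 -> "99.8%"
    return r;
}
```

With equal weights for every measured length this reduces to a plain unweighted mean, which is presumably what the bare `Average` line reports. Renormalizing by the covered mass keeps the estimate on the cycles/hash scale even when part of the dataset has keys longer than the measured range.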

Possible improvements are:

  • computing the distributions instead of tabulating them might reduce the size of the SMHasher binary slightly, but 8 KiB of doubles is not much anyway
  • fitting a linear model to extrapolate timings for keys longer than N instead of measuring them, though I'm not confident that's useful unless we have a real-world dataset with fairly large key lengths
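The second idea amounts to ordinary least squares on (key length, cycles/hash) pairs. Below is a minimal sketch, assuming timings for lengths up to N have already been measured; `LinearFit` and `fit_linear` are hypothetical names, not part of SMHasher.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model: cycles(len) ~= a + b * len.
struct LinearFit {
    double a;  // intercept: fixed per-call overhead, in cycles
    double b;  // slope: marginal cost per extra byte, in cycles/byte
    double predict(double len) const { return a + b * len; }
};

// Ordinary least squares over measured points (lens[i], cycles[i]).
// Requires at least two points and not all lengths equal.
LinearFit fit_linear(const std::vector<double>& lens,
                     const std::vector<double>& cycles) {
    assert(lens.size() == cycles.size() && lens.size() >= 2);
    const double n = static_cast<double>(lens.size());
    double sx = 0.0, sy = 0.0, sxx = 0.0, sxy = 0.0;
    for (std::size_t i = 0; i < lens.size(); ++i) {
        sx += lens[i];
        sy += cycles[i];
        sxx += lens[i] * lens[i];
        sxy += lens[i] * cycles[i];
    }
    const double denom = n * sxx - sx * sx;  // zero iff all lengths are equal
    assert(denom != 0.0);
    const double b = (n * sxy - sx * sy) / denom;
    const double a = (sy - b * sx) / n;
    return LinearFit{a, b};
}
```

For example, measured points (0, 5), (1, 7), (2, 9), (3, 11) give a = 5 and b = 2, so a 100-byte key would be extrapolated to 205 cycles. Many hashes consume input in fixed-size blocks, so real timings tend to rise in steps; the fit is at best an approximation for lengths beyond N.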

…MAX}

It adds the SMHASHER_SMALLKEY_MAX environment variable to override the
default length of the longest "Small key" and changes that default from
31 to 32, making the Average a bit fairer to hashes that read memory
word-by-word (dword, qword) rather than byte-by-byte.

SMHASHER_SMALLKEY_MIN is added as a counterpart, for benchmarking hashes
when the range of small key lengths is known in advance.
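Below is a minimal sketch of how such overrides could be read, assuming both variables hold non-negative decimal integers and that invalid or inconsistent values silently fall back to defaults. Only the variable names and the new MAX default of 32 come from the description; the default minimum of 1, the fallback policy, and the names `parse_len`, `SmallKeyRange` and `small_key_range` are hypothetical.

```cpp
#include <cassert>
#include <cstdlib>

// Parse a non-negative decimal integer; return `fallback` on null, empty,
// trailing junk (e.g. "12abc") or negative input.
long parse_len(const char* s, long fallback) {
    if (s == nullptr || *s == '\0') return fallback;
    char* end = nullptr;
    const long v = std::strtol(s, &end, 10);
    if (*end != '\0' || v < 0) return fallback;
    return v;
}

// Hypothetical range of small key lengths to benchmark.
struct SmallKeyRange {
    long min_len;
    long max_len;
};

SmallKeyRange small_key_range() {
    const long kDefaultMin = 1;   // assumption, not taken from the PR
    const long kDefaultMax = 32;  // new default MAX per the PR description
    SmallKeyRange r{parse_len(std::getenv("SMHASHER_SMALLKEY_MIN"), kDefaultMin),
                    parse_len(std::getenv("SMHASHER_SMALLKEY_MAX"), kDefaultMax)};
    // Inconsistent overrides (MIN above MAX): fall back to the defaults.
    if (r.min_len > r.max_len) r = SmallKeyRange{kDefaultMin, kDefaultMax};
    assert(r.min_len <= r.max_len);
    return r;
}
```

Under these assumptions, running with `SMHASHER_SMALLKEY_MAX=256` and no MIN override would yield `min_len == 1` and `max_len == 256`.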
It addresses the question raised in rurban#113:

What is the "real" average cycles/hash value for a given hash function?

We can't know, but we can estimate it better if we assume that the
function's timing does not depend on the input (not true for hashes
based on multiplication) and that we know the distribution of key
lengths in advance (it may be roughly known for certain classes of
inputs, but it varies measurably across classes).
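Under the first assumption the cost depends on key length only, say $c(\ell)$ cycles/hash for an $\ell$-byte key. For a dataset $D$ with normalized length distribution $p_D(\ell)$, the true average then splits into a measured and an unmeasured part. The notation below is introduced here for illustration: $P_D$ is the mass inside the measured range $[\ell_{\min}, \ell_{\max}]$ and $\hat{C}_D$ is the renormalized weighted average over that range.

$$
P_D = \sum_{\ell=\ell_{\min}}^{\ell_{\max}} p_D(\ell), \qquad
\hat{C}_D = \frac{1}{P_D} \sum_{\ell=\ell_{\min}}^{\ell_{\max}} p_D(\ell)\, c(\ell)
$$

$$
\mathbb{E}_D[c] = \sum_{\ell} p_D(\ell)\, c(\ell)
= P_D\, \hat{C}_D + \sum_{\ell \notin [\ell_{\min},\, \ell_{\max}]} p_D(\ell)\, c(\ell)
$$

So $\hat{C}_D$ is exact only when $P_D = 1$. If every unmeasured length lies above $\ell_{\max}$ and $c(\ell)$ is non-decreasing, each unmeasured key costs at least $c(\ell_{\max}) \ge \hat{C}_D$, so $\hat{C}_D$ can only underestimate $\mathbb{E}_D[c]$, and a coverage close to 100% keeps that gap small.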