Switch ctable to use SipHash #1155
Conversation
Instead of always hashing a single uint32, "snabbmark hash" hashes a byte vector which is filled each iteration. This imposes some overhead but lets us get closer to checking ctable performance.
This is a somewhat experimental commit, while I explore performance options.
Clean up the SipHash implementation, separating out input stride from size, and fall back to SSE if AVX2 isn't available.
This should speed up hash table lookups. This commit also refactors the siphash interface to hide the granularity of the parallelism.
I really love the work here and on @petebristow's LPM branch at #1136 . Having tight implementations of important data structures with cycle-counting benchmark coverage just adds a lot of joy to life :). I briefly tested this branch on a non-AVX2 CPU (grindelwald, an Ivy Bridge machine) and the fallback seems to work fine. Performance here on Ivy Bridge also looks in line with your numbers from Haswell+, which is nice.
One fine day I would love to make a little R hack to summarize all of these various microbenchmarks and how they compare on each relevant CPU microarchitecture...
* src/lib/yang/binary.lua: Bump binary version for ctable hash change.
Python tests that share setup and teardown are "vulnerable" to side effects from one subtest affecting later subtests. In this case, the tests defined multiple softwires terminating at ::1, which caused problems when projecting the binding table onto the ietf-softwire model. Previously there was no visible issue only because the answer we were looking for happened to be first (or last?) in the set of softwires associated with ::1; the change of hash function changed the retrieval order and made the test fail. Hacked around by changing the tests to not define multiple softwires with the same IPv6 B4 address.
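The order-dependence described above can be illustrated with a toy chained hash table. This is a sketch only: `bucket_order` and the two hash functions are invented stand-ins, not the real Jenkins or SipHash code.

```python
def bucket_order(keys, h, nbuckets=8):
    """Order in which keys come back when iterating a tiny chained
    hash table built with hash function `h` (illustrative only)."""
    buckets = [[] for _ in range(nbuckets)]
    for k in keys:
        buckets[h(k) % nbuckets].append(k)
    return [k for bucket in buckets for k in bucket]

keys = ["b4-a", "b4-b", "b4-c"]
# Two deliberately different toy hash functions: a plain byte sum,
# and a position-weighted byte sum.
h1 = lambda s: sum(s.encode())
h2 = lambda s: sum((i + 1) * b for i, b in enumerate(s.encode()))

# The same keys come back in different orders under the two hashes,
# so a test that assumes a particular retrieval order is fragile.
order1 = bucket_order(keys, h1)
order2 = bucket_order(keys, h2)
```

Both orders contain the same keys, but a test asserting "the first softwire associated with ::1 is X" can pass under one hash function and fail under the other.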
Fixed the test failure by fixing up the test in question, and fixed the doc failure by merging from
Merged! Thanks for your patience.
This branch adds an implementation of SipHash from https://131002.net/siphash/. SipHash is a cryptographic hash function designed for short inputs, suitable for use as the hash function in hash table implementations. It incorporates a "key" to prevent hash flooding attacks, where a malicious user causes many program inputs to hash to the same location.
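For reference, the structure of SipHash can be sketched in plain Python. This is only an illustration of the algorithm; the branch's actual implementation is DynASM-generated machine code, and the `c_rounds`/`d_rounds` parameter names here are mine.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF

def _rotl(x, b):
    """Rotate a 64-bit value left by b bits."""
    return ((x << b) | (x >> (64 - b))) & MASK64

def _sipround(v0, v1, v2, v3):
    """One SipRound over the four 64-bit state words."""
    v0 = (v0 + v1) & MASK64; v1 = _rotl(v1, 13); v1 ^= v0; v0 = _rotl(v0, 32)
    v2 = (v2 + v3) & MASK64; v3 = _rotl(v3, 16); v3 ^= v2
    v0 = (v0 + v3) & MASK64; v3 = _rotl(v3, 21); v3 ^= v0
    v2 = (v2 + v1) & MASK64; v1 = _rotl(v1, 17); v1 ^= v2; v2 = _rotl(v2, 32)
    return v0, v1, v2, v3

def siphash(key, data, c_rounds=2, d_rounds=4):
    """SipHash-c-d over `data` with a 16-byte `key` (reference sketch)."""
    k0 = int.from_bytes(key[:8], "little")
    k1 = int.from_bytes(key[8:16], "little")
    # Standard initialization constants ("somepseudorandomlygeneratedbytes").
    v0 = k0 ^ 0x736F6D6570736575
    v1 = k1 ^ 0x646F72616E646F6D
    v2 = k0 ^ 0x6C7967656E657261
    v3 = k1 ^ 0x7465646279746573
    full = len(data) - len(data) % 8
    # Compression: c rounds per 8-byte block.
    for i in range(0, full, 8):
        m = int.from_bytes(data[i:i + 8], "little")
        v3 ^= m
        for _ in range(c_rounds):
            v0, v1, v2, v3 = _sipround(v0, v1, v2, v3)
        v0 ^= m
    # Final block: leftover bytes plus the input length in the top byte.
    b = (len(data) & 0xFF) << 56 | int.from_bytes(data[full:], "little")
    v3 ^= b
    for _ in range(c_rounds):
        v0, v1, v2, v3 = _sipround(v0, v1, v2, v3)
    v0 ^= b
    # Finalization: d rounds.
    v2 ^= 0xFF
    for _ in range(d_rounds):
        v0, v1, v2, v3 = _sipround(v0, v1, v2, v3)
    return v0 ^ v1 ^ v2 ^ v3
```

The key is what defeats hash flooding: the same input scatters differently under different keys, so an attacker who doesn't know the key cannot construct colliding inputs.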
This change fixes the problem where keys that are neither 4 nor 8 bytes long fell back to a really inefficient hash function. See takikawa#1 for a full investigation.
SipHash is more complicated than the Jenkins hash that we used. We mitigate this cost in 4 ways:
1. We use DynASM to compile optimized versions of the hash function.
2. The DynASM hash functions are pre-unrolled for the key size; LuaJIT was not unrolling "normal" loops as we thought it would.
3. We have SSE and AVX2 implementations that can hash multiple keys at once. See `snabb snabbmark hash 16` for example performance data.
4. We reduce the number of per-block rounds of SipHash from 2 to 1 and reduce the final rounds from 4 to 2. Again, see the performance numbers for more information.
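The saving from the round reduction (point 4) is easy to quantify: SipHash-c-d performs c SipRounds per 8-byte block (including the final length-carrying block) plus d finalization rounds. A small sketch, with an invented helper name, shows that for a 16-byte key the reduced variant does half the rounds of standard SipHash-2-4:

```python
def sipround_count(nbytes, c, d):
    """Total SipRounds to hash an nbytes input with SipHash-c-d.

    SipHash consumes full 8-byte blocks, then one final block that
    carries the remaining bytes plus the input length."""
    blocks = nbytes // 8 + 1
    return c * blocks + d

# A 16-byte key is 2 full blocks plus the final block = 3 blocks.
standard = sipround_count(16, 2, 4)  # SipHash-2-4: 2*3 + 4 = 10 rounds
reduced  = sipround_count(16, 1, 2)  # reduced 1-2:  1*3 + 2 = 5 rounds
```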
Note that in ctable's use of SipHash, we still use a reference key; this is because hash tables are sometimes written out to disk and we would need to include the key in the hash table in order for the code that loads the ctable to load the appropriate hash function.
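If ctable instead used a random per-table key, the on-disk format would have to carry that key so the loader could rebuild the same hash function. A sketch of what that would look like, assuming a JSON stand-in for the real binary format (field and function names are invented):

```python
import json

def save_ctable(path, hash_key, entries):
    """Write table entries together with the SipHash key used to build
    them, so a later load can reconstruct the same hash function.
    Sketch only: the real ctable format is binary."""
    with open(path, "w") as f:
        json.dump({"hash_key": hash_key.hex(), "entries": entries}, f)

def load_ctable(path):
    """Read back the key and entries; the loader would re-instantiate
    the hash function from the stored key before doing any lookups."""
    with open(path) as f:
        blob = json.load(f)
    return bytes.fromhex(blob["hash_key"]), blob["entries"]

# Round trip: the key that hashed the entries travels with them.
import os, tempfile
path = os.path.join(tempfile.mkdtemp(), "ctable.json")
save_ctable(path, bytes(16), [["10.0.0.1", 42]])
key, entries = load_ctable(path)
```

Using a fixed reference key, as this branch does, sidesteps that format change at the cost of losing per-table key randomization.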
We could opt for a faster hash function for smaller inputs; it does not seem necessary to have a 16-byte key for a hash function with 4 bytes of input and 4 bytes of output. 4-byte inputs are the only case that regresses; all others are as fast or (sometimes) much faster. And in the real world I don't see any difference in lwaftr performance with this change.
WDYT @asumu and @lukego? /cc also @alexandergall for his hash table work on the learning bridge and MurmurHash.