Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Switch ctable to use SipHash #1155

Merged
merged 21 commits into from
Jul 3, 2017
Merged

Switch ctable to use SipHash #1155

merged 21 commits into from
Jul 3, 2017

Conversation

wingo
Copy link
Contributor

@wingo wingo commented Jun 2, 2017

This branch adds an implementation of SipHash from https://131002.net/siphash/. SipHash is a cryptographic hash function designed for short inputs, suitable for using as the hash function for hash table implementations. It incorporates a "key" to prevent hash flooding attacks, where a malicious user causes many program inputs to hash to the same location.

This change fixes the problem where keys that are not either 4 or 8 bytes end up using a really inefficient hash function. See takikawa#1 for a full investigation.

SipHash is more complicated than the Jenkins hash that we used. We mitigate this cost in 4 ways:

  • We use DynASM to compile optimized versions of the hash function.

  • The DynASM hash functions are pre-unrolled for the key size; LuaJIT was not unrolling "normal" loops like we thought it would.

  • We have SSE and AVX2 implementations that can hash multiple keys at once. See snabb snabbmark hash 16 for example performance data.

  • We reduce the number of per-block rounds of SipHash from 2 to 1 and reduce the final rounds from 4 to 2. Again, see the performance numbers for more information.

Note that in ctable's use of SipHash, we still use a reference key; this is because hash tables are sometimes written out to disk and we would need to include the key in the hash table in order for the code that loads the ctable to load the appropriate hash function.

We could opt for a faster hash function for smaller inputs; it does not seem necessary to have 16 bytes of key for a hash function with 4 bytes of input and 4 bytes of output. That's the only case that regresses (4-byte inputs); all others are as fast or (sometimes) much faster. And in the real world I don't see any difference with this change in lwaftr performance.

WDYT @asumu and @lukego? /cc also @alexandergall for his hash table work on the learning bridge and MurmurHash.

@lukego
Copy link
Member

lukego commented Jun 6, 2017

I really love the work here and on @petebristow's LPM branch at #1136 .

Having tight implementations of important data structures with cycle-counting benchmark coverage just adds a lot of joy to life :).

I briefly tested this branch on a non-AVX2 CPU (grindelwald - Ivy Bridge) and the fallback seems to be working fine. Performance here on Ivy Bridge also looks in line with your numbers from Haswell+ which is nice.

$ sudo taskset -c 1 ./snabb snabbmark hash 16
baseline: 15.02 cycles, 5.56 ns per iteration (result: 0)
murmur hash (32 bit): 402.31 cycles, 149.00 ns per iteration (result: 2167721464)
sip hash c=1,d=2 (x1): 41.17 cycles, 15.25 ns per iteration (result: 3935169236)
sip hash c=1,d=2 (x2): 31.03 cycles, 11.49 ns per iteration (result: 3935169236)
sip hash c=1,d=2 (x4): 28.73 cycles, 10.64 ns per iteration (result: 3935169236)
sip hash c=1,d=2 (x8): 28.74 cycles, 10.65 ns per iteration (result: 3935169236)
sip hash c=2,d=4 (x1): 66.87 cycles, 24.77 ns per iteration (result: 4024933664)
sip hash c=2,d=4 (x2): 55.18 cycles, 20.44 ns per iteration (result: 4024933664)
sip hash c=2,d=4 (x4): 52.80 cycles, 19.56 ns per iteration (result: 4024933664)
sip hash c=2,d=4 (x8): 51.87 cycles, 19.21 ns per iteration (result: 4024933664)

One fine day I would love to make a little R hack to summarize all of these various microbenchmarks and how they compare on each relevant CPU microarchitecture...

wingo added 3 commits June 9, 2017 11:41
* src/lib/yang/binary.lua: Bump binary version for ctable hash
  change.
Python tests that share setup and teardown are "vulnerable" to side
effects from one subtest affecting future subtests.  In this case, the
set of tests resulted in multiple softwires terminating at ::1, which
then caused problems when projecting the binding table onto the
ietf-softwire model: whereas before there was no issue because the
answer that we were looking for happened to be first (or last?) in the
set of softwires associated with ::1, the change in hash function
changed the retrieval order, and thus caused the test to fail.

Hacked around by changing the tests to not define multiple softwires
with the same IPv6 B4 address.
@wingo
Copy link
Contributor Author

wingo commented Jun 9, 2017

Fixed the test failure by fixing up the test in question, and fixed the doc failure by merging from next. Ready to go in, I think :)

@lukego lukego merged commit 2d6ebd8 into snabbco:next Jul 3, 2017
lukego added a commit that referenced this pull request Jul 3, 2017
@lukego
Copy link
Member

lukego commented Jul 3, 2017

Merged! Thanks for the patience.

@lukego lukego added the merged label Jul 3, 2017
@wingo wingo deleted the siphash-for-next branch July 7, 2017 08:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants