Switch ctable to use SipHash #1155
Conversation
Instead of always hashing a single uint32, "snabbmark hash" hashes a byte vector which is filled each iteration. This imposes some overhead but lets us get closer to checking ctable performance.
This is a somewhat experimental commit, while I explore performance options.
Clean up the SipHash implementation, separating out input stride from size, and fall back to SSE if AVX2 isn't available.
This should speed up hash table lookups. This commit also refactors the siphash interface to hide the granularity of the parallelism.
I really love the work here and on @petebristow's LPM branch at #1136 . Having tight implementations of important data structures with cycle-counting benchmark coverage just adds a lot of joy to life :). I briefly tested this branch on a non-AVX2 CPU (grindelwald, an Ivy Bridge machine) and the fallback seems to work fine. Performance here on Ivy Bridge also looks in line with your numbers from Haswell+, which is nice.
One fine day I would love to make a little R hack to summarize all of these various microbenchmarks and how they compare on each relevant CPU microarchitecture...
* src/lib/yang/binary.lua: Bump binary version for ctable hash change.
Python tests that share setup and teardown are "vulnerable" to side effects from one subtest affecting later subtests. In this case, the tests defined multiple softwires terminating at ::1, which caused problems when projecting the binding table onto the ietf-softwire model. Previously there was no visible issue only because the answer we were looking for happened to be first (or last?) in the set of softwires associated with ::1; the change of hash function changed the retrieval order and made the test fail. Hacked around by changing the tests to not define multiple softwires with the same IPv6 B4 address.
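The order-dependence described above can be illustrated with a toy chained hash table. This is a sketch only: `bucket_order` and the two hash functions are invented stand-ins, not the real Jenkins or SipHash code.

```python
def bucket_order(keys, h, nbuckets=8):
    """Order in which keys come back when iterating a tiny chained
    hash table built with hash function `h` (illustrative only)."""
    buckets = [[] for _ in range(nbuckets)]
    for k in keys:
        buckets[h(k) % nbuckets].append(k)
    return [k for bucket in buckets for k in bucket]

keys = ["b4-a", "b4-b", "b4-c"]
# Two deliberately different toy hash functions: a plain byte sum,
# and a position-weighted byte sum.
h1 = lambda s: sum(s.encode())
h2 = lambda s: sum((i + 1) * b for i, b in enumerate(s.encode()))

# The same keys come back in different orders under the two hashes,
# so a test that assumes a particular retrieval order is fragile.
order1 = bucket_order(keys, h1)
order2 = bucket_order(keys, h2)
```

Both orders contain the same keys, but a test asserting "the first softwire associated with ::1 is X" can pass under one hash function and fail under the other.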
Fixed the test failure by fixing up the test in question, and fixed the doc failure by merging from
Merged! Thanks for your patience.
This branch adds an implementation of SipHash from https://131002.net/siphash/. SipHash is a cryptographic hash function designed for short inputs, suitable for use as the hash function in hash table implementations. It incorporates a "key" to prevent hash flooding attacks, where a malicious user causes many program inputs to hash to the same location.
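For reference, the structure of SipHash can be sketched in plain Python. This is only an illustration of the algorithm; the branch's actual implementation is DynASM-generated machine code, and the `c_rounds`/`d_rounds` parameter names here are mine.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF

def _rotl(x, b):
    """Rotate a 64-bit value left by b bits."""
    return ((x << b) | (x >> (64 - b))) & MASK64

def _sipround(v0, v1, v2, v3):
    """One SipRound over the four 64-bit state words."""
    v0 = (v0 + v1) & MASK64; v1 = _rotl(v1, 13); v1 ^= v0; v0 = _rotl(v0, 32)
    v2 = (v2 + v3) & MASK64; v3 = _rotl(v3, 16); v3 ^= v2
    v0 = (v0 + v3) & MASK64; v3 = _rotl(v3, 21); v3 ^= v0
    v2 = (v2 + v1) & MASK64; v1 = _rotl(v1, 17); v1 ^= v2; v2 = _rotl(v2, 32)
    return v0, v1, v2, v3

def siphash(key, data, c_rounds=2, d_rounds=4):
    """SipHash-c-d over `data` with a 16-byte `key` (reference sketch)."""
    k0 = int.from_bytes(key[:8], "little")
    k1 = int.from_bytes(key[8:16], "little")
    # Standard initialization constants ("somepseudorandomlygeneratedbytes").
    v0 = k0 ^ 0x736F6D6570736575
    v1 = k1 ^ 0x646F72616E646F6D
    v2 = k0 ^ 0x6C7967656E657261
    v3 = k1 ^ 0x7465646279746573
    full = len(data) - len(data) % 8
    # Compression: c rounds per 8-byte block.
    for i in range(0, full, 8):
        m = int.from_bytes(data[i:i + 8], "little")
        v3 ^= m
        for _ in range(c_rounds):
            v0, v1, v2, v3 = _sipround(v0, v1, v2, v3)
        v0 ^= m
    # Final block: leftover bytes plus the input length in the top byte.
    b = (len(data) & 0xFF) << 56 | int.from_bytes(data[full:], "little")
    v3 ^= b
    for _ in range(c_rounds):
        v0, v1, v2, v3 = _sipround(v0, v1, v2, v3)
    v0 ^= b
    # Finalization: d rounds.
    v2 ^= 0xFF
    for _ in range(d_rounds):
        v0, v1, v2, v3 = _sipround(v0, v1, v2, v3)
    return v0 ^ v1 ^ v2 ^ v3
```

The key is what defeats hash flooding: the same input scatters differently under different keys, so an attacker who doesn't know the key cannot construct colliding inputs.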
This change fixes the problem where keys that are neither 4 nor 8 bytes long fell back to a really inefficient hash function. See takikawa#1 for a full investigation.
SipHash is more complicated than the Jenkins hash that we used. We mitigate this cost in 4 ways:
1. We use DynASM to compile optimized versions of the hash function.
2. The DynASM hash functions are pre-unrolled for the key size; LuaJIT was not unrolling "normal" loops as we thought it would.
3. We have SSE and AVX2 implementations that can hash multiple keys at once. See `snabb snabbmark hash 16` for example performance data.
4. We reduce the number of per-block rounds of SipHash from 2 to 1 and reduce the final rounds from 4 to 2. Again, see the performance numbers for more information.
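The saving from the round reduction (point 4) is easy to quantify: SipHash-c-d performs c SipRounds per 8-byte block (including the final length-carrying block) plus d finalization rounds. A small sketch, with an invented helper name, shows that for a 16-byte key the reduced variant does half the rounds of standard SipHash-2-4:

```python
def sipround_count(nbytes, c, d):
    """Total SipRounds to hash an nbytes input with SipHash-c-d.

    SipHash consumes full 8-byte blocks, then one final block that
    carries the remaining bytes plus the input length."""
    blocks = nbytes // 8 + 1
    return c * blocks + d

# A 16-byte key is 2 full blocks plus the final block = 3 blocks.
standard = sipround_count(16, 2, 4)  # SipHash-2-4: 2*3 + 4 = 10 rounds
reduced  = sipround_count(16, 1, 2)  # reduced 1-2:  1*3 + 2 = 5 rounds
```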
Note that in ctable's use of SipHash, we still use a reference key; this is because hash tables are sometimes written out to disk and we would need to include the key in the hash table in order for the code that loads the ctable to load the appropriate hash function.
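If ctable instead used a random per-table key, the on-disk format would have to carry that key so the loader could rebuild the same hash function. A sketch of what that would look like, assuming a JSON stand-in for the real binary format (field and function names are invented):

```python
import json

def save_ctable(path, hash_key, entries):
    """Write table entries together with the SipHash key used to build
    them, so a later load can reconstruct the same hash function.
    Sketch only: the real ctable format is binary."""
    with open(path, "w") as f:
        json.dump({"hash_key": hash_key.hex(), "entries": entries}, f)

def load_ctable(path):
    """Read back the key and entries; the loader would re-instantiate
    the hash function from the stored key before doing any lookups."""
    with open(path) as f:
        blob = json.load(f)
    return bytes.fromhex(blob["hash_key"]), blob["entries"]

# Round trip: the key that hashed the entries travels with them.
import os, tempfile
path = os.path.join(tempfile.mkdtemp(), "ctable.json")
save_ctable(path, bytes(16), [["10.0.0.1", 42]])
key, entries = load_ctable(path)
```

Using a fixed reference key, as this branch does, sidesteps that format change at the cost of losing per-table key randomization.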
We could opt for a faster hash function for smaller inputs; it does not seem necessary to have a 16-byte key for a hash function with 4 bytes of input and 4 bytes of output. 4-byte inputs are the only case that regresses; all others are as fast or (sometimes) much faster. And in the real world I don't see any difference in lwaftr performance with this change.
WDYT @asumu and @lukego? /cc also @alexandergall for his hash table work on the learning bridge and MurmurHash.