Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve hashtable performance. #192

Merged
merged 6 commits into from Apr 29, 2020
Merged

Conversation

dbaarda
Copy link
Member

@dbaarda dbaarda commented Apr 28, 2020

Add a simple/small k=1 bloom filter to the hashtable to reduce hashtable probing (and thrashing cpu memory cache) by more than 50%. It can be turned off by defining HASHTABLE_NBLOOM.

Reduce the max hashtable load factor from 80% to 70% to ensure that most hashtable probing for lookups is within the same 64 byte cache line.

Fix hashcmp_count stats to include comparing against empty buckets. Previously these were not counted, but still required a hashtable lookup. This will increase the hashcmp_count stats compared to previous versions, but with the new bloom filter we avoid many lookups entirely.

Slightly tidy the _for_probe() and _KEY_HASH() macro's using a nozero() static inline to reserve zero entries for empty buckets.

This improves delta performance by between 20%~40%, particularly for small blocksizes and large files.

Add nozero() static inline for avoiding zero hash keys, and make _KEY_HASH()
use it. Simplify _for_probe() to take the hash and not do the nozero() now
it's done by _KEY_HASH().
Also rename the "bloom" attribute to "kbloom" to be more consistent with the
other attribute names.
Analysis suggests at 80% load factor, 5 quadratic probes going a distance of
15 from the initally probed location are likely to spill over into the next 64
byte cache line. For a 70% load factor, <4 probes going a distance of <8 from
the initial location are likely to be within the same cache line,
significantly speeding the lookups.

This also reduces the loading of the bloomfilter to below optimal for k=1, but
modelling suggests using k=2 for the same size bloom filter would be adding
another 50% chance L2 cache lookup and would be slower. Sticking with k=1 is
not only simpler, but faster.
@dbaarda dbaarda merged commit 0750228 into librsync:master Apr 29, 2020
@dbaarda dbaarda deleted the opt/hashtable2 branch April 29, 2020 09:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

1 participant