Skip to content
Zobrist hashing in C
Branch: master
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Type Name Latest commit message Commit time
Failed to load latest commit information.
benchmarks
include
src
.gitignore
LICENSE
Makefile
README.md

README.md

zobristhashing

Zobrist hashing in C

Zobrist hashing is the simplest form of tabulation-based hashing. It can be shown to be 3-wise independent. The Zobrist approach tested here is used in real systems, e.g., Gigablast https://www.gigablast.com/ Alternatively, one could use a tabulation-based function as a complement to other hash functions: first hash the content down to a few bytes (e.g., 4) and then apply a tabulation-based hash on the result.

Tabulation-based hashing uses a lot of memory and is susceptible to cache faults. E.g., to hash 4-byte strings to 64-bit values, you need 8 KB. Moreover, its speed is limited (in part) by the system's ability to issue random access loads.

In an exhaustive experimental evaluation of hash-table performance, Richter et al. (VLDB, 2016) found that Zobrist hashing produces a low throughput. Consequently, the authors declare it to be "less attractive in practice" than its strong randomness properties would suggest.

This C code expects a GCC-like compiler on an x64 system.

The code demonstrates that it is difficult on a x64 to hash much more than 0.65 bytes per cycle on recent Intel processors, even when repeatedly hashing the same short string. In contrast, it is possible to hash 4 to 10 bytes per cycle using fast hash families. See https://github.com/lemire/StronglyUniversalStringHashing

C Usage

zobrist_t k;
init_zobrist(& k);// call once

// then you can hash as many strings as you want:
uint64_t hashvalue = zobrist (mystring, mystringsize, &k)

// for null-terminated strings, you can use:
uint64_t hashvalue = zobrist_nt (mystring,  &k)

// strings longer than 256 bytes fail the theoretical bounds

Usage

make
./benchmark

Relevant software libraries

References

  • Daniel Lemire, The universality of iterated hashing over variable-length strings, Discrete Applied Mathematics 160 (4-5), 2012. http://arxiv.org/abs/1008.1715
  • Owen Kaser, Daniel Lemire, Strongly universal string hashing is fast, Computer Journal 57(11), 2014. http://arxiv.org/abs/1202.4961
  • Daniel Lemire, Owen Kaser, Faster 64-bit universal hashing using carry-less multiplications, Journal of Cryptographic Engineering (to appear) http://arxiv.org/abs/1503.03465
You can’t perform that action at this time.