rurban re-enable cmetrohash64_1* on 32bit x64
rather probe for __SSE4_2__ for such processors.
Fixes #45
Latest commit 862a519 Jul 28, 2018
doc add judyhash speed and quality Jan 10, 2018
t1ha update t1ha to upstream master. Jun 20, 2018
.appveyor.yml appveyor: enforce Release type and proper x64 target Jul 26, 2018
.gitignore ignore more temp dirs and files Nov 1, 2015
.travis.yml Fix 32bit regressions Jul 26, 2018
AvalancheTest.cpp initial extension of https://code.google.com/p/smhasher Mar 27, 2014
AvalancheTest.h initial extension of https://code.google.com/p/smhasher Mar 27, 2014
Bitslice.cpp initial extension of https://code.google.com/p/smhasher Mar 27, 2014
Bitvec.cpp initial extension of https://code.google.com/p/smhasher Mar 27, 2014
Bitvec.h initial extension of https://code.google.com/p/smhasher Mar 27, 2014
CMakeLists.txt msvc: define __x86_64__ Jul 28, 2018
City.cpp Successfully builds on FreeBSD 10.2 Mar 8, 2016
City.h Add experimental CityHash32WithSeed Jun 16, 2015
CityCrc.h initial extension of https://code.google.com/p/smhasher Mar 27, 2014
CityTest.cpp Add experimental CityHash32WithSeed Jun 16, 2015
DifferentialTest.cpp initial extension of https://code.google.com/p/smhasher Mar 27, 2014
DifferentialTest.h initial extension of https://code.google.com/p/smhasher Mar 27, 2014
FarmTest.cc add Google FarmHash, deprecating CityHash Jun 17, 2015
Hashes.cpp add judyhash speed and quality Jan 10, 2018
Hashes.h fix jody_hash for building 32- and 64-bit targets simultaneously. Jul 27, 2018
KeysetTest.cpp FNV32a_YoshimitsuTRIAD: Fix AppendedZeroes Jun 20, 2015
KeysetTest.h initial extension of https://code.google.com/p/smhasher Mar 27, 2014
MurmurHash1.cpp initial extension of https://code.google.com/p/smhasher Mar 27, 2014
MurmurHash1.h initial extension of https://code.google.com/p/smhasher Mar 27, 2014
MurmurHash2.cpp initial extension of https://code.google.com/p/smhasher Mar 27, 2014
MurmurHash2.h initial extension of https://code.google.com/p/smhasher Mar 27, 2014
MurmurHash3.cpp initial extension of https://code.google.com/p/smhasher Mar 27, 2014
MurmurHash3.h initial extension of https://code.google.com/p/smhasher Mar 27, 2014
PMurHash.c Fix PMurHash.c mingw clang 64-bit compilation. Jul 9, 2018
PMurHash.h fixes for Windows and building with MSVC. Dec 9, 2016
Platform.cpp Successfully builds on FreeBSD 10.2 Mar 8, 2016
Platform.h msvc: define __x86_64__ Jul 28, 2018
README.md fix crc64 doc links Jun 11, 2018
Random.cpp initial extension of https://code.google.com/p/smhasher Mar 27, 2014
Random.h initial extension of https://code.google.com/p/smhasher Mar 27, 2014
SpeedTest.cpp Serialize invocations of hash functions Aug 27, 2016
SpeedTest.h Add Average results to SpeedTest Jun 17, 2015
Spooky.cpp initial extension of https://code.google.com/p/smhasher Mar 27, 2014
Spooky.h initial extension of https://code.google.com/p/smhasher Mar 27, 2014
SpookyTest.cpp initial extension of https://code.google.com/p/smhasher Mar 27, 2014
Stats.cpp initial extension of https://code.google.com/p/smhasher Mar 27, 2014
Stats.h initial extension of https://code.google.com/p/smhasher Mar 27, 2014
SuperFastHash.cpp initial extension of https://code.google.com/p/smhasher Mar 27, 2014
Types.cpp initial extension of https://code.google.com/p/smhasher Mar 27, 2014
Types.h initial extension of https://code.google.com/p/smhasher Mar 27, 2014
build-msys2.bat add windows smoker Jul 26, 2018
build.sh build.sh: build with GCC on Linux by default Jan 10, 2018
build32.sh Fix 32bit regressions Jul 26, 2018
cmetrohash.h Added C implementation of MetroHash64 Jun 27, 2015
cmetrohash64.c make cmetrohash optimized variant standalone Jun 28, 2015
crc.cpp initial extension of https://code.google.com/p/smhasher Mar 27, 2014
crc32-generated-constants.cpp initial extension of https://code.google.com/p/smhasher Mar 27, 2014
crc32_hw.c initial extension of https://code.google.com/p/smhasher Mar 27, 2014
crc32_hw1.c initial extension of https://code.google.com/p/smhasher Mar 27, 2014
crc32c.cpp initial extension of https://code.google.com/p/smhasher Mar 27, 2014
crc32c.h initial extension of https://code.google.com/p/smhasher Mar 27, 2014
falkhash-elf64.o add falkhash.asm from https://github.com/gamozolabs/falkhash Oct 27, 2015
falkhash-macho64.o add falkhash.asm from https://github.com/gamozolabs/falkhash Oct 27, 2015
falkhash.asm add falkhash.asm from https://github.com/gamozolabs/falkhash Oct 27, 2015
farmhash-c-test.cc add farmhash (C99) Feb 12, 2016
farmhash-c.c smhasher: fix MSVC2015 x64 build. Jan 12, 2017
farmhash-c.h add farmhash (C99) Feb 12, 2016
farmhash.cc smhasher: fix MSVC2015 x64 build. Jan 12, 2017
farmhash.h update farmhash (1.1) Feb 12, 2016
fasthash.cpp smhasher: fix MSVC2015 x64 build. Jan 12, 2017
fasthash.h add fasthash Nov 1, 2015
fhtw-elf64.o prepare fhtw binaries Jan 10, 2018
fhtw-macho64.o prepare fhtw binaries Jan 10, 2018
fhtw.asm prepare fhtw binaries Jan 10, 2018
fhtw.h initial extension of https://code.google.com/p/smhasher Mar 27, 2014
halfsiphash.c more fixes building by MSVC. Jul 27, 2018
hasshe2.c initial extension of https://code.google.com/p/smhasher Mar 27, 2014
jody_hash32.c fix jody_hash for building 32- and 64-bit targets simultaneously. Jul 27, 2018
jody_hash32.h fix jody_hash for building 32- and 64-bit targets simultaneously. Jul 27, 2018
jody_hash64.c fix jody_hash for building 32- and 64-bit targets simultaneously. Jul 27, 2018
jody_hash64.h fix jody_hash for building 32- and 64-bit targets simultaneously. Jul 27, 2018
log.hashes redo all benchmarks Aug 29, 2016
log.speed new speed with gcc-7 and pti patches on linux 4.14 Jan 10, 2018
lookup3.cpp initial extension of https://code.google.com/p/smhasher Mar 27, 2014
main.cpp re-enable cmetrohash64_1* on 32bit x64 Jul 28, 2018
md5.cpp Fixes for issue #24 (#25) Feb 2, 2017
metrohash.h add metrohash64crc, do crc hw detection for metro Jun 16, 2015
metrohash128.cpp add metrohash May 27, 2015
metrohash128crc.cpp add metrohash May 27, 2015
metrohash64.cpp add metrohash May 27, 2015
metrohash64crc.cpp add metrohash64crc, do crc hw detection for metro Jun 16, 2015
mum.cc t1ha: adds MUM hash into SMHasher. Nov 18, 2016
mum.h HalfSipHash, SipHash13, fix crc32_hw1 Feb 3, 2017
opt_cmetrohash.h Added my optimized cmetrohash64, noop oaat read (speed reference) Jun 28, 2015
opt_cmetrohash64_1.c smhasher: fix MSVC2015 x64 build. Jan 12, 2017
os_dependent_stuff.asm initial extension of https://code.google.com/p/smhasher Mar 27, 2014
sha1.cpp Fix sha1 on BIG_ENDIAN Feb 5, 2017
sha1.h Fixes for issue #24 (#25) Feb 2, 2017
siphash.c more fixes building by MSVC. Jul 27, 2018
siphash.h HalfSipHash, SipHash13, fix crc32_hw1 Feb 3, 2017
siphash_impl.h add SipHash Mar 29, 2014
siphash_sse2.c HalfSipHash, SipHash13, fix crc32_hw1 Feb 3, 2017
siphash_ssse3.c HalfSipHash, SipHash13, fix crc32_hw1 Feb 3, 2017
speed.sh Update README from re-done benchmarks Jan 10, 2018
speedall.sh add speedall.sh Oct 27, 2015
split.pl initial extension of https://code.google.com/p/smhasher Mar 27, 2014
t1ha.h update t1ha to upstream master. Jun 20, 2018
testall.sh testall.sh: use cmake in build dir Jun 17, 2015
testspeed.sh add judyhash speed and quality Jan 10, 2018
xxhash.c XXH_256: harmonize int types Jun 16, 2015
xxhash.h add xxHash May 24, 2015

README.md

SMhasher

| Hash function | MiB/sec | cycles/hash | Quality problems |
|:--------------|--------:|------------:|:-----------------|
| donothing32 | 23271327.59 | 5.00 | test NOP |
| donothing64 | 22524919.33 | 5.00 | test NOP |
| donothing128 | 35531868.85 | 5.00 | test NOP |
| NOP_OAAT_read64 | 2098.56 | 33.66 | test NOP |
| BadHash | 452.47 | 108.78 | test FAIL |
| sumhash | 7220.74 | 33.77 | test FAIL |
| sumhash32 | 21605.70 | 14.16 | test FAIL |
| -------------------------------------- | | | |
| crc32 | 392.09 | 135.20 | insecure, 8589.93x collisions, distrib |
| md5_32a | 352.25 | 674.76 | 8589.93x collisions, distrib |
| sha1_32a | 373.41 | 1492.19 | collisions, 36.6% distrib |
| hasshe2 | 3139.96 | 70.18 | insecure, 100% bias, collisions, distrib |
| crc32_hw | 6331.24 | 29.89 | insecure, 100% bias, collisions, distrib, machine-specific (x86 SSE4.2) |
| crc32_hw1 | 23011.78 | 35.72 | insecure, 100% bias, collisions, distrib, machine-specific (x86 SSE4.2) |
| crc64_hw | 8423.86 | 29.36 | insecure, 100% bias, collisions, distrib, machine-specific (x86_64 SSE4.2) |
| FNV1a | 790.45 | 69.32 | zeros, 100% bias, collisions, distrib |
| FNV1a_YT | 8949.71 | 27.97 | 100% bias, collisions, distrib |
| FNV64 | 791.85 | 69.31 | 100% bias, collisions, distrib |
| bernstein | 791.84 | 67.09 | 100% bias, collisions, distrib |
| sdbm | 790.50 | 66.69 | 100% bias, collisions, distrib |
| x17 | 527.91 | 96.67 | 99.98% bias, collisions, distrib |
| JenkinsOOAT | 452.49 | 141.18 | 53.5% bias, collisions, distrib |
| JenkinsOOAT_pl | 452.49 | 118.65 | 1.5-11.5% bias, 7.2x collisions |
| MicroOAAT | 972.14 | 59.82 | 100% bias, distrib |
| jodyhash32 | 1428.46 | 44.25 | bias, collisions, distr |
| jodyhash64 | 2843.60 | 39.53 | bias, collisions, distr |
| lookup3 | 1744.87 | 47.23 | 28% bias, collisions, 30% distr |
| superfast | 1570.59 | 57.55 | 91% bias, 5273.01x collisions, 37% distr |
| MurmurOAAT | 451.66 | 114.34 | collisions, 99.998% distr |
| Crap8 | 3149.87 | 34.14 | 2.42% bias, collisions, 2% distrib |
| Murmur2 | 3139.49 | 40.63 | 1.7% bias, 81x coll, 1.7% distrib |
| Murmur2A | 3139.50 | 45.14 | 12.7% bias |
| Murmur2B | 4867.23 | 46.49 | 1.8% bias, collisions, 3.4% distrib |
| Murmur2C | 3919.19 | 46.27 | 91% bias, collisions, distr |
| HalfSipHash | 747.06 | 121.13 | zeroes |
| SipHash13 | 1865.13 | 100.55 | 0.9% bias |
| -------------------------------------- | | | |
| SipHash | 978.37 | 139.56 | |
| GoodOAAT | 1047.85 | 71.39 | |
| PMurHash32 | 2348.59 | 57.56 | |
| Murmur3A | 2329.88 | 50.22 | |
| Murmur3C | 3186.78 | 66.49 | |
| Murmur3F | 5256.15 | 50.23 | |
| fasthash32 | 4661.31 | 50.81 | |
| fasthash64 | 4612.00 | 48.14 | |
| MUM | 6564.38 | 39.85 | machine-specific (32/64 differs) |
| City32 | 3818.33 | 53.02 | |
| City64 | 9200.87 | 55.70 | 2 minor collisions |
| City128 | 10105.87 | 89.41 | |
| CityCrc128 | 12638.87 | 90.65 | |
| FarmHash64 | 9187.94 | 62.93 | |
| FarmHash128 | 9877.60 | 82.89 | |
| FarmHash32 | 24831.45 | 24.99 | machine-specific (x86_64 SSE4/AVX) |
| farmhash32_c | 24647.21 | 25.36 | machine-specific (x86_64 SSE4/AVX) |
| farmhash64_c | 9149.44 | 74.39 | |
| farmhash128_c | 9959.95 | 99.01 | |
| xxHash32 | 5414.57 | 46.85 | collisions with 4bit diff |
| xxHash64 | 10288.36 | 58.79 | |
| Spooky32 | 9899.74 | 68.10 | |
| Spooky64 | 9885.49 | 68.00 | |
| Spooky128 | 9901.48 | 68.56 | |
| metrohash64_1 | 9541.68 | 50.85 | |
| metrohash64_2 | 9569.20 | 50.86 | |
| metrohash128_1 | 9908.72 | 81.39 | |
| metrohash128_2 | 9943.95 | 81.31 | |
| metrohash64crc_1 | 14007.26 | 55.84 | cyclic collisions 8 byte, machine-specific (x86_64 SSE4.2) |
| metrohash64crc_2 | 13932.90 | 55.90 | cyclic collisions 8 byte, machine-specific (x86_64 SSE4.2) |
| metrohash128crc_1 | 13993.55 | 86.92 | machine-specific (x86_64 SSE4.2) |
| metrohash128crc_2 | 13929.50 | 86.90 | machine-specific (x86_64 SSE4.2) |
| cmetrohash64_1o | 8665.09 | 50.82 | |
| cmetrohash64_1 | 9522.76 | 51.04 | |
| cmetrohash64_2 | 9470.74 | 50.80 | |
| falkhash | 19984.13 | 173.46 | machine-specific (x86_64 AES-NI) |
| t1ha_64be | 7146.84 | 39.78 | |
| t1ha_32le | 5577.12 | 42.53 | |
| t1ha_32be | 4266.89 | 41.72 | |
| t1ha | 9590.96 | 36.51 | |
| t1ha_crc | 13775.04 | 35.87 | machine-specific (x86 SSE4.2) |
| t1ha_aes | 19927.77 | 36.02 | machine-specific (x86 AES-NI) |
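
As a rough illustration of what a MiB/sec figure like the ones above means, bulk throughput can be estimated by timing repeated passes over one large buffer. This is only a sketch under stated assumptions, not the measurement code in SpeedTest.cpp: the `HashFn` signature, the stand-in `sum_bytes` workload and `bulk_mib_per_sec` are hypothetical names, and the buffer size and round count are arbitrary choices.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical hash signature used only for this sketch.
using HashFn = std::uint32_t (*)(const void* key, std::size_t len, std::uint32_t seed);

// Trivial byte-sum hash, used purely as a stand-in workload.
inline std::uint32_t sum_bytes(const void* key, std::size_t len, std::uint32_t seed) {
    const unsigned char* p = static_cast<const unsigned char*>(key);
    std::uint32_t h = seed;
    for (std::size_t i = 0; i < len; ++i) h += p[i];
    return h;
}

// Estimates bulk throughput in MiB/sec from `rounds` passes over a `len`-byte buffer.
inline double bulk_mib_per_sec(HashFn hash, std::size_t len, int rounds) {
    std::vector<unsigned char> buf(len, 0xA5);
    volatile std::uint32_t sink = 0;  // keeps the compiler from dropping the calls
    const auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < rounds; ++r) {
        sink = sink + hash(buf.data(), buf.size(), static_cast<std::uint32_t>(r));
    }
    const auto stop = std::chrono::steady_clock::now();
    const double seconds = std::chrono::duration<double>(stop - start).count();
    const double mib = static_cast<double>(len) * rounds / (1024.0 * 1024.0);
    return mib / seconds;
}
```

A call such as `bulk_mib_per_sec(sum_bytes, 262144, 1000)` gives a single noisy estimate; numbers from such a loop depend heavily on compiler flags and the machine, and say nothing about the small-key cost reported in the cycles/hash column.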

Summary

I added some SSE-assisted hashes and fast Intel/ARM CRC32-C and AES hardware variants, but not yet the fastest crcutil. See our crcutil results. See also the old https://code.google.com/p/smhasher/w/list.

So the fastest hash functions on x86_64 without quality problems are:

  • t1ha
  • falkhash (macho64 and elf64 nasm only, with HW AES extension)
  • Metro (but not 64crc yet, WIP)
  • FarmHash (not portable, too machine specific: 64 vs 32bit, old gcc, ...)
  • Spooky32
  • xxHash64
  • fasthash
  • City (deprecated)
  • mum (machine-specific: different results on 32-bit and 64-bit architectures)

Hash functions for symbol tables or hash tables typically use 32-bit hashes; databases, file systems and file checksums typically use 64-bit or 128-bit hashes; crypto now starts at 256 bits.

The typical median key size in Perl 5 is 20, and the most common key size is 4. This is similar for all other dynamic languages. See github.com/rurban/perl-hash-stats.

When a hash function is used in a hash table, instruction cache effects usually outweigh the CPU throughput measured here. In my tests the smallest FNV1A beats the fastest crc32_hw1 with Perl 5 hash tables. Even if those worse hash functions lead to more collisions, the overall speed advantage outweighs the slightly worse quality. See e.g. A Seven-Dimensional Analysis of Hashing Methods and its Implications on Query Processing for a concise overview of the best hash table strategies, confirming that the simplest Mult hashing (bernstein, FNV*, x17, sdbm) always beats "better" hash functions (Tabulation, Murmur, Farm, ...) when used in a hash table.
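
To show how small the "Mult" family is, here is a minimal sketch of three of the named hashes in their commonly published 32-bit forms. The function names and signatures are chosen for this example and are not necessarily those used in Hashes.cpp; treating `seed` as the starting state is likewise an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Bernstein hash (djb2 form): h = h * 33 + byte. `seed` is the starting state.
inline std::uint32_t bernstein(const void* key, std::size_t len, std::uint32_t seed) {
    const unsigned char* p = static_cast<const unsigned char*>(key);
    std::uint32_t h = seed;
    for (std::size_t i = 0; i < len; ++i) h = h * 33u + p[i];
    return h;
}

// 32-bit FNV-1a: xor the byte in, then multiply by the FNV prime.
// The default starting state is the standard FNV offset basis.
inline std::uint32_t fnv1a32(const void* key, std::size_t len,
                             std::uint32_t seed = 2166136261u) {
    const unsigned char* p = static_cast<const unsigned char*>(key);
    std::uint32_t h = seed;
    for (std::size_t i = 0; i < len; ++i) {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}

// sdbm: h = byte + h * 65599, written with shifts (65599 = 2^16 + 2^6 - 1).
inline std::uint32_t sdbm(const void* key, std::size_t len, std::uint32_t seed) {
    const unsigned char* p = static_cast<const unsigned char*>(key);
    std::uint32_t h = seed;
    for (std::size_t i = 0; i < len; ++i) h = p[i] + (h << 6) + (h << 16) - h;
    return h;
}

// Usage example: checks the published FNV-1a test vector for the key "a".
inline void mult_hash_demo() {
    assert(fnv1a32("a", 1) == 0xE40C292Cu);
}
```

Each inner loop is one multiply (or a few shifts) plus one add or xor per byte, which is why these functions compile to very little code.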

The fast hash functions tested here are recommendable for file digests and maybe bigger databases, but not for 32-bit hash tables. The "Quality problems" lead to a less uniform distribution, i.e. more collisions and worse performance, but are rarely related to real security attacks; only the 2nd sanity test against \0 invariance is security-relevant.

Other

TODO

Some popular SSE-improved FNV1 (sanmayce) variants, fletcher (ZFS), ... and slower cryptographic or more secure hashes are still missing: BLAKE2, SHA-2, SHA-3 (Keccak), Grøstl, JH, Skein, ...

SECURITY

The hash table attacks described in SipHash against City, Murmur or Perl JenkinsOAAT or at Hash Function Lounge are not included here.

Avoiding such attacks cannot be the job of the hash function; it is the job of the hash table's collision resolution scheme. You can attack every single hash function, even the best and most secure, if you detect the seed, e.g. from language (mis-)features, side-channel attacks, collision timings and, independently, the sort order. So you need to protect your collision handling scheme from the worst case O(n), i.e. separate chaining with linked lists. Linked-list chaining allows high load factors, but is very cache-unfriendly. The only recommendable linked-list scheme is inlining the key or hash into the array. Nowadays everybody uses fast open addressing, even if the load factor needs to be ~50%, unless you use Cuckoo Hashing.

I.e. the usage of SipHash for their hash table in Python 3.4, Ruby, Rust, systemd, OpenDNS, Haskell and OpenBSD is pure security theatre. SipHash is not secure enough for security purposes and not fast enough for general usage. Brute-force generation of ~32k collisions needs 2-4 minutes for all these hashes. SipHash, being the slowest, needs at most 4 minutes, the others typically at most 2m30s, with <10s for practical 16k collision attacks with all hash functions. Using Murmur is usually slower than a simple Mult, even in the worst case. Provably secure is only uniform hashing, i.e. 2-5 independent Mult or Tabulation, or using a guaranteed logarithmic collision scheme (a tree) or a linear collision scheme, such as Robin Hood or Cuckoo hashing with collision counting.
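
The collision-counting idea can be sketched as an open-addressing table that bounds the number of probes per insert. This is a minimal illustration, not code from this repository: `ProbeCountingSet`, its `Result` enum and the `max_probes` limit are hypothetical names, and the multiplicative hash masked to its low bits is deliberately weak so that collisions are easy to provoke.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical fixed-capacity set of 32-bit keys using open addressing
// (linear probing) with a hard limit on probes per insert.
class ProbeCountingSet {
public:
    enum class Result { Inserted, AlreadyPresent, TooManyProbes };

    // `capacity_pow2` must be a power of two so that masking selects a slot.
    ProbeCountingSet(std::size_t capacity_pow2, std::size_t max_probes)
        : slots_(capacity_pow2), mask_(capacity_pow2 - 1), max_probes_(max_probes) {}

    Result insert(std::uint32_t key) {
        std::size_t idx = hash(key) & mask_;
        for (std::size_t probes = 0; probes < max_probes_; ++probes) {
            std::optional<std::uint32_t>& slot = slots_[idx];
            if (!slot) {
                slot = key;
                return Result::Inserted;
            }
            if (*slot == key) return Result::AlreadyPresent;
            idx = (idx + 1) & mask_;  // linear probing: try the next slot
        }
        // The probe budget is exhausted: report it instead of scanning on.
        return Result::TooManyProbes;
    }

private:
    // Simple multiplicative hash; its low bits depend only on the key's low bits.
    static std::uint32_t hash(std::uint32_t key) { return key * 2654435761u; }

    std::vector<std::optional<std::uint32_t>> slots_;
    std::size_t mask_;
    std::size_t max_probes_;
};
```

With a capacity of 16 and `max_probes` of 4, the keys 0, 16, 32 and 48 all start at slot 0 and fill slots 0 to 3, and inserting 64 then returns `TooManyProbes`. A caller that sees this result could grow the table, reseed, or reject the input; which of these is appropriate depends on the application.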

One more note regarding security: nowadays even SHA1 can be solved with a solver like Z3 (or faster ones) for practical hash table collision attacks (i.e. 14-20 bits). So none of the hash functions with less than 256 bits tested here can be considered "secure" at all.

The '\0' vulnerability attack with binary keys is tested in the 2nd Sanity test.
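
The kind of check involved can be sketched as follows: append zero bytes to a key one at a time and require every resulting hash to differ. This is a simplified illustration, not the repository's Sanity test; `survives_appended_zeros` and `fnv1a_seeded` are hypothetical names, and using the seed directly as the starting state is an assumption made to show one way such a weakness can arise.

```cpp
#include <cstddef>
#include <cstdint>
#include <set>
#include <string>

// Hypothetical hash signature used only for this sketch.
using HashFn = std::uint32_t (*)(const std::string& key, std::uint32_t seed);

// FNV-1a-style hash whose starting state is the seed itself (assumption).
inline std::uint32_t fnv1a_seeded(const std::string& key, std::uint32_t seed) {
    std::uint32_t h = seed;
    for (unsigned char c : key) {
        h ^= c;
        h *= 16777619u;
    }
    return h;
}

// Returns true if key, key+"\0", key+"\0\0", ... (up to `max_zeros` appended
// zero bytes) all hash to distinct values.
inline bool survives_appended_zeros(HashFn hash, std::string key,
                                    std::uint32_t seed, int max_zeros) {
    std::set<std::uint32_t> seen;
    for (int i = 0; i <= max_zeros; ++i) {
        if (!seen.insert(hash(key, seed)).second) return false;  // repeated hash
        key.push_back('\0');
    }
    return true;
}
```

With the standard offset basis 2166136261 as seed, the key "a" passes this check; with seed 0 the empty key fails, because xoring a zero byte into a zero state and multiplying leaves the state at zero, so every all-zero key hashes to 0.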