Latest commit 92b69f0 Jan 17, 2017 @rurban committed on GitHub Merge pull request #23 from leo-yuriev/master
t1ha_aes and -march=native
doc results for GoodOAAT, MicroOAAT Dec 11, 2016
.gitignore ignore more temp dirs and files Nov 1, 2015
.travis.yml travis: give up on osx there Mar 15, 2016
AvalancheTest.cpp initial extension of https://code.google.com/p/smhasher Mar 27, 2014
AvalancheTest.h initial extension of https://code.google.com/p/smhasher Mar 27, 2014
Bitslice.cpp initial extension of https://code.google.com/p/smhasher Mar 27, 2014
Bitvec.cpp initial extension of https://code.google.com/p/smhasher Mar 27, 2014
Bitvec.h initial extension of https://code.google.com/p/smhasher Mar 27, 2014
CMakeLists.txt smhasher: set CMAKE_POSITION_INDEPENDENT_CODE for t1ha. Jan 13, 2017
City.cpp Successfully builds on FreeBSD 10.2 Mar 8, 2016
City.h Add experimental CityHash32WithSeed Jun 16, 2015
CityCrc.h initial extension of https://code.google.com/p/smhasher Mar 27, 2014
CityTest.cpp Add experimental CityHash32WithSeed Jun 16, 2015
DifferentialTest.cpp initial extension of https://code.google.com/p/smhasher Mar 27, 2014
DifferentialTest.h initial extension of https://code.google.com/p/smhasher Mar 27, 2014
FarmTest.cc add Google FarmHash, deprecating CityHash Jun 17, 2015
Hashes.cpp Small One-At-A-Time functions that passes SMHasher Dec 11, 2016
Hashes.h t1ha: fix enabling t1ha_ia32aes() for MSVC. Jan 12, 2017
KeysetTest.cpp FNV32a_YoshimitsuTRIAD: Fix AppendedZeroes Jun 20, 2015
KeysetTest.h initial extension of https://code.google.com/p/smhasher Mar 27, 2014
MurmurHash1.cpp initial extension of https://code.google.com/p/smhasher Mar 27, 2014
MurmurHash1.h initial extension of https://code.google.com/p/smhasher Mar 27, 2014
MurmurHash2.cpp initial extension of https://code.google.com/p/smhasher Mar 27, 2014
MurmurHash2.h initial extension of https://code.google.com/p/smhasher Mar 27, 2014
MurmurHash3.cpp initial extension of https://code.google.com/p/smhasher Mar 27, 2014
MurmurHash3.h initial extension of https://code.google.com/p/smhasher Mar 27, 2014
PMurHash.c add JenkinsOOAT_perl (an old bad artefact, used in perl until 5.16) Apr 11, 2014
PMurHash.h fixes for Windows and building with MSVC. Dec 9, 2016
Platform.cpp Successfully builds on FreeBSD 10.2 Mar 8, 2016
Platform.h fixes for Windows and building with MSVC. Dec 9, 2016
README.md results for GoodOAAT, MicroOAAT Dec 11, 2016
Random.cpp initial extension of https://code.google.com/p/smhasher Mar 27, 2014
Random.h initial extension of https://code.google.com/p/smhasher Mar 27, 2014
SpeedTest.cpp Serialize invocations of hash functions Aug 27, 2016
SpeedTest.h Add Average results to SpeedTest Jun 17, 2015
Spooky.cpp initial extension of https://code.google.com/p/smhasher Mar 27, 2014
Spooky.h initial extension of https://code.google.com/p/smhasher Mar 27, 2014
SpookyTest.cpp initial extension of https://code.google.com/p/smhasher Mar 27, 2014
Stats.cpp initial extension of https://code.google.com/p/smhasher Mar 27, 2014
Stats.h initial extension of https://code.google.com/p/smhasher Mar 27, 2014
SuperFastHash.cpp initial extension of https://code.google.com/p/smhasher Mar 27, 2014
Types.cpp initial extension of https://code.google.com/p/smhasher Mar 27, 2014
Types.h initial extension of https://code.google.com/p/smhasher Mar 27, 2014
build.sh Merge branch 'osx-clang' of https://github.com/mqudsi/smhasher into m… Mar 9, 2016
build32.sh minor fixes Nov 18, 2016
cmetrohash.h Added C implementation of MetroHash64 Jun 27, 2015
cmetrohash64.c make cmetrohash optimized variant standalone Jun 28, 2015
crc.cpp initial extension of https://code.google.com/p/smhasher Mar 27, 2014
crc32-generated-constants.cpp initial extension of https://code.google.com/p/smhasher Mar 27, 2014
crc32_hw.c initial extension of https://code.google.com/p/smhasher Mar 27, 2014
crc32_hw1.c initial extension of https://code.google.com/p/smhasher Mar 27, 2014
crc32c.cpp initial extension of https://code.google.com/p/smhasher Mar 27, 2014
crc32c.h initial extension of https://code.google.com/p/smhasher Mar 27, 2014
falkhash-elf64.o add falkhash.asm from https://github.com/gamozolabs/falkhash Oct 27, 2015
falkhash-macho64.o add falkhash.asm from https://github.com/gamozolabs/falkhash Oct 27, 2015
falkhash.asm add falkhash.asm from https://github.com/gamozolabs/falkhash Oct 27, 2015
farmhash-c-test.cc add farmhash (C99) Feb 12, 2016
farmhash-c.c smhasher: fix MSVC2015 x64 build. Jan 12, 2017
farmhash-c.h add farmhash (C99) Feb 12, 2016
farmhash.cc smhasher: fix MSVC2015 x64 build. Jan 12, 2017
farmhash.h update farmhash (1.1) Feb 12, 2016
fasthash.cpp smhasher: fix MSVC2015 x64 build. Jan 12, 2017
fasthash.h add fasthash Nov 1, 2015
fhtw.asm initial extension of https://code.google.com/p/smhasher Mar 27, 2014
fhtw.h initial extension of https://code.google.com/p/smhasher Mar 27, 2014
hasshe2.c initial extension of https://code.google.com/p/smhasher Mar 27, 2014
log.hashes redo all benchmarks Aug 29, 2016
log.speed redo all benchmarks Aug 29, 2016
lookup3.cpp initial extension of https://code.google.com/p/smhasher Mar 27, 2014
main.cpp smhasher: don't default to metrohash64crc when it not available. Jan 12, 2017
md5.cpp initial extension of https://code.google.com/p/smhasher Mar 27, 2014
metrohash.h add metrohash64crc, do crc hw detection for metro Jun 16, 2015
metrohash128.cpp add metrohash May 27, 2015
metrohash128crc.cpp add metrohash May 27, 2015
metrohash64.cpp add metrohash May 27, 2015
metrohash64crc.cpp add metrohash64crc, do crc hw detection for metro Jun 16, 2015
mum.cc t1ha: adds MUM hash into SMHasher. Nov 18, 2016
mum.h t1ha: adds MUM hash into SMHasher. Nov 18, 2016
opt_cmetrohash.h Added my optimized cmetrohash64, noop oaat read (speed reference) Jun 28, 2015
opt_cmetrohash64_1.c smhasher: fix MSVC2015 x64 build. Jan 12, 2017
os_dependent_stuff.asm initial extension of https://code.google.com/p/smhasher Mar 27, 2014
sha1.cpp initial extension of https://code.google.com/p/smhasher Mar 27, 2014
sha1.h initial extension of https://code.google.com/p/smhasher Mar 27, 2014
siphash.c add SipHash Mar 29, 2014
siphash.h fixes for Windows and building with MSVC. Dec 9, 2016
siphash_impl.h add SipHash Mar 29, 2014
siphash_sse2.c add SipHash Mar 29, 2014
siphash_ssse3.c add SipHash Mar 29, 2014
speed.sh apply new speed benchmark to README.md Jun 17, 2015
speedall.sh add speedall.sh Oct 27, 2015
split.pl initial extension of https://code.google.com/p/smhasher Mar 27, 2014
t1ha.c t1ha: update (fix build without -march=native). Jan 12, 2017
t1ha.h t1ha: update (fix build without -march=native). Jan 12, 2017
testall.sh testall.sh: use cmake in build dir Jun 17, 2015
testspeed.sh add falkhash.asm from https://github.com/gamozolabs/falkhash Oct 27, 2015
xxhash.c XXH_256: harmonize int types Jun 16, 2015
xxhash.h add xxHash May 24, 2015

README.md

SMhasher

| Hash function | MiB/sec | cycles/hash | Quality problems |
| --- | --- | --- | --- |
| donothing32 | 9637878.60 | 5.48 | overall bad |
| donothing64 | 8211016.32 | 5.15 | overall bad |
| donothing128 | 7676888.89 | 5.83 | overall bad |
| NOP_OAAT_read64 | 2414.92 | 45.03 | 100% bias, 2.17x collisions |
| crc32 | 569.71 | 97.67 | insecure, 8589.93x collisions, distrib |
| md5_32a | 414.75 | 533.59 | 8589.93x collisions, distrib |
| sha1_32a | 657.77 | 971.34 | collisions, 36.6% distrib |
| hasshe2 | 2292.04 | 72.23 | insecure, 100% bias, collisions, distrib |
| crc32_hw | 7733.44 | 43.48 | insecure, 100% bias, collisions, distrib |
| crc64_hw | 10937.53 | 23.33 | insecure, 100% bias, collisions, distrib |
| crc32_hw1 | 30703.38 | 27.08 | insecure, 100% bias, collisions, distrib |
| FNV1a | 1026.17 | 52.90 | zeros, 100% bias, collisions, distrib |
| FNV1a_YoshimitsuTRIAD | 15700.04 | 20.62 | 100% bias, collisions, distrib |
| FNV64 | 1015.60 | 53.55 | 100% bias, collisions, distrib |
| bernstein | 1026.17 | 51.13 | 100% bias, collisions, distrib |
| sdbm | 977.48 | 51.75 | 100% bias, collisions, distrib |
| x17 | 893.28 | 62.36 | 99.98% bias, collisions, distrib |
| JenkinsOOAT | 563.23 | 104.31 | 53.5% bias, collisions, distrib |
| JenkinsOOAT_perl | 676.91 | 87.82 | 1.5-11.5% bias, 7.2x collisions |
| MicroOAAT | 913.58 | 60.92 | 100% bias, distrib |
| lookup3 | 3303.80 | 31.66 | 28% bias, collisions, 30% distr |
| superfast | 3158.88 | 35.44 | 91% bias, 5273.01x collisions, 37% distr |
| MurmurOAAT | 719.76 | 76.48 | collisions, 99.998% distr |
| Crap8 | 4069.57 | 26.83 | 2.42% bias, collisions, 2% distrib |
| Murmur2 | 4049.53 | 31.64 | 1.7% bias, 81x coll, 1.7% distrib |
| Murmur2A | 4031.16 | 34.56 | 12.7% bias |
| Murmur2B | 8062.20 | 33.62 | 1.8% bias, collisions, 3.4% distrib |
| Murmur2C | 5381.36 | 35.33 | 91% bias, collisions, distr |
| ---------------------- | | | |
| GoodOAAT | 1352.94 | 50.80 | |
| PMurHash32 | 3034.35 | 45.11 | |
| Murmur3A | 3263.93 | 37.86 | |
| Murmur3C | 3733.90 | 48.66 | |
| Murmur3F | 7136.07 | 36.33 | |
| fasthash32 | 7198.92 | 33.05 | |
| fasthash64 | 6911.62 | 33.57 | |
| City32 | 6054.93 | 41.96 | 2 minor collisions |
| City64 | 14116.86 | 46.76 | |
| City128 | 13512.87 | 47.20 | |
| CityCrc128 | 21009.15 | 51.60 | |
| FarmHash64 | 12795.07 | 49.34 | machine-specific |
| FarmHash128 | 13514.39 | 59.63 | machine-specific |
| FarmHash32 | 24831.45 | 24.99 | disabled. too machine-specific |
| farmhash32_c | 24647.21 | 25.36 | |
| farmhash64_c | 14976.48 | 41.88 | |
| farmhash128_c | 15856.60 | 56.52 | |
| SipHash | 1264.31 | 117.20 | |
| Spooky32 | 15352.40 | 43.98 | |
| Spooky64 | 16411.41 | 43.69 | |
| Spooky128 | 16358.37 | 45.04 | collisions with 4bit diff |
| xxHash32 | 4494.48 | 47.90 | |
| xxHash64 | 15580.86 | 44.80 | |
| metrohash64_1 | 16555.13 | 37.22 | |
| metrohash64_2 | 16287.46 | 39.82 | |
| metrohash128_1 | 15802.90 | 47.65 | |
| metrohash128_2 | 15500.74 | 48.16 | cyclic collisions 8 byte |
| metrohash64crc_1 | 23946.99 | 45.67 | cyclic collisions 8 byte |
| metrohash64crc_2 | 25105.50 | 38.75 | |
| metrohash128crc_1 | 27411.55 | 46.06 | |
| metrohash128crc_2 | 27790.85 | 45.67 | |
| cmetrohash64_1_o | 17237.75 | 36.95 | |
| cmetrohash64_1 | 16463.21 | 36.30 | |
| cmetrohash64_2 | 17188.28 | 35.40 | |
| falkhash | 39817.46 | 124.81 | |
| t1ha | 15480.28 | 26.41 | |
| t1ha_64be | 5203.00 | 53.69 | |
| t1ha_32le | 8930.90 | 29.79 | |
| t1ha_32be | 6931.84 | 34.17 | |
| t1ha_crc | 16757.73 | 28.69 | |
| MUM | 11942.99 | 30.75 | machine-specific |

Summary

I added some SSE-assisted hashes and fast Intel/ARM CRC32-C and AES hardware variants, but not the fastest crcutil yet. See our crcutil results. See also the old https://code.google.com/p/smhasher/w/list.

So the fastest hash functions on x86_64 without quality problems are:

  • falkhash (macho64 and elf64 nasm only, with HW AES extension)
  • t1ha + mum (machine-specific, mum: different arch results)
  • FarmHash (not portable, too machine-specific: 64 vs 32bit, old gcc, ...)
  • Metro (but not 64crc yet, WIP)
  • Spooky32
  • xxHash64
  • fasthash
  • City (deprecated)

Hash functions for symbol tables or hash tables typically use 32-bit hashes; databases, file systems and file checksums typically use 64 or 128 bits; crypto now starts at 256 bits.

The typical median key size in Perl 5 is 20, and the most common size is 4. See github.com/rurban/perl-hash-stats

When used in a hash table, the instruction cache will usually matter more than the CPU and throughput numbers measured here. In my tests the smallest FNV1a beats the fastest crc32_hw1 with Perl 5 hash tables. Even if those worse hash functions lead to more collisions, the overall speed advantage outweighs the slightly worse quality. See e.g. A Seven-Dimensional Analysis of Hashing Methods and its Implications on Query Processing for a concise overview of the best hash table strategies, confirming that the simplest Mult hashing (bernstein, FNV*, x17, sdbm) always beats "better" hash functions (Tabulation, Murmur, Farm, ...) when used in a hash table.
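To make the "simplest Mult hashing" family concrete, here is a minimal sketch of two such functions. It uses the published FNV-1a 32-bit constants and the common Bernstein multiply-by-33 form; the function names, signatures and default seeds are assumptions for illustration, not the code from Hashes.cpp.

```cpp
#include <cstddef>
#include <cstdint>

// FNV-1a, 32-bit: xor in each byte, then multiply by the FNV prime.
// The default seed is the standard FNV offset basis.
inline std::uint32_t fnv1a_32(const void *key, std::size_t len,
                              std::uint32_t h = 2166136261u) {
    const unsigned char *p = static_cast<const unsigned char *>(key);
    for (std::size_t i = 0; i < len; ++i) {
        h ^= p[i];
        h *= 16777619u;  // FNV 32-bit prime
    }
    return h;
}

// Bernstein-style hash: multiply the state by 33, then add each byte.
// The default seed 5381 is the conventional starting value (an assumption
// here; a hash table would normally pass its own seed).
inline std::uint32_t bernstein_32(const void *key, std::size_t len,
                                  std::uint32_t h = 5381u) {
    const unsigned char *p = static_cast<const unsigned char *>(key);
    for (std::size_t i = 0; i < len; ++i) {
        h = 33u * h + p[i];
    }
    return h;
}
```

Both loops are a handful of instructions per byte, which is why they stay small in the instruction cache even though their distribution is worse than the larger functions in the table.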

The fast hash functions tested here are recommended for file digests and maybe bigger databases, but not for 32-bit hash tables. The "Quality problems" lead to a less uniform distribution, i.e. more collisions and worse performance, but are rarely related to real security attacks; only the 2nd sanity test against \0 invariance is security relevant.

Other

TODO

Some popular SSE-improved FNV1 (sanmayce) variants, fletcher (ZFS), ... and slower cryptographic or more secure hashes are still missing: BLAKE2, SHA-2, SHA-3 (Keccak), Grøstl, JH, Skein, ...

SECURITY

The hash table attacks described in SipHash against City, Murmur or Perl JenkinsOAAT or at Hash Function Lounge are not included here.

Such attack avoidance cannot be the problem of the hash function, but only of the hash table collision resolution scheme. You can attack every single hash function, even the best and most secure, if you detect the seed, e.g. from the sort order, so you need to protect your collision handling scheme from the worst-case O(n), i.e. separate chaining with linked lists. Linked-list chaining allows high load factors, but is very cache-unfriendly. The only recommendable linked-list scheme is inlining the key or hash into the array. Nowadays everybody uses fast open addressing, even if the load factor needs to be ~50%, unless you use Cuckoo Hashing.

I.e. the usage of siphash for their hash table in Python 3.4, Ruby, Rust, systemd, OpenDNS, Haskell and OpenBSD is pure security theatre. siphash is not secure enough for security purposes and not fast enough for general usage. Brute-force generation of ~32k collisions needs 2-4 minutes for all these hashes. siphash, being the slowest, needs at most 4 minutes, the others typically at most 2m30s, with <10s for practical 16k collision attacks with all hash functions. Using Murmur is usually slower than a simple Mult, even in the worst case. Only uniform hashing is provably secure, i.e. 2-5 independent Mult or Tabulation, or using a guaranteed logarithmic collision scheme (a tree) or a linear collision scheme, such as Robin Hood or Cuckoo hashing with collision counting.
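As a rough illustration of collision counting in an open-addressing table, the sketch below counts probes during a linear-probing insert and reports when a limit is exceeded, so the caller can react (for example by reseeding and rehashing). The class name `ProbeCountingSet`, its interface and the probe limit are hypothetical; this is not taken from any of the projects named above, and it is plain linear probing rather than Robin Hood or Cuckoo hashing.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical linear-probing set that counts probes on every insert.
class ProbeCountingSet {
public:
    using HashFn = std::function<std::uint32_t(const std::string &)>;

    // capacity_pow2 must be a power of two so the index can be masked.
    ProbeCountingSet(std::size_t capacity_pow2, std::size_t max_probes,
                     HashFn hash)
        : keys_(capacity_pow2), used_(capacity_pow2, false),
          max_probes_(max_probes), hash_(std::move(hash)) {}

    // Returns the number of slots probed, or -1 if the probe limit was
    // exceeded, which signals a suspiciously long collision run.
    long insert(const std::string &key) {
        const std::size_t mask = keys_.size() - 1;
        std::size_t i = hash_(key) & mask;
        for (std::size_t probes = 1; probes <= max_probes_; ++probes) {
            if (!used_[i]) {              // free slot: store the key here
                used_[i] = true;
                keys_[i] = key;
                return static_cast<long>(probes);
            }
            if (keys_[i] == key) {        // key is already present
                return static_cast<long>(probes);
            }
            i = (i + 1) & mask;           // linear probing step
        }
        return -1;
    }

private:
    std::vector<std::string> keys_;
    std::vector<bool> used_;
    std::size_t max_probes_;
    HashFn hash_;
};
```

With this shape the defence lives in the table rather than in the hash function: a cheap Mult hash can be plugged in as `hash`, and a flood of colliding keys shows up as a `-1` return instead of silently degrading to O(n) per operation.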

One more note regarding security: Nowadays even SHA1 can be solved with a solver such as Z3 (or faster ones) for practical hash table collision attacks (i.e. 14-20 bits). So all hash functions with less than 256 bits tested here cannot be considered "secure" at all.

The '\0' vulnerability attack with binary keys is tested in the 2nd Sanity test.
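The kind of weakness meant here can be shown with a small sketch: a multiply-add hash whose state is zero stays at zero for every zero byte, so all-zero binary keys of any length collide. The helper names below are hypothetical, and this is only an illustration of the effect, not SMHasher's actual sanity test code.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Multiply-add hash (Bernstein style) with a caller-supplied seed.
inline std::uint32_t mult33(const unsigned char *key, std::size_t len,
                            std::uint32_t seed) {
    std::uint32_t h = seed;
    for (std::size_t i = 0; i < len; ++i) {
        h = 33u * h + key[i];
    }
    return h;
}

// Returns true if all-zero keys of length 1..max_len hash to the same value
// for the given seed, i.e. the hash cannot tell those keys apart.
inline bool zero_keys_collide(std::uint32_t seed, std::size_t max_len) {
    if (max_len == 0) {
        return true;  // nothing to compare
    }
    const std::vector<unsigned char> zeros(max_len, 0);
    const std::uint32_t first = mult33(zeros.data(), 1, seed);
    for (std::size_t len = 2; len <= max_len; ++len) {
        if (mult33(zeros.data(), len, seed) != first) {
            return false;
        }
    }
    return true;
}
```

With seed 0 every all-zero key hashes to 0, so `zero_keys_collide(0, 16)` is true, while a nonzero odd seed such as 5381 makes the lengths distinguishable.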