hash func 是我们将 string 作为 map key 时常用的技巧。今天碰巧看到一个 webpage 上介绍了几个 hash func。
文中作者推荐 djb2 hash func,号称是最快的 string hash algorithm 了。比目前项目中用的 Peter K. Pearson 还快上几分。
下面是两个算法对同一个字符串,运算了 9,999,999 次的耗时比较。不过自己仅仅做了性能测试,对于不同的字符串,哪个算法能得到更分散的值,就没测试了。(也不知道如何测试 - -#)
Peter K. Pearson
real 0m0.633s
user 0m0.633s
sys 0m0.001s
djb2
real 0m0.519s
user 0m0.519s
sys 0m0.000s
this algorithm (k=33) was first reported by dan bernstein many years ago in comp.lang.c. another version of this algorithm (now favored by bernstein) uses xor: hash(i) = hash(i - 1) * 33 ^ str[i]; the magic of number 33 (why it works better than many other constants, prime or not) has never been adequately explained.
unsigned long
hash(unsigned char *str)
{
unsigned long hash = 5381;
int c;
while (c = *str++)
hash = ((hash << 5) + hash) + c; /* hash * 33 + c */
return hash;
}