Skip to content
This repository has been archived by the owner on Dec 26, 2023. It is now read-only.

Fast CRC32 based hash for hash_bytes and hash_int #91

Closed
3 of 4 tasks
martinus opened this issue Aug 13, 2020 · 6 comments
Closed
3 of 4 tasks

Fast CRC32 based hash for hash_bytes and hash_int #91

martinus opened this issue Aug 13, 2020 · 6 comments
Assignees
Labels
performance performance improvements

Comments

@martinus
Copy link
Owner

martinus commented Aug 13, 2020

  • CRC with runtime dispatch
  • Use _mm_crc32_u64
  • Don't use upper bits of the hash, only use lower bits. Then a single crc command would be enough
  • Implement optimized hash_bytes that makes use of CRC32 instruction
@martinus martinus self-assigned this Aug 14, 2020
@martinus martinus added the performance performance improvements label Aug 14, 2020
@martinus martinus changed the title CRC, next try Fast CRC32 based hash for hash_bytes and hash_int Aug 26, 2020
@martinus
Copy link
Owner Author

After lots of experimenting, I've created a CRC32-hardware based hash_bytes algorithm that is quite a bit faster than any other hash algorithms that I know of:

robin_hood und complexityN

The hash implementation favors speed over hash quality, so it is definitely quite a bit worse than any of the contender. The only quality criteria is to be good enough for hashmap.

martinus added a commit that referenced this issue Aug 26, 2020
@ktprime
Copy link

ktprime commented Aug 26, 2020

you need a full test on different os/compiler/cpu. on my cpu AMD 1700, _mm_crc32_u64 is just a liitle fast than FNV-A.

@ktprime
Copy link

ktprime commented Aug 27, 2020

I have do some bench your newest hash function with rand string and rand size(4-64).


clang 9.0.0 (tags/RELEASE_900/final) on llvm/gcc 4.2.1 __cplusplus = 201402 arm64 OS = linux, cpu = Kunpeng 920

stdhash time use = 419 ms
wyhash time use = 264 ms
martin time use = 351 ms
phmap time use = 361 ms

stdhash time use = 623 ms
wyhash time use = 441 ms
martin time use = 620 ms
phmap time use = 619 ms

stdhash time use = 591 ms
wyhash time use = 380 ms
martin time use = 589 ms
phmap time use = 591 ms

stdhash time use = 624 ms
wyhash time use = 449 ms
martin time use = 625 ms
phmap time use = 623 ms


clang 8.0.0 (tags/RELEASE_800/final) on llvm/gcc 4.2.1 __cplusplus = 201703 x86-64 OS = linux, cpu = Intel(R) Xeon(R) CPU E5620 @ 2.40GHz

stdhash time use = 676 ms
wyhash time use = 290 ms
martin time use = 262 ms
phmap time use = 396 ms

stdhash time use = 486 ms
wyhash time use = 329 ms
martin time use = 291 ms
phmap time use = 465 ms

stdhash time use = 424 ms
wyhash time use = 304 ms
martin time use = 273 ms
phmap time use = 410 ms

stdhash time use = 480 ms
wyhash time use = 337 ms
martin time use = 300 ms
phmap time use = 443 ms


clang 10.0.0 on llvm/ms vc++ 1927 __cplusplus = 201703 x86-64 OS = Win, cpu = AMD Ryzen 7 1700 Eight-Core Processor

stdhash time use = 632 ms
wyhash time use = 274 ms
martin time use = 345 ms
phmap time use = 630 ms

stdhash time use = 634 ms
wyhash time use = 277 ms
martin time use = 346 ms
phmap time use = 628 ms

stdhash time use = 637 ms
wyhash time use = 277 ms
martin time use = 357 ms
phmap time use = 633 ms

stdhash time use = 634 ms
wyhash time use = 275 ms
martin time use = 346 ms
phmap time use = 628 ms


gcc 9.3.0 __cplusplus = 201703 x86-64 OS = linux, cpu = Intel(R) Xeon(R) CPU E5620 @ 2.40GHz

stdhash time use = 497 ms
wyhash time use = 257 ms
martin time use = 233 ms
phmap time use = 436 ms

stdhash time use = 498 ms
wyhash time use = 289 ms
martin time use = 247 ms
phmap time use = 438 ms

stdhash time use = 484 ms
wyhash time use = 296 ms
martin time use = 251 ms
phmap time use = 428 ms

stdhash time use = 464 ms
wyhash time use = 297 ms
martin time use = 252 ms
phmap time use = 445 ms


gcc 7.5.0 __cplusplus = 201402 arm64 OS = linux, cpu = Kunpeng 920

stdhash time use = 624 ms
wyhash time use = 444 ms
martin time use = 537 ms
phmap time use = 534 ms

stdhash time use = 665 ms
wyhash time use = 420 ms
martin time use = 613 ms
phmap time use = 625 ms

stdhash time use = 622 ms
wyhash time use = 408 ms
martin time use = 581 ms
phmap time use = 597 ms

stdhash time use = 613 ms
wyhash time use = 436 ms
martin time use = 591 ms
phmap time use = 607 ms


gcc 10.2.0 __cplusplus = 201703 x86-64 OS = Win, cpu = AMD Ryzen 7 1700 Eight-Core Processor

stdhash time use = 339 ms
wyhash time use = 228 ms
martin time use = 297 ms
phmap time use = 338 ms

stdhash time use = 336 ms
wyhash time use = 225 ms
martin time use = 295 ms
phmap time use = 335 ms

stdhash time use = 338 ms
wyhash time use = 226 ms
martin time use = 290 ms
phmap time use = 336 ms

stdhash time use = 339 ms
wyhash time use = 224 ms
martin time use = 296 ms
phmap time use = 336 ms


ms vc++ 1927 x86-64 OS = Win, cpu = AMD Ryzen 7 1700 Eight-Core Processor

stdhash time use = 635 ms
wyhash time use = 336 ms
martin time use = 346 ms
phmap time use = 648 ms

stdhash time use = 652 ms
wyhash time use = 346 ms
martin time use = 348 ms
phmap time use = 632 ms

stdhash time use = 651 ms
wyhash time use = 343 ms
martin time use = 346 ms
phmap time use = 630 ms

stdhash time use = 625 ms
wyhash time use = 344 ms
martin time use = 345 ms
phmap time use = 633 ms

@martinus
Copy link
Owner Author

Thanks a lot for the benchmarks! I only have access to an Intel i7-8700, so this is very appreciated.

What are the 4 benchmark results, is these from 4 different evaluations?

@ktprime
Copy link

ktprime commented Aug 27, 2020

//bench code like this.

static void buildRandString(int size, std::vectorstd::string& rndstring, int str_min, int str_max)
{
std::mt19937_64 srng; srng.seed(time(0));
for (int i = 0; i < size; i++)
rndstring.emplace_back(get_random_alphanum_string(srng() % (str_max - str_min) + str_min));
}

static void testHashString(int size, int str_min, int str_max)
{
std::vectorstd::string rndstring;
rndstring.reserve(size + 8);

char os_info[2048]; printInfo(os_info);
long sum = 0;
for (int i = 0; i < 4; i++)
{
    rndstring.clear();
    buildRandString(size + i, rndstring, str_min, str_max);
    int64_t start = 0;
    int t_find = 0;

    start = getTime();
    for (auto& v : rndstring)
        sum += std::hash<std::string>()(v);
    t_find = (getTime() - start) / 1000;
    printf("stdhash time use = %4d ms\n", t_find);

    start = getTime();
    for (auto& v : rndstring)
        sum += wyhash(v.data(), v.size(), 1);
    t_find = (getTime() - start) / 1000;
    printf("wyhash  time use = %4d ms\n", t_find);

    start = getTime();
    for (auto& v : rndstring)
        sum += robin_hood::hash<std::string>()(v);
    t_find = (getTime() - start) / 1000;
    printf("martin  time use = %4d ms\n", t_find);

    start = getTime();
    for (auto& v : rndstring)
        sum += phmap::Hash<std::string>()(v);
    t_find = (getTime() - start) / 1000;
    printf("phmap   time use = %4d ms\n\n", t_find);
}
printf("sum = %ld\n", sum);

}

testHashString(12345678, 4, 64);

@martinus
Copy link
Owner Author

won't do, I don't want to fix intrinsics and compatibility bugs any more.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
performance performance improvements
Projects
None yet
Development

No branches or pull requests

2 participants