Compute xxHash hash codes for 8 keys in parallel
C++ CMake
Switch branches/tags
Nothing to show
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Failed to load latest commit information.
src
CMakeLists.txt
README.md

README.md

parallel-xxhash

Two implementations of the xxHash hash function (specifically, the 32 bit version).

  • parallel: Computing the hash values of 8 keys in parallel, using AVX2 intrinsics.
  • scalar: Computing the hash value of a single key.

These are very special purpose implementations, and will not be of any interest in most programs. Use these functions if:

  • Your keys have a constant size, and are a multiple of 4 bytes long. Variable size keys are not supported. Nor are non-word aligned keys.
  • You have an application that can receive and process inputs in batches such that you usually have 8 keys to process at once. (Doesn't need to be full batches of 8, but the break-even point vs. a fast scalar hash is probably around batches of 3-4 keys).
  • You can arrange for your hash keys to be in a column-major order without too much pain.
  • You can compile the program with -mavx2. (While there is a non-avx2 fallback, there's not a lot of point to it).
  • A single 32 bit hash value per key is sufficient.

The scalar implementation is mainly included as a fallback, for programs that generally use the parallel code, but have some exceptional cases that need identical hash codes for individual keys. Except for very small keys (at most 20 bytes), you are probably better off with the reference implementation of some modern hash function.

DATA LAYOUT

For the parallel version the keys should be laid out adjacent to each other, in column-major order. That is, the first word in keys should be the first word of the first key. The second word of keys should be the first word of the second key. And so on:

  key1[0] key2[0] ... key7[0]
  key2[1] key2[1] ... key7[1]
  ...
  key1[SizeWords-1] key2[SizeWords-1] ... key7[SizeWords-1]

EXAMPLES

Assume the following definitions:

   static const uint32_t KEY_LENGTH = 3;

   static uint32_t rows[][KEY_LENGTH] = {
      {1, 2, 3},
      {4, 5, 6},
      {7, 8, 9},
      {10, 11, 12},
      {13, 14, 15},
      {16, 17, 18},
      {19, 20, 21},
      {22, 23, 24},
  };

  static uint32_t cols[][8] = {
      {1, 4, 7, 10, 13, 16, 19, 22},
      {2, 5, 8, 11, 14, 17, 20, 23},
      {3, 6, 9, 12, 15, 18, 21, 24},
  };

  static uint32_t seeds[] = { 0x3afc8e77, 0x924f408d };
  static const uint32_t seed_count = sizeof(seeds) / sizeof(uint32_t);

The implementations could be used as follows to compute hash values for each of the key / seed combinations:

  void example_scalar() {
      for (int s = 0; s < seed_count; ++s) {
          for (int i = 0; i < 8; ++i) {
              uint32_t* row = rows[i];
              uint32_t res = xxhash32<3>::scalar(row, seeds[s]);
              printf("seed=%08x, key={%u,%u,%u}, hash=%u\n",
                     seeds[s], row[0], row[1], row[2], res);
          }
      }
  }

  void example_parallel() {
      for (int s = 0; s < seed_count; ++s) {
          uint32_t res[8];
          __m256i hash = xxhash32<3>::parallel(cols[0], seeds[s], res);
          for (int i = 0; i < 8; ++i) {
              printf("seed=%08x, key={%u,%u,%u}, hash=%u\n",
                     seeds[s], cols[0][i], cols[1][i], cols[2][i],
                     res[i]);
          }
      }
  }