This repository has been archived by the owner on Feb 14, 2023. It is now read-only.

Attempt to make consistent hashing more uniform #7

Closed
wants to merge 1 commit

Conversation

paulmach
Contributor

This code runs random strings against the consistent hash function and checks uniformity. Unfortunately, the results are not even close to uniform:

  58856 10.0.16.69:11211
  18389 10.0.22.188:11211
  22755 10.0.26.214:11211

This matches what we're seeing in production (memcache load is not uniform and matches the distribution above). The issue is due to the non-uniformity of the crc32 hash function. :(

So I updated the code to use a different hash table and to default to more replicas on the consistent hash "ring". This gave better results, but I was unable to find something that would be better for all inputs:

  29161 10.0.16.69:11211
  37910 10.0.22.188:11211
  32929 10.0.26.214:11211
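
For reference, a minimal sketch of the kind of uniformity check described above, assuming zlib's crc32 as the hash; the random key generation and the hash % N bucketing are simplifications of the client's actual replica ring, so only the counting idea carries over.

#include <stdio.h>
#include <stdlib.h>
#include <zlib.h>   /* crc32(); link with -lz */

int main(void) {
    const char *servers[] = {
        "10.0.16.69:11211", "10.0.22.188:11211", "10.0.26.214:11211"
    };
    int counts[3] = {0, 0, 0};
    char key[32];

    srand(42);
    for (int i = 0; i < 100000; i++) {
        /* generate a short random lowercase key */
        int len = 8 + rand() % 8;
        for (int j = 0; j < len; j++)
            key[j] = 'a' + rand() % 26;

        /* hash the key and count which server it lands on
           (hash % N stands in for the replica ring lookup) */
        unsigned long h = crc32(0L, (const unsigned char *) key, (unsigned int) len);
        counts[h % 3]++;
    }

    for (int i = 0; i < 3; i++)
        printf("%7d %s\n", counts[i], servers[i]);
    return 0;
}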

@mlerner

@mrdmnd

mrdmnd commented Aug 18, 2015

I left some comments on the snippet, but:

I've run into this issue before, when building the randomizer for Airbnb's experiment assignment framework. It has to do with how the CRC algorithm treats the low-order bits of the input.

We solved this by moving to the Murmur3 hash:

#include <stdint.h>

uint32_t murmur3_32(const char *key, uint32_t len, uint32_t seed) {
    static const uint32_t c1 = 0xcc9e2d51;
    static const uint32_t c2 = 0x1b873593;
    static const uint32_t r1 = 15;
    static const uint32_t r2 = 13;
    static const uint32_t m = 5;
    static const uint32_t n = 0xe6546b64;

    uint32_t hash = seed;

    /* body: mix the key four bytes at a time
       (assumes the platform tolerates unaligned 32-bit loads) */
    const int nblocks = len / 4;
    const uint32_t *blocks = (const uint32_t *) key;
    int i;
    for (i = 0; i < nblocks; i++) {
        uint32_t k = blocks[i];
        k *= c1;
        k = (k << r1) | (k >> (32 - r1));
        k *= c2;

        hash ^= k;
        hash = ((hash << r2) | (hash >> (32 - r2))) * m + n;
    }

    /* tail: mix the remaining 0-3 bytes; the cases intentionally fall through */
    const uint8_t *tail = (const uint8_t *) (key + nblocks * 4);
    uint32_t k1 = 0;

    switch (len & 3) {
    case 3:
        k1 ^= tail[2] << 16;
        /* fall through */
    case 2:
        k1 ^= tail[1] << 8;
        /* fall through */
    case 1:
        k1 ^= tail[0];

        k1 *= c1;
        k1 = (k1 << r1) | (k1 >> (32 - r1));
        k1 *= c2;
        hash ^= k1;
    }

    /* finalization: force the bits to avalanche */
    hash ^= len;
    hash ^= (hash >> 16);
    hash *= 0x85ebca6b;
    hash ^= (hash >> 13);
    hash *= 0xc2b2ae35;
    hash ^= (hash >> 16);

    return hash;
}

See this article for more: http://michiel.buddingh.eu/distribution-of-hash-values
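
As a rough illustration of how murmur3_32 would slot in, here is a hypothetical call site; the server list and the modulo bucketing are placeholders for the client's replica-ring lookup.

#include <string.h>

/* Hypothetical call site: pick a server for a key using murmur3_32 (defined
 * above) instead of crc32. The modulo bucketing is a simplification of the
 * client's replica ring. */
const char *choose_server(const char *key) {
    static const char *servers[] = {
        "10.0.16.69:11211", "10.0.22.188:11211", "10.0.26.214:11211"
    };
    uint32_t h = murmur3_32(key, (uint32_t) strlen(key), 0 /* seed */);
    return servers[h % 3];
}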

@mrdmnd

mrdmnd commented Aug 18, 2015

@paulmach
Contributor Author

Thanks for the tip. I reversed the bytes as suggested here and got results similar to this fix.
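
The linked suggestion isn't visible in this thread, so the exact change is unclear; one reading of "reversed the bytes" is feeding the key to crc32 back-to-front. A rough sketch of that interpretation, using zlib's crc32:

#include <string.h>
#include <zlib.h>   /* crc32() */

/* One possible reading of "reversed the bytes": hash the key back-to-front.
 * This is an illustration only, not necessarily the change that was tested. */
static uLong crc32_of_reversed_key(const char *key) {
    unsigned char buf[256];
    size_t len = strlen(key);
    if (len > sizeof(buf))
        len = sizeof(buf);
    for (size_t i = 0; i < len; i++)
        buf[i] = (unsigned char) key[len - 1 - i];   /* reverse byte order */
    return crc32(0L, buf, (uInt) len);
}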

@mrdmnd

mrdmnd commented Aug 18, 2015

Awesome. Glad that got sorted.

@paulmach
Contributor Author

Yeah, but then I tweaked some other things and it got bad again. I'm working on murmur3 now.

@paulmach
Contributor Author

Closed in favor of #8.

@paulmach paulmach closed this Aug 18, 2015
@paulmach paulmach deleted the pm/uniform branch August 18, 2015 20:17
@andyxning

Maybe we can choose an existing hash algorithm that works better than crc32 for consistent hashing. Implementing a hash algorithm ourselves is not the optimal solution; using an existing library is. :)
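
For example, the reference MurmurHash3 implementation from the smhasher project could be used directly rather than copying the function by hand; a hypothetical call, assuming that project's MurmurHash3.h is available:

#include <stdint.h>
#include "MurmurHash3.h"   /* reference implementation from the smhasher project */

/* Hash a key with the existing reference implementation instead of a
 * hand-rolled copy. */
uint32_t hash_key(const char *key, int len) {
    uint32_t out;
    MurmurHash3_x86_32(key, len, 0 /* seed */, &out);
    return out;
}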
