Switch to murmurhash3 to route documents to shards. #7954

Closed
wants to merge 4 commits into from

Commits on Oct 16, 2014

  1. Switch to murmurhash3 to route documents to shards.

    We currently use the djb2 hash function to compute the shard a document
    should go to. Unfortunately, djb2 mixes its input poorly, so some inputs
    hit adversarial cases, such as numeric ids on an index with 33 shards.
    
    Murmur3 produces better-distributed hashes, which should avoid these
    adversarial cases.
    
    Here is how 100,000 incremental ids are distributed across shards with djb2
    and with murmur3 (documents per shard):
    
    5 shards:
    Murmur3: [19933, 19964, 19940, 20030, 20133]
    DJB:     [20000, 20000, 20000, 20000, 20000]
    
    3 shards:
    Murmur3: [33185, 33347, 33468]
    DJB:     [30100, 30000, 39900]
    
    33 shards:
    Murmur3: [2999, 3096, 2930, 2986, 3070, 3093, 3023, 3052, 3112, 2940, 3036, 2985, 3031, 3048, 3127, 2961, 2901, 3105, 3041, 3130, 3013, 3035, 3031, 3019, 3008, 3022, 3111, 3086, 3016, 2996, 3075, 2945, 2977]
    DJB:     [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 900, 900, 900, 900, 1000, 1000, 10000, 10000, 10000, 10000, 9100, 9100, 9100, 9100, 9000, 9000, 0, 0, 0, 0, 0, 0]
    
    Even though djb2 looks ideal for some shard counts (5), its hashes follow
    patterns that badly skew the distribution for others (e.g. 3, or, even
    worse, 33).
    
    Some tests have been modified because they relied on implementation details of
    the routing hash function.
    jpountz committed Oct 16, 2014
    a7c83c5
  2. 5bf4127
  3. e8c89d5
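The djb2 counts in the first commit message can be reproduced with a short script. This is a sketch under assumptions that are ours, not quoted from the Elasticsearch source: the routing key is the document id alone, the ids are the decimal strings "0" to "99999", djb2 starts from 5381 and computes `33 * h + c` per character, the result is truncated to a signed 32-bit int the way a Java `(int)` cast would, and the shard is the non-negative remainder of that int. The function names are hypothetical. Under these assumptions the script reproduces the three DJB rows above.

```python
from typing import List


def djb2_java_int(s: str) -> int:
    """djb2 over the characters of s, truncated to a signed 32-bit value
    (assumption: mirrors a Java implementation that returns (int) hash)."""
    h = 5381
    for ch in s:
        h = (h << 5) + h + ord(ch)  # h = 33 * h + c
    # Only the low 32 bits survive an (int) cast, so masking once at the end
    # gives the same result as accumulating in a wrapping Java long.
    h &= 0xFFFFFFFF
    return h - (1 << 32) if h >= (1 << 31) else h


def djb2_distribution(num_shards: int, num_ids: int = 100_000) -> List[int]:
    """Count how many of the ids "0", "1", ... land on each shard."""
    counts = [0] * num_shards
    for i in range(num_ids):
        # Python's % is a floor modulo, so the shard index is never negative.
        counts[djb2_java_int(str(i)) % num_shards] += 1
    return counts


if __name__ == "__main__":
    for shards in (5, 3, 33):
        print(f"{shards} shards: {djb2_distribution(shards)}")
```

The skew follows from modular arithmetic. Every djb2 step multiplies by 33, so in exact arithmetic h ≡ c_last (mod 33) and h ≡ c_last (mod 3): only the last character decides the shard. Truncating to 32 bits subtracts some multiple k of 2^32, and since 2^32 ≡ 4 (mod 33) and 2^32 ≡ 1 (mod 3), that only shifts the residue by a constant; for these ids k is constant within each id length. Decimal digits are the characters 48 to 57, so each id length reaches at most 10 of the 33 shards. With 5 shards the multiplier does not vanish (33 ≡ 3 (mod 5)), but ten ids sharing a prefix still map to ten consecutive values, which hit every residue exactly twice; that is why 5 shards looks perfectly balanced.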

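For comparison, here is a minimal pure-Python sketch of 32-bit MurmurHash3 (the x86_32 variant, which the Oct 17 iteration uses instead of taking 32 bits out of the 128-bit variant) and of routing a document id to a shard with it. Assumptions not taken from the PR: the id is hashed as its UTF-8 bytes with seed 0, and the signed 32-bit hash is mapped to a shard with a floor modulo. `shard_for` and `murmur3_distribution` are hypothetical names. Because the PR may encode ids or seed the hash differently, the counts this prints should be similarly even to the Murmur3 rows above, not identical to them.

```python
from typing import List

MASK32 = 0xFFFFFFFF


def _rotl32(x: int, r: int) -> int:
    """Rotate a 32-bit value left by r bits."""
    return ((x << r) | (x >> (32 - r))) & MASK32


def murmur3_32(data: bytes, seed: int = 0) -> int:
    """MurmurHash3 x86_32; returns an unsigned 32-bit value."""
    c1, c2 = 0xCC9E2D51, 0x1B873593
    h = seed & MASK32
    n_blocks = len(data) // 4
    # Body: mix in each 4-byte little-endian block.
    for i in range(n_blocks):
        k = int.from_bytes(data[4 * i:4 * i + 4], "little")
        k = (_rotl32((k * c1) & MASK32, 15) * c2) & MASK32
        h = _rotl32(h ^ k, 13)
        h = (h * 5 + 0xE6546B64) & MASK32
    # Tail: the remaining 0-3 bytes, also little-endian.
    tail = data[4 * n_blocks:]
    if tail:
        k = int.from_bytes(tail, "little")
        k = (_rotl32((k * c1) & MASK32, 15) * c2) & MASK32
        h ^= k
    # Finalization: fold in the length, then avalanche (fmix32).
    h ^= len(data) & MASK32
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & MASK32
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & MASK32
    h ^= h >> 16
    return h


def shard_for(doc_id: str, num_shards: int) -> int:
    """Hypothetical routing: signed murmur3 of the UTF-8 id, floor-mod shards."""
    h = murmur3_32(doc_id.encode("utf-8"))
    signed = h - (1 << 32) if h & 0x80000000 else h
    return signed % num_shards  # Python's % never yields a negative shard


def murmur3_distribution(num_shards: int, num_ids: int = 100_000) -> List[int]:
    """Count how many of the ids "0", "1", ... land on each shard."""
    counts = [0] * num_shards
    for i in range(num_ids):
        counts[shard_for(str(i), num_shards)] += 1
    return counts


if __name__ == "__main__":
    for shards in (5, 3, 33):
        print(f"{shards} shards: {murmur3_distribution(shards)}")
```

Computing only the 32-bit variant matches what routing consumes: a single int that is reduced modulo the shard count, so producing 128 bits and discarding most of them buys nothing.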
Commits on Oct 17, 2014

  1. New iteration.

     - index creation version as a first-class citizen of index metadata
     - 32-bit murmur3 instead of 128-bit (we only used 32 bits of it anyway)
     - more tests
    jpountz committed Oct 17, 2014
    38a384d
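The first bullet of the Oct 17 iteration makes the index creation version part of the index metadata. One plausible use, which is our assumption rather than something the commit states, is to pick the routing hash per index: indices created before the switch keep djb2, so existing documents stay on the shards they were written to, while new indices use Murmur3. A minimal sketch, in which `IndexMetadata`, `routing_hash_for`, `shard_id`, the version cutoff and the stand-in hash functions are all hypothetical:

```python
from dataclasses import dataclass
from typing import Callable, Tuple

Version = Tuple[int, int, int]
HashFn = Callable[[str], int]


@dataclass(frozen=True)
class IndexMetadata:
    """Hypothetical metadata: the creation version is recorded once, at creation."""
    name: str
    number_of_shards: int
    creation_version: Version


def routing_hash_for(meta: IndexMetadata, cutoff: Version,
                     legacy: HashFn, current: HashFn) -> HashFn:
    """Indices created before `cutoff` keep the legacy hash; newer ones use `current`."""
    # Tuples compare lexicographically, so (1, 9, 0) < (2, 0, 0).
    return legacy if meta.creation_version < cutoff else current


def shard_id(meta: IndexMetadata, doc_id: str, cutoff: Version,
             legacy: HashFn, current: HashFn) -> int:
    """Route doc_id with the hash chosen from the index's creation version."""
    h = routing_hash_for(meta, cutoff, legacy, current)(doc_id)
    return h % meta.number_of_shards  # floor modulo: always in [0, number_of_shards)


if __name__ == "__main__":
    cutoff = (2, 0, 0)  # hypothetical version at which new indices switch hashes
    old = IndexMetadata("old-index", 3, (1, 0, 0))
    new = IndexMetadata("new-index", 3, (2, 0, 0))
    legacy = lambda s: len(s)        # stand-in for djb2
    current = lambda s: -len(s)      # stand-in for murmur3 (may be negative)
    print(shard_id(old, "doc-1", cutoff, legacy, current))
    print(shard_id(new, "doc-1", cutoff, legacy, current))
```

Passing the hash functions in keeps the sketch independent of any particular djb2 or murmur3 implementation. The property that matters is that the choice depends only on metadata fixed when the index is created, so it never changes for an existing index.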