bpo-34751: improved hash function for tuples #9534

Closed

jdemeyer (Contributor) commented Sep 24, 2018

This patch improves the hash function for tuples to avoid the obvious hash collision:

hash((3, 3)) == hash((-3, -3))

Pseudo-code of the new hash:

def tuplehash(t):
    h = INITIALVALUE
    for x in t:
        y = hash(x)
        y = mangle(y)
        h = (h ^ y) * MULTIPLIER
    return h + FINALVALUE

def mangle(y):
    return y ^ (2 * y)

This has the structure of a standard FNV-1a hash. The line y = mangle(y) was added to avoid hash collisions for nested tuples and to work around collisions caused by the identity x ^ -2 == -x, which holds for odd x.
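
The identity and the effect of mangle can be checked directly with Python integers, which behave like infinitely wide two's-complement numbers under bitwise operators. This is only an illustrative check: the range of test values is arbitrary and the variable names are not part of the patch.

```python
def mangle(y):
    # Same definition as in the pseudo-code above.
    return y ^ (2 * y)

# For odd x, XOR with -2 flips every bit except the lowest one,
# which is exactly two's-complement negation of x.
odd_values = range(-99, 100, 2)
identity_holds = all(x ^ -2 == -x for x in odd_values)

# After mangling, 3 and -3 are no longer related by that identity.
m_pos = mangle(3)    # 3 ^ 6
m_neg = mangle(-3)   # -3 ^ -6
print(identity_holds, m_pos, m_neg, (m_pos ^ -2) == m_neg)
# prints: True 5 7 False
```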

The constants were chosen as follows:

  • INITIALVALUE = 3430008: kept from old algorithm
  • FINALVALUE = 97531: kept from old algorithm
  • MULTIPLIER = 3**41 (truncated to the platform bitsize): a sufficiently big odd number without obvious bit structure. The standard FNV multipliers tended to create more collisions, probably due to the high number of 0 bits.
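
With these constants, the pseudo-code can be modelled in plain Python as a rough sketch. It assumes a 64-bit platform, emulates unsigned overflow by masking, and returns the unsigned value; the name tuplehash_sketch and the BITS/MASK helpers are made up for illustration, and this is not the C implementation from the patch. Nested tuples are hashed with the interpreter's built-in hash() here, so only flat tuples of integers exercise the new scheme.

```python
BITS = 64                      # assumed platform word size
MASK = (1 << BITS) - 1         # emulates unsigned wrap-around

INITIALVALUE = 3430008
FINALVALUE = 97531
MULTIPLIER = (3 ** 41) & MASK  # 3**41 truncated to BITS bits


def mangle(y):
    return (y ^ (2 * y)) & MASK


def tuplehash_sketch(t):
    """Unsigned BITS-wide model of the pseudo-code above."""
    h = INITIALVALUE
    for x in t:
        y = hash(x) & MASK     # reinterpret the signed item hash as unsigned
        y = mangle(y)
        h = ((h ^ y) * MULTIPLIER) & MASK
    return (h + FINALVALUE) & MASK


# In this model, the collision from the description no longer occurs.
print(tuplehash_sketch((3, 3)) != tuplehash_sketch((-3, -3)))
```

For the empty tuple the loop body never runs, so the sketch returns INITIALVALUE + FINALVALUE.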

https://bugs.python.org/issue34751
