-
-
Notifications
You must be signed in to change notification settings - Fork 29.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Unified hash for numeric types. #52435
Comments
Here's a patch that makes hash(x) == hash(y) for any numeric types (int, float, complex, Decimal, Fraction, bool) when x and y are numerically equal. This is a prerequisite for making all numeric types accurately comparable with each other. |
Uploaded to Rietveld: |
Updated patch, with a bit of cleanup and some comments describing the hashing strategy; I'll update the Rietveld issue as well. |
Whoops; that patch included some accidental Lib/test/test_decimal changes. Here's the correct patch. |
Why aren't you using 64-bit hashes on 64-bit architectures? |
Mostly because I haven't got around to putting that in yet. :) Ideal would be to use _PyHASH_BITS=61 for 64-bit machines, throughout. |
I assume you mean 63. ;) |
No, I mean 61. 2**61 - 1 is prime; 2**63-1 is not. (So 2 bits of the hash get wasted, but that's not a big deal, especially since they're the high-end bits and Python mostly cares about the lower-order bits.) |
Restore tests accidentally omitted from second patch. |
Updated patch:
|
Another update, partly to address comments raised by Guido on Rietveld. I'll upload these changes to Rietveld later today.
|
Here's a version of the patch that adds exact comparisons between the various numeric types. The only slightly tricky comparison is the Fraction <-> Decimal one: an obvious strategy is to convert the Decimal exactly to a Fraction and then use the fraction comparison, but this is inefficient for Decimal instances with large exponent. So instead, we compare a Decimal |
New patch:
I think this is close to final form: I intend to apply this patch (or something very much like it) soon; any review would be appreciated. |
I've refreshed the Rietveld patch as well: |
Mark, very nice concept! - I'm just starting to review the patch, but I For instance, now on Windows 64-bit: >>> hash(2**61-1)
1073741823 |
Actually the current long_hash() is affected as well. On Windows 64-bit: >>> hash(2**31)
-2147483648
>>> hash(2**32)
1 |
Yes, hash values are C longs, regardless of platform. I think that's probably too ingrained to consider changing it (we'd have to change hashes of all the non-numeric types, too). |
I've finished reviewing the patch and I think it's quite ready to be The documentation in stdtypes.rst says that P = 2**61-1 on 64-bit |
Many thanks for reviewing, Stefan, and for the patch. Here's an updated patch:
One unresolved issue: it would probably make sense to specify (publicly) a hash algorithm for complex types, too, so that someone implementing e.g. Gaussian integers can make their hash function agree with that for the complex type where appropriate. That hash algorithm would probably be as simple as: hr = hash(x.real)
hi = hash(x.imag)
return <some suitably bit-mixing combination of hi and hr> where the algorithm for the combination needs to be specified explicitly, and any relevant parameters put into sys.hash_info. Another tiny detail: I'm wondering whether hash(m/P) should care about the sign of m: currently it doesn't, which means that the symmetry hash(-x) = -hash(x) *almost* always holds for a rational x, but not always. An almost-always-true symmetry seems like a recipe for hard-to-find bugs. |
... and here's the actual patch... Forget my own head next. :) |
I've spent some time working through the new hash function by re-implementing it for gmpy. Very elegant design. I like _PyHASH_MODULUS better, too. Regarding a hash function for complex types, I think documenting the existing behavior for PyComplex is sufficient. The magic number 1000003 could be documented in hash_info as 'multiplier' and _PyHASH_MULTIPLIER. The same constant, but a different algorithm, is also used when hashing a tuple. I think hash(m/P) should preserve sign. It just seems more symmetrical. :) |
Seems reasonable; I'm tempted to call the constant it hash_info.imaginary and _PyHASH_IMAGINARY, though. :) There's also an implicit parameter in this algorithm, namely the size of a C long; I think this should go into sys.hash_info, too. complex_hash does need fixing in one respect: it currently depends on signed overflow wrapping modulo 2**BIT_IN_LONG, but that's undefined behaviour; it should use unsigned arithmetic instead.
Agreed. Along similar lines, I think I'm also going to get rid of _PyHASH_NINF and just use -PyHASH_INF instead. |
Updated patch:
|
Committed the hash changes in r81486. This commit just changes the method for computing hash values; it doesn't include the changes to the decimal module that make Decimal instances comparable with Fraction instances. |
Committed the Decimal-to-Fraction comparisons in r81893. All numeric types should now compare nicely with each other. |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: