Skip to content

Conversation

@SwayamInSync
Copy link
Member

Took me a while to understand the CPython's implementation, hope I mapped it correctly here.
Implementation is referenced from https://github.com/python/cpython/blob/20b69aac0d19a5e5358362410d9710887762f0e7/Python/pyhash.c#L87

This is also present inside comment, writing here as well for explicit expectations

  • This implements the same algorithm as CPython's _Py_HashDouble, adapted for quad precision floating point. The algorithm computes a hash based on the reduction of the value modulo the prime P = 2**PYHASH_BITS - 1

  • Key invariant: hash(x) == hash(y) whenever x and y are numerically equal, even if x and y have different types. This ensures that: hash(QuadPrecision(1.0)) == hash(1.0) == hash(1)

The algorithm:

  1. Handle special cases: inf returns PYHASH_INF, nan uses pointer hash
  2. Extract mantissa m in [0.5, 1.0) and exponent e via frexp(v) = m * 2^e
  3. Process mantissa 28 bits at a time, accumulating into hash value x
  4. Adjust for exponent using bit rotation (since 2^PYHASH_BITS ≡ 1 mod P)
  5. Apply sign and handle the special case of -1 -> -2

closes #231

@SwayamInSync SwayamInSync added this to the v1.0 milestone Dec 23, 2025
assert hash(QuadPrecision("-inf")) == hash(float("-inf"))
# Standard PyHASH_INF values
assert hash(QuadPrecision("inf")) == 314159
assert hash(QuadPrecision("-inf")) == -314159
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TIL, this is a great easter egg.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haha that's indeed pretty awesome

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My one guess was since the language is π-thon right, hence π (3.14159) xD

Copy link
Member

@ngoldbaum ngoldbaum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM mostly - I left a few comments below.

// Check for NaN - use pointer hash (each NaN instance gets unique hash)
// This prevents hash table catastrophic pileups from NaN instances
if (Sleef_iunordq1(value, value)) {
return _Py_HashPointer((void *)self);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Py_HashPointer was added in newer versions of the C API. You should probably vendor pythoncapi-compat and use the definition exposed by that header. That way when, in the future, CPython removes this symbol, your code won't break.

In general you should avoid directly using private symbols in the C API - they can change or break without notice, even in minor version bumps.

# define PYHASH_BITS 31
#endif
#define PYHASH_MODULUS (((Py_uhash_t)1 << PYHASH_BITS) - 1)
#define PYHASH_INF 314159
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All these symbols are public in the CPython C API starting in Python 3.13. You can use the compat header like I explained in my other comment for older versions.

"""Test hash works for extreme values without errors."""
quad_val = QuadPrecision(value)
h = hash(quad_val)
assert isinstance(h, int)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any way you can check that these are the "correct" hash values you expect from the algorithm? Probably also best to use values that aren't exactly representable as doubles - I don't know if these values are. And for values that are representable exactly as doubles - check that the hash you get is identical to the hash of the python float with the same value.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, I'll edit this

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so one way can be since Python support arbitrary long integers so we can use intergers that are power of 2 and bigger than double's max range.

The comparison will be between corresponding quadprecision value and python int

Copy link
Member Author

@SwayamInSync SwayamInSync Dec 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh mpmath also supports, this will be straightforward then, so I'll add both above approach as well the mpmath comparison

1e500 is out of double range

In [9]: hash(QuadPrecision("1e500")) == hash(mp.mpf("1e500"))
Out[9]: True

x = (Py_uhash_t)-2;
}

return (Py_hash_t)x;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not an expert on this, but this does appear to implement the same algorithm as _Py_HashDouble.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah it does, its just the here mantissa has more bits to be consumed inside the loop and some SLEEF specific handlings.
I didn't saw any place where it needs to be specialized for 128-bit values atleast from the comments and articles available.

Copy link
Contributor

@juntyr juntyr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@SwayamInSync
Copy link
Member Author

These changes will resolve all the reviews, I just wonder when to run this from pythoncpi-compat

python3 upgrade_pythoncapi.py src/

probably within the meson file itself, so that later during includes those headers would get updated?

@ngoldbaum
Copy link
Member

probably within the meson file itself, so that later during includes those headers would get updated?

I think you're only supposed to run that script once, manually, and commit it, and there's no need to automatically run it in the build.

@SwayamInSync
Copy link
Member Author

I think you're only supposed to run that script once, manually, and commit it, and there's no need to automatically run it in the build.

I see, in prev commit I manually edited the files to use the header, just ran the command and nothing changed so hopefully files are in good condition

Copy link
Member

@ngoldbaum ngoldbaum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, looks great!

@SwayamInSync
Copy link
Member Author

Great, merging this in!
Thanks everyone.

@SwayamInSync SwayamInSync merged commit db1ee6a into numpy:main Dec 23, 2025
11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Issues with hashing/saving QuadPrecDType

4 participants