FEAT: Implementing hash support in QuadDtype #245

SwayamInSync · 2025-12-23T07:20:50Z

It took me a while to understand CPython's implementation; I hope I mapped it correctly here.
Implementation is referenced from https://github.com/python/cpython/blob/20b69aac0d19a5e5358362410d9710887762f0e7/Python/pyhash.c#L87

This is also in the code comment; I'm repeating it here to make the expectations explicit.

This implements the same algorithm as CPython's _Py_HashDouble, adapted for quad-precision floating point. The algorithm computes a hash based on the reduction of the value modulo the prime P = 2**PYHASH_BITS - 1.
Key invariant: hash(x) == hash(y) whenever x and y are numerically equal, even if x and y have different types. This ensures that hash(QuadPrecision(1.0)) == hash(1.0) == hash(1).

The algorithm:

1. Handle special cases: +inf returns PYHASH_INF, -inf returns -PYHASH_INF, nan uses the pointer hash
2. Extract mantissa m in [0.5, 1.0) and exponent e via frexp, so that |v| = m * 2^e
3. Process the mantissa 28 bits at a time, accumulating into the hash value x
4. Adjust for the exponent using bit rotation (since 2^PYHASH_BITS ≡ 1 mod P)
5. Apply the sign and handle the special case of -1 -> -2
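
The steps above can be sketched in pure Python for ordinary doubles. This is an illustrative transcription of the listed steps, not the C code from this PR: `hash_double_sketch` is a hypothetical name, the constants are read from `sys.hash_info`, and the NaN branch is left out because it depends on object identity.

```python
import math
import sys

MODULUS = sys.hash_info.modulus   # the prime P = 2**BITS - 1
BITS = MODULUS.bit_length()       # 61 on 64-bit builds, 31 on 32-bit builds
INF = sys.hash_info.inf


def hash_double_sketch(v: float) -> int:
    """Hypothetical pure-Python rendering of the steps listed above."""
    # Step 1: special cases (NaN is identity-hashed and not modelled here).
    if math.isnan(v):
        raise ValueError("NaN hashes by object identity; not modelled here")
    if math.isinf(v):
        return INF if v > 0 else -INF

    # Step 2: v = m * 2**e with 0.5 <= |m| < 1 (or m == 0).
    m, e = math.frexp(v)
    sign = 1
    if m < 0:
        sign, m = -1, -m

    # Step 3: consume the mantissa 28 bits at a time.
    x = 0
    while m:
        # Rotating left by 28 is multiplying by 2**28 modulo P.
        x = ((x << 28) & MODULUS) | (x >> (BITS - 28))
        m *= 268435456.0          # 2**28
        e -= 28
        y = int(m)                # integer part of the scaled mantissa
        m -= y
        x += y
        if x >= MODULUS:
            x -= MODULUS

    # Step 4: fold in the exponent; Python's % is already non-negative.
    e %= BITS
    x = ((x << e) & MODULUS) | (x >> (BITS - e))

    # Step 5: apply the sign; -1 is reserved as an error code, so use -2.
    x *= sign
    return -2 if x == -1 else x
```

For finite doubles and infinities the sketch should agree with the built-in `hash`, for example `hash_double_sketch(0.5) == hash(0.5)`.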

closes #231

ngoldbaum · 2025-12-23T15:52:43Z

quaddtype/tests/test_quaddtype.py

+        assert hash(QuadPrecision("-inf")) == hash(float("-inf"))
+        # Standard PyHASH_INF values
+        assert hash(QuadPrecision("inf")) == 314159
+        assert hash(QuadPrecision("-inf")) == -314159


TIL, this is a great easter egg.

Haha that's indeed pretty awesome

My one guess: since the language is π-thon, hence π (3.14159) xD
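
For anyone who wants to see the constant without building the extension, here is a quick check against plain CPython floats. It only uses `sys.hash_info` and the built-in `hash`; nothing here comes from the PR itself.

```python
import sys

# sys.hash_info exposes the constants CPython uses for numeric hashing.
inf_constant = sys.hash_info.inf
pos_inf_hash = hash(float("inf"))
neg_inf_hash = hash(float("-inf"))

print(inf_constant, pos_inf_hash, neg_inf_hash)
```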

ngoldbaum

LGTM mostly - I left a few comments below.

ngoldbaum · 2025-12-23T16:04:00Z

quaddtype/numpy_quaddtype/src/scalar.c

+    // Check for NaN - use pointer hash (each NaN instance gets unique hash)
+    // This prevents hash table catastrophic pileups from NaN instances
+    if (Sleef_iunordq1(value, value)) {
+        return _Py_HashPointer((void *)self);


Py_HashPointer was added in newer versions of the C API. You should probably vendor pythoncapi-compat and use the definition exposed by that header. That way when, in the future, CPython removes this symbol, your code won't break.

In general you should avoid directly using private symbols in the C API - they can change or break without notice, even in minor version bumps.
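
For context on the NaN branch in the diff above, the same behaviour can be observed with built-in floats. This is a small illustration, assuming CPython 3.10 or newer, where a float NaN is hashed from the object's identity rather than from its value.

```python
import sys

a = float("nan")
b = float("nan")

# NaN never compares equal, even to another NaN, so both keys are kept.
table = {a: "first", b: "second"}
distinct_keys = len(table)

# Lookup with the very same object still works, because dicts check
# identity before equality.
found = table[a]

# On CPython 3.10+, the two live NaN objects get different hashes, so many
# NaN keys do not all collide in one bucket.
identity_hashed = sys.version_info >= (3, 10) and hash(a) != hash(b)
```

Before 3.10 every float NaN hashed to 0, which is the pileup the code comment refers to.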

ngoldbaum · 2025-12-23T16:07:50Z

quaddtype/numpy_quaddtype/src/scalar.c

+#  define PYHASH_BITS 31
+#endif
+#define PYHASH_MODULUS (((Py_uhash_t)1 << PYHASH_BITS) - 1)
+#define PYHASH_INF 314159


All these symbols are public in the CPython C API starting in Python 3.13. You can use the compat header like I explained in my other comment for older versions.
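
The same constants are visible from Python through `sys.hash_info`, which makes it easy to sanity-check the fallback definitions in the diff. A short sketch; the variable names are only for illustration.

```python
import sys

MODULUS = sys.hash_info.modulus   # the prime P
BITS = MODULUS.bit_length()       # 61 on 64-bit builds, 31 on 32-bit builds

# P has the form 2**BITS - 1 ...
is_mersenne_form = MODULUS == (1 << BITS) - 1
# ... so 2**BITS is congruent to 1 mod P, which is why multiplying by a
# power of two can be done as a bit rotation.
two_pow_bits_mod_p = pow(2, BITS, MODULUS)

# Non-negative ints hash to their residue mod P.
hash_of_p = hash(MODULUS)
hash_of_p_plus_1 = hash(MODULUS + 1)
```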

ngoldbaum · 2025-12-23T16:10:21Z

quaddtype/tests/test_quaddtype.py

+        """Test hash works for extreme values without errors."""
+        quad_val = QuadPrecision(value)
+        h = hash(quad_val)
+        assert isinstance(h, int)


Is there any way you can check that these are the "correct" hash values you expect from the algorithm? It's probably also best to use values that aren't exactly representable as doubles - I don't know if these values are. And for values that are exactly representable as doubles, check that the hash you get is identical to the hash of the Python float with the same value.

Got it, I'll edit this

One way: since Python supports arbitrarily large integers, we can use integers that are powers of 2 and larger than the maximum double.

The comparison would then be between the corresponding QuadPrecision value and the Python int.

Oh, mpmath supports this too, so it will be straightforward. I'll add both the approach above and the mpmath comparison.

1e500 is out of double range

In [9]: hash(QuadPrecision("1e500")) == hash(mp.mpf("1e500"))
Out[9]: True
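
The big-integer idea can be made concrete without mpmath by leaning on Python's numeric tower. The sketch below computes reference hashes for exact binary values using `int` and `fractions.Fraction`; `reference_hash` is a hypothetical helper, and how the corresponding QuadPrecision value gets constructed in the real tests is left open.

```python
import sys
from fractions import Fraction

MODULUS = sys.hash_info.modulus


def reference_hash(mantissa: int, exponent: int) -> int:
    """Hash of the exact value mantissa * 2**exponent (hypothetical helper)."""
    return hash(Fraction(mantissa) * Fraction(2) ** exponent)


# A power of two far beyond the double range (about 1.8e308).
big = 2**2000
big_hash = reference_hash(1, 2000)

# A 113-bit odd integer: exactly representable in quad precision,
# but not as a double.
wide = 2**113 - 1
wide_hash = reference_hash(wide, 0)
```

A test could then compare the hash of the quad value against `big_hash` or `wide_hash`, since equal numbers must hash equally regardless of type.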

ngoldbaum · 2025-12-23T16:11:02Z

quaddtype/numpy_quaddtype/src/scalar.c

+        x = (Py_uhash_t)-2;
+    }
+
+    return (Py_hash_t)x;


I'm not an expert on this, but this does appear to implement the same algorithm as _Py_HashDouble.

Yeah, it does. The only differences are that the mantissa here has more bits to consume inside the loop, plus some SLEEF-specific handling.
I didn't see any place where it needs to be specialized for 128-bit values, at least from the comments and articles available.
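
To illustrate the point about the longer mantissa, here is a sketch of the same loop run on an exact value of arbitrary width, using `Fraction` in place of a hardware float. `hash_wide` is a hypothetical name and this is not the SLEEF-based C code; it only shows that the loop needs more 28-bit rounds, not a different algorithm.

```python
import sys
from fractions import Fraction

MODULUS = sys.hash_info.modulus
BITS = MODULUS.bit_length()


def hash_wide(n: int, e2: int = 0):
    """Return (hash, rounds) for the exact value n * 2**e2 (hypothetical)."""
    if n == 0:
        return 0, 0
    sign = -1 if n < 0 else 1
    n = abs(n)

    # Exact frexp: value = m * 2**e with 0.5 <= m < 1.
    width = n.bit_length()
    m = Fraction(n, 1 << width)
    e = width + e2

    x = 0
    rounds = 0
    while m:
        x = ((x << 28) & MODULUS) | (x >> (BITS - 28))
        m *= 1 << 28
        e -= 28
        y = int(m)            # integer part; m is non-negative here
        m -= y
        x += y
        if x >= MODULUS:
            x -= MODULUS
        rounds += 1

    e %= BITS
    x = ((x << e) & MODULUS) | (x >> (BITS - e))
    x *= sign
    return (-2 if x == -1 else x), rounds
```

A full 53-bit mantissa such as `2**53 - 1` takes 2 rounds, while a full 113-bit mantissa such as `2**113 - 1` takes 5, and in both cases the result matches the built-in `hash` of the same integer.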

juntyr

LGTM

SwayamInSync · 2025-12-23T17:48:10Z

These changes should resolve all the review comments. I just wonder when to run this from pythoncapi-compat:

python3 upgrade_pythoncapi.py src/

probably within the meson file itself, so that later during includes those headers would get updated?

ngoldbaum · 2025-12-23T17:53:12Z

> probably within the meson file itself, so that later during includes those headers would get updated?

I think you're only supposed to run that script once, manually, and commit it, and there's no need to automatically run it in the build.

SwayamInSync · 2025-12-23T18:31:05Z

> I think you're only supposed to run that script once, manually, and commit it, and there's no need to automatically run it in the build.

I see. In the previous commit I manually edited the files to use the header; I just ran the command and nothing changed, so hopefully the files are in good condition.

ngoldbaum

Nice, looks great!

SwayamInSync · 2025-12-23T19:36:44Z

Great, merging this in!
Thanks everyone.

SwayamInSync added 2 commits December 23, 2025 12:43

more tests

4d34a85

-1 => -2

b26924a

SwayamInSync added this to the v1.0 milestone Dec 23, 2025

SwayamInSync added the numpy_quaddtype label Dec 23, 2025

SwayamInSync requested a review from ngoldbaum December 23, 2025 14:22

ngoldbaum reviewed Dec 23, 2025


juntyr approved these changes Dec 23, 2025


SwayamInSync added 4 commits December 23, 2025 16:46

test extreme bigger int + mpmath quad

146ba87

using pythoncpi-compat

5957d10

patch dir is not needed

fc13f22

define test path in toml

90db2cd

ngoldbaum approved these changes Dec 23, 2025


SwayamInSync merged commit db1ee6a into numpy:main Dec 23, 2025
11 checks passed
