-
-
Notifications
You must be signed in to change notification settings - Fork 15
FEAT: Implementing hash support in QuadDtype #245
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| assert hash(QuadPrecision("-inf")) == hash(float("-inf")) | ||
| # Standard PyHASH_INF values | ||
| assert hash(QuadPrecision("inf")) == 314159 | ||
| assert hash(QuadPrecision("-inf")) == -314159 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TIL, this is a great easter egg.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Haha that's indeed pretty awesome
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My one guess was since the language is π-thon right, hence π (3.14159) xD
ngoldbaum
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM mostly - I left a few comments below.
| // Check for NaN - use pointer hash (each NaN instance gets unique hash) | ||
| // This prevents hash table catastrophic pileups from NaN instances | ||
| if (Sleef_iunordq1(value, value)) { | ||
| return _Py_HashPointer((void *)self); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Py_HashPointer was added in newer versions of the C API. You should probably vendor pythoncapi-compat and use the definition exposed by that header. That way when, in the future, CPython removes this symbol, your code won't break.
In general you should avoid directly using private symbols in the C API - they can change or break without notice, even in minor version bumps.
| # define PYHASH_BITS 31 | ||
| #endif | ||
| #define PYHASH_MODULUS (((Py_uhash_t)1 << PYHASH_BITS) - 1) | ||
| #define PYHASH_INF 314159 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All these symbols are public in the CPython C API starting in Python 3.13. You can use the compat header like I explained in my other comment for older versions.
quaddtype/tests/test_quaddtype.py
Outdated
| """Test hash works for extreme values without errors.""" | ||
| quad_val = QuadPrecision(value) | ||
| h = hash(quad_val) | ||
| assert isinstance(h, int) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any way you can check that these are the "correct" hash values you expect from the algorithm? Probably also best to use values that aren't exactly representable as doubles - I don't know if these values are. And for values that are representable exactly as doubles - check that the hash you get is identical to the hash of the python float with the same value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it, I'll edit this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so one way can be since Python support arbitrary long integers so we can use intergers that are power of 2 and bigger than double's max range.
The comparison will be between corresponding quadprecision value and python int
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh mpmath also supports, this will be straightforward then, so I'll add both above approach as well the mpmath comparison
1e500 is out of double range
In [9]: hash(QuadPrecision("1e500")) == hash(mp.mpf("1e500"))
Out[9]: True| x = (Py_uhash_t)-2; | ||
| } | ||
|
|
||
| return (Py_hash_t)x; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not an expert on this, but this does appear to implement the same algorithm as _Py_HashDouble.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah it does, its just the here mantissa has more bits to be consumed inside the loop and some SLEEF specific handlings.
I didn't saw any place where it needs to be specialized for 128-bit values atleast from the comments and articles available.
juntyr
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
|
These changes will resolve all the reviews, I just wonder when to run this from probably within the meson file itself, so that later during includes those headers would get updated? |
I think you're only supposed to run that script once, manually, and commit it, and there's no need to automatically run it in the build. |
I see, in prev commit I manually edited the files to use the header, just ran the command and nothing changed so hopefully files are in good condition |
ngoldbaum
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, looks great!
|
Great, merging this in! |
Took me a while to understand the CPython's implementation, hope I mapped it correctly here.
Implementation is referenced from https://github.com/python/cpython/blob/20b69aac0d19a5e5358362410d9710887762f0e7/Python/pyhash.c#L87
This is also present inside comment, writing here as well for explicit expectations
This implements the same algorithm as CPython's
_Py_HashDouble, adapted for quad precision floating point. The algorithm computes a hash based on the reduction of the value modulo the primeP = 2**PYHASH_BITS - 1Key invariant: hash(x) == hash(y) whenever x and y are numerically equal, even if x and y have different types. This ensures that:
hash(QuadPrecision(1.0)) == hash(1.0) == hash(1)The algorithm:
infreturnsPYHASH_INF,nanuses pointer hashfrexp(v) = m * 2^e2^PYHASH_BITS ≡ 1 mod P)closes #231