-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
_stable_hash for negative and special floats #69
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Plaese bump the version file and the changelog because we already did the release for 0.5.0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add unit tests for the hash functions
sharedbuffers/mapped_struct.py
Outdated
@@ -950,12 +950,19 @@ def _stable_hash(key): | |||
except OverflowError: | |||
hval = key & 0xFFFFFFFFFFFFFFFF | |||
elif isinstance(key, float): | |||
trunc_key = int(key) | |||
trunc_key = int(key) if not math.isinf(key) and not math.isnan(key) else 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should cimport isinf and isnan in the pxd
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not using isinf
and isnan
on C doubles, but rather on python objects. Does that make any difference ? It should also be noted that those functions are present on C99 onwards, so if Cython doesn't compile with -std=c99
(or -std=gnu
or -std=c11
), then it won't work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Python floats are just wrappers for C doubles. Copy the value into a native double local and use that, it'll be substantially faster.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A simple test with gcc seems to work fine without nondefault arguments
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In any case, wouldn't it be simpler to just catch the errors? If you declare trunc_key as cython.longlong
, you can just:
try:
trunc_key = key
except OverflowError:
# float code
else:
# int code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, and whatever exception type is raised by nan/inf
tests/mapped_struct.py
Outdated
self.assertTrue(isinstance(mapped_struct._stable_hash(v), (int, long))) | ||
|
||
def testHashFloats(self): | ||
values = (1., -1., 1e+50, 1e-50, float("inf"), float("-inf"), float("nan")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about negative float values with decimal places?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
1e-50
has plenty of decimal places, doesn't it ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but it is a positive value with decimals. I mean a negative value with decimals. Something like -1e-50
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remember to update the version in the bidder/bdt.
tests/mapped_struct.py
Outdated
def testHashFloats(self): | ||
values = (1., -1., 1e+50, 1e-50, float("inf"), float("-inf"), float("nan")) | ||
for v in values: | ||
self.assertTrue(isinstance(mapped_struct._stable_hash(v), (int, long))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This only checks that it doesn't blow up, and the type of the returned value. But you should also check the returned value itself. It should be stable, so you know it shouldn't change.
sharedbuffers/mapped_struct.py
Outdated
if expo < 0: | ||
# A double's exponent is usually limited to [-1024, 1024] | ||
expo += 0xFFFF | ||
if math.isinf(mant): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can speed this function access if you declare them as cython globals.
mant = 2 | ||
if isinf(mant): | ||
mant = 1. if mant > 0 else -1. | ||
elif isnan(mant): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You have to import isinf/isnan from math in pure python or you'll break pure python. Also import frexp into a fast global to avoid the attribute access to the math module.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
frexp
has a different calling convention than most other functions in that it takes an output parameter (a pointer to an int), so it'd look and work differently whether cython.compiled
is true or not.
Furthermore, the standard doesn't really say what frexp
has to return when the input float is +/- inf or NaN. The behaviour is unspecified. I think it's better if we just rely on Python's implementation handling all these subtleties.
tests/mapped_struct.py
Outdated
values = (1, 1033, 8620194, 20913041029, 66210231110, 752) | ||
for x in values: | ||
self.assertEqual(x, mapped_struct._stable_hash(x)) | ||
|
||
def testHashIntegers(self): | ||
self.assertHashesOK((1, -1, 1 << 100, -(1 << 100), 1L, -1L)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for value, hash in ((1, 1), (1 << 100, 32487324), ...)
self.assertEqual(mapped_struct._stable_hash(value), hash)
tests/mapped_struct.py
Outdated
values_hashes = ( | ||
("abcdef", 18053520794346263629L), | ||
("123456789", 10139926970967174787L), | ||
("!@#%&$", 15648343848775299486L)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PEP formatting:
(
(blah),
(blah),
(blah),
)
Ie: trailing comma on the last entry, closing paren on its own line
tests/mapped_struct.py
Outdated
] | ||
|
||
class StableHashTest(unittest.TestCase): | ||
|
||
def assertHashesOK(self, values): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All the tests that use assertHashesOk
fail to check the stability of the hash values. assertHashesOk
should get the hash-value pairs instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are 2 types of tests in the class StableHashTest
:
The first kind is the one that receives the value array, computes the hash values once, then computes them again, and checks that they are equal (to ensure hash values don't depend on any global state) and that they are of type int
or long
.
The second kind uses the same value array, but with the addition of the precomputed hashes in order to test for stability. They are the ones with the suffix Fixed
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you think about it, the second test is stricter than the first, and implies correctness under the first tests' conditions, so you don't need the first kind of tests, just the second kind.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But you have different test values, and the test values for the "fixed" kind is a smaller set. So better unify them. Merge all the test values into a single "Fixed" test for each kind of object and that's both less code and more thorough testing.
sharedbuffers/mapped_struct.py
Outdated
@@ -20,6 +20,7 @@ | |||
import weakref | |||
from datetime import timedelta, datetime, date | |||
from decimal import Decimal | |||
from math import frexp |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You have to cython.declare
frexp as object for it to be a "fast" global (otherwise it still goes through the module dict). Or add it to the pxd, both are equivalent.
This PR aims to fix
_stable_hash
when called with negative floats and other 'special' values like infinity or NaN.