Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

_stable_hash for negative and special floats #69

Merged
merged 13 commits into from
Dec 17, 2018

Conversation

llogiudice
Copy link
Contributor

This PR aims to fix _stable_hash when called with negative floats and other 'special' values like infinity or NaN.

@llogiudice llogiudice requested review from gefarion, klaussfreire and tzulberti-jampp and removed request for gefarion December 17, 2018 15:06
Copy link
Contributor

@tzulberti-jampp tzulberti-jampp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Plaese bump the version file and the changelog because we already did the release for 0.5.0

Copy link
Contributor

@klaussfreire klaussfreire left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add unit tests for the hash functions

@@ -950,12 +950,19 @@ def _stable_hash(key):
except OverflowError:
hval = key & 0xFFFFFFFFFFFFFFFF
elif isinstance(key, float):
trunc_key = int(key)
trunc_key = int(key) if not math.isinf(key) and not math.isnan(key) else 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should cimport isinf and isnan in the pxd

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not using isinf and isnan on C doubles, but rather on python objects. Does that make any difference ? It should also be noted that those functions are present on C99 onwards, so if Cython doesn't compile with -std=c99 (or -std=gnu or -std=c11), then it won't work.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Python floats are just wrappers for C doubles. Copy the value into a native double local and use that, it'll be substantially faster.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A simple test with gcc seems to work fine without nondefault arguments

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In any case, wouldn't it be simpler to just catch the errors? If you declare trunc_key as cython.longlong, you can just:

try:
    trunc_key = key
except OverflowError:
    # float code
else:
    # int code

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, and whatever exception type is raised by nan/inf

self.assertTrue(isinstance(mapped_struct._stable_hash(v), (int, long)))

def testHashFloats(self):
values = (1., -1., 1e+50, 1e-50, float("inf"), float("-inf"), float("nan"))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about negative float values with decimal places?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1e-50 has plenty of decimal places, doesn't it ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but it is a positive value with decimals. I mean a negative value with decimals. Something like -1e-50

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it.

Copy link
Contributor

@gefarion gefarion left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remember to update the version in the bidder/bdt.

def testHashFloats(self):
values = (1., -1., 1e+50, 1e-50, float("inf"), float("-inf"), float("nan"))
for v in values:
self.assertTrue(isinstance(mapped_struct._stable_hash(v), (int, long)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This only checks that it doesn't blow up, and the type of the returned value. But you should also check the returned value itself. It should be stable, so you know it shouldn't change.

if expo < 0:
# A double's exponent is usually limited to [-1024, 1024]
expo += 0xFFFF
if math.isinf(mant):
Copy link
Contributor

@gefarion gefarion Dec 17, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can speed this function access if you declare them as cython globals.

mant = 2
if isinf(mant):
mant = 1. if mant > 0 else -1.
elif isnan(mant):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You have to import isinf/isnan from math in pure python or you'll break pure python. Also import frexp into a fast global to avoid the attribute access to the math module.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

frexp has a different calling convention than most other functions in that it takes an output parameter (a pointer to an int), so it'd look and work differently whether cython.compiled is true or not.

Furthermore, the standard doesn't really say what frexp has to return when the input float is +/- inf or NaN. The behaviour is unspecified. I think it's better if we just rely on Python's implementation handling all these subtleties.

values = (1, 1033, 8620194, 20913041029, 66210231110, 752)
for x in values:
self.assertEqual(x, mapped_struct._stable_hash(x))

def testHashIntegers(self):
self.assertHashesOK((1, -1, 1 << 100, -(1 << 100), 1L, -1L))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for value, hash in ((1, 1), (1 << 100, 32487324), ...)
   self.assertEqual(mapped_struct._stable_hash(value), hash)

values_hashes = (
("abcdef", 18053520794346263629L),
("123456789", 10139926970967174787L),
("!@#%&$", 15648343848775299486L))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PEP formatting:

(
    (blah),
    (blah),
    (blah),
)

Ie: trailing comma on the last entry, closing paren on its own line

]

class StableHashTest(unittest.TestCase):

def assertHashesOK(self, values):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All the tests that use assertHashesOk fail to check the stability of the hash values. assertHashesOk should get the hash-value pairs instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are 2 types of tests in the class StableHashTest:

The first kind is the one that receives the value array, computes the hash values once, then computes them again, and checks that they are equal (to ensure hash values don't depend on any global state) and that they are of type int or long.

The second kind uses the same value array, but with the addition of the precomputed hashes in order to test for stability. They are the ones with the suffix Fixed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you think about it, the second test is stricter than the first, and implies correctness under the first tests' conditions, so you don't need the first kind of tests, just the second kind.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But you have different test values, and the test values for the "fixed" kind is a smaller set. So better unify them. Merge all the test values into a single "Fixed" test for each kind of object and that's both less code and more thorough testing.

@@ -20,6 +20,7 @@
import weakref
from datetime import timedelta, datetime, date
from decimal import Decimal
from math import frexp
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You have to cython.declare frexp as object for it to be a "fast" global (otherwise it still goes through the module dict). Or add it to the pxd, both are equivalent.

@llogiudice llogiudice merged commit 24f9c19 into master Dec 17, 2018
@llogiudice llogiudice deleted the stable_hash_negative_floats branch December 17, 2018 19:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants