_stable_hash for negative and special floats #69

llogiudice · 2018-12-17T15:06:35Z

This PR aims to fix _stable_hash when called with negative floats and other 'special' values like infinity or NaN.

tzulberti-jampp

Plaese bump the version file and the changelog because we already did the release for 0.5.0

klaussfreire

Add unit tests for the hash functions

klaussfreire · 2018-12-17T15:22:59Z

sharedbuffers/mapped_struct.py

@@ -950,12 +950,19 @@ def _stable_hash(key):
        except OverflowError:
            hval = key & 0xFFFFFFFFFFFFFFFF
    elif isinstance(key, float):
-        trunc_key = int(key)
+        trunc_key = int(key) if not math.isinf(key) and not math.isnan(key) else 0


You should cimport isinf and isnan in the pxd

I'm not using isinf and isnan on C doubles, but rather on python objects. Does that make any difference ? It should also be noted that those functions are present on C99 onwards, so if Cython doesn't compile with -std=c99 (or -std=gnu or -std=c11), then it won't work.

Python floats are just wrappers for C doubles. Copy the value into a native double local and use that, it'll be substantially faster.

A simple test with gcc seems to work fine without nondefault arguments

In any case, wouldn't it be simpler to just catch the errors? If you declare trunc_key as cython.longlong, you can just:

try: trunc_key = key except OverflowError: # float code else: # int code

Well, and whatever exception type is raised by nan/inf

tzulberti-jampp · 2018-12-17T15:42:17Z

tests/mapped_struct.py

+            self.assertTrue(isinstance(mapped_struct._stable_hash(v), (int, long)))
+
+    def testHashFloats(self):
+        values = (1., -1., 1e+50, 1e-50, float("inf"), float("-inf"), float("nan"))


What about negative float values with decimal places?

1e-50 has plenty of decimal places, doesn't it ?

Yes, but it is a positive value with decimals. I mean a negative value with decimals. Something like -1e-50

gefarion

Remember to update the version in the bidder/bdt.

klaussfreire · 2018-12-17T15:51:09Z

tests/mapped_struct.py

+    def testHashFloats(self):
+        values = (1., -1., 1e+50, 1e-50, float("inf"), float("-inf"), float("nan"))
+        for v in values:
+            self.assertTrue(isinstance(mapped_struct._stable_hash(v), (int, long)))


This only checks that it doesn't blow up, and the type of the returned value. But you should also check the returned value itself. It should be stable, so you know it shouldn't change.

gefarion · 2018-12-17T16:58:15Z

sharedbuffers/mapped_struct.py

+            if expo < 0:
+                # A double's exponent is usually limited to [-1024, 1024]
+                expo += 0xFFFF
+            if math.isinf(mant):


I think you can speed this function access if you declare them as cython globals.

klaussfreire · 2018-12-17T17:10:41Z

sharedbuffers/mapped_struct.py

-                mant = 2
+            if isinf(mant):
+                mant = 1. if mant > 0 else -1.
+            elif isnan(mant):


You have to import isinf/isnan from math in pure python or you'll break pure python. Also import frexp into a fast global to avoid the attribute access to the math module.

frexp has a different calling convention than most other functions in that it takes an output parameter (a pointer to an int), so it'd look and work differently whether cython.compiled is true or not.

Furthermore, the standard doesn't really say what frexp has to return when the input float is +/- inf or NaN. The behaviour is unspecified. I think it's better if we just rely on Python's implementation handling all these subtleties.

klaussfreire · 2018-12-17T17:33:09Z

tests/mapped_struct.py

+        values = (1, 1033, 8620194, 20913041029, 66210231110, 752)
+        for x in values:
+            self.assertEqual(x, mapped_struct._stable_hash(x))
+
    def testHashIntegers(self):
        self.assertHashesOK((1, -1, 1 << 100, -(1 << 100), 1L, -1L))


for value, hash in ((1, 1), (1 << 100, 32487324), ...) self.assertEqual(mapped_struct._stable_hash(value), hash)

klaussfreire · 2018-12-17T17:51:49Z

tests/mapped_struct.py

+        values_hashes = (
+            ("abcdef", 18053520794346263629L),
+            ("123456789", 10139926970967174787L),
+            ("!@#%&$", 15648343848775299486L))


PEP formatting:

( (blah), (blah), (blah), )

Ie: trailing comma on the last entry, closing paren on its own line

klaussfreire · 2018-12-17T18:03:08Z

tests/mapped_struct.py

    ]

+class StableHashTest(unittest.TestCase):
+
+    def assertHashesOK(self, values):


All the tests that use assertHashesOk fail to check the stability of the hash values. assertHashesOk should get the hash-value pairs instead.

There are 2 types of tests in the class StableHashTest:

The first kind is the one that receives the value array, computes the hash values once, then computes them again, and checks that they are equal (to ensure hash values don't depend on any global state) and that they are of type int or long.

The second kind uses the same value array, but with the addition of the precomputed hashes in order to test for stability. They are the ones with the suffix Fixed.

If you think about it, the second test is stricter than the first, and implies correctness under the first tests' conditions, so you don't need the first kind of tests, just the second kind.

But you have different test values, and the test values for the "fixed" kind is a smaller set. So better unify them. Merge all the test values into a single "Fixed" test for each kind of object and that's both less code and more thorough testing.

klaussfreire · 2018-12-17T18:49:15Z

sharedbuffers/mapped_struct.py

@@ -20,6 +20,7 @@
 import weakref
 from datetime import timedelta, datetime, date
 from decimal import Decimal
+from math import frexp


You have to cython.declare frexp as object for it to be a "fast" global (otherwise it still goes through the module dict). Or add it to the pxd, both are equivalent.

Fix _stable_hash for negative and special floats

15a1b71

llogiudice requested review from gefarion, klaussfreire and tzulberti-jampp and removed request for gefarion December 17, 2018 15:06

Make _stable_hash work with NaN

07bcbb2

tzulberti-jampp reviewed Dec 17, 2018

View reviewed changes