FIX: fixing hashing with mixed dtype + test #286

aabadie · 2015-12-18T17:23:56Z

aabadie · 2015-12-21T08:35:23Z

A few tests are failing under Python 2. I will fix them.

ogrisel · 2015-12-22T09:18:12Z

joblib/hashing.py

+            Pickler._batch_setitems(self, sorted(items,
+                                                 key=lambda k: hash(k[0])))
+        else:
+            Pickler._batch_setitems(self, iter(sorted(items)))


Why is it not possible to use the same code under both Python 2 and Python 3? Maybe it's good not to change the Python hash values under Python 2.

@ogrisel, this was related to some failing tests. I wanted the existing test cases to keep working. See the expected_dict values in https://github.com/joblib/joblib/blob/master/joblib/test/test_hashing.py#L368 and in https://github.com/joblib/joblib/blob/master/joblib/test/test_hashing.py#L415
If I use the same sorting code for both Python 2 and 3, this breaks one of the expected values for Python 2.

Indeed it's good to not change the hash values for Python 2 if possible and easy. Let's add an inline comment to say so.
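
For context, the Python 3 failure behind this thread comes from ordering keys of different types. The snippet below is only a minimal sketch of that behavior; the helper name `can_sort_items` is hypothetical and not part of joblib.

```python
def can_sort_items(d):
    """Return True if the (key, value) pairs of d can be sorted directly."""
    try:
        sorted(d.items())
    except TypeError:
        # Python 3 refuses to order values such as str against int.
        return False
    return True


print(can_sort_items({1: 'a', 2: 'b'}))   # homogeneous int keys
print(can_sort_items({'a': 1, 1: 'a'}))   # mixed str/int keys
```

Under Python 3 the second call hits the `TypeError` branch, whereas Python 2 orders mixed types without complaint, which is why a single code path behaves differently on the two versions.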

aabadie · 2016-01-13T15:40:27Z

For the record, I compared the speed between this PR and master using the following snippet:

import joblib
d = {i:str(i) for i in range(int(1e6))}
%time print(joblib.hash(d))

result:

Master:

f20908515f896d797430df0a5be37b46
Wall time: 12.2 s

This PR:

06626d40926a81cd027b523c13427c6f
Wall time: 39.9 s

So using the key parameter slows things down, by more than a factor of 3 on this example.
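
For readers without IPython, a rough standalone way to compare a plain sort against a keyed sort is sketched below. It does not reproduce the numbers above, which come from `joblib.hash` on a one-million-entry dict; `digest` and `time_sorts` are hypothetical names, and the MD5-of-pickle key is an assumed stand-in for a costly per-key hash.

```python
import hashlib
import pickle
import random
import timeit


def digest(obj):
    # Assumed stand-in for an expensive, deterministic per-key hash.
    return hashlib.md5(pickle.dumps(obj, protocol=2)).hexdigest()


def time_sorts(n=100000, repeat=3):
    """Return (plain, keyed) best-of-`repeat` timings in seconds."""
    items = [(i, str(i)) for i in range(n)]
    random.Random(0).shuffle(items)  # avoid timing an already-sorted list
    plain = min(timeit.repeat(lambda: sorted(items),
                              number=1, repeat=repeat))
    keyed = min(timeit.repeat(
        lambda: sorted(items, key=lambda kv: digest(kv[0])),
        number=1, repeat=repeat))
    return plain, keyed


if __name__ == "__main__":
    plain, keyed = time_sorts()
    print("plain sort: %.3f s, keyed sort: %.3f s" % (plain, keyed))
```

The keyed variant calls `digest` once per item before sorting, so its cost grows with how expensive the key function is.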

aabadie · 2016-01-14T09:50:10Z

@ogrisel, I pushed a change that keeps the current behavior but falls back to a different sorting strategy if a TypeError is raised.
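
The try-first, fall-back-on-TypeError strategy can be sketched outside joblib as follows. This is an illustration under assumptions, not the code from the PR: `ordered_items` and `stable_key` are hypothetical names, and the MD5-of-pickle digest stands in for whichever hash the PR sorts on (the built-in `hash` of a `str` is salted per process in Python 3, so it would not give a reproducible order).

```python
import hashlib
import pickle


def stable_key(obj):
    # Deterministic digest used only for ordering (assumed stand-in).
    return hashlib.md5(pickle.dumps(obj, protocol=2)).hexdigest()


def ordered_items(d):
    """Return the items of d in an order independent of insertion order."""
    items = list(d.items())
    try:
        # Fast path: keys share an orderable type.
        return sorted(items)
    except TypeError:
        # Slow path: keys cannot be compared, so order them by digest.
        return sorted(items, key=lambda kv: stable_key(kv[0]))


print(ordered_items({2: 'x', 1: 'y'}))
print(ordered_items({'a': 1, 1: 'b'}) == ordered_items({1: 'b', 'a': 1}))
```

The fast path leaves the common case untouched, so only dicts with unorderable keys pay for the digest computation.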

ogrisel · 2016-01-14T10:17:32Z

LGTM, +1 for merge once squashed.

ogrisel · 2016-01-14T10:23:51Z

joblib/test/test_hashing.py

@@ -97,7 +97,13 @@ def test_trival_hash():
                None,
                gc.collect,
                [1, ].append,
-               ]
+                # Next 2 tuples have unorderable elements in python 3.
+                ('a', 1),


You should test on set instead:

set(('a', 1)),

A tuple is already ordered.
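
To illustrate the point about tuples versus sets, here is a small sketch in plain Python that does not depend on joblib. A tuple carries its own element order, while a set has none, so only the set needs a canonical ordering before it can be hashed reproducibly.

```python
pair = ('a', 1)
mixed = set(pair)

# A tuple remembers element order: swapping the elements gives a different tuple.
assert pair != (1, 'a')
# A set does not: both spellings compare equal.
assert mixed == set((1, 'a'))

# Under Python 3, ordering the mixed-type set directly is not possible.
try:
    sorted(mixed)
    sortable = True
except TypeError:
    sortable = False
print(sortable)
```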

aabadie · 2016-01-14T12:13:52Z

@ogrisel, the commits are now squashed. Waiting for the AppVeyor build to pass (it should finish soon).

ogrisel · 2016-01-14T12:57:54Z

joblib/hashing.py

+        except TypeError:
+            # If keys are unorderable, sorting them using their hash. This is
+            # slower but works in any case.
+            Pickler._batch_setitems(self,


style: is the line break after self, really necessary?

I wanted to stay within the 80 characters width limit. I found a better line break.

ogrisel · 2016-01-14T12:58:30Z

Could you please add an entry in the changelog?

aabadie · 2016-01-14T13:41:38Z

Could you please add an entry in the changelog?

Done !

lesteve · 2016-01-14T14:02:24Z

LGTM, merging, thanks !

ogrisel reviewed Dec 22, 2015
aabadie force-pushed the fix_hashing branch 4 times, most recently from ff2161f to eb64a31 Compare January 13, 2016 16:16

aabadie force-pushed the fix_hashing branch from b3d8dac to 7de7807 Compare January 14, 2016 10:19

ogrisel reviewed Jan 14, 2016
aabadie force-pushed the fix_hashing branch 5 times, most recently from 9827232 to 6aee76f Compare January 14, 2016 12:04

aabadie force-pushed the fix_hashing branch from 6aee76f to 6c06b80 Compare January 14, 2016 12:22

ogrisel reviewed Jan 14, 2016
fixing hashing with mixed dtype + test

2a07fec

aabadie force-pushed the fix_hashing branch from 6c06b80 to 2a07fec Compare January 14, 2016 13:34

add entry in changelog

9db8607

lesteve added a commit that referenced this pull request Jan 14, 2016

Merge pull request #286 from aabadie/fix_hashing

4a9c63d

FIX: fixing hashing with mixed dtype + test

lesteve merged commit 4a9c63d into joblib:master Jan 14, 2016

aabadie deleted the fix_hashing branch January 14, 2016 14:04

ogrisel mentioned this pull request Jan 15, 2016

[Python 3] joblib.hashing.hash error for mixed types sets or dict with mixed types keys #254

Closed
