FIX: fixing hashing with mixed dtype + test #286

Merged on Jan 14, 2016 (2 commits)

Conversation

Contributor

aabadie commented Dec 18, 2015

fix #254

Contributor Author

aabadie commented Dec 21, 2015

A few tests are failing with Python 2. Will fix them.

    Pickler._batch_setitems(self, sorted(items,
                                         key=lambda k: hash(k[0])))
else:
    Pickler._batch_setitems(self, iter(sorted(items)))
Contributor

Why is it not possible to use the same code under both Python 2 and Python 3? Maybe it's good not to change the Python hash values under Python 2.

Contributor Author

@ogrisel, this was related to some failing tests. I wanted the existing test cases to keep working. See the expected_dict values in https://github.com/joblib/joblib/blob/master/joblib/test/test_hashing.py#L368 and in https://github.com/joblib/joblib/blob/master/joblib/test/test_hashing.py#L415
If I use the same sorting code for both Python 2 and 3, this breaks one of the expected values for Python 2.

Contributor

Indeed it's good to not change the hash values for Python 2 if possible and easy. Let's add an inline comment to say so.

Contributor Author

aabadie commented Jan 13, 2016

For the record, I compared the speed between this PR and master using the following snippet:

import joblib
d = {i:str(i) for i in range(int(1e6))}
%time print(joblib.hash(d))

result:

  • Master:
f20908515f896d797430df0a5be37b46
Wall time: 12.2 s
  • This PR:
06626d40926a81cd027b523c13427c6f
Wall time: 39.9 s

So using the key parameter slows things down noticeably (roughly 3x in this run).
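To look at the sorting step in isolation, outside IPython, something like the following sketch could be used. It is not the PR's code and does not reproduce the timings above: the function names are hypothetical, the dict is smaller so that it runs quickly, and only the standard library is assumed. Timings will vary by machine and Python version.

```python
import timeit

# Hypothetical stand-in for the dict used in the snippet above, kept
# smaller so the comparison finishes quickly.
d = {i: str(i) for i in range(100_000)}
items = list(d.items())


def sort_plain():
    # Direct tuple comparison; requires mutually orderable keys.
    return sorted(items)


def sort_by_key_hash():
    # Calls a Python-level function once per item to build the sort key.
    return sorted(items, key=lambda kv: hash(kv[0]))


if __name__ == "__main__":
    for fn in (sort_plain, sort_by_key_hash):
        seconds = timeit.timeit(fn, number=5)
        print(f"{fn.__name__}: {seconds:.3f} s for 5 runs")
```

This only measures the sort; the rest of the hashing work (pickling and digesting) is deliberately left out.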

aabadie force-pushed the fix_hashing branch 4 times, most recently from ff2161f to eb64a31 (January 13, 2016 16:16)
Contributor Author

aabadie commented Jan 14, 2016

@ogrisel, I pushed a change that keeps the current behavior but uses a different sorting strategy if a TypeError is raised.
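A minimal sketch of that strategy, written as a standalone function rather than as the actual Pickler method; the name sorted_items is hypothetical and only built-in functions are assumed:

```python
def sorted_items(items):
    """Return (key, value) pairs in a repeatable order (hypothetical helper)."""
    items = list(items)
    try:
        # Fast path: works when all keys are mutually orderable, and keeps
        # the ordering that existing hashes were computed with.
        return sorted(items)
    except TypeError:
        # Python 3 refuses to order e.g. 'a' and 1, so fall back to
        # ordering by the hash of each key. Slower, but always possible.
        return sorted(items, key=lambda kv: hash(kv[0]))


print(sorted_items({2: 'b', 1: 'a'}.items()))  # [(1, 'a'), (2, 'b')]
mixed = sorted_items({'a': 1, 1: 'a'}.items())
print(len(mixed))  # 2
```

The try/except form means dicts with orderable keys still go through the plain sort, so only the mixed-type case pays for the key function.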

Contributor

ogrisel commented Jan 14, 2016

LGTM, +1 for merge once squashed.

@@ -97,7 +97,13 @@ def test_trival_hash():
None,
gc.collect,
[1, ].append,
]
# Next 2 tuples have unorderable elements in python 3.
('a', 1),
Contributor

You should test on a set instead:

set(('a', 1)),

A tuple is already ordered.
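A short illustration of the difference, assuming Python 3 and using a hypothetical helper named is_orderable: a tuple carries its own order, whereas a set has none, so code that sorts the elements of a mixed-type set runs into the TypeError.

```python
def is_orderable(values):
    """Return True if sorted() accepts the values (hypothetical helper)."""
    try:
        sorted(values)
        return True
    except TypeError:
        return False


mixed_tuple = ('a', 1)        # order is part of the value itself
mixed_set = set(mixed_tuple)  # no inherent order; one has to be imposed

print(is_orderable(mixed_set))  # False on Python 3
print(is_orderable([1, 2, 3]))  # True
```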

aabadie force-pushed the fix_hashing branch 5 times, most recently from 9827232 to 6aee76f (January 14, 2016 12:04)
Contributor Author

aabadie commented Jan 14, 2016

@ogrisel, commits are now squashed. Waiting for appveyor to pass (it should arrive soon).

except TypeError:
    # If keys are unorderable, sorting them using their hash. This is
    # slower but works in any case.
    Pickler._batch_setitems(self,
Contributor

style: is the line break after self, really necessary?

Contributor Author

I wanted to stay within the 80-character width limit. I found a better line break.

Contributor

ogrisel commented Jan 14, 2016

Could you please add an entry in the changelog?

Contributor Author

aabadie commented Jan 14, 2016

> Could you please add an entry in the changelog?

Done !

Member

lesteve commented Jan 14, 2016

LGTM, merging, thanks !

lesteve added a commit that referenced this pull request Jan 14, 2016
FIX: fixing hashing with mixed dtype + test
lesteve merged commit 4a9c63d into joblib:master Jan 14, 2016
aabadie deleted the fix_hashing branch January 14, 2016 14:04
Successfully merging this pull request may close these issues.

[Python 3] joblib.hashing.hash error for mixed types sets or dict with mixed types keys