-
Notifications
You must be signed in to change notification settings - Fork 413
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
FIX: fixing hashing with mixed dtype + test #286
Conversation
A few tests are failing because with python 2. Will fix them. |
Pickler._batch_setitems(self, sorted(items, | ||
key=lambda k: hash(k[0]))) | ||
else: | ||
Pickler._batch_setitems(self, iter(sorted(items))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is it not possible to use the same code under both Python 2 and Python 3? Maybe it's good not to change the Python hash values under Python 2.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ogrisel, this was related to some test failing. I wanted to keep the same test cases to work. See expected_dict
values in https://github.com/joblib/joblib/blob/master/joblib/test/test_hashing.py#L368 and in https://github.com/joblib/joblib/blob/master/joblib/test/test_hashing.py#L415
If I use the same sorting code for both python 2 and 3, this breaks one of the expected values for py2.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed it's good to not change the hash values for Python 2 if possible and easy. Let's add an inline comment to say so.
For the record, I compared the speed between this PR and master using the following snippet: import joblib
d = {i:str(i) for i in range(int(1e6))}
%time print(joblib.hash(d)) result:
So using the |
ff2161f
to
eb64a31
Compare
@ogrisel, I pushed a change that keeps the actual behavior but uses a different sorting strategy if a |
LGTM, +1 for merge once squashed. |
@@ -97,7 +97,13 @@ def test_trival_hash(): | |||
None, | |||
gc.collect, | |||
[1, ].append, | |||
] | |||
# Next 2 tuples have unorderable elements in python 3. | |||
('a', 1), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should test on set instead:
set(('a', 1)),
a tuple is already ordered.
9827232
to
6aee76f
Compare
@ogrisel, commits are now squashed. Waiting for appveyor to pass (it should arrive soon). |
except TypeError: | ||
# If keys are unorderable, sorting them using their hash. This is | ||
# slower but works in any case. | ||
Pickler._batch_setitems(self, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style: is the line break after self,
really necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wanted to stay within the 80 characters width limit. I found a better line break.
Could you please add an entry in the changelog? |
Done ! |
LGTM, merging, thanks ! |
FIX: fixing hashing with mixed dtype + test
fix #254