Fix numpy DeprecationWarning in hashing.py #308
For the record, this deprecation warning was announced in the numpy release notes: https://github.com/numpy/numpy/blob/master/doc/release/1.10.2-notes.rst#deprecate-views-changing-dimensions-in-fortran-order
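For context, a minimal sketch of the kind of call that trips this deprecation. Reinterpreting an array's buffer as raw bytes via `.view(np.uint8)` (as joblib's hashing does) expands the last axis; numpy deprecated this for Fortran-ordered arrays in 1.10, and recent releases reject it outright, so the sketch catches both outcomes. The array shapes here are illustrative, not taken from joblib's code.

```python
import warnings
import numpy as np

c_arr = np.arange(100, dtype='<i8').reshape((10, 10))
f_arr = np.asfortranarray(c_arr)

# C-contiguous case: viewing as uint8 is fine; the last axis expands
# by the itemsize (8 bytes for int64), so (10, 10) becomes (10, 80).
c_bytes = c_arr.view(np.uint8)
print(c_bytes.shape)

# Fortran-ordered case: the last axis is not contiguous, so the same
# call was deprecated in numpy 1.10 and raises in recent numpy.
with warnings.catch_warnings():
    warnings.simplefilter("error")  # promote DeprecationWarning to an error
    try:
        f_arr.view(np.uint8)
        f_view_ok = True
    except (DeprecationWarning, ValueError):
        f_view_ok = False
print(f_view_ok)
```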
jdanbrown added a commit to jdanbrown/joblib that referenced this issue on Feb 16, 2020
- Previously, `Hasher` would collect the full bytes output of `self.dump(obj)` into the `BytesIO` at `self.stream`, and then `self.stream.getvalue()` the bytes to put into `self._hash.update(bytes)`. This requires O(n) space inside `self.stream`.
- This PR proposes a change that hands off all the Pickler's `file.write(bytes)` calls directly to `self._hash.update(bytes)`, without first accumulating O(n) of space in an in-memory stream. This requires O(1) space.
- I haven't measured performance improvements (or regressions?), but I'd expect this to be a meaningful win when hashing large objects (~gigabytes), since they no longer have to allocate a ton of RAM.
- And even when runtime performance isn't noticeably better, this at least makes hashing large objects memory safe, in that you don't need ~1x extra RAM to compute the hash of a large object.

Questions:

- Do you see any reasons why this wouldn't be a desirable change?
- Also, this makes 1/1185 tests fail. Any guess as to why?

```
_______________________ test_hashes_stay_the_same_with_numpy_objects _______________________

    @with_numpy
    def test_hashes_stay_the_same_with_numpy_objects():
        # We want to make sure that hashes don't change with joblib
        # version. For end users, that would mean that they have to
        # regenerate their cache from scratch, which potentially means
        # lengthy recomputations.
        rng = np.random.RandomState(42)
        # Being explicit about dtypes in order to avoid
        # architecture-related differences. Also using 'f4' rather than
        # 'f8' for float arrays because 'f8' arrays generated by
        # rng.random.randn don't seem to be bit-identical on 32bit and
        # 64bit machines.
        to_hash_list = [
            rng.randint(-1000, high=1000, size=50).astype('<i8'),
            tuple(rng.randn(3).astype('<f4') for _ in range(5)),
            [rng.randn(3).astype('<f4') for _ in range(5)],
            {
                -3333: rng.randn(3, 5).astype('<f4'),
                0: [
                    rng.randint(10, size=20).astype('<i8'),
                    rng.randn(10).astype('<f4')
                ]
            },
            # Non regression cases for joblib#308.
            # Generated with joblib 0.9.4.
            np.arange(100, dtype='<i8').reshape((10, 10)),
            # Fortran contiguous array
            np.asfortranarray(np.arange(100, dtype='<i8').reshape((10, 10))),
            # Non contiguous array
            np.arange(100, dtype='<i8').reshape((10, 10))[:, :2],
        ]
        # These expected results have been generated with joblib 0.9.0
        expected_dict = {
            'py2': ['80f2387e7752abbda2658aafed49e086',
                    '0d700f7f25ea670fd305e4cd93b0e8cd',
                    '83a2bdf843e79e4b3e26521db73088b9',
                    '63e0efd43c0a9ad92a07e8ce04338dd3',
                    '03fef702946b602c852b8b4e60929914',
                    '07074691e90d7098a85956367045c81e',
                    'd264cf79f353aa7bbfa8349e3df72d8f'],
            'py3': ['10a6afc379ca2708acfbaef0ab676eab',
                    '988a7114f337f381393025911ebc823b',
                    'c6809f4b97e35f2fa0ee8d653cbd025c',
                    'b3ad17348e32728a7eb9cda1e7ede438',
                    '927b3e6b0b6a037e8e035bda134e0b05',
                    '108f6ee98e7db19ea2006ffd208f4bf1',
                    'bd48ccaaff28e16e6badee81041b7180']}
        py_version_str = 'py3' if PY3_OR_LATER else 'py2'
        expected_list = expected_dict[py_version_str]
        for to_hash, expected in zip(to_hash_list, expected_list):
>           assert hash(to_hash) == expected
E           AssertionError: assert '81b225876779e9dfd235bfd83bb555cb' == '10a6afc379ca2708acfbaef0ab676eab'
E             - 81b225876779e9dfd235bfd83bb555cb
E             + 10a6afc379ca2708acfbaef0ab676eab

joblib/test/test_hashing.py:459: AssertionError
============ 1 failed, 1184 passed, 21 skipped, 1 xfailed in 140.11s (0:02:20) ============
```
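The O(1)-space idea described in the bullets above can be sketched as a file-like shim whose `write()` feeds the hash directly, so the Pickler never buffers the full pickle stream. This is a hypothetical illustration, not joblib's actual implementation: joblib's `Hasher` subclasses `Pickler` and differs in detail, and the names `_HashWriter` and `stream_hash` are made up here.

```python
import hashlib
import pickle


class _HashWriter:
    """File-like shim: every Pickler write() goes straight into the hash."""

    def __init__(self, hash_obj):
        self._hash = hash_obj

    def write(self, data):
        # No BytesIO accumulation: O(1) extra space instead of O(n).
        self._hash.update(data)


def stream_hash(obj, hash_name='md5'):
    """Hash an object's pickle stream without materializing it in memory."""
    h = hashlib.new(hash_name)
    pickle.Pickler(_HashWriter(h), protocol=2).dump(obj)
    return h.hexdigest()


print(stream_hash([1, 2, 3]))
```

Because the digest is updated incrementally in the same order the bytes would have been written to a `BytesIO`, the resulting hash is identical to the buffer-then-hash approach, just without the ~1x extra RAM.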
From nilearn tests: