
Fix numpy DeprecationWarning in hashing.py #308

Closed · lesteve opened this issue Feb 4, 2016 · 1 comment · Fixed by #309

lesteve commented Feb 4, 2016

From nilearn tests:

```
/home/ubuntu/virtualenvs/venv-system/local/lib/python2.7/site-packages/sklearn/externals/joblib/hashing.py:178: DeprecationWarning: Changing the shape of non-C contiguous array by
descriptor assignment is deprecated. To maintain
the Fortran contiguity of a multidimensional Fortran
array, use 'a.T.view(...).T' instead
  obj_bytes_view = obj.view(self.np.uint8)
```
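
For context, a minimal standalone sketch (not the joblib code, and not necessarily the fix used in #309) of the replacement pattern the warning points to: to reinterpret a Fortran-contiguous array as raw `uint8` bytes, transpose, view, and transpose back rather than calling `.view()` directly on the array.

```
import numpy as np

# Fortran-contiguous array, like the ones hashing.py receives.
a = np.asfortranarray(np.arange(6, dtype='<i8').reshape(2, 3))

# a.view(np.uint8) is the call on hashing.py line 178 that triggers the
# warning for non-C-contiguous arrays; the line below is numpy's suggested
# replacement, which keeps the Fortran layout.
obj_bytes_view = a.T.view(np.uint8).T

print(obj_bytes_view.flags['F_CONTIGUOUS'])  # True
print(obj_bytes_view.shape)                  # (16, 3): first axis grows by itemsize (8)
```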

aabadie commented Feb 4, 2016

For the record, this deprecation was announced in the numpy 1.10.2 release notes: https://github.com/numpy/numpy/blob/master/doc/release/1.10.2-notes.rst#deprecate-views-changing-dimensions-in-fortran-order

jdanbrown added a commit to jdanbrown/joblib that referenced this issue Feb 16, 2020
- Previously, `Hasher` would collect the full bytes output of `self.dump(obj)` into the `BytesIO` at `self.stream`, and then call `self.stream.getvalue()` to feed the bytes to `self._hash.update(bytes)`. This requires O(n) space inside `self.stream`.
- This PR proposes handing the Pickler's `file.write(bytes)` calls directly to `self._hash.update(bytes)`, without first accumulating O(n) bytes in an in-memory stream. This requires only O(1) space (a rough sketch of the idea is shown below).
- I haven't measured performance improvements (or regressions?), but I'd expect this to be a meaningful win when hashing large objects (multi-gigabyte), since hashing no longer has to allocate a comparable amount of RAM.
- Even when runtime performance isn't noticeably better, this at least makes hashing large objects memory safe, in that you don't need roughly 1x extra RAM to compute the hash of a large object.
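
For illustration, here is a rough standalone sketch of that idea, not the actual PR and not joblib's `Hasher` (which subclasses `Pickler`, pins the pickle protocol, and special-cases numpy arrays): a file-like object whose `write()` forwards every chunk straight into the hash, so the pickled bytes are never buffered in memory.

```
import hashlib
import pickle

class _HashWriter:
    """Minimal file-like object: every write() feeds the hash directly."""
    def __init__(self, hash_obj):
        self._hash = hash_obj

    def write(self, data):
        self._hash.update(data)
        return len(data)

def streaming_hash(obj, hash_name='md5'):
    h = hashlib.new(hash_name)
    # The Pickler writes its output to _HashWriter, so the bytes are consumed
    # by the hash as they are produced: O(1) extra space instead of buffering
    # the whole pickle in a BytesIO first.
    pickle.Pickler(_HashWriter(h), protocol=pickle.HIGHEST_PROTOCOL).dump(obj)
    return h.hexdigest()

print(streaming_hash({'a': list(range(10))}))
```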

Questions:
- Do you see any reasons why this wouldn't be a desirable change?
- Also, this makes 1/1185 tests fail. Any guess as to why?

```
___________________________________________________ test_hashes_stay_the_same_with_numpy_objects ____________________________________________________

    @with_numpy
    def test_hashes_stay_the_same_with_numpy_objects():
        # We want to make sure that hashes don't change with joblib
        # version. For end users, that would mean that they have to
        # regenerate their cache from scratch, which potentially means
        # lengthy recomputations.
        rng = np.random.RandomState(42)
        # Being explicit about dtypes in order to avoid
        # architecture-related differences. Also using 'f4' rather than
        # 'f8' for float arrays because 'f8' arrays generated by
        # rng.random.randn don't seem to be bit-identical on 32bit and
        # 64bit machines.
        to_hash_list = [
            rng.randint(-1000, high=1000, size=50).astype('<i8'),
            tuple(rng.randn(3).astype('<f4') for _ in range(5)),
            [rng.randn(3).astype('<f4') for _ in range(5)],
            {
                -3333: rng.randn(3, 5).astype('<f4'),
                0: [
                    rng.randint(10, size=20).astype('<i8'),
                    rng.randn(10).astype('<f4')
                ]
            },
            # Non regression cases for joblib#308.
            # Generated with joblib 0.9.4.
            np.arange(100, dtype='<i8').reshape((10, 10)),
            # Fortran contiguous array
            np.asfortranarray(np.arange(100, dtype='<i8').reshape((10, 10))),
            # Non contiguous array
            np.arange(100, dtype='<i8').reshape((10, 10))[:, :2],
        ]

        # These expected results have been generated with joblib 0.9.0
        expected_dict = {'py2': ['80f2387e7752abbda2658aafed49e086',
                                 '0d700f7f25ea670fd305e4cd93b0e8cd',
                                 '83a2bdf843e79e4b3e26521db73088b9',
                                 '63e0efd43c0a9ad92a07e8ce04338dd3',
                                 '03fef702946b602c852b8b4e60929914',
                                 '07074691e90d7098a85956367045c81e',
                                 'd264cf79f353aa7bbfa8349e3df72d8f'],
                         'py3': ['10a6afc379ca2708acfbaef0ab676eab',
                                 '988a7114f337f381393025911ebc823b',
                                 'c6809f4b97e35f2fa0ee8d653cbd025c',
                                 'b3ad17348e32728a7eb9cda1e7ede438',
                                 '927b3e6b0b6a037e8e035bda134e0b05',
                                 '108f6ee98e7db19ea2006ffd208f4bf1',
                                 'bd48ccaaff28e16e6badee81041b7180']}

        py_version_str = 'py3' if PY3_OR_LATER else 'py2'
        expected_list = expected_dict[py_version_str]

        for to_hash, expected in zip(to_hash_list, expected_list):
>           assert hash(to_hash) == expected
E           AssertionError: assert '81b225876779e9dfd235bfd83bb555cb' == '10a6afc379ca2708acfbaef0ab676eab'
E             - 81b225876779e9dfd235bfd83bb555cb
E             + 10a6afc379ca2708acfbaef0ab676eab

joblib/test/test_hashing.py:459: AssertionError
========================================= 1 failed, 1184 passed, 21 skipped, 1 xfailed in 140.11s (0:02:20) =========================================
```