[Python 3] joblib.hashing.hash error for mixed types sets or dict with mixed types keys #254

@lesteve

import joblib

joblib.hashing.hash({'a', 1})          # set with mixed element types
joblib.hashing.hash({'a': 1, 1: 2})    # dict with mixed key types

The traceback:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-2-e763c033a75e> in <module>()
----> 1 joblib.hashing.hash({'a': 1, 1: 2})

/home/le243287/dev/joblib/joblib/hashing.py in hash(obj, hash_name, coerce_mmap)
    231     else:
    232         hasher = Hasher(hash_name=hash_name)
--> 233     return hasher.hash(obj)

/home/le243287/dev/joblib/joblib/hashing.py in hash(self, obj, return_digest)
     52     def hash(self, obj, return_digest=True):
     53         try:
---> 54             self.dump(obj)
     55         except pickle.PicklingError as e:
     56             warnings.warn('PicklingError while hashing %r: %r' % (obj, e))

/volatile/le243287/miniconda3/lib/python3.4/pickle.py in dump(self, obj)
    410         if self.proto >= 4:
    411             self.framer.start_framing()
--> 412         self.save(obj)
    413         self.write(STOP)
    414         self.framer.end_framing()

/home/le243287/dev/joblib/joblib/hashing.py in save(self, obj)
    211             klass = obj.__class__
    212             obj = (klass, ('HASHED', obj.descr))
--> 213         Hasher.save(self, obj)
    214 
    215 

/home/le243287/dev/joblib/joblib/hashing.py in save(self, obj)
     77                 cls = obj.__self__.__class__
     78                 obj = _MyHash(func_name, inst, cls)
---> 79         Pickler.save(self, obj)
     80 
     81     def memoize(self, obj):

/volatile/le243287/miniconda3/lib/python3.4/pickle.py in save(self, obj, save_persistent_id)
    477         f = self.dispatch.get(t)
    478         if f is not None:
--> 479             f(self, obj) # Call unbound method with explicit self
    480             return
    481 

/volatile/le243287/miniconda3/lib/python3.4/pickle.py in save_dict(self, obj)
    812 
    813         self.memoize(obj)
--> 814         self._batch_setitems(obj.items())
    815 
    816     dispatch[dict] = save_dict

/home/le243287/dev/joblib/joblib/hashing.py in _batch_setitems(self, items)
    124     def _batch_setitems(self, items):
    125         # forces order of keys in dict to ensure consistent hash
--> 126         Pickler._batch_setitems(self, iter(sorted(items)))
    127 
    128     def save_set(self, set_items):

TypeError: unorderable types: str() < int()

The reason: we rely on sorted to make the hash reproducible for unordered containers (sets and dicts), and in Python 3 you cannot order values of arbitrary, unrelated types such as str and int.
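
To illustrate the underlying behaviour without joblib, here is a minimal sketch that sorts dict items directly. The variable names are made up for this example, and the exact wording of the error message depends on the Python 3 version.

```python
# Items of a dict whose keys have unrelated types (str and int).
mixed_items = list({'a': 1, 1: 2}.items())

try:
    # Tuples are compared element by element, so this compares 'a' with 1.
    sorted(mixed_items)
    outcome = "sorted fine"
except TypeError as exc:
    outcome = "TypeError: %s" % exc

print(outcome)

# Items whose keys share one orderable type sort without any problem.
same_type_items = sorted({'b': 1, 'a': 2}.items())
print(same_type_items)
```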

Maybe the first thing to do would be to just catch the error in Python 3 and raise a more user-friendly one saying that sets with mixed element types and dicts with mixed-type keys are not supported?
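
A rough sketch of what catching the error could look like. The helper name `sorted_or_explain` and the message text are hypothetical and not part of joblib; the real change would live inside the Hasher methods that call sorted.

```python
def sorted_or_explain(items, kind):
    """Sort items, turning the ordering TypeError into a clearer message.

    Hypothetical helper for illustration only; `kind` is a short label
    such as 'set' or 'dict keys' used in the error message.
    """
    try:
        return sorted(items)
    except TypeError as exc:
        raise TypeError(
            "Cannot hash %s with mixed types: the elements cannot be "
            "ordered in Python 3 (%s)" % (kind, exc)
        ) from exc


# Homogeneous containers still sort as before.
print(sorted_or_explain({3, 1, 2}, 'set'))

# Mixed types now fail with a message that names the actual problem.
try:
    sorted_or_explain({'a', 1}, 'set')
except TypeError as exc:
    message = str(exc)
    print(message)
```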

A possible fix (suggested by @ogrisel): use joblib.hashing.hash to hash the elements (in the set case) or the keys (in the dict case), and pass that as key= in the sorted call, so that only the hash strings are compared.
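
A minimal sketch of that idea, under some assumptions: to keep it self-contained, `element_digest` uses hashlib and pickle as a stand-in for joblib.hashing.hash, and the function names are hypothetical. It only shows the ordering step, not how it would be wired into Hasher.

```python
import hashlib
import pickle


def element_digest(obj):
    # Stand-in for joblib.hashing.hash (assumption for this sketch):
    # any deterministic digest of a single element would do.
    return hashlib.md5(pickle.dumps(obj, protocol=2)).hexdigest()


def sorted_by_digest(elements):
    # Set case: order elements by their digest, so only strings are compared.
    return sorted(elements, key=element_digest)


def sorted_items_by_key_digest(mapping):
    # Dict case: order (key, value) pairs by the digest of the key only.
    return sorted(mapping.items(), key=lambda kv: element_digest(kv[0]))


# The order no longer depends on comparing 'a' with 1 directly.
print(sorted_by_digest({'a', 1}))
print(sorted_items_by_key_digest({'a': 1, 1: 2}))
```

Since the sort key is always a hex string, the order is the same whatever the iteration order of the container, which is what the hash needs to be reproducible. The cost is one extra hash per element or key.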
