I understand that msgpack has a single array type that both `tuple` and `list` get mapped to (and that this is the reason for the `use_list` kwarg).
However, the `list`/`tuple` distinction is essential in Python, and one can't fully emulate the other (e.g. `tuple` is hashable, `list` isn't; `list` has an `.append` method, `tuple` doesn't). For this reason, I need to reconstruct tuples and lists correctly, e.g. `[(1, 'a'), (2, 'b'), (3, 'c')]`, and in fact arbitrary nestings of `tuple`/`list`/`dict`.
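To make the distinction concrete, here is a minimal sketch in plain Python (no msgpack involved). The variable names are illustrative only.

```python
# Tuples are hashable, so they can be used as dict keys.
point = (1, 'a')
lookup = {point: 'tuple key'}

# Lists are not hashable; using one as a key raises TypeError.
try:
    {[1, 'a']: 'list key'}
    list_key_ok = True
except TypeError:
    list_key_ok = False

# A list and a tuple with the same items never compare equal,
# which is why a round trip that swaps one for the other fails an == check.
same = [1, 'a'] == (1, 'a')

# Only lists can grow in place.
tuple_has_append = hasattr(point, 'append')
list_has_append = hasattr([], 'append')
```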
Currently (v0.5.6), this does not work:
import msgpack
orig = [(1, 'a'), (2, 'b'), (3, 'c')]
packed = msgpack.packb(orig, use_bin_type=True)
packed
# b'\x93\x92\x01\xa1a\x92\x02\xa1b\x92\x03\xa1c'
unpacked = msgpack.unpackb(packed, raw=False)
unpacked
# [[1, 'a'], [2, 'b'], [3, 'c']]
unpacked == orig
# False
unpacked = msgpack.unpackb(packed, use_list=False, raw=False)
unpacked
# ((1, 'a'), (2, 'b'), (3, 'c'))
unpacked == orig
# False
So now to the actual question: I thought I could use the `default`/`object_hook` keywords to help me out, but this does not work for `list`/`tuple`, even though it does work for sets (and all the types covered by `pandas.io.packer.encode`/`decode`).
def encode(obj):
    if isinstance(obj, (list, set, tuple)):
        return {'t': type(obj).__name__,  # type
                'v': tuple(obj)}          # values
    return obj

def decode(obj):
    otype = obj.get('t', None)
    if otype is None:
        return obj
    elif otype == 'tuple':
        return obj['v']
    elif otype == 'list':
        return list(obj['v'])
    elif otype == 'set':
        return set(obj['v'])
    else:
        return obj
### works for set
orig = [{1, 'a'}, {2, 'b'}, {3, 'c'}]
packed = msgpack.packb(orig, default=encode, use_bin_type=True)
packed
# b'\x93\x82\xa1t\xa3set\xa1v\x92\x01\xa1a\x82\xa1t\xa3set\xa1v\x92\xa1b\x02\x82\xa1t\xa3set\xa1v\x92\x03\xa1c'
unpacked = msgpack.unpackb(packed, object_hook=decode, raw=False)
unpacked == orig
# True
### but does not work for tuple/list
orig = [(1, 'a'), (2, 'b'), (3, 'c')]
packed = msgpack.packb(orig, default=encode, use_bin_type=True)
packed # no change!
# b'\x93\x92\x01\xa1a\x92\x02\xa1b\x92\x03\xa1c'
unpacked = msgpack.unpackb(packed, object_hook=decode, raw=False)
unpacked
# [[1, 'a'], [2, 'b'], [3, 'c']]
unpacked == orig
# False
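The unchanged bytes above suggest that `default` is only consulted for types the packer cannot serialize natively, so `list` and `tuple` never reach `encode`. One possible workaround, sketched here under that assumption, is to tag the containers recursively before packing and untag them after unpacking. The helper names `tag` and `untag` are hypothetical, and `json` from the standard library stands in for the wire format so the sketch runs without msgpack installed (it also collapses tuples into lists).

```python
import json

# Container types to restore, keyed by the name stored under 't'.
CONTAINERS = {'list': list, 'tuple': tuple, 'set': set}

def tag(obj):
    """Recursively replace list/tuple/set with {'t': name, 'v': [items]}."""
    if isinstance(obj, (list, tuple, set)):
        return {'t': type(obj).__name__, 'v': [tag(x) for x in obj]}
    if isinstance(obj, dict):
        return {k: tag(v) for k, v in obj.items()}
    return obj

def untag(obj):
    """Inverse of tag(): rebuild the original container types."""
    if isinstance(obj, list):
        # Plain arrays only occur as the 'v' payload of a tagged dict.
        return [untag(x) for x in obj]
    if isinstance(obj, dict):
        if set(obj) == {'t', 'v'} and obj['t'] in CONTAINERS:
            return CONTAINERS[obj['t']](untag(x) for x in obj['v'])
        return {k: untag(v) for k, v in obj.items()}
    return obj

orig = [(1, 'a'), (2, 'b'), (3, 'c')]
wire = json.dumps(tag(orig))        # stand-in for msgpack.packb(tag(orig), ...)
restored = untag(json.loads(wire))  # stand-in for untag(msgpack.unpackb(...))

nested = {'k': ([1, 2], {3, 4}, (5, [6]))}
nested_restored = untag(json.loads(json.dumps(tag(nested))))
```

With msgpack the same helpers would wrap the calls already used in this report, e.g. `untag(msgpack.unpackb(msgpack.packb(tag(orig), use_bin_type=True), raw=False))`. Two caveats: a user dict whose keys are exactly `'t'` and `'v'` would be mistaken for a tagged container, and the extra dict per container increases the packed size. The msgpack documentation also describes a `strict_types` packer option that forwards tuples to `default`; whether it behaves that way in v0.5.6 is not verified here.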