Kwargs default/object_hook ignored for tuple/list? #305

@h-vetinari


I understand that msgpack has a single array type that both tuple and list get mapped to (and that this is the reason for the use_list kwarg).

However, the list/tuple distinction is essential in Python, and one can't fully emulate the other (e.g. tuple is hashable, list isn't; list has an .append-method, tuple doesn't). For this reason, I need to reconstruct tuples and lists correctly, e.g. [(1, 'a'), (2, 'b'), (3, 'c')] -- actually, arbitrary nestings of tuple/list/dict.

Currently (v0.5.6), round-tripping such a structure does not work:

import msgpack

orig = [(1, 'a'), (2, 'b'), (3, 'c')]
packed = msgpack.packb(orig, use_bin_type=True)
packed
# b'\x93\x92\x01\xa1a\x92\x02\xa1b\x92\x03\xa1c'

unpacked = msgpack.unpackb(packed, raw=False)
unpacked
# [[1, 'a'], [2, 'b'], [3, 'c']]
unpacked == orig
# False

unpacked = msgpack.unpackb(packed, use_list=False, raw=False)
unpacked
# ((1, 'a'), (2, 'b'), (3, 'c'))
unpacked == orig
# False

So now to the actual question - I thought I could use the default/object_hook keywords to help me out, but this does not work for list/tuple, even though it does work for sets (and all the types covered by pandas.io.packer.encode/decode).

def encode(obj):
    if isinstance(obj, (list, set, tuple)):
        return {'t': type(obj).__name__, # type
                'v': tuple(obj)} # values
    return obj

def decode(obj):
    otype = obj.get('t', None)
    if otype is None:
        return obj
    elif otype == 'tuple':
        return obj['v']
    elif otype == 'list':
        return list(obj['v'])
    elif otype == 'set':
        return set(obj['v'])
    else:
        return obj

### works for set
orig = [{1, 'a'}, {2, 'b'}, {3, 'c'}]
packed = msgpack.packb(orig, default=encode, use_bin_type=True)
packed
# b'\x93\x82\xa1t\xa3set\xa1v\x92\x01\xa1a\x82\xa1t\xa3set\xa1v\x92\xa1b\x02\x82\xa1t\xa3set\xa1v\x92\x03\xa1c'

unpacked = msgpack.unpackb(packed, object_hook=decode, raw=False)
unpacked == orig
# True

### but does not work for tuple/list
orig = [(1, 'a'), (2, 'b'), (3, 'c')]
packed = msgpack.packb(orig, default=encode, use_bin_type=True)
packed # no change!
# b'\x93\x92\x01\xa1a\x92\x02\xa1b\x92\x03\xa1c'

unpacked = msgpack.unpackb(packed, object_hook=decode, raw=False)
unpacked
# [[1, 'a'], [2, 'b'], [3, 'c']]
unpacked == orig
# False
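
One possible workaround, as a rough sketch only: since the default hook is apparently never invoked for list/tuple, the tagging could be done by hand before packing and undone after unpacking. The helper names tag and untag below are hypothetical (not part of msgpack), and the 't'/'v' dict layout is simply borrowed from encode/decode above.

```python
def tag(obj):
    """Recursively wrap lists and tuples in dicts that record the container type."""
    if isinstance(obj, (list, tuple)):
        kind = 'tuple' if isinstance(obj, tuple) else 'list'
        return {'t': kind, 'v': [tag(item) for item in obj]}
    if isinstance(obj, dict):
        # Only values are tagged; keys are passed through unchanged.
        return {key: tag(value) for key, value in obj.items()}
    return obj


def untag(obj):
    """Reverse tag(): rebuild lists and tuples from the tagged dicts."""
    if isinstance(obj, dict):
        if set(obj) == {'t', 'v'} and obj['t'] in ('list', 'tuple'):
            items = [untag(item) for item in obj['v']]
            return tuple(items) if obj['t'] == 'tuple' else items
        return {key: untag(value) for key, value in obj.items()}
    if isinstance(obj, list):
        # Untagged arrays (e.g. produced by other code) stay lists.
        return [untag(item) for item in obj]
    return obj


orig = [(1, 'a'), (2, 'b'), (3, 'c')]
tagged = tag(orig)        # plain dicts/lists only, safe for a single array type
restored = untag(tagged)
print(restored == orig)
```

The idea would be to call msgpack.packb(tag(orig), use_bin_type=True) and then untag(msgpack.unpackb(packed, raw=False)); I have only checked the tag/untag round trip itself, not the combination with msgpack. Known limitations of this sketch: a user dict whose keys are exactly 't' and 'v' would be misread as a tagged container, and the extra dict per container makes the packed output noticeably larger.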
