Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multi-dimensional memoryview breaks packing #526

Closed
pohlt opened this issue Jan 17, 2023 · 11 comments
Closed

Multi-dimensional memoryview breaks packing #526

pohlt opened this issue Jan 17, 2023 · 11 comments

Comments

@pohlt
Copy link
Contributor

pohlt commented Jan 17, 2023

NumPy 1.24 seems to use multi-dimensional memoryviews when getting array data with array.data. msgpack does not play nicely with such multi-dimensional views.

Example

Here's a crashing example without NumPy, but creating a multi-dimensional memoryview by hand:

import msgpack

def mutli_dim_memoryview():

    view = memoryview(b"\00" * 6)
    for data in [
        view,  # fine
        view.cast(view.format, (3, 2))  # crashes when unpacking
    ]:
        print("view shape:", data.shape)
        packed = msgpack.packb(data)
        print("packed:", packed)
        unpacked = msgpack.unpackb(packed)
        print("unpacked:", unpacked)

Output

view shape: (6,)
packed: b'\xc4\x06\x00\x00\x00\x00\x00\x00'
unpacked: b'\x00\x00\x00\x00\x00\x00'
view shape: (3, 2)
packed: b'\xc4\x03\x00\x00\x00\x00\x00\x00'
Traceback (most recent call last):
    [...]
    unpacked = msgpack.unpackb(packed)
               ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/xxx/lib64/python3.11/site-packages/msgpack/fallback.py", line 133, in unpackb
    raise ExtraData(ret, unpacker._get_extradata())
msgpack.exceptions.ExtraData: unpack(b) received extra data.

Mitigations

  • Casting the view w/o a shape creates a 1D view without re-allocating memory (see Python docs): data.case(data.format)
  • Generate bytes either by data.tobytes() or bytes(data). I'm not sure if that duplicates the memory buffer.

Of course, it would be great if msgpack could do that internally when packing a memoryview.

Thanks for this great package! ❤️

@jfolz
Copy link
Contributor

jfolz commented Jan 17, 2023

The packer actually requests a buffer object with the PyBUF_SIMPLE flag, so there should be no shape, strides, or suboffsets. See here. The API also says that "Without stride information, the buffer must be C-contiguous."
As long as I'm not misunderstanding something here Numpy should either return a 1D C-contiguous buffer (possibly creating a copy of non-contiguous arrays first), or return an error if it can't do that. So maybe Numpy is at fault here?

@pohlt
Copy link
Contributor Author

pohlt commented Jan 17, 2023

@jfolz, I don't fully understand your comment.

When packing a NumPy array (in my code not shown here), I explicilty ask for the memoryview using ndarray.data, so I don't blame NumPy for returning a multi-dimensional memoryview.

But even without NumPy: msgpack packs multi-dimensional memoryviews incorrectly as shown in the example, wouldn't you agree?

@methane
Copy link
Member

methane commented Jan 18, 2023

I can not reproduce.

$ python -VV
Python 3.9.7 (default, Sep 16 2021, 13:09:58)
[GCC 7.5.0]

$ pip freeze | grep msgpack
msgpack==1.0.4

$ cat x.py
import msgpack

def multi_dim_memoryview():

    view = memoryview(b"\00" * 6)
    for data in [
        view,  # fine
        view.cast(view.format, (3, 2))  # crashes when unpacking
    ]:
        print("view shape:", data.shape)
        packed = msgpack.packb(data)
        print("packed:", packed)
        unpacked = msgpack.unpackb(packed)
        print("unpacked:", unpacked)

multi_dim_memoryview()

$ python x.py
view shape: (6,)
packed: b'\xc4\x06\x00\x00\x00\x00\x00\x00'
unpacked: b'\x00\x00\x00\x00\x00\x00'
view shape: (3, 2)
packed: b'\xc4\x06\x00\x00\x00\x00\x00\x00'
unpacked: b'\x00\x00\x00\x00\x00\x00'

@methane
Copy link
Member

methane commented Jan 18, 2023

I tested Python 3.10 and 3.11 but output is same.

@jfolz
Copy link
Contributor

jfolz commented Jan 18, 2023

When packing a NumPy array (in my code not shown here), I explicilty ask for the memoryview using ndarray.data, so I don't blame NumPy for returning a multi-dimensional memoryview.

You're right, I misunderstood what you were trying to do. Numpy is not to blame, as it's job is done after you get the memoryview object out of it. My point still stands though, if the memoryview cannot produce a nice and simple contiguous buffer object it should return an error and packing should in turn fail.

@pohlt
Copy link
Contributor Author

pohlt commented Jan 18, 2023

@methane , that's weird. I reproduced your steps above in a fresh venv with only msgpack installed:

$ uname -a
Linux big 6.0.18-300.fc37.x86_64 #1 SMP PREEMPT_DYNAMIC Sat Jan 7 17:10:00 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

$ python -VV
Python 3.11.1 (main, Dec  7 2022, 00:00:00) [GCC 12.2.1 20221121 (Red Hat 12.2.1-4)]

$ pip freeze | grep msgpack
msgpack==1.0.4

$ cat msgpack_test.py 
import msgpack

def multi_dim_memoryview():
    view = memoryview(b"\00" * 6)
    for data in [
      view,  # fine
      view.cast(view.format, (3, 2))  # crashes when unpacking
    ]:
        print("view shape:", data.shape)
        packed = msgpack.packb(data)
        print("packed:", packed)
        unpacked = msgpack.unpackb(packed)
        print("unpacked:", unpacked)

multi_dim_memoryview()

$ python msgpack_test.py 
view shape: (6,)
packed: b'\xc4\x06\x00\x00\x00\x00\x00\x00'
unpacked: b'\x00\x00\x00\x00\x00\x00'
view shape: (3, 2)
packed: b'\xc4\x03\x00\x00\x00\x00\x00\x00'
Traceback (most recent call last):
  File "/home/xxx/msgpack_test.py", line 15, in <module>
    multi_dim_memoryview()
  File "/home/xxx/msgpack_test.py", line 11, in multi_dim_memoryview
    unpacked = msgpack.unpackb(packed)
               ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/xxx/venv/lib64/python3.11/site-packages/msgpack/fallback.py", line 133, in unpackb
    raise ExtraData(ret, unpacker._get_extradata())
msgpack.exceptions.ExtraData: unpack(b) received extra data.

As you can see, the bin array has a correct size of 6 in the first case and an incorrect size 3 (the size of the first dimension) in the second case.

@pohlt
Copy link
Contributor Author

pohlt commented Jan 18, 2023

@jfolz The memoryview is contiguous, but it's multi-dimensional. The question is what msgpack should do with multi-dimensional memoryviews:

  • Treat it as a flat memoryview and pack and unpack it like that? This would mean that the unpacked memoryview is slightly different: same data, but different shape (flattened). I think that's the current behavior (at least for @methane who currently cannot reproduce the issue).
  • Refuse packing a multi-dimensional memoryview? Ok, but impractical, IMHO.

Could be another configuration parameter.

@methane
Copy link
Member

methane commented Jan 18, 2023

Now I got it. Fallback module packs it wrong.

(venv) [root@978f0846d90b ~]# MSGPACK_PUREPYTHON=1 python mptest.py
view shape: (6,)
packed: b'\xc4\x06\x00\x00\x00\x00\x00\x00'
unpacked: b'\x00\x00\x00\x00\x00\x00'
view shape: (3, 2)
packed: b'\xc4\x03\x00\x00\x00\x00\x00\x00'
Traceback (most recent call last):
  File "/root/mptest.py", line 15, in <module>
    multi_dim_memoryview()
  File "/root/mptest.py", line 12, in multi_dim_memoryview
    unpacked = msgpack.unpackb(packed)
               ^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/venv/lib64/python3.11/site-packages/msgpack/fallback.py", line 133, in unpackb
    raise ExtraData(ret, unpacker._get_extradata())
msgpack.exceptions.ExtraData: unpack(b) received extra data.

methane added a commit that referenced this issue Jan 18, 2023
methane added a commit that referenced this issue Jan 18, 2023
methane added a commit that referenced this issue Jan 18, 2023
@methane
Copy link
Member

methane commented Jan 18, 2023

I released 1.0.5rc1. Please try it.

@pohlt
Copy link
Contributor Author

pohlt commented Jan 18, 2023

Issue is fixed with 1.0.5rc1. Thanks a lot!

@pohlt
Copy link
Contributor Author

pohlt commented Jan 19, 2023

But since you now provide wheels for 3.11, I'm no longer using the fallback code, I guess.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants