Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Python pickler unable to pickle object the native pickler can #120380

Closed
apmorton opened this issue Jun 12, 2024 · 7 comments
Closed

Python pickler unable to pickle object the native pickler can #120380

apmorton opened this issue Jun 12, 2024 · 7 comments
Labels
stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

@apmorton
Copy link

apmorton commented Jun 12, 2024

Bug report

Bug description:

import io
import pickle


class ZeroCopyByteArray(bytearray):
    def __reduce_ex__(self, protocol):
        return type(self), (pickle.PickleBuffer(self),), None


data = [
    ZeroCopyByteArray(),
    ZeroCopyByteArray(),
]

# this works
f = io.BytesIO()
pickler = pickle.Pickler(f, protocol=5)
pickler.dump(data)

# this hits memo assertion
f = io.BytesIO()
pickler = pickle._Pickler(f, protocol=5)
pickler.dump(data)
Traceback (most recent call last):
  File "bug.py", line 24, in <module>
    pickler.dump(data)
  File "/pickle.py", line 487, in dump
    self.save(obj)
  File "/pickle.py", line 560, in save
    f(self, obj)  # Call unbound method with explicit self
    ^^^^^^^^^^^^
  File "/pickle.py", line 932, in save_list
    self._batch_appends(obj)
  File "/pickle.py", line 956, in _batch_appends
    save(x)
  File "/pickle.py", line 603, in save
    self.save_reduce(obj=obj, *rv)
  File "/pickle.py", line 692, in save_reduce
    save(args)
  File "/pickle.py", line 560, in save
    f(self, obj)  # Call unbound method with explicit self
    ^^^^^^^^^^^^
  File "/pickle.py", line 887, in save_tuple
    save(element)
  File "/pickle.py", line 560, in save
    f(self, obj)  # Call unbound method with explicit self
    ^^^^^^^^^^^^
  File "/pickle.py", line 842, in save_picklebuffer
    self.save_bytearray(m.tobytes())
  File "/pickle.py", line 821, in save_bytearray
    self.memoize(obj)
  File "/pickle.py", line 508, in memoize
    assert id(obj) not in self.memo
           ^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError

CPython versions tested on:

3.11

Operating systems tested on:

Linux

Linked PRs

@apmorton apmorton added the type-bug An unexpected behavior, bug, or error label Jun 12, 2024
@aisk aisk added the stdlib Python modules in the Lib dir label Jun 12, 2024
@picnixz

This comment was marked as resolved.

@aisk

This comment was marked as resolved.

@picnixz

This comment was marked as resolved.

@picnixz
Copy link
Contributor

picnixz commented Jun 12, 2024

The ZeroCopyBytearray buffer appears to need the following implementation (it's actually in Lib/test/pickletester) but I confirmed that the issue remains but only if the buffer is empty (otherwise it's fine). The implementation is roughly

class ZeroCopyBytearray(bytearray):
    def __reduce_ex__(self, protocol):
        if protocol >= 5:
            return type(self)._reconstruct, (pickle.PickleBuffer(self),), None
        else:
            return type(self)._reconstruct, (bytes(self),)

    @classmethod
    def _reconstruct(cls, obj):
        with memoryview(obj) as m:
            obj = m.obj
            if type(obj) is cls:
                # Zero-copy
                return obj
            else:
                return cls(obj)

So, the __reduce_ex__ method makes more things than what you provided in your example, but it does not fix the issue at all.

EDIT: I'm investigating the issue in picnixz@a8c1464

@apmorton
Copy link
Author

Correct, I minimized the example to only the relevant parts for reproduction.

The issue only happens if you use protocol 5 and end up with two objects having different IDs produce a PickleBuffer with the same underlying buffer ID.

@picnixz
Copy link
Contributor

picnixz commented Jun 12, 2024

Yes, the issue is that this specific test case is not covered (I've added one and I'll try to fix it), which is why we did not see the issue (by specific case, I mean the two different instances that are empty, not that it still works if I use the same non-empty buffer values).

encukou pushed a commit that referenced this issue Jun 21, 2024
…and `bytearray` objects in protocol version 5. (GH-120422)
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Jun 21, 2024
…ytes` and `bytearray` objects in protocol version 5. (pythonGH-120422)

(cherry picked from commit 7595e67)

Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Jun 21, 2024
…ytes` and `bytearray` objects in protocol version 5. (pythonGH-120422)

(cherry picked from commit 7595e67)

Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
encukou pushed a commit that referenced this issue Jun 26, 2024
…bytes` and `bytearray` objects in protocol version 5. (GH-120422) (GH-120832)

gh-120380: fix Python implementation of `pickle.Pickler` for `bytes` and `bytearray` objects in protocol version 5. (GH-120422)
(cherry picked from commit 7595e67)

Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
encukou pushed a commit that referenced this issue Jun 26, 2024
…bytes` and `bytearray` objects in protocol version 5. (GH-120422) (GH-120833)

gh-120380: fix Python implementation of `pickle.Pickler` for `bytes` and `bytearray` objects in protocol version 5. (GH-120422)
(cherry picked from commit 7595e67)

Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
@encukou
Copy link
Member

encukou commented Jun 27, 2024

Thank you @apmorton for the report, and @picnixz for the fix!

@encukou encukou closed this as completed Jun 27, 2024
mrahtz pushed a commit to mrahtz/cpython that referenced this issue Jun 30, 2024
…ytes` and `bytearray` objects in protocol version 5. (pythonGH-120422)
noahbkim pushed a commit to hudson-trading/cpython that referenced this issue Jul 11, 2024
…ytes` and `bytearray` objects in protocol version 5. (pythonGH-120422)
estyxx pushed a commit to estyxx/cpython that referenced this issue Jul 17, 2024
…ytes` and `bytearray` objects in protocol version 5. (pythonGH-120422)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error
Projects
Status: Done
Development

No branches or pull requests

4 participants