Persistent id in pickle with protocol version 0 #61911
Comments
Python 2 allows pickling and unpickling non-ASCII persistent ids. In Python 3, the C implementation of pickle saves persistent ids with protocol version 0 as UTF-8-encoded strings and loads them as bytes.
>>> import pickle, io
>>> class MyPickler(pickle.Pickler):
...     def persistent_id(self, obj):
...         if isinstance(obj, str):
...             return obj
...         return None
...
>>> class MyUnpickler(pickle.Unpickler):
...     def persistent_load(self, pid):
...         return pid
...
>>> f = io.BytesIO(); MyPickler(f).dump('\u20ac'); data = f.getvalue()
>>> MyUnpickler(io.BytesIO(data)).load()
'€'
>>> f = io.BytesIO(); MyPickler(f, 0).dump('\u20ac'); data = f.getvalue()
>>> MyUnpickler(io.BytesIO(data)).load()
b'\xe2\x82\xac'
>>> f = io.BytesIO(); MyPickler(f, 0).dump('a'); data = f.getvalue()
>>> MyUnpickler(io.BytesIO(data)).load()
b'a'
The pure-Python implementation in Python 3 doesn't work with non-ASCII persistent ids at all. |
In protocol 0, the persistent ID is restricted to alphanumeric strings because of the problems that arise when the persistent ID contains newline characters. _pickle should probably be changed to decode the ID as ASCII. Perhaps we should also check for embedded newline characters. |
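To illustrate the restriction described above, here is a minimal sketch of a pickler that rejects persistent IDs that are non-ASCII or contain a newline before they reach the protocol 0 writer. The names CheckedPickler, EchoUnpickler and roundtrip are hypothetical and are not part of any patch on this issue; the check is done in user code, not in pickle itself, and str.isascii() assumes Python 3.7 or later.

```python
import io
import pickle


class CheckedPickler(pickle.Pickler):
    """Hypothetical pickler that validates persistent IDs before they are written."""

    def persistent_id(self, obj):
        if not isinstance(obj, str):
            return None  # pickle everything else the normal way
        # Protocol 0 writes the ID as a newline-terminated line, so a
        # non-ASCII character or an embedded newline is not safe there.
        if not obj.isascii() or "\n" in obj:
            raise ValueError(f"unsafe persistent id for protocol 0: {obj!r}")
        return obj


class EchoUnpickler(pickle.Unpickler):
    """Hypothetical unpickler that returns the persistent ID unchanged."""

    def persistent_load(self, pid):
        return pid


def roundtrip(value, protocol=0):
    """Pickle value with CheckedPickler and load it back with EchoUnpickler."""
    buf = io.BytesIO()
    CheckedPickler(buf, protocol).dump(value)
    return EchoUnpickler(io.BytesIO(buf.getvalue())).load()
```

With this sketch, roundtrip('abc') goes through the persistent-ID path, while roundtrip('\u20ac') and roundtrip('a\nb') fail early with ValueError instead of producing a pickle that cannot be read back reliably.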
Even for alphanumeric strings, Python 3 has a bug: it saves str objects and loads bytes objects. |
Here's a patch that fixes the bug. |
I think a string with character codes < 256 would be better for test_protocol0_is_ascii_only(), since it can be latin1-encoded (Python 2 allows any 8-bit strings). PyUnicode_AsASCIIString() can be slower than _PyUnicode_AsStringAndSize() (actually PyUnicode_AsUTF8AndSize()) because the latter can use a cached value. You can check whether the persistent id contains only ASCII characters by checking PyUnicode_GET_LENGTH(pid_str) == size. And what are you going to do about the fact that in Python 2 you can pickle non-ASCII persistent ids, which then cannot be unpickled in Python 3? |
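The length comparison suggested above works because UTF-8 encodes each ASCII character as exactly one byte and every other character as two or more. A rough Python-level sketch of the same idea follows; the function name is_ascii_by_length is hypothetical, and the handling of lone surrogates is an assumption of this sketch, not something discussed in the review.

```python
def is_ascii_by_length(text):
    """Return True if text is pure ASCII, judged by its UTF-8 encoded size."""
    try:
        encoded = text.encode("utf-8")
    except UnicodeEncodeError:
        # Lone surrogates cannot be encoded as UTF-8 and are not ASCII.
        return False
    # Equal lengths mean every character took exactly one byte.
    return len(text) == len(encoded)
```

In real Python code str.isascii() is the simpler choice; the sketch only mirrors the C-level check, where the UTF-8 size is already available from the cached representation.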
The patch is updated to current sources. Also optimized writing ASCII strings and fixed tests. |
Ping. |
Ping again. |
New changeset f6a41552a312 by Serhiy Storchaka in branch '3.5':
New changeset df8857c6f3eb by Serhiy Storchaka in branch 'default': |