-
-
Notifications
You must be signed in to change notification settings - Fork 30k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
byte/unicode pickle incompatibilities between python2 and python3 #51033
Comments
i just noticed that there are some slight differences of the the first things i noticed are: a str from python2 is unpickled as unicode in python3 a bytes instance from python3 is pickled as custom class in protocols <3 i'll write a script to try all combinations of protocols and string |
Why are you reporting this here? If you think there is a bug, can you The changes you mentioned are all deliberate. |
the basic behavior i want to see for all protocols <= 2
anything else is is confusing and may break note that these changes seem irrelevant for protocol 3 as python2.x |
That would not be good. Many people create pickles in 2.x where the
That's the case, right?
That's also the case, AFAICT.
Hmm. This may be indeed a mistake. Until r61467, bytes were saved |
Since it breaks for anything non-ascii, its not that helpfull after all It might be preferable to supply unpicklers that are cappable of yup
b'\x80\x02c__builtin__\nbytes\nq\x00]q\x01\x85q\x02Rq\x03.' I'm convinced that a 1:1 mapping of python2 string from/to python3 It just has bitten me, and i suspect will will get others, too. |
its even worse python3:
>>> import pickle
>>> pickle.dumps(b'', protocol=2)
b'\x80\x02c__builtin__\nbytes\nq\x00]q\x01\x85q\x02Rq\x03.'
python2.6:
>>> import pickle
>>> pickle.loads('\x80\x02c__builtin__\nbytes\nq\x00]q\x01\x85q\x02Rq\x03.')
'[]' |
The problem with trying to solve the following issue: |
unpickle of any non-ascii string from python2 will break also byte-strings stored as python2 str would break and since i pass around binary strings as parts of objects, its just |
in case the actual behavior is not supposed to change how about a way to declare one wants exact 1:1 mapping between py2<>py3, something like load/dump(..., encoding=bytes) just crossed my mind |
In a sense, that's already possible. Inherit from _Pickler/_Unpickler, I wouldn't object to supporting this with an option, though, assuming it |
Note that this is also a documentation issue: "The pickle |
i'll try to add some tests now hopefully i can get rid of the implicit badness like trying to coerce |
Any news on this? Just as a note, pickletools.py also does not reflect the current behaviour; pickle types STRING, BINSTRING and SHORT_BINSTRING are all defined with stack_after=[pystring]: [1, line 992] although the doc=... does describe it will be decoded, the object type of pystring is still defined as bytes: [1, line 747] [1] http://hg.python.org/cpython/file/98df29d51e12/Lib/pickletools.py |
im unlikely to find the time to try and fix pickle/cpickle myself in the next few months |
Last night, I hacked together a wrapper to do what loewis suggested [1]. It pickles bytes to str (for protocol <= 2), and unpickles str to bytes. If I (ever) get the build system and tests of python itself to work, I'll try and see if I can implement a nicer solution - at least for pickle.py. [1] https://github.com/valhallasw/py2/blob/master/bytestrpickle.py |
If you have any problems with that, don't hesitate to ask on python-dev |
OK, this is the pickle.py patch. A new parameter 'bytestr' has been added to both _Pickler and _Unpickler to toggle the pickle.string<=>bytes behaviour: _Pickler: _Unpickler: I also extracted the decoding stuff from the three string reading functions to a single one. |
P.S. (sorry for forgetting this in the original post ;-)) Both The test script should of course be merged with test_pickle.py at some time.... |
Ok, this is my first attempt at the Pickler part of the C implementation. I'll have to adapt the python implementation to match this one. All BytestrPicklerTests in test_bytestrpickle.py pass, and ./python -m test -G -v test_pickle passes. Comments on style etc. are very welcome. |
Added tests in Lib/test format. After applying pickle.py.patch and BytestrPickler_c.diff, and has 8 errors (as expected). |
And a complete patch that implements the tests, the python implementation and the C implementation. I'm not completely happy with the code duplication in read_string/read_binstring/read_short_binstring C implementation, so that might be an improvement (however, there is already a lot of code duplication there at the moment). Again: comments would be very welcome... |
OK, and now a version that's not broken... I forgot to initialize self->bytestr for PicklerObject/UnpicklerObject. *puts on the you-broke-the-build-hat* Except for test_packaging.test_caches, this version passes all tests -- test_packaging.test_caches, which seems to fail because I make install'd python and installed {distribute,pip,setuptools,virtualenv}. |
Based on the discussion on python-dev [1], this is an updated implementation that uses encoding='bytes' to signal str->bytes behaviour. http://mail.python.org/pipermail/python-dev/2012-March/117536.html |
...and the tests to go with that. |
Could you provide a single patch with the implementation and the tests together? I will try to find some time this week to review this. |
Hi Alexandre, Attached is a diff based on r87793:0c508d87f80b. Merlijn |
I have fixed most of the nits in this patch, except for:
./python -E ./Tools/clinic/clinic.py --make and I have no clue how to fix that -- the clinic docs are sparse, to say the least;
|
I cleaned up the patch. I will submit it tonight if there is no major objections. |
How about updating the documentation as well? |
And what about an issue mentioned in msg153659? |
New changeset bd71352e950f by Alexandre Vassalotti in branch 'default': |
I fixed up the last few review comments and submitted the patch. Thank you for the help! |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: