Safe unpickling of utf8 and latin1 encodings in Python3 #49

Open
hunse wants to merge 2 commits into base: master

@hunse (Collaborator) commented Jun 8, 2017

This allows Python2 pickle files to have strings encoded as either utf8 or latin1 (a superset of ascii).
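To make the problem concrete, here is a minimal sketch of why a single encoding argument cannot cover both kinds of file. The two byte strings are hand-written protocol-2 pickles of a Python 2 str; they are illustrative stand-ins, not data from this project.

```python
import pickle

# Hand-built protocol-2 pickles of a Python 2 ``str`` (SHORT_BINSTRING opcode).
# Illustrative data only.
utf8_pickle = b"\x80\x02U\x02\xc3\xa9q\x00."  # the utf8 bytes of "é"
latin1_pickle = b"\x80\x02U\x01\xe9q\x00."    # the latin1 byte of "é"

# encoding="utf8" reads the first file but rejects the second.
assert pickle.loads(utf8_pickle, encoding="utf8") == "\xe9"
try:
    pickle.loads(latin1_pickle, encoding="utf8")
except UnicodeDecodeError:
    utf8_rejects_latin1 = True
else:
    utf8_rejects_latin1 = False

# encoding="latin1" never raises, but it silently garbles utf8 text:
# the two utf8 bytes come back as two separate characters.
garbled = pickle.loads(utf8_pickle, encoding="latin1")
```

So neither setting is safe for a file whose encoding is unknown, which is the case this change targets.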

The only disadvantage is that it uses the pure-python pickle implementation, rather than the faster C-based _pickle. In practice, though, this doesn't seem much slower.
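For readers who want to see the general shape of such an approach, here is a rough sketch and not necessarily the code in this PR. It assumes the private `_decode_string` hook of CPython's pure-Python `pickle._Unpickler`, which is an implementation detail and could change; `FallbackUnpickler` and `load_py2_pickle` are hypothetical names.

```python
import io
import pickle


class FallbackUnpickler(pickle._Unpickler):
    """Pure-Python unpickler that tries utf8, then latin1, per Python 2 str.

    Hypothetical class; overrides a private CPython hook (an assumption).
    """

    def _decode_string(self, value):
        try:
            return value.decode("utf8")
        except UnicodeDecodeError:
            # latin1 maps every byte to a code point, so this cannot fail.
            return value.decode("latin1")


def load_py2_pickle(fileobj):
    """Load a Python 2 pickle whose strings are utf8 or latin1 encoded."""
    return FallbackUnpickler(fileobj).load()


# Hand-built protocol-2 pickles of a Python 2 str (illustrative data only).
utf8_pickle = b"\x80\x02U\x02\xc3\xa9q\x00."
latin1_pickle = b"\x80\x02U\x01\xe9q\x00."

from_utf8 = load_py2_pickle(io.BytesIO(utf8_pickle))
from_latin1 = load_py2_pickle(io.BytesIO(latin1_pickle))
```

Both calls yield the same one-character string. One caveat of this sketch: a byte string that happens to be valid in both encodings is always read as utf8, so binary payloads (such as raw array data) would need separate handling.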

I thought I could avoid this by setting the errors argument of pickle.load. That argument is passed to codecs.decode and controls what happens when a string fails to decode. However, whatever value I pass for errors, I always get this exception: ValueError: Failed to encode latin1 string when unpickling a Numpy array. pickle.load(a, encoding='latin1') is assumed.
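As a small illustration of what the errors argument does on its own, the sketch below uses a hand-built pickle of a plain Python 2 str (illustrative data, no NumPy involved). Note that the message quoted above says "Failed to encode", which suggests it may be raised when NumPy converts the already-decoded string back to latin1 bytes rather than during pickle's decode step; if that is the case, it would explain why errors has no effect there. That reading is an assumption, not something verified here.

```python
import pickle

# Hand-built protocol-2 pickle of a Python 2 str holding the single byte 0xE9.
latin1_pickle = b"\x80\x02U\x01\xe9q\x00."

# With the default errors="strict", decoding this byte as utf8 fails.
try:
    pickle.loads(latin1_pickle, encoding="utf8")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# errors is forwarded to the decode step: "replace" substitutes U+FFFD ...
replaced = pickle.loads(latin1_pickle, encoding="utf8", errors="replace")
# ... and "ignore" drops the undecodable byte entirely.
ignored = pickle.loads(latin1_pickle, encoding="utf8", errors="ignore")
```

Either setting avoids the exception for plain strings, but both lose the original character, so neither recovers latin1 text.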

I even tried defining my own error handler with codecs.register_error, but I can't seem to get around that exception.
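For reference, a custom handler of this kind might look like the sketch below. The handler name latin1_fallback is hypothetical, and the example only covers plain strings; it does not reproduce or resolve the NumPy exception described above.

```python
import codecs
import pickle


def latin1_fallback(exc):
    """Decode the bytes that failed as utf8 using latin1 instead."""
    if not isinstance(exc, UnicodeDecodeError):
        raise exc
    bad_bytes = exc.object[exc.start:exc.end]
    # Return the replacement text and the position at which to resume.
    return bad_bytes.decode("latin1"), exc.end


# Register under a hypothetical name so it can be passed as errors=...
codecs.register_error("latin1_fallback", latin1_fallback)

# Direct use with bytes.decode:
decoded = b"caf\xe9".decode("utf8", errors="latin1_fallback")

# The same handler name can be handed to pickle for plain Python 2 strings.
latin1_pickle = b"\x80\x02U\x01\xe9q\x00."  # hand-built, illustrative data
unpickled = pickle.loads(latin1_pickle, encoding="utf8", errors="latin1_fallback")
```

This gives per-byte fallback without leaving the C unpickler, but only for strings that pickle itself decodes.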

Anyway, should we go ahead with this? It seems like a bit of a hack, but it does let things work nicely with pickle files that use either encoding.

hunse added 2 commits June 8, 2017 12:05
This allows Python2 pickle files to have strings encoded as either
utf8 or ascii (latin1).

The only disadvantage is that it uses the pure-python ``pickle``
implementation, rather than the faster C-based ``_pickle``. In
practice, though, this doesn't seem much slower.