Safe unpickling of utf8 and latin1 encodings in Python3 #49

hunse · 2017-06-08T20:56:40Z

This allows Python2 pickle files to have strings encoded as either utf8 or latin1 (a superset of ascii).

The only disadvantage is that it uses the pure-python pickle implementation, rather than the faster C-based _pickle. In practice, though, this doesn't seem much slower.
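As a rough illustration of the idea (not necessarily the code in this PR), the pure-Python unpickler can be subclassed so that each Python2 string is tried as utf8 first and falls back to latin1. The names `FallbackUnpickler` and `safe_load` are hypothetical, and `_decode_string` is a private CPython method that could change between versions, so treat this as a sketch under those assumptions.

```python
import io
import pickle


class FallbackUnpickler(pickle._Unpickler):  # pure-Python unpickler, not C _pickle
    """Hypothetical sketch: decode Python2 str objects as utf8, then latin1."""

    def _decode_string(self, value):
        # Private CPython hook used for the Python2 string opcodes.
        if self.encoding == "bytes":
            return value
        try:
            return value.decode("utf8")
        except UnicodeDecodeError:
            # latin1 maps every byte to a code point, so this cannot fail.
            return value.decode("latin1")


def safe_load(fileobj):
    """Load a pickle from a binary file object with the fallback decoder."""
    return FallbackUnpickler(fileobj).load()


# Hand-built pickles using SHORT_BINSTRING, an opcode Python2 emits for str.
utf8_pickle = b"U\x02\xc3\xa9."  # the two utf8 bytes of U+00E9
latin1_pickle = b"U\x01\xe9."    # the single latin1 byte of U+00E9

from_utf8 = safe_load(io.BytesIO(utf8_pickle))
from_latin1 = safe_load(io.BytesIO(latin1_pickle))
```

Both pickles load to the same one-character string. Note that the fallback is decided per string, so a latin1 byte sequence that happens to be valid utf8 would be decoded as utf8.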

I thought I could avoid this by setting the errors argument on pickle.load. This gets passed to codecs.decode, and controls what happens when a string fails to decode. However, whatever value I set for errors, I always get this exception: ValueError: Failed to encode latin1 string when unpickling a Numpy array. pickle.load(a, encoding='latin1') is assumed.
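For reference, here is a small sketch of what the errors argument does for a plain Python2 string; it does not reproduce the Numpy exception quoted above. The pickle bytes are hand-built for illustration.

```python
import pickle

# Python2-style str (SHORT_BINSTRING opcode) holding one latin1 byte, 0xE9.
latin1_pickle = b"U\x01\xe9."

# With the default errors="strict", decoding the byte as utf8 fails.
try:
    pickle.loads(latin1_pickle, encoding="utf8")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# With errors="replace", the undecodable byte becomes U+FFFD instead.
replaced = pickle.loads(latin1_pickle, encoding="utf8", errors="replace")
```

So errors changes how undecodable bytes are handled, but it only substitutes or drops them; it does not retry the string with a second encoding.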

I even tried defining my own error handler with codecs.register_error, but I can't seem to get around that exception.
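A custom handler of the kind described might look like the sketch below. The handler name `latin1_fallback` is hypothetical. It handles plain strings, and nothing here suggests it gets around the Numpy exception mentioned above.

```python
import codecs
import pickle


def latin1_fallback(exc):
    """Decode the offending bytes as latin1 and resume after them."""
    if isinstance(exc, UnicodeDecodeError):
        bad = exc.object[exc.start:exc.end]
        return bad.decode("latin1"), exc.end
    raise exc


# Register under a hypothetical name so it can be passed as errors=...
codecs.register_error("latin1_fallback", latin1_fallback)

# Direct use with bytes.decode.
decoded = b"caf\xe9".decode("utf8", "latin1_fallback")

# The same handler name passed through pickle for a Python2-style str.
latin1_pickle = b"U\x01\xe9."
unpickled = pickle.loads(latin1_pickle, encoding="utf8", errors="latin1_fallback")
```

Unlike a whole-string fallback, this handler only re-decodes the byte ranges that fail as utf8, so a string mixing both encodings would be decoded piecewise.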

Anyway, should we go ahead with this? It seems like a bit of a hack, but it does let things work nicely with pickle files that use either encoding.


hunse added 2 commits June 8, 2017 12:05

Python3 and flake8 fixes

df038ef

hunse mentioned this pull request Jun 8, 2017

Python3 and flake8 fixes #48

Closed

Seanny123 added the discussion label Dec 6, 2017
