doc: Pickle with protocol=0 in python 3 does not produce a 'human-readable' format #82422

aggieNick02 · 2019-09-20T22:53:12Z

BPO	38241
Nosy	@serhiy-storchaka, @MojoVampire, @aggieNick02

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

GitHub fields:

assignee = None
closed_at = None
created_at = <Date 2019-09-20.22:53:12.087>
labels = ['type-bug', '3.8', '3.9', '3.10', 'docs']
title = "doc: Pickle with protocol=0 in python 3 does not produce a 'human-readable' format"
updated_at = <Date 2021-03-25.23:18:24.098>
user = 'https://github.com/aggieNick02'

bugs.python.org fields:

activity = <Date 2021-03-25.23:18:24.098>
actor = 'iritkatriel'
assignee = 'docs@python'
closed = False
closed_date = None
closer = None
components = ['Documentation']
creation = <Date 2019-09-20.22:53:12.087>
creator = 'aggieNick02'
dependencies = []
files = []
hgrepos = []
issue_num = 38241
keywords = []
message_count = 6.0
messages = ['352907', '352919', '352920', '352958', '353004', '353014']
nosy_count = 4.0
nosy_names = ['docs@python', 'serhiy.storchaka', 'josh.r', 'aggieNick02']
pr_nums = []
priority = 'normal'
resolution = None
stage = None
status = 'open'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue38241'
versions = ['Python 3.8', 'Python 3.9', 'Python 3.10']

aggieNick02 · 2019-09-20T22:53:12Z

The Python 2 docs for pickle say that the default pickle protocol, 0, produces ASCII. In the Python 3 docs, this has changed to "human-readable". While the Python 3 pickle output with protocol 0 loads fine in Python 2, it is definitely not human-readable, as it is not valid ASCII and can contain almost any byte value.

To see a simple example, run this in both Python 2 and 3:

import pickle
a = bytearray(range(256))  # bytearray containing the bytes 0..255
b = bytes(a)
c = pickle.dumps(b, protocol=0)
print(c)  # human-readable in 2, not in 3
c.decode('ascii')  # raises UnicodeDecodeError in 3, not in 2

MojoVampire · 2019-09-21T02:12:26Z

This seems like a bug in pickle; protocol 0 is *defined* to be ASCII compatible. Nothing should encode to a byte above 0x7f. It's not actually supposed to be "human-readable" (since many ASCII bytes aren't printable), so the docs should be changed to describe protocol 0 as ASCII consistently; if this isn't fixed to make it ASCII consistently, "human-readable" is still meaningless and shouldn't be used.
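The ASCII claim can be checked directly. The following is a minimal sketch, assuming CPython 3; the names `payload`, `non_ascii`, and `ascii_ok` are illustrative only. It collects every byte above 0x7f in a protocol 0 pickle of a `bytes` object and tries an ASCII decode.

```python
import pickle

# Pickle a bytes object holding every byte value, using protocol 0.
data = bytes(range(256))
payload = pickle.dumps(data, protocol=0)

# Collect every distinct byte value in the pickle that is outside 7-bit ASCII.
non_ascii = sorted({b for b in payload if b > 0x7F})
print("distinct non-ASCII byte values:", len(non_ascii))

# An ASCII-compatible format would decode cleanly; record whether this one does.
try:
    payload.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False
print("decodes as ASCII:", ascii_ok)
```

If protocol 0 were strictly ASCII, `non_ascii` would be empty and the decode would succeed.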

I'm kind of surprised the output from Py3 works on Py2 to be honest.

MojoVampire · 2019-09-21T02:15:08Z

I'll note, the same bug appears in Python 2, but only when pickling bytearray; since bytes in Python 2 is just a str alias, you don't see this misbehavior with it, only with bytearray (which is consistently incorrect/non-ASCII on both 2 and 3).

aggieNick02 · 2019-09-22T02:54:09Z

Wow, that's a great catch with bytearray on py2. Protocols 0-2 are all actually supposed to work with python 2, and I was using 0 from Py3 as it's the default for Py2, and it's what I want to use during a short transition where both Py3 and Py2 are operating on my pickled data. I was surprised that Py3's protocol 0 output was so different from Py2's protocol 0 output.

To be "human-readable", I think the protocol would have to be even stricter, omitting the non-printable ASCII characters.

I wonder if protocol 0 was initially ASCII (or even stricter), and then this went out the window or was unintentionally not adhered to when new things like bytearray (2.6) were introduced.

serhiy-storchaka · 2019-09-23T10:18:41Z

Protocol 0 was initially ASCII, but that changed when support for the unicode type was added (and bytearray uses the unicode representation for compatibility with Python 3). It is Latin1 now, and still mostly human-readable (except that some control characters in Unicode strings can be invisible on your terminal).
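A short sketch of what "Latin1" means in practice, assuming CPython 3 (variable names are illustrative, and the exact set of escaped characters may differ between versions): in protocol 0, a str code point below U+0100 is generally written as a single raw byte, while code points above U+00FF are written as an ASCII `\uXXXX` escape. Decoding the whole pickle as Latin-1 therefore always succeeds and round-trips.

```python
import pickle

# U+00E9 fits in one Latin-1 byte and is written raw.
latin_pickle = pickle.dumps("\u00e9", protocol=0)
# U+20AC is above U+00FF and is written as an ASCII \uXXXX escape.
euro_pickle = pickle.dumps("\u20ac", protocol=0)
print(latin_pickle)
print(euro_pickle)

# Latin-1 maps every byte value to a code point, so this decode cannot fail.
binary_pickle = pickle.dumps(bytes(range(256)), protocol=0)
text = binary_pickle.decode("latin-1")

# Encoding back to Latin-1 restores the original pickle bytes exactly.
roundtrip = pickle.loads(text.encode("latin-1"))
print(roundtrip == bytes(range(256)))
```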

aggieNick02 · 2019-09-23T13:29:29Z

Apologies as I'm not super-familiar with Latin1 or how Python refers to Latin1, but it seems a little odd to even call it Latin1. It can be decoded as Latin1, but it can contain every possible byte, including 0x7f through 0x9f, which aren't really Latin1. It is human readable when pickling certain data types, but when others are involved, it sure seems like binary to me.

I think that is fine, and perhaps all that needs to be done is to update the documentation to say something like: "Protocol level 0 is the original pickling format. It is the default for Python 2 and is now a binary format; it originally was an ASCII format but this ceased to be true as support for new datatypes was added to Python."
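To see which byte values actually show up, here is a rough sketch, assuming CPython 3 (the names `payload`, `present`, and `absent` are made up for illustration). It checks the 0x7f through 0x9f range mentioned above and lists any byte values that never occur in this particular pickle; that second list may vary by Python version.

```python
import pickle

payload = pickle.dumps(bytes(range(256)), protocol=0)

# 0x7f is DEL and 0x80-0x9f is the C1 control range; none are printable.
control_range = range(0x7F, 0xA0)
present = [value for value in control_range if value in payload]
print(f"{len(present)} of {len(control_range)} control values present")

# Byte values that never occur in this pickle (version dependent).
absent = [value for value in range(256) if value not in payload]
print([hex(value) for value in absent])
```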

aggieNick02 added the 3.7 and type-bug labels on Sep 20, 2019

iritkatriel added the 3.8, 3.9, 3.10, and docs labels and removed the 3.7 label on Mar 25, 2021

iritkatriel changed the title ~~Pickle with protocol=0 in python 3 does not produce a 'human-readable' format~~ doc: Pickle with protocol=0 in python 3 does not produce a 'human-readable' format Mar 25, 2021

iritkatriel assigned docspython Mar 25, 2021

ezio-melotti transferred this issue from another repository Apr 10, 2022
