quopri module differences in quoted-printable text with whitespace #60677

aleperalta · 2012-11-14T21:22:06Z

BPO	16473
Nosy	@warsaw, @brettcannon, @jcea, @ncoghlan, @bitdancer, @berkerpeksag, @vadmium, @serhiy-storchaka
Files	test_quopri.diff codec-impl.patch: Document and test quotetabs=True for quopri-codec

^{Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.}


GitHub fields:

assignee = None
closed_at = None
created_at = <Date 2012-11-14.21:22:05.920>
labels = ['type-bug', 'tests', 'expert-email', 'docs']
title = 'quopri module differences in quoted-printable text with whitespace'
updated_at = <Date 2019-02-24.22:39:40.466>
user = 'https://bugs.python.org/aleperalta'

bugs.python.org fields:

activity = <Date 2019-02-24.22:39:40.466>
actor = 'BreamoreBoy'
assignee = 'docs@python'
closed = False
closed_date = None
closer = None
components = ['Documentation', 'Tests', 'email']
creation = <Date 2012-11-14.21:22:05.920>
creator = 'aleperalta'
dependencies = []
files = ['27985', '37772']
hgrepos = []
issue_num = 16473
keywords = ['patch']
message_count = 13.0
messages = ['175593', '175594', '175595', '179744', '222121', '222122', '234300', '234304', '250506', '250508', '250509', '250514', '250520']
nosy_count = 11.0
nosy_names = ['barry', 'brett.cannon', 'jcea', 'ncoghlan', 'r.david.murray', 'docs@python', 'python-dev', 'berker.peksag', 'martin.panter', 'serhiy.storchaka', 'aleperalta']
pr_nums = []
priority = 'normal'
resolution = None
stage = 'needs patch'
status = 'open'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue16473'
versions = ['Python 2.7', 'Python 3.4', 'Python 3.5', 'Python 3.6']

aleperalta · 2012-11-14T21:22:04Z

New to python-dev; I grabbed a beginner task, "increase test coverage", and decided to add coverage for this bit of code in the quopri module:

    # quopri.py, lines 138-139
    while n > 0 and line[n-1:n] in b" \t\r":
        n = n-1

As far as I understand, to get into that while loop the line to decode should end in " \t\r\n".

So I added the following test:

    def test_decodestring_badly_encoded(self):
        e = b"hello     \t\r\n"
        p = b"hello\n"
        s = self.module.decodestring(e)
        self.assertEqual(s, p)

but that only passes when the module doesn't use binascii. In fact, I changed test_quopri to use support.import_fresh_module to disable binascii, and removed a decorator that was used.

The decoded text when binascii is used is:

>>> quopri.decodestring("hello \t\r\n")
'hello \t\r\n'

which differs from

>>> quopri.a2b_qp = None
>>> quopri.b2a_qp = None
>>> quopri.decodestring("hello \t\r\n")
'hello\n'
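
The comparison above can be reproduced on Python 3 without reassigning module attributes by hand. This is a minimal sketch, assuming `unittest.mock` is used to blank out `quopri.a2b_qp` temporarily; the helper name `decode_both` is hypothetical, and the input is bytes rather than the Python 2 strings used in the session above.

```python
import quopri
from unittest import mock


def decode_both(data: bytes) -> tuple[bytes, bytes]:
    """Decode data with the default path and with the pure-Python fallback."""
    # Default path: uses binascii.a2b_qp when binascii is available.
    default_result = quopri.decodestring(data)
    # Fallback path: quopri looks up the module-level a2b_qp on every call,
    # so patching it to None forces the pure-Python decoder for this block.
    with mock.patch.object(quopri, "a2b_qp", None):
        fallback_result = quopri.decodestring(data)
    return default_result, fallback_result


default_result, fallback_result = decode_both(b"hello \t\r\n")
print(default_result, fallback_result)
```

The patch is undone when the `with` block exits, so other code using `quopri` in the same process is not affected.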

And what's the deal with:

>>> import quopri
>>> quopri.encodestring("hello \t\r")
'hello \t\r'
>>> "hello \t\r".encode("quopri")
'hello=20=09\r'

bitdancer · 2012-11-14T21:32:36Z

I think I can answer your last question. There are two quopri algorithms, one where spaces are allowed (message body) and one where they aren't (email headers).

For the rest, I'd have to take a closer look than I have time for right now.

aleperalta · 2012-11-14T21:35:10Z

> I think I can answer your last question. There are two quopri algorithms, one where spaces are allowed (message body) and one where they aren't (email headers).

OK, thank you. Good to know.

jcea · 2013-01-11T23:14:45Z

Ping.

BreamoreBoy · 2014-07-02T20:11:58Z

I'll take this on if I can. Is binascii available on all platforms? If it is, the quopri code could be simplified slightly, along with the test code.

bitdancer · 2014-07-02T20:26:57Z

The first problem is determining the "best" error recovery algorithms by reading through the RFCs and considering use cases.

vadmium · 2015-01-19T05:46:35Z

Three slightly different points here:

* Decoding trailing whitespace: My understanding is quoted-printable encoding aims to be tolerant of whitespace being added to and removed from the end of encoded lines. So I assume the “binascii” module is wrong to leave trailing whitespace in the decoded output, and the native “quopri” implementation is correct to ignore it.
* CRLF handling: See bpo-20121. It seems CRLF newlines should be valid, and I have added a patch to that issue to make the native Python implementation handle CRLF newlines.
* Whitespace encoding: The quopri-codec actually sets quotetabs=True. Here is a patch to document and test that, as well as correct the functions used by other codecs.

vadmium · 2015-01-19T06:26:53Z

Regarding decoding trailing whitespace, <https://tools.ietf.org/html/rfc1521.html#section-5.1> rule #3 says:

“When decoding a Quoted-Printable body, any trailing white space on a line must be deleted, as it will necessarily have been added by intermediate transport agents.”
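
The rule quoted above can be sketched as a pre-processing step applied before decoding. This is only an illustration under stated assumptions, not a proposed fix for binascii: the helper names `strip_trailing_whitespace` and `decode_tolerant` are hypothetical, and, like the pure-Python loop in quopri.py, the sketch treats a trailing `\r` as whitespace, so CRLF line endings come out as LF.

```python
import binascii


def strip_trailing_whitespace(encoded: bytes) -> bytes:
    """Drop trailing spaces, tabs and CRs from every encoded line."""
    # Split on LF only, so a bare CR inside a line is left untouched.
    lines = encoded.split(b"\n")
    return b"\n".join(line.rstrip(b" \t\r") for line in lines)


def decode_tolerant(encoded: bytes) -> bytes:
    """Decode quoted-printable data after removing transport padding."""
    return binascii.a2b_qp(strip_trailing_whitespace(encoded))


print(decode_tolerant(b"hello \t\r\n"))
```

Whitespace that the sender meant to keep at the end of a line is expected to arrive encoded (for example as `=20`), so stripping literal trailing whitespace does not lose it.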

vadmium · 2015-09-12T00:50:28Z

Will commit a slightly modified version of my doc patch to 3.4+, since mentioning the wrong functions is confusing. But I think we still need to fix the “binascii” decoding, and have a look at Alejandro’s test suite patch.

python-dev · 2015-09-12T01:44:43Z

New changeset de82f41d6669 by Martin Panter <vadmium> in branch '3.4':
Issue bpo-16473: Fix byte transform codec documentation; test quotetabs=True
https://hg.python.org/cpython/rev/de82f41d6669

New changeset 28cd11dc2915 by Martin Panter <vadmium> in branch '3.5':
Issue bpo-16473: Merge codecs doc and test from 3.4 into 3.5
https://hg.python.org/cpython/rev/28cd11dc2915

New changeset 3ecb5766ba15 by Martin Panter <vadmium> in branch 'default':
Issue bpo-16473: Merge codecs doc and test from 3.5
https://hg.python.org/cpython/rev/3ecb5766ba15

python-dev · 2015-09-12T02:56:31Z

New changeset cfb0481c89d7 by Martin Panter <vadmium> in branch '2.7':
Issue bpo-16473: Fix byte transform codec documentation; test quotetabs=True
https://hg.python.org/cpython/rev/cfb0481c89d7

serhiy-storchaka · 2015-09-12T08:13:33Z

The mentioned functions are not exact equivalents of the codecs. They are the preferable way to obtain similar (apart from minor details) output.

vadmium · 2015-09-12T12:16:04Z

The list of functions was added in bpo-17844. I made the change today because I forgot that the listed functions weren’t exactly equivalent when investigating bpo-25075.

Base64-codec encodes to multiple lines, but b64encode() returns the raw encoding without line breaks. I see that base64.encodebytes() is listed as a “legacy interface”, but as far as I can tell nothing outside the legacy interface does any line splitting.

Hex-codec encodes to lowercase, but b16encode() returns uppercase, following RFC 4648.

Quopri-codec encodes all whitespace, but quopri.encodestring() lets most whitespace through verbatim by default. In this case I think it would be reasonable to change back to encodestring() if we say that quotetabs=True is passed in.
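
The quopri difference described above can be checked with a short comparison. This is a minimal sketch for Python 3, assuming a sample input chosen here for illustration; it encodes the same bytes with `quopri.encodestring()` in its default mode, with `quotetabs=True`, and through the `quopri` codec.

```python
import codecs
import quopri

# Illustrative input with an embedded space, an embedded tab,
# and a space just before the newline.
sample = b"space tab\teol \n"

default_encoded = quopri.encodestring(sample)
quoted_encoded = quopri.encodestring(sample, quotetabs=True)
codec_encoded = codecs.encode(sample, "quopri")

print(default_encoded)
print(quoted_encoded)
print(codec_encoded)
```

With `quotetabs=True` the embedded space and tab are written as `=20` and `=09`, which is the same output the codec produces, while the default mode leaves embedded whitespace as is.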

aleperalta mannequin added the tests Tests in the Lib/test dir label Nov 14, 2012

bitdancer added the topic-email label Nov 14, 2012

BreamoreBoy mannequin changed the title ~~Minor difference in decoding quoted-printable text~~ quopri module minor difference in decoding quoted-printable text Jul 2, 2014

vadmium added the docs Documentation in the Doc dir label Jan 19, 2015

vadmium changed the title ~~quopri module minor difference in decoding quoted-printable text~~ quopri module differences in quoted-printable text with whitespace Jan 19, 2015

vadmium assigned docspython Jan 19, 2015

vadmium added the type-bug An unexpected behavior, bug, or error label Sep 12, 2015

ezio-melotti transferred this issue from another repository Apr 10, 2022
