quopri module differences in quoted-printable text with whitespace #60677

Open
aleperalta mannequin opened this issue Nov 14, 2012 · 13 comments
Labels
docs Documentation in the Doc dir tests Tests in the Lib/test dir topic-email type-bug An unexpected behavior, bug, or error

Comments

@aleperalta
Mannequin

aleperalta mannequin commented Nov 14, 2012

BPO 16473
Nosy @warsaw, @brettcannon, @jcea, @ncoghlan, @bitdancer, @berkerpeksag, @vadmium, @serhiy-storchaka
Files
  • test_quopri.diff
  • codec-impl.patch: Document and test quotetabs=True for quopri-codec
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2012-11-14.21:22:05.920>
    labels = ['type-bug', 'tests', 'expert-email', 'docs']
    title = 'quopri module differences in quoted-printable text with whitespace'
    updated_at = <Date 2019-02-24.22:39:40.466>
    user = 'https://bugs.python.org/aleperalta'

    bugs.python.org fields:

    activity = <Date 2019-02-24.22:39:40.466>
    actor = 'BreamoreBoy'
    assignee = 'docs@python'
    closed = False
    closed_date = None
    closer = None
    components = ['Documentation', 'Tests', 'email']
    creation = <Date 2012-11-14.21:22:05.920>
    creator = 'aleperalta'
    dependencies = []
    files = ['27985', '37772']
    hgrepos = []
    issue_num = 16473
    keywords = ['patch']
    message_count = 13.0
    messages = ['175593', '175594', '175595', '179744', '222121', '222122', '234300', '234304', '250506', '250508', '250509', '250514', '250520']
    nosy_count = 11.0
    nosy_names = ['barry', 'brett.cannon', 'jcea', 'ncoghlan', 'r.david.murray', 'docs@python', 'python-dev', 'berker.peksag', 'martin.panter', 'serhiy.storchaka', 'aleperalta']
    pr_nums = []
    priority = 'normal'
    resolution = None
    stage = 'needs patch'
    status = 'open'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue16473'
    versions = ['Python 2.7', 'Python 3.4', 'Python 3.5', 'Python 3.6']

    @aleperalta
    Mannequin Author

    aleperalta mannequin commented Nov 14, 2012

    New to python-dev; I grabbed one of the beginner tasks, "increase test coverage", and decided to add coverage for this bit of code in the quopri module:

    # quopri.py
    while n > 0 and line[n-1:n] in b" \t\r":
        n = n-1

    As far as I understand, to get into that while-loop the line to decode should end in " \t\r\n".

    So I added the following test:

        def test_decodestring_badly_enconded(self):
            e = b"hello     \t\r\n"
            p = b"hello\n"
            s = self.module.decodestring(e)
            self.assertEqual(s, p)

    but that only passes when the module doesn't use binascii. In fact, I changed test_quopri to use support.import_fresh_module to disable binascii, and removed a decorator that was used.
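For reference, the stripping step quoted above can be isolated into a small standalone function. This is only a sketch of that one loop under the reading given above (the loop is reached only for lines that end in a newline); it is not the module's code, and the name `strip_trailing_whitespace` is made up for illustration.

```python
def strip_trailing_whitespace(line: bytes) -> bytes:
    """Drop spaces, tabs and CRs that sit directly before a final newline."""
    n = len(line)
    if n > 0 and line[n-1:n] == b"\n":
        n -= 1  # step over the newline itself
        # Same condition as the loop quoted from quopri.py
        while n > 0 and line[n-1:n] in b" \t\r":
            n -= 1
        return line[:n] + b"\n"
    # Assumption: a line without a trailing newline is left untouched
    return line


print(strip_trailing_whitespace(b"hello     \t\r\n"))  # b'hello\n'
```

With this reading, the test input reduces to `b"hello\n"`, which is the value the test expects.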

    The decoded text when binascii is used is:

    >>> quopri.decodestring("hello \t\r\n")
    'hello \t\r\n'

    which differs from

    >>> quopri.a2b_qp = None
    >>> quopri.b2a_qp = None
    >>> quopri.decodestring("hello \t\r\n")
    'hello\n'
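To compare the two code paths side by side without leaving the module patched, the monkeypatching shown above can be wrapped in a helper that restores the original function afterwards. A minimal sketch, assuming Python 3 (bytes input) and that `quopri` looks up its module-level `a2b_qp` at call time, as the session above suggests; `decode_pure_python` is a hypothetical name.

```python
import quopri


def decode_pure_python(data: bytes) -> bytes:
    """Decode with the pure-Python fallback by hiding binascii's a2b_qp."""
    saved = quopri.a2b_qp
    quopri.a2b_qp = None  # same trick as in the session above
    try:
        return quopri.decodestring(data)
    finally:
        quopri.a2b_qp = saved  # always restore the accelerated path


sample = b"hello \t\r\n"
print("default path:    ", quopri.decodestring(sample))
print("pure-Python path:", decode_pure_python(sample))
```

The first line of output depends on whether binascii is available and on how the installed version handles trailing whitespace, so no expected value is given for it here.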

    And what's the deal with:

    >>> import quopri
    >>> quopri.encodestring("hello \t\r")
    'hello \t\r'
    >>> "hello \t\r".encode("quopri")
    'hello=20=09\r'

    @aleperalta aleperalta mannequin added the tests Tests in the Lib/test dir label Nov 14, 2012
    @bitdancer
    Member

    I think I can answer your last question. There are two quopri algorithms, one where spaces are allowed (message body) and one where they aren't (email headers).

    For the rest, I'd have to take a closer look than I have time for right now.
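As a rough illustration of two encoding modes, `quopri.encodestring()` accepts a `header` flag that switches from body-style output to header-style output, where spaces become underscores. Whether this flag is exactly the distinction meant above is an assumption; the sketch only shows what the flag does for an interior space.

```python
import quopri

# Body mode (default): an interior space is passed through unchanged.
body = quopri.encodestring(b"a b")

# Header mode: the space is written as an underscore.
head = quopri.encodestring(b"a b", header=True)

print(body)  # b'a b'
print(head)  # b'a_b'

# Decoding with the same flag turns the underscore back into a space.
print(quopri.decodestring(head, header=True))  # b'a b'
```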

    @aleperalta
    Mannequin Author

    aleperalta mannequin commented Nov 14, 2012

    > I think I can answer your last question. There are two quopri algorithms,
    > one where spaces are allowed (message body) and one where they aren't
    > (email headers).

    OK, thank you. Good to know.

    @jcea
    Member

    jcea commented Jan 11, 2013

    Ping.

    @BreamoreBoy
    Mannequin

    BreamoreBoy mannequin commented Jul 2, 2014

    I'll take this on if I can. Is binascii available on all platforms? If it is, the quopri code could be simplified slightly, along with the test code.

    @BreamoreBoy BreamoreBoy mannequin changed the title from "Minor difference in decoding quoted-printable text" to "quopri module minor difference in decoding quoted-printable text" Jul 2, 2014
    @bitdancer
    Member

    The first problem is determining the "best" error recovery algorithms by reading through the RFCs and considering use cases.

    @vadmium
    Member

    vadmium commented Jan 19, 2015

    Three slightly different points here:

    1. Decoding trailing whitespace: My understanding is quoted-printable encoding aims to be tolerant of whitespace being added to and removed from the end of encoded lines. So I assume the “binascii” module is wrong to leave trailing whitespace in the decoded output, and the native “quopri” implementation is correct to ignore it.

    2. CRLF handling: See bpo-20121. It seems CRLF newlines should be valid, and I have added a patch to that issue to make the native Python implementation handle CRLF newlines.

    3. Whitespace encoding: The quopri-codec actually sets quotetabs=True. Here is a patch to document and test that, as well as correct the functions used by other codecs.

    @vadmium vadmium added the docs Documentation in the Doc dir label Jan 19, 2015
    @vadmium vadmium changed the title from "quopri module minor difference in decoding quoted-printable text" to "quopri module differences in quoted-printable text with whitespace" Jan 19, 2015
    @vadmium
    Member

    vadmium commented Jan 19, 2015

    Regarding decoding trailing whitespace, <https://tools.ietf.org/html/rfc1521.html#section-5.1> rule #3 says:

    “When decoding a Quoted-Printable body, any trailing white space on a line must be deleted, as it will necessarily have been added by intermediate transport agents.”
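Until both implementations agree, one possible workaround is to delete trailing whitespace from each encoded line before handing the data to the decoder, so that the result no longer depends on which code path runs. This is a hedged sketch, not a proposed fix for the module: `decode_rfc_tolerant` is a hypothetical helper, and note that it also drops a CR before LF, which matches the pure-Python loop discussed in this issue but may not be what every caller wants.

```python
import quopri
import re

# Spaces, tabs and CRs immediately before a line feed.
_TRAILING_WS = re.compile(rb"[ \t\r]+(?=\n)")


def decode_rfc_tolerant(data: bytes) -> bytes:
    """Strip per-line trailing whitespace, then decode quoted-printable."""
    return quopri.decodestring(_TRAILING_WS.sub(b"", data))


print(decode_rfc_tolerant(b"hello \t\r\n"))  # b'hello\n'
print(decode_rfc_tolerant(b"a=3Db  \n"))     # b'a=b\n'
```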

    @vadmium
    Member

    vadmium commented Sep 12, 2015

    Will commit a slightly modified version of my doc patch to 3.4+, since mentioning the wrong functions is confusing. But I think we still need to fix the “binascii” decoding, and have a look at Alejandro’s test suite patch.

    @vadmium vadmium added the type-bug An unexpected behavior, bug, or error label Sep 12, 2015
    @python-dev
    Mannequin

    python-dev mannequin commented Sep 12, 2015

    New changeset de82f41d6669 by Martin Panter <vadmium> in branch '3.4':
    Issue bpo-16473: Fix byte transform codec documentation; test quotetabs=True
    https://hg.python.org/cpython/rev/de82f41d6669

    New changeset 28cd11dc2915 by Martin Panter <vadmium> in branch '3.5':
    Issue bpo-16473: Merge codecs doc and test from 3.4 into 3.5
    https://hg.python.org/cpython/rev/28cd11dc2915

    New changeset 3ecb5766ba15 by Martin Panter <vadmium> in branch 'default':
    Issue bpo-16473: Merge codecs doc and test from 3.5
    https://hg.python.org/cpython/rev/3ecb5766ba15

    @python-dev
    Mannequin

    python-dev mannequin commented Sep 12, 2015

    New changeset cfb0481c89d7 by Martin Panter <vadmium> in branch '2.7':
    Issue bpo-16473: Fix byte transform codec documentation; test quotetabs=True
    https://hg.python.org/cpython/rev/cfb0481c89d7

    @serhiy-storchaka
    Member

    The mentioned functions are not exact equivalents of the codecs. They are the preferable way to obtain similar (apart from minor details) output.

    @vadmium
    Member

    vadmium commented Sep 12, 2015

    The list of functions was added in bpo-17844. I made the change today because I forgot that the listed functions weren’t exactly equivalent when investigating bpo-25075.

    Base64-codec encodes to multiple lines, but b64encode() returns the raw encoding without line breaks. I see that base64.encodebytes() is listed as a “legacy interface”, but as far as I can tell nothing outside the legacy interface does any line splitting.

    Hex-codec encodes to lowercase, but b16encode() returns uppercase, following RFC 4648.

    Quopri-codec encodes all whitespace, but quopri.encodestring() lets most whitespace through verbatim by default. In this case I think it would be reasonable to change back to encodestring() if we say that quotetabs=True is passed in.
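The three differences listed above can be checked with a short script. This is a sketch assuming Python 3.4 or later, where the `base64`, `hex` and `quopri` codec aliases are available through `codecs.encode()`; the sample inputs are arbitrary.

```python
import base64
import codecs
import quopri

data = b"x" * 60  # long enough that the base64 output spans two lines

# base64: the codec splits output into lines, b64encode() does not.
b64_codec = codecs.encode(data, "base64")
b64_func = base64.b64encode(data)
print(b"\n" in b64_codec, b"\n" in b64_func)  # True False

# hex: the codec produces lowercase, b16encode() produces uppercase.
hex_codec = codecs.encode(b"\xab\xcd", "hex")
hex_func = base64.b16encode(b"\xab\xcd")
print(hex_codec, hex_func)  # b'abcd' b'ABCD'

# quopri: the codec quotes the space, encodestring() only with quotetabs=True.
qp_codec = codecs.encode(b"a b", "quopri")
qp_default = quopri.encodestring(b"a b")
qp_quotetabs = quopri.encodestring(b"a b", quotetabs=True)
print(qp_codec, qp_default, qp_quotetabs)  # b'a=20b' b'a b' b'a=20b'
```

For this input the quopri-codec output matches `encodestring(..., quotetabs=True)`, which is the behavior the last paragraph suggests documenting.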

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022