Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Email parser creates a message object that can't be flattened #71508

Closed
msapiro mannequin opened this issue Jun 14, 2016 · 22 comments
Closed

Email parser creates a message object that can't be flattened #71508

msapiro mannequin opened this issue Jun 14, 2016 · 22 comments
Labels
3.8 (EOL) end of life 3.9 only security fixes 3.10 only security fixes topic-email type-bug An unexpected behavior, bug, or error

Comments

@msapiro
Copy link
Mannequin

msapiro mannequin commented Jun 14, 2016

BPO 27321
Nosy @warsaw, @msapiro, @diekhans, @bitdancer, @soltysh, @matrixise, @csabella, @kyrias, @miss-islington, @edeca
PRs
  • bpo-27321: email: don't try to replace headers that aren't set #1977
  • bpo-27321 Fix email.generator.py to not replace a non-existent header. #18074
  • [3.9] bpo-27321 Fix email.generator.py to not replace a non-existent header. (GH-18074) #22796
  • [3.8] bpo-27321 Fix email.generator.py to not replace a non-existent header. (GH-18074) #22797
  • Files
  • bad_email: The problem message
  • generator.patch: Suggested fix
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2020-10-19.23:02:01.155>
    created_at = <Date 2016-06-14.18:52:52.474>
    labels = ['type-bug', '3.8', 'expert-email', '3.10', '3.9']
    title = "Email parser creates a message object that can't be flattened"
    updated_at = <Date 2020-10-19.23:11:42.432>
    user = 'https://github.com/msapiro'

    bugs.python.org fields:

    activity = <Date 2020-10-19.23:11:42.432>
    actor = 'miss-islington'
    assignee = 'none'
    closed = True
    closed_date = <Date 2020-10-19.23:02:01.155>
    closer = 'barry'
    components = ['email']
    creation = <Date 2016-06-14.18:52:52.474>
    creator = 'msapiro'
    dependencies = []
    files = ['43391', '43394']
    hgrepos = []
    issue_num = 27321
    keywords = ['patch']
    message_count = 22.0
    messages = ['268580', '268589', '279172', '295080', '295093', '295101', '295105', '295107', '295108', '295430', '295442', '295443', '296293', '310513', '311272', '327136', '336930', '378083', '378088', '379052', '379056', '379057']
    nosy_count = 10.0
    nosy_names = ['barry', 'msapiro', 'diekhans', 'r.david.murray', 'maciej.szulik', 'matrixise', 'cheryl.sabella', 'Johannes L\xc3\xb6thberg', 'miss-islington', 'edeca']
    pr_nums = ['1977', '18074', '22796', '22797']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue27321'
    versions = ['Python 3.8', 'Python 3.9', 'Python 3.10']

    @msapiro
    Copy link
    Mannequin Author

    msapiro mannequin commented Jun 14, 2016

    The attached file, bad_email, can be parsed via

    msg = email.message_from_binary_file(open('bad_email', 'rb'))

    but then msg.as_string() prodices the following:

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/lib/python3.5/email/message.py", line 159, in as_string
        g.flatten(self, unixfrom=unixfrom)
      File "/usr/lib/python3.5/email/generator.py", line 115, in flatten
        self._write(msg)
      File "/usr/lib/python3.5/email/generator.py", line 189, in _write
        msg.replace_header('content-transfer-encoding', munge_cte[0])
      File "/usr/lib/python3.5/email/message.py", line 559, in replace_header
        raise KeyError(_name)
    KeyError: 'content-transfer-encoding'

    @msapiro msapiro mannequin added the topic-email label Jun 14, 2016
    @msapiro
    Copy link
    Mannequin Author

    msapiro mannequin commented Jun 14, 2016

    One additional observation. The original message contained no Content-Transfer-Encoding header even though the message body was raw koi8-r characters. Adding

    Content-Transfer-Encoding: 8bit

    to the message headers avoids the issue, but that is not a practical solution as the message was Russian spam received by a Mailman list and the resultant KeyError caused problems in Mailman.

    We can work on defending against this in Mailman, but I suggest that the munge_cte code in generator._write() avoid the documented possible KeyError raised by replace_header() by using __delitem__() and __setitem__() instead as in the attached generator.patch.

    @warsaw
    Copy link
    Member

    warsaw commented Oct 22, 2016

    While I was reviewing https://gitlab.com/mailman/mailman/merge_requests/197/diffs I noticed the KeyError and it made me thing "hmm, I wonder if this should be turned into one of the email package errors"?

    @kyrias
    Copy link
    Mannequin

    kyrias mannequin commented Jun 3, 2017

    Any updates on this? I'm having the same problem with some non-spam emails while trying to use some mail-handling tools written in Python.

    @bitdancer
    Copy link
    Member

    replace_header has a different semantic than del-and-set (replace_header leaves the header in the same location in the list, rather than appending it to the end...that's it's purpose). If replace_header is throwing a key error, then I guess we need a look-before-you-leap if statement.

    And a test :)

    Note that a correct fix would preserve the broken input email, but since we're obviously already doing munging I'm fine with just making this work for now.

    @bitdancer bitdancer added 3.7 (EOL) end of life type-bug An unexpected behavior, bug, or error labels Jun 3, 2017
    @msapiro
    Copy link
    Mannequin Author

    msapiro mannequin commented Jun 3, 2017

    I considered look before you leap, but I decided since we're munging the headers anyway, preserving their order is not that critical, but the patch is easy enough. I'll work on that and a test.

    @kyrias
    Copy link
    Mannequin

    kyrias mannequin commented Jun 3, 2017

    Fix: kyrias@a986a82
    Testcase: kyrias@9a51042

    Can start a PR once my CLA signature goes through I guess.

    @msapiro
    Copy link
    Mannequin Author

    msapiro mannequin commented Jun 3, 2017

    It looks like Johannes beat me to it. Thanks for that, but see my comments in the diff at kyrias@a986a82

    @kyrias
    Copy link
    Mannequin

    kyrias mannequin commented Jun 3, 2017

    Ah, didn't even see your comment before I did it! Fix to the comments are on the same branch, will be rebased before PR is up.

    @matrixise
    Copy link
    Member

    maybe we could merge the PR, and I could propose a backport for 3.5 and 3.6.

    2.7 is affected ?

    @bitdancer
    Copy link
    Member

    I'm going to try to review this this weekend.

    @matrixise
    Copy link
    Member

    ok, I have tested on 3.6 and 3.5, just with the test. and in this case, we get the errors on both.
    if we apply the patch of Johannes, the test passes and there is no issues.

    +1 the backports for 3.5 and 3.6 is just a git cherry-picking.

    @kyrias
    Copy link
    Mannequin

    kyrias mannequin commented Jun 18, 2017

    Ping?

    @bitdancer
    Copy link
    Member

    Note: I reviewed this a while ago but the review comments haven't been addressed.

    @bitdancer
    Copy link
    Member

    Requested a small additional change to the new tests, and then this will be ready to go in.

    @edeca
    Copy link
    Mannequin

    edeca mannequin commented Oct 5, 2018

    Ping on an ETA for this fix?

    @matrixise matrixise added the 3.8 (EOL) end of life label Nov 3, 2018
    @csabella
    Copy link
    Contributor

    csabella commented Mar 1, 2019

    @r.david.murray, it appears that all your requested changes have been addressed on the PR. Please re-review this when you get a chance. Thanks!

    @diekhans
    Copy link
    Mannequin

    diekhans mannequin commented Oct 6, 2020

    any chance of getting this merged? A work-around is not obvious

    @msapiro
    Copy link
    Mannequin Author

    msapiro mannequin commented Oct 6, 2020

    I work around it with

    class Message(email.message.Message):
    
        def as_string(self):
            # Work around for https://bugs.python.org/issue27321 and
            # https://bugs.python.org/issue32330.
            try:
                value = email.message.Message.as_string(self)
            except (KeyError, LookupError, UnicodeEncodeError):
                value = email.message.Message.as_bytes(self).decode(
                    'ascii', 'replace')
            # Also ensure no unicode surrogates in the returned string.
            return email.utils._sanitize(value)
    

    This is easy for me because it's Mailman which already subclasses email.message.Message for other reasons. It is perhaps more difficult if you aren't already subclassing email.message.Message for other purposes.

    @miss-islington
    Copy link
    Contributor

    New changeset bf83822 by Mark Sapiro in branch 'master':
    bpo-27321 Fix email.generator.py to not replace a non-existent header. (GH-18074)
    bf83822

    @warsaw warsaw added 3.9 only security fixes 3.10 only security fixes and removed 3.7 (EOL) end of life labels Oct 19, 2020
    @warsaw warsaw closed this as completed Oct 19, 2020
    @miss-islington
    Copy link
    Contributor

    New changeset 371146a by Miss Skeleton (bot) in branch '3.8':
    bpo-27321 Fix email.generator.py to not replace a non-existent header. (GH-18074)
    371146a

    @miss-islington
    Copy link
    Contributor

    New changeset 72ce82a by Miss Skeleton (bot) in branch '3.9':
    bpo-27321 Fix email.generator.py to not replace a non-existent header. (GH-18074)
    72ce82a

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.8 (EOL) end of life 3.9 only security fixes 3.10 only security fixes topic-email type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    5 participants