Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CVE-2019-16056] email.utils.parseaddr mistakenly parse an email #78336

Closed
cnicodeme mannequin opened this issue Jul 19, 2018 · 52 comments
Closed

[CVE-2019-16056] email.utils.parseaddr mistakenly parse an email #78336

cnicodeme mannequin opened this issue Jul 19, 2018 · 52 comments
Labels
3.7 (EOL) end of life 3.8 (EOL) end of life 3.9 only security fixes topic-email type-security A security issue

Comments

@cnicodeme
Copy link
Mannequin

cnicodeme mannequin commented Jul 19, 2018

BPO 34155
Nosy @warsaw, @vstinner, @msapiro, @larryhastings, @ned-deily, @mcepl, @bitdancer, @ambv, @nicoe, @maxking, @ksze, @miss-islington, @Windsooon, @tirkarthi, @cnicodeme, @jpic, @ret2libc, @aeros, @rcsanchez97
PRs
  • bpo-34155: Dont parse domains containing @ #13079
  • [3.8] bpo-34155: Dont parse domains containing @ (GH-13079) #14824
  • [3.7] bpo-34155: Dont parse domains containing @ (GH-13079) #14825
  • [3.6] bpo-34155: Dont parse domains containing @ (GH-13079) #14826
  • [3.5] bpo-34155: Dont parse domains containing @ (GH-13079) #15317
  • [2.7] bpo-34155: Dont parse domains containing @ (GH-13079) #16006
  • Files
  • Screen Shot 2019-05-02 at 22.07.27.png
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2019-09-14.17:43:40.610>
    created_at = <Date 2018-07-19.14:53:43.774>
    labels = ['type-security', '3.7', '3.8', 'expert-email', '3.9']
    title = '[CVE-2019-16056] email.utils.parseaddr mistakenly parse an email'
    updated_at = <Date 2019-09-14.17:43:40.609>
    user = 'https://github.com/cnicodeme'

    bugs.python.org fields:

    activity = <Date 2019-09-14.17:43:40.609>
    actor = 'maxking'
    assignee = 'none'
    closed = True
    closed_date = <Date 2019-09-14.17:43:40.610>
    closer = 'maxking'
    components = ['email']
    creation = <Date 2018-07-19.14:53:43.774>
    creator = 'cnicodeme'
    dependencies = []
    files = ['48295']
    hgrepos = []
    issue_num = 34155
    keywords = ['patch', 'security_issue']
    message_count = 52.0
    messages = ['321956', '321957', '321958', '321959', '321967', '329372', '329376', '329377', '329379', '329380', '329382', '329463', '340534', '340535', '340933', '341069', '341294', '341320', '341322', '341362', '341367', '341370', '341381', '344030', '344157', '344205', '344389', '344431', '344432', '347157', '347183', '347223', '348082', '349278', '349279', '349292', '349357', '349464', '349465', '349820', '349891', '349892', '349957', '349968', '350291', '351281', '351283', '351364', '351377', '352230', '352444', '352445']
    nosy_count = 22.0
    nosy_names = ['barry', 'vstinner', 'msapiro', 'larry', 'ned.deily', 'mcepl', 'r.david.murray', 'lukasz.langa', 'nicoe', 'maxking', 'kal.sze', 'miss-islington', 'Windson Yang', 'xtreak', 'cnicodeme', 'bortzmeyer', 'jpic', 'Dain Dwarf', 'rschiron', 'aeros', 'Anselmo Melo', 'rcsanchez97']
    pr_nums = ['13079', '14824', '14825', '14826', '15317', '16006']
    priority = 'critical'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'security'
    url = 'https://bugs.python.org/issue34155'
    versions = ['Python 2.7', 'Python 3.5', 'Python 3.6', 'Python 3.7', 'Python 3.8', 'Python 3.9']

    @cnicodeme
    Copy link
    Mannequin Author

    cnicodeme mannequin commented Jul 19, 2018

    Hi!

    I'm trying to parse some emails, and I discovered that email.utils.parseaddr wrongly parse an email.

    Here's the corresponding header:

    From: =?utf-8?Q?zq@redacted.com.cn=E3=82=86=E2=86=91=E3=82=86?=
    =?utf-8?Q?=E3=82=83=E3=82=85=E3=81=87=E3=81=BA=E3=81=BD=E3=81=BC"\=E3?=
    =?utf-8?Q?=81=A9=E3=81=A5=E3=81=A2l=E3=81=A0=E3=81=B0=E3=81=A8=E3=81?=
    =?utf-8?Q?=8FKL=E3=81=84=E3=82=8C=E3=82=8B=E3=82=86>KL=E3=82=89JF?=
    <mxvu@redacted2.com>

    Once this has been parsed via decode_header, we obtain this value:

    From: zq@redacted.com.cnゆ↑ゆゃゅぇぺぽぼ"\どづぢlだばとくKLいれるゆ>KLらJF <mxvu@redacted2.com>

    (I agree, not really a nice looking From email ...)

    Then, when this value is given to parseaddr, here's the result:

    ('', 'zq@redacted.com.cnゆ↑ゆゃゅぇぺぽぼ')

    But it should be:

    ('zq@redacted.com.cnゆ↑ゆゃゅぇぺぽぼ"\どづぢlだばとくKLいれるゆ>KLらJF', 'mxvu@redacted2.com')

    (Note that the email in the "name" part is not the same as the email in the "email" part!)

    @cnicodeme cnicodeme mannequin added type-bug An unexpected behavior, bug, or error topic-email labels Jul 19, 2018
    @bitdancer
    Copy link
    Member

    That does appear to be a bug. Note that the new email API handles it correctly:

        >>> x = """
        ... > From: =?utf-8?Q?zq@redacted.com.cn=E3=82=86=E2=86=91=E3=82=86?=
        ...  =?utf-8?Q?=E3=82=83=E3=82=85=E3=81=87=E3=81=BA=E3=81=BD=E3=81=BC"\=E3?=
        ...  =?utf-8?Q?=81=A9=E3=81=A5=E3=81=A2l=E3=81=A0=E3=81=B0=E3=81=A8=E3=81?=
        ...  =?utf-8?Q?=8FKL=E3=81=84=E3=82=8C=E3=82=8B=E3=82=86>KL=E3=82=89JF?=
        ...  <mxvu@redacted2.com>
        ... """
        >>> from email import message_from_string
        >>> from email.policy import default
        >>> m = message_from_string(x+'\n\ntest', policy=default)
        >>> m['from']
        '"zq@redacted.com.cnゆ↑ゆ ゃゅぇぺぽぼ\\"\\\\� ��づぢlだばと� �KLいれるゆ>KLらJF" <mxvu@redacted2.com>'
        >>> m['from'].addresses[0].addr_spec
        'mxvu@redacted2.com'
        >>> m['from'].addresses[0].display_name
        'zq@redacted.com.cnゆ↑ゆ ゃゅぇぺぽぼ"\\\udce3 \udc81\udca9づぢlだばと\udce3\udc81 \udc8fKLいれるゆ>KLらJF'

    I'm not particularly interested myself in fixing parseaddr to handle this case correctly, since it is the legacy API, but if someone else wants to I'll review the patch.

    @bitdancer bitdancer added 3.7 (EOL) end of life 3.8 (EOL) end of life labels Jul 19, 2018
    @bitdancer
    Copy link
    Member

    Oops, I left out a step in that cut and paste. For completeness:

    >>> x = x[3:]
    

    @bitdancer
    Copy link
    Member

    Ah, maybe it doesn't handle it completely correctly; that decode looks different now that I look at it in detail.

    @jwilk
    Copy link
    Mannequin

    jwilk mannequin commented Jul 19, 2018

    You should not use decode_header() on the whole From header, because that loses
    information. You should parse the header first, then decode the parts that
    could be RFC2047-encoded.

    Quoting <https://tools.ietf.org/html/rfc2047#section-6.2\>:

    NOTE: Decoding and display of encoded-words occurs *after* a
    structured field body is parsed into tokens. It is therefore
    possible to hide 'special' characters in encoded-words which, when
    displayed, will be indistinguishable from 'special' characters in the
    surrounding text. For this and other reasons, it is NOT generally
    possible to translate a message header containing 'encoded-word's to
    an unencoded form which can be parsed by an RFC 822 mail reader.

    So I don't see a bug in parseaddr() here, except that the API is a bit of a
    footgun.

    @msapiro
    Copy link
    Mannequin

    msapiro mannequin commented Nov 6, 2018

    The issue is illustrated much more simply as follows:

    email.utils.parseaddr('John Doe jdoe@example.com <other@example.net>')

    returns

    ('', 'John Doe jdoe@example.com')

    whereas it should return

    ('John Doe jdoe@example.com', 'other@example.net')

    I'll look at developing a patch.

    @bitdancer
    Copy link
    Member

    >> m = message_from_string("From: John Doe jdoe@example.com <other@example.net>\n\n", policy=default)
    >>> m['From'].addresses(Address(display_name='', username='John Doe jdoe', domain='example.com'),)

    The new policies have more error recovery for non-RFC compliant addresses than decode_header, but the two agree in this case. What is happening here is that (1) an unquoted/unencoded '@' is not allowed in a display name (2) if the address is not '<>' quoted, then everything before the @ is the username and (3) in the absence of a comma after the end of the fqdn (which is not allowed to contain blanks) any additional tokens are discarded.

    One could argue that we could treat the blank after the FQDN as a "missing comma", and there would be some merit to that argument. You could also argue that a "<>" quoted string would trump the occurrence of the @ earlier in the token list. However, the RFC822 grammar is designed to be parsed character by character, so that would not be a typical way for an RFC822 parser to try to do postel-style error recovery.

    So, I don't think there is a bug here, but I'd be curious what other email address parsing libraries do, and that could influence whether extensions to the "make a guess when the string doesn't conform to the RFC" code would be acceptable.

    @bitdancer
    Copy link
    Member

    The formatting of that doctest paragraph got messed up. Let me try again:

        >>> m = message_from_string("From: John Doe jdoe@example.com <other@example.net>\n\n", policy=default)
        >>> m['From'].addresses
        (Address(display_name='', username='John Doe jdoe', domain='example.com'),)

    @tirkarthi
    Copy link
    Member

    Is this a case of realname having @ inside an unquoted string? As I can see from the RFC the acceptable characters of an atom other than alphabets and digits that comprises a phrase are ['!', '#', '$', '%', '&', "'", '*', '+', '-', '/', '=', '?', '^', '_', '`', '{', '|', '}', '~'] . So just curious if it's a case of @ inside unquoted string as name?

    >>> for char in accepted:
    ...     print(parseaddr(f'John Doe jdoe{char}example.com <other@example.net>'))
    ...
    ('John Doe jdoe!example.com', 'other@example.net')
    ('John Doe jdoe#example.com', 'other@example.net')
    ('John Doe jdoe$example.com', 'other@example.net')
    ('John Doe jdoe%example.com', 'other@example.net')
    ('John Doe jdoe&example.com', 'other@example.net')
    ("John Doe jdoe'example.com", 'other@example.net')
    ('John Doe jdoe*example.com', 'other@example.net')
    ('John Doe jdoe+example.com', 'other@example.net')
    ('John Doe jdoe-example.com', 'other@example.net')
    ('John Doe jdoe/example.com', 'other@example.net')
    ('John Doe jdoe=example.com', 'other@example.net')
    ('John Doe jdoe?example.com', 'other@example.net')
    ('John Doe jdoe^example.com', 'other@example.net')
    ('John Doe jdoe_example.com', 'other@example.net')
    ('John Doe jdoe`example.com', 'other@example.net')
    ('John Doe jdoe{example.com', 'other@example.net')
    ('John Doe jdoe|example.com', 'other@example.net')
    ('John Doe jdoe}example.com', 'other@example.net')
    ('John Doe jdoe~example.com', 'other@example.net')
    
    >>> parseaddr('"John Doe jdoe@example.com" <other@example.net>')
    ('John Doe jdoe@example.com', 'other@example.net')
    
    >>> parseaddr('John Doe jdoe@example.com <other@example.net>')
    ('', 'John Doe jdoe@example.com')

    @msapiro
    Copy link
    Mannequin

    msapiro mannequin commented Nov 6, 2018

    I agree that my example with an @ in the 'display name', although actually seen in the wild, is non-compliant, and that the behavior of parseaddr() in this case is not a bug.

    Sorry for the noise.

    @tirkarthi
    Copy link
    Member

    Ah sorry, I was typing so long and had an idle session that I didn't realize @r.david.murray added a comment with the explanation. Just to add I tried using Perl module (https://metacpan.org/release/Email-Address) that uses regex for parsing that returns me two addresses and the regex is also not much comprehensible.

    use v5.14;
    use Email::Address;

    my $line = 'John Doe jdoe@example.com <other@example.net>';
    my @addresses = Email::Address->parse($line);
    say $addresses[0];
    say $addresses[1];

    say "Angle address regex";
    say $Email::Address::angle_addr;

    jdoe@example.com
    other@example.net
    Angle address regex
    (?^:(?^:(?^:\s*\((?:\s*(?^:(?^:(?>[^()\\\\]+))|(?^:\\(?^:[^\\x0A\\x0D]))|))\s\)\s*)|\s+)<(?^:(?^:(?^:(?^:(?^:\s\((?:\s*(?^:(?^:(?>[^()\\\\]+))|(?^:\\(?^:[^\\x0A\\x0D]))|))\s\)\s*)|\s+)(?^:[^\\x00-\\x1F\\x7F()\<\>\\[\\]:;@\\\\,."\\s]+(?:\.[^\\x00-\\x1F\\x7F()\<\>\\[\\]:;@\\\\,."\\s]+))(?^:(?^:\s*\((?:\s*(?^:(?^:(?>[^()\\\\]+))|(?^:\\(?^:[^\\x0A\\x0D]))|))\s\)\s*)|\s+))|(?^:(?^:(?^:\s\((?:\s*(?^:(?^:(?>[^()\\\\]+))|(?^:\\(?^:[^\\x0A\\x0D]))|))\s\)\s*)|\s+)"(?^:(?^:[^\\\\"])|(?^:\\(?^:[^\\x0A\\x0D])))"(?^:(?^:\s*\((?:\s*(?^:(?^:(?>[^()\\\\]+))|(?^:\\(?^:[^\\x0A\\x0D]))|))\s\)\s*)|\s+)))\@(?^:(?^:(?^:(?^:\s\((?:\s*(?^:(?^:(?>[^()\\\\]+))|(?^:\\(?^:[^\\x0A\\x0D]))|))\s\)\s*)|\s+)(?^:[^\\x00-\\x1F\\x7F()\<\>\\[\\]:;@\\\\,."\\s]+(?:\.[^\\x00-\\x1F\\x7F()\<\>\\[\\]:;@\\\\,."\\s]+))(?^:(?^:\s*\((?:\s*(?^:(?^:(?>[^()\\\\]+))|(?^:\\(?^:[^\\x0A\\x0D]))|))\s\)\s*)|\s+))|(?^:(?^:(?^:\s\((?:\s*(?^:(?^:(?>[^()\\\\]+))|(?^:\\(?^:[^\\x0A\\x0D]))|))\s\)\s*)|\s+)\(?:\s*(?^:(?^:[^\\[\\]\\\\])|(?^:\\(?^:[^\\x0A\\x0D]))))\s\)))>(?^:(?^:\s*\((?:\s*(?^:(?^:(?>[^()\\\\]+))|(?^:\\(?^:[^\\x0A\\x0D]))|))\s\)\s*)|\s+)*)

    Thanks

    @ksze
    Copy link
    Mannequin

    ksze mannequin commented Nov 8, 2018

    Another failure case:

    >>> from email.utils import parseaddr
    >>> parseaddr('fo@o@bar.com')
    ('', 'fo@o')

    If I understand the RFC correctly, the correct results should be ('', '') because there are two '@' signs. The first '@' would need to be quoted for the address to be valid.

    @bortzmeyer
    Copy link
    Mannequin

    bortzmeyer mannequin commented Apr 19, 2019

    @tirkarthi
    Copy link
    Member

    Relevant attack from matrix blog post.

    https://matrix.org/blog/2019/04/18/security-update-sydent-1-0-2/

    sydent uses python's email.utils.parseaddr function to parse the input email address before sending validation mail to it, but it turns out that if you hand parseaddr an malformed email address of form a@b.com@c.com, it silently discards the @c.com prefix without error. The result of this is that if one requested a validation token for 'a@malicious.org@important.com', the token would be sent to 'a@malicious.org', but the address 'a@malicious.org@important.com' would be marked as validated. This release fixes this behaviour by asserting that the parsed email address is the same as the input email address.

    I am marking this as a security issue.

    @jpic
    Copy link
    Mannequin

    jpic mannequin commented Apr 26, 2019

    Given the situation, could raising a SecurityWarning and a DeprecationWarning fix this issue ?

    @DainDwarf
    Copy link
    Mannequin

    DainDwarf mannequin commented Apr 29, 2019

    Hello, kind of new here.

    I just wanted to note that the issue that lead to Tchap's security attack still exists in the non-deprecated message_from_string function:

    email.message_from_string('From: a@malicious.org@important.com', policy=email.policy.default)['from'].addresses

    (Address(display_name='', username='a', domain='malicious.org'),)

    So, deprecating parseaddr is not enough for security purpose, unless there is another ticket for the new email API.

    @ned-deily
    Copy link
    Member

    @barry, @r.david.murray, With the additional info about attacks in the wild, should we now consider this a security issue? If so, someone needs to provide an actual PR. (Raising the priority to "deferred blocker" pending evaluation.)

    @Windsooon
    Copy link
    Mannequin

    Windsooon mannequin commented May 3, 2019

    I found the issue located in https://github.com/python/cpython/blob/master/Lib/email/_parseaddr.py#L277

    elif self.field[self.pos] in '.@':
    # email address is just an addrspec
    # this isn't very efficient since we start over
    self.pos = oldpos
    self.commentlist = oldcl
    addrspec = self.getaddrspec()
    returnlist = [(SPACE.join(self.commentlist), addrspec)]

    The parseaddr function runs a for in loop over the input string, when it meets '.@' it will do something. That is why when the input string is 'foo@bar.com@example.com' will return ('', 'foo@bar.com'). One possible solution will be to check the string in the reverse order then we can always get the last '@' in the string.

    @warsaw
    Copy link
    Member

    warsaw commented May 3, 2019

    Well, at least we're not alone. Here's a screen capture from Mail.app Version 12.4 (3445.104.8).

    @jpic
    Copy link
    Mannequin

    jpic mannequin commented May 3, 2019

    I haven't found this specific case in an RFC, but checked Go's net/mail
    library behavior and it just considers it broken:

    $ cat mail.go
    package main
    import "fmt"
    import "net/mail"
    func main() {
        fmt.Println((&mail.AddressParser{}).Parse("a@example.com"))
        fmt.Println((&mail.AddressParser{}).Parse("a@malicious.org@example.com
    "))
    }
    
    $ go run mail.go
    <a@example.com> <nil>
    <nil> mail: expected single address, got "@example.com"

    That would fix the security issue but not the whole ticket.

    @jpic
    Copy link
    Mannequin

    jpic mannequin commented May 4, 2019

    The pull request has been updated to mimic net/mail's behavior rather than
    trying to workaround user input.

    Before:

        >>> email.message_from_string('From: a@malicious.org@important.com',
    policy=email.policy.default)['from'].addresses
        (Address(display_name='', username='a', domain='malicious.org'),)
    
        >>> parseaddr('a@malicious.org@important.com')
        ('', 'a@malicious.org')

    After:

        >>> email.message_from_string('From: a@malicious.org@important.com',
    policy=email.policy.default)['from'].addresses
        (Address(display_name='', username='', domain=''),)
    
        >>> parseaddr('a@malicious.org@important.com')
        ('', 'a@')

    I like what I saw under the hood, please feel free to hack me for other
    tasks in the email stdlib.

    @Windsooon
    Copy link
    Mannequin

    Windsooon mannequin commented May 4, 2019

    Frome the answer from Alnitak (https://stackoverflow.com/questions/12355858/how-many-symbol-can-be-in-an-email-address). Maybe we should raise an error when the address has multiple @ in it.

    @jpic
    Copy link
    Mannequin

    jpic mannequin commented May 4, 2019

    At is allowed in the local part if quoted, the proposed patch acts within
    get_domain() and getdomain() and does not affect local part parsing. This
    still works:

    >>> parseaddr('"fo@bar"@bar.com')
    ('', '"fo@bar"@bar.com')
    
    >>> email.message_from_string('From: "a@b"@ex.com
    ',policy=email.policy.default)['from'].addresses
    (Address(display_name='', username='a@b', domain='ex.com'),)

    I'm not against raising an exception in parseaddr, but you should know that
    currently nothing in parseaddr seems to raise an exception:

    jpic@ci:/cpython$ grep raise Lib/email/_parseaddr.py
    jpic@ci:
    /cpython$

    For example:

    >>> parseaddr('aoeu')
    ('', 'aoeu')
    >>> parseaddr('a@')
    ('', 'a@')

    None of the above calls raised an exception. That is the reason why I did
    not introduce a new Exception in the getdomain() change: I thought it would
    be more consistent with the rest of the API as such.

    As for the new API, the patch does raise a parse error:

    # this detect that the next caracter right after the parsed domain is
    another @
    if value and value[0] == '@':
    raise errors.HeaderParseError('Multiple domains')

    But that's in the lower level API that is planned for going public later on
    (probably when it will have unit tests), it's just the higher level API
    that the user faces that swallows it. As a user you can still know about
    that parse problem using the defects attribute:

    >>> email.message_from_string('From: a@malicious.org@example.com',
    policy=email.policy.default)['from'].defects[0]
    InvalidHeaderDefect('invalid address in address-list')

    @maxking
    Copy link
    Contributor

    maxking commented May 31, 2019

    How about we go a slightly different route than suggested by jpic and instead of returning a None value, we return the entire rest of the string as the domain? That would take care of the security issue since it won't be a valid domain anymore.

         msg = email.message_from_string(
            'From: SomeAbhilashRaj <abhilash@malicious.org@important.com>',    
            policy=email.policy.default)
         print(msg['From'].addresses)
         print(msg['From'].defects)
     (Address(display_name='SomeAbhilashRaj', username='abhilash', domain='malicious.org@important.com>'),)
     (InvalidHeaderDefect('invalid address in address-list'), InvalidHeaderDefect("missing trailing '>' on angle-addr"),  InvalidHeaderDefect("unpected '@' in domain"), ObsoleteHeaderDefect("period in 'phrase'"))
    

    This lets us do postel-style error recovery while working in RFC 2822 style grammar.

    I wrote this patch to achieve this:

    @@ -1573,6 +1574,11 @@ def get_domain(value):
                 domain.append(DOT)
                 token, value = get_atom(value[1:])
                 domain.append(token)
    +    if value and value[0] == '@':
    +        domain.defects.append(errors.InvalidHeaderDefect(
    +            "unpected '@' in domain"))
    +        token = get_unstructured(value)
    +        domain.append(token)
         return domain, value

    Does this makes sense?

    @jpic
    Copy link
    Mannequin

    jpic mannequin commented Jun 1, 2019

    The email API does error recovery without loading invalid domains into
    the domain variable which could lead to dangerous situations, example
    with "a@foo.":

    >>> email.message_from_string('From: a@foo.',policy=email.policy.default)['from'].addresses[0].domain
    ''

    In perspective with the new patch proposed by maxking that lets an
    invalid domain make it to the domain variable:

    >>> email.message_from_string('From: a@b@c.com',policy=email.policy.default)['from'].addresses[0].domain
    'b@c.com'

    For me maxking's suggestion opens the question of where to draw the
    line between invalid domains should be loaded into the domain variable
    and what invalid domains should not be loaded into the domain
    variable.

    Another smaller advantage of of Go's net/mail behaviour is that
    results between parseaddr and email are consistently empty strings for
    an invalid domain: parseaddr does not seem able to return a list of
    defects.

    @miss-islington
    Copy link
    Contributor

    New changeset c48d606 by Miss Islington (bot) in branch '3.7':
    bpo-34155: Dont parse domains containing @ (GH-13079)
    c48d606

    @miss-islington
    Copy link
    Contributor

    New changeset 2170774 by Miss Islington (bot) in branch '3.8':
    bpo-34155: Dont parse domains containing @ (GH-13079)
    2170774

    @ned-deily
    Copy link
    Member

    New changeset 13a1913 by Ned Deily (Miss Islington (bot)) in branch '3.6':
    bpo-34155: Dont parse domains containing @ (GH-13079) (GH-14826)
    13a1913

    @maxking
    Copy link
    Contributor

    maxking commented Aug 10, 2019

    Closing this since teh PRs are merged.

    @maxking maxking closed this as completed Aug 10, 2019
    @vstinner
    Copy link
    Member

    I change the issue type to security because of https://bugs.python.org/issue34155#msg340534: "Note that this bug was used in an actual security attack so it is serious".

    @vstinner vstinner added type-security A security issue and removed type-bug An unexpected behavior, bug, or error labels Aug 12, 2019
    @vstinner
    Copy link
    Member

    This issue is a security issue so Python 2.7, 3.5, 3.6 should also be fixed, no?

    @vstinner vstinner reopened this Aug 12, 2019
    @maxking
    Copy link
    Contributor

    maxking commented Aug 15, 2019

    @victor: This is already backported to 3.6. I am not sure about what gets backported to 3.5 right now, I don't even see a 'Backport to 3.5' label on Github (which made me think we are discouraged to backport to 3.5). I can work on a manual backport if needed?

    This patch most probably won't backport to 2.7 without re-writing it completely since the implementation in 2.7 is much different than what we have today.

    @aeros
    Copy link
    Contributor

    aeros commented Aug 17, 2019

    This is already backported to 3.6. I am not sure about what gets backported to 3.5 right now, I don't even see a 'Backport to 3.5' label on Github (which made me think we are discouraged to backport to 3.5). I can work on a manual backport if needed?

    As far as I'm aware, backports to 3.5 have to be manually approved by those with repository management permissions, such the the organization owners (https://devguide.python.org/devcycle/#current-owners) and admins (https://devguide.python.org/devcycle/#current-administrators)

    @maxking
    Copy link
    Contributor

    maxking commented Aug 17, 2019

    Created a backport PR for 3.5.

    @vstinner
    Copy link
    Member

    Created a backport PR for 3.5.

    Thanks. I reviewed it (LGTM).

    What about Python 2.7, it's also vulnerable, no?

    @maxking
    Copy link
    Contributor

    maxking commented Aug 19, 2019

    2.7 needs a separate PR since the code is very different and my familiarity with 2.7 version of email package is very limited.

    I am going to work on a separate patch later this week for 2.7.

    @ambv
    Copy link
    Contributor

    ambv commented Aug 23, 2019

    Downgraded the severity since 3.6 - 3.9 are merged.

    @larryhastings
    Copy link
    Contributor

    New changeset 063eba2 by larryhastings (Abhilash Raj) in branch '3.5':
    [3.5] bpo-34155: Dont parse domains containing @ (GH-13079) (bpo-15317)
    063eba2

    @larryhastings
    Copy link
    Contributor

    All PRs merged. Thanks, everybody!

    @ret2libc
    Copy link
    Mannequin

    ret2libc mannequin commented Sep 9, 2019

    CVE-2019-16056 has been assigned to this issue.
    See https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2019-16056 .

    @vstinner
    Copy link
    Member

    vstinner commented Sep 9, 2019

    I reopen the issue since Python 2.7 is still vulnerable.

    @vstinner vstinner reopened this Sep 9, 2019
    @vstinner vstinner changed the title email.utils.parseaddr mistakenly parse an email [CVE-2019-16056] email.utils.parseaddr mistakenly parse an email Sep 9, 2019
    @rcsanchez97
    Copy link
    Mannequin

    rcsanchez97 mannequin commented Sep 12, 2019

    I am working on Debian LTS support. I have submitted a PR that contains the necessary adjustments to implement the fix in 2.7.

    @miss-islington
    Copy link
    Contributor

    New changeset 4cbcd2f by Miss Islington (bot) (Roberto C. Sánchez) in branch '2.7':
    [2.7] bpo-34155: Dont parse domains containing @ (GH-13079) (GH-16006)
    4cbcd2f

    @maxking
    Copy link
    Contributor

    maxking commented Sep 14, 2019

    Merged in 2.7, closing this one finally!

    Thanks to everyone who helped with this :)

    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.7 (EOL) end of life 3.8 (EOL) end of life 3.9 only security fixes topic-email type-security A security issue
    Projects
    None yet
    Development

    No branches or pull requests

    10 participants