-
-
Notifications
You must be signed in to change notification settings - Fork 30.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[CVE-2019-16056] email.utils.parseaddr mistakenly parse an email #78336
Comments
Hi! I'm trying to parse some emails, and I discovered that email.utils.parseaddr wrongly parse an email. Here's the corresponding header:
Once this has been parsed via
(I agree, not really a nice looking From email ...) Then, when this value is given to parseaddr, here's the result:
But it should be:
(Note that the email in the "name" part is not the same as the email in the "email" part!) |
That does appear to be a bug. Note that the new email API handles it correctly: >>> x = """
... > From: =?utf-8?Q?zq@redacted.com.cn=E3=82=86=E2=86=91=E3=82=86?=
... =?utf-8?Q?=E3=82=83=E3=82=85=E3=81=87=E3=81=BA=E3=81=BD=E3=81=BC"\=E3?=
... =?utf-8?Q?=81=A9=E3=81=A5=E3=81=A2l=E3=81=A0=E3=81=B0=E3=81=A8=E3=81?=
... =?utf-8?Q?=8FKL=E3=81=84=E3=82=8C=E3=82=8B=E3=82=86>KL=E3=82=89JF?=
... <mxvu@redacted2.com>
... """
>>> from email import message_from_string
>>> from email.policy import default
>>> m = message_from_string(x+'\n\ntest', policy=default)
>>> m['from']
'"zq@redacted.com.cnゆ↑ゆ ゃゅぇぺぽぼ\\"\\\\� ��づぢlだばと� �KLいれるゆ>KLらJF" <mxvu@redacted2.com>'
>>> m['from'].addresses[0].addr_spec
'mxvu@redacted2.com'
>>> m['from'].addresses[0].display_name
'zq@redacted.com.cnゆ↑ゆ ゃゅぇぺぽぼ"\\\udce3 \udc81\udca9づぢlだばと\udce3\udc81 \udc8fKLいれるゆ>KLらJF' I'm not particularly interested myself in fixing parseaddr to handle this case correctly, since it is the legacy API, but if someone else wants to I'll review the patch. |
Oops, I left out a step in that cut and paste. For completeness:
|
Ah, maybe it doesn't handle it completely correctly; that decode looks different now that I look at it in detail. |
You should not use decode_header() on the whole From header, because that loses Quoting <https://tools.ietf.org/html/rfc2047#section-6.2\>:
So I don't see a bug in parseaddr() here, except that the API is a bit of a |
The issue is illustrated much more simply as follows: email.utils.parseaddr('John Doe jdoe@example.com <other@example.net>') returns ('', 'John Doe jdoe@example.com') whereas it should return ('John Doe jdoe@example.com', 'other@example.net') I'll look at developing a patch. |
The new policies have more error recovery for non-RFC compliant addresses than decode_header, but the two agree in this case. What is happening here is that (1) an unquoted/unencoded '@' is not allowed in a display name (2) if the address is not '<>' quoted, then everything before the @ is the username and (3) in the absence of a comma after the end of the fqdn (which is not allowed to contain blanks) any additional tokens are discarded. One could argue that we could treat the blank after the FQDN as a "missing comma", and there would be some merit to that argument. You could also argue that a "<>" quoted string would trump the occurrence of the @ earlier in the token list. However, the RFC822 grammar is designed to be parsed character by character, so that would not be a typical way for an RFC822 parser to try to do postel-style error recovery. So, I don't think there is a bug here, but I'd be curious what other email address parsing libraries do, and that could influence whether extensions to the "make a guess when the string doesn't conform to the RFC" code would be acceptable. |
The formatting of that doctest paragraph got messed up. Let me try again: >>> m = message_from_string("From: John Doe jdoe@example.com <other@example.net>\n\n", policy=default)
>>> m['From'].addresses
(Address(display_name='', username='John Doe jdoe', domain='example.com'),) |
Is this a case of realname having @ inside an unquoted string? As I can see from the RFC the acceptable characters of an atom other than alphabets and digits that comprises a phrase are ['!', '#', '$', '%', '&', "'", '*', '+', '-', '/', '=', '?', '^', '_', '`', '{', '|', '}', '~'] . So just curious if it's a case of @ inside unquoted string as name? >>> for char in accepted:
... print(parseaddr(f'John Doe jdoe{char}example.com <other@example.net>'))
...
('John Doe jdoe!example.com', 'other@example.net')
('John Doe jdoe#example.com', 'other@example.net')
('John Doe jdoe$example.com', 'other@example.net')
('John Doe jdoe%example.com', 'other@example.net')
('John Doe jdoe&example.com', 'other@example.net')
("John Doe jdoe'example.com", 'other@example.net')
('John Doe jdoe*example.com', 'other@example.net')
('John Doe jdoe+example.com', 'other@example.net')
('John Doe jdoe-example.com', 'other@example.net')
('John Doe jdoe/example.com', 'other@example.net')
('John Doe jdoe=example.com', 'other@example.net')
('John Doe jdoe?example.com', 'other@example.net')
('John Doe jdoe^example.com', 'other@example.net')
('John Doe jdoe_example.com', 'other@example.net')
('John Doe jdoe`example.com', 'other@example.net')
('John Doe jdoe{example.com', 'other@example.net')
('John Doe jdoe|example.com', 'other@example.net')
('John Doe jdoe}example.com', 'other@example.net')
('John Doe jdoe~example.com', 'other@example.net')
>>> parseaddr('"John Doe jdoe@example.com" <other@example.net>')
('John Doe jdoe@example.com', 'other@example.net')
>>> parseaddr('John Doe jdoe@example.com <other@example.net>')
('', 'John Doe jdoe@example.com') |
I agree that my example with an @ in the 'display name', although actually seen in the wild, is non-compliant, and that the behavior of parseaddr() in this case is not a bug. Sorry for the noise. |
Ah sorry, I was typing so long and had an idle session that I didn't realize @r.david.murray added a comment with the explanation. Just to add I tried using Perl module (https://metacpan.org/release/Email-Address) that uses regex for parsing that returns me two addresses and the regex is also not much comprehensible. use v5.14; my $line = 'John Doe jdoe@example.com <other@example.net>'; say "Angle address regex"; jdoe@example.com Thanks |
Another failure case: >>> from email.utils import parseaddr
>>> parseaddr('fo@o@bar.com')
('', 'fo@o') If I understand the RFC correctly, the correct results should be ('', '') because there are two '@' signs. The first '@' would need to be quoted for the address to be valid. |
Note that this bug was used in an actual security attack so it is serious https://medium.com/@fs0c131y/tchap-the-super-not-secure-app-of-the-french-government-84b31517d144 |
Relevant attack from matrix blog post. https://matrix.org/blog/2019/04/18/security-update-sydent-1-0-2/
I am marking this as a security issue. |
Given the situation, could raising a SecurityWarning and a DeprecationWarning fix this issue ? |
Hello, kind of new here. I just wanted to note that the issue that lead to Tchap's security attack still exists in the non-deprecated message_from_string function: email.message_from_string('From: a@malicious.org@important.com', policy=email.policy.default)['from'].addresses (Address(display_name='', username='a', domain='malicious.org'),) So, deprecating parseaddr is not enough for security purpose, unless there is another ticket for the new email API. |
@barry, @r.david.murray, With the additional info about attacks in the wild, should we now consider this a security issue? If so, someone needs to provide an actual PR. (Raising the priority to "deferred blocker" pending evaluation.) |
I found the issue located in https://github.com/python/cpython/blob/master/Lib/email/_parseaddr.py#L277 elif self.field[self.pos] in '.@': The parseaddr function runs a for in loop over the input string, when it meets '.@' it will do something. That is why when the input string is 'foo@bar.com@example.com' will return ('', 'foo@bar.com'). One possible solution will be to check the string in the reverse order then we can always get the last '@' in the string. |
Well, at least we're not alone. Here's a screen capture from Mail.app Version 12.4 (3445.104.8). |
I haven't found this specific case in an RFC, but checked Go's net/mail $ cat mail.go
package main
import "fmt"
import "net/mail"
func main() {
fmt.Println((&mail.AddressParser{}).Parse("a@example.com"))
fmt.Println((&mail.AddressParser{}).Parse("a@malicious.org@example.com
"))
}
$ go run mail.go
<a@example.com> <nil>
<nil> mail: expected single address, got "@example.com" That would fix the security issue but not the whole ticket. |
The pull request has been updated to mimic net/mail's behavior rather than Before: >>> email.message_from_string('From: a@malicious.org@important.com',
policy=email.policy.default)['from'].addresses
(Address(display_name='', username='a', domain='malicious.org'),)
>>> parseaddr('a@malicious.org@important.com')
('', 'a@malicious.org') After: >>> email.message_from_string('From: a@malicious.org@important.com',
policy=email.policy.default)['from'].addresses
(Address(display_name='', username='', domain=''),)
>>> parseaddr('a@malicious.org@important.com')
('', 'a@') I like what I saw under the hood, please feel free to hack me for other |
Frome the answer from Alnitak (https://stackoverflow.com/questions/12355858/how-many-symbol-can-be-in-an-email-address). Maybe we should raise an error when the address has multiple @ in it. |
At is allowed in the local part if quoted, the proposed patch acts within >>> parseaddr('"fo@bar"@bar.com')
('', '"fo@bar"@bar.com')
>>> email.message_from_string('From: "a@b"@ex.com
',policy=email.policy.default)['from'].addresses
(Address(display_name='', username='a@b', domain='ex.com'),) I'm not against raising an exception in parseaddr, but you should know that jpic@ci: For example: >>> parseaddr('aoeu')
('', 'aoeu')
>>> parseaddr('a@')
('', 'a@') None of the above calls raised an exception. That is the reason why I did As for the new API, the patch does raise a parse error: # this detect that the next caracter right after the parsed domain is But that's in the lower level API that is planned for going public later on >>> email.message_from_string('From: a@malicious.org@example.com',
policy=email.policy.default)['from'].defects[0]
InvalidHeaderDefect('invalid address in address-list') |
How about we go a slightly different route than suggested by jpic and instead of returning a None value, we return the entire rest of the string as the domain? That would take care of the security issue since it won't be a valid domain anymore. msg = email.message_from_string(
'From: SomeAbhilashRaj <abhilash@malicious.org@important.com>',
policy=email.policy.default)
print(msg['From'].addresses)
print(msg['From'].defects)
This lets us do postel-style error recovery while working in RFC 2822 style grammar. I wrote this patch to achieve this: @@ -1573,6 +1574,11 @@ def get_domain(value):
domain.append(DOT)
token, value = get_atom(value[1:])
domain.append(token)
+ if value and value[0] == '@':
+ domain.defects.append(errors.InvalidHeaderDefect(
+ "unpected '@' in domain"))
+ token = get_unstructured(value)
+ domain.append(token)
return domain, value Does this makes sense? |
The email API does error recovery without loading invalid domains into >>> email.message_from_string('From: a@foo.',policy=email.policy.default)['from'].addresses[0].domain
'' In perspective with the new patch proposed by maxking that lets an >>> email.message_from_string('From: a@b@c.com',policy=email.policy.default)['from'].addresses[0].domain
'b@c.com' For me maxking's suggestion opens the question of where to draw the Another smaller advantage of of Go's net/mail behaviour is that |
Closing this since teh PRs are merged. |
I change the issue type to security because of https://bugs.python.org/issue34155#msg340534: "Note that this bug was used in an actual security attack so it is serious". |
This issue is a security issue so Python 2.7, 3.5, 3.6 should also be fixed, no? |
@victor: This is already backported to 3.6. I am not sure about what gets backported to 3.5 right now, I don't even see a 'Backport to 3.5' label on Github (which made me think we are discouraged to backport to 3.5). I can work on a manual backport if needed? This patch most probably won't backport to 2.7 without re-writing it completely since the implementation in 2.7 is much different than what we have today. |
As far as I'm aware, backports to 3.5 have to be manually approved by those with repository management permissions, such the the organization owners (https://devguide.python.org/devcycle/#current-owners) and admins (https://devguide.python.org/devcycle/#current-administrators) |
Created a backport PR for 3.5. |
Thanks. I reviewed it (LGTM). What about Python 2.7, it's also vulnerable, no? |
2.7 needs a separate PR since the code is very different and my familiarity with 2.7 version of email package is very limited. I am going to work on a separate patch later this week for 2.7. |
Downgraded the severity since 3.6 - 3.9 are merged. |
All PRs merged. Thanks, everybody! |
CVE-2019-16056 has been assigned to this issue. |
I reopen the issue since Python 2.7 is still vulnerable. |
I am working on Debian LTS support. I have submitted a PR that contains the necessary adjustments to implement the fix in 2.7. |
Merged in 2.7, closing this one finally! Thanks to everyone who helped with this :) |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: