-
-
Notifications
You must be signed in to change notification settings - Fork 30.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
email.Message.as_string infinite loop #81945
Comments
The following will hang the system until it runs out of memory. import email
import email.policy
text = """From: user@host.com
To: user@host.com
Bad-Header:
=?us-ascii?Q?LCSwrV11+IB0rSbSker+M9vWR7wEDSuGqmHD89Gt=ea0nJFSaiz4vX3XMJPT4vrE?=
=?us-ascii?Q?xGUZeOnp0o22pLBB7CYLH74Js=wOlK6Tfru2U47qR?=
=?us-ascii?Q?72OfyEY2p2=2FrA9xNFyvH+fBTCmazxwzF8nGkK6D?= Hello! eml = email.message_from_string(text, policy=email.policy.SMTPUTF8)
eml.as_string() |
I can't reproduce this on 3.9: https://github.com/epicfaace/cpython/runs/187997615 |
I also can't reproduce this on 3.7: https://github.com/epicfaace/cpython/runs/188005822 |
Reproduced on 3.7.4 Looks like this started happening after this commit: dc20fc4#diff-19171ae20182f6759204a3436475ddd1 |
I looked at the job at https://travis-ci.com/epicfaace/cpython/jobs/223345147 and its running py3.6. |
This does look like a side-effect of the commit mentioned by mytran. The issues seems to be that email._header_value_parser.get_unstructured wrongfully assumes that anything leading with '=?' would be a valid rfc 2047 encoded word. This is a smaller test case to reproduce this bug: from email._header_value_parser import get_unstructured
get_unstructured('=?utf-8?q?FSaiz4vX3XMJPT4vrExGUZeOnp0o22pLBB7CYLH74Js=3DwOlK6Tfru2U47qR?=72OfyEY2p2/rA9xNFyvH+fBTCmazxwzF8nGkK6D') |
Adding security label since this can cause DOS. |
Oh, both the Travis links I sent actually ended up reproducing the bug. I've made a PR that fixes with an even smaller test case: get_unstructured('=?utf-8?q?somevalue?=aa') It looks like this is caused because "aa" is thought to be an encoded word escape in cpython/Lib/email/_header_value_parser.py Line 1042 in fd5a82a
My PR makes the parser parse "=?utf-8?q?somevalue?=aa" as "=?utf-8?q?somevalue?=aa". However, the existing test cases make sure it parses "=?utf-8?q?somevalue?=nowhitespace" as "somevaluenowhitespace". I'm not too familiar with RFC 2047, but why are "aa" and "nowhitespace" treated differently? Should they be? |
You have correctly identified that "=aa" is detected as a encoded word and causes the get_encoded_word to fail. However, "=?utf-8?q?somevalue?=aa" should ideally get parsed as "somevalueaa" and not "=?utf-8?q?somevalue?=aa". This is because "=?utf-8?q?somevalue?=" is a valid encoded word, it is just not followed by an empty whitespace. modified Lib/email/_header_value_parser.py
@@ -1037,7 +1037,10 @@ def get_encoded_word(value):
raise errors.HeaderParseError(
"expected encoded word but found {}".format(value))
remstr = ''.join(remainder)
- if len(remstr) > 1 and remstr[0] in hexdigits and remstr[1] in hexdigits:
+ if (len(remstr) > 1 and
+ remstr[0] in hexdigits and
+ remstr[1] in hexdigits and
+ tok.count('?') < 2):
# The ? after the CTE was followed by an encoded word escape (=XX).
rest, *remainder = remstr.split('?=', 1) This can be avoided by checking The 2nd bug, which needs a better test case, is that if the encoded_word is invalid, you will keep running into infinite loop, which you correctly fixed in your PR. However, the test case you used is more appropriate for the first issue. You can fix both the issues, for which, you need to add a test case for 2nd issue and fix for the first issue. Looking into the PR now. |
I meant, =aa is identified as encoded word escape |
Although, the 2nd bug I spoke of is kind of speculative, I haven't been able to find a test case which matches rfc2047_matcher but raises exception with get_encoded_word (after, ofcourse, the first bug is fixed), which the only way to cause an infinite loop. |
Thanks, I've fixed the first case as you suggested. I found an example of the 2nd case -- '=?utf-8?q?=somevalue?=' -- which causes the method to hang. I've added a fix, though I'm not sure if it treats the string properly -- it parses it as '=?utf-8?q?=somevalue?=' and doesn't raise any defects. Is that the behavior we would want? |
Should we get a CVE for this because this is a security issue? |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: