gh-83938, gh-122476: Stop incorrectly RFC 2047 encoding non-ASCII email addresses#122540
gh-83938, gh-122476: Stop incorrectly RFC 2047 encoding non-ASCII email addresses#122540bitdancer merged 20 commits intopython:mainfrom
Conversation
d1f0bdc to
2e0696c
Compare
|
This is based on #81074 (comment):
Checking An alternative would be to introduce a new Also, I think |
2e0696c to
cbedf5d
Compare
Email generators had been incorrectly flattening non-ASCII email addresses to RFC 2047 encoded-word format, leaving them undeliverable. (RFC 2047 prohibits use of encoded-word in an addr-spec.) This change raises a ValueError when attempting to flatten an EmailMessage with a non-ASCII addr-spec and a policy with utf8=False. (Exception: If the non-ASCII address originated from parsing a message, it will be flattened as originally parsed, without error.) Non-ASCII email addresses are supported when using a policy with utf8=True (such as email.policy.SMTPUTF8) under RFCs 6531 and 6532. Non-ASCII email address domains (but not localparts) can also be used with non-SMTPUTF8 policies by encoding the domain as an IDNA A-label. (The email package does not perform this encoding, because it cannot know whether the caller wants IDNA 2003, IDNA 2008, or some other variant such as UTS python#46.)
cbedf5d to
faa4006
Compare
bitdancer
left a comment
There was a problem hiding this comment.
Your analysis is solid and the fix looks great. We'll need a follow on PR to have smtplib handle the new error, but that should be a trivial PR.
|
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase And if you don't make the requested changes, you will be poked with soft cushions! |
- Incorporate PR review feedback - Improve docs
5d60c1c to
bd6845d
Compare
|
I have made the requested changes; please review again |
|
Thanks for making the requested changes! @bitdancer: please review the changes made to this pull request. |
bitdancer
left a comment
There was a problem hiding this comment.
Sorry it took me so long to get back to this.
|
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase |
|
Looking at the code more closely, it is supposed to already be handing this case. The fact that it isn't indicates there's a bug there, and there were certainly missing tests. I will poke at this. |
The mime parameter folder doesn't make use of the encoding check done be the code that is now below it, it does its own. So it makes more sense to take that branch first. This will simplify subsequent changes.
This is a more complete fix, covering any syntax part where encoded words are not permitted, and the doc changes are adjusted accordingly. There is also no need for a new exception, since HeaderWriteError already exists. The fix itself is to use a separate code loop to fold parts that may not have encoded words, guaranteeing that we do not do incorrect encoding. This opens a door to simplifying the main folding loop, but that is a much bigger refactoring job better left for another time.
Behavior when folding in parts versus rendering on one line takes different code paths, so make sure both work.
|
I've crafted a more complete fix for the bug. If you could review it, @medmunds, that would be great. Since I switched to using the existing HeaderWriteError, I'm wondering if we need the versionchanged at all at this point, if anyone has input on that. |
This comment was marked as outdated.
This comment was marked as outdated.
Fix typo; bump versionchanged.
Documentation build overview
65 files changed ·
|
medmunds
left a comment
There was a problem hiding this comment.
@bitdancer this LGTM. I pushed a couple of fixes to the docs.
I'd be inclined to keep the versionchanged note. (Unless there's some style guide policy against that for bug fixes.) There seems to be a popular misconception—in many email libraries and systems—that any MIME header with non-ASCII chars can just be run through rfc2047. The versionchanged note makes it clear dropping rfc2047 here was a deliberate fix in Python's email package, and might help discourage future requests to add it back in.
|
Since this is a bug fix that isn't going to get backported because of backward compatibility concerns, we'll keep the versionchanged. We can always delete it if someone objects. |
|
It occurs to me that if it is worth a versionchanged there should also be a What's New entry. Do you want to write one? |
Something like this? [edited] Index: Doc/whatsnew/3.15.rst
<+>UTF-8
===================================================================
diff --git a/Doc/whatsnew/3.15.rst b/Doc/whatsnew/3.15.rst
--- a/Doc/whatsnew/3.15.rst (revision 71636d347e37db41a11a21419137fd7eb3423f74)
+++ b/Doc/whatsnew/3.15.rst (date 1777572899407)
@@ -914,6 +914,16 @@
(Contributed by Eric Froemling in :gh:`149085`.)
+email
+-----
+
+* Email generators now raise an error when an :class:`.EmailMessage` cannot be
+ accurately flattened due to a non-ASCII email address (mailbox) in an address
+ header. :attr:`.EmailPolicy.utf8` offers options for supporting Email Address
+ Internationalization (EAI).
+ (Contributed by R David Murray and Mike Edmunds in :gh:`122540`.)
+
+
functools
---------
[Edited: trying to capture the attention of users who might be affected and direct them to the docs that have complete details.] |
|
Yeah, that looks good. Though I'd say "The documentation for...offers" just to make it clear that it isn't the policy option itself that provides the support. |
|
Although I guess technically it does...maybe your wording is correct. |
…122477) The email.headerregistry.Address constructor raised an error if addr_spec contained a non-ASCII character. (But it fully supports non-ASCII in the separate username and domain args.) This change removes the error for a non-ASCII addr_spec, as well as the Defect that triggered it. In the unicode era non-ascii is not a defect, though it is an error when an attempt is made to serialize it to ascii. The serialization issue was handled in #122540.
This PR fixes gh-83938 and fixes gh-122476, which have the same underlying issue.
Email generators had been incorrectly flattening non-ASCII email addresses to RFC 2047 encoded-word format, leaving them undeliverable. (RFC 2047 prohibits use of encoded-word in an addr-spec.) This change raises a ValueError when attempting to flatten an EmailMessage with a non-ASCII addr-spec and a policy with
utf8=False. (Exception: If the non-ASCII address originated from parsing a message, it will be flattened as originally parsed, without error.)Non-ASCII email addresses are supported when using a policy with
utf8=True(such as email.policy.SMTPUTF8) under RFCs 6531 and 6532.Non-ASCII email address domains (but not localparts) can also be used with non-SMTPUTF8 policies by encoding the domain as an IDNA A-label. (The email package does not perform this encoding, because it cannot know whether the caller wants IDNA 2003, IDNA 2008, or some other variant such as UTS-46.)
📚 Documentation preview 📚: https://cpython-previews--122540.org.readthedocs.build/