
gh-83938, gh-122476: Stop incorrectly RFC 2047 encoding non-ASCII email addresses #122540

Merged
bitdancer merged 20 commits into python:main from medmunds:fix-issues-83938-122476 on May 1, 2026

Conversation

@medmunds
Contributor

@medmunds medmunds commented Aug 1, 2024

This PR fixes gh-83938 and fixes gh-122476, which have the same underlying issue.

Email generators had been incorrectly flattening non-ASCII email addresses to RFC 2047 encoded-word format, leaving them undeliverable. (RFC 2047 prohibits use of encoded-word in an addr-spec.) This change raises a ValueError when attempting to flatten an EmailMessage with a non-ASCII addr-spec and a policy with utf8=False. (Exception: If the non-ASCII address originated from parsing a message, it will be flattened as originally parsed, without error.)
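
A minimal sketch of the behavior described above, using only the standard library. The helper name `try_flatten` is hypothetical. The exception type is an assumption that depends on the Python version: this description says ValueError, while later comments in this conversation switch to the existing HeaderWriteError. A Python without the fix raises nothing and returns the undeliverable encoded-word form instead.

```python
from email import errors, policy
from email.message import EmailMessage


def try_flatten(msg):
    """Return the flattened bytes, or None if the generator refuses (hypothetical helper)."""
    try:
        # policy.default has utf8=False, so headers must be written as ASCII.
        return msg.as_bytes(policy=policy.default)
    except (ValueError, errors.MessageError):
        # Assumption: with the fix applied, the non-ASCII addr-spec is rejected here.
        return None


msg = EmailMessage()
msg["To"] = "jörg@example.com"  # non-ASCII localpart
msg.set_content("hello")

flattened = try_flatten(msg)
if flattened is None:
    print("refused: non-ASCII addr-spec with utf8=False")
else:
    # Without the fix, inspect the To header for an RFC 2047 encoded-word.
    print(flattened.split(b"\n", 1)[0])
```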

Non-ASCII email addresses are supported when using a policy with utf8=True (such as email.policy.SMTPUTF8) under RFCs 6531 and 6532.
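
For comparison, a short sketch of the supported path: flattening the same kind of message with `email.policy.SMTPUTF8`. The address and body are made-up example values.

```python
from email import policy
from email.message import EmailMessage

msg = EmailMessage()
msg["To"] = "jörg@example.com"  # example address with a non-ASCII localpart
msg.set_content("hello")

# policy.SMTPUTF8 has utf8=True, so the header is written as raw UTF-8
# rather than as an RFC 2047 encoded-word.
data = msg.as_bytes(policy=policy.SMTPUTF8)
print(data.decode("utf-8"))
```

A message flattened this way can only be delivered over a transport that supports SMTPUTF8.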

Non-ASCII email address domains (but not localparts) can also be used with non-SMTPUTF8 policies by encoding the domain as an IDNA A-label. (The email package does not perform this encoding, because it cannot know whether the caller wants IDNA 2003, IDNA 2008, or some other variant such as UTS-46.)
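
A sketch of the A-label approach, assuming the caller is happy with IDNA 2003 as implemented by Python's built-in `idna` codec. That choice is the caller's assumption, not something the email package makes; an application needing IDNA 2008 or UTS-46 would use a different encoder. The domain and username are example values.

```python
from email import policy
from email.headerregistry import Address
from email.message import EmailMessage

# Assumption: IDNA 2003 (the built-in "idna" codec) is acceptable for this domain.
ascii_domain = "bücher.example".encode("idna").decode("ascii")

msg = EmailMessage()
msg["To"] = Address(username="info", domain=ascii_domain)
msg.set_content("hello")

# utf8=False is fine here because the whole addr-spec is now ASCII.
data = msg.as_bytes(policy=policy.default)
print(ascii_domain)
```

This works only for the domain; a non-ASCII localpart has no ASCII equivalent and still needs a utf8=True policy.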


📚 Documentation preview 📚: https://cpython-previews--122540.org.readthedocs.build/

@medmunds medmunds requested a review from a team as a code owner August 1, 2024 00:35
@medmunds medmunds force-pushed the fix-issues-83938-122476 branch 2 times, most recently from d1f0bdc to 2e0696c Compare August 1, 2024 00:43
@medmunds
Contributor Author

medmunds commented Aug 1, 2024

This is based on #81074 (comment):

we should probably be raising an error if the rendering policy does not have utf8=True and we don't have an "original source line" from parsing a message (which is the case here), rather than using the incorrect RFC2047 encoding.

Checking part.token_type == 'addr-spec' seemed like the simplest approach.

An alternative would be to introduce a new NonASCIIDomainLiteralDefect paralleling NonASCIILocalPartDefect and apply it in _header_value_parser.get_domain_literal(). And add NonASCIIAddrSpecDefect as a superclass of both. Then change _refold_parse_tree() to check any(isinstance(d, NonASCIIAddrSpecDefect) for d in part.all_defects) (and perhaps move it up with the other UnicodeEncodeError logic). (If we go this direction, PR #122477 will also need an update.)
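
A rough sketch of what that alternative could look like. All three defect classes below are defined locally for illustration and are hypothetical in this form (the local part class is a stand-in, not the one from `email.errors`). The `part` object is a fake with only an `all_defects` attribute, not a real parse tree node.

```python
from email.errors import HeaderDefect
from types import SimpleNamespace


# Hypothetical hierarchy: a common superclass for both non-ASCII addr-spec defects.
class NonASCIIAddrSpecDefect(HeaderDefect):
    """An addr-spec contains non-ASCII characters."""


class NonASCIILocalPartDefect(NonASCIIAddrSpecDefect):
    """Stand-in: the local part contains non-ASCII characters."""


class NonASCIIDomainLiteralDefect(NonASCIIAddrSpecDefect):
    """Proposed: the domain literal contains non-ASCII characters."""


def has_non_ascii_addr_spec(part):
    """The check suggested for _refold_parse_tree(), written as a helper."""
    return any(isinstance(d, NonASCIIAddrSpecDefect) for d in part.all_defects)


# Fake part carrying one defect, just to exercise the check.
part = SimpleNamespace(
    all_defects=[NonASCIIDomainLiteralDefect("non-ASCII domain literal")]
)
print(has_non_ascii_addr_spec(part))
```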

Also, I think charset == 'unknown-8bit' is only possible in _refold_parse_tree() when the non-ASCII characters resulted from parsing an existing message: see the UndecodableBytesDefect logic just above the new code. (The added tests seem to confirm this.)

@medmunds medmunds force-pushed the fix-issues-83938-122476 branch from 2e0696c to cbedf5d Compare August 1, 2024 01:11
medmunds added 2 commits July 31, 2024 18:35
Email generators had been incorrectly flattening non-ASCII email
addresses to RFC 2047 encoded-word format, leaving them undeliverable.
(RFC 2047 prohibits use of encoded-word in an addr-spec.)
This change raises a ValueError when attempting to flatten an
EmailMessage with a non-ASCII addr-spec and a policy with utf8=False.
(Exception: If the non-ASCII address originated from parsing a message,
it will be flattened as originally parsed, without error.)

Non-ASCII email addresses are supported when using a policy with
utf8=True (such as email.policy.SMTPUTF8) under RFCs 6531 and 6532.

Non-ASCII email address domains (but not localparts) can also be used
with non-SMTPUTF8 policies by encoding the domain as an IDNA A-label.
(The email package does not perform this encoding, because it cannot
know whether the caller wants IDNA 2003, IDNA 2008, or some other
variant such as UTS-46.)
@picnixz picnixz changed the title gh-83938: Stop incorrectly RFC 2047 encoding non-ASCII email addresses gh-83938, gh-122476: Stop incorrectly RFC 2047 encoding non-ASCII email addresses Dec 3, 2024
Member

@bitdancer bitdancer left a comment

Your analysis is solid and the fix looks great. We'll need a follow on PR to have smtplib handle the new error, but that should be a trivial PR.

Comment thread Lib/test/test_email/test_generator.py Outdated
Comment thread Lib/email/_header_value_parser.py Outdated
Comment thread Misc/NEWS.d/next/Library/2024-07-31-17-22-10.gh-issue-83938.TtUa-c.rst Outdated
@bedevere-app

bedevere-app Bot commented Mar 31, 2025

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

And if you don't make the requested changes, you will be poked with soft cushions!

@medmunds medmunds force-pushed the fix-issues-83938-122476 branch from 5d60c1c to bd6845d Compare April 1, 2025 20:14
@medmunds
Contributor Author

medmunds commented Apr 1, 2025

I have made the requested changes; please review again

@bedevere-app

bedevere-app Bot commented Apr 1, 2025

Thanks for making the requested changes!

@bitdancer: please review the changes made to this pull request.

@bedevere-app bedevere-app Bot requested a review from bitdancer April 1, 2025 20:26
Member

@bitdancer bitdancer left a comment

Sorry it took me so long to get back to this.

Comment thread Lib/email/_header_value_parser.py Outdated
Comment thread Misc/NEWS.d/next/Library/2024-07-31-17-23-06.gh-issue-122476.TtUa-c.rst Outdated
@bedevere-app

bedevere-app Bot commented Apr 22, 2026

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

@bitdancer
Member

Looking at the code more closely, it is supposed to already be handling this case. The fact that it isn't indicates there's a bug there, and there were certainly missing tests. I will poke at this.

The mime parameter folder doesn't make use of the encoding check
done by the code that is now below it; it does its own.  So it
makes more sense to take that branch first.  This will simplify
subsequent changes.
This is a more complete fix, covering any syntax part where encoded
words are not permitted, and the doc changes are adjusted accordingly.
There is also no need for a new exception, since HeaderWriteError
already exists.

The fix itself is to use a separate code loop to fold parts that
may not have encoded words, guaranteeing that we do not do incorrect
encoding.  This opens a door to simplifying the main folding loop,
but that is a much bigger refactoring job better left for another time.
Behavior when folding in parts versus rendering on one line takes
different code paths, so make sure both work.
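
To illustrate the idea of a separate folding loop that never falls back to encoded words, here is a simplified sketch. It is not the code from this PR: the function name, the token list input, and the `EncodedWordNotAllowedError` class are all hypothetical (the real fix reports the problem with HeaderWriteError and works on the parse tree).

```python
class EncodedWordNotAllowedError(ValueError):
    """Hypothetical stand-in for the error raised by the real fix."""


def fold_without_encoded_words(tokens, maxlen=78):
    """Fold ASCII-only tokens at whitespace; never emit an encoded-word."""
    lines = []
    current = ""
    for token in tokens:
        if not token.isascii():
            # RFC 2047 encoding is not permitted here, so refuse instead.
            raise EncodedWordNotAllowedError(f"cannot write {token!r} as ASCII")
        candidate = f"{current} {token}" if current else token
        if current and len(candidate) > maxlen:
            lines.append(current)
            current = " " + token  # continuation lines start with whitespace
        else:
            current = candidate
    if current:
        lines.append(current)
    return "\n".join(lines)


folded = fold_without_encoded_words(["a@example.com,", "b@example.com"], maxlen=20)
print(folded)
```

The point of the separate loop is that the only two outcomes are a plain ASCII fold or an error, so an incorrect encoding cannot slip through.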
@bitdancer
Member

I've crafted a more complete fix for the bug. If you could review it, @medmunds, that would be great.

Since I switched to using the existing HeaderWriteError, I'm wondering if we need the versionchanged at all at this point, if anyone has input on that.

@medmunds

This comment was marked as outdated.

Fix typo; bump versionchanged.
@read-the-docs-community

read-the-docs-community Bot commented Apr 29, 2026

Documentation build overview

📚 cpython-previews | 🛠️ Build #32489666 | 📁 Comparing 37f13db against main (76b3923)

65 files changed · + 1 added · ± 64 modified

Contributor Author

@medmunds medmunds left a comment

@bitdancer this LGTM. I pushed a couple of fixes to the docs.

I'd be inclined to keep the versionchanged note. (Unless there's some style guide policy against that for bug fixes.) There seems to be a popular misconception—in many email libraries and systems—that any MIME header with non-ASCII chars can just be run through rfc2047. The versionchanged note makes it clear dropping rfc2047 here was a deliberate fix in Python's email package, and might help discourage future requests to add it back in.

@bitdancer
Member

Since this is a bug fix that isn't going to get backported because of backward compatibility concerns, we'll keep the versionchanged. We can always delete it if someone objects.

@bitdancer
Member

It occurs to me that if it is worth a versionchanged there should also be a What's New entry. Do you want to write one?

@medmunds
Contributor Author

medmunds commented Apr 30, 2026

It occurs to me that if it is worth a versionchanged there should also be a What's New entry. Do you want to write one?

Something like this? [edited]

Index: Doc/whatsnew/3.15.rst
<+>UTF-8
===================================================================
diff --git a/Doc/whatsnew/3.15.rst b/Doc/whatsnew/3.15.rst
--- a/Doc/whatsnew/3.15.rst	(revision 71636d347e37db41a11a21419137fd7eb3423f74)
+++ b/Doc/whatsnew/3.15.rst	(date 1777572899407)
@@ -914,6 +914,16 @@
   (Contributed by Eric Froemling in :gh:`149085`.)
 
 
+email
+-----
+
+* Email generators now raise an error when an :class:`.EmailMessage` cannot be
+  accurately flattened due to a non-ASCII email address (mailbox) in an address
+  header. :attr:`.EmailPolicy.utf8` offers options for supporting Email Address
+  Internationalization (EAI).
+  (Contributed by R David Murray and Mike Edmunds in :gh:`122540`.)
+
+
 functools
 ---------
 

[Edited: trying to capture the attention of users who might be affected and direct them to the docs that have complete details.]

@bitdancer
Member

Yeah, that looks good. Though I'd say "The documentation for...offers" just to make it clear that it isn't the policy option itself that provides the support.

@bitdancer
Member

Although I guess technically it does...maybe your wording is correct.

@medmunds medmunds requested a review from AA-Turner as a code owner April 30, 2026 19:44
@bitdancer bitdancer merged commit d96ffc1 into python:main May 1, 2026
53 of 54 checks passed
bitdancer pushed a commit that referenced this pull request May 1, 2026
…122477)

The email.headerregistry.Address constructor raised an error if
addr_spec contained a non-ASCII character. (But it fully supports
non-ASCII in the separate username and domain args.) This change
removes the error for a non-ASCII addr_spec, as well as the
Defect that triggered it.  In the unicode era non-ascii is not a
defect, though it is an error when an attempt is made to serialize
it to ascii.  The serialization issue was handled in #122540.
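
A small sketch of the two constructor forms that commit message contrasts. Whether the `addr_spec` form raises is an assumption that depends on the Python version, so the example tolerates both outcomes; the address is a made-up example value.

```python
from email.headerregistry import Address

# Separate username and domain arguments accept non-ASCII text.
addr = Address(username="jörg", domain="example.com")
print(addr.addr_spec)

# Assumption: before the referenced change this raised a defect
# (a ValueError subclass); after it, the address is accepted.
try:
    same = Address(addr_spec="jörg@example.com")
except ValueError:
    same = None  # behavior before the change

print("addr_spec form accepted" if same is not None else "addr_spec form rejected")
```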

Labels

stale, topic-email

Projects

None yet

Development

Successfully merging this pull request may close these issues.

EmailMessage bad encoding for non-ASCII localpart
EmailMessage bad encoding for international domain

3 participants