Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-118716: Do not enforce ASCII to format address #118717

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

cedk
Copy link
Contributor

@cedk cedk commented May 7, 2024

RFC 6530 allows UTF-8 strings

@nineteendo
Copy link
Contributor

You need to update this test:

def test_unicode_address_raises_error(self):
# issue 1690608. email.utils.formataddr() should be rfc2047 aware.
addr = 'pers\u00f6n@dom.in'
self.assertRaises(UnicodeError, utils.formataddr, (None, addr))
self.assertRaises(UnicodeError, utils.formataddr, ("Name", addr))

@nineteendo
Copy link
Contributor

The commit that added this test pre-dates that rfc: 8debacb.

@bitdancer
Copy link
Member

IMO the legacy API (which is what you are modifying) should not support utf-8 addresses, since it provides no mechanisms for correctly interoperating with smtplib the way the new API does. Therefore I recommend rejecting this PR. See my comments on the issue for more.

@nineteendo
Copy link
Contributor

Could you add a news entry? https://blurb-it.herokuapp.com/add_blurb

@@ -83,8 +83,6 @@ def formataddr(pair, charset='utf-8'):
'utf-8'.
"""
name, address = pair
# The address MUST (per RFC) be ascii, so raise a UnicodeError if it isn't.
address.encode('ascii')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I object to this change. I don't think the legacy API should be modified.

@@ -1214,14 +1214,14 @@ def test_get_local_part_valid_and_invalid_qp_in_atom_list(self):
'@example.com')
self.assertEqual(local_part.local_part, r'\example\\ example')

def test_get_local_part_unicode_defect(self):
def test_get_local_part_unicode(self):
Copy link
Member

@bitdancer bitdancer May 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not, IMO, the best solution. But fixing it right is non-trivial. The issue here is that this IS a defect if policy.utf8=False, and that should be maintained. The problem is that currently the header value parser does not have access to the policy. Which was a design mistake.

The reason you want to maintain the old behavior (aside from backward compatibility, which definitely counts) is that the parser is designed to be used in multiple scenarios, and one of those scenarios might be a place were we want to be emulating a non-SMTPUTF8 parser and looking at where a parsed message does not pass muster.

Now, that said, I'm not seeing any good way to fix this right. I think the best compromise might be to leave this defect alone, but to change the Address object so that it specifically ignores this defect (does not raise an error in that case). The smtp library should be raising an error if an attempt is made to actually send using a utf8 address to a server that does not support SMTPUTF8. With this compromise backward compatibility will be maintained, while still allowing utf8 addresses to be specified using Address (which should only be used by library consumers, not the library itself). I believe I am correct that such addresses will not otherwise be rejected by the email library.

This leaves a dangling issue that if we parse a message using utf8=True we technically should not get a defect, but unless and until we teach the header value parser about the policy we can't really fix that.

@bedevere-app
Copy link

bedevere-app bot commented May 9, 2024

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

…r_spec

RFC 6530 allows UTF-8 strings in the local part.
@cedk cedk force-pushed the email-utils-formataddr-utf-8 branch from ac72c64 to 2bbb200 Compare May 10, 2024 10:39
@cedk
Copy link
Contributor Author

cedk commented May 10, 2024

I have made the requested changes; please review again

@bedevere-app
Copy link

bedevere-app bot commented May 10, 2024

Thanks for making the requested changes!

@bitdancer: please review the changes made to this pull request.

@bedevere-app bedevere-app bot requested a review from bitdancer May 10, 2024 10:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants