New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Unicode names break email header #78605
Comments
See also this comment and ensuing conversation: https://bugs.python.org/issue24218?#msg322761 Consider an email message with the following: message = EmailMessage()
message["From"] = Address(addr_spec="bar@foo.com", display_name="Jens Troeger")
message["To"] = Address(addr_spec="foo@bar.com", display_name="Martín Córdoba") It’s important here that the email itself is flatmsg: b'From: Jens Troeger <bar@foo.com>\r\nTo: Fernando =?utf-8?q?Mart=C3=ADn_C=C3=B3rdoba?= <foo@bar.com>\r\r\r\r\r\nSubject:\r\n Confirmation: …\r\n…' After an initial investigation into the BytesGenerator (used to flatten an EmailMessage object), here is what’s happening. Flattening the body and attachments of the EmailMessage object works, and eventually _write_headers() is called to flatten the headers which happens entry by entry (https://github.com/python/cpython/blob/master/Lib/email/generator.py#L417-L418). Flattening a header entry is a recursive process over the parse tree of the entry, which builds the flattened and encoded final string by descending into the parse tree and encoding & concatenating the individual “parts” (tokens of the header entry). Given the parse tree for a header entry like "Martín Córdoba <foo@bar.com>" eventually results in the correct flattened string:
at the bottom of the recursion for this “Mailbox” part. The recursive callstack is then:
The problem now arises from the interplay of # https://github.com/python/cpython/blob/master/Lib/email/_header_value_parser.py#L2629
encoded_part = part.fold(policy=policy)[:-1] # strip nl which strips the '\n' from the returned string, and
which adds the policy’s line separation string linesep="\r\n" to the end of the flattened string upon unrolling the recursion. I am not sure about a proper fix here, but considering that the linesep policy can be any string length (in this case len("\r\n") == 2) a fixed truncation of one character [:-1] seems wrong. Instead, using: encoded_part = part.fold(policy=policy)[:-len(policy.linesep)] # strip nl seems to work for entries with and without Unicode characters in their display names. |
Pull request #8803 |
I've requested some small changes on the PR. If Jens doesn't respond in another week or so someone else could pick it up. |
Michael, if you could check if Jens patch fixes your problem I would appreciate it. |
Thanks for pointing me to this issue. :)
Jens PR does exactly, what I proposed in bpo-35057, so it fixes my problem, indeed. |
Can somebody please review and merge #8803 ? I am still waiting for this fix the become mainstream. |
This looks like it was just pending final approval after changes. Please let me know if I can help move it along. Thanks! |
Cheryl, if you can find somebody to approve and merge this fix, that would be greatly appreciated! Anything I can do, please let me know. |
Approved and merged. Cheryl, can you shepherd this through the backport process, please? I'm contributing infrequently enough that I'm not even sure which version we are on :) |
David, Thank you for the review and merging into master! I'd be glad to backport this. I believe the only backport would be to 3.7 (master is currently 3.8), but please let me know if it would need to go into a security branch. |
Thank you. I don't believe this is a security issue. |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: