
smtplib does not handle Unicode characters #85195

Closed
jpatel mannequin opened this issue Jun 18, 2020 · 3 comments
Labels
3.8 only security fixes topic-email type-feature A feature request or enhancement

Comments

@jpatel
Mannequin

jpatel mannequin commented Jun 18, 2020

BPO 41023
Nosy @warsaw, @bitdancer
Files
  • send_rawemail_demo.py
  • providing_Unicode_characters_in_email_body.png
  • providing_mail_options_in_sendmail.png
  Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    assignee = None
    closed_at = <Date 2020-06-18.12:51:03.989>
    created_at = <Date 2020-06-18.09:45:34.374>
    labels = ['type-feature', '3.8', 'expert-email']
    title = 'smtplib does not handle Unicode characters'
    updated_at = <Date 2020-06-29.10:24:36.777>
    user = 'https://bugs.python.org/jpatel'

    bugs.python.org fields:

    activity = <Date 2020-06-29.10:24:36.777>
    actor = 'jpatel'
    assignee = 'none'
    closed = True
    closed_date = <Date 2020-06-18.12:51:03.989>
    closer = 'r.david.murray'
    components = ['email']
    creation = <Date 2020-06-18.09:45:34.374>
    creator = 'jpatel'
    dependencies = []
    files = ['49249', '49251', '49252']
    hgrepos = []
    issue_num = 41023
    keywords = []
    message_count = 3.0
    messages = ['371801', '371803', '371808']
    nosy_count = 3.0
    nosy_names = ['barry', 'r.david.murray', 'jpatel']
    pr_nums = []
    priority = 'normal'
    resolution = 'works for me'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue41023'
    versions = ['Python 3.8']

    @jpatel
    Mannequin Author

    jpatel mannequin commented Jun 18, 2020

    According to the user requirements, I need to send an email that is provided as a raw email, i.e., the contents of the email are supplied in the form of headers. To accomplish this I am using the methods in the attached "send_rawemail_demo.py" file.
    smtplib works fine when the 'raw_email' variable contains only ASCII characters. But when I include any non-ASCII Unicode characters in either the Subject or the body of the email, the sendmail method of smtplib fails with the following message:
    UnicodeEncodeError: 'ascii' codec can't encode characters in position 123-124: ordinal not in range(128)
    I tried passing mail_options=["SMTPUTF-8"] to the sendmail method (line 72 of send_rawemail_demo.py), but then it fails, even for ASCII-only messages, with an SMTPSenderRefused exception.
    I have faced the same issue on Python 3.6.
    The sendmail method of the SMTP class encodes the message using 'ascii' as:
    if isinstance(msg, str):
        msg = _fix_eols(msg).encode('ascii')
    The code works properly in Python 2 because the Python 2 smtplib does not have the above line, so it allows Unicode characters in the Subject and the body.
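To reproduce the failure without a mail server, here is a minimal offline sketch. It assumes a hypothetical `OfflineSMTP` stub that overrides `ehlo_or_helo_if_needed()` so that `sendmail()` never attempts the handshake; the addresses and message text are made up. The non-ASCII `str` fails inside `sendmail()` itself, at the ASCII-encoding step quoted above, before anything is sent.

```python
import smtplib


class OfflineSMTP(smtplib.SMTP):
    """Hypothetical test stub: never connects and skips the EHLO/HELO handshake."""

    def ehlo_or_helo_if_needed(self):
        pass


def try_sendmail(raw_email):
    """Call sendmail offline and return the name of the exception it raises."""
    # local_hostname avoids the socket.getfqdn() lookup SMTP() would otherwise do.
    smtp = OfflineSMTP(local_hostname="localhost")
    try:
        smtp.sendmail("sender@example.com", ["rcpt@example.com"], raw_email)
    except Exception as exc:
        return type(exc).__name__
    return None


ascii_email = "Subject: Hello\r\n\r\nPlain ASCII body.\r\n"
unicode_email = "Subject: Café\r\n\r\nUn café, s'il vous plaît.\r\n"

# The ASCII-only str passes the .encode('ascii') step and only fails later,
# when smtplib tries to send MAIL FROM without a connection.
print(try_sendmail(ascii_email))    # SMTPServerDisconnected
# The non-ASCII str is rejected by sendmail's own .encode('ascii') call.
print(try_sendmail(unicode_email))  # UnicodeEncodeError
```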

    @jpatel jpatel mannequin added 3.8 only security fixes topic-email type-feature A feature request or enhancement labels Jun 18, 2020
    @jpatel
    Mannequin Author

    jpatel mannequin commented Jun 18, 2020

    Screenshot for the case where the 'raw_email' variable contains only ASCII characters.

    @bitdancer
    Member

    If you use the 'sendmail' function for sending, then it is entirely your responsibility to turn the email into "wire format". Unicode is not wire format, but if you give sendmail a string that contains only ASCII, it nicely converts it to binary for you. But given that the email RFCs specify particular ways to indicate how non-ASCII is encoded in the message, there is no way for the smtp library to know how to do that correctly when passed an arbitrary Unicode string, so it doesn't try. sendmail requires *you* to do the encoding to binary, indicating you at least think that you got the RFC parts right :) In Python 2, strings are binary by default, so in that case you are handing sendmail binary data (with the same assumption that you got the RFC parts right)... if you passed the Python 2 function a unicode string it would probably complain as well, although not in the same way.

    If your raw email is RFC compliant, then you can do: sendmail(from_addr, to_addrs, mymsg.encode()).
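As a sketch of that suggestion, assuming the raw email already carries correct MIME headers and that UTF-8 is the intended charset: the hypothetical `send_raw_email()` helper normalizes line endings (which `sendmail()` only does for `str` input) and then encodes to bytes. `FakeSMTP` is a stand-in that just records the call; against a real server you would pass an `smtplib.SMTP` instance instead. An 8bit body like this one also assumes the receiving server accepts 8-bit data (the 8BITMIME extension).

```python
import re


def send_raw_email(smtp, from_addr, to_addrs, raw_email, charset="utf-8"):
    """Hypothetical helper: pass an RFC-compliant raw email to sendmail as bytes."""
    # sendmail() converts line endings to CRLF only for str input, so apply the
    # same rule here before encoding; bytes are then sent through unchanged.
    wire = re.sub(r"\r\n|\r|\n", "\r\n", raw_email).encode(charset)
    return smtp.sendmail(from_addr, to_addrs, wire)


class FakeSMTP:
    """Stand-in for smtplib.SMTP that only records what sendmail() receives."""

    def __init__(self):
        self.calls = []

    def sendmail(self, from_addr, to_addrs, msg):
        self.calls.append((from_addr, to_addrs, msg))
        return {}  # smtplib's sendmail returns a dict of refused recipients


raw_email = (
    "From: sender@example.com\n"
    "To: rcpt@example.com\n"
    "Subject: Coffee order\n"
    "MIME-Version: 1.0\n"
    'Content-Type: text/plain; charset="utf-8"\n'
    "Content-Transfer-Encoding: 8bit\n"
    "\n"
    "Un café, s'il vous plaît.\n"
)

smtp = FakeSMTP()  # hypothetical; a real call would use smtplib.SMTP(host)
send_raw_email(smtp, "sender@example.com", ["rcpt@example.com"], raw_email)
print(type(smtp.calls[0][2]).__name__)  # bytes
```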

    I see from your example that you are trying to use the email package to construct the email, which is good. But emails are *binary*, not unicode, so passing "message_from_string" a unicode string containing non-ascii isn't going to do what you are expecting, any more than passing unicode to the 'sendmail' function did. message_from_string is really only useful for certain sorts of debugging and ought to be deprecated, or produce a warning when handed a string containing non-ascii. (There are historical reasons why it doesn't :(
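A minimal sketch of the bytes-based route, using a made-up message that does declare its charset: encode the text, parse it with `email.message_from_bytes()`, and pass `policy=email.policy.default` so the result is an `EmailMessage` rather than a legacy `Message`.

```python
import email
from email import policy

raw_email = (
    "From: sender@example.com\r\n"
    "To: rcpt@example.com\r\n"
    "Subject: =?utf-8?q?Caf=C3=A9?=\r\n"
    "MIME-Version: 1.0\r\n"
    'Content-Type: text/plain; charset="utf-8"\r\n'
    "Content-Transfer-Encoding: 8bit\r\n"
    "\r\n"
    "Un café, s'il vous plaît.\r\n"
)

# Emails are bytes: encode first, then parse with the bytes API and the
# modern policy, which produces an EmailMessage.
msg = email.message_from_bytes(raw_email.encode("utf-8"), policy=policy.default)

print(type(msg).__name__)       # EmailMessage
print(msg["Subject"])           # Café (the RFC 2047 encoded word is decoded)
print(msg.get_content().strip())
```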

    And then you should use smtplib's 'send_message' function, which understands email package messages and will Do the Right Thing with them (including the extraction of the to and from addresses your code is currently doing).

    However, even if you encoded your raw message to binary and then passed it to message_from_bytes, your example message is *not* RFC compliant: without MIME headers, an email with non-ascii characters in the body is technically in violation of the RFC. Most email programs will handle that particular message despite that, but not all. You are better off using the email package to construct a properly RFC-formatted email using the new API (e.g. msg = EmailMessage() (not Message), then msg['from'] = address, etc., and msg.set_content(your unicode string body)). I can't really give you much advice here (nor should I, this being a bug tracker :) because I don't know exactly how the data is coming in to your program in your real use case.
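A sketch of both halves of this point, with made-up addresses and text. The first part parses a raw message that has no MIME headers; with nothing declaring a charset, the email package falls back to us-ascii, so the UTF-8 bytes of the body come back as replacement characters. The second part builds the same kind of content with `EmailMessage` and `set_content()`, which adds `MIME-Version`, a `Content-Type` with a charset, and `Content-Transfer-Encoding` on its own.

```python
import email
from email import policy
from email.message import EmailMessage

# 1. A raw message without MIME headers. Nothing declares a charset, so the
#    parser assumes text/plain; us-ascii and the UTF-8 body bytes decode as
#    U+FFFD replacement characters.
bare = "From: sender@example.com\r\nSubject: Hello\r\n\r\nUn café\r\n"
parsed = email.message_from_bytes(bare.encode("utf-8"), policy=policy.default)
print(parsed.get_content_charset())  # None
print(repr(parsed.get_content()))

# 2. The same kind of content built with the new API: set_content() picks a
#    charset and transfer encoding and adds the MIME headers.
msg = EmailMessage()
msg["From"] = "sender@example.com"
msg["To"] = "rcpt@example.com"
msg["Subject"] = "Café"
msg.set_content("Un café, s'il vous plaît.")

print(msg.get_content_charset())  # utf-8
wire = msg.as_bytes()  # bytes ready for smtplib; the Subject is RFC 2047 encoded
```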

    Once you have a properly constructed EmailMessage object, you should use smtplib's 'send_message' function, which understands email package messages and will Do the Right Thing with them. That includes extracting the to and from addresses, which your code currently does by hand, and properly handling BCC: it deletes the BCC headers from the message before sending it, which neither your code nor 'sendmail' does.
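To see what `send_message()` does with such a message without a server, one option is a test double. The `RecordingSMTP` class below is hypothetical: it skips the EHLO handshake and overrides `sendmail()` to capture what `send_message()` would hand to it. The captured values show the envelope addresses taken from the From, To and Bcc headers, and the Bcc header removed from the transmitted bytes.

```python
import smtplib
from email.message import EmailMessage


class RecordingSMTP(smtplib.SMTP):
    """Hypothetical test double: captures what send_message() gives sendmail()."""

    def ehlo_or_helo_if_needed(self):
        pass  # no server, so skip the EHLO/HELO handshake

    def sendmail(self, from_addr, to_addrs, msg, mail_options=(), rcpt_options=()):
        # Record the envelope and the flattened bytes instead of sending them.
        self.sent = (from_addr, list(to_addrs), msg)
        return {}


msg = EmailMessage()
msg["From"] = "Sender <sender@example.com>"
msg["To"] = "rcpt@example.com"
msg["Bcc"] = "hidden@example.com"
msg["Subject"] = "Café"
msg.set_content("Un café, s'il vous plaît.")

smtp = RecordingSMTP(local_hostname="localhost")  # never connects
smtp.send_message(msg)
from_addr, to_addrs, wire = smtp.sent

print(from_addr)        # sender@example.com
print(to_addrs)         # ['rcpt@example.com', 'hidden@example.com']
print(b"Bcc:" in wire)  # False: Bcc is stripped from the transmitted copy
```

The original `msg` is left untouched; `send_message()` deletes the Bcc header from a copy before flattening it.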

    SMTPUTF8 is about non-ascii in the email *headers*, and most SMTP servers these days do not yet support it[*]. Some of the big ones do, though (I believe gmail does).
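To illustrate what non-ASCII in the headers means on the wire, here is a sketch comparing two serializations of the same made-up message. The default policy turns a non-ASCII Subject into an RFC 2047 encoded word, so the headers stay ASCII and no SMTPUTF8 support is needed. `email.policy.SMTPUTF8` writes the header as raw UTF-8 instead, which is only acceptable to a server that advertises SMTPUTF8. For reference, `send_message()` switches to the UTF-8 form and adds the `SMTPUTF8` and `BODY=8BITMIME` options by itself only when an envelope address is non-ASCII.

```python
from email import policy
from email.message import EmailMessage


def subject_line(data):
    """Return the raw Subject: line from serialized message bytes."""
    return next(line for line in data.splitlines() if line.startswith(b"Subject:"))


msg = EmailMessage()
msg["From"] = "sender@example.com"
msg["To"] = "rcpt@example.com"
msg["Subject"] = "Café"
msg.set_content("Un café, s'il vous plaît.")

# Default policy: the non-ASCII Subject becomes an RFC 2047 encoded word,
# so every header line is plain ASCII.
classic = msg.as_bytes()

# SMTPUTF8 policy (utf8=True, CRLF line endings): the Subject is written as
# raw UTF-8. Only appropriate for servers that advertise SMTPUTF8.
utf8 = msg.as_bytes(policy=policy.SMTPUTF8)

print(subject_line(classic))
print(subject_line(utf8))
```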

    [*] although that doesn't explain why what you got was SMTPSenderRefused. You should have gotten SMTPNotSupportedError.
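One way to explore this footnote offline, assuming a hypothetical `NoExtensionsSMTP` stub that pretends EHLO succeeded with no extensions advertised and never connects: smtplib only recognizes the option keyword `SMTPUTF8` (compared case-insensitively) and raises `SMTPNotSupportedError` itself when the server lacks the extension. Any other spelling, such as the `SMTPUTF-8` in the report, is forwarded unchanged in the MAIL FROM command. If a real server rejects that command, `sendmail()` reports it as `SMTPSenderRefused`, which may account for the exception the reporter saw.

```python
import smtplib


class NoExtensionsSMTP(smtplib.SMTP):
    """Hypothetical test stub: acts as if EHLO succeeded but the server
    advertised no ESMTP extensions (so no SMTPUTF8). It never connects."""

    def ehlo_or_helo_if_needed(self):
        self.does_esmtp = True
        self.esmtp_features = {}


def outcome(option):
    """Return the SMTP exception raised when sendmail gets one mail option."""
    smtp = NoExtensionsSMTP(local_hostname="localhost")
    try:
        smtp.sendmail("sender@example.com", ["rcpt@example.com"],
                      b"Subject: test\r\n\r\nbody\r\n", mail_options=[option])
    except smtplib.SMTPException as exc:
        return type(exc).__name__
    return None


print(outcome("SMTPUTF8"))   # SMTPNotSupportedError: raised locally by smtplib
print(outcome("SMTPUTF-8"))  # SMTPServerDisconnected: the option went to the wire
```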

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022