UTF-8 Email Subject problem #58270

Closed
msladek mannequin opened this issue Feb 20, 2012 · 11 comments
Labels
stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

@msladek
Mannequin

msladek mannequin commented Feb 20, 2012

BPO 14062
Nosy @loewis, @bitdancer
Files
  • issue14062_buggy_email_subject.py: Code used as example for issue 14062, but that didn't reproduce the bug locally
  • a.py
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = 'https://github.com/bitdancer'
    closed_at = <Date 2012-03-14.18:28:06.378>
    created_at = <Date 2012-02-20.08:03:28.636>
    labels = ['type-bug', 'library']
    title = 'UTF-8 Email Subject problem'
    updated_at = <Date 2012-03-14.18:28:06.377>
    user = 'https://bugs.python.org/msladek'

    bugs.python.org fields:

    activity = <Date 2012-03-14.18:28:06.377>
    actor = 'r.david.murray'
    assignee = 'r.david.murray'
    closed = True
    closed_date = <Date 2012-03-14.18:28:06.378>
    closer = 'r.david.murray'
    components = ['Library (Lib)']
    creation = <Date 2012-02-20.08:03:28.636>
    creator = 'msladek'
    dependencies = []
    files = ['24823', '24844']
    hgrepos = []
    issue_num = 14062
    keywords = []
    message_count = 11.0
    messages = ['153766', '155629', '155634', '155658', '155738', '155753', '155771', '155772', '155777', '155779', '155781']
    nosy_count = 5.0
    nosy_names = ['loewis', 'r.david.murray', 'python-dev', 'msladek', 'tati_alchueyr']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'commit review'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue14062'
    versions = ['Python 3.2', 'Python 3.3']

    @msladek
    Mannequin Author

    msladek mannequin commented Feb 20, 2012

    Hello!

    I think there is a problem when adding a UTF-8 subject to an email message. I wrote the following function (based on examples I found in the official docs), which should send an email with a UTF-8 subject, a UTF-8 plain-text body and an attached file when all arguments are given:
    fromAddr - address of sender
    toAddr - address of recipient
    subject - subject
    body - text of email body
    attachment - full path to file we want to attach

    Here is the code:

    import email.encoders
    import email.header
    import email.mime.base
    import email.mime.multipart
    import email.mime.text
    import os
    import smtplib

    # smtpSrv, smtpPort, smtpUser and smtpPass are module-level settings defined elsewhere.

    def sendMail(fromAddr, toAddr, subject, body='', attachment=''):
        message = email.mime.multipart.MIMEMultipart()
        message.add_header('From', fromAddr)
        message.add_header('To', toAddr)
        # Let the email package RFC 2047-encode the UTF-8 subject.
        message['Subject'] = email.header.Header(subject, 'utf-8')

        # Optional UTF-8 plain-text body.
        if body != '':
            msgPart = email.mime.text.MIMEText(body, 'plain', 'utf-8')
            message.attach(msgPart)
        # Optional attachment, sent as base64-encoded application/octet-stream.
        if attachment != '' and os.path.exists(attachment):
            filename = os.path.basename(attachment)
            msgPart = email.mime.base.MIMEBase('application', 'octet-stream')
            with open(attachment, 'rb') as fp:
                msgPart.set_payload(fp.read())
            email.encoders.encode_base64(msgPart)
            msgPart.add_header('Content-Disposition', 'attachment', filename=filename)
            message.attach(msgPart)

        # Plain SMTP on port 25, SMTP over SSL on any other port.
        if smtpPort == 25:
            smtpCon = smtplib.SMTP(smtpSrv, smtpPort)
        else:
            smtpCon = smtplib.SMTP_SSL(smtpSrv, smtpPort)
        if smtpUser != '' and smtpPass != '':
            smtpCon.login(smtpUser, smtpPass)
        smtpCon.send_message(message, mail_options=['UTF8SMTP', '8BITMIME'])
        smtpCon.quit()

    Running the function with the following arguments:

    sendMail('rzrobot@seznam.cz','msladek@volny.cz','žluťoučký kůň','úpěl ďábelské ódy')

    produces the following output on the receiving side:

    Return-Path: <rzrobot@seznam.cz>
    Received: from smtp2.seznam.cz (smtp2.seznam.cz [77.75.76.43])
    by mx1.volny.cz (Postfix) with ESMTP id DD6BB2E09CD
    for <msladek@volny.cz>; Mon, 20 Feb 2012 08:34:38 +0100 (CET)
    DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=beta; d=seznam.cz;
    h=Received:Content-Type:MIME-Version:From:To:Subject:--===============1029508565==:MIME-Version:Content-Transfer-Encoding:X-Smtpd:X-Seznam-User:X-Session:X-Country:X-Virus-Info:X-Seznam-SPF:X-Seznam-DomainKeys;
    b=cdU1VSRTCDf0x2CeBNbLJxYSOhSy7r9lNp+1s7+bed6AGBI48vufe3q7f8JFxlfTc
    ulZIDptWi6PMvlZYCBkh1uzTKcihZR7MCoxgW0PJLO1LX5elTJsZ/GTc5oe/GZXkTPT
    qwj1EQIlVn0dpZtt4jIzfC2RrO2IRieR2rozeQM=
    Received: from dvr.ph.sladkovi.eu (ip-84-42-150-218.net.upcbroadband.cz [84.42.150.218]) by email-relay2.ng.seznam.cz (Seznam SMTPD 1.2.15-6@18976) with ESMTP; Mon, 20 Feb 2012 08:34:35 +0100 (CET)
    Content-Type: multipart/mixed; boundary="===============1029508565=="
    MIME-Version: 1.0
    From: rzrobot@seznam.cz
    To: msladek@volny.cz
    Subject: =?utf-8?b?xb5sdcWlb3XEjWvDvSBrxa/FiA==?=
    X-DKIM-Status: fail
    X-Virus: no (m2.volny.internal - Mon, 20 Feb 2012 08:34:40 +0100 (CET))
    X-Spam: no (m2.volny.internal - Mon, 20 Feb 2012 08:34:41 +0100 (CET))
    X-Received-Date: Mon, 20 Feb 2012 08:34:42 +0100 (CET)

    --===============1029508565==:Content-Type: text/plain; charset="utf-8"
    MIME-Version: 1.0
    Content-Transfer-Encoding: base64
    X-Smtpd: 1.2.15-6@18976
    X-Seznam-User: rzrobot@seznam.cz
    X-Session: 11
    X-Country: CZ
    X-Virus-Info:clean
    X-Seznam-SPF:neutral
    X-Seznam-DomainKeys:unknown

    w7pwxJtsIMSPw6FiZWxza8OpIMOzZHk=

    --===============1029508565==--

    Although no attachment argument was given, the mail client reports that the message has an attachment of unknown type and contains no text at all. Note that the part header Content-Type: text/plain; charset="utf-8" ends up appended to the boundary line (after a colon) instead of being inside the message part.

    When I change the function to build the encoded subject manually and add it via add_header, like this:

    import base64
    import email.encoders
    import email.mime.base
    import email.mime.multipart
    import email.mime.text
    import os
    import smtplib

    # smtpSrv, smtpPort, smtpUser and smtpPass are module-level settings defined elsewhere.

    def sendMail(fromAddr, toAddr, subject, body='', attachment=''):
        message = email.mime.multipart.MIMEMultipart()
        message.add_header('From', fromAddr)
        message.add_header('To', toAddr)

        # Build the RFC 2047 encoded word by hand instead of using email.header.Header.
        base64Subject = base64.b64encode(subject.encode('utf-8')).decode()
        encodedSubject = '=?UTF-8?B?{0}?='.format(base64Subject)
        message.add_header('Subject', encodedSubject)

        # Optional UTF-8 plain-text body.
        if body != '':
            msgPart = email.mime.text.MIMEText(body, 'plain', 'utf-8')
            message.attach(msgPart)
        # Optional attachment, sent as base64-encoded application/octet-stream.
        if attachment != '' and os.path.exists(attachment):
            filename = os.path.basename(attachment)
            msgPart = email.mime.base.MIMEBase('application', 'octet-stream')
            with open(attachment, 'rb') as fp:
                msgPart.set_payload(fp.read())
            email.encoders.encode_base64(msgPart)
            msgPart.add_header('Content-Disposition', 'attachment', filename=filename)
            message.attach(msgPart)

        # Plain SMTP on port 25, SMTP over SSL on any other port.
        if smtpPort == 25:
            smtpCon = smtplib.SMTP(smtpSrv, smtpPort)
        else:
            smtpCon = smtplib.SMTP_SSL(smtpSrv, smtpPort)
        if smtpUser != '' and smtpPass != '':
            smtpCon.login(smtpUser, smtpPass)
        smtpCon.send_message(message, mail_options=['UTF8SMTP', '8BITMIME'])
        smtpCon.quit()

    Then everything is OK on the receiving side; both the subject and the plain-text body are visible:

    Return-Path: <rzrobot@seznam.cz>
    Received: from smtp2.seznam.cz (smtp2.seznam.cz [77.75.76.43])
    by mx1.volny.cz (Postfix) with ESMTP id 177092E0825
    for <msladek@volny.cz>; Mon, 20 Feb 2012 08:51:58 +0100 (CET)
    DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=beta; d=seznam.cz;
    h=Received:Content-Type:MIME-Version:From:To:Subject:X-Smtpd:X-Seznam-User:X-Session:X-Country:X-Virus-Info:X-Seznam-SPF:X-Seznam-DomainKeys;
    b=F2A6GhX0TWVjnrB4vx/ayc1BTGDFxBI96oI0fk/gr/tgP0jlV1UC91m4i/O4ay+Bg
    lfka88qa71XZOlHtY2vl7zxYjGPJ97pRCdtqWB+JcNOa5bMsk6lmjMHh+A+FQ2e7+yb
    1F091t0nMcQlarriF8sD5rNjhuRYjvCv7kKbt8s=
    Received: from dvr.ph.sladkovi.eu (ip-84-42-150-218.net.upcbroadband.cz [84.42.150.218]) by email-relay1.ng.seznam.cz (Seznam SMTPD 1.2.15-6@18976) with ESMTP; Mon, 20 Feb 2012 08:51:55 +0100 (CET)
    Content-Type: multipart/mixed; boundary="===============1044203895=="
    MIME-Version: 1.0
    From: rzrobot@seznam.cz
    To: msladek@volny.cz
    Subject: =?UTF-8?B?xb5sdcWlb3XEjWvDvSBrxa/FiA==?=
    X-Smtpd: 1.2.15-6@18976
    X-Seznam-User: rzrobot@seznam.cz
    X-Session: 11
    X-Country: CZ
    X-Virus-Info: clean
    X-Seznam-SPF: neutral
    X-Seznam-DomainKeys: unknown
    X-DKIM-Status: pass seznam.cz
    X-Virus: no (m2.volny.internal - Mon, 20 Feb 2012 08:52:00 +0100 (CET))
    X-Spam: no (m2.volny.internal - Mon, 20 Feb 2012 08:52:01 +0100 (CET))
    X-Received-Date: Mon, 20 Feb 2012 08:52:01 +0100 (CET)

    --===============1044203895==
    Content-Type: text/plain; charset="utf-8"
    MIME-Version: 1.0
    Content-Transfer-Encoding: base64

    w7pwxJtsIMSPw6FiZWxza8OpIMOzZHk=

    --===============1044203895==--

    I am not a programmer, so I may have overlooked some obvious mistake in my code, but for now I think it's a bug.
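For comparison, a minimal sketch of the same kind of message built with the newer `email.message.EmailMessage` API (Python 3.6 and later, so not available when this issue was filed). `build_message` is a hypothetical helper that only builds the message; sending it would still go through `smtplib`'s `send_message`, as in the function above. The `policy.SMTP` choice is an assumption so that serialization uses CRLF line endings throughout.

```python
import os
from email import policy
from email.message import EmailMessage


def build_message(from_addr, to_addr, subject, body='', attachment=''):
    """Hypothetical helper: build (but do not send) a message like sendMail's."""
    # policy.SMTP serializes every line with CRLF, as SMTP expects.
    msg = EmailMessage(policy=policy.SMTP)
    msg['From'] = from_addr
    msg['To'] = to_addr
    # Non-ASCII header text is RFC 2047-encoded automatically on output.
    msg['Subject'] = subject
    if body:
        # set_content() defaults to the utf-8 charset for str content.
        msg.set_content(body)
    if attachment and os.path.exists(attachment):
        with open(attachment, 'rb') as fp:
            msg.add_attachment(fp.read(), maintype='application',
                               subtype='octet-stream',
                               filename=os.path.basename(attachment))
    return msg


msg = build_message('rzrobot@seznam.cz', 'msladek@volny.cz',
                    'žluťoučký kůň', 'úpěl ďábelské ódy')
raw = msg.as_bytes()
print(raw.decode('utf-8'))
```

Here the header encoding and line endings are both driven by the policy object rather than by a separate `Header` instance.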

    @tatialchueyr
    Mannequin

    tatialchueyr mannequin commented Mar 13, 2012

    Hi msladek!

    I tried to reproduce your bug using Python 3.2.2 on Mac OS X, but couldn't: everything worked fine. I used Gmail both to send and receive the message, over SSL:
    smtpPort = '465'
    smtpSrv = 'smtp.gmail.com'

    As I'm not an SMTP or email expert, I asked r.david.murray to review the received message, and it looks fine.

    Could you provide a smaller example of code that causes the same problem?

    I have extracted your code into a script (attached) to help others reproduce the bug.

    @msladek
    Mannequin Author

    msladek mannequin commented Mar 13, 2012

    I tested the code again. Using the Gmail SMTP server produces correct results, while using smtp.seznam.cz leads to the problem (I should mention that Seznam is the largest free mail provider in the Czech Republic). Here are the differences on the receiving side.

    GMAIL:

    Return-Path: <michal@sladkovi.eu>
    Received: from mail-bk0-f45.google.com (mail-bk0-f45.google.com [209.85.214.45])
    by mx4.volny.cz (Postfix) with ESMTP id 0A3E12E086B
    for <msladek@volny.cz>; Tue, 13 Mar 2012 17:58:03 +0100 (CET)
    Received: by bkcjg9 with SMTP id jg9so842625bkc.18
    for <msladek@volny.cz>; Tue, 13 Mar 2012 09:58:03 -0700 (PDT)
    X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
    d=google.com; s=20120113;
    h=message-id:date:content-type:mime-version:from:to:subject
    :x-gm-message-state;
    bh=Sdb8G6CtN+pEzPJHxwbwCprTgWPJUrR3jiU+qeK1WAs=;
    b=X88feHvtpL6zBXYNYSjgUQ+1WirGmU8B69k+4fGlAge6F5+pYd6SzuJ6ExdBsp+brw
    1QuCne97OdVnYoFmg86ZviFz3m6Cn6N8hgPNa2H7hCPQD4O+cjJQQzze4xXYqgPJQs+D
    ke4ISEmxL9UFJUvkTyFhrCDefSxQMY+TnnLwWQR+PCD/uB0FgR2UgBjEx9K7EUKQi6W0
    78+EZYO3cd+SuuadOUvIpe2cj0576ahcP40dGN0kIe+P4NX5Ij7D2cCa/bWiwFdDRUI4
    v8UxJcnbTuOCQFtlItxCAxU9IzZWGekWtpJVnRDBGG63iGXHoTDzp+4+d1FRBGsDQ2pD
    l5tg==
    Received: by 10.204.150.73 with SMTP id x9mr6371797bkv.7.1331657883687;
    Tue, 13 Mar 2012 09:58:03 -0700 (PDT)
    Received: from dvr.ph.sladkovi.eu (ip-84-42-150-218.net.upcbroadband.cz. [84.42.150.218])
    by mx.google.com with ESMTPS id u14sm2783344bkp.2.2012.03.13.09.58.02
    (version=SSLv3 cipher=OTHER);
    Tue, 13 Mar 2012 09:58:02 -0700 (PDT)
    Message-ID: <4f5f7c9a.0e70cc0a.12f5.75a3@mx.google.com>
    Date: Tue, 13 Mar 2012 09:58:02 -0700 (PDT)
    Content-Type: multipart/mixed; boundary="===============1165280172=="
    MIME-Version: 1.0
    From: michal@sladkovi.eu
    To: msladek@volny.cz
    Subject: =?utf-8?b?xb5sdcWlb3XEjWvDvSBrxa/FiA==?=
    X-Gm-Message-State: ALoCoQmf6k2GVVKdm0ZNbvSyPpZ0Gl1yv/BDc3h3zrh34hWWp3wa/fSBXbWT9FANzBLd5k1qUnEP
    X-DKIM-Status: neutral
    X-Virus: no (m2.volny.internal - Tue, 13 Mar 2012 17:58:05 +0100 (CET))
    X-Spam: no (m2.volny.internal - Tue, 13 Mar 2012 17:58:07 +0100 (CET))
    X-Received-Date: Tue, 13 Mar 2012 17:58:08 +0100 (CET)

    --===============1165280172==
    Content-Type: text/plain; charset="utf-8"
    MIME-Version: 1.0
    Content-Transfer-Encoding: base64

    w7pwxJtsIMSPw6FiZWxza8OpIMOzZHk=

    --===============1165280172==--

    --------------------------------------------------------------

    SEZNAM:

    Return-Path: <Michal.Sladek@seznam.cz>
    Received: from smtp2.seznam.cz (smtp2.seznam.cz [77.75.76.43])
    by mx4.volny.cz (Postfix) with ESMTP id 542A32E0868
    for <msladek@volny.cz>; Tue, 13 Mar 2012 18:00:05 +0100 (CET)
    DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=beta; d=seznam.cz;
    h=Received:Content-Type:MIME-Version:From:To:Subject:--===============1097187749==:MIME-Version:Content-Transfer-Encoding:X-Smtpd:X-Seznam-User:X-Session:X-Country:X-Virus-Info:X-Seznam-SPF:X-Seznam-DomainKeys;
    b=bfwTOSoFJU7vGbB7VvXNIQzhbsj+pDPhwr72BX1aVWAicyK0Cix3evz6c3+srYBba
    lHDeYd74ZXW5553N6ocfy68pRxpI6K5dKfvcKKLgUN7+N/iQOUtj09D4wN81cjPt7qQ
    uH5rjcdsDsbZV31EsxyS1P/rn6F7bYOxrpPeHAk=
    Received: from dvr.ph.sladkovi.eu (ip-84-42-150-218.net.upcbroadband.cz [84.42.150.218]) by email-relay1.ng.seznam.cz (Seznam SMTPD 1.2.15-6@18976) with ESMTP; Tue, 13 Mar 2012 17:59:32 +0100 (CET)
    Content-Type: multipart/mixed; boundary="===============1097187749=="
    MIME-Version: 1.0
    From: Michal.Sladek@seznam.cz
    To: msladek@volny.cz
    Subject: =?utf-8?b?xb5sdcWlb3XEjWvDvSBrxa/FiA==?=
    X-DKIM-Status: fail
    X-Virus: no (m2.volny.internal - Tue, 13 Mar 2012 18:00:06 +0100 (CET))
    X-Spam: no (m2.volny.internal - Tue, 13 Mar 2012 18:00:08 +0100 (CET))
    X-Received-Date: Tue, 13 Mar 2012 18:00:08 +0100 (CET)

    --===============1097187749==:Content-Type: text/plain; charset="utf-8"
    MIME-Version: 1.0
    Content-Transfer-Encoding: base64
    X-Smtpd: 1.2.15-6@18976
    X-Seznam-User: michal.sladek@seznam.cz
    X-Session: 5
    X-Country: CZ
    X-Virus-Info:clean
    X-Seznam-SPF:neutral
    X-Seznam-DomainKeys:unknown

    w7pwxJtsIMSPw6FiZWxza8OpIMOzZHk=

    --===============1097187749==--

    --------------------------------------------------------------

    As you can see, Seznam inserts a lot of headers into the message body. Anyway, building the UTF-8 subject manually like this:

        base64Subject = base64.b64encode(subject.encode('utf-8')).decode()
        encodedSubject = '=?UTF-8?B?{0}?='.format(base64Subject)
        message.add_header('Subject',encodedSubject)

    works correctly for both SMTP servers. So there must be a difference...

    @bitdancer
    Member

    It makes no sense that changing how Subject is generated would affect the later formatting of the mime header. There is no coupling that I'm aware of in the code.

    I notice that your handcrafted version uses uppercase for the charset and CTE code. Can you try using lowercase like the email module does, and see if that reproduces the problem?

    @msladek
    Mannequin Author

    msladek mannequin commented Mar 14, 2012

    Changing the code to:
    encodedSubject = '=?utf-8?b?{0}?='.format(base64Subject)
    still works properly with the smtp.seznam.cz server.
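The hand-built encoded word and the one produced by `email.header.Header` carry the same base64 payload; only the case of the charset and encoding letters differs, and RFC 2047 treats both as case-independent. A small local sketch (no SMTP server involved) comparing the three variants, using the subject from the report:

```python
import base64
from email.header import Header, decode_header

subject = 'žluťoučký kůň'

# Hand-built RFC 2047 encoded words, as in the reporter's workaround.
b64 = base64.b64encode(subject.encode('utf-8')).decode('ascii')
manual_upper = '=?UTF-8?B?{0}?='.format(b64)
manual_lower = '=?utf-8?b?{0}?='.format(b64)

# The encoded word email.header.Header produces for the same text.
from_header = Header(subject, 'utf-8').encode()


def decode(encoded):
    # decode_header() yields (bytes, charset) pairs for encoded words.
    return ''.join(
        part.decode(charset or 'ascii') if isinstance(part, bytes) else part
        for part, charset in decode_header(encoded))


print(from_header)   # =?utf-8?b?xb5sdcWlb3XEjWvDvSBrxa/FiA==?=
print(manual_upper)  # =?UTF-8?B?xb5sdcWlb3XEjWvDvSBrxa/FiA==?=
print(decode(from_header) == decode(manual_upper) == subject)  # True
```

This only shows that the Subject values are equivalent; by itself it does not explain why the MIME part headers were mangled.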

    @bitdancer
    Member

    I think the next thing to do would be to replace the call to send_message with code that calls BytesGenerator to write the message out to disk, and diff the output of the two versions (normal subject and hand-encoded subject). Maybe that will give us a clue.
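A minimal sketch of that comparison. It assumes a fixed boundary (`BOUNDARY`) so the two outputs line up, reuses the addresses and text from the report, and serializes with `BytesGenerator` and `linesep='\r\n'`, roughly what `smtplib`'s `send_message` does. On an interpreter that includes the fix, the only expected difference is the case of the Subject's encoded word; on an affected 3.2 release the Header-object version should also show a different line ending on the Subject line.

```python
import base64
import difflib
import io
from email.generator import BytesGenerator
from email.header import Header
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

SUBJECT = 'žluťoučký kůň'
BODY = 'úpěl ďábelské ódy'


def build(subject_value):
    # Fixed boundary (an assumption of this sketch) so the outputs line up.
    message = MIMEMultipart(boundary='BOUNDARY')
    message['From'] = 'rzrobot@seznam.cz'
    message['To'] = 'msladek@volny.cz'
    message['Subject'] = subject_value
    message.attach(MIMEText(BODY, 'plain', 'utf-8'))
    return message


def to_bytes(message):
    # Flatten with CRLF line endings, as send_message requests.
    buf = io.BytesIO()
    BytesGenerator(buf).flatten(message, linesep='\r\n')
    return buf.getvalue()


manual = '=?UTF-8?B?{0}?='.format(
    base64.b64encode(SUBJECT.encode('utf-8')).decode())
with_header = to_bytes(build(Header(SUBJECT, 'utf-8')))
with_manual = to_bytes(build(manual))

diff = list(difflib.unified_diff(
    with_header.decode('ascii').splitlines(keepends=True),
    with_manual.decode('ascii').splitlines(keepends=True),
    fromfile='Header object', tofile='hand-encoded'))
print(''.join(diff))
```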

    @loewis
    Mannequin

    loewis mannequin commented Mar 14, 2012

    I dug a little further. The data being sent is:

    'Content-Type: multipart/mixed; boundary="===============1981330074035035012=="\r\nMIME-Version: 1.0\r\nFrom: rzrobot@seznam.cz\r\nTo: msladek@volny.cz\r\nSubject: =?utf-8?b?xb5sdcWlb3XEjWvDvSBrxa/FiA==?=\n\r\n--===============1981330074035035012==\r\nContent-Type: text/plain; charset="utf-8"\r\nMIME-Version: 1.0\r\nContent-Transfer-Encoding: base64\r\n\r\nw7pwxJtsIMSPw6FiZWxza8OpIMOzZHk=\n\r\n--===============1981330074035035012==--'

    As you can see, there is a bare \n (without \r) after the subject (and after every base64 block), which might confuse Seznam.
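One way to spot such bare line feeds in serialized output is to search for LF bytes not preceded by CR. A minimal sketch; the sample is a shortened, hypothetical reconstruction of the start of the quoted data, with a placeholder boundary.

```python
import re

# An LF byte that is not immediately preceded by a CR byte.
BARE_LF = re.compile(rb'(?<!\r)\n')


def bare_lf_offsets(data):
    """Return the byte offsets of every bare LF in a serialized message."""
    return [m.start() for m in BARE_LF.finditer(data)]


sample = (b'Subject: =?utf-8?b?xb5sdcWlb3XEjWvDvSBrxa/FiA==?=\n'
          b'\r\n--BOUNDARY\r\n')
print(bare_lf_offsets(sample))  # [49]
```

Running the same check over the bytes produced for `send_message` would point straight at the Subject line in the failing case.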

    @loewis
    Mannequin

    loewis mannequin commented Mar 14, 2012

    I also attach a stand-alone version. To run it locally, first start a local SMTP server with:

    smtpdX.Y.py -dn localhost:2525

    @bitdancer
    Member

    OK, got it. When I created BytesGenerator I turned the 'NL' constant into a class attribute, but in the BytesGenerator line that handles Header objects I failed to change NL to self._NL. So when send_message calls flatten with linesep='\r\n', that one place still used \n instead of the requested linesep.

    I've got a patch which I will commit shortly.
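The pattern described can be illustrated with two simplified, hypothetical functions (they are not the actual `email.generator` code): one ignores the requested `linesep` for `Header` values and always terminates with a module-level `NL`, the other passes `linesep` through to `Header.encode()` and uses it as the terminator.

```python
from email.header import Header

NL = '\n'  # stand-in for the generator's module-level line-ending constant


def write_header_buggy(name, value, linesep):
    # Hypothetical simplification of the faulty path: linesep is ignored, so
    # Header.encode() folds with its default '\n' and the line ends with NL.
    return '{}: {}{}'.format(name, value.encode(), NL)


def write_header_fixed(name, value, linesep):
    # The corrected pattern: honor linesep for folding and termination.
    return '{}: {}{}'.format(name, value.encode(linesep=linesep), linesep)


subject = Header('žluťoučký kůň', 'utf-8')
print(repr(write_header_buggy('Subject', subject, '\r\n')))
# 'Subject: =?utf-8?b?xb5sdcWlb3XEjWvDvSBrxa/FiA==?=\n'
print(repr(write_header_fixed('Subject', subject, '\r\n')))
# 'Subject: =?utf-8?b?xb5sdcWlb3XEjWvDvSBrxa/FiA==?=\r\n'
```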

    @bitdancer bitdancer added the stdlib Python modules in the Lib dir label Mar 14, 2012
    @bitdancer bitdancer self-assigned this Mar 14, 2012
    @bitdancer bitdancer added the type-bug An unexpected behavior, bug, or error label Mar 14, 2012
    @python-dev
    Mannequin

    python-dev mannequin commented Mar 14, 2012

    New changeset d0bf40ff20ef by R David Murray in branch '3.2':
    bpo-14062: fix BytesParser handling of linesep for Header objects
    http://hg.python.org/cpython/rev/d0bf40ff20ef

    New changeset 7617f3071320 by R David Murray in branch 'default':
    bpo-14062: fix BytesParser handling of Header objects
    http://hg.python.org/cpython/rev/7617f3071320

    @bitdancer
    Member

    Thanks for the bug report. I thought we had tests for serializing messages containing Header objects with BytesGenerator, but clearly we didn't.

    And thanks Tatiana and Martin for issue review and testing.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022