UTF-8 Email Subject problem #58270
Hello! I think there is a problem when adding a UTF-8 subject to an email message. I wrote the following function (its code is based on examples I found in the official docs), which should send an email with a UTF-8 subject, a UTF-8 plain text body and an attached file when all arguments are given. Here is the code:

def sendMail (fromAddr, toAddr, subject, body = '', attachment = ''):
    message = email.mime.multipart.MIMEMultipart()
    message.add_header('From',fromAddr)
    message.add_header('To',toAddr)
    if (body != ''):
        msgPart = email.mime.text.MIMEText(body,'plain','utf-8')
        message.attach(msgPart)
    if (attachment != ''):
        if os.path.exists(attachment) == True:
            filename = attachment.rpartition(os.sep)[2]
            fp = open(attachment,'rb')
            msgPart = email.mime.base.MIMEBase('application','octet-stream')
            msgPart.set_payload(fp.read())
            fp.close()
            email.encoders.encode_base64(msgPart)
            msgPart.add_header('Content-Disposition','attachment',filename=filename)
            message.attach(msgPart)
    if smtpPort == 25:
        smtpCon = smtplib.SMTP(smtpSrv,smtpPort)
    else:
        smtpCon = smtplib.SMTP_SSL(smtpSrv,smtpPort)
    if (smtpUser != '') and (smtpPass != ''):
        smtpCon.login(smtpUser,smtpPass)
    smtpCon.send_message(message,mail_options=['UTF8SMTP','8BITMIME'])
    smtpCon.quit()

Running the function with the following arguments:

sendMail('rzrobot@seznam.cz','msladek@volny.cz','žluťoučký kůň','úpěl ďábelské ódy')

produces the following output on the receiving side:

Return-Path: <rzrobot@seznam.cz>
--===============1029508565==:Content-Type: text/plain; charset="utf-8"
w7pwxJtsIMSPw6FiZWxza8OpIMOzZHk=
--===============1029508565==--

Although no attachment argument was given, the client says that the message has an attachment of unknown type and that the message does not contain any text at all. Note that the message part header :Content-Type: text/plain; charset="utf-8" is part of the message part boundary instead of being inside the message part. When I change the code of the function to generate the subject manually and add it via add_header like this:

def sendMail (fromAddr, toAddr, subject, body = '', attachment = ''):
    message = email.mime.multipart.MIMEMultipart()
    message.add_header('From',fromAddr)
    message.add_header('To',toAddr)
    base64Subject = base64.b64encode(subject.encode('utf-8')).decode()
    encodedSubject = '=?UTF-8?B?{0}?='.format(base64Subject)
    message.add_header('Subject',encodedSubject)
    if (body != ''):
        msgPart = email.mime.text.MIMEText(body,'plain','utf-8')
        message.attach(msgPart)
    if (attachment != ''):
        if os.path.exists(attachment) == True:
            filename = attachment.rpartition(os.sep)[2]
            fp = open(attachment,'rb')
            msgPart = email.mime.base.MIMEBase('application','octet-stream')
            msgPart.set_payload(fp.read())
            fp.close()
            email.encoders.encode_base64(msgPart)
            msgPart.add_header('Content-Disposition','attachment',filename=filename)
            message.attach(msgPart)
    if smtpPort == 25:
        smtpCon = smtplib.SMTP(smtpSrv,smtpPort)
    else:
        smtpCon = smtplib.SMTP_SSL(smtpSrv,smtpPort)
    if (smtpUser != '') and (smtpPass != ''):
        smtpCon.login(smtpUser,smtpPass)
    smtpCon.send_message(message,mail_options=['UTF8SMTP','8BITMIME'])
    smtpCon.quit()

Then everything is OK on the receiving side; both the subject and the plain text body are visible:

Return-Path: <rzrobot@seznam.cz>
--===============1044203895==
w7pwxJtsIMSPw6FiZWxza8OpIMOzZHk=
--===============1044203895==--

I am not a programmer, so I might be overlooking some obvious mistake in my code, but for now I think it's a bug.
Hi msladek! I tried to reproduce your bug using Python 3.2.2 on Mac OS X but could not; everything worked fine. I used Gmail both to send and to receive the message, over SSL. As I'm neither an SMTP nor an email expert, I asked r.david.murray to review the source of the received message, and it looks fine. Could you provide a smaller example of code that causes the same problem? I just extracted your code to help other people trying to reproduce the bug. It is attached.
I tested the code again. Using the Gmail SMTP server produces correct results; using the server smtp.seznam.cz leads to the problem (I should mention here that Seznam is the largest free mail provider in the Czech Republic). Here are the differences on the receiving side.

GMAIL:
Return-Path: <michal@sladkovi.eu>
--===============1165280172==
w7pwxJtsIMSPw6FiZWxza8OpIMOzZHk=
--===============1165280172==--
--------------------------------------------------------------
SEZNAM:
Return-Path: <Michal.Sladek@seznam.cz>
--===============1097187749==:Content-Type: text/plain; charset="utf-8"
w7pwxJtsIMSPw6FiZWxza8OpIMOzZHk=
--===============1097187749==--
--------------------------------------------------------------

As you can see, Seznam is adding a lot of headers into the mail's body. Anyway, building the UTF-8 subject manually like this:

base64Subject = base64.b64encode(subject.encode('utf-8')).decode()
encodedSubject = '=?UTF-8?B?{0}?='.format(base64Subject)
message.add_header('Subject',encodedSubject)

works correctly for both SMTP servers. So there must be a difference...
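For readers comparing the two approaches, here is a minimal sketch (standard library only) that builds the subject both ways and decodes each back to text. The helper name `decoded` is an illustrative assumption, not something used in this thread.

```python
import base64
from email.header import Header, decode_header

subject = "žluťoučký kůň"

# Hand-built RFC 2047 encoded-word, as in the workaround above.
b64 = base64.b64encode(subject.encode("utf-8")).decode("ascii")
manual = "=?UTF-8?B?{0}?=".format(b64)

# The same subject encoded by the email package.
via_header = Header(subject, "utf-8").encode()


def decoded(value):
    """Decode an encoded-word header value back to text (illustrative helper)."""
    return "".join(
        part.decode(charset or "ascii") if isinstance(part, bytes) else part
        for part, charset in decode_header(value)
    )


print(manual)
print(via_header)
print(decoded(manual) == decoded(via_header) == subject)
```

Both values should decode to the same text, which is consistent with the observation that the encoded subject itself is not where the two variants differ.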
It makes no sense that changing how Subject is generated would affect the later formatting of the MIME header. There is no coupling that I'm aware of in the code. I notice that your handcrafted version uses uppercase for the charset and CTE code. Can you try using lowercase like the email module does, and see if that reproduces the problem?
Changing code to:
I think the next thing to do would be to replace the call to send_message with code that calls BytesGenerator to write the message out to disk, and diff the output of the two versions (normal subject and hand-encoded subject). Maybe that will give us a clue.
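The comparison suggested above could be sketched roughly as follows. This is an illustration under stated assumptions, not the script used in this thread: the addresses, the fixed boundary string (chosen so the normally random boundary does not show up as a difference), and the helper names `build`, `flatten`, `manual_subject` and `differing_lines` are all hypothetical. It writes to an in-memory buffer rather than to disk.

```python
import base64
import io
from email.generator import BytesGenerator
from email.header import Header
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

SUBJECT = "žluťoučký kůň"


def manual_subject(text):
    """Hand-encode a subject as a single RFC 2047 encoded-word."""
    b64 = base64.b64encode(text.encode("utf-8")).decode("ascii")
    return "=?UTF-8?B?{0}?=".format(b64)


def build(subject_value):
    """Build a small multipart message; a fixed boundary keeps runs comparable."""
    msg = MIMEMultipart(boundary="BOUNDARY")
    msg["From"] = "sender@example.com"
    msg["To"] = "rcpt@example.com"
    msg["Subject"] = subject_value
    msg.attach(MIMEText("úpěl ďábelské ódy", "plain", "utf-8"))
    return msg


def flatten(msg, linesep="\r\n"):
    """Serialize the message to bytes with the given line separator."""
    buf = io.BytesIO()
    BytesGenerator(buf).flatten(msg, linesep=linesep)
    return buf.getvalue()


def differing_lines(a, b):
    """Pair up lines and keep those that differ (assumes equal line counts)."""
    pairs = zip(a.splitlines(keepends=True), b.splitlines(keepends=True))
    return [(x, y) for x, y in pairs if x != y]


with_header = flatten(build(Header(SUBJECT, "utf-8")))
hand_made = flatten(build(manual_subject(SUBJECT)))
for left, right in differing_lines(with_header, hand_made):
    print(left, right)
```

Printing the raw bytes (rather than decoded text) keeps the line endings visible, which matters when the suspected difference is in the separators.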
I dug a little bit further. The data being sent is

'Content-Type: multipart/mixed; boundary="===============1981330074035035012=="\r\nMIME-Version: 1.0\r\nFrom: rzrobot@seznam.cz\r\nTo: msladek@volny.cz\r\nSubject: =?utf-8?b?xb5sdcWlb3XEjWvDvSBrxa/FiA==?=\n\r\n--===============1981330074035035012==\r\nContent-Type: text/plain; charset="utf-8"\r\nMIME-Version: 1.0\r\nContent-Transfer-Encoding: base64\r\n\r\nw7pwxJtsIMSPw6FiZWxza8OpIMOzZHk=\n\r\n--===============1981330074035035012==--'

As you notice, there is a plain \n (without \r) after the subject (and in all other places with base64), which might confuse Seznam.
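One way to check captured data for this kind of stray line feed is sketched below with a regular expression. The helper name `bare_lf_offsets` is hypothetical, and the sample bytes are a shortened excerpt of the data quoted above.

```python
import re

# Matches a line feed that is not immediately preceded by a carriage return.
_BARE_LF = re.compile(rb"(?<!\r)\n")


def bare_lf_offsets(data):
    """Return the byte offsets of every LF that is not part of a CRLF pair."""
    return [m.start() for m in _BARE_LF.finditer(data)]


sample = (
    b"To: msladek@volny.cz\r\n"
    b"Subject: =?utf-8?b?xb5sdcWlb3XEjWvDvSBrxa/FiA==?=\n"
    b"\r\n"
)

for offset in bare_lf_offsets(sample):
    # Show a little context before each offending byte.
    print(offset, sample[max(0, offset - 20):offset])
```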
I also attach a stand-alone version. To run this locally, run smtpdX.Y.py -dn localhost:2525
OK, got it. When I created BytesGenerator I turned the 'NL' constant into a class attribute, but in the line that handles Header objects in BytesGenerator I failed to change NL to self._NL. So when send_message calls flatten with linesep='\r\n', in that one place it was using \n instead of the correct linesep. I've got a patch which I will commit shortly.
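The class of mistake described above can be illustrated with a toy model. This is not the real email package code: `ToyHeader`, `ToyGenerator` and the `buggy` flag are hypothetical names invented for this sketch, which only mimics one branch still using a module-level constant after the separator became a per-call attribute.

```python
NL = "\n"  # module-level default line separator


class ToyHeader:
    """Hypothetical stand-in for a Header-like object."""

    def __init__(self, text):
        self.text = text

    def encode(self):
        return self.text


class ToyGenerator:
    """Hypothetical stand-in; this is not the real email.generator code."""

    def __init__(self, buggy):
        self._NL = NL
        self._buggy = buggy

    def flatten(self, headers, linesep=NL):
        self._NL = linesep  # the separator requested for this call
        out = []
        for name, value in headers:
            if isinstance(value, ToyHeader):
                # Branch for Header-like objects: the buggy variant still
                # uses the module constant instead of the per-call value.
                end = NL if self._buggy else self._NL
                out.append(f"{name}: {value.encode()}{end}")
            else:
                out.append(f"{name}: {value}{self._NL}")
        out.append(self._NL)  # blank line that ends the header block
        return "".join(out)


headers = [("From", "a@example.com"), ("Subject", ToyHeader("x"))]
print(repr(ToyGenerator(buggy=True).flatten(headers, linesep="\r\n")))
print(repr(ToyGenerator(buggy=False).flatten(headers, linesep="\r\n")))
```

In the buggy variant only the header carried by the Header-like object ends in a bare \n, which matches the shape of the data quoted earlier in the thread.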
New changeset d0bf40ff20ef by R David Murray in branch '3.2':
New changeset 7617f3071320 by R David Murray in branch 'default':
Thanks for the bug report. I thought we had tests for processing Header objects when serializing a message using BytesGenerator, but clearly we didn't. And thanks Tatiana and Martin for issue review and testing.