New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
email.Header no encoding of unicode strings containing newlines #66856
Comments
When trying to encode an email header with a newline in it, correct encoding is done only for strings and not for unicode strings. The attached patch should fix the problem. Simple example to reproduce the problem:
>>> from email.Header import Header as H
# correctly encoded
>>> H('two\r\nlines', 'utf-8').encode()
'=?utf-8?q?two=0D=0Alines?='
# unicode string not encoded
>>> H(u'two\r\nlines', 'utf-8').encode()
'two\r\nlines'
# unicode string with non ascii chars, correctly encoded
>>> H(u'two\r\nlines and \xe0', 'utf-8').encode()
'=?utf-8?b?dHdvDQpsaW5lcyBhbmQgw6A=?=' |
any news? |
I'd have to double check, but I think having /r /n etc encoded in an encopded string is illegal per the rfcs. It should be, anyway. So IMO the bug is encoding them at all, but at this point we probably can't fix it for bacward compatibility reasons. I'm leaving this issue open for the moment because I do want to check the rfc, and also double check what the new API does in this situation (and make sure there are tests). |
Hi, and thank you for your answer. However this is not strictly related to the newline, but also to some small idiosyncrasies and different behavior among py2 and py3 (and even in py2 using Header() or Charset()): # py2.7, non-unicode str
>>> H('test', 'utf-8').encode()
'=?utf-8?q?test?='
>>> Charset('utf-8').header_encode('test')
'=?utf-8?q?test?='
# py2.7, unicode str
>>> H(u'test', 'utf-8').encode() # this is the only different result
'test'
>>> Charset('utf-8').header_encode(u'test')
u'=?utf-8?q?test?='
# py3.4, unicode
>>> H('test', 'utf-8').encode()
'=?utf-8?q?test?='
# py3.4, bytes
>>> H(b'test', 'utf-8').encode()
'=?utf-8?q?test?=' As you can see, the only when using unicode strings in py2.7 no header encoding is done if the unicode string contains only ascii chars. |
Python 2.7 is no longer supported. |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: