email.Header no encoding of unicode strings containing newlines #66856

Closed
flavio mannequin opened this issue Oct 18, 2014 · 5 comments
Labels
topic-email type-bug An unexpected behavior, bug, or error

Comments

@flavio

flavio mannequin commented Oct 18, 2014

BPO 22666
Nosy @warsaw, @bitdancer, @serhiy-storchaka
Files
  • fix_email_header_encoding_uses_ascii_before_selected_charset.diff

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    assignee = None
    closed_at = <Date 2020-05-31.14:51:52.710>
    created_at = <Date 2014-10-18.13:08:05.321>
    labels = ['type-bug', 'expert-email']
    title = 'email.Header no encoding of unicode strings containing newlines'
    updated_at = <Date 2020-05-31.14:51:52.709>
    user = 'https://bugs.python.org/flavio'

    bugs.python.org fields:

    activity = <Date 2020-05-31.14:51:52.709>
    actor = 'serhiy.storchaka'
    assignee = 'none'
    closed = True
    closed_date = <Date 2020-05-31.14:51:52.710>
    closer = 'serhiy.storchaka'
    components = ['email']
    creation = <Date 2014-10-18.13:08:05.321>
    creator = 'flavio'
    dependencies = []
    files = ['36959']
    hgrepos = []
    issue_num = 22666
    keywords = ['patch']
    message_count = 5.0
    messages = ['229640', '231714', '231773', '231776', '370474']
    nosy_count = 4.0
    nosy_names = ['barry', 'r.david.murray', 'serhiy.storchaka', 'flavio']
    pr_nums = []
    priority = 'normal'
    resolution = 'out of date'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue22666'
    versions = ['Python 2.7']

    When encoding an email header that contains a newline, the encoding is applied only to byte strings (str) and not to unicode strings.
    In fact, a unicode string is encoded only if it contains a non-ASCII character.

    The attached patch should fix the problem.

    Simple example to reproduce the problem:
    >>> from email.Header import Header as H
    
    # correctly encoded
    >>> H('two\r\nlines', 'utf-8').encode()
    '=?utf-8?q?two=0D=0Alines?='
    
    # unicode string not encoded
    >>> H(u'two\r\nlines', 'utf-8').encode()
    'two\r\nlines'
    
    # unicode string with non ascii chars, correctly encoded
    >>> H(u'two\r\nlines and \xe0', 'utf-8').encode()
    '=?utf-8?b?dHdvDQpsaW5lcyBhbmQgw6A=?='
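
For readers on Python 3, here is a minimal sketch (not part of the original report) that decodes the two encoded results above with `email.header.decode_header`. It only illustrates that both encoded words carry the raw CR LF in their payload; the variable names are illustrative.

```python
from email.header import decode_header

# Encoded words copied from the session above.
qp_word = '=?utf-8?q?two=0D=0Alines?='
b64_word = '=?utf-8?b?dHdvDQpsaW5lcyBhbmQgw6A=?='

# decode_header returns a list of (payload, charset) pairs;
# for encoded words the payload is bytes.
qp_bytes, qp_charset = decode_header(qp_word)[0]
b64_bytes, b64_charset = decode_header(b64_word)[0]

# repr() makes the embedded \r\n visible.
print(repr(qp_bytes.decode(qp_charset)))
print(repr(b64_bytes.decode(b64_charset)))
```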

    @flavio flavio mannequin added topic-email type-bug An unexpected behavior, bug, or error labels Oct 18, 2014
    @flavio

    flavio mannequin commented Nov 26, 2014

    any news?

    @bitdancer

    I'd have to double check, but I think having \r, \n, etc. encoded in an encoded string is illegal per the RFCs. It should be, anyway. So IMO the bug is encoding them at all, but at this point we probably can't fix it for backward compatibility reasons.

    I'm leaving this issue open for the moment because I do want to check the RFC, and also double check what the new API does in this situation (and make sure there are tests).
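
On the "new API" question, a hedged sketch for Python 3.6 or later (not something checked in this thread): assuming "new API" refers to `email.message.EmailMessage` with its default policy, assigning a header value that contains CR or LF is rejected with a `ValueError` rather than encoded. The `rejected` flag is only there for illustration.

```python
from email.message import EmailMessage

msg = EmailMessage()  # uses email.policy.default unless told otherwise

try:
    # Same value as in the report, assigned through the policy-based API.
    msg['Subject'] = 'two\r\nlines'
    rejected = False
except ValueError as exc:
    rejected = True
    print(exc)

# The header is not stored when the assignment is refused.
print('Subject' in msg)
```

The legacy `Header` class discussed in this issue is a separate code path, so this says nothing about its behavior.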

    @flavio

    flavio mannequin commented Nov 27, 2014

    Hi, and thank you for your answer.

    However, this is not strictly related to the newline; there are also some small idiosyncrasies and differences in behavior between py2 and py3 (and even within py2, between Header() and Charset()):

    # py2.7, non-unicode str (H imported as in the first example)
    >>> from email.charset import Charset
    >>> H('test', 'utf-8').encode()
    '=?utf-8?q?test?='

    >>> Charset('utf-8').header_encode('test')
    '=?utf-8?q?test?='

    # py2.7, unicode str
    >>> H(u'test', 'utf-8').encode()   # this is the only different result
    'test'

    >>> Charset('utf-8').header_encode(u'test')
    u'=?utf-8?q?test?='

    # py3.4, unicode
    >>> from email.header import Header as H
    >>> H('test', 'utf-8').encode()
    '=?utf-8?q?test?='

    # py3.4, bytes
    >>> H(b'test', 'utf-8').encode()
    '=?utf-8?q?test?='

    As you can see, only in py2.7 with unicode strings is no header encoding done when the string contains only ASCII chars.
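
The Python 3 half of the comparison above can be written as a small script. This is a sketch that restates the py3 results quoted in the session and adds the `Charset` call for symmetry; the variable names are illustrative.

```python
from email.charset import Charset
from email.header import Header

# str and bytes input both go through the same encoding path on Python 3.
via_header_str = Header('test', 'utf-8').encode()
via_header_bytes = Header(b'test', 'utf-8').encode()

# Charset.header_encode is the lower-level call used for comparison above.
via_charset = Charset('utf-8').header_encode('test')

print(via_header_str, via_header_bytes, via_charset)
```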

    @serhiy-storchaka

    Python 2.7 is no longer supported.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022