email.errors.HeaderParseError if base64url is used #56698

Open
guettli mannequin opened this issue Jul 4, 2011 · 7 comments
Labels
3.9 (only security fixes), 3.10 (only security fixes), 3.11 (only security fixes), stdlib (Python modules in the Lib dir), topic-email, type-bug (An unexpected behavior, bug, or error)

Comments

@guettli
Mannequin

guettli mannequin commented Jul 4, 2011

BPO 12489
Nosy @warsaw, @amauryfa, @bitdancer, @phmc, @iritkatriel
Files
  • 62b280b61de7.diff
  • 732e7d4515c0.diff
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2011-07-04.14:38:19.681>
    labels = ['type-bug', 'expert-email', '3.10', '3.11', 'library', '3.9']
    title = 'email.errors.HeaderParseError if base64url is used'
    updated_at = <Date 2021-12-13.23:49:55.369>
    user = 'https://bugs.python.org/guettli'

    bugs.python.org fields:

    activity = <Date 2021-12-13.23:49:55.369>
    actor = 'iritkatriel'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['Library (Lib)', 'email']
    creation = <Date 2011-07-04.14:38:19.681>
    creator = 'guettli'
    dependencies = []
    files = ['34943', '34944']
    hgrepos = ['239', '240']
    issue_num = 12489
    keywords = ['patch']
    message_count = 7.0
    messages = ['139776', '139778', '139793', '139855', '216690', '216705', '408500']
    nosy_count = 7.0
    nosy_names = ['barry', 'guettli', 'amaury.forgeotdarc', 'ctheune', 'r.david.murray', 'pconnell', 'iritkatriel']
    pr_nums = []
    priority = 'normal'
    resolution = None
    stage = None
    status = 'open'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue12489'
    versions = ['Python 3.9', 'Python 3.10', 'Python 3.11']

    @guettli
    Mannequin Author

    guettli mannequin commented Jul 4, 2011

    from email.header import decode_header
    decode_header('=?iso-8859-1?B?QW5tZWxkdW5nIE5ldHphbnNjaGx1c3MgU_xkcmluZzNwLmpwZw==?=')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/lib64/python2.6/email/header.py", line 101, in decode_header
        raise HeaderParseError
    email.errors.HeaderParseError

    Thunderbird is able to decode the header:
    "Anmeldung Netzanschluss Südring3p.jpg"

    According to Peter Otten, base64url encoding was used.

    My question in the newsgroup:
    http://groups.google.com/group/comp.lang.python/browse_thread/thread/9cf9ccd4109481cc/9f76bd627676b5f1?show_docid=9f76bd627676b5f1

    @guettli guettli mannequin added the stdlib Python modules in the Lib dir label Jul 4, 2011
    @guettli
    Mannequin Author

    guettli mannequin commented Jul 4, 2011

    This happens on Python3:
    root@ubuntu1004devel64:~# python3
    Python 3.1.2 (r312:79147, Sep 27 2010, 09:57:50) 
    [GCC 4.4.3] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> from email.header import decode_header
    >>> decode_header('=?iso-8859-1?B?QW5tZWxkdW5nIE5ldHphbnNjaGx1c3MgU_xkcmluZzNwLmpwZw==?=')
    Traceback (most recent call last):
      File "/usr/lib/python3.1/email/header.py", line 98, in decode_header
        word = email.base64mime.decode(encoded_string)
      File "/usr/lib/python3.1/email/base64mime.py", line 112, in decode
        return a2b_base64(string.encode('raw-unicode-escape'))
    binascii.Error: Incorrect padding
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/lib/python3.1/email/header.py", line 100, in decode_header
        raise HeaderParseError('Base64 decoding error')
    email.errors.HeaderParseError: Base64 decoding error

    @amauryfa
    Member

    amauryfa commented Jul 4, 2011

    This gives the correct result:
    decode_header('=?iso-8859-1?B?QW5tZWxkdW5nIE5ldHphbnNjaGx1c3MgU/xkcmluZzNwLmpwZw==?=')
    (I replaced _ with /)
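To make the substitution concrete, here is a minimal sketch using only the standard `base64` module. It takes the payload from the header in this report, maps the URL-safe characters back to the standard alphabet, and decodes the result. The variable names are illustrative only.

```python
import base64

# Payload of the encoded word from this report. It contains '_',
# which belongs to the URL-safe alphabet, not the standard one.
payload = "QW5tZWxkdW5nIE5ldHphbnNjaGx1c3MgU_xkcmluZzNwLmpwZw=="

# Map the URL-safe characters ('-', '_') back to the standard ones ('+', '/').
standard = payload.replace("-", "+").replace("_", "/")

# Decode the bytes, then interpret them with the charset named in the header.
decoded = base64.b64decode(standard).decode("iso-8859-1")
print(decoded)
```

The decoded text matches what Thunderbird displays for this header.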

    The header was probably generated by a variant of the base64 encoding, like this one: http://www.doughellmann.com/PyMOTW/base64/#url-safe-variations

    Does this header come from a real message? How was it generated?

    @guettli
    Mannequin Author

    guettli mannequin commented Jul 5, 2011

    I received this email. Here is the creator:

    X-Mailer: CommuniGate Pro MAPI Connector 1.52.53.10/1.53.10.1

    @ctheune
    Mannequin

    ctheune mannequin commented Apr 17, 2014

    So, in addition to "+/" and "-" there are quite a few base64 variants. Worst thing: there are the two ambiguous variants "-" and "_-", even though "_-" supposedly is "non-standard" for its use.

    See http://en.wikipedia.org/wiki/Base64

    The shortest fix I can see would be to not use binascii directly from the email module but go through the base64 module (as pointed out by the blogpost) and call the urlsafe version. That should catch both cases.
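As a rough sketch of that idea (not the actual patch), the snippet below contrasts a direct `binascii.a2b_base64` call with `base64.urlsafe_b64decode` on the payload from this report. `decode_b_payload` is a hypothetical helper name and is not part of the email package.

```python
import base64
import binascii

payload = "QW5tZWxkdW5nIE5ldHphbnNjaGx1c3MgU_xkcmluZzNwLmpwZw=="


def decode_b_payload(encoded):
    """Hypothetical helper: decode a 'B' payload via the base64 module."""
    # urlsafe_b64decode maps '-' and '_' to '+' and '/' before decoding.
    return base64.urlsafe_b64decode(encoded)


# The direct binascii call is what fails in the tracebacks above.
try:
    binascii.a2b_base64(payload.encode("ascii"))
except binascii.Error as exc:
    print("a2b_base64 failed:", exc)

text = decode_b_payload(payload).decode("iso-8859-1")
print(text)
```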

    Preparing a patch right now.

    @bitdancer
    Member

    The patch looks good. I'd like the comment to say "We use urlsafe_b64decode here because some mailers apparently use the urlsafe b64 alphabet, and urlsafe_b64decode will correctly decode both the urlsafe and regular alphabets".
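The claim that `urlsafe_b64decode` handles both alphabets can be checked with a short sketch. The two payloads below differ only in `_` versus `/`. One caveat worth noting: `urlsafe_b64decode` does not validate its input, so characters outside the alphabet are discarded rather than reported.

```python
import base64

urlsafe_payload = "QW5tZWxkdW5nIE5ldHphbnNjaGx1c3MgU_xkcmluZzNwLmpwZw=="
standard_payload = "QW5tZWxkdW5nIE5ldHphbnNjaGx1c3MgU/xkcmluZzNwLmpwZw=="

# Both spellings of the payload go through the same decoder.
from_urlsafe = base64.urlsafe_b64decode(urlsafe_payload)
from_standard = base64.urlsafe_b64decode(standard_payload)

# The regular decoder on the regular alphabet serves as the reference.
reference = base64.b64decode(standard_payload)

print(from_urlsafe == from_standard == reference)
```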

    Also, the new header parser doesn't handle this case either, and furthermore it doesn't handle binascii errors at all (my comment in the code indicates I didn't think it could ever raise there). The fact that it doesn't handle the error at all can be considered a different issue, but it would be nice to add the same decode fix (and a test in test_email/test_headerregistry.py) for the new header parser. Here's one way to reproduce the issue:

    >>> from email import message_from_string as mfs
    >>> from email.policy import default
    >>> m = mfs("From: =?iso-8859-1?B?QW5tZWxkdW5nIE5ldHphbnNjaGx1c3MgU_xkcmluZzNwLmpwZw==?=\n\n", policy=default)
    >>> m['From']
    Traceback (most recent call last):
      File "/home/rdmurray/python/p34/Lib/email/_encoded_words.py", line 109, in decode_b
        return base64.b64decode(padded_encoded, validate=True), defects
      File "/home/rdmurray/python/p34/Lib/base64.py", line 89, in b64decode
        raise binascii.Error('Non-base64 digit found')
    binascii.Error: Non-base64 digit found
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/home/rdmurray/python/p34/Lib/email/message.py", line 391, in __getitem__
        return self.get(name)
      File "/home/rdmurray/python/p34/Lib/email/message.py", line 471, in get
        return self.policy.header_fetch_parse(k, v)
      File "/home/rdmurray/python/p34/Lib/email/policy.py", line 145, in header_fetch_parse
        return self.header_factory(name, ''.join(value.splitlines()))
      File "/home/rdmurray/python/p34/Lib/email/headerregistry.py", line 583, in __call__
        return self[name](name, value)
      File "/home/rdmurray/python/p34/Lib/email/headerregistry.py", line 194, in __new__
        cls.parse(value, kwds)
      File "/home/rdmurray/python/p34/Lib/email/headerregistry.py", line 334, in parse
        kwds['parse_tree'] = address_list = cls.value_parser(value)
      File "/home/rdmurray/python/p34/Lib/email/headerregistry.py", line 325, in value_parser
        address_list, value = parser.get_address_list(value)
      File "/home/rdmurray/python/p34/Lib/email/_header_value_parser.py", line 2313, in get_address_list
        token, value = get_address(value)
      File "/home/rdmurray/python/p34/Lib/email/_header_value_parser.py", line 2290, in get_address
        token, value = get_group(value)
      File "/home/rdmurray/python/p34/Lib/email/_header_value_parser.py", line 2246, in get_group
        token, value = get_display_name(value)
      File "/home/rdmurray/python/p34/Lib/email/_header_value_parser.py", line 2072, in get_display_name
        token, value = get_phrase(value)
      File "/home/rdmurray/python/p34/Lib/email/_header_value_parser.py", line 1747, in get_phrase
        token, value = get_word(value)
      File "/home/rdmurray/python/p34/Lib/email/_header_value_parser.py", line 1728, in get_word
        token, value = get_atom(value)
      File "/home/rdmurray/python/p34/Lib/email/_header_value_parser.py", line 1645, in get_atom
        token, value = get_encoded_word(value)
      File "/home/rdmurray/python/p34/Lib/email/_header_value_parser.py", line 1421, in get_encoded_word
        text, charset, lang, defects = _ew.decode('=?' + tok + '?=')
      File "/home/rdmurray/python/p34/Lib/email/_encoded_words.py", line 166, in decode
        bstring, defects = _cte_decoders[cte](bstring)
      File "/home/rdmurray/python/p34/Lib/email/_encoded_words.py", line 124, in decode_b
        raise AssertionError("unexpected binascii.Error")
    AssertionError: unexpected binascii.Error

    @iritkatriel
    Member

    Reproduced on 3.11:

    >>> from email.header import decode_header
    >>> decode_header('=?iso-8859-1?B?QW5tZWxkdW5nIE5ldHphbnNjaGx1c3MgU_xkcmluZzNwLmpwZw==?=')
    Traceback (most recent call last):
      File "/Users/iritkatriel/src/cpython-1/Lib/email/header.py", line 126, in decode_header
        word = email.base64mime.decode(encoded_string)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/Users/iritkatriel/src/cpython-1/Lib/email/base64mime.py", line 112, in decode
        return a2b_base64(string.encode('raw-unicode-escape'))
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    binascii.Error: Invalid base64-encoded string: number of data characters (49) cannot be 1 more than a multiple of 4
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/Users/iritkatriel/src/cpython-1/Lib/email/header.py", line 128, in decode_header
        raise HeaderParseError('Base64 decoding error')
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    email.errors.HeaderParseError: Base64 decoding error
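While the issue remains open, calling code can work around the failure itself. The following is a minimal sketch under stated assumptions, not a proposed fix for the library: `to_standard_alphabet` and `decode_header_tolerant` are hypothetical names, and the regular expression only targets simple `B`-encoded words like the one in this report.

```python
import re
from email.errors import HeaderParseError
from email.header import decode_header

# Matches '=?charset?B?payload?=' and captures prefix, payload, and suffix.
_B_WORD = re.compile(r"(=\?[^?]+\?[bB]\?)([^?]*)(\?=)")
_URLSAFE_TO_STD = str.maketrans("-_", "+/")


def to_standard_alphabet(header):
    """Rewrite URL-safe base64 characters inside B-encoded words only."""
    return _B_WORD.sub(
        lambda m: m.group(1) + m.group(2).translate(_URLSAFE_TO_STD) + m.group(3),
        header,
    )


def decode_header_tolerant(header):
    """Try the normal decoder first; retry with a translated payload on failure."""
    try:
        return decode_header(header)
    except HeaderParseError:
        return decode_header(to_standard_alphabet(header))


raw = "=?iso-8859-1?B?QW5tZWxkdW5nIE5ldHphbnNjaGx1c3MgU_xkcmluZzNwLmpwZw==?="
data, charset = decode_header_tolerant(raw)[0]
print(data.decode(charset))
```

Retrying only after a `HeaderParseError` keeps headers that already decode correctly on the normal path.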

    @iritkatriel iritkatriel added the 3.9 (only security fixes), 3.10 (only security fixes), 3.11 (only security fixes), and type-bug (An unexpected behavior, bug, or error) labels Dec 13, 2021
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022