punycode codec raises IndexError in decode_generalized_number() #74751

Closed
VikramHegde mannequin opened this issue Jun 4, 2017 · 9 comments
Labels
3.7 (EOL) end of life 3.8 (EOL) end of life 3.9 only security fixes stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

@VikramHegde
Mannequin

VikramHegde mannequin commented Jun 4, 2017

BPO 30566
Nosy @bitdancer, @berkerpeksag, @vikhegde, @miss-islington
PRs
  • bpo-30566: Fix IndexError in the punycode codec #1986
  • bpo-30566: Fix IndexError when using punycode codec #18632
  • [3.8] bpo-30566: Fix IndexError when using punycode codec (GH-18632) #18651
  • [3.7] bpo-30566: Fix IndexError when using punycode codec (GH-18632) #18652
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = <Date 2020-02-25.04:10:48.128>
    created_at = <Date 2017-06-04.15:49:54.222>
    labels = ['3.7', '3.8', 'type-bug', 'library', '3.9']
    title = 'punycode codec raises IndexError in decode_generalized_number()'
    updated_at = <Date 2020-02-25.04:10:48.125>
    user = 'https://bugs.python.org/VikramHegde'

    bugs.python.org fields:

    activity = <Date 2020-02-25.04:10:48.125>
    actor = 'berker.peksag'
    assignee = 'none'
    closed = True
    closed_date = <Date 2020-02-25.04:10:48.128>
    closer = 'berker.peksag'
    components = ['Library (Lib)']
    creation = <Date 2017-06-04.15:49:54.222>
    creator = 'Vikram Hegde'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 30566
    keywords = ['patch']
    message_count = 9.0
    messages = ['295127', '295149', '295270', '295278', '302070', '362613', '362617', '362618', '362623']
    nosy_count = 5.0
    nosy_names = ['r.david.murray', 'berker.peksag', 'Vikram Hegde', 'vikhegde', 'miss-islington']
    pr_nums = ['1986', '18632', '18651', '18652']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue30566'
    versions = ['Python 3.7', 'Python 3.8', 'Python 3.9']


    Here is the relevant code snippet from decode_generalized_number() in punycode.py

    try:
        char = ord(extended[extpos])
    except IndexError:
        if errors == "strict":
            raise UnicodeError("incomplete punicode string")
        return extpos + 1, None
    extpos += 1
    if 0x41 <= char <= 0x5A: # A-Z
        digit = char - 0x41
    elif 0x30 <= char <= 0x39:
        digit = char - 22 # 0x30-26
    elif errors == "strict":
        raise UnicodeError("Invalid extended code point '%s'"
                           % extended[extpos])
    

    The UnicodeError() raised in the last line above formats extended[extpos], but extpos was already incremented by 1 a few lines earlier. This causes two errors:

    1. The UnicodeError() message shows the wrong character (the one after the invalid character).
    2. If the invalid character is the last one in the string, extended[extpos] is out of range and raises an IndexError instead of the UnicodeError.
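
The off-by-one described above can be illustrated with a small standalone sketch. The helper name `decode_digit` and its return shape are hypothetical and are not part of punycode.py; the sketch only isolates the digit lookup and shows one way to avoid the problem, namely indexing with `extpos - 1` when building the error message. It is an illustration under those assumptions, not necessarily the patch that was merged.

```python
def decode_digit(extended, extpos, errors="strict"):
    """Hypothetical helper: decode one punycode digit at extended[extpos].

    Returns (next_extpos, digit). digit is None for an invalid character
    when errors is not "strict".
    """
    char = ord(extended[extpos])
    extpos += 1  # extpos now points one past the character just read
    if 0x41 <= char <= 0x5A:  # A-Z map to 0..25
        return extpos, char - 0x41
    if 0x30 <= char <= 0x39:  # 0-9 map to 26..35
        return extpos, char - 22
    if errors == "strict":
        # Use extpos - 1 so the message names the offending character and
        # the lookup cannot run past the end of the string.
        raise UnicodeError("Invalid extended code point '%s'"
                           % extended[extpos - 1])
    return extpos, None
```

With this indexing, an invalid final character such as the `&` in `"W&"` produces a UnicodeError that names `&`, rather than an IndexError.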

    @VikramHegde VikramHegde mannequin added type-crash A hard crash of the interpreter, possibly with a core dump stdlib Python modules in the Lib dir labels Jun 4, 2017
    @bitdancer
    Member

    Can you provide a reproducer, please?

    @VikramHegde
    Mannequin Author

    VikramHegde mannequin commented Jun 6, 2017

    I have a patch for this problem but my contributor agreement has not been accepted yet, so I can't do a pull request.

    Use the python package tldextract to trigger the bug. Here is a sample

    Python 3.6.0 |Anaconda 4.3.1 (64-bit)| (default, Dec 23 2016, 12:22:00) 
    [GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import tldextract
    >>> tldextract.extract("xn--w&")
    Traceback (most recent call last):
      File "/home/vikram-work/anaconda3/envs/pefeatextract-debug/lib/python3.6/encodings/punycode.py", line 207, in decode
        res = punycode_decode(input, errors)
      File "/home/vikram-work/anaconda3/envs/pefeatextract-debug/lib/python3.6/encodings/punycode.py", line 194, in punycode_decode
        return insertion_sort(base, extended, errors)
      File "/home/vikram-work/anaconda3/envs/pefeatextract-debug/lib/python3.6/encodings/punycode.py", line 165, in insertion_sort
        bias, errors)
      File "/home/vikram-work/anaconda3/envs/pefeatextract-debug/lib/python3.6/encodings/punycode.py", line 146, in decode_generalized_number
        % extended[extpos])
    IndexError: string index out of range
    
    The above exception was the direct cause of the following exception:
    
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/home/vikram-work/anaconda3/envs/pefeatextract-debug/lib/python3.6/site-packages/tldextract/tldextract.py", line 358, in extract
        return TLD_EXTRACTOR(url)
      File "/home/vikram-work/anaconda3/envs/pefeatextract-debug/lib/python3.6/site-packages/tldextract/tldextract.py", line 237, in __call__
        translations = [decode_punycode(label).lower() for label in labels]
      File "/home/vikram-work/anaconda3/envs/pefeatextract-debug/lib/python3.6/site-packages/tldextract/tldextract.py", line 237, in <listcomp>
        translations = [decode_punycode(label).lower() for label in labels]
      File "/home/vikram-work/anaconda3/envs/pefeatextract-debug/lib/python3.6/site-packages/tldextract/tldextract.py", line 232, in decode_punycode
        return idna.decode(label.encode('ascii'))
      File "/home/vikram-work/anaconda3/envs/pefeatextract-debug/lib/python3.6/site-packages/idna/core.py", line 384, in decode
        result.append(ulabel(label))
      File "/home/vikram-work/anaconda3/envs/pefeatextract-debug/lib/python3.6/site-packages/idna/core.py", line 302, in ulabel
        label = label.decode('punycode')
    IndexError: decoding with 'punycode' codec failed (IndexError: string index out of range)
    >>>

    @bitdancer
    Member

    You don't need an external package; just decoding b'xn--w&' with punycode will produce the traceback.
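
A minimal reproducer along those lines, using only the standard library, might look like the sketch below. The helper name `classify_decode` is made up for illustration. Which branch is taken depends on the interpreter: versions with the bug report an IndexError, while versions that include the fix are expected to raise a UnicodeError instead.

```python
def classify_decode(data):
    """Hypothetical helper: decode bytes with the punycode codec and
    report which kind of exception, if any, was raised."""
    try:
        return "ok", data.decode("punycode")
    except UnicodeError as exc:
        # Expected on interpreters that include the fix.
        return "UnicodeError", str(exc)
    except IndexError as exc:
        # Seen on interpreters that still have the bug.
        return "IndexError", str(exc)


print(classify_decode(b"xn--w&"))
```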

    @vikhegde
    Mannequin

    vikhegde mannequin commented Sep 13, 2017

    Could someone please review my PR? It has been pending for over three months.

    @serhiy-storchaka serhiy-storchaka added type-bug An unexpected behavior, bug, or error and removed type-crash A hard crash of the interpreter, possibly with a core dump labels Jul 11, 2018
    @berkerpeksag
    Member

    New changeset ba22e8f by Berker Peksag in branch 'master':
    bpo-30566: Fix IndexError when using punycode codec (GH-18632)
    ba22e8f

    @berkerpeksag
    Member

    New changeset daef21c by Miss Islington (bot) in branch '3.8':
    bpo-30566: Fix IndexError when using punycode codec (GH-18632)
    daef21c

    @berkerpeksag
    Member

    New changeset 55be9a6 by Miss Islington (bot) in branch '3.7':
    bpo-30566: Fix IndexError when using punycode codec (GH-18632)
    55be9a6

    @berkerpeksag
    Member

    Thanks for the report and for the initial patch!

    @berkerpeksag berkerpeksag added 3.7 (EOL) end of life 3.8 (EOL) end of life 3.9 only security fixes labels Feb 25, 2020
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022