Segfault in UTF-7 incremental decoder #64737

Closed
serhiy-storchaka opened this issue Feb 7, 2014 · 10 comments
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) release-blocker topic-unicode type-crash A hard crash of the interpreter, possibly with a core dump

Comments

@serhiy-storchaka
Member

BPO 20538
Nosy @birkenfeld, @ncoghlan, @vstinner, @larryhastings, @ezio-melotti, @serhiy-storchaka
Files
  • issue20538-3.3.patch
  • issue20538-3.4.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = <Date 2014-02-09.09:14:58.506>
    created_at = <Date 2014-02-07.09:32:08.696>
    labels = ['interpreter-core', 'expert-unicode', 'type-crash', 'release-blocker']
    title = 'Segfault in UTF-7 incremental decoder'
    updated_at = <Date 2014-02-09.09:14:58.505>
    user = 'https://github.com/serhiy-storchaka'

    bugs.python.org fields:

    activity = <Date 2014-02-09.09:14:58.505>
    actor = 'larry'
    assignee = 'none'
    closed = True
    closed_date = <Date 2014-02-09.09:14:58.506>
    closer = 'larry'
    components = ['Interpreter Core', 'Unicode']
    creation = <Date 2014-02-07.09:32:08.696>
    creator = 'serhiy.storchaka'
    dependencies = []
    files = ['33962', '33963']
    hgrepos = []
    issue_num = 20538
    keywords = ['patch', 'buildbot', '3.4regression']
    message_count = 10.0
    messages = ['210444', '210472', '210503', '210599', '210612', '210619', '210620', '210725', '210726', '210731']
    nosy_count = 7.0
    nosy_names = ['georg.brandl', 'ncoghlan', 'vstinner', 'larry', 'ezio.melotti', 'python-dev', 'serhiy.storchaka']
    pr_nums = []
    priority = 'release blocker'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'crash'
    url = 'https://bugs.python.org/issue20538'
    versions = ['Python 3.3', 'Python 3.4']

    @serhiy-storchaka
    Member Author

    The UTF-7 incremental decoder can crash in a debug build when it decodes an unfinished base-64 section. In a non-debug build it just produces an inconsistent Unicode string. Minimal examples:

    $ ./python -c "import codecs; codecs.utf_7_decode(b'a+AIA', 'strict')"
    python: Objects/unicodeobject.c:403: _PyUnicode_CheckConsistency: Assertion `maxchar >= 128' failed.
    Aborted (core dumped)
    
    $ ./python -c "import codecs; codecs.utf_7_decode(b'+AIA-+AQA', 'strict')"
    python: Objects/unicodeobject.c:410: _PyUnicode_CheckConsistency: Assertion `maxchar >= 0x100' failed.
    Aborted (core dumped)
    
    $ ./python -c "import codecs; codecs.utf_7_decode(b'+AQA-+2ADcAA', 'strict')"
    python: Objects/unicodeobject.c:414: _PyUnicode_CheckConsistency: Assertion `maxchar >= 0x10000' failed.
    Aborted (core dumped)
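For comparison, the behaviour one would expect from an interpreter that includes a fix can be sketched as follows. This is a minimal sketch, assuming a CPython build where this issue is resolved; the names `CASES` and `decode_partial` are introduced here for illustration only. With `final=False` the decoder should return only the text that precedes the unfinished base-64 section, plus the number of bytes it consumed.

```python
import codecs

# Inputs from the report above. Each one ends inside an unfinished
# base-64 section, so only the text before the last '+' is returned.
# (input bytes, expected text, expected number of consumed bytes)
CASES = [
    (b"a+AIA", "a", 1),
    (b"+AIA-+AQA", "\x80", 5),
    (b"+AQA-+2ADcAA", "\u0100", 5),
]


def decode_partial(data):
    """Decode with final=False, i.e. more input may still follow."""
    return codecs.utf_7_decode(data, "strict", False)


for data, text, consumed in CASES:
    print(data, "->", decode_partial(data))

# The same situation through the incremental decoder API: the unfinished
# section is buffered and decoded once its terminating '-' arrives.
decoder = codecs.getincrementaldecoder("utf-7")()
first = decoder.decode(b"a+AIA")
second = decoder.decode(b"-")
print(repr(first), repr(second))
```

On an affected debug build the first call already aborts, so this only serves as a check on builds that carry the fix.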

    This happens because _PyUnicodeWriter moves its position back to before the unfinished base-64 section, but its buffer has already been widened for the characters in that section.

            if (inShift) {
                writer.pos = shiftOutStart; /* back off output */
                *consumed = startinpos;
            }

    And now _PyUnicodeWriter generates a string with a kind larger than needed for the decoded characters.
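The mismatch described above can be illustrated with a toy model. This is only a sketch: `ToyWriter` and `width_class` are hypothetical names, not the real C structure or API, and the thresholds simply mirror the three assertions quoted in the report (128, 0x100, 0x10000).

```python
def width_class(maxchar):
    """Narrowest storage class able to hold code points up to maxchar."""
    if maxchar < 128:
        return "ascii"
    if maxchar < 0x100:
        return "latin-1"
    if maxchar < 0x10000:
        return "ucs2"
    return "ucs4"


class ToyWriter:
    """Hypothetical stand-in for a widening string writer (illustration only)."""

    def __init__(self):
        self.buffer = []
        self.pos = 0
        self.maxchar = 0  # widest code point the buffer has been sized for

    def write(self, ch):
        del self.buffer[self.pos:]  # overwrite anything past the position
        self.buffer.append(ch)
        self.pos += 1
        # Widening is one-way: maxchar never shrinks again.
        self.maxchar = max(self.maxchar, ord(ch))

    def text(self):
        return "".join(self.buffer[:self.pos])

    def claimed_width(self):
        return width_class(self.maxchar)

    def actual_width(self):
        return width_class(max(map(ord, self.text()), default=0))


writer = ToyWriter()
writer.write("a")
shift_out_start = writer.pos   # start of the base-64 section
writer.write("\x80")           # character decoded from the unfinished section
writer.pos = shift_out_start   # back off output, as in the C snippet above

print(writer.text(), writer.claimed_width(), writer.actual_width())
```

After the rollback the surviving text is pure ASCII, yet the writer still claims a wider class, which is the kind of inconsistency the debug-build assertions catch. In this toy model the repair would be to recompute the width from the surviving characters; how the actual patches handle it is not shown in this thread.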

    This bug causes a lot of crashes on the buildbots, e.g.:
    http://buildbot.python.org/all/builders/AMD64%20Snow%20Leop%203.x/builds/1197
    http://buildbot.python.org/all/builders/AMD64%20Ubuntu%20LTS%203.3/builds/1446

    @serhiy-storchaka serhiy-storchaka added interpreter-core (Objects, Python, Grammar, and Parser dirs) topic-unicode type-crash A hard crash of the interpreter, possibly with a core dump labels Feb 7, 2014
    @ncoghlan
    Contributor

    ncoghlan commented Feb 7, 2014

    Note that I added a skip for test_readline in bpo-20542 before realising this bug had already been filed.

    @serhiy-storchaka
    Member Author

    Here are patches for 3.3 and 3.4 (this is a 3.3+ only bug).

    @ncoghlan
    Contributor

    ncoghlan commented Feb 8, 2014

    Patches look good to me.

    @vstinner
    Member

    vstinner commented Feb 8, 2014

    Maybe you could add a new truncate operation to the unicode writer? It's up to you.

    The patch looks good to me.

    @python-dev
    Mannequin

    python-dev mannequin commented Feb 8, 2014

    New changeset 8d40d9cee409 by Serhiy Storchaka in branch '3.3':
    Issue bpo-20538: UTF-7 incremental decoder produced inconsistant string when
    http://hg.python.org/cpython/rev/8d40d9cee409

    New changeset e988661e458c by Serhiy Storchaka in branch 'default':
    Issue bpo-20538: UTF-7 incremental decoder produced inconsistant string when
    http://hg.python.org/cpython/rev/e988661e458c

    @serhiy-storchaka
    Member Author

    Thanks Nick and Victor for your reviews.

    Since there is only one place where truncating the unicode writer is needed, I don't think this is worth a special function.

    @larryhastings
    Contributor

    This checkin appears to be causing a regression in the Windows buildbots.

    http://buildbot.python.org/all/builders/AMD64%20Windows7%20SP1%203.x/builds/4040

    test_streamreaderwriter (test.test_codecs.WithStmtTest) ... test test_codecs failed
    ok

    ======================================================================
    ERROR: test_readline (test.test_codecs.CP65001Test)
    ----------------------------------------------------------------------

    Traceback (most recent call last):
      File "C:\buildbot.python.org\3.x.kloth-win64\build\lib\test\test_codecs.py", line 157, in test_readline
        self.assertEqual(readalllines("".join(vw), True), "|".join(vw))
      File "C:\buildbot.python.org\3.x.kloth-win64\build\lib\test\test_codecs.py", line 136, in readalllines
        line = reader.readline(size=size, keepends=keepends)
      File "C:\buildbot.python.org\3.x.kloth-win64\build\lib\codecs.py", line 548, in readline
        data = self.read(readsize, firstline=True)
      File "C:\buildbot.python.org\3.x.kloth-win64\build\lib\codecs.py", line 494, in read
        newchars, decodedbytes = self.decode(data, self.errors)
    UnicodeDecodeError: 'CP_UTF8' codec can't decode bytes in position 0--1: No mapping for the Unicode character exists in the target code page.

    Ran 206 tests in 5.912s

    @larryhastings larryhastings reopened this Feb 9, 2014
    @larryhastings
    Contributor

    And to be clear: I'm currently waiting on this before tagging 3.4rc1. If someone who understands the issue could fix this soon, I would appreciate it.

    @larryhastings
    Contributor

    Marking as closed and opening a new issue as per Serhiy's suggestion.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022