Segfault in UTF-7 incremental decoder #64737

serhiy-storchaka · 2014-02-07T09:32:09Z

BPO	20538
Nosy	@birkenfeld, @ncoghlan, @vstinner, @larryhastings, @ezio-melotti, @serhiy-storchaka
Files	issue20538-3.3.patch issue20538-3.4.patch

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


GitHub fields:

assignee = None
closed_at = <Date 2014-02-09.09:14:58.506>
created_at = <Date 2014-02-07.09:32:08.696>
labels = ['interpreter-core', 'expert-unicode', 'type-crash', 'release-blocker']
title = 'Segfault in UTF-7 incremental decoder'
updated_at = <Date 2014-02-09.09:14:58.505>
user = 'https://github.com/serhiy-storchaka'

bugs.python.org fields:

activity = <Date 2014-02-09.09:14:58.505>
actor = 'larry'
assignee = 'none'
closed = True
closed_date = <Date 2014-02-09.09:14:58.506>
closer = 'larry'
components = ['Interpreter Core', 'Unicode']
creation = <Date 2014-02-07.09:32:08.696>
creator = 'serhiy.storchaka'
dependencies = []
files = ['33962', '33963']
hgrepos = []
issue_num = 20538
keywords = ['patch', 'buildbot', '3.4regression']
message_count = 10.0
messages = ['210444', '210472', '210503', '210599', '210612', '210619', '210620', '210725', '210726', '210731']
nosy_count = 7.0
nosy_names = ['georg.brandl', 'ncoghlan', 'vstinner', 'larry', 'ezio.melotti', 'python-dev', 'serhiy.storchaka']
pr_nums = []
priority = 'release blocker'
resolution = 'fixed'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'crash'
url = 'https://bugs.python.org/issue20538'
versions = ['Python 3.3', 'Python 3.4']

serhiy-storchaka · 2014-02-07T09:32:08Z

The UTF-7 incremental decoder can crash a debug build when it decodes an unfinished base-64 section. In a non-debug build it just produces an inconsistent Unicode string. Minimal examples:

$ ./python -c "import codecs; codecs.utf_7_decode(b'a+AIA', 'strict')"
python: Objects/unicodeobject.c:403: _PyUnicode_CheckConsistency: Assertion `maxchar >= 128' failed.
Aborted (core dumped)

$ ./python -c "import codecs; codecs.utf_7_decode(b'+AIA-+AQA', 'strict')"
python: Objects/unicodeobject.c:410: _PyUnicode_CheckConsistency: Assertion `maxchar >= 0x100' failed.
Aborted (core dumped)

$ ./python -c "import codecs; codecs.utf_7_decode(b'+AQA-+2ADcAA', 'strict')"
python: Objects/unicodeobject.c:414: _PyUnicode_CheckConsistency: Assertion `maxchar >= 0x10000' failed.
Aborted (core dumped)
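On an interpreter that includes the fix, each reproducer should return only the text before the unfinished base-64 section, plus the number of input bytes consumed (the offset of the '+' that opened the section). The expected tuples below were worked out by hand from the base-64 digits (for example 'AIA' carries the 16 bits 0x0080), so treat this as a sketch of a regression check, not the test that was actually committed. It also shows the same bytes going through the incremental decoder, which buffers the unfinished section until the closing '-' arrives.

```python
import codecs

# Each input ends inside an unterminated base-64 section. With final=False
# (the default for codecs.utf_7_decode), the decoder backs off to the '+'
# and reports only the bytes before it as consumed.
CASES = [
    (b"a+AIA", ("a", 1)),              # pending U+0080 is dropped
    (b"+AIA-+AQA", ("\x80", 5)),       # pending U+0100 is dropped
    (b"+AQA-+2ADcAA", ("\u0100", 5)),  # pending U+10000 (surrogate pair) is dropped
]

for data, expected in CASES:
    result = codecs.utf_7_decode(data, "strict")
    print(data, result)
    assert result == expected, (data, result, expected)

# The incremental decoder keeps the unfinished section (b"+AIA") in its
# buffer and decodes it once the terminating '-' is supplied.
dec = codecs.getincrementaldecoder("utf-7")()
assert dec.decode(b"a+AIA") == "a"
assert dec.decode(b"-", final=True) == "\x80"
```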

This happens because the decoder moves the _PyUnicodeWriter position back to before the unfinished base-64 section, but the writer's buffer has already been widened to hold the characters decoded from that section.

        if (inShift) {
            writer.pos = shiftOutStart; /* back off output */
            *consumed = startinpos;
        }

As a result, _PyUnicodeWriter produces a string whose kind is wider than the characters it actually contains require.
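The mismatch can be modelled in a few lines of Python. This is a toy sketch, assuming a PEP 393-style writer that remembers the widest code point it has ever stored; ToyWriter, kind_for and decode_a_plus_AIA are hypothetical names, not CPython's API. It replays the first reproducer: 'a' is written, U+0080 is decoded from the unfinished 'AIA', and then the position is moved back as in the C snippet above.

```python
# Toy model of the bug, assuming a PEP 393-style writer. This is NOT
# CPython's _PyUnicodeWriter; every name here is hypothetical.

# Exclusive upper bounds for each PEP 393 storage kind.
KINDS = ((0x80, "ascii"), (0x100, "latin1"), (0x10000, "ucs2"), (0x110000, "ucs4"))


def kind_for(maxchar: int) -> str:
    """Return the narrowest storage kind able to hold `maxchar`."""
    for limit, name in KINDS:
        if maxchar < limit:
            return name
    raise ValueError("not a valid code point")


class ToyWriter:
    def __init__(self) -> None:
        self.chars = []   # characters written so far
        self.pos = 0      # logical end of the output
        self.maxchar = 0  # widest code point ever written; never shrinks

    def write(self, ch: str) -> None:
        del self.chars[self.pos:]  # drop anything past a backed-off position
        self.chars.append(ch)
        self.pos += 1
        self.maxchar = max(self.maxchar, ord(ch))  # "widen the buffer"

    def finish_buggy(self):
        # Mirrors the bug: the kind reflects every character ever written,
        # including those discarded by moving `pos` backwards.
        return "".join(self.chars[:self.pos]), kind_for(self.maxchar)

    def finish_fixed(self):
        # Recompute the kind from the characters that are actually kept.
        kept = self.chars[:self.pos]
        return "".join(kept), kind_for(max(map(ord, kept), default=0))


def decode_a_plus_AIA(finish):
    """Replay b'a+AIA': 'a', then an unfinished shift that produced U+0080."""
    w = ToyWriter()
    w.write("a")
    shift_out_start = w.pos  # remembered when '+' opens the section
    w.write("\x80")          # decoded from the incomplete 'AIA'
    w.pos = shift_out_start  # back off output, as in the C snippet
    return finish(w)


print(decode_a_plus_AIA(ToyWriter.finish_buggy))  # ('a', 'latin1'): inconsistent
print(decode_a_plus_AIA(ToyWriter.finish_fixed))  # ('a', 'ascii')
```

The buggy variant returns an all-ASCII string tagged with the Latin-1 kind, which is the condition the `maxchar >= 128` assertion rejects. Recomputing the kind from the kept prefix is just one way to restore consistency in this model; it is not a copy of the committed patch.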

This bug causes a lot of crashes on the buildbots. For example:
http://buildbot.python.org/all/builders/AMD64%20Snow%20Leop%203.x/builds/1197
http://buildbot.python.org/all/builders/AMD64%20Ubuntu%20LTS%203.3/builds/1446

ncoghlan · 2014-02-07T14:14:14Z

Note that I added a skip for test_readline in bpo-20542 before realising this bug had already been filed.

serhiy-storchaka · 2014-02-07T17:49:30Z

Here are patches for 3.3 and 3.4 (the bug affects only 3.3 and later).

ncoghlan · 2014-02-08T09:37:25Z

Patches look good to me.

vstinner · 2014-02-08T10:39:27Z

Maybe you could add a new truncate operation to the Unicode writer? As you prefer.

The patch looks good to me.

python-dev · 2014-02-08T12:09:11Z

New changeset 8d40d9cee409 by Serhiy Storchaka in branch '3.3':
Issue bpo-20538: UTF-7 incremental decoder produced inconsistent string when
http://hg.python.org/cpython/rev/8d40d9cee409

New changeset e988661e458c by Serhiy Storchaka in branch 'default':
Issue bpo-20538: UTF-7 incremental decoder produced inconsistent string when
http://hg.python.org/cpython/rev/e988661e458c

serhiy-storchaka · 2014-02-08T12:13:36Z

Thanks Nick and Victor for your reviews.

Since there is only one place where the Unicode writer needs to be truncated, I don't think it is worth a special function.

larryhastings · 2014-02-09T06:34:12Z

This checkin appears to be causing a regression in the Windows buildbots.

http://buildbot.python.org/all/builders/AMD64%20Windows7%20SP1%203.x/builds/4040

test_streamreaderwriter (test.test_codecs.WithStmtTest) ... test test_codecs failed
ok

======================================================================
ERROR: test_readline (test.test_codecs.CP65001Test)
----------------------------------------------------------------------

Traceback (most recent call last):
  File "C:\buildbot.python.org\3.x.kloth-win64\build\lib\test\test_codecs.py", line 157, in test_readline
    self.assertEqual(readalllines("".join(vw), True), "|".join(vw))
  File "C:\buildbot.python.org\3.x.kloth-win64\build\lib\test\test_codecs.py", line 136, in readalllines
    line = reader.readline(size=size, keepends=keepends)
  File "C:\buildbot.python.org\3.x.kloth-win64\build\lib\codecs.py", line 548, in readline
    data = self.read(readsize, firstline=True)
  File "C:\buildbot.python.org\3.x.kloth-win64\build\lib\codecs.py", line 494, in read
    newchars, decodedbytes = self.decode(data, self.errors)
UnicodeDecodeError: 'CP_UTF8' codec can't decode bytes in position 0--1: No mapping for the Unicode character exists in the target code page.

Ran 206 tests in 5.912s

larryhastings · 2014-02-09T06:58:06Z

And to be clear: I'm currently waiting on this before tagging 3.4rc1. If someone who understands the issue could fix this soon, I would appreciate it.

larryhastings · 2014-02-09T09:14:58Z

Marking as closed and opening a new issue as per Serhiy's suggestion.

serhiy-storchaka added the interpreter-core, topic-unicode and type-crash labels Feb 7, 2014

ncoghlan added the release-blocker label Feb 7, 2014

serhiy-storchaka closed this as completed Feb 8, 2014

larryhastings reopened this Feb 9, 2014

larryhastings closed this as completed Feb 9, 2014

ezio-melotti transferred this issue from another repository Apr 10, 2022
