
Patch for potential buffer overrun in tokenizer.c #41437

Closed
glchapman mannequin opened this issue Jan 13, 2005 · 9 comments
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs)

Comments
glchapman mannequin commented Jan 13, 2005

BPO 1101726
Nosy @loewis, @doerwalter
Files
  • tokenizer.c.3.diff: patch without utf8 manipulation
  • pep263tests.zip: test files
  • test_pep263.py.diff
  • tokenizer.c.4.diff
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = 'https://github.com/loewis'
    closed_at = <Date 2005-07-12.21:59:22.000>
    created_at = <Date 2005-01-13.15:45:20.000>
    labels = ['interpreter-core']
    title = 'Patch for potential buffer overrun in tokenizer.c'
    updated_at = <Date 2005-07-12.21:59:22.000>
    user = 'https://bugs.python.org/glchapman'

    bugs.python.org fields:

    activity = <Date 2005-07-12.21:59:22.000>
    actor = 'doerwalter'
    assignee = 'loewis'
    closed = True
    closed_date = None
    closer = None
    components = ['Interpreter Core']
    creation = <Date 2005-01-13.15:45:20.000>
    creator = 'glchapman'
    dependencies = []
    files = ['6431', '6432', '6433', '6434']
    hgrepos = []
    issue_num = 1101726
    keywords = ['patch']
    message_count = 9.0
    messages = ['47529', '47530', '47531', '47532', '47533', '47534', '47535', '47536', '47537']
    nosy_count = 3.0
    nosy_names = ['loewis', 'doerwalter', 'glchapman']
    pr_nums = []
    priority = 'high'
    resolution = 'fixed'
    stage = None
    status = 'closed'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue1101726'
    versions = ['Python 2.4']

    glchapman mannequin commented Jan 13, 2005

    The fp_readl function in tokenizer.c has a potential
    buffer overrun; see:

    www.python.org/sf/1089395

    It is also triggered by trying to generate the pycom
    file for the Excel 11.0 typelib, which is where I ran
    into it.

    The attached patch allows successful generation of the
    Excel file, and it runs the fail.py script from the
    above report without an access violation. It also
    doesn't break any of the tests in the regression suite
    (probably something like fail.py should be added as a
    test).
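
    A minimal sketch of the kind of file that triggers the overrun
    (hypothetical; the actual fail.py is attached to the report above
    and may differ):

    ```python
    # Hypothetical reproduction in the spirit of fail.py: the coding
    # declaration sends the tokenizer through the decoding fp_readl path,
    # and a single very long line forces repeated buffer refills.
    f = open("longline_repro.py", "w")
    f.write("# -*- coding: utf-8 -*-\n")
    f.write("s = '%s'\n" % ("x" * 1000000))  # far beyond the initial buffer
    f.close()

    # Tokenizing the file from disk exercises fp_readl; on an unpatched
    # 2.4 build this could overrun the buffer.
    execfile("longline_repro.py")
    ```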

    It is not as efficient as it might be; with a function
    for determining the number of Unicode characters in a
    UTF-8 string, you could avoid some memory allocations.
    Perhaps such a function should be added to unicodeobject.c?
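
    A sketch of that helper idea (written in modern Python for
    illustration; the real function would be C in unicodeobject.c): in
    UTF-8, every code point begins with a byte that is not of the form
    10xxxxxx, so characters can be counted without decoding or allocating.

    ```python
    def utf8_char_count(data):
        # Count bytes whose top two bits are not 10 (continuation bytes);
        # each remaining byte starts exactly one code point in valid UTF-8.
        return sum(1 for b in bytearray(data) if (b & 0xC0) != 0x80)

    assert utf8_char_count(u"h\xe9llo".encode("utf-8")) == 5
    ```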

    And, of course, the patch definitely needs review. I'm
    especially concerned that my use of
    tok->decoding_buffer might be violating some kind of
    assumption that I missed.

    glchapman mannequin closed this as completed Jan 13, 2005
    glchapman mannequin assigned loewis Jan 13, 2005
    glchapman mannequin added the interpreter-core (Objects, Python, Grammar, and Parser dirs) label Jan 13, 2005
    glchapman mannequin commented Jan 23, 2005

    I'm attaching a new patch (tokenizer.c.2.diff), which should
    be better since it avoids some unnecessary allocations and
    decoding/encoding. Hopefully I haven't made any
    unwarranted assumptions about UTF-8.

    doerwalter (contributor) commented

    I think the patch looks good; staring at it for a while, I
    believe the reference counts are OK. Can you incorporate
    fail.py into the test suite?

    glchapman mannequin commented Feb 27, 2005

    After thinking about it some more, I realized that fp_readl
    doesn’t have to do any UTF-8 manipulation. If the string it
    copies into the buffer does not include a \n, the buffer is
    expanded and fp_readl is called again until a \n is returned
    (or the end of the file is reached). It doesn’t matter if,
    during this loop, the buffer temporarily contains broken
    UTF-8 characters. So I’m attaching another new patch. This
    copies as many characters as fit into the buffer and saves
    any overflow into a new string object (stored in
    tok->decoding_buffer). When it is called again by the loop, it
    uses this overflow string to copy characters into the buffer.

    I’m also attaching a zip file with a modified test_pep263.py
    and some ancillary files. To test fp_readl, Python needs to
    read from a file. Perhaps it would be better to generate
    temporary files rather than include these extra files (see
    the sketch below). Also, although the tests will trigger the
    assert using a debug build of the 2.4 source, I’m not getting
    an access violation running regrtest against a release build
    (of 2.4) – perhaps different tests are in order?
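
    The temporary-file approach might look something like this (a
    sketch; the file contents and names here are made up):

    ```python
    # Generate the long-line test file on the fly instead of shipping
    # pep263_longline.py with the suite.
    import os
    from test.test_support import TESTFN

    filename = TESTFN + ".py"
    f = open(filename, "w")
    f.write("# -*- coding: utf-8 -*-\n")
    f.write("s = '%s'\n" % ("x" * 100000))  # longer than the tokenizer's buffer
    f.close()
    try:
        execfile(filename)  # reads the file from disk through fp_readl
    finally:
        os.remove(filename)
    ```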

    doerwalter (contributor) commented

    Maybe I've put the test files in the wrong spot, but what
    I'm getting is:

    ./python Lib/test/test_pep263.py
    test_pep263 (__main__.PEP263Test) ... ok
    test_evilencoding (__main__.EvilEncodingTest) ... ok
    Segmentation fault

    You should probably put pep263_evilencoding.py and
    pep263_longline.py alongside test_pep263.py and then find
    them via os.path.join(__file__.rsplit(os.sep, 1)[0],
    "pep263_longline.py"), as sketched below.

    glchapman mannequin commented Mar 21, 2005

    You're quite right: 2.4 reliably triggers an access violation
    with the new test_pep263. The copy of python24.dll I was
    testing against already had (the first version of) the patch
    applied. Anyway, I'm attaching a diff for test_pep263
    (against the original 2.4 test_pep263) which incorporates
    your suggestions about not assuming the current directory.

    doerwalter (contributor) commented

    Here is another version of the patch; this version doesn't
    pass any arguments to readline(), because it's no longer
    necessary to limit the line length. What do you think?
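
    For context, the difference at the level of the readline() call that
    fp_readl makes (illustrative file use only): a size argument caps how
    much of a line comes back per call, which is what forced the refill
    loop; with no argument, the whole line arrives at once.

    ```python
    f = open("pep263_longline.py")
    partial = f.readline(64)  # at most 64 characters; '\n' may be missing
    f.seek(0)
    whole = f.readline()      # the complete line, however long
    f.close()
    ```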

    glchapman mannequin commented May 17, 2005

    Looks good to me.

    doerwalter (contributor) commented

    Checked in as:
    Parser/tokenizer.c 2.78
    Parser/tokenizer.c 2.76.2.1

    ezio-melotti transferred this issue from another repository Apr 9, 2022