Parsing a simple script eats all of your memory #45475

complex · 2007-09-09T00:35:46Z

BPO	1134
Nosy	@gvanrossum, @brettcannon, @amauryfa, @tiran, @nascheme
Files	hungry_script.py py3k-hungry-script.patch

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


GitHub fields:

assignee = 'https://github.com/amauryfa'
closed_at = <Date 2007-11-15.23:21:36.941>
created_at = <Date 2007-09-09.00:35:45.583>
labels = ['interpreter-core', 'type-crash']
title = 'Parsing a simple script eats all of your memory'
updated_at = <Date 2008-01-06.22:29:45.580>
user = 'https://bugs.python.org/complex'

bugs.python.org fields:

activity = <Date 2008-01-06.22:29:45.580>
actor = 'admin'
assignee = 'amaury.forgeotdarc'
closed = True
closed_date = <Date 2007-11-15.23:21:36.941>
closer = 'amaury.forgeotdarc'
components = ['Interpreter Core']
creation = <Date 2007-09-09.00:35:45.583>
creator = 'complex'
dependencies = []
files = ['8405', '8446']
hgrepos = []
issue_num = 1134
keywords = ['patch']
message_count = 20.0
messages = ['55758', '55759', '55760', '55761', '55768', '55773', '55932', '55934', '55943', '55995', '56083', '57475', '57477', '57478', '57480', '57483', '57484', '57486', '57487', '57574']
nosy_count = 11.0
nosy_names = ['gvanrossum', 'nnorwitz', 'brett.cannon', 'complex', 'jafo', 'amaury.forgeotdarc', 'alanmcintyre', 'christian.heimes', 'pythonmeister', 'alexeychen', 'nas']
pr_nums = []
priority = 'critical'
resolution = 'fixed'
stage = None
status = 'closed'
superseder = None
type = 'crash'
url = 'https://bugs.python.org/issue1134'
versions = ['Python 3.0']

complex · 2007-09-09T00:35:45Z

Read the WARNING below, then run the attached script with Python3.0a2.
It will eat all of your memory.

WARNING: Keep a process killing tool or an extra command line at your
fingertips, since this script could render your machine unusable in
about 10-20 seconds depending on your memory and CPU speed!!! YOU ARE
WARNED!

OS: Ubuntu Feisty, up-to-date
Python: Python3.0a1, built from sources,
configured with: --prefix=/usr/local

complex · 2007-09-09T00:45:40Z

Confirmed on Windows:

OS: Windows XP SP2 ENG
Python: Python3.0a1 MSI installer, default installation

complex · 2007-09-09T00:50:01Z

Works fine (does nothing) with Python 2.4.4 and Python 2.5.1 under Windows, so this bug must be caused by new code in Python3.0a1. The bug depends on the contents of the docstring. There is other strange behavior if the word "this" appears anywhere in the docstring. It looks as if the docstring is somehow being parsed as source code, which confuses the new parser.

complex · 2007-09-09T00:58:00Z

Erratum: in the first line of my original post I meant Python3.0a1,
not 3.0a2.

alanmcintyre · 2007-09-09T21:56:26Z

Confirmed that this happens on Mac OS X with a fresh build of py3k from svn.

pythonmeister · 2007-09-10T03:05:10Z

Same under Linux with Python 3.0a1.
Eats all CPU and memory.

alexeychen · 2007-09-15T22:22:09Z

--- tokenizer.c	(revision 58161)
+++ tokenizer.c	(working copy)
@@ -402,6 +402,8 @@
 	if (allocated) {
 		Py_DECREF(bufobj);
 	}
+  Py_XDECREF(tok->decoding_buffer);
+  tok->decoding_buffer = 0;
 	return s;

brettcannon · 2007-09-16T00:17:02Z

Note the patch is inlined in a message.

alexeychen · 2007-09-16T15:11:23Z

Oops, I see there are two bugs. My previous patch fixed multiline
strings only.

I think the fix should be:

Index: tokenizer.c
===================================================================

--- tokenizer.c	(revision 58161)
+++ tokenizer.c	(working copy)
@@ -395,6 +395,7 @@
 			goto error;
 		buflen = size;
 	}
+
 	memcpy(s, buf, buflen);
 	s[buflen] = '\0';
 	if (buflen == 0) /* EOF */
@@ -402,6 +403,12 @@
 	if (allocated) {
 		Py_DECREF(bufobj);
 	}
+
+  if ( bufobj == tok->decoding_buffer ){
+    Py_XDECREF(tok->decoding_buffer);
+    tok->decoding_buffer = 0;
+  }
+
 	return s;
 
 error:

jafo · 2007-09-18T13:11:59Z

Confirmed problem (used 4.5GB before I killed it), and that the second
patch resolved the problem. I'm uploading the inline patch as an
attachment, with the directory name in it as well (from svn diff).

Bumping the priority to high because the side effect can cause all sorts
of problems on a system including other processes being killed.

nascheme · 2007-09-21T21:32:26Z

It looks to me like fp_readl is no longer working as designed and the
patch is not really the right fix. The use of "decoding_buffer" is
tricky and I think the conversion to bytes screwed it up. It might be
clearer to have a separate "decoding_overflow" struct member that's used
for overflow rather than overloading "decoding_buffer".

tiran · 2007-11-14T00:40:56Z

The issue isn't fixed yet. The script is still eating precious memory.

gvanrossum · 2007-11-14T00:54:59Z

Amaury, can you have a look at this? I think it's a bug in tok_nextc()
in tokenizer.c.

complex · 2007-11-14T02:36:55Z

This bug prevents me and many others from doing preliminary testing on Py3k,
which slows down its development. This bug _really_ hurts. I have a
completely developed new module for Py3k that cannot be released due to
this bug, since its unit tests are affected and would crash
the user's machine.

Sadly, I have neither enough free time nor the in-depth knowledge
to fix this, especially since the first attempt was not perfect, which
shows that this may not be a bug that can be fixed by correcting a typo
somewhere... :-)

tiran · 2007-11-14T03:14:24Z

I've already raised the priority to draw more attention to this bug.

So far I'm not able to solve the bug but I've nailed down the issue to a
short test case:

HANGS:
# -- coding: ascii --
"""
"""

The problem manifests itself only with the combination of the ascii
encoding and a triple-quoted string spanning two or more lines. Neither a
different encoding nor a string on a single line has the same problem.

WORKS:
# -- coding: ascii --
""" """

WORKS:
# -- coding: latin.1 --
"""
"""

WORKS:
# -- coding: ascii --
""" """

DOESN'T COMPILE:
# -- coding: ascii --
"\
"
File "hungry_script2.py", line 5
SyntaxError: EOL while scanning single-quoted string

The last example does compile with Python 2.5. Please note also the
wrong line number. The file has only three (!) lines.

During my debugging session I saw an infinite loop in tokenizer.c:1429

  letter_quote:
	/* String */
	if (c == '\'' || c == '"') {
		...
		for (;;) {
			INFINITE LOOP
		}
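
The test cases above can be checked without risking the machine. What follows is a minimal sketch, not part of the original report: it feeds each source to a child interpreter and gives up after a timeout. The names `CASES` and `compiles_within` are hypothetical, and the timeout only bounds run time, not memory, so on an affected build a short timeout is advisable.

```python
import subprocess
import sys

# Reproducer sources copied from the test cases above, kept as raw bytes so
# that the coding declaration is handled by the tokenizer under test.
CASES = {
    "multi-line triple quotes": b'# -- coding: ascii --\n"""\n"""\n',
    "single-line triple quotes": b'# -- coding: ascii --\n""" """\n',
    "backslash continuation": b'# -- coding: ascii --\n"\\\n"\n',
}


def compiles_within(source: bytes, timeout: float = 10.0) -> bool:
    """Return True if a child interpreter compiles `source` before the timeout."""
    # The child reads the source from stdin and only compiles it (no execution).
    child_code = "import sys; compile(sys.stdin.buffer.read(), '<case>', 'exec')"
    try:
        result = subprocess.run(
            [sys.executable, "-c", child_code],
            input=source,
            timeout=timeout,  # the child is killed when the timeout expires
            capture_output=True,
        )
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0


if __name__ == "__main__":
    for name, source in CASES.items():
        status = "ok" if compiles_within(source) else "FAILED or timed out"
        print(f"{name}: {status}")
```

To test a specific build, replace `sys.executable` with the path of that interpreter.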

gvanrossum · 2007-11-14T05:04:18Z

Is this also broken in the 3.0a1 release? If not, it might be useful
to try to find the most recent rev where it's not broken.

gvanrossum · 2007-11-14T05:05:17Z

Is this also broken in the 3.0a1 release? If not, it might be useful
to try to find the most recent rev where it's not broken.

amauryfa · 2007-11-14T10:27:22Z

fp_readl is indeed broken in several ways:

- decoding_buffer should be reset to NULL when all data has been read
  (buflen <= size).
- the (buflen > size) case will cause an error on the next pass, since
  the function cannot handle PyBytesObject.

IOW, the function is always wrong ;-)

I have a correction ready (jafo's patch already addresses the first
case), but cannot access svn here. I will try to provide a patch + test
cases later tonight.
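
The first point can be illustrated with a toy model. This is a sketch in Python under stated assumptions, not the actual C code of fp_readl: the class `LineReader`, its `pending` attribute (standing in for decoding_buffer), and the helper `count_chunks` are all hypothetical names. It shows why a leftover buffer that is never reset makes the reader return the same data forever, so the caller never sees EOF.

```python
import io


class LineReader:
    """Toy stand-in for the tokenizer's line reader (not CPython's real code)."""

    def __init__(self, stream, reset_when_drained=True):
        self.stream = stream
        self.pending = None  # plays the role of decoding_buffer
        self.reset_when_drained = reset_when_drained

    def read_chunk(self, size):
        # Prefer leftover data from an earlier call, else read a fresh line.
        if self.pending is not None:
            buf = self.pending
        else:
            buf = self.stream.readline()
        if len(buf) > size:
            # Too long for the caller: hand out a prefix and keep the rest.
            self.pending = buf[size:]
            return buf[:size]
        if self.reset_when_drained:
            # Everything was delivered, so forget the leftover.
            self.pending = None
        return buf


def count_chunks(reader, size, limit=1000):
    """Count chunks read before EOF; give up after `limit` calls."""
    for n in range(limit):
        if reader.read_chunk(size) == b"":
            return n
    return None  # EOF was never reached


SOURCE = b'"""\n"""\n'
print(count_chunks(LineReader(io.BytesIO(SOURCE)), 2))  # 4
print(count_chunks(LineReader(io.BytesIO(SOURCE), reset_when_drained=False), 2))  # None
```

With the reset in place the two lines are delivered in four chunks and then EOF is reported. Without it, the second chunk is handed out again on every call, which matches the kind of endless loop described earlier in this thread.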

complex · 2007-11-14T12:40:49Z

In response to Guido:

According to pythonmeister's post (2007-09-10):

"Same under Linux with Python 3.0a1.
Eats all cpu + memory"

I found the bug with this version:

fviktor@rigel:~$ python3.0 --version
Python 3.0a1

AFAIK it is the latest alpha released.
I did not try the SVN trunk, but it is very
likely still buggy, since this
issue has not been closed yet.

Viktor (complex)

amauryfa · 2007-11-15T23:21:37Z

Corrected in revision 59001, with a modified patch.

complex mannequin added the interpreter-core and type-crash labels Sep 9, 2007

jafo mannequin assigned nnorwitz Sep 18, 2007

gvanrossum assigned amauryfa and unassigned nnorwitz Nov 14, 2007

amauryfa closed this as completed Nov 15, 2007

ezio-melotti transferred this issue from another repository Apr 10, 2022
