Parsing a simple script eats all of your memory #45475

Closed
complex mannequin opened this issue Sep 9, 2007 · 20 comments
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) type-crash A hard crash of the interpreter, possibly with a core dump

complex mannequin commented Sep 9, 2007

BPO 1134
Nosy @gvanrossum, @brettcannon, @amauryfa, @tiran, @nascheme
Files
  • hungry_script.py
  • py3k-hungry-script.patch

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    assignee = 'https://github.com/amauryfa'
    closed_at = <Date 2007-11-15.23:21:36.941>
    created_at = <Date 2007-09-09.00:35:45.583>
    labels = ['interpreter-core', 'type-crash']
    title = 'Parsing a simple script eats all of your memory'
    updated_at = <Date 2008-01-06.22:29:45.580>
    user = 'https://bugs.python.org/complex'

    bugs.python.org fields:

    activity = <Date 2008-01-06.22:29:45.580>
    actor = 'admin'
    assignee = 'amaury.forgeotdarc'
    closed = True
    closed_date = <Date 2007-11-15.23:21:36.941>
    closer = 'amaury.forgeotdarc'
    components = ['Interpreter Core']
    creation = <Date 2007-09-09.00:35:45.583>
    creator = 'complex'
    dependencies = []
    files = ['8405', '8446']
    hgrepos = []
    issue_num = 1134
    keywords = ['patch']
    message_count = 20.0
    messages = ['55758', '55759', '55760', '55761', '55768', '55773', '55932', '55934', '55943', '55995', '56083', '57475', '57477', '57478', '57480', '57483', '57484', '57486', '57487', '57574']
    nosy_count = 11.0
    nosy_names = ['gvanrossum', 'nnorwitz', 'brett.cannon', 'complex', 'jafo', 'amaury.forgeotdarc', 'alanmcintyre', 'christian.heimes', 'pythonmeister', 'alexeychen', 'nas']
    pr_nums = []
    priority = 'critical'
    resolution = 'fixed'
    stage = None
    status = 'closed'
    superseder = None
    type = 'crash'
    url = 'https://bugs.python.org/issue1134'
    versions = ['Python 3.0']

    complex mannequin commented Sep 9, 2007

    Read the WARNING below, then run the attached script with Python3.0a2.
    It will eat all of your memory.

    WARNING: Keep a process killing tool or an extra command line at your
    fingertips, since this script could render your machine unusable in
    about 10-20 seconds depending on your memory and CPU speed!!! YOU ARE
    WARNED!

    OS: Ubuntu Feisty, up-to-date
    Python: Python3.0a1, built from sources,
    configured with: --prefix=/usr/local

    complex mannequin added the interpreter-core and type-crash labels Sep 9, 2007

    complex mannequin commented Sep 9, 2007

    Confirmed on Windows:

    OS: Windows XP SP2 ENG
    Python: Python3.0a1 MSI installer, default installation

    complex mannequin commented Sep 9, 2007

    Works fine (does nothing) with Python 2.4.4 and Python 2.5.1 under Windows, so this bug must be caused by some new code in Python 3.0a1. The bug depends on the contents of the docstring. There is another strange behavior if you write the word "this" somewhere in the docstring. It looks as if the docstring is somehow parsed as source code, which confuses the new parser.

    complex mannequin commented Sep 9, 2007

    Erratum: in the first line of my original post I meant Python 3.0a1,
    not 3.0a2, of course.

    alanmcintyre mannequin commented Sep 9, 2007

    Confirmed that this happens on Mac OS X with a fresh build of py3k from svn.

    pythonmeister mannequin commented Sep 10, 2007

    Same under Linux with Python 3.0a1.
    Eats all cpu + memory

    alexeychen mannequin commented Sep 15, 2007

    --- tokenizer.c	(revision 58161)
    +++ tokenizer.c	(working copy)
    @@ -402,6 +402,8 @@
     	if (allocated) {
     		Py_DECREF(bufobj);
     	}
    +  Py_XDECREF(tok->decoding_buffer);
    +  tok->decoding_buffer = 0;
     	return s;

    brettcannon commented

    Note the patch is inlined in a message.

    alexeychen mannequin commented Sep 16, 2007

    Oops, I see there are two bugs. Previously I had fixed only multiline
    strings.

    I think the fix should be:

    Index: tokenizer.c
    ===================================================================

    --- tokenizer.c	(revision 58161)
    +++ tokenizer.c	(working copy)
    @@ -395,6 +395,7 @@
     			goto error;
     		buflen = size;
     	}
    +
     	memcpy(s, buf, buflen);
     	s[buflen] = '\0';
     	if (buflen == 0) /* EOF */
    @@ -402,6 +403,12 @@
     	if (allocated) {
     		Py_DECREF(bufobj);
     	}
    +
    +  if ( bufobj == tok->decoding_buffer ){
    +    Py_XDECREF(tok->decoding_buffer);
    +    tok->decoding_buffer = 0;
    +  }
    +
     	return s;
     
     error:

    jafo mannequin commented Sep 18, 2007

    Confirmed the problem (it used 4.5GB before I killed it) and that the
    second patch resolves it. I'm uploading the inline patch as an
    attachment, with the directory name included (from svn diff).

    Bumping the priority to high because the side effect can cause all sorts
    of problems on a system, including other processes being killed.

    jafo mannequin assigned nnorwitz Sep 18, 2007

    nascheme commented

    It looks to me like fp_readl is no longer working as designed and the
    patch is not really the right fix. The use of "decoding_buffer" is
    tricky and I think the conversion to bytes screwed it up. It might be
    clearer to have a separate "decoding_overflow" struct member that's used
    for overflow rather than overloading "decoding_buffer".

    tiran commented Nov 14, 2007

    The issue isn't fixed yet. The script is still eating precious memory.

    gvanrossum commented

    Amaury, can you have a look at this? I think it's a bug in tok_nextc()
    in tokenizer.c.

    gvanrossum assigned amauryfa and unassigned nnorwitz Nov 14, 2007

    complex mannequin commented Nov 14, 2007

    This bug prevents me and many others from doing preliminary testing on
    Py3k, which slows down its development. This bug _really_ hurts. I have
    a fully developed new module for Py3k that cannot be released because
    of this bug, since its unit tests are affected by it and would crash
    the user's machine.

    Sadly, I don't have enough free time or readily available in-depth
    knowledge to fix this myself, especially since the first attempt was
    not perfect, which suggests this is not a bug that can be fixed by
    correcting a typo somewhere... :-)

    tiran commented Nov 14, 2007

    I've already raised the priority to draw more attention to this bug.

    So far I'm not able to solve the bug but I've nailed down the issue to a
    short test case:

    HANGS:
    # -*- coding: ascii -*-
    """
    """

    The problem manifests itself only with the combination of the ascii
    encoding and a triple-quoted string spanning two or more lines. Neither
    a different encoding nor a string on a single line triggers the problem.

    WORKS:
    # -*- coding: ascii -*-
    """ """

    WORKS:
    # -*- coding: latin.1 -*-
    """
    """

    WORKS:
    # -*- coding: ascii -*-
    """ """

    DOESN'T COMPILE:
    # -*- coding: ascii -*-
    "\
    "
    File "hungry_script2.py", line 5
    SyntaxError: EOL while scanning single-quoted string

    The last example does compile with Python 2.5. Note also the wrong
    line number: the file has only three (!) lines.

    During my debugging session I saw an infinite loop in tokenizer.c:1429

      letter_quote:
    	/* String */
    	if (c == '\'' || c == '"') {
    		...
    		for (;;) {
    			INFINITE LOOP
    		}
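The snippets above can be checked without risking the machine (see the WARNING in the original report) by running each one in a child interpreter with a timeout and, on POSIX, an address-space cap. The following is a minimal Python sketch that is not part of the original thread: `run_case`, `CASES`, the 1 GiB limit and the 10-second timeout are assumptions chosen for illustration, and the coding cookies are written in the usual `-*- coding: ... -*-` form.

```python
import os
import subprocess
import sys
import tempfile

try:
    import resource  # POSIX only; used to cap the child's address space
except ImportError:  # e.g. Windows
    resource = None

# The snippets from the comment above, as raw source bytes.
CASES = {
    "hangs": b'# -*- coding: ascii -*-\n"""\n"""\n',
    "works_single_line": b'# -*- coding: ascii -*-\n""" """\n',
    "works_latin1": b'# -*- coding: latin.1 -*-\n"""\n"""\n',
    "doesnt_compile": b'# -*- coding: ascii -*-\n"\\\n"\n',
}

MEMORY_LIMIT = 1 << 30  # 1 GiB, an arbitrary cap for the child process


def _limit_memory():
    """Runs in the child before exec: cap virtual memory if the OS allows it."""
    try:
        resource.setrlimit(resource.RLIMIT_AS, (MEMORY_LIMIT, MEMORY_LIMIT))
    except (ValueError, OSError):
        pass  # some platforms refuse or ignore RLIMIT_AS; rely on the timeout


def run_case(source, python=sys.executable, timeout=10.0):
    """Write `source` to a temp file, run it, and summarise what happened."""
    with tempfile.NamedTemporaryFile("wb", suffix=".py", delete=False) as f:
        f.write(source)
        path = f.name
    try:
        proc = subprocess.run(
            [python, path],
            capture_output=True,
            timeout=timeout,
            preexec_fn=_limit_memory if resource is not None else None,
        )
    except subprocess.TimeoutExpired:
        return "timed out (possible hang)"  # run() has already killed the child
    finally:
        os.unlink(path)
    if proc.returncode == 0:
        return "ok"
    err = proc.stderr.decode(errors="replace").strip()
    return f"exit {proc.returncode}: {err}"


if __name__ == "__main__":
    for name, src in CASES.items():
        print(f"{name}: {run_case(src)}")
```

On an interpreter that contains the fix, the `hangs` case should simply report `ok`; on an affected build it would be expected to report a timeout, or a non-zero exit once the memory cap is hit. Passing another interpreter path as `python=` lets the same snippets be tried against different builds.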

    gvanrossum commented

    Is this also broken in the 3.0a1 release? If not, it might be useful
    to try to find the most recent rev where it's not broken.

    amauryfa commented

    fp_readl is indeed broken in several ways:

    • decoding_buffer should be reset to NULL when all data has been read
      (buflen <= size).
    • the (buflen > size) case will cause an error on the next pass, since
      the function cannot handle PyBytesObject.

    IOW, the function is always wrong ;-)

    I have a correction ready (jafo's patch already addresses the first
    case), but cannot access svn here. I will try to provide a patch + test
    cases later tonight.
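amauryfa's two points can be illustrated with a simplified model of what fp_readl is meant to do. The sketch below is written in Python rather than C. It is not the actual tokenizer.c code or the fix committed in revision 59001, and the names `LineReader`, `readl` and `read_all` are hypothetical; only the role of `decoding_buffer` as the holder of leftover data comes from the thread.

```python
from typing import Callable, Optional


class LineReader:
    """Simplified, hypothetical model of tokenizer.c's fp_readl().

    Each call copies at most `size` bytes of the next decoded line into
    the caller's buffer. Bytes that do not fit are kept in
    `decoding_buffer` (named after tok->decoding_buffer) for the next call.
    """

    def __init__(self, readline: Callable[[], str], encoding: str = "utf-8"):
        self._readline = readline  # returns one decoded line, "" at EOF
        self._encoding = encoding
        self.decoding_buffer: Optional[bytes] = None

    def readl(self, size: int) -> bytes:
        if size <= 0:
            raise ValueError("size must be positive")
        if self.decoding_buffer is None:
            # Nothing pending: fetch and re-encode the next line.
            buf = self._readline().encode(self._encoding)
        else:
            # Continue with the bytes left over from the previous call.
            buf = self.decoding_buffer
        if len(buf) > size:
            # Second point: keep the unread tail, as bytes, for next time.
            self.decoding_buffer = buf[size:]
            return buf[:size]
        # First point: everything fit, so the pending buffer must be reset.
        # Without this, the same tail would be handed out on every call
        # and the caller would never reach EOF.
        self.decoding_buffer = None
        return buf


def read_all(reader: LineReader, size: int) -> bytes:
    """Call readl() until it returns b"" (EOF) and join the chunks."""
    chunks = []
    while True:
        chunk = reader.readl(size)
        if not chunk:
            return b"".join(chunks)
        chunks.append(chunk)
```

For example, `lines = iter(['"""\n', '"""\n'])` followed by `read_all(LineReader(lambda: next(lines, "")), 2)` returns the multi-line string from tiran's test case intact, then stops at EOF. If the reset in the last branch of `readl` is removed, then once a line has overflowed every later call returns the same leftover tail, so a caller waiting for EOF loops forever; this mirrors the unreset `decoding_buffer` blamed in the thread, but it is only a model. Keeping the leftover bytes in a field of their own, as nascheme proposed with `decoding_overflow`, would make that role explicit.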

    complex mannequin commented Nov 14, 2007

    In response to Guido:

    According to pythonmeister's post (2007-09-10):

    "Same under Linux with Python 3.0a1.
    Eats all cpu + memory"

    I found the bug with this version:

    fviktor@rigel:~$ python3.0 --version
    Python 3.0a1

    AFAIK it is the latest alpha release.
    I did not try the SVN trunk, but it is
    very likely buggy as well, since this
    issue has not been closed yet.

    Viktor (complex)

    amauryfa commented

    Corrected in revision 59001, with a modified patch.

    ezio-melotti transferred this issue from another repository Apr 10, 2022