Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

segfault in PyErr_NormalizeException() after memory exhaustion #74882

Closed
xdegaye mannequin opened this issue Jun 18, 2017 · 16 comments
Closed

segfault in PyErr_NormalizeException() after memory exhaustion #74882

xdegaye mannequin opened this issue Jun 18, 2017 · 16 comments
Labels
3.7 interpreter-core type-crash

Comments

@xdegaye
Copy link
Mannequin

@xdegaye xdegaye mannequin commented Jun 18, 2017

BPO 30697
Nosy @brettcannon, @pitrou, @vstinner, @ned-deily, @xdegaye, @serhiy-storchaka, @asottile
PRs
  • #2327
  • #4135
  • #4941
  • Files
  • memerr.py
  • except_raises_except.py
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2018-03-14.06:07:50.620>
    created_at = <Date 2017-06-18.13:29:41.195>
    labels = ['interpreter-core', '3.7', 'type-crash']
    title = 'segfault in PyErr_NormalizeException() after memory exhaustion'
    updated_at = <Date 2018-03-14.06:07:50.619>
    user = 'https://github.com/xdegaye'

    bugs.python.org fields:

    activity = <Date 2018-03-14.06:07:50.619>
    actor = 'ned.deily'
    assignee = 'none'
    closed = True
    closed_date = <Date 2018-03-14.06:07:50.620>
    closer = 'ned.deily'
    components = ['Interpreter Core']
    creation = <Date 2017-06-18.13:29:41.195>
    creator = 'xdegaye'
    dependencies = []
    files = ['46960', '47239']
    hgrepos = []
    issue_num = 30697
    keywords = ['patch']
    message_count = 16.0
    messages = ['296272', '296274', '296358', '296385', '296462', '296639', '305048', '305049', '305066', '308672', '308686', '308736', '311023', '311193', '311303', '313808']
    nosy_count = 7.0
    nosy_names = ['brett.cannon', 'pitrou', 'vstinner', 'ned.deily', 'xdegaye', 'serhiy.storchaka', 'Anthony Sottile']
    pr_nums = ['2327', '4135', '4941']
    priority = None
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'crash'
    url = 'https://bugs.python.org/issue30697'
    versions = ['Python 3.6', 'Python 3.7']

    @xdegaye
    Copy link
    Mannequin Author

    @xdegaye xdegaye mannequin commented Jun 18, 2017

    Nosying reviewers of PR 1981 of bpo-22898.

    The memerr.py script segfaults with the following gdb backtrace:

    #0  0x0000000000550268 in PyErr_NormalizeException (exc=exc@entry=0x7fffffffdee8, 
        val=val@entry=0x7fffffffdef0, tb=tb@entry=0x7fffffffdef8) at Python/errors.c:315
    #1  0x000000000055045f in PyErr_NormalizeException (exc=exc@entry=0x7fffffffdee8, 
        val=val@entry=0x7fffffffdef0, tb=tb@entry=0x7fffffffdef8) at Python/errors.c:319
    #2  0x000000000055045f in PyErr_NormalizeException (exc=exc@entry=0x7fffffffdee8, 
        val=val@entry=0x7fffffffdef0, tb=tb@entry=0x7fffffffdef8) at Python/errors.c:319
    #3  0x000000000055045f in PyErr_NormalizeException (exc=exc@entry=0x7fffffffdee8, 
        val=val@entry=0x7fffffffdef0, tb=tb@entry=0x7fffffffdef8) at Python/errors.c:319
    ...

    To be able to run this patch, one needs to apply the nomemory_allocator.patch from bpo-30695 and the infinite_loop.patch from bpo-30696.

    This raises two different problems:

    a) The segfault itself that occurs upon setting the PyExc_RecursionErrorInst singleton. Oh! That had already been pointed out in msg231933 in bpo-22898 at case 4).

    b) When the size of the Python frames stack is greater than the size of the list of preallocated MemoryError instances, this list becomes exhausted and PyErr_NormalizeException() enters an infinite recursion which is stopped:
    * by the PyExc_RecursionErrorInst singleton when a) is fixed
    * by a Fatal Error (abort) when applying PR 1981 in its current state (there is no stack overflow as expected even in the absence of any guard before the recursive call to PyErr_NormalizeException())
    The user is presented in both cases with an error hinting at a recursion problem instead of a problem with memory exhaustion. This is bug b).

    @xdegaye xdegaye mannequin added 3.7 interpreter-core type-crash labels Jun 18, 2017
    @xdegaye
    Copy link
    Mannequin Author

    @xdegaye xdegaye mannequin commented Jun 18, 2017

    Problem b) is IMO a clear demonstration that using tstate->recursion_depth and the PyExc_RecursionErrorInst singleton is not the correct way to control the recursive calls to PyErr_NormalizeException() since the problem here is memory exhaustion, not recursion. One could instead abort with a Fatal Error message printing the type of the last exception, when the depth of the recursivity of PyErr_NormalizeException() exceeds let's say 128 (knowing that anyway the stack is protected in the functions that attempt to instantiate those exceptions). The normalization of an exception that fails with an exception whose normalization fails with an ... and this, 128 times in a row, surely this can be considered as a fatal error, no ?

    PR 2035 eliminates the tail recursive call in PyErr_NormalizeException() and transforms it into a loop. This loop obviously does not involve the stack anymore. This is another argument that shows that tstate->recursion_depth and the PyExc_RecursionErrorInst singleton which are related to the stack should not be used to control the recursivity of PyErr_NormalizeException() or the iterations of this loop.

    @brettcannon
    Copy link
    Member

    @brettcannon brettcannon commented Jun 19, 2017

    So is PR 2035 a fix for this? This discussion on this exact problem seems to have ended up spanning a couple issues and a PR so I'm losing track of where things sit at the moment.

    @xdegaye
    Copy link
    Mannequin Author

    @xdegaye xdegaye mannequin commented Jun 19, 2017

    The two issues you are refering to are the instruments that are needed to reproduce the problem. The reference to PR 2035 is only made here to argue about the question of the correct way to control the successive calls to PyErr_NormalizeException(). This question is relevant here since one of the problems raised by this issue is that in the case of memory exhaustion the user is given a RecursionError as the cause of the problem.

    FWIW PR 2035 transforms the tail recursion in PyErr_NormalizeException() into a loop (as compilers would do during optimization). An infinite recursion becomes then an infinite loop instead. The advantage is that there is no stack overflow. The drawback is that it is an infinite loop.

    @brettcannon
    Copy link
    Member

    @brettcannon brettcannon commented Jun 20, 2017

    And hence why you proposed having a counter of 128 (or some number) to prevent the infinite recursion.

    I think this has gotten sufficiently complicated and into the bowels of CPython itself it might make sense to ask for a reviewer from python-committers (I don't feel like I'm in a good position to dive into this myself).

    @xdegaye
    Copy link
    Mannequin Author

    @xdegaye xdegaye mannequin commented Jun 22, 2017

    PR 2327 lacks the test cases mentionned below for the moment.

    1. With PR 2327, the memerr.py script runs correctly:
    $ ./python /path/to/memerr.py
    Fatal Python error: Cannot recover from MemoryErrors while normalizing exceptions.

    Current thread 0x00007f37eab54fc0 (most recent call first):
    File "/path/to/memerr.py", line 8 in foo
    File "/path/to/memerr.py", line 13 in <module>
    Aborted (core dumped)

    1. With PR 2327, exceeding the recursion limit in PyErr_NormalizeException() raises a RecursionError:
    $ ./python -q
    >>> import _testcapi
    >>> raise _testcapi.RecursingInfinitelyError
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    RecursionError: maximum recursion depth exceeded while normalizing an exception
    >>>

    Note that when the infinite recursion is started by instantiating an exception written in Python code instead, the RecursionError is set by Py_EnterRecursiveCall() instead of by PyErr_NormalizeException().

    1. With PR 2327, the test case in PR 1981 runs correctly (so PR 2327 fixes also bpo-22898):
    $ ./python /path/to/crasher.py    # crasher.py is the code run by test_recursion_normalizing_exception() in PR 1981
    Done.
    Traceback (most recent call last):
      File "/path/to/crasher.py", line 36, in <module>
        recurse(setrecursionlimit(depth + 2) - depth - 1)
      File "/path/to/crasher.py", line 19, in recurse
        recurse(cnt)
      File "/path/to/crasher.py", line 19, in recurse
        recurse(cnt)
      File "/path/to/crasher.py", line 19, in recurse
        recurse(cnt)
      [Previous line repeated 1 more times]
      File "/path/to/crasher.py", line 21, in recurse
        generator.throw(MyException)
      File "/path/to/crasher.py", line 25, in gen
        yield
    RecursionError: maximum recursion depth exceeded while calling a Python object
    sys:1: ResourceWarning: unclosed file <_io.FileIO name='/path/to/crasher.py' mode='rb' closefd=True>

    @xdegaye
    Copy link
    Mannequin Author

    @xdegaye xdegaye mannequin commented Oct 26, 2017

    Checking the test_exceptions test cases that are added by PR 2327 on the current master branch, before the merge of PR 2327:

    • test_recursion_normalizing_exception still fails (SIGABRT on a debug build and SIGSEGV otherwise)
    • test_recursion_normalizing_infinite_exception is ok as expected
    • test_recursion_normalizing_with_no_memory still fails with a SIGSEGV

    The attached script except_raises_except.py exercises case 3) described in msg231933. The output is as follows when PR 2327 is applied and confirms that the ResourceWarning is now printed:

    Traceback (most recent call last):
      File "except_raises_except.py", line 11, in <module>
        generator.throw(MyException)
      File "except_raises_except.py", line 7, in gen
        yield
      File "except_raises_except.py", line 3, in __init__
        raise MyException
      File "except_raises_except.py", line 3, in __init__
        raise MyException
      File "except_raises_except.py", line 3, in __init__
        raise MyException
      [Previous line repeated 495 more times]
    RecursionError: maximum recursion depth exceeded while calling a Python object
    sys:1: ResourceWarning: unclosed file <_io.FileIO name='except_raises_except.py' mode='rb' closefd=True>

    @xdegaye
    Copy link
    Mannequin Author

    @xdegaye xdegaye mannequin commented Oct 26, 2017

    New changeset 56d1f5c by xdegaye in branch 'master':
    bpo-30697: Fix PyErr_NormalizeException() when no memory (GH-2327)
    56d1f5c

    @xdegaye
    Copy link
    Mannequin Author

    @xdegaye xdegaye mannequin commented Oct 26, 2017

    New changeset 4b27d51 by xdegaye in branch '3.6':
    [3.6] bpo-30697: Fix PyErr_NormalizeException() when no memory (GH-2327). (bpo-4135)
    4b27d51

    @xdegaye xdegaye mannequin closed this as completed Oct 26, 2017
    @asottile
    Copy link
    Mannequin

    @asottile asottile mannequin commented Dec 19, 2017

    Should this have landed in python3.6? It removes a public symbol PyExc_RecursionErrorInst (abi break?)

    @brettcannon
    Copy link
    Member

    @brettcannon brettcannon commented Dec 19, 2017

    Re-opened as a release blocker to make sure we're okay with the potential ABI breakage.

    @xdegaye
    Copy link
    Mannequin Author

    @xdegaye xdegaye mannequin commented Dec 20, 2017

    As a reference the discussion about the removal of PyExc_RecursionErrorInst took place at PR 1981.

    The new PR 4941 restores PyExc_RecursionErrorInst on 3.6, not sure if this should be merged. The same operation could be done on 3.7. Python is not using PyExc_RecursionErrorInst anymore and it is doubtful that anyone does.

    @ned-deily
    Copy link
    Member

    @ned-deily ned-deily commented Jan 28, 2018

    We should make a decision on this for both 3.6.5 and for 3.7.0. Victor, Antoine, Brett, Serhiy, any opinions?

    @brettcannon
    Copy link
    Member

    @brettcannon brettcannon commented Jan 29, 2018

    No opinion from me on how critical this is.

    @ned-deily
    Copy link
    Member

    @ned-deily ned-deily commented Jan 30, 2018

    I don't know what the right answer here is. But since there don't seem to be strong opinions one way or the other with regard to 3.7, I am not going to hold 3.7.0b1 for a resolution.

    @ned-deily
    Copy link
    Member

    @ned-deily ned-deily commented Mar 14, 2018

    It's been three months since Anthony raised the question about whether this was an acceptable ABI change in 3.6.4. Since then, I am not aware of any reports of problems this has caused and there has been no agreement that it is a critical problem. Since 3.6.5rc1 has now been released without this being resolved, I'm going to close this issue and hope for the best. Thanks, everyone! Feel free to reopen if a real life problem can be demonstrated.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.7 interpreter-core type-crash
    Projects
    None yet
    Development

    No branches or pull requests

    2 participants