Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sys.exit() caught in async event loop exception handler #69675

Closed
jinty mannequin opened this issue Oct 27, 2015 · 36 comments
Closed

sys.exit() caught in async event loop exception handler #69675

jinty mannequin opened this issue Oct 27, 2015 · 36 comments

Comments

@jinty
Copy link
Mannequin

jinty mannequin commented Oct 27, 2015

BPO 25489
Nosy @gvanrossum, @bitdancer, @asvetlov, @1st1, @iritkatriel
Files
  • test_sys_exit_in_exception_handler.py: Test Case
  • future.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2015-10-27.11:25:55.941>
    labels = ['expert-asyncio']
    title = 'sys.exit() caught in async event loop exception handler'
    updated_at = <Date 2022-03-22.13:11:35.019>
    user = 'https://bugs.python.org/jinty'

    bugs.python.org fields:

    activity = <Date 2022-03-22.13:11:35.019>
    actor = 'vstinner'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['asyncio']
    creation = <Date 2015-10-27.11:25:55.941>
    creator = 'jinty'
    dependencies = []
    files = ['40867', '40869']
    hgrepos = []
    issue_num = 25489
    keywords = ['patch']
    message_count = 27.0
    messages = ['253529', '253540', '253543', '253544', '253549', '253552', '253554', '253555', '253556', '253557', '253558', '253576', '253594', '253601', '253603', '253686', '253723', '253746', '415599', '415618', '415619', '415620', '415621', '415622', '415737', '415738', '415742']
    nosy_count = 6.0
    nosy_names = ['gvanrossum', 'r.david.murray', 'asvetlov', 'yselivanov', 'jinty', 'iritkatriel']
    pr_nums = []
    priority = 'normal'
    resolution = None
    stage = None
    status = 'open'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue25489'
    versions = ['Python 3.5']

    @jinty
    Copy link
    Mannequin Author

    jinty mannequin commented Oct 27, 2015

    Running the attached file with python3 shows that SystemExit is caught rather than causing the process to stop. That's quite surprising.

    @jinty jinty mannequin added the topic-asyncio label Oct 27, 2015
    @bitdancer
    Copy link
    Member

    I expected this function to be parallel to sys.excepthook, but I see that sys.excepthook does not get called if you call sys.exit() at the python prompt. So I guess I agree with you that it is surprising, although I actually expected the opposite (that sys.excepthook would get called for sys.exit).

    @jinty
    Copy link
    Mannequin Author

    jinty mannequin commented Oct 27, 2015

    the workaround I am using at the moment is this:

    def handler(loop, context):
        print('Got error, exiting')
        loop.call_soon(sys.exit, 42)

    which actually does cause the process to exit

    @bitdancer
    Copy link
    Member

    Ah, I misunderstood your report, because I didn't actually run the example. The exception is being ignored because it is raised during a __del__ method execution. This has nothing to do with set_exception_handler. And in fact if you raise sys.exit in an excepthook handler it is ignored completely, so the cases are parallel in that sense. Interestingly, if you replace your Exception in boom with sys.exit, your sys.exit in the handler actually works, because in that case the handler isn't getting called from the Task __del__.

    I don't think there's a bug in python here, I think the bug is in your program. You aren't yielding from the task, so the exception from boom only gets dealt with during Task finalization. I find this to be the most confusing part of asyncio programming, myself. I don't know why the handler doesn't get called just from creating the Task and running the loop. I often end up writing wrapper functions that yield from a function that does the work inside a try/except, and/or yield from the task in a 'wait_closed' method.

    Maybe someone with more asyncio inside knowledge will be able to clear this up, and we can figure out a doc improvement.
    I suspect we need a "best practices" document about this...

    @1st1
    Copy link
    Member

    1st1 commented Oct 27, 2015

    because in that case the handler isn't getting called from the Task __del__

    It's possible to fix -- see the attached future.patch. And perhaps this should be fixed in Future.__del__ and Task.__del__ -- ignoring BaseExceptions isn't good.

    @bitdancer
    Copy link
    Member

    So, any exception raised in the exception handler will be re-raised via call_soon. I think the message would be clearer if it said that (that the exception comes from the registered exception handler).

    But, I'm not sure this is a good idea. Exceptions are ignored in __del__ because they are asynchronous to the currently running code when the exception is executed (because they are triggered by garbage collection)...that is, there's no currently active statement when the exception is raised. Just because asyncio is an async framework doesn't really change this fundamental truth (between explicit yield points, asyncio code is synchronous, that's its big attraction).

    Making this change would make asyncio inconsistent with python's normal practice, and I don't (yet?) see a coherent motivation for doing so.

    @1st1
    Copy link
    Member

    1st1 commented Oct 27, 2015

    I see your points, but we're talking about BaseExceptions here -- KeyboardInterrupt, SystemExit etc. Those things usually mean that the program has to crash.

    @1st1
    Copy link
    Member

    1st1 commented Oct 27, 2015

    Trapping those exceptions in __del__ isn't good.

    Another way to address this would be to at least modify call_exception_handler to log when an exception handler raises a BaseException error.

    @bitdancer
    Copy link
    Member

    No, you are talking about *all* exceptions, since they all descend from BaseException. Python's normal __del__ handling just prints the ignored exception to stdout, even if it is a BaseException. Why should asyncio be different (other than logging it instead, which would be useful)?

    @1st1
    Copy link
    Member

    1st1 commented Oct 27, 2015

    No, you are talking about *all* exceptions, since they all descend from BaseException. Python's normal __del__ handling just prints the ignored exception to stdout, even if it is a BaseException. Why should asyncio be different (other than logging it instead, which would be useful)?

    My bad -- my current patch does indeed re-raise all exceptions. But would you be OK with something like this:

            def __del__():
                ...
            try:
                self._loop.call_exception_handler(context)
            except Exception as ex:
                # log or re-raise
                logger.log(...)
            except BaseException as ex:
                logger.error(
                    'BaseException in Future.__del__, will be re-raised soon',
                    exc_info=True)
                def throw():
                    raise ex
                self._loop.call_soon(throw)
    

    @bitdancer
    Copy link
    Member

    Not until you convince me there is a reason for deviating from Python's normal __del__ handling :) (Or other asyncio developers agree with you and not me.)

    @gvanrossum
    Copy link
    Member

    Whew. Complex issue!

    The OP should probably use his workaround or call loop.stop() instead of raising SystemExit.

    Asyncio in general is rather careless about "true" BaseExceptions (i.e. that aren't also Exceptions), we should decide what we want to do for these (there are definitely ways to get in a bad state if you catch one and then resume the loop).

    call_exception_handler() tries to log and then ignore exceptions raised by the handler, so the except clause Yuri's patch adds to __del__ in fact will only see true BaseExceptions.

    Adding an except clause to __del__ seems very unprincipled -- there's another __del__ that would require the same treatment (all our __del__ methods seem to call call_exception_handler()) and *if* we want to catch it we should probably do it in call_exception_handler().

    boom() does in fact yield.

    @jinty
    Copy link
    Mannequin Author

    jinty mannequin commented Oct 28, 2015

    Calling loop.stop() means that I need other, more complex code, to store and return the non-zero exit status.

    @bitdancer
    Copy link
    Member

    Using sys.exit means you are depending on garbage collection to clean up all of your program's resources. In the general case this is a bad idea. A better design is to call loop.stop, and then do cleanup (which might involve calling some wait_closed functions via loop.run_until_complete). If you just call sys.exit, your resources may not get cleaned up correctly, or may not get cleaned up correctly somewhat randomly due to the indeterminacies in the order in which garbage collection is done. This may not matter for a simple program, but I find it makes it easier for me if I just do it "the right way" always :)

    @gvanrossum
    Copy link
    Member

    How about we extend loop.stop() so that you can pass it an exception to
    raise once the loop is stopped? This exception would then be thrown out of
    run_forever(). There may be some delay (callbacks already scheduled will
    run first) but it is how things were meant to be.

    FWIW this isn't really enough to ensure cleanup happens before destructors
    run -- when the loop exits, tasks may still be active unless you keep track
    of all of them and explicitly cancel them (and run the loop until they have
    processed the cancellation).

    @jinty
    Copy link
    Mannequin Author

    jinty mannequin commented Oct 29, 2015

    On Wed, Oct 28, 2015 at 02:49:55PM +0000, R. David Murray wrote:

    R. David Murray added the comment:

    Using sys.exit means you are depending on garbage collection to clean
    up all of your program's resources. In the general case this is a bad
    idea. A better design is to call loop.stop, and then do cleanup
    (which might involve calling some wait_closed functions via
    loop.run_until_complete). If you just call sys.exit, your resources
    may not get cleaned up correctly, or may not get cleaned up correctly
    somewhat randomly due to the indeterminacies in the order in which
    garbage collection is done. This may not matter for a simple program,
    but I find it makes it easier for me if I just do it "the right way"
    always :)

    I think it depends on the problem, sometimes "crash-only" software is
    safer. You have to design for immediate failure anyway, can't stop those
    pesky hardware failures. So if you always immediately crash there is
    only one, well-tested path to stopping a program.

    Also saves a lot of cleanup code writing;)

    Hmm, I suppose that means I should really be using os._exit(42) to avoid
    garbage collection...

    @jinty
    Copy link
    Mannequin Author

    jinty mannequin commented Oct 30, 2015

    On Wed, Oct 28, 2015 at 03:32:36PM +0000, Guido van Rossum wrote:

    Guido van Rossum added the comment:

    How about we extend loop.stop() so that you can pass it an exception to
    raise once the loop is stopped? This exception would then be thrown out of
    run_forever(). There may be some delay (callbacks already scheduled will
    run first) but it is how things were meant to be.

    It is better than the current situation where to call loop.stop() and
    have the correct exit code I would need to resort to a global variable.

    I would then probably try to write an exception handler like this:

        def exception_handler(loop, context):
            loop.stop(context.get('exception', Exception('unknown error')))

    And then hope I don't lose the traceback!

    FWIW this isn't really enough to ensure cleanup happens before destructors
    run -- when the loop exits, tasks may still be active unless you keep track
    of all of them and explicitly cancel them (and run the loop until they have
    processed the cancellation).

    At least in my situation I don't think this is a problem, the objective
    is to shut down the process quickly to prevent bad things happening.

    @gvanrossum
    Copy link
    Member

    OK, I'll wait for someone to submit a patch for this. (I tried a quick hack, but unless we limit this to BaseException instances that aren't also Exception instances it's complicated).

    @iritkatriel
    Copy link
    Member

    See also bpo-46759.

    @gvanrossum
    Copy link
    Member

    Andrew, would you be interested in investigating this? I can't even follow the flow through asyncio that causes the observed behavior (though I seem to have confirmed it).

    @asvetlov
    Copy link
    Contributor

    Not sure if the issue is still relevant.
    asyncio re-raises KeyboardInterrupt explicitly in all loop.call_exception_handler() branches: https://github.com/python/cpython/blob/main/Lib/asyncio/base_events.py#L1759-L1814

    @asvetlov
    Copy link
    Contributor

    The change exists since Python 3.8 (October of 2019)
    The issue discussion belongs to 2015

    @gvanrossum
    Copy link
    Member

    Nevertheless, the example code still hangs after calling sys.exit(). I can't quite tell where it is hanging.

    @gvanrossum
    Copy link
    Member

    gvanrossum commented Mar 20, 2022

    With python built from main I get:

    /Users/guido/test_sys_exit_in_exception_handler.py:12: DeprecationWarning: There is no current event loop
      loop = asyncio.get_event_loop()
    Got error, exiting
    Exception ignored in: <bound method BaseEventLoop.call_exception_handler of <_UnixSelectorEventLoop running=True closed=False debug=False>>
    Traceback (most recent call last):
      File "/Users/guido/cpython/Lib/asyncio/base_events.py", line 1749, in call_exception_handler
        self._exception_handler(self, context)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/Users/guido/test_sys_exit_in_exception_handler.py", line 6, in handler
        sys.exit(42)
        ^^^^^^^^^^^^
    SystemExit: 42

    This is printed from write_unraisable_exc_file(). Earlier comments in this issue indicate that the exception is being raised from a finalizer (__del__ method). But how?

    @gpshead gpshead changed the title sys.exit() caught in exception handler sys.exit() caught in async event loop exception handler Mar 20, 2022
    @asvetlov
    Copy link
    Contributor

    asvetlov commented Mar 22, 2022

    I can describe what happens with test_sys_exit_in_exception_handler.py

    1. The 'boom' task raises an exception.
    2. The task is not awaited, Future.__del__ calls the exception handler with 'Task exception was never retrieved' message.
    3. The custom handler raises SystemExit.
    4. SystemExit bubbles up and swallowed by __del__, the __del__ method cannot re-raise.

    The question is: what is the behavior expected?
    a) Now an exception raised by a custom exception handler is swallowed in this particular case (but is propagated if loop.call_exception_handler() is called from a function other than __del__).
    b) Yuri suggested re-schedule an exception generated by loop.call_exception_handler by loop.call_soon(). asyncio.Handle catches it and... calls call_exception_handler() with 'Exception in callback ...' message. At the end, we have an endless recursion.
    c) asyncio loop can call loop.stop() if an exception is raised by loop.call_exception_handler() from __del__. I think this behavior is terrible: a subtle error can terminate asyncio program.
    d) Assume that a custom exception handler should not raise an exception. Catch all exceptions in call_exception_handler, call sys.unraisablehook(), and suppress the exception.

    I believe that d) is the best thing that we can do here.

    I can prepare a fix if we agree on the solution.

    @asvetlov
    Copy link
    Contributor

    Guido, perhaps you had problems with the problem detection because the asyncio uses _asyncio C Extesions by default. It drops some calls from the python stack trace.

    @gvanrossum
    Copy link
    Member

    Andrew, thanks for explaining this.

    The key thing I was missing was that the root cause of the problem is that Future.__del__ is trying to log an error about the un-awaited task by calling the exception handler directly. That actually feels a little dodgy.

    This is why I'm not yet comfortable with (d). Looking at call_exception_handler(), whether it calls the default handler or a custom handler, it explicitly checks for SystemExit and KeyboardInterrupt and re-raises those. And only those -- everything ends up logging an error.

    Which makes me wonder. Maybe that error in Future.__del__ should not call any exception handler at all, but just call logger.error()? Or maybe Future.__del__ should catch exceptions coming out of there and log an error? Maybe a modified version of (d), but only implemented in Future.__del__, not in call_exception_handler()?

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    @kumaraditya303
    Copy link
    Contributor

    The simplest solution is to just let the loop call the exception handler with call_soon rather than calling it manually.

    @gvanrossum
    Copy link
    Member

    Isn't that (b)? @asvetlov didn't seem to think that would work (see his message above).

    @kumaraditya303
    Copy link
    Contributor

    No, I propose to call exception handler with call_soon rather than (b) which is about rescheduling the raised error with call_soon.

    @gvanrossum
    Copy link
    Member

    Okay, but I don't know how we can assess that your solution will be an improvement in real-world code and not cause further issues. Maybe we should carefully analyze the consequences once you have a draft PR up.

    @kumaraditya303
    Copy link
    Contributor

    @gvanrossum: Try the following patch:

    diff --git a/Lib/asyncio/futures.py b/Lib/asyncio/futures.py
    index 4bd9629e9eb..973f61451d7 100644
    --- a/Lib/asyncio/futures.py
    +++ b/Lib/asyncio/futures.py
    @@ -103,7 +103,7 @@ def __del__(self):
             }
             if self._source_traceback:
                 context['source_traceback'] = self._source_traceback
    -        self._loop.call_exception_handler(context)
    +        self._loop.call_soon(lambda: self._loop.call_exception_handler(context))
     
         __class_getitem__ = classmethod(GenericAlias)
     
    

    Script:

    import sys
    sys.modules['_asyncio'] = None
    import asyncio
    
    def handler(loop, context):
        print('Got error, exiting')
        sys.exit(42)
    
    async def boom():
        await asyncio.sleep(1)
        raise Exception('boom')
    
    loop = asyncio.get_event_loop()
    
    # Schedule a call to hello_world()
    loop.set_exception_handler(handler)
    loop.create_task(boom())
    loop.run_forever()

    @gvanrossum
    Copy link
    Member

    The lambda isn't even needed, you can write

            self._loop.call_soon(self._loop.call_exception_handler, context)

    @gvanrossum
    Copy link
    Member

    But the more I think about this the less I like it. There are quite a few __del__ methods in asyncio, and most of then call warnings.warn, which may raise an exception -- which is then ignored. Many also call self.close(), which in many cases can also raise.

    I think the OP's best workaround is, as they said, os._exit(42). Second-best would seem to be loop.call_soon(sys.exit, 42).

    So I would like to close as "not planned".

    @jinty Do you have any further comments, 7 years later?

    @gvanrossum gvanrossum added the pending The issue will be closed if no feedback is provided label Sep 18, 2022
    @jinty
    Copy link

    jinty commented Sep 18, 2022

    @gvanrossum I have long since moved on with a workaround, so don't have any skin in the game anymore.

    But for my original usecase of writing a piece of crash-only software, os._exit() would have likely been a better implementation. I still think it's surprising that my example testcase didn't work, but fixing it is sounds like it's not worth complexity or risk of unintended consequences.

    @gvanrossum
    Copy link
    Member

    Thanks — there is always complexity somewhere, I think it’s a reasonable balance now. Closing.

    @gvanrossum gvanrossum closed this as not planned Won't fix, can't repro, duplicate, stale Sep 18, 2022
    @gvanrossum gvanrossum removed the pending The issue will be closed if no feedback is provided label Sep 18, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Projects
    Status: Done
    Development

    No branches or pull requests

    7 participants