Make threading._register_atexit public? #86128

Open
bdarnell mannequin opened this issue Oct 7, 2020 · 13 comments
Labels
3.9 (only security fixes), stdlib (Python modules in the Lib dir)

Comments

@bdarnell
Mannequin

bdarnell mannequin commented Oct 7, 2020

BPO 41962
Nosy @pitrou, @vstinner, @bdarnell, @ericsnowcurrently, @aeros


@bdarnell
Mannequin Author

bdarnell mannequin commented Oct 7, 2020

I'm dealing with a subtle deadlock involving concurrent.futures.ThreadPoolExecutor, and my solution that worked in Python 3.8 broke with 3.9. I'm running some long-running (possibly infinite) tasks in the thread pool, and I cancel them in an atexit callback so that everything can shut down cleanly (before ThreadPoolExecutor joins all worker threads in its own atexit hook).

Python 3.9 broke this due to https://bugs.python.org/issue39812. That change introduced a new atexit-like mechanism to the threading module and uses it where Python 3.8 used regular atexit. This means that ThreadPoolExecutor's atexit runs before mine, and since I never get a chance to cancel my tasks, it deadlocks.

One way I can solve this is to move my own atexit function to threading._register_atexit, so my strawman proposal here is to make that function public and documented.
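
For illustration, this is roughly what that strawman looks like in user code today (a minimal sketch; the _cancel_long_running_tasks callback and stop_event are hypothetical, and threading._register_atexit exists in 3.9 but is private):

import threading

stop_event = threading.Event()  # hypothetical flag the long-running tasks poll

def _cancel_long_running_tasks():
    # Tell the long-running executor tasks to finish so that
    # ThreadPoolExecutor's own handler can join its workers without deadlocking.
    stop_event.set()

# Private API in 3.9+: callbacks registered here run during threading's
# shutdown phase, before the regular atexit callbacks.
threading._register_atexit(_cancel_long_running_tasks)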

On the other hand, even without the change in Python 3.9, my use of atexit smells like an abuse of implementation details in ThreadPoolExecutor (getting the atexit callbacks called in the right order was tricky when the concurrent.futures module started using lazy loading in Python 3.7). So I would welcome other suggestions about how to handle long-running but cancelable operations in a ThreadPoolExecutor at shutdown.

One clean solution is to do the cancellation at the end of the main module instead of in an atexit hook. However, I'm doing this in a library, so I don't have any way but atexit to ensure that this happens. Another option is to forgo ThreadPoolExecutor entirely and manage the threads myself.

My code in question is in a not-yet-released branch of Tornado: https://github.com/tornadoweb/tornado/blob/5913aa43ecfdaa76876fc57867062227b907b1dd/tornado/platform/asyncio.py#L57-L73

With the master branch of Tornado, Python 3.9, and Windows, python -c "from tornado.httpclient import HTTPClient; c = HTTPClient()" reliably deadlocks at interpreter shutdown.

@bdarnell bdarnell mannequin added the 3.9 (only security fixes) label Oct 7, 2020
@iritkatriel iritkatriel added the stdlib (Python modules in the Lib dir) label Jan 24, 2021
@aeros
Contributor

aeros commented Jan 24, 2021

I'm dealing with a subtle deadlock involving concurrent.futures.ThreadPoolExecutor, and my solution that worked in Python 3.8 broke with 3.9. I'm running some long-running (possibly infinite) tasks in the thread pool, and I cancel them in an atexit callback so that everything can shut down cleanly (before ThreadPoolExecutor joins all worker threads in its own atexit hook).

IMO, a better practice would be to give those potentially infinite running tasks a direct means of escape and to invoke it before calling executor.shutdown(); that would be a more reliable approach. But perhaps there is some convenience utility in being able to provide custom atexit hooks. It also might help the user to separate the shutdown logic from the rest of the program.
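
As a rough sketch of that pattern (illustrative names, not Tornado's actual code):

import concurrent.futures
import threading

stop = threading.Event()

def long_running_task():
    # Poll an escape hatch instead of blocking forever.
    while not stop.wait(timeout=0.1):
        pass  # do one unit of work per iteration

executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
executor.submit(long_running_task)

# At shutdown time: trigger the escape hatch first, then shut down the pool.
stop.set()
executor.shutdown(wait=True)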

Since you worked with me on adding threading._register_atexit(), do you have any thoughts, Antoine? I would personally not be opposed to making it public, assuming there's real utility in doing so.

My only concern is that it might be a potential foot-gun. If the user submits an atexit hook that deadlocks, it might prevent threads from shutting down safely prior to interpreter finalization. I'm presently undecided whether explicitly mentioning that in the docs would be a sufficient warning.

@bdarnell
Mannequin Author

bdarnell mannequin commented Jan 25, 2021

IMO, a better practice would be providing those potentially infinite running tasks a direct method of escape and invoking it before calling executor.shutdown(), it would be a more reliable approach.

Agreed, but the problem is that I'm in a library (so I don't control the main module), and the library's interface does not mandate any sort of explicit shutdown method. There is a shutdown method, but almost no one calls it, and it never caused a problem until Python 3.9 changed things so that it deadlocks.

My only concern is that it might be a potential foot-gun. If the user submits an atexit hook that deadlocks, it might prevent threads from shutting down safely prior to interpreter finalization.

Yes, and that is exactly the problem. concurrent.futures submits an atexit hook whose behavior depends on application code, and through that I have inadvertently caused a deadlock.

@bdarnell
Mannequin Author

bdarnell mannequin commented Jan 26, 2021

I have resolved my issue here by moving from ThreadPoolExecutor to a plain threading.Thread that I manage by hand (tornadoweb/tornado@15832bc). Therefore I no longer need this for myself and I leave it up to you to decide whether there's anything worth doing at this point.

@sa
Mannequin

sa mannequin commented Feb 3, 2022

Another way to do this is to call threading.main_thread().join() in another thread and do the shutdown cleanup when it returns.

The main thread is stopped at shutdown just before the threading._threading_atexits are called.
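
A minimal sketch of that approach (the cleanup function here is a placeholder):

import threading

def _do_shutdown_cleanup():
    # Placeholder: signal long-running tasks to stop, close wakers, etc.
    pass

def _watch_main_thread():
    # Per the above: join() returns once the main thread has been stopped at
    # interpreter shutdown, so the cleanup runs before the threading atexit
    # callbacks (and thus before ThreadPoolExecutor joins its workers).
    threading.main_thread().join()
    _do_shutdown_cleanup()

# Non-daemon so the interpreter waits for the cleanup to complete.
threading.Thread(target=_watch_main_thread).start()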

@ericsnowcurrently
Member

I'm running some long-running (possibly infinite) tasks in the thread pool,
and I cancel them in an atexit callback

To be clear, by "cancel" you are not talking about Future.cancel(). Rather, your handler causes all running tasks to finish (by sending a special message on the socket corresponding to each running task). Is that right?

Some other things I inferred from your atexit handler:

  • it does not make sure the task associated with the socket finishes (no way of knowing?)
  • so if a task hangs while trying to stop then the running thread in the ThreadPoolExecutor would block shutdown forever
  • similarly, if a task is stuck handling a request then it will never receive the special message on the socket, either blocking the send() in your handler or causing ThreadPoolExecutor shutdown/atexit to wait forever
  • it vaguely implies a 1-to-1 relationship between sockets and *running* tasks
  • likewise that pending (queued) tasks do not have an associated socket (until started)
  • so once your handler finishes, any tasks pending in the ThreadPoolExecutor queue will eventually get started but never get stopped by your handler; thus you're back to the deadlock situation

Does all that sound right? Most of that is relevant to some possible solutions I have in mind.

@ericsnowcurrently
Member

I'm running some long-running (possibly infinite) tasks in the thread pool,
and I cancel them in an atexit callback

Alternately, perhaps ThreadPoolExecutor isn't the right fit here, as implied by the route you ended up going. It seems like it's not well-suited for long-running (or infinite) tasks. In that case, perhaps the concurrent.futures docs could be more clear about when ThreadPoolExecutor is not a good fit and what the alternatives are.

@ericsnowcurrently
Member

FWIW, here's a brain dump about ThreadPoolExecutor and its atexit handler after having looked at the code.

----

First, the relationship between the objects involved:

  • work item -> Future
  • work item -> task (e.g. function)
  • queue -> [work item]
  • worker -> executor
  • worker -> queue
  • worker -> currently running work item
  • thread -> worker
  • ThreadPoolExecutor -> [thread]
  • ThreadPoolExecutor -> queue
  • global state -> {thread: queue}

Observations:

  • a work item's future and task are not aware of each other, and operations on either have no effect on the other

----

Next, let's look at the relevant ways the objects can be used:

  • publicly
    • ThreadPoolExecutor.submit(task) -> Future
    • ThreadPoolExecutor.shutdown()
    • Future.result() and Future.exception()
    • Future.cancel()
    • Future.add_done_callback()
  • internally
    • queue.pop() -> work item
    • <work item>.run()
    • thread.join()
    • Future.set_running_or_notify_cancel()
    • Future.set_result() and Future.set_exception()

Observations:

  • nothing interacts with a worker directly; it is waited on via its thread and it receives work (or None) via the queue it was given
  • once a worker pops the next work item off the queue, nothing else has access to that work item (the original ThreadPoolExecutor().submit() caller has the Future, but that's it)
  • there is no cancelation mechanism for tasks
  • there is no simple way to interact with the queued tasks
  • realistically, there is no way to interact with the currently running task
  • there is no reliable way to "kill" a thread

----

Regarding how tasks run:

  • ThreadPoolExecutor.submit(task) -> Future
  • ThreadPoolExecutor.submit(task) -> work item (Future, task) -> queue
  • ThreadPoolExecutor.submit(task) -> thread (worker)
  • thread -> worker -> ( queue -> work item -> task )

Observations:

  • the worker loop exits if the next item in the queue is None (and the executor is shutting down)
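
For reference, that loop is roughly the following (a simplified paraphrase of _worker() in Lib/concurrent/futures/thread.py, not the exact code; _shutdown is the module-global flag the atexit handler sets):

def _worker(executor_reference, work_queue):
    while True:
        work_item = work_queue.get(block=True)
        if work_item is not None:
            work_item.run()      # sets the Future's result or exception
            del work_item
            continue
        # Woke up on the None sentinel: exit if the interpreter or the
        # executor is shutting down; re-queue the sentinel for other workers.
        executor = executor_reference()
        if _shutdown or executor is None or executor._shutdown:
            work_queue.put(None)
            return
        del executor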

----

Now let's look more closely at the atexit handler.

  • as you noted, since 3.9 it is registered with threading._register_atexit() instead of atexit.register()
  • the threading atexit handlers run before the regular atexit handlers
  • the ThreadPoolExecutor handler does not actually interact with ThreadPoolExecutor instances directly
  • it only deals with a module-global list of (thread, [work item]) pairs, to which ThreadPoolExecutor instances add items as they go

The handler does the following:

  1. disables ThreadPoolExecutor.submit() (for all instances)
  2. (indirectly) tells each worker to stop ASAP
  3. lets every pending task run (and lets every running task keep running)
  4. waits until all tasks have finished
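
Roughly, that sequence corresponds to the following (paraphrased from the handler in Lib/concurrent/futures/thread.py; step 3 is implicit because the None sentinel lands behind any already-queued work items):

def _python_exit():
    global _shutdown
    with _global_shutdown_lock:
        _shutdown = True                    # step 1: submit() now raises
    items = list(_threads_queues.items())   # module-global {thread: work queue}
    for t, q in items:
        q.put(None)                         # step 2: send each worker the stop sentinel
    for t, q in items:
        t.join()                            # step 4: wait for every worker, no timeout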

It does not:

  • call any ThreadPoolExecutor.shutdown()
  • otherwise deal with the ThreadPoolExecutor instances directly
  • call Future.cancel() for any of the tasks
  • use any timeout in step 4, so it may block forever
  • notify tasks that they should finish
  • deal well with any long-running (or infinite) task

ThreadPoolExecutor.shutdown() basically does the same thing, but only for its own tasks. It also optionally calls Future.cancel() for each queued task (right before step 2), though all that does is keep any queued-but-not-running tasks from starting. You can also optionally skip step 4.
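
For reference, both of those knobs are exposed on the public API (cancel_futures was added in 3.9):

from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor()
# ... submit work ...

# cancel_futures=True cancels queued-but-not-started tasks;
# wait=False skips step 4 (don't block until running tasks finish).
executor.shutdown(wait=False, cancel_futures=True)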

@ericsnowcurrently
Member

This means that ThreadPoolExecutor's atexit runs before mine,
and since I never get a chance to cancel my tasks, it deadlocks.

(assuming we want to support long-running tasks here)

With all the above in mind, there are a few things that may help.

The first that comes to mind is to have the atexit handler call ThreadPoolExecutor.shutdown() for each instance.

So something like this:

def _python_exit():
    global _shutdown
    with _global_shutdown_lock:
        _shutdown = True
    for executor in list(_executors):
        executor.shutdown()

That would require a little refactoring to make it work. However, the change is simpler if each executor has its own atexit handler:

class ThreadPoolExecutor(_base.Executor):

    def __init__(self, ...):
        ...
        threading._register_atexit(self._atexit)

    def _atexit(self):
        global _shutdown
        _shutdown = True
        self.shutdown()

The value of either approach is that you can then subclass ThreadPoolExecutor to get what you want:

class MyExecutor(ThreadPoolExecutor):
    def shutdown(self, *args, **kwargs):
        stop_my_tasks()
        super().shutdown(*args, **kwargs)

One thing I thought about was supporting a per-task finalizer instead, since that aligns more closely with the way ThreadPoolExecutor works. It would only apply to the task(s) it was registered for. So something like one of the following:

  • ThreadPoolExecutor(finalize_task=<callable>)
  • ThreadPoolExecutor.submit(finalize=<callable>)

----

Other things that could be helpful:

  • always cancel all the pending tasks during shutdown (and maybe let users opt out)
  • use a timeout during shutdown

----

FWIW, adding support for some sort of sub-atexit handler isn't so appealing. I'm talking about something like one of the following:

  • ThreadPoolExecutor(onshutdown=<callable>)
  • ThreadPoolExecutor.register_atexit(<callable>)
  • (classmethod) ThreadPoolExecutor.register_atexit(<callable>)
  • concurrent.futures.register_atexit(<callable>)

(It would probably make sense to pass the list of currently running tasks to the callable.)

@bdarnell
Mannequin Author

bdarnell mannequin commented Feb 4, 2022

To be clear, by "cancel" you are not talking about Future.cancel(). Rather, your handler causes all running tasks to finish (by sending a special message on the socket corresponding to each running task). Is that right?

Correct. My tasks here are calls to functions from the select module (one select call per executor task), and cancelling them means writing a byte to a pipe set up for this purpose.
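
That is the standard self-pipe/waker pattern; a standalone sketch (not Tornado's actual code; this version uses a socket pair so that select() also accepts it on Windows):

import select
import socket

# A connected socket pair used purely as a wake-up channel.
_waker_recv, _waker_send = socket.socketpair()

def wait_for_events(socks):
    # Runs inside the executor task: block until a socket is readable
    # or the waker fires.
    readable, _, _ = select.select(list(socks) + [_waker_recv], [], [])
    if _waker_recv in readable:
        _waker_recv.recv(1)   # drain the wake-up byte
        return None           # we were asked to stop
    return readable

def cancel():
    # Called from the shutdown hook: wakes up the blocked select() call.
    _waker_send.send(b"x")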

The select calls could be given a timeout so there is never an infinite task, but that's not ideal - a timeout that's too low has a performance cost as calls time out and restart even when the system is "at rest", and a too-long timeout is still going to be perceived as a hanging application.

  • it does not make sure the task associated with the socket finishes (no way of knowing?)
  • so if a task hangs while trying to stop then the running thread in the ThreadPoolExecutor would block shutdown forever
  • similarly, if a task is stuck handling a request then it will never receive the special message on the socket, either blocking the send() in your handler or causing ThreadPoolExecutor shutdown/atexit to wait forever

Correct. If the task were buggy it could still cause a deadlock. In my case the task is simple enough (a single selector call) that this is not a risk.

  • it vaguely implies a 1-to-1 relationship between sockets and *running* tasks
  • likewise that pending (queued) tasks do not have an associated socket (until started)

Each task is associated with a selector object (managing a set of sockets), not a single socket. There is only ever one task at a time; a task is enqueued only after the previous one finishes. (This thread pool is not used for any other purpose.)

  • so once your handler finishes, any tasks pending in the ThreadPoolExecutor queue will eventually get started but never get stopped by your handler; thus you're back to the deadlock situation

In my case this one-at-a-time rule means that the queue is always empty. But yes, in a more general solution you'd need some sort of interlock between cancelling existing tasks and starting new ones.
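
For the general case, such an interlock could look roughly like this (a sketch, not anything from Tornado or the stdlib):

import threading

_lock = threading.Lock()
_cancelled = False

def submit_task(executor, fn, *args):
    # Refuse new work once cancellation has begun.
    with _lock:
        if _cancelled:
            raise RuntimeError("shutting down; not accepting new tasks")
        return executor.submit(fn, *args)

def cancel_running_tasks(wakers):
    global _cancelled
    with _lock:
        _cancelled = True        # from now on submit_task() rejects new work
        for waker in wakers:
            waker.send(b"x")     # placeholder: tell each running task to finish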

Alternately, perhaps ThreadPoolExecutor isn't the right fit here, as implied by the route you ended up going.

Yes, this is my conclusion as well. I filed this issue because I was frustrated that Python 3.9 broke previously-working code, but I'm willing to chalk this up to Hyrum's law and I'm not sure that this is something that ThreadPoolExecutor should be modified to support.

@ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
@lucaswiman
Contributor

lucaswiman commented Jun 18, 2022

Another issue (also related to long-running threads, though I don't think that's essential) is that there is no way to use an atexit function to tell your threads to exit cleanly. I reduced the issue to this:

import atexit
import concurrent.futures
import time


stop_signal = False

def waiter():
    while not stop_signal:
        time.sleep(1)
        print('.', end="")
        import sys; sys.stdout.flush()


def stop_workers():
    global stop_signal
    print("called!")
    stop_signal = True


atexit.register(stop_workers)


if __name__ == "__main__":
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    executor.submit(waiter)
    assert False

When you run this, it blocks and keeps executing the waiter thread. When I hit Ctrl+C, it generates this error:

..^CError in atexit._run_exitfuncs:
Traceback (most recent call last):
  File "/usr/lib/python3.8/concurrent/futures/thread.py", line 40, in _python_exit
    t.join()
  File "/usr/lib/python3.8/threading.py", line 1011, in join
    self._wait_for_tstate_lock()
  File "/usr/lib/python3.8/threading.py", line 1027, in _wait_for_tstate_lock
    elif lock.acquire(block, timeout):
KeyboardInterrupt
called!

So there's effectively a deadlock: ThreadPoolExecutor's atexit handler runs before any functions registered with atexit.register, and it blocks forever waiting on a thread that only my atexit function would stop. This seems very counterintuitive, since the documentation in atexit says the atexit handlers are processed in a LIFO fashion.

I feel like an inability to tell threads to gracefully exit when the main thread ends is a pretty significant limitation of ThreadPoolExecutor and should be mentioned in the docs, even if it's not clear what the right fix for it is. Would the maintainers be supportive if I added a documentation PR to that effect?

Note: the same issue seems to occur on python 3.10, so it's not specific to 3.8.

lucaswiman added a commit to lucaswiman/cpython that referenced this issue Jun 19, 2022
See python#86128 (comment) for
an example where this happens. Essentially, even if it looks like
you added an `atexit` handler to instruct your thread to exit gracefully,
it will only be executed _after_ your thread has finished. For
long-running threads (e.g. threads listening to a queue), that may
never happen at all.

Elsewhere in python#86128, it's recommended that `ThreadPoolExecutor` not
be used for long-running tasks, but this was not reflected in the
documentation. Based solely on the API, there is no reason to think
you shouldn't use it for long-running tasks. The only reason appears
to be a limitation in its implementation, so that should be made
explicit in the docs.
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Jul 28, 2022
…ehaviour (pythonGH-94008)

(cherry picked from commit 7df2f4d)

Co-authored-by: Lucas Wiman <lucas.wiman@gmail.com>
miss-islington added a commit that referenced this issue Jul 28, 2022
…ur (GH-94008)

(cherry picked from commit 7df2f4d)

Co-authored-by: Lucas Wiman <lucas.wiman@gmail.com>
@SyntaxColoring

SyntaxColoring commented Jul 25, 2023

How much of this is actually specific to ThreadPoolExecutor, as opposed to non-daemon threads in general? Here's a demo of deadlocking thread cleanup on shutdown. Is this not the same thing?
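
For concreteness, a minimal version of that situation looks something like this (a sketch of the general shape, not necessarily the linked demo):

import atexit
import threading
import time

stop = False

def worker():
    while not stop:
        time.sleep(1)

def request_stop():
    global stop
    stop = True

atexit.register(request_stop)

# Non-daemon: threading._shutdown() joins this thread before the regular
# atexit callbacks run, so request_stop() never gets a chance to set `stop`
# and the interpreter hangs at shutdown.
threading.Thread(target=worker).start()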

@lucaswiman
Contributor

How much of this is actually specific to ThreadPoolExecutor, as opposed to non-daemon threads in general?

@SyntaxColoring Prior to 3.9, ThreadPoolExecutor did use daemon threads, though with unusual behavior for daemon threads: its atexit handler still joined them at exit. (Updated in this commit.)

I think your example is expected behavior for non-daemon threads, though the threading/atexit docs could probably be clarified a bit on this point.
