Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

user_done_callback fires too early on cancellation or timeout #105

Closed
dcnieho opened this issue Sep 30, 2022 · 9 comments
Closed

user_done_callback fires too early on cancellation or timeout #105

dcnieho opened this issue Sep 30, 2022 · 9 comments

Comments

@dcnieho
Copy link

dcnieho commented Sep 30, 2022

See the code around line

self.task_manager.task_done(

Here the user is notified of the cancellation before the worker process is actually killed. That is an issue for me because i would like to clear up some file system objects the worker process creates when it is cancelled, but i cannot as they are still in use since the worker isn't stopped yet.

I do not see a reason to notify about cancellation before it has actually occurred, but perhaps i am shortsighted :). If there is a reason why cancellation is notified before it has actually occurred, could you consider making notification timing configurable?

@noxdafox
Copy link
Owner

noxdafox commented Oct 2, 2022

Goal of a process pool is to abstract the management of the worker processes from the main application/service.

Hence, there is not a proper moment when running a callback as the nature of the problem is asynchronous. The main reason why callbacks were ran before terminating the processes was to execute the post-processing as fast as possible with the idea of handling the process termination in a later phase.

How is the main loop supposed to know what resources to clean up?

@dcnieho
Copy link
Author

dcnieho commented Oct 2, 2022 via email

@noxdafox
Copy link
Owner

noxdafox commented Oct 3, 2022

This is not the first time something similar comes out and I now see a valid Use Case for it.

I cannot see issues where the current behaviour is expected (running a callback while the timing out/cancelled process still runs) so I think it would be safe changing the behaviour.

I ran some tests over the WE and did not identify issues. I will make a new release soon with this enhancement.

@dcnieho
Copy link
Author

dcnieho commented Oct 4, 2022

@noxdafox: thank you very much! Glad to hear!

noxdafox added a commit that referenced this issue Oct 5, 2022
A legitimate use case for callbacks is resources cleanup. This cannot
happen while the processes are still running as they might be holding
the resources to cleanup.

Signed-off-by: Matteo Cafasso <noxdafox@gmail.com>
@noxdafox
Copy link
Owner

noxdafox commented Oct 5, 2022

Issue resolved in 5.0.1. Thanks for reporting.

@noxdafox noxdafox closed this as completed Oct 5, 2022
@dcnieho
Copy link
Author

dcnieho commented Oct 6, 2022

Thanks super much! However, i find this is not working as i expected (and i think this is possibly an issue with my expectations), because cancellation of a running task is not notified to the user callback through task_done (the calling of which we just reordered in 5.0.1). What happens when you cancel a running task scheduled on a process pool:

  1. pebble/common.py: 77
    set cancelled state, and invokes callbacks!

  2. pebble/pool/process.py
    in separate thread,
    a. pool manager loop updates pool's status (line 187),
    b. which updates task status (line 240),
    c. which for every cancelled task (line 251)
    d. stops the worker and calls task_done, which (line 303)
    e. calls set_running_or_notify_cancel() on the task's future (line 311).

All of 2 will happen after 1, since it runs in a different thread that will be assigned a processing slice after 1 is done (if i understand Python correctly, or at least this ordering may occur). This is not easy to fix. Fixing it to behave like a finished or failed task (callback invoked after this status is actually reached) would involve adding an extra state to the PebbleFuture (e.g. BEING_CANCELLED), the task manager picking up on this state and cancelling the worker, and then task_done calling a new member function of PebbleFuture set_cancelled() (analogous to set_result() and set_exception()). cancel() would skip all this machinery if State is not running (could defer to super class implementation). set_cancelled() would presumably call add_cancelled() on any waiters, making calling set_running_or_notify_cancel() superfluous in this case(?). On top of that, what should cancel() return when invoked on a task that is already running? Probably False since it isn't cancelled yet.

All rather complicated in any case i guess. Perhaps i would have more success installing my own waiter (implementation detail as that is) which would get invoked at the right time for cancellation and normal or exceptional finishes now in 5.0.1! I'll give that a try. I'm happy to think along and try out the above however, should you wish to pursue it. Thanks in any case!

@dcnieho
Copy link
Author

dcnieho commented Oct 6, 2022

Yes, registering a waiter and (ab)using it as a way to get my callback run at the right time does the trick! So my problem is solved, if a bit dirtyly

@noxdafox
Copy link
Owner

noxdafox commented Oct 6, 2022

I actually forgot that callbacks are executed on Future.cancel(). This is why I was mentioning that it's hard to provide guarantees given the asynchronous nature of the problem.

The callback should work on TimeoutError as expected but I won't change the Future behavior as it would diverge too much from the original concurrent.futures API design.

Glad you found a workaround for your need.

@dcnieho
Copy link
Author

dcnieho commented Oct 7, 2022

@noxdafox, fully agree, you shouldn't change future's behavior. It may however be good to document that Future.cancel() may lead to callback execution before the worker process is actually cancelled.

Thanks again for the 5.0.1 enhancement, which made my workaround possible!

dcnieho added a commit to dcnieho/glassesValidator that referenced this issue Oct 11, 2022
…y, now (ab)use a waiter as a way to get notified, implementation detail as it is. Now we get notified at the right time and can finally clean up any mess we make when canceling or when the process fails. See noxdafox/pebble#105
noxdafox added a commit that referenced this issue Nov 15, 2022
A legitimate use case for callbacks is resources cleanup. This cannot
happen while the processes are still running as they might be holding
the resources to cleanup.

Signed-off-by: Matteo Cafasso <noxdafox@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants