user_done_callback fires too early on cancellation or timeout #105

dcnieho · 2022-09-30T12:32:59Z

See the code around line 248 in 706966a:

`self.task_manager.task_done(`

Here the user is notified of the cancellation before the worker process is actually killed. That is a problem for me because I would like to clean up some file system objects that the worker process creates when it is cancelled, but I cannot, as they are still in use because the worker hasn't stopped yet.

I do not see a reason to notify about cancellation before it has actually occurred, but perhaps I am shortsighted :). If there is a reason why cancellation is notified before it actually occurs, could you consider making the notification timing configurable?

noxdafox · 2022-10-02T15:31:32Z

The goal of a process pool is to abstract the management of the worker processes away from the main application/service.

Hence, there is no single proper moment to run a callback, as the problem is asynchronous by nature. The main reason callbacks were run before terminating the processes was to execute the post-processing as fast as possible, with the idea of handling process termination in a later phase.

How is the main loop supposed to know what resources to clean up?

dcnieho · 2022-10-02T20:09:19Z

In my case, my process does its work in a temporary directory that I have to clean up manually if the process is cancelled. On Windows the process is simply killed, with no signal that can be caught, so I have to do this in the main process that launched the worker task. Right now I tried a simple "if the process was cancelled, delete the directory", but that fails because files in the directory are still in use, as the process hasn't been killed yet.

Indeed, in general pebble can't know what should be cleaned up, but I, as the developer receiving the callback, conceivably can. That's why I want to receive the callback in the first place. I understand now that in this case it is a notification that the process will be cancelled, but that is inconsistent with the other two states, failed (an exception is set that is not due to a timeout) and done, for which the callback is called after the process has moved on to the next job.

The next line after the code I linked, which stops the worker, calls stop_process, which just kills the worker. Am I correct that this is not asynchronous, in the sense that calling stop_worker and only then firing the callback would never result in the callback sometimes being received before the process is stopped? Whether there is a proper moment to run the callback is almost philosophical (since, to my understanding, this is not an asynchronous problem in the case of the code in the `update_tasks` function). I would argue that it would be good to make it consistent with the other situations in which the callback is called, or at least to make that configurable.

Thank you for a fantastic library; this is the only problem I am running into :)
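The ordering asked about here (stop the worker, and only then fire the callback) can be sketched with the standard library. This is a minimal illustration under stated assumptions, not pebble's code: a `subprocess.Popen` child stands in for a pool worker, and `stop_worker_then_notify` and `make_cleanup_callback` are hypothetical names. Because `Popen.wait()` blocks until the child has exited, the cleanup callback only runs once the worker can no longer hold files open in its temporary directory.

```python
import shutil
import subprocess
import sys
import tempfile
from pathlib import Path

# Child script standing in for a pool worker: it keeps a file open inside its
# scratch directory and then "hangs", like a task that must be cancelled.
WORKER_SCRIPT = (
    "import sys, time\n"
    "f = open(sys.argv[1], 'w')\n"
    "f.write('partial output')\n"
    "f.flush()\n"
    "time.sleep(60)\n"
)


def stop_worker_then_notify(worker, callback):
    """Hypothetical helper: kill the worker, wait until it is gone, then notify."""
    worker.kill()  # on Windows this is TerminateProcess: nothing to catch
    worker.wait()  # blocks until the OS reports the process as exited
    callback(worker)


def make_cleanup_callback(workdir, log):
    """Hypothetical user callback that deletes the worker's scratch directory."""
    def cleanup(worker):
        log.append(worker.poll() is not None)  # True: worker already exited
        shutil.rmtree(workdir)  # the exited worker holds no handles any more
    return cleanup


def demo():
    workdir = Path(tempfile.mkdtemp())
    worker = subprocess.Popen(
        [sys.executable, "-c", WORKER_SCRIPT, str(workdir / "out.tmp")])
    log = []
    stop_worker_then_notify(worker, make_cleanup_callback(workdir, log))
    return log, workdir.exists()
```

Swapping the order (calling the callback before `kill()`/`wait()`) reproduces the reported problem: on Windows, `shutil.rmtree` can then fail with a `PermissionError` while the worker still holds the file open.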


noxdafox · 2022-10-03T18:36:46Z

This is not the first time something similar has come up, and I now see a valid use case for it.

I cannot see cases where the current behaviour (running a callback while the timed-out or cancelled process is still running) is expected, so I think it would be safe to change it.

I ran some tests over the weekend and did not identify issues. I will make a new release soon with this enhancement.

dcnieho · 2022-10-04T07:36:56Z

@noxdafox: thank you very much! Glad to hear!

A legitimate use case for callbacks is resource cleanup. This cannot happen while the processes are still running, as they might be holding the resources to clean up. Signed-off-by: Matteo Cafasso <noxdafox@gmail.com>

noxdafox · 2022-10-05T20:56:01Z

Issue resolved in 5.0.1. Thanks for reporting.

dcnieho · 2022-10-06T07:39:13Z

Thanks so much! However, I find this is not working as I expected (and I think this is possibly an issue with my expectations), because the cancellation of a running task is not notified to the user callback through task_done (the call we just reordered in 5.0.1). Here is what happens when you cancel a running task scheduled on a process pool:

1. pebble/common.py, line 77: sets the cancelled state and invokes the callbacks!
2. pebble/pool/process.py, in a separate thread:
   a. the pool manager loop updates the pool's status (line 187),
   b. which updates the task status (line 240),
   c. which, for every cancelled task (line 251),
   d. stops the worker and calls task_done (line 303), which
   e. calls set_running_or_notify_cancel() on the task's future (line 311).

All of step 2 will happen after step 1, since it runs in a different thread that will only be given a processing slice after step 1 is done (if I understand Python correctly; at least, this ordering may occur). This is not easy to fix.

Making it behave like a finished or failed task (callback invoked after that status is actually reached) would involve adding an extra state to the PebbleFuture (e.g. BEING_CANCELLED), the task manager picking up on this state and stopping the worker, and then task_done calling a new PebbleFuture member function, set_cancelled() (analogous to set_result() and set_exception()). cancel() would skip all this machinery if the state is not RUNNING (it could defer to the superclass implementation). set_cancelled() would presumably call add_cancelled() on any waiters, making the call to set_running_or_notify_cancel() superfluous in this case(?). On top of that, what should cancel() return when invoked on a task that is already running? Probably False, since it isn't cancelled yet.
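The two-phase cancellation proposed here can be modelled in a few lines. This is a standalone toy that assumes nothing about pebble's internals: `TwoPhaseFuture`, the `BEING_CANCELLED` state and `set_cancelled()` are hypothetical names taken from the proposal, not existing pebble or concurrent.futures APIs. It shows a running task's cancel() returning False and deferring the callbacks until whoever stops the worker calls set_cancelled().

```python
import threading
from typing import Callable, List

# States of the hypothetical future; BEING_CANCELLED is the proposed extra state.
PENDING = "PENDING"
RUNNING = "RUNNING"
BEING_CANCELLED = "BEING_CANCELLED"
CANCELLED = "CANCELLED"
FINISHED = "FINISHED"


class TwoPhaseFuture:
    """Toy model of the proposed cancellation flow (not PebbleFuture)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.state = PENDING
        self.result = None
        self._callbacks: List[Callable[["TwoPhaseFuture"], None]] = []

    def add_done_callback(self, fn: Callable[["TwoPhaseFuture"], None]) -> None:
        self._callbacks.append(fn)

    def _invoke_callbacks(self) -> None:
        # Called outside the lock so callbacks may inspect the future freely.
        for fn in self._callbacks:
            fn(self)

    def set_running(self) -> None:
        with self._lock:
            if self.state == PENDING:
                self.state = RUNNING

    def cancel(self) -> bool:
        """Cancel a pending task now; only flag a running one for cancellation."""
        with self._lock:
            if self.state == PENDING:
                # Nothing is running yet: cancel and notify immediately.
                self.state = CANCELLED
            elif self.state == RUNNING:
                # The worker still runs: defer notification, report "not yet".
                self.state = BEING_CANCELLED
                return False
            else:
                return self.state == CANCELLED
        self._invoke_callbacks()
        return True

    def set_cancelled(self) -> None:
        """Called by the pool manager once the worker has actually stopped."""
        with self._lock:
            if self.state != BEING_CANCELLED:
                raise RuntimeError(f"cannot finish cancellation from {self.state}")
            self.state = CANCELLED
        self._invoke_callbacks()

    def set_result(self, value) -> None:
        """Normal completion, shown for comparison with set_cancelled()."""
        with self._lock:
            if self.state != RUNNING:
                raise RuntimeError(f"cannot set a result from {self.state}")
            self.result = value
            self.state = FINISHED
        self._invoke_callbacks()
```

In this model, the pool manager would call set_cancelled() right after stopping the worker, so a cleanup callback never observes a live worker, just as with set_result() for a normal finish.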

It's all rather complicated in any case, I guess. Perhaps I would have more success installing my own waiter (implementation detail though that is), which, now in 5.0.1, would get invoked at the right time for cancellation as well as for normal or exceptional completion! I'll give that a try. I'm happy to think along and try out the above, however, should you wish to pursue it. Thanks in any case!

dcnieho · 2022-10-06T08:01:36Z

Yes, registering a waiter and (ab)using it as a way to get my callback run at the right time does the trick! So my problem is solved, if a bit dirtily.
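The waiter workaround can be illustrated with the standard library alone. This sketch relies on `Future._waiters`, a private CPython implementation detail of concurrent.futures that may change without notice; `CleanupWaiter` is a hypothetical name, and how such a waiter would be attached to a pebble future is an assumption here. It shows why a waiter fires later than a done callback: concurrent.futures notifies waiters of cancellation in set_running_or_notify_cancel(), not in cancel(), and according to the call chain traced in this thread, pebble reaches that call from task_done, which in 5.0.1 runs after the worker is stopped.

```python
from concurrent.futures import Future


class CleanupWaiter:
    """Hypothetical waiter object; relies on the private Future._waiters list.

    concurrent.futures calls add_result/add_exception/add_cancelled on every
    registered waiter. For cancellation this happens in
    set_running_or_notify_cancel(), not in cancel().
    """

    def __init__(self, on_event):
        self.on_event = on_event

    def add_result(self, future):
        self.on_event("result", future)

    def add_exception(self, future):
        self.on_event("exception", future)

    def add_cancelled(self, future):
        self.on_event("cancelled", future)


def demo():
    events = []
    fut = Future()
    fut.add_done_callback(lambda f: events.append("done_callback"))
    # Private API: append the waiter directly to the future's waiter list.
    fut._waiters.append(
        CleanupWaiter(lambda kind, f: events.append("waiter:" + kind)))

    fut.cancel()                       # done callbacks fire here
    after_cancel = list(events)
    fut.set_running_or_notify_cancel() # waiters are notified only here
    return after_cancel, events
```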

noxdafox · 2022-10-06T15:22:11Z

I had actually forgotten that callbacks are executed on Future.cancel(). This is why I mentioned that it's hard to provide guarantees, given the asynchronous nature of the problem.
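The behaviour referred to here is standard concurrent.futures semantics and can be checked with a plain `Future`: cancel() on a pending future invokes the done callbacks synchronously, in the thread that called cancel(), before it returns. Nothing therefore guarantees that a separate pool manager thread has stopped the worker by the time the callback runs. The `demo` function is only an illustration.

```python
import threading
from concurrent.futures import Future


def demo():
    """Record which thread runs the done callback triggered by cancel()."""
    seen = []
    fut = Future()  # pending: never submitted to any executor
    fut.add_done_callback(
        lambda f: seen.append((threading.get_ident(), f.cancelled())))
    returned = fut.cancel()  # the callback has already run when this returns
    return returned, seen, threading.get_ident()
```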

The callback should work on TimeoutError as expected, but I won't change the Future behavior, as it would diverge too much from the original concurrent.futures API design.

Glad you found a workaround for your need.

dcnieho · 2022-10-07T06:08:05Z

@noxdafox, fully agree: you shouldn't change the future's behavior. It may, however, be good to document that Future.cancel() may lead to the callback executing before the worker process is actually stopped.

Thanks again for the 5.0.1 enhancement, which made my workaround possible!

…y, now (ab)use a waiter as a way to get notified, implementation detail as it is. Now we get notified at the right time and can finally clean up any mess we make when canceling or when the process fails. See noxdafox/pebble#105


noxdafox added the enhancement label Oct 3, 2022

noxdafox closed this as completed Oct 5, 2022
