Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expand cancellation usability from native trio threads #2392

Merged

Conversation

richardsheridan
Copy link
Contributor

@richardsheridan richardsheridan commented Aug 8, 2022

Fixes #606. I used types as suggested there to distinguish the different messages from the worker thread, but it also IMHO nicely refactored the logic of the from_thread functions; they are now thin wrappers with docstrings. The host task containing a call to to_thread.run_sync is reused to handle calls to from_thread.run* if trio_token is None. The context of the thread is copied and used back in the main thread to propagate any contextvar changes that may have happened in the thread. I'd be happy to hear suggestions for applying the context in from_thread.run that could avoid two extra checkpoints!

This PR reuses the thread local object in _threads.py to stash the raise_cancel provided to abort, which basically implements a suggestion by @njsmith in #961 (comment):

Another idea: in run_sync_in_thread, start the thread, then block in wait_task_rescheduled. In our blocked state, if our >abort_fn is called, stash the raise_fn somewhere where the thread can see it. Then trio.from_thread.check_cancel() is just >something like if THREAD_STATE.raise_fn is not None: raise_fn().

This might be simpler, but I bet it will also have some complications by the time we figure out how to forward arbitrary >cancellations to the child thread, and also let the thread re-enter trio and deliver cancellations to those tasks.

This is used in the existing from_thread functions to avoid race conditions, but I would like to use this to allow a sqlite query to poll for cancellation. Currently, I would have to dig into the internals and use this recipe from @oremanj:

def make_cancel_poller():
    _cancel_status = trio.lowlevel.current_task()._cancel_status

    def poll_for_cancel(*args, **kwargs):
        if _cancel_status.effectively_cancelled:
            raise trio.Cancelled._create()

    return poll_for_cancel

Furthermore, that method is not thread safe if shielding is being toggled:

trio/trio/_core/_run.py

Lines 223 to 231 in 2d62ff0

# True iff the tasks in self._tasks should receive cancellations
# when they checkpoint. Always True when scope.cancel_called is True;
# may also be True due to a cancellation propagated from our
# parent. Unlike scope.cancel_called, this does not necessarily stay
# true once it becomes true. For example, we might become
# effectively cancelled due to the cancel scope two levels out
# becoming cancelled, but then the cancel scope one level out
# becomes shielded so we're not effectively cancelled anymore.
effectively_cancelled = attr.ib(default=False)

So I added a new function from_thread.check_cancelled which is a very simple wrapper to grab the appropriate raise_cancel function if it is there and call it. A question though: Would it be better to call it or just return it?

if trio.from_thread.check_cancelled():
   ... # do cleanup and don't raise

might be more along the lines of what people are expecting compared to

try:
    trio.from_thread.check_cancelled()
except:
    ... # do cleanup and don't raise

But I feel like people should be nudged to actually raise Cancelled if they are in fact responding to cancellation!

@codecov
Copy link

codecov bot commented Aug 8, 2022

Codecov Report

Merging #2392 (ab092b0) into master (b161fec) will increase coverage by 0.00%.
The diff coverage is 100.00%.

Additional details and impacted files
@@           Coverage Diff            @@
##           master    #2392    +/-   ##
========================================
  Coverage   99.13%   99.14%            
========================================
  Files         115      115            
  Lines       17242    17432   +190     
  Branches     3085     3107    +22     
========================================
+ Hits        17093    17283   +190     
  Misses        104      104            
  Partials       45       45            
Files Coverage Δ
trio/_tests/test_threads.py 100.00% <100.00%> (ø)
trio/_threads.py 100.00% <100.00%> (ø)
trio/from_thread.py 100.00% <100.00%> (ø)

@graingert
Copy link
Member

I'd like to see threads be able to run code directly in the nursery that started it, this should fix this issue more robustly.

I'll edit this comment to link to what I mean shortly. Remind me if I don't

@richardsheridan
Copy link
Contributor Author

Is this similar to what you are suggesting: #606 (comment)

@richardsheridan richardsheridan changed the title Add trio.from_thread.check_cancelled api to allow threads to efficiently poll for cancellation Expand cancellation usability from native trio threads Mar 11, 2023
@richardsheridan
Copy link
Contributor Author

@graingert I have implemented your request (finally!). Would you have a look at it? Also, @agronholm this might have implications for AnyIO so it would be better to have your input sooner than later!

@agronholm
Copy link
Contributor

This looks very interesting, and I can use this. I'm looking forward to adding this to AnyIO once it's released for Trio.

@richardsheridan
Copy link
Contributor Author

it should be type-safe now, but i'm a typing newbie so maybe I could get a review on that aspect as well?

# Conflicts:
#	trio/_tests/test_threads.py
#	trio/_threads.py
#	trio/from_thread.py
Copy link
Member

@njsmith njsmith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this do if the thread call gets "uncancelled"? Ie this sequence:

  1. Some outer enclosing scope enters the cancelled state, so run_sync becomes cancelled
  2. A more-inner enclosing scope gets mutated as cscope.shield = True, so run_sync is no longer cancelled
  3. Then the user calls check_cancelled

trio/_threads.py Outdated Show resolved Hide resolved
@richardsheridan
Copy link
Contributor Author

What does this do if the thread call gets "uncancelled"? Ie this sequence:

  1. Some outer enclosing scope enters the cancelled state, so run_sync becomes cancelled
  2. A more-inner enclosing scope gets mutated as cscope.shield = True, so run_sync is no longer cancelled
  3. Then the user calls check_cancelled

The straightforward implementation was to make this a level change in cancellation for the thread, so check_cancelled forever raises Cancelled IFF the cancellation delivery was attempted at least once. I believe the docs reflect this, though maybe that needs to be clarified.

I think this is also the "best" behavior because it is an approximation of what would happen if the thread were blocked in trio.from_thread.run(afn). In that case, even in your scenario, a Cancelled would come out of afn and take an indeterminate amount of time to propagate back to the host task, meaning that it could be delivered after raising shields.

Copy link
Member

@oremanj oremanj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, this generally looks excellent!

docs/source/reference-core.rst Outdated Show resolved Hide resolved
newsfragments/2392.feature.rst Outdated Show resolved Hide resolved
trio/_tests/test_threads.py Outdated Show resolved Hide resolved
trio/_tests/test_threads.py Show resolved Hide resolved
await to_thread_run_sync(sync_check, cancellable=True)

assert cancel_scope.cancelled_caught
assert await to_thread_run_sync(partial(queue.get, timeout=1))
Copy link
Member

@oremanj oremanj Oct 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, this suggests that from_thread.run_sync does raise Cancelled if the thread was abandoned. Where is that coming from? It can't be from the cancel scope, because the cancel scope doesn't exist anymore (in the general case). If it's a general "this thread was abandoned" marker, then that's mildly backcompat-breaking and I think I disprefer it regardless - "sync functions have no checkpoints and therefore can't be the cause of a Cancelled" is a nice and easy rule to remember and works better if we don't have corner-case exceptions to it.

I'm not sure how async from_thread.run from an abandoned thread should work. Options I see are:

  • run it in a system task in a cancelled cancel scope (and raise Cancelled in the thread if the cancel scope catches a Cancelled)
  • run it in a system task that isn't cancelled, like we did before this PR
  • raise Cancelled immediately

I would prefer one of the first two, since the third is unlike anything done elsewhere in Trio (Cancelled is raised at checkpoints, entry to an async function is not a checkpoint) and makes it hard to hand off resources safely (if you 'own' a resource and you pass it to a task that begins [async] with resource:, then that's normally safe but wouldn't be with the new semantics because Cancelled could be raised before the recipient enters the context manager).

Regardless of what we choose, we need to document it carefully, because it's a subtle consideration.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I could on board for "run in cancelled system task" but documenting the nuances of "if you abandon your thread, your async functions are going to run in a weird different task" seems really hard.

What do you think of re-using RunFinishedError or making a TaskFinishedError to raise immediately? This would at least not trample on the semantics of Cancelled, but it wouldn't help with the resource handoff issue (on the other hand, users must already handle RunFinishedError in the present case).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I implemented the semantics I think you expected, and there were enough degrees of freedom in the test suite that it didn't need much adjustment. That makes me think that we should think of a way to (a) document the surprising abandon-to-system-task semantics and (b) anchor them in the test suite. I can't yet think of which behaviors really should be represented that way though.

Copy link
Member

@oremanj oremanj Oct 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, on further consideration I agree my previous proposal is fairly wonky and we can probably do better. I think my preference would actually be to have cancellable=True threads use the old semantics (whether they've actually been cancelled yet or not): don't pass cancellation, don't run from_thread.run reentrant calls in the to_thread.run_sync task, just use a system task (that is not cancelled) like we did before this PR. Rationale:

  • Making stuff fail that worked before isn't great for backcompat. If someone is using the explicit "abandon thread when I'm cancelled" flag, then they might specifically want the abandoned thread to continue operating; our attempt to interrupt it as soon as we regain control might be counterproductive.
  • Since the cancellable=True thread explicitly might outlive the original task, any attempt it makes to call back into Trio is probably for something unrelated to the original task's context, so it's strange for that attempt to reflect the cancellation status of the original task.
  • I imagine a conceptual distinction between cancellable=False threads being an "extension of the current task", vs cancellable=True being "fire-and-forget + retrieve the result if you're still around when it becomes available". It just feels wrong to propagate cancellation in the latter case.

This discussion is underscoring for me that cancellable=True is a bad name; cancellable=False threads are arguably more cancellable! My first thought for a replacement is detachable but maybe you have a better idea. Doesn't need to go in this PR but maybe should arrive in the same release to reduce confusion.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was going to suggest a new, overriding kwarg abandon_on_cancel. That tells you what happens and when, and my multiprocessing library could use kill_on_cancel instead. cancellable would then hang around indefinitely, deprecated but with unchanged semantics.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm equally happy with abandon_on_cancel as a name.

trio/_threads.py Show resolved Hide resolved
trio/_threads.py Outdated Show resolved Hide resolved
trio/_threads.py Show resolved Hide resolved
trio/_threads.py Outdated Show resolved Hide resolved
trio/_threads.py Outdated Show resolved Hide resolved
@oremanj
Copy link
Member

oremanj commented Oct 11, 2023

I'd be happy to hear suggestions for applying the context in from_thread.run that could avoid two extra checkpoints!

That's the only available way to change a task's context without creating a new task, and it's probably lower-overhead than the new task approach (but I didn't measure). The new task approach would be to push the reentrant call into a system task, which can have whatever context you want, and wrap it in a cancel scope that gets cancelled when the to_thread.run_sync call gets cancelled. I probably prefer the approach here where the reentrant functions run underneath the original task, but they're both workable.

What does this do if the thread call gets "uncancelled"?

from_thread.check_cancelled(): The initial cancellation will call to_thread.run_sync's abort_fn, which will save the raise_cancel and use it the next time the thread checks for cancellation; the "uncancellation" will not be noticed. I think this is fine; it's well-defined, deterministic modulo the question of whether the thread checks for cancellation at all in between when cancellation is asserted and when the thread completes (which is a source of nondeterminism even without shielding), and the Cancelled exception will propagate fine through the shield (we have and test a similar edge case in the non-threads world).

from_thread.run(): The function that runs upon reentry is executed as a normal async call in the to_thread.run_sync task, so it does see the uncancellation. I'm realizing that this results in a discrepancy between from_thread.run(trio.sleep, 0) and from_thread.check_cancelled(), which is a bit unfortunate, and probably we should document it, but I don't think it's a problem in practice. from_thread.check_cancelled() can't observe that the 'uncancellation' has occurred at all unless it pokes into internals (the CancelStatus object). There's no way I know of to be woken up if 'uncancellation' occurs, so the main to_thread.run_sync task can't help it out. Maybe inspecting CancelStatus is fine/worthwhile? Doesn't quite seem worth it to me (it breaks our rules about how non-_core code is supposed to consume _core code), but it's a near thing. But it would be easy to add later if we decide to; I'm comfortable taking the simpler road for now.

There was a comment upthread that said inspecting CancelStatus wouldn't be thread-safe, but I don't think that's actually a problem: if the check is racing with the uncancellation then AFAICT it doesn't really matter whether the check sees the uncancellation or not. However my earlier snipept (which would raise trio.Cancelled._create() if the CancelStatus showed a cancelled status) was incorrect because it didn't handle the KeyboardInterrupt possibility. So we need the current raise_cancel logic regardless; any CancelStatus check for uncancellation would be in addition to that.

@richardsheridan
Copy link
Contributor Author

from_thread.check_cancelled(): The initial cancellation will call to_thread.run_sync's abort_fn, which will save the raise_cancel and use it the next time the thread checks for cancellation; the "uncancellation" will not be noticed. I think this is fine; it's well-defined, deterministic [...]

I think there may be another nondeterministic edge case: if the cancellation arrives while the task is busy in await message.run() AND that work happens to be shielded, AND a shield gets raised uptree, then the main "have we ever been cancelled" check will not be triggered. I think a nursery with a task with the sole purpose of waiting for cancellation might be needed to get that exactly right.

As long as we need a nursery to get those semantics right, we might as well also consider an alternative implementation where the each from_thread call spawns a new task in that nursery. After all, it's not really the fact that we are reusing the task which is so useful, but rather that it exists in the right cancellation, contextvar, and treevar contexts. Child tasks would have all these properties, although they might be marginally less efficient.

@oremanj
Copy link
Member

oremanj commented Oct 13, 2023

I think there may be another nondeterministic edge case: if the cancellation arrives while the task is busy in await message.run() AND that work happens to be shielded, AND a shield gets raised uptree, then the main "have we ever been cancelled" check will not be triggered. I think a nursery with a task with the sole purpose of waiting for cancellation might be needed to get that exactly right.

I think it depends what you mean by "exactly right". :-) The analogous situation without the thread would not raise a cancellation in this case, so I don't think the thread needs to either. In any event I don't think it would be worth the trouble and overhead of a whole separate task to wait for cancellation.

In general the way Trio behaves is that you need to execute a checkpoint while you are effectively cancelled (have a cancelled cancel scope closer to you than the closest shielded cancel scope) in order to raise Cancelled. If you perform a cancellation and then a shielding in the way that causes the effectively-cancelled status to toggle back and forth, but no checkpoints observe the "cancelled" state, then for your purposes it didn't happen. I think this analogizes nicely to threads if we take the "checkpoints" to be calls to check_cancelled() + (unshielded) checkpoints in reentrant calls to from_thread.run() functions. We can only get that exact behavior if we peek inside the internal CancelStatus object on check_cancelled; otherwise we will get a more conservative / "cancel-happy" behavior where check_cancelled will raise Cancelled if the to_thread.run_sync() call has ever been effectively cancelled, even if it's not anymore. Which is also reasonable! We just need to document the subtleties here.

in short, cancellable threads always use system tasks. normal threads use the host task, unless passed a token
Copy link
Member

@oremanj oremanj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good, thank you for seeing this through! I think the main remaining thing is documentation wording, for which I've left some suggestions. It looks like there are also a few new lines that aren't covered by the tests. You're welcome to merge once you've resolved the remaining comments.

docs/source/reference-core.rst Outdated Show resolved Hide resolved
trio/_threads.py Outdated Show resolved Hide resolved
trio/_threads.py Outdated Show resolved Hide resolved
trio/_threads.py Outdated Show resolved Hide resolved
richardsheridan and others added 2 commits October 17, 2023 21:50
Documentation clarifications

Fix function name typo leading to missed coverage

Co-authored-by: Joshua Oreman <oremanj@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Propagating cancellation through threads
8 participants