Expand cancellation usability from native trio threads #2392

richardsheridan · 2022-08-08T16:56:15Z

Fixes #606. I used types as suggested there to distinguish the different messages from the worker thread, but it also IMHO nicely refactored the logic of the from_thread functions; they are now thin wrappers with docstrings. The host task containing a call to to_thread.run_sync is reused to handle calls to from_thread.run* if trio_token is None. The context of the thread is copied and used back in the main thread to propagate any contextvar changes that may have happened in the thread. I'd be happy to hear suggestions for applying the context in from_thread.run that could avoid two extra checkpoints!

This PR reuses the thread local object in _threads.py to stash the raise_cancel provided to abort, which basically implements a suggestion by @njsmith in #961 (comment):

Another idea: in run_sync_in_thread, start the thread, then block in wait_task_rescheduled. In our blocked state, if our >abort_fn is called, stash the raise_fn somewhere where the thread can see it. Then trio.from_thread.check_cancel() is just >something like if THREAD_STATE.raise_fn is not None: raise_fn().

This might be simpler, but I bet it will also have some complications by the time we figure out how to forward arbitrary >cancellations to the child thread, and also let the thread re-enter trio and deliver cancellations to those tasks.

This is used in the existing from_thread functions to avoid race conditions, but I would like to use this to allow a sqlite query to poll for cancellation. Currently, I would have to dig into the internals and use this recipe from @oremanj:

def make_cancel_poller():
    _cancel_status = trio.lowlevel.current_task()._cancel_status

    def poll_for_cancel(*args, **kwargs):
        if _cancel_status.effectively_cancelled:
            raise trio.Cancelled._create()

    return poll_for_cancel

Furthermore, that method is not thread safe if shielding is being toggled:

trio/trio/_core/_run.py

Lines 223 to 231 in 2d62ff0

    
           # True iff the tasks in self._tasks should receive cancellations 
        
           # when they checkpoint. Always True when scope.cancel_called is True; 
        
           # may also be True due to a cancellation propagated from our 
        
           # parent.  Unlike scope.cancel_called, this does not necessarily stay 
        
           # true once it becomes true. For example, we might become 
        
           # effectively cancelled due to the cancel scope two levels out 
        
           # becoming cancelled, but then the cancel scope one level out 
        
           # becomes shielded so we're not effectively cancelled anymore. 
        
           effectively_cancelled = attr.ib(default=False)

So I added a new function from_thread.check_cancelled which is a very simple wrapper to grab the appropriate raise_cancel function if it is there and call it. A question though: Would it be better to call it or just return it?

if trio.from_thread.check_cancelled():
   ... # do cleanup and don't raise

might be more along the lines of what people are expecting compared to

try:
    trio.from_thread.check_cancelled()
except:
    ... # do cleanup and don't raise

But I feel like people should be nudged to actually raise Cancelled if they are in fact responding to cancellation!

codecov · 2022-08-08T17:00:28Z

Codecov Report

Merging #2392 (ab092b0) into master (b161fec) will increase coverage by 0.00%.
The diff coverage is 100.00%.

Additional details and impacted files

@@           Coverage Diff            @@
##           master    #2392    +/-   ##
========================================
  Coverage   99.13%   99.14%            
========================================
  Files         115      115            
  Lines       17242    17432   +190     
  Branches     3085     3107    +22     
========================================
+ Hits        17093    17283   +190     
  Misses        104      104            
  Partials       45       45

Files	Coverage Δ
trio/_tests/test_threads.py	`100.00% <100.00%> (ø)`
trio/_threads.py	`100.00% <100.00%> (ø)`
trio/from_thread.py	`100.00% <100.00%> (ø)`

graingert · 2022-08-08T18:10:14Z

I'd like to see threads be able to run code directly in the nursery that started it, this should fix this issue more robustly.

I'll edit this comment to link to what I mean shortly. Remind me if I don't

richardsheridan · 2022-08-08T18:31:47Z

Is this similar to what you are suggesting: #606 (comment)

…tly poll for cancellation

avoids creating a Cancelled manually and a double-reschedule race

richardsheridan · 2023-03-11T00:44:44Z

@graingert I have implemented your request (finally!). Would you have a look at it? Also, @agronholm this might have implications for AnyIO so it would be better to have your input sooner than later!

from_thread.run_sync now only raises in the cancellable=True case

agronholm · 2023-09-03T17:13:49Z

This looks very interesting, and I can use this. I'm looking forward to adding this to AnyIO once it's released for Trio.

richardsheridan · 2023-09-03T17:14:23Z

it should be type-safe now, but i'm a typing newbie so maybe I could get a review on that aspect as well?

# Conflicts: # trio/_tests/test_threads.py # trio/_threads.py # trio/from_thread.py

njsmith

What does this do if the thread call gets "uncancelled"? Ie this sequence:

Some outer enclosing scope enters the cancelled state, so run_sync becomes cancelled
A more-inner enclosing scope gets mutated as cscope.shield = True, so run_sync is no longer cancelled
Then the user calls check_cancelled

trio/_threads.py

richardsheridan · 2023-09-03T22:46:20Z

What does this do if the thread call gets "uncancelled"? Ie this sequence:

Some outer enclosing scope enters the cancelled state, so run_sync becomes cancelled

A more-inner enclosing scope gets mutated as cscope.shield = True, so run_sync is no longer cancelled

Then the user calls check_cancelled

The straightforward implementation was to make this a level change in cancellation for the thread, so check_cancelled forever raises Cancelled IFF the cancellation delivery was attempted at least once. I believe the docs reflect this, though maybe that needs to be clarified.

I think this is also the "best" behavior because it is an approximation of what would happen if the thread were blocked in trio.from_thread.run(afn). In that case, even in your scenario, a Cancelled would come out of afn and take an indeterminate amount of time to propagate back to the host task, meaning that it could be delivered after raising shields.

oremanj

Thank you, this generally looks excellent!

docs/source/reference-core.rst

newsfragments/2392.feature.rst

trio/_tests/test_threads.py

oremanj · 2023-10-06T05:10:22Z

trio/_tests/test_threads.py

+        await to_thread_run_sync(sync_check, cancellable=True)
+
+    assert cancel_scope.cancelled_caught
+    assert await to_thread_run_sync(partial(queue.get, timeout=1))


Hm, this suggests that from_thread.run_sync does raise Cancelled if the thread was abandoned. Where is that coming from? It can't be from the cancel scope, because the cancel scope doesn't exist anymore (in the general case). If it's a general "this thread was abandoned" marker, then that's mildly backcompat-breaking and I think I disprefer it regardless - "sync functions have no checkpoints and therefore can't be the cause of a Cancelled" is a nice and easy rule to remember and works better if we don't have corner-case exceptions to it.

I'm not sure how async from_thread.run from an abandoned thread should work. Options I see are:

run it in a system task in a cancelled cancel scope (and raise Cancelled in the thread if the cancel scope catches a Cancelled)

run it in a system task that isn't cancelled, like we did before this PR

raise Cancelled immediately

I would prefer one of the first two, since the third is unlike anything done elsewhere in Trio (Cancelled is raised at checkpoints, entry to an async function is not a checkpoint) and makes it hard to hand off resources safely (if you 'own' a resource and you pass it to a task that begins [async] with resource:, then that's normally safe but wouldn't be with the new semantics because Cancelled could be raised before the recipient enters the context manager).

Regardless of what we choose, we need to document it carefully, because it's a subtle consideration.

I think I could on board for "run in cancelled system task" but documenting the nuances of "if you abandon your thread, your async functions are going to run in a weird different task" seems really hard.

What do you think of re-using RunFinishedError or making a TaskFinishedError to raise immediately? This would at least not trample on the semantics of Cancelled, but it wouldn't help with the resource handoff issue (on the other hand, users must already handle RunFinishedError in the present case).

I implemented the semantics I think you expected, and there were enough degrees of freedom in the test suite that it didn't need much adjustment. That makes me think that we should think of a way to (a) document the surprising abandon-to-system-task semantics and (b) anchor them in the test suite. I can't yet think of which behaviors really should be represented that way though.

Yeah, on further consideration I agree my previous proposal is fairly wonky and we can probably do better. I think my preference would actually be to have cancellable=True threads use the old semantics (whether they've actually been cancelled yet or not): don't pass cancellation, don't run from_thread.run reentrant calls in the to_thread.run_sync task, just use a system task (that is not cancelled) like we did before this PR. Rationale:

Making stuff fail that worked before isn't great for backcompat. If someone is using the explicit "abandon thread when I'm cancelled" flag, then they might specifically want the abandoned thread to continue operating; our attempt to interrupt it as soon as we regain control might be counterproductive.

Since the cancellable=True thread explicitly might outlive the original task, any attempt it makes to call back into Trio is probably for something unrelated to the original task's context, so it's strange for that attempt to reflect the cancellation status of the original task.

I imagine a conceptual distinction between cancellable=False threads being an "extension of the current task", vs cancellable=True being "fire-and-forget + retrieve the result if you're still around when it becomes available". It just feels wrong to propagate cancellation in the latter case.

This discussion is underscoring for me that cancellable=True is a bad name; cancellable=False threads are arguably more cancellable! My first thought for a replacement is detachable but maybe you have a better idea. Doesn't need to go in this PR but maybe should arrive in the same release to reduce confusion.

I was going to suggest a new, overriding kwarg abandon_on_cancel. That tells you what happens and when, and my multiprocessing library could use kill_on_cancel instead. cancellable would then hang around indefinitely, deprecated but with unchanged semantics.

I'm equally happy with abandon_on_cancel as a name.

trio/_threads.py

…_cancelled # Conflicts: # trio/_tests/verify_types_darwin.json # trio/_tests/verify_types_linux.json # trio/_tests/verify_types_windows.json

oremanj · 2023-10-11T05:28:17Z

I'd be happy to hear suggestions for applying the context in from_thread.run that could avoid two extra checkpoints!

That's the only available way to change a task's context without creating a new task, and it's probably lower-overhead than the new task approach (but I didn't measure). The new task approach would be to push the reentrant call into a system task, which can have whatever context you want, and wrap it in a cancel scope that gets cancelled when the to_thread.run_sync call gets cancelled. I probably prefer the approach here where the reentrant functions run underneath the original task, but they're both workable.

What does this do if the thread call gets "uncancelled"?

from_thread.check_cancelled(): The initial cancellation will call to_thread.run_sync's abort_fn, which will save the raise_cancel and use it the next time the thread checks for cancellation; the "uncancellation" will not be noticed. I think this is fine; it's well-defined, deterministic modulo the question of whether the thread checks for cancellation at all in between when cancellation is asserted and when the thread completes (which is a source of nondeterminism even without shielding), and the Cancelled exception will propagate fine through the shield (we have and test a similar edge case in the non-threads world).

from_thread.run(): The function that runs upon reentry is executed as a normal async call in the to_thread.run_sync task, so it does see the uncancellation. I'm realizing that this results in a discrepancy between from_thread.run(trio.sleep, 0) and from_thread.check_cancelled(), which is a bit unfortunate, and probably we should document it, but I don't think it's a problem in practice. from_thread.check_cancelled() can't observe that the 'uncancellation' has occurred at all unless it pokes into internals (the CancelStatus object). There's no way I know of to be woken up if 'uncancellation' occurs, so the main to_thread.run_sync task can't help it out. Maybe inspecting CancelStatus is fine/worthwhile? Doesn't quite seem worth it to me (it breaks our rules about how non-_core code is supposed to consume _core code), but it's a near thing. But it would be easy to add later if we decide to; I'm comfortable taking the simpler road for now.

There was a comment upthread that said inspecting CancelStatus wouldn't be thread-safe, but I don't think that's actually a problem: if the check is racing with the uncancellation then AFAICT it doesn't really matter whether the check sees the uncancellation or not. However my earlier snipept (which would raise trio.Cancelled._create() if the CancelStatus showed a cancelled status) was incorrect because it didn't handle the KeyboardInterrupt possibility. So we need the current raise_cancel logic regardless; any CancelStatus check for uncancellation would be in addition to that.

richardsheridan · 2023-10-11T18:43:49Z

from_thread.check_cancelled(): The initial cancellation will call to_thread.run_sync's abort_fn, which will save the raise_cancel and use it the next time the thread checks for cancellation; the "uncancellation" will not be noticed. I think this is fine; it's well-defined, deterministic [...]

I think there may be another nondeterministic edge case: if the cancellation arrives while the task is busy in await message.run() AND that work happens to be shielded, AND a shield gets raised uptree, then the main "have we ever been cancelled" check will not be triggered. I think a nursery with a task with the sole purpose of waiting for cancellation might be needed to get that exactly right.

As long as we need a nursery to get those semantics right, we might as well also consider an alternative implementation where the each from_thread call spawns a new task in that nursery. After all, it's not really the fact that we are reusing the task which is so useful, but rather that it exists in the right cancellation, contextvar, and treevar contexts. Child tasks would have all these properties, although they might be marginally less efficient.

oremanj · 2023-10-13T01:49:51Z

I think there may be another nondeterministic edge case: if the cancellation arrives while the task is busy in await message.run() AND that work happens to be shielded, AND a shield gets raised uptree, then the main "have we ever been cancelled" check will not be triggered. I think a nursery with a task with the sole purpose of waiting for cancellation might be needed to get that exactly right.

I think it depends what you mean by "exactly right". :-) The analogous situation without the thread would not raise a cancellation in this case, so I don't think the thread needs to either. In any event I don't think it would be worth the trouble and overhead of a whole separate task to wait for cancellation.

In general the way Trio behaves is that you need to execute a checkpoint while you are effectively cancelled (have a cancelled cancel scope closer to you than the closest shielded cancel scope) in order to raise Cancelled. If you perform a cancellation and then a shielding in the way that causes the effectively-cancelled status to toggle back and forth, but no checkpoints observe the "cancelled" state, then for your purposes it didn't happen. I think this analogizes nicely to threads if we take the "checkpoints" to be calls to check_cancelled() + (unshielded) checkpoints in reentrant calls to from_thread.run() functions. We can only get that exact behavior if we peek inside the internal CancelStatus object on check_cancelled; otherwise we will get a more conservative / "cancel-happy" behavior where check_cancelled will raise Cancelled if the to_thread.run_sync() call has ever been effectively cancelled, even if it's not anymore. Which is also reasonable! We just need to document the subtleties here.

in short, cancellable threads always use system tasks. normal threads use the host task, unless passed a token

oremanj

Looking good, thank you for seeing this through! I think the main remaining thing is documentation wording, for which I've left some suggestions. It looks like there are also a few new lines that aren't covered by the tests. You're welcome to merge once you've resolved the remaining comments.

docs/source/reference-core.rst

trio/_threads.py

Documentation clarifications Fix function name typo leading to missed coverage Co-authored-by: Joshua Oreman <oremanj@gmail.com>

richardsheridan added 11 commits March 9, 2023 07:44

Add tests and stub for thread cancellation and reuse.

d9d02db

make from_thread tests pass

0c0ff10

remove failing tests and untested feature

5f86fb2

refactor system tasks to use messages as well

5f75c86

factor out trio_token checking logic

7ccf0f7

Add trio.from_thread.check_cancelled api to allow threads to efficien…

6d3b2d6

…tly poll for cancellation

Document trio.from_thread.check_cancelled

2d911ea

Add Newsfragment

bc6c82a

Linting and fix docs

644b913

use cancel_register in _send_message_to_host_task

b87456f

avoids creating a Cancelled manually and a double-reschedule race

Document thread reuse semantics

f6b27b2

richardsheridan force-pushed the from_thread_check_cancelled branch from 53f48a2 to f6b27b2 Compare March 11, 2023 00:25

richardsheridan changed the title ~~Add trio.from_thread.check_cancelled api to allow threads to efficiently poll for cancellation~~ Expand cancellation usability from native trio threads Mar 11, 2023

richardsheridan added 3 commits March 11, 2023 00:19

test coverage for cancelled host task

2b088cc

unnest unprotected_fn

832da41

flip _send_message_to_host_task.in_trio_thread semantics

5a10e9b

from_thread.run_sync now only raises in the cancellable=True case

richardsheridan force-pushed the from_thread_check_cancelled branch from 7ec12a5 to 7a2456e Compare September 3, 2023 17:18

Merge branch 'master' into from_thread_check_cancelled

a09aa84

# Conflicts: # trio/_tests/test_threads.py # trio/_threads.py # trio/from_thread.py

richardsheridan force-pushed the from_thread_check_cancelled branch from 7a2456e to a09aa84 Compare September 3, 2023 17:55

richardsheridan added 3 commits September 3, 2023 13:59

Aesthetic refactor of API functions

787d286

fix typos in docs

881180f

remove extra cvar toggling

c0e11d4

njsmith reviewed Sep 3, 2023

View reviewed changes

trio/_threads.py Outdated Show resolved Hide resolved

oremanj reviewed Oct 6, 2023

View reviewed changes

richardsheridan added 8 commits October 7, 2023 11:27

Update docs based on review comments

c1990d4

Merge remote-tracking branch 'upstream/master' into from_thread_check…

d4d10bb

…_cancelled # Conflicts: # trio/_tests/verify_types_darwin.json # trio/_tests/verify_types_linux.json # trio/_tests/verify_types_windows.json

fiddle type completeness

daed7bb

fix test_from_thread_run_during_shutdown

2be054a

apply nits from code review

a667a52

document "extra" checkpoints needed to pick up context

d6df308

add TODOs for future assert_never type cleverness

9f4e79e

implement cancellation semantics suggestions from code review

0e18c93

richardsheridan force-pushed the from_thread_check_cancelled branch from 67de17b to 0941d05 Compare October 8, 2023 23:39

adjust coverage pragma

eab30c4

richardsheridan force-pushed the from_thread_check_cancelled branch from 0941d05 to eab30c4 Compare October 8, 2023 23:40

richardsheridan force-pushed the from_thread_check_cancelled branch from ca9b856 to 5d93ed9 Compare October 15, 2023 16:28

revise and document cancellation semantics

2f79f15

in short, cancellable threads always use system tasks. normal threads use the host task, unless passed a token

richardsheridan force-pushed the from_thread_check_cancelled branch from 5d93ed9 to 2f79f15 Compare October 15, 2023 18:13

oremanj approved these changes Oct 17, 2023

View reviewed changes

docs/source/reference-core.rst Outdated Show resolved Hide resolved

trio/_threads.py Outdated Show resolved Hide resolved

trio/_threads.py Outdated Show resolved Hide resolved

richardsheridan commented Oct 18, 2023

View reviewed changes

trio/_threads.py Outdated Show resolved Hide resolved

richardsheridan and others added 2 commits October 17, 2023 21:50

Apply suggestions from code review

96e45c5

Documentation clarifications Fix function name typo leading to missed coverage Co-authored-by: Joshua Oreman <oremanj@gmail.com>

Merge branch 'master' into from_thread_check_cancelled

ab092b0

richardsheridan enabled auto-merge October 18, 2023 02:00

richardsheridan disabled auto-merge October 18, 2023 02:06

richardsheridan merged commit b324b3a into python-trio:master Oct 18, 2023
28 of 30 checks passed

richardsheridan deleted the from_thread_check_cancelled branch October 21, 2023 17:17

jakkdl mentioned this pull request Oct 23, 2023

timeout coverage/pytest with sigint to help debug CI timeout issues #2827

Closed

This was referenced Oct 24, 2023

Fix thread reentry deadlocks #2832

Merged

In to_thread_run_sync(), add abandon_on_cancel= as an alias for the cancellable= flag #2841

Merged

richardsheridan mentioned this pull request Nov 6, 2023

Implemented voluntary cancellation in worker threads agronholm/anyio#629

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Expand cancellation usability from native trio threads #2392

Expand cancellation usability from native trio threads #2392

richardsheridan commented Aug 8, 2022 •

edited

codecov bot commented Aug 8, 2022 •

edited

graingert commented Aug 8, 2022

richardsheridan commented Aug 8, 2022

richardsheridan commented Mar 11, 2023

agronholm commented Sep 3, 2023

richardsheridan commented Sep 3, 2023

njsmith left a comment

richardsheridan commented Sep 3, 2023

oremanj left a comment

oremanj Oct 6, 2023 •

edited

richardsheridan Oct 7, 2023 •

edited

richardsheridan Oct 9, 2023

oremanj Oct 11, 2023 •

edited

richardsheridan Oct 11, 2023

oremanj Oct 13, 2023

oremanj commented Oct 11, 2023

richardsheridan commented Oct 11, 2023

oremanj commented Oct 13, 2023

oremanj left a comment

	# True iff the tasks in self._tasks should receive cancellations
	# when they checkpoint. Always True when scope.cancel_called is True;
	# may also be True due to a cancellation propagated from our
	# parent. Unlike scope.cancel_called, this does not necessarily stay
	# true once it becomes true. For example, we might become
	# effectively cancelled due to the cancel scope two levels out
	# becoming cancelled, but then the cancel scope one level out
	# becomes shielded so we're not effectively cancelled anymore.
	effectively_cancelled = attr.ib(default=False)

Expand cancellation usability from native trio threads #2392

Expand cancellation usability from native trio threads #2392

Conversation

richardsheridan commented Aug 8, 2022 • edited

codecov bot commented Aug 8, 2022 • edited

Codecov Report

graingert commented Aug 8, 2022

richardsheridan commented Aug 8, 2022

richardsheridan commented Mar 11, 2023

agronholm commented Sep 3, 2023

richardsheridan commented Sep 3, 2023

njsmith left a comment

Choose a reason for hiding this comment

richardsheridan commented Sep 3, 2023

oremanj left a comment

Choose a reason for hiding this comment

oremanj Oct 6, 2023 • edited

Choose a reason for hiding this comment

richardsheridan Oct 7, 2023 • edited

Choose a reason for hiding this comment

richardsheridan Oct 9, 2023

Choose a reason for hiding this comment

oremanj Oct 11, 2023 • edited

Choose a reason for hiding this comment

richardsheridan Oct 11, 2023

Choose a reason for hiding this comment

oremanj Oct 13, 2023

Choose a reason for hiding this comment

oremanj commented Oct 11, 2023

richardsheridan commented Oct 11, 2023

oremanj commented Oct 13, 2023

oremanj left a comment

Choose a reason for hiding this comment

richardsheridan commented Aug 8, 2022 •

edited

codecov bot commented Aug 8, 2022 •

edited

oremanj Oct 6, 2023 •

edited

richardsheridan Oct 7, 2023 •

edited

oremanj Oct 11, 2023 •

edited