Skip to content

refactor: make CancelToken Promise-based#13303

Merged
nomeata merged 9 commits intomasterfrom
joachim/cancellation-promise
Apr 28, 2026
Merged

refactor: make CancelToken Promise-based#13303
nomeata merged 9 commits intomasterfrom
joachim/cancellation-promise

Conversation

@nomeata
Copy link
Copy Markdown
Collaborator

@nomeata nomeata commented Apr 7, 2026

This PR moves IO.CancelToken from Init.System.IO to its own file Init.System.CancelToken, backed by IO.Promise Unit instead of IO.Ref Bool. This enables non-polling cancellation propagation: the token's underlying promise can be used directly with IO.waitAny, and callbacks can be registered to fire when cancellation is requested.

The structure carries both the promise and a plain IO.Ref Bool flag, set in lockstep by set. isSet reads the flag directly (used on hot paths like Core.checkInterrupted); task/onSet go through the promise. The avoids a ~0.4% regression that a pure-promise representation introduced.

API additions:

  • CancelToken.task : Task (Option Unit). Returns the underlying promise's result? task directly — the same task object on every call, so further Task.map/BaseIO.bindTask dependencies can be safely attached. Resolves with some () when set is called, or none if the token is dropped without ever being set.
  • CancelToken.onSet : BaseIO Unit → BaseIO Unit. Registers a callback that runs synchronously on the cancelling thread when set is called (or immediately if the token is already set). Implemented via BaseIO.chainTask on result?, so no fresh Task.map per call and no GC hazard.

Runtime cleanup:

  • Add LEAN_TASK_STATE_{WAITING,RUNNING,FINISHED} constants in lean.h matching IO.TaskState.
  • Factor lean::promise_is_resolved inline in object.h, replacing three open-coded lean_io_get_task_state_core(...) == 2 checks (in interrupt.cpp, uv/timer.cpp, uv/signal.cpp).
  • Drop the manual inc_ref(g_cancel_tk) in check_interrupted; the token is owned by the enclosing scope_cancel_tk for the duration of the call (documented).
  • Replace the bare lean_always_assert(g_task_manager) in lean_promise_new with an explicit lean_internal_panic carrying a message that names Promise.new, identifies the typical trigger (initialize blocks, transitively via IO.CancelToken.new), and recommends lazy construction. Without this, users got an opaque "LEAN ASSERTION VIOLATION ... Condition: g_task_manager" with no actionable hint.

Behavioural notes documented inline:

  • new cannot be called from initialize blocks (task manager not running yet); construct lazily.
  • task documents the dropped-promise case (none) and steers callers to onSet for callback chaining.

A consumer of onSet for parent → child cancel-token propagation in parallel tactic combinators is in #13428 (fixes #13300).

Co-Authored-By: Claude Opus 4.7 noreply@anthropic.com

Move CancelToken from Init.System.IO to its own file Init.System.CancelToken,
backed by IO.Promise Unit instead of IO.Ref Bool. This adds a CancelToken.task
method returning a Task Unit that completes when cancelled, enabling
non-polling cancellation propagation via IO.mapTask.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@nomeata nomeata added the merge-ci Enable merge queue CI checks for PR. In particular, produce artifacts for all major platforms. label Apr 7, 2026
@github-actions github-actions Bot added the toolchain-available A toolchain is available for this PR, at leanprover/lean4-pr-releases:pr-release-NNNN label Apr 7, 2026
@mathlib-lean-pr-testing
Copy link
Copy Markdown

mathlib-lean-pr-testing Bot commented Apr 7, 2026

Mathlib CI status (docs):

  • ❗ Batteries/Mathlib CI will not be attempted unless your PR branches off the nightly-with-mathlib branch. Try git rebase d48863fc2bb15ffbb9b393fba4b5b0bd767fe391 --onto 4f6bcc5adac8d6234a9e3def3675402cb89823c6. You can force Mathlib CI using the force-mathlib-ci label. (2026-04-07 13:41:37)
  • ✅ Mathlib branch lean-pr-testing-13303 has successfully built against this PR. (2026-04-28 12:29:13) View Log
  • ✅ Mathlib branch lean-pr-testing-13303 has successfully built against this PR. (2026-04-28 18:42:12) View Log

@leanprover-bot
Copy link
Copy Markdown
Collaborator

leanprover-bot commented Apr 7, 2026

Reference manual CI status:

  • ❗ Reference manual CI will not be attempted unless your PR branches off the nightly-with-manual branch. Try git rebase d48863fc2bb15ffbb9b393fba4b5b0bd767fe391 --onto 861bc19e0c1c45b5766cc5c9ef0488568368b236. You can force reference manual CI using the force-manual-ci label. (2026-04-07 13:41:39)
  • ❗ Reference manual CI can not be attempted yet, as the nightly-testing-2026-04-27 tag does not exist there yet. We will retry when you push more commits. If you rebase your branch onto nightly-with-manual, reference manual CI should run now. You can force reference manual CI using the force-manual-ci label. (2026-04-28 11:30:27)

nomeata and others added 2 commits April 7, 2026 16:04
This PR updates `check_interrupted()` in the C++ runtime to check the
Promise-based `CancelToken` directly via `lean_io_get_task_state_core`,
rather than calling the Lean-exported `lean_io_cancel_token_is_set`.
The old approach failed during stage2 build because the stage0-compiled
export assumed the IORef-based representation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Update `delabConst.lean` and `printStructure.lean` to reflect that
`IO.CancelToken` now wraps `IO.Promise Unit` instead of `IO.Ref Bool`.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
nomeata and others added 4 commits April 28, 2026 09:38
Add `LEAN_TASK_STATE_{WAITING,RUNNING,FINISHED}` constants in `lean.h`
matching `IO.TaskState`, and an `lean::promise_is_resolved` inline in
`object.h`. Replace the open-coded `lean_io_get_task_state_core(...) == 2`
checks in `interrupt.cpp`, `uv/timer.cpp`, and `uv/signal.cpp` with this
helper.

Also drop the manual `inc_ref(g_cancel_tk)` in `check_interrupted`: the
token is owned by the enclosing `scope_cancel_tk`, so it's alive for the
duration of the call. Document this with a comment.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Add `onSet` for registering a callback to run when the token is set.
Internally uses `BaseIO.chainTask` on the promise's `result?` task
directly; this avoids the fresh `Task.map` produced by `Promise.result!`
that `task` exposes, which is a footgun when chaining further dependencies
(intermediate task gets GC'd, dep chain silently breaks). Document the
hazard on `task`.

Document that `new` cannot be called from `initialize` blocks (task
manager not running yet) and recommend lazy construction.

Fix a race in `getUnblockedCancelTk` (test helper): previously two
callers could each create a fresh token with only one stored; use
`modifyGet` for an atomic check-or-install.

Add `tests/elab/cancel_token.lean` exercising `onSet` (before/after set,
multiple callbacks) and `task`.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Switch from `tk.promise.result!` (which wraps `result?` in a fresh
`Task.map` per call, and panics if the promise is dropped without
being set) to `tk.promise.result?` directly. The signature changes
to `Task (Option Unit)`.

The returned task is now stable across calls — same task object every
time — so chaining further dependencies (`Task.map`, `BaseIO.bindTask`)
on it is safe. The `Option` distinguishes a normal `set` (`some ()`)
from the token being dropped without ever being set (`none`).

For attaching a callback when the token is set, prefer `onSet`.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
In compiled binaries (and `lean --run`), the task manager is initialized
*after* `io_mark_end_initialization`, so `initialize` blocks see
`g_task_manager == nullptr`. Previously this surfaced as a bare
`LEAN ASSERTION VIOLATION ... Condition: g_task_manager` with no clue
about the cause. Replace the assert with `lean_internal_panic` carrying
a message that names `Promise.new`, mentions the typical trigger
(`initialize` / `IO.CancelToken.new`), and recommends lazy construction.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@nomeata
Copy link
Copy Markdown
Collaborator Author

nomeata commented Apr 28, 2026

!radar

@leanprover-radar
Copy link
Copy Markdown

leanprover-radar commented Apr 28, 2026

Benchmark results for a5ae5cb against d48863f are in. There are significant results. @nomeata

Warning

These warnings may indicate that the benchmark results are not directly comparable, for example due to changes in the runner configuration or hardware.

  • Bench repo commit hashes for run build differ between commits.
  • Bench repo commit hashes for run other differ between commits.
  • 🟥 build//instructions: +47.3G (+0.39%)

Large changes (1🟥)

  • 🟥 elab/grind_bitvec2//instructions: +2.7G (+1.40%)

Medium changes (6🟥)

  • 🟥 build/module/Init.Data.BitVec.Lemmas//instructions: +1.1G (+0.81%)
  • 🟥 build/module/Std.Data.DTreeMap.Internal.Lemmas//instructions: +1.0G (+0.48%)
  • 🟥 build/module/Std.Data.HashMap.RawLemmas//instructions: +1.5G (+1.18%)
  • 🟥 build/profile/type checking//wall-clock: +9s (+8.13%)
  • 🟥 elab/grind_list2//instructions: +523.4M (+1.03%)
  • 🟥 misc/import Init.Data.BitVec.Lemmas//instructions: +1.1G (+0.91%)

Small changes (324🟥)

Too many entries to display here. View the full report on radar instead.

@nomeata
Copy link
Copy Markdown
Collaborator Author

nomeata commented Apr 28, 2026

!radar

@leanprover-radar
Copy link
Copy Markdown

leanprover-radar commented Apr 28, 2026

Benchmark results for 633962f against 3c6317b are in. There are significant results. @nomeata

  • 🟥 build//instructions: +48.2G (+0.40%)

Large changes (2🟥)

  • 🟥 build/profile/type checking//wall-clock: +12s (+11.11%)
  • 🟥 elab/grind_bitvec2//instructions: +2.7G (+1.38%)

Medium changes (5🟥)

  • 🟥 build/module/Std.Data.DTreeMap.Internal.Lemmas//instructions: +1.0G (+0.49%)
  • 🟥 build/module/Std.Data.HashMap.RawLemmas//instructions: +1.4G (+1.12%)
  • 🟥 elab/grind_list2//instructions: +559.1M (+1.12%)
  • 🟥 misc/import Init.Data.BitVec.Lemmas//instructions: +1.0G (+0.84%)
  • 🟥 misc/import Lean//instructions: +30.9M (+2.02%)

Small changes (344🟥)

Too many entries to display here. View the full report on radar instead.

@nomeata nomeata marked this pull request as ready for review April 28, 2026 10:51
@nomeata nomeata requested a review from mhuisi as a code owner April 28, 2026 10:51
@nomeata nomeata added the changelog-language Language features and metaprograms label Apr 28, 2026
@github-actions github-actions Bot added the mathlib4-nightly-available A branch for this PR exists at leanprover-community/mathlib4-nightly-testing:lean-pr-testing-NNNN label Apr 28, 2026
Copy link
Copy Markdown
Contributor

@mhuisi mhuisi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks OK to me.

  • Creating cancellation tokens in initializers is hopefully nothing anyone needs to do?
  • I wonder if we can somehow avoid the +0.4% overhead?

The `Option` distinguishes a normal `set` (`some ()`) from the token being dropped without ever
being set (`none`). For attaching a callback, prefer `onSet`.
-/
def task (tk : CancelToken) : Task (Option Unit) :=
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be Task Bool (via a sync map)?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We had something like that before on this branch (via Promise.result! which does a sync map) and it was causing issues (either things being GC’ed or just the extra cost of more Tasks).

If you think that it should not have caused these issues and it was likely something else I can look again.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that Promise.result! uses Option.getOrBlock!, which is almost certainly not what you want here, just the sync map.

I do think that we should get this API right now instead of having to break it again in the future, yes.

@mathlib-lean-pr-testing mathlib-lean-pr-testing Bot added the builds-mathlib CI has verified that Mathlib builds against this PR label Apr 28, 2026
@nomeata
Copy link
Copy Markdown
Collaborator Author

nomeata commented Apr 28, 2026

I wonder if we can somehow avoid the +0.4% overhead?

Henrik already alerted me and suggested I follow the RequestCancellationToken idiom of keeping a faster-to-check IO Ref. Trying that right now.

Add an `IO.Ref Bool` field to `CancelToken` that's set in lockstep with
the promise resolution. `CancelToken.isSet` reads the flag directly,
and the C++ `check_interrupted` hot path uses a small inline accessor
that projects the flag and reads it without walking the promise's task
state.

The intent is to recover the ~0.4% performance regression that the
Promise-based representation introduced on `Core.checkInterrupted`-heavy
workloads, by replacing a task-state walk with a direct ref read. Not
benchmarked yet — the optimization is plausible but unmeasured.

The same dual-state pattern is already used in
`Lean.Server.RequestCancellationToken` (CancelToken + separate Promise);
folding it into `CancelToken` itself lets us simplify that downstream.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@nomeata
Copy link
Copy Markdown
Collaborator Author

nomeata commented Apr 28, 2026

!radar

@leanprover-radar
Copy link
Copy Markdown

leanprover-radar commented Apr 28, 2026

Benchmark results for a8edb0e against 3c6317b are in. There are significant results. @nomeata

  • build//instructions: -11.2G (-0.09%)

Medium changes (2✅, 1🟥)

  • elab/grind_ring_5//instructions: -91.2M (-1.04%)
  • misc/import Init.Data.BitVec.Lemmas//instructions: -871.9M (-0.72%)
  • 🟥 misc/import Lean//instructions: +31.0M (+2.02%)

Small changes (47✅, 1🟥)

Too many entries to display here. View the full report on radar instead.

mathlib-nightly-testing Bot pushed a commit to leanprover-community/batteries that referenced this pull request Apr 28, 2026
mathlib-nightly-testing Bot pushed a commit to leanprover-community/mathlib4-nightly-testing that referenced this pull request Apr 28, 2026
@nomeata nomeata added this pull request to the merge queue Apr 28, 2026
Merged via the queue into master with commit c36b0fb Apr 28, 2026
24 checks passed
Comment on lines 55 to 56
A task that completes when the cancellation token is set or dropped. Useful for waiting on
cancellation without polling — e.g. as one of the tasks given to `IO.waitAny`.
Copy link
Copy Markdown
Contributor

@mhuisi mhuisi Apr 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is strictly speaking not true anymore. Consumers can now occasionally observe that tk.isSet is true while tk.task has not completed yet.

This isn't a big deal in the language server since it's all internal API, but I don't think it's impossible that some code calls e.g. isSet to determine whether the cancellation token has been set and then (carelessly) calls some other downstream function they don't control that uses something akin to SnapshotTask.get?, assuming that it will work, when in reality they have now introduced a super subtle race condition.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

builds-mathlib CI has verified that Mathlib builds against this PR changelog-language Language features and metaprograms mathlib4-nightly-available A branch for this PR exists at leanprover-community/mathlib4-nightly-testing:lean-pr-testing-NNNN merge-ci Enable merge queue CI checks for PR. In particular, produce artifacts for all major platforms. toolchain-available A toolchain is available for this PR, at leanprover/lean4-pr-releases:pr-release-NNNN

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Parallel tactic combinators leak subtasks on re-elaboration (cancel token not propagated)

4 participants