Group `.claude/` ignores per-skill instead of a flat list: `ai.skillz` symlinks, `/open-wkt`, `/code-review-changes`, `/pr-msg`, `/commit-msg`. Add missing symlink entries (`yt-url-lookup` -> `resolve-conflicts`, `inter-skill-review`). Drop stale `Claude worktrees` section (already covered by `.claude/wkts/`).
(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
New "Inspect last failures" section reads the pytest `lastfailed` cache JSON directly: instant, no collection overhead, and filters to `tests/`-prefixed entries to avoid stale junk paths.
Also,
- add `jq` tool permission for `.pytest_cache/` files
(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
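For reference, the same lookup can be sketched in Python rather than `jq`. This assumes pytest's default cache layout, where `.pytest_cache/v/cache/lastfailed` holds a JSON object mapping node IDs to `true`; the `last_failures()` helper is hypothetical and not part of the skill.

```python
import json
from pathlib import Path


def last_failures(
    root: Path = Path('.'),
    prefix: str = 'tests/',
) -> list[str]:
    '''
    Return the node IDs pytest recorded as failing, keeping only those
    under `prefix` so stale entries for moved/deleted paths are dropped.

    Reads the cache file directly, so no test collection happens.
    '''
    cache_file = root / '.pytest_cache' / 'v' / 'cache' / 'lastfailed'
    if not cache_file.exists():
        # no cache written yet for this checkout
        return []

    lastfailed: dict[str, bool] = json.loads(cache_file.read_text())
    return sorted(
        nodeid for nodeid in lastfailed
        if nodeid.startswith(prefix)
    )
```

Sorting is only there to make the output stable between runs; the cache itself is an unordered mapping.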
Rework section 3 from a worktree-only check into a structured 3-step flow: detect active venv, interpret results (Case A: active, B: none, C: worktree), then run import + collection checks.
Deats,
- Case B prompts via `AskUserQuestion` when no venv is detected, offering `uv sync` or manual activate
- add `uv run` fallback section for envs where venv activation isn't practical
- new allowed-tools: `uv run python`, `uv run pytest`, `uv pip show`, `AskUserQuestion`
(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
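A minimal sketch of the Case A/B/C classification, under two loud assumptions: "active venv" means `VIRTUAL_ENV` is set or `sys.prefix != sys.base_prefix`, and Case C means the active venv lives outside the current worktree root. Both the `classify_venv()` helper and that Case C rule are hypothetical; the skill's actual prose may decide differently.

```python
from __future__ import annotations

import os
import sys
from enum import Enum
from pathlib import Path


class VenvCase(Enum):
    ACTIVE = 'A'    # a venv is active and lives inside this worktree
    NONE = 'B'      # no venv detected at all
    WORKTREE = 'C'  # a venv is active but belongs to another checkout


def classify_venv(
    worktree_root: Path,
    env: dict[str, str] | None = None,
    prefix: str = sys.prefix,
    base_prefix: str = sys.base_prefix,
) -> VenvCase:
    env = dict(os.environ) if env is None else env
    venv = env.get('VIRTUAL_ENV')
    if venv is None and prefix != base_prefix:
        # interpreter was launched from a venv without `activate`
        venv = prefix
    if venv is None:
        return VenvCase.NONE

    venv_path = Path(venv).resolve()
    if venv_path.is_relative_to(worktree_root.resolve()):
        return VenvCase.ACTIVE
    return VenvCase.WORKTREE
```

In Case B the skill would then ask (via `AskUserQuestion`) whether to `uv sync` or activate manually, as the commit describes.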
Land the scaffolding for a future sub-interpreter (PEP 734 `concurrent.interpreters`) actor spawn backend per issue #379. The spawn flow itself is not yet implemented; `subint_proc()` raises a placeholder `NotImplementedError` pointing at the tracking issue. This commit only wires up the registry, the py-version gate, and the harness.
Deats,
- bump `pyproject.toml` `requires-python` to `>=3.12, <3.15` and list the `3.14` classifier; the new stdlib `concurrent.interpreters` module only ships on 3.14
- extend `SpawnMethodKey = Literal[..., 'subint']`
- `try_set_start_method('subint')` grows a new `match` arm that feature-detects the stdlib module and raises `RuntimeError` with a clear banner on py<3.14
- `_methods` registers the new `subint_proc()` via the same bottom-of-module late-import pattern used for `._trio` / `._mp`
Also,
- new `tractor/spawn/_subint.py`: top-level `try: from concurrent import interpreters` guards `_has_subints: bool`; `subint_proc()` signature mirrors `trio_proc`/`mp_proc` so the Phase B.2 impl can drop in without touching the registry
- re-add `import sys` to `_spawn.py` (needed for the py-version msg in the gate-error)
- `_testing.pytest.pytest_configure` wraps `try_set_start_method()` in a `pytest.UsageError` handler so `--spawn-backend=subint` on py<3.14 prints a clean banner instead of a traceback
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Since we're devving subints we require the 3.14+ stdlib API
and a couple compiled libs don't support it yet, namely:
- `cffi`, which we're only using for the `.ipc._linux` eventfd
stuff (now factored into `hotbaud` anyway).
- `greenback`, which requires `greenlet` which doesn't seem to be
wheeled yet
* on nixos the sdist build was failing due to lack of `g++` which
i don't care to figure out rn since we don't need `.devx` stuff
immediately for this subints prototype.
* [ ] we still need to adjust any dependent suites to skip.
Adjust `test_ringbuf` to skip on import failure.
Also project wide,
- pin us to py 3.13+ in prep for last-2-minor-version policy.
- drop `msgspec>=0.20.0`, the first release with py3.14 support.
Replace the B.1 scaffold stub w/ a working spawn flow driving PEP 734 sub-interpreters on dedicated OS threads.
Deats,
- use private `_interpreters` C mod (not the public `concurrent.interpreters` API) to get `'legacy'` subint config; avoids PEP 684 C-ext compat issues w/ `msgspec` and other deps missing the `Py_mod_multiple_interpreters` slot
- bootstrap subint via code-string calling new `_actor_child_main()` from `_child.py` (shared entry for both CLI and subint backends)
- drive subint lifetime on an OS thread using `trio.to_thread.run_sync(_interpreters.exec, ..)`
- full supervision lifecycle mirrors `trio_proc`: `ipc_server.wait_for_peer()` → send `SpawnSpec` → yield `Portal` via `task_status.started()`
- graceful shutdown awaits the subint's inner `trio.run()` completing; cancel path sends `portal.cancel_actor()` then waits for thread join before `_interpreters.destroy()`
Also,
- extract `_actor_child_main()` from `_child.py` `__main__` block as callable entry shape bc the subint needs it for code-string bootstrap
- add `"subint"` to the `_runtime.py` spawn-method check so child accepts `SpawnSpec` over IPC
Prompt-IO: ai/prompt-io/claude/20260417T124437Z_5cd6df5_prompt_io.md
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
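The code-string bootstrap can be sketched as follows, assuming the subint only needs literal args that `repr()` round-trips. The kwarg names handed to `_actor_child_main()` here (`uid`, `parent_addr`, `loglevel`) are hypothetical, not tractor's real signature; the sketch only renders and sanity-checks the string that would then be passed to `_interpreters.exec()`.

```python
from __future__ import annotations

import ast


def _check_literal(name: str, val: object) -> None:
    # `repr()` embedding only round-trips plain literals (str, int,
    # tuple, None, ...); reject anything else up front
    try:
        ok = ast.literal_eval(repr(val)) == val
    except (ValueError, SyntaxError):
        ok = False
    if not ok:
        raise TypeError(f'{name}={val!r} is not a round-trippable literal')


def build_bootstrap(
    uid: tuple[str, str],
    parent_addr: tuple[str, int],
    loglevel: str | None = None,
) -> str:
    '''
    Render a code-string a fresh subint can exec to enter the shared
    child entrypoint.
    '''
    kwargs = {'uid': uid, 'parent_addr': parent_addr, 'loglevel': loglevel}
    for name, val in kwargs.items():
        _check_literal(name, val)
    args = ', '.join(f'{name}={val!r}' for name, val in kwargs.items())
    return (
        'from tractor._child import _actor_child_main\n'
        f'_actor_child_main({args})\n'
    )
```

The literal-only restriction is exactly why passing a pre-built `SpawnSpec` or callables would need a different channel such as `set___main___attrs()`.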
Expand the comment block above the `_interpreters` import explaining *why* we use the private C mod over `concurrent.interpreters`: the public API only exposes PEP 734's `'isolated'` config, which breaks `msgspec` (missing PEP 684 slot). Add reference links to PEP 734, PEP 684, cpython sources, and the msgspec upstream tracker (jcrist/msgspec#563).
Also,
- update error msgs in both `_spawn.py` and `_subint.py` to say "3.13+" (matching the actual `_interpreters` availability) instead of "3.14+".
- tweak the mod docstring to reflect py3.13+ availability via the private C module.
Review: PR #444 (copilot-pull-request-reviewer)
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
`trio.to_thread.run_sync(_interpreters.exec, ...)` runs `exec()` on a cached worker thread, and when that thread is returned to the cache after the subint's `trio.run()` exits, CPython still keeps the subint's tstate attached to the (now idle) worker. Result: the teardown `_interpreters.destroy(interp_id)` in the `finally` block can block the parent's trio loop indefinitely, waiting for a tstate release that only happens when the worker either picks up a new job or exits.
Manifested as intermittent mid-suite hangs under `--spawn-backend=subint`, caught by a `faulthandler.dump_traceback_later()` showing the main thread stuck in `_interpreters.destroy()` at `_subint.py:293` with only an idle trio-cache worker as the other live thread.
Deats,
- drive the subint on a plain `threading.Thread` (not `trio.to_thread`) so the OS thread truly exits after `_interpreters.exec()` returns, releasing tstate and unblocking destroy
- signal `subint_exited.set()` back to the parent trio loop from the driver thread via `trio.from_thread.run_sync(..., trio_token=...)`; capture the token at `subint_proc` entry
- swallow `trio.RunFinishedError` in that signal path for the case where parent trio has already exited (proc teardown)
- in the teardown `finally`, off-load the sync `driver_thread.join()` to `trio.to_thread.run_sync` (cache thread w/ no subint tstate → safe) so we actually wait for the driver to exit before `_interpreters.destroy()`
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Log the `claude-opus-4-7` session that produced the `_subint.py` dedicated-thread fix (`26fb8206`). Substantive bc the patch was entirely AI-generated; raw log also preserves the CPython-internals research informing Phase B.3 hard-kill work.
Prompt-IO: ai/prompt-io/claude/20260418T042526Z_26fb820_prompt_io.md
(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Unbounded `trio.CancelScope(shield=True)` at the soft-kill and thread-join sites can wedge the parent trio loop indefinitely when a stuck subint ignores portal-cancel (e.g. bc the IPC channel is already broken).
Deats,
- add `_HARD_KILL_TIMEOUT` (3s) module-level const
- wrap both shield sites with `trio.move_on_after()` so we abandon a stuck subint after the deadline
- flip driver thread to `daemon=True` so proc-exit also isn't blocked by a wedged subint
- pass `abandon_on_cancel=True` to `trio.to_thread.run_sync(driver_thread.join)`; load-bearing for `move_on_after` to actually fire
- log warnings when either timeout triggers
- improve `InterpreterError` log msg to explain the abandoned-thread scenario
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
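The abandon-after-deadline idea can be sketched with only the stdlib, swapping trio's `move_on_after()` + `to_thread.run_sync(..., abandon_on_cancel=True)` for a plain `Thread.join(timeout=...)`. The `drive_and_join()` helper is hypothetical, and unlike the real backend it blocks the calling thread during the join instead of keeping a trio loop responsive.

```python
import logging
import threading
from typing import Callable

log = logging.getLogger(__name__)

_HARD_KILL_TIMEOUT: float = 3.0  # mirrors the commit's 3s const


def drive_and_join(
    target: Callable[[], object],
    *,
    name: str = 'subint-driver[0]',
    timeout: float = _HARD_KILL_TIMEOUT,
) -> bool:
    '''
    Run `target` on a dedicated daemon thread and wait at most
    `timeout` seconds for it. Return `True` if it was abandoned.
    '''
    driver = threading.Thread(target=target, name=name, daemon=True)
    driver.start()
    driver.join(timeout=timeout)
    if driver.is_alive():
        # `daemon=True`: this stuck thread won't block interpreter exit
        log.warning('Abandoning %r after %ss hard-kill timeout', name, timeout)
        return True
    return False
```

The daemon flag is what makes abandonment safe at process exit; without it the interpreter would wait on the wedged thread during shutdown.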
Bottle up the diagnostic primitives that actually cracked the silent mid-suite hangs in the `subint` spawn-backend bringup (issue there" session has them on the shelf instead of reinventing from scratch.
Deats,
- `dump_on_hang(seconds, *, path)`: context manager wrapping `faulthandler.dump_traceback_later()`. Critical gotcha baked in: dumps go to a *file*, not `sys.stderr`, bc pytest's stderr capture silently eats the output and you can spend an hour convinced you're looking at the wrong thing
- `track_resource_deltas(label, *, writer)`: context manager logging per-block `(threading.active_count(), len(_interpreters.list_all()))` deltas; quickly rules out leak-accumulation theories when a suite progressively worsens (if counts don't grow, it's not a leak, look for a race on shared cleanup instead)
- `resource_delta_fixture(*, autouse, writer)`: factory returning a `pytest` fixture wrapping `track_resource_deltas` per-test; opt in by importing into a `conftest.py`. Kept as a factory (not a bare fixture) so callers own `autouse` / `writer` wiring
Also,
- export the three names from `tractor.devx`
- dep-free on py<3.13 (swallows `ImportError` for `_interpreters`)
- link back to the provenance in the module docstring (issue #379 / commit `26fb820`)
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
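The two context managers can be sketched with only the stdlib. This is not the shipped `tractor.devx` code: the signatures follow the commit message, while the log-line format and the `print` default for `writer` are assumptions.

```python
from __future__ import annotations

import faulthandler
import threading
from contextlib import contextmanager
from pathlib import Path
from typing import Callable, Iterator

try:
    import _interpreters  # private C mod, py3.13+ only
except ImportError:
    _interpreters = None  # stay dep-free on older pythons


@contextmanager
def dump_on_hang(seconds: float, *, path: str | Path) -> Iterator[Path]:
    '''
    Dump every thread's stack to `path` if the block is still running
    after `seconds`. A file (not `sys.stderr`) is used so pytest's
    stderr capture can't swallow the dump.
    '''
    path = Path(path)
    with path.open('w') as f:
        faulthandler.dump_traceback_later(seconds, repeat=False, file=f)
        try:
            yield path
        finally:
            # disarm, so a block that finished in time leaves an empty file
            faulthandler.cancel_dump_traceback_later()


def _resource_counts() -> tuple[int, int]:
    # (live Python threads, live interpreters incl. main); the
    # interpreter count is 0 when `_interpreters` is unavailable
    n_interps = len(_interpreters.list_all()) if _interpreters else 0
    return threading.active_count(), n_interps


@contextmanager
def track_resource_deltas(
    label: str,
    *,
    writer: Callable[[str], None] = print,
) -> Iterator[None]:
    threads0, interps0 = _resource_counts()
    try:
        yield
    finally:
        threads1, interps1 = _resource_counts()
        writer(
            f'{label}: threads {threads0}->{threads1} '
            f'({threads1 - threads0:+d}), '
            f'interps {interps0}->{interps1} ({interps1 - interps0:+d})'
        )
```

Flat deltas across a progressively worsening suite point away from a leak and toward a race on shared cleanup, as the commit notes.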
The private `_interpreters` C module ships since 3.13, but that vintage
wedges under our `threading.Thread` + multi-trio usage pattern
—> `_interpreters.exec()` silently never makes progress. 3.14 fixes it.
So gate on the presence of the public `concurrent.interpreters` wrapper
(3.14+ only) even tho we still call into the private module at runtime.
Deats,
- `try_set_start_method('subint')` error msg + `_subint` module
docstring/comments rewritten to document the 3.14 floor and why 3.13
can't work.
- `_subint._has_subints` gate now imports `concurrent.interpreters` (not
`_interpreters`) as the version sentinel.
Also, reshuffle `pyproject.toml` deps into
per-python-version `[tool.uv.dependency-groups]`:
- `subints` group: `msgspec>=0.21.0`, py>=3.14
- `eventfd` group: `cffi>=1.17.1`, py>=3.13,<3.14
- `sync_pause` group: `greenback`, py>=3.13,<3.14
(was in `devx`; moved out bc no 3.14 yet)
Bump top-level `msgspec>=0.20.0` too.
(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Lock in the escape-hatch machinery added to `tractor.spawn._subint` during the Phase B.2/B.3 bringup (issue #379) so future stdlib regressions or our own refactors don't silently re-introduce the mid-suite hangs.
Deats,
- `test_subint_happy_teardown`: baseline; spawn a subactor, one portal RPC, clean teardown. If this breaks, something's wrong unrelated to the hard-kill shields.
- `test_subint_non_checkpointing_child`: cancel a subactor stuck in a non-checkpointing Python loop (`threading.Event.wait()` releases the GIL but never inserts a trio checkpoint). Validates the bounded-shield + daemon-driver-thread combo abandons the thread after `_HARD_KILL_TIMEOUT`.
Every test is wrapped in `trio.fail_after()` for a deterministic per-test wall-clock ceiling (an unbounded audit would defeat itself) and arms `tractor.devx.dump_on_hang()` so a hang captures a stack dump, since pytest's stderr capture swallows `faulthandler` output by default.
Gated via `pytest.importorskip('concurrent.interpreters')` and a module-level skip when `--spawn-backend` isn't `'subint'`.
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Classify and write up the two distinct hang modes hit during Phase B subint bringup (issue #379) so future triage doesn't re-derive them from scratch.
Deats, two new `ai/conc-anal/` docs,
- `subint_sigint_starvation_issue.md`: abandoned legacy-subint thread + shared GIL → main trio loop starves → signal-wakeup-fd pipe fills → `SIGINT` silently dropped (`strace` shows `write() = EAGAIN` on the wakeup-fd). Un-Ctrl-C-able. Structurally a CPython limit; blocked on `msgspec` PEP 684 (jcrist/msgspec#563)
- `subint_cancel_delivery_hang_issue.md`: parent-side trio task parks on an orphaned IPC channel after subint teardown; no clean EOF delivered to the waiting receive. Ctrl-C-able (main loop iterates fine); OUR bug to fix. Candidate fix: explicit parent-side channel abort in `subint_proc`'s hard-kill teardown
Cross-link the docs from their test reproducers,
- `test_stale_entry_is_deleted` (→ starvation class): wrap `trio.run(main)` in `dump_on_hang(seconds=20)` so a future regression captures a stack dump. Kept un-skipped so the dump file is inspectable
- `test_subint_non_checkpointing_child` (→ delivery class): extend docstring with a "KNOWN ISSUE" block pointing at the analysis
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
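The "wakeup-fd pipe fills → `write() = EAGAIN`" step can be reproduced in isolation with a plain non-blocking pipe. This POSIX-only sketch involves no signals or subints; it only shows what a writer hits when nobody drains the pipe: once the kernel buffer is full, each further one-byte write fails with `EAGAIN`, so the byte (for the wakeup fd, the signal number) is lost. Pipe capacity varies by OS, so no exact count is assumed.

```python
import errno
import os


def fill_nonblocking_pipe() -> int:
    '''
    Write single bytes into a non-blocking pipe nobody reads until the
    kernel buffer is full; return how many bytes fit before `EAGAIN`.
    '''
    r, w = os.pipe()
    os.set_blocking(w, False)
    written = 0
    try:
        while True:
            try:
                # same shape as the `write(fd, "\2", 1)` seen in strace
                # (SIGINT == 2)
                os.write(w, b'\x02')
            except BlockingIOError as err:
                if err.errno not in (errno.EAGAIN, errno.EWOULDBLOCK):
                    raise
                return written
            written += 1
    finally:
        os.close(r)
        os.close(w)
```

In the hang, the reader side is the starved main trio loop, so the buffer never drains and every later signal byte meets the same `EAGAIN`.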
Log the `claude-opus-4-7` collab that produced `e92e3cd2` ("Doc `subint`
backend hang classes + arm `dump_on_hang`"). Substantive bc the two new
`ai/conc-anal/` docs were jointly authored — user framed the two-class
split + set candidate-fix ordering for the class-2 (Ctrl-C-able) hang;
claude drafted the prose and the test-side cross-linking comments.
`.raw.md` is in diff-ref mode — per-file pointers via `git diff
e92e3cd~1..e92e3cd -- <path>` rather than re-embedding content that
already lives in `git log -p`.
Prompt-IO: ai/prompt-io/claude/20260420T192739Z_5e8cd8b2_prompt_io.md
(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Add a hard process-level wall-clock bound on the two known-hanging subint-backend tests so an unattended suite run can't wedge indefinitely in either of the hang classes doc'd in `ai/conc-anal/`.
Deats,
- New `testing` dep: `pytest-timeout>=2.3`.
- `test_stale_entry_is_deleted`: `@pytest.mark.timeout(3, method='thread')`. The `method='thread'` choice is deliberate: `method='signal'` routes via `SIGALRM`, which is starved by the same GIL-hostage path that drops `SIGINT` (see `subint_sigint_starvation_issue.md`), so it'd never actually fire in the starvation case.
- `test_subint_non_checkpointing_child`: same decorator, same reasoning (defense-in-depth over the inner `trio.fail_after(15)`).
At timeout, `pytest-timeout` hard-kills the pytest process itself; that's the intended behavior here, since the alternative is the suite never returning.
(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Add two more tests to the catalog in
`conc-anal/subint_sigint_starvation_issue.md` — same
signal-wakeup-fd-saturation fingerprint (abandoned legacy-subint driver
threads → shared-GIL starvation → `write() = EAGAIN` on the wakeup pipe
→ silent SIGINT drop), different load patterns.
Deats,
- `test_cancel_while_childs_child_in_sync_sleep[subint-False]`: nested
actor-tree + sync-sleeping grandchild. Under `trio`/`mp_*` the "zombie
reaper" is a subproc `SIGKILL`; no equivalent exists under subint, so
the grandchild persists in its abandoned driver thread. Often only
manifests under full-suite runs (earlier tests seed the
abandoned-thread pool).
- `test_multierror_fast_nursery[subint-25-0.5]`: 25 concurrent subactors
all go through teardown on the multierror. Bounded hard-kills run in
parallel — so the total budget is ~3s, not 3s × 25. Leaves 25
abandoned driver threads simultaneously alive, an extreme pressure
multiplier. `strace` shows several successful `write(16, "\2", 1) = 1`
(GIL round-robin IS giving main brief slices) before finally
saturating with `EAGAIN`.
Also include a `pstree -snapt <pid>` capture showing
16+ live `{subint-driver[<interp_id>]}` threads at the
moment of hang — the direct GIL-contender population.
(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
A reusable `@pytest.mark.skipon_spawn_backend( '<backend>' [, ...],
reason='...')` marker for backend-specific known-hang / -borked cases
— avoids scattering `@pytest.mark.skipif(lambda ...)` branches across
tests that misbehave under a particular `--spawn-backend`.
Deats,
- `pytest_configure()` registers the marker via
`addinivalue_line('markers', ...)`.
- New `pytest_collection_modifyitems()` hook walks
each collected item with `item.iter_markers(
name='skipon_spawn_backend')`, checks whether the
active `--spawn-backend` appears in `mark.args`, and
if so injects a concrete `pytest.mark.skip(
reason=...)`. `iter_markers()` makes the decorator
work at function, class, or module (`pytestmark =
[...]`) scope transparently.
- First matching mark wins; default reason is
`f'Borked on --spawn-backend={backend!r}'` if the
caller doesn't supply one.
Also, tighten type annotations on nearby `pytest`
integration points — `pytest_configure`, `debug_mode`,
`spawn_backend`, `tpt_protos`, `tpt_proto` — now taking
typed `pytest.Config` / `pytest.FixtureRequest` params.
(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
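The marker expansion can be sketched as a `conftest.py`-style plugin. This is an illustration rather than the code in `tractor._testing.pytest`: it assumes a `--spawn-backend` option is already registered elsewhere, treats an explicit `reason=None` like a missing reason, and factors the "first matching mark wins" rule into a pure `skip_reason_for()` helper (a hypothetical name) so it can be checked without running pytest.

```python
from __future__ import annotations

from typing import Any, Iterable


def skip_reason_for(marks: Iterable[Any], backend: str) -> str | None:
    '''
    Given `skipon_spawn_backend` marks (anything with `.args` and
    `.kwargs`, like `pytest.Mark`), return the skip reason of the first
    mark listing `backend`, else `None`.
    '''
    for mark in marks:
        if backend in mark.args:
            return (
                mark.kwargs.get('reason')
                or f'Borked on --spawn-backend={backend!r}'
            )
    return None


def pytest_configure(config) -> None:
    config.addinivalue_line(
        'markers',
        'skipon_spawn_backend(*backends, reason=None): '
        'skip when --spawn-backend is one of `backends`',
    )


def pytest_collection_modifyitems(config, items) -> None:
    import pytest  # only needed once pytest is actually driving us

    backend: str = config.getoption('--spawn-backend')
    for item in items:
        # `iter_markers()` walks function, class and module scope alike
        reason = skip_reason_for(
            item.iter_markers(name='skipon_spawn_backend'),
            backend,
        )
        if reason is not None:
            item.add_marker(pytest.mark.skip(reason=reason))
```

Usage then looks like `@pytest.mark.skipon_spawn_backend('subint', reason='...')` on a test, or `pytestmark = pytest.mark.skipon_spawn_backend('subint', reason='...')` at module scope.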
Adopt the `@pytest.mark.skipon_spawn_backend('subint',
reason=...)` marker (a617b52) across the suites
reproducing the `subint` GIL-contention / starvation
hang classes doc'd in `ai/conc-anal/subint_*_issue.md`.
Deats,
- Module-level `pytestmark` on full-file-hanging suites:
- `tests/test_cancellation.py`
- `tests/test_inter_peer_cancellation.py`
- `tests/test_pubsub.py`
- `tests/test_shm.py`
- Per-test decorator where only one test in the file
hangs:
- `tests/discovery/test_registrar.py
::test_stale_entry_is_deleted` — replaces the
inline `if start_method == 'subint': pytest.skip`
branch with a declarative skip.
- `tests/test_subint_cancellation.py
::test_subint_non_checkpointing_child`.
- A few per-test decorators are left commented-in-
place as breadcrumbs for later finer-grained unskips.
Also, some nearby tidying in the affected files:
- Annotate loose fixture / test params
(`pytest.FixtureRequest`, `str`, `tuple`, `bool`) in
`tests/conftest.py`, `tests/devx/conftest.py`, and
`tests/test_cancellation.py`.
- Normalize `"""..."""` → `'''...'''` docstrings per
repo convention on a few touched tests.
- Add `timeout=6` / `timeout=10` to
`@tractor_test(...)` on `test_cancel_infinite_streamer`
and `test_some_cancels_all`.
- Drop redundant `spawn_backend` param from
`test_cancel_via_SIGINT`; use `start_method` in the
`'mp' in ...` check instead.
(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
goodboy commented Apr 24, 2026 on:
# https://peps.python.org/pep-0684/
# - stdlib docs (3.14+):
# https://docs.python.org/3.14/library/concurrent.interpreters.html
# - CPython public wrapper source (`Lib/concurrent/interpreters/`):
we're missing a link to the older pep-554 here?
goodboy commented Apr 24, 2026 on:
from trio import TaskStatus
# NOTE: we reach into the *private* `_interpreters` C module
Reason for pinning to the legacy API..
goodboy commented Apr 24, 2026 on (new file, `@@ -0,0 +1,350 @@`):
# `subint` backend: abandoned-subint thread can wedge main trio event loop (Ctrl-C unresponsive)
The full reason we need a GIL per subint..
Add `subint` (in-thread sub-interpreter) spawn backend
Motivation
This PR is Phase B.2 of the multi-backend spawn rework tracked
under #379: it lands the first member of the `subint` family, a
single-process spawn backend that runs each sub-actor inside a
CPython PEP 734 sub-interpreter driven by a dedicated OS thread,
while reusing tractor's existing IPC handshake (UDS/TCP) for the
parent↔child channel. The result is multiple `Actor`s in one OS
process with state isolation and faster startup than
`mp_spawn`/`mp_forkserver`, but without the per-interp-GIL
parallelism that's still gated upstream on `msgspec` PEP 684 support
(jcrist/msgspec#563 and more specifically jcrist/msgspec#1026).
Why the dedicated-thread machinery? PEP 734's public
`concurrent.interpreters` only exposes the `'isolated'` config, which
refuses to import any C extension missing the
`Py_mod_multiple_interpreters` slot, and that's currently `msgspec`,
which tractor uses pervasively in the IPC layer. So we drop to the
private `_interpreters` C module in `'legacy'` config: shared main
GIL, but the `sys.modules`/`__main__`/globals isolation we actually
want for the actor model. We also feature-gate to 3.14+ via the
public-module presence check: the private API exists on 3.13, but
tractor's multi-trio-task usage of `_interpreters.exec()` wedges
silently there and only works cleanly on 3.14.
This PR also documents two known hang classes specific to in-thread
subints (a class-A SIGINT-starvation hang and an orphaned-channel
park) and ships the `skipon_spawn_backend()` pytest marker +
`tractor.devx._debug_hangs` module that fell out of triaging them. It
serves as the foundation for the stacked subint-family PRs (#447
forkserver, future fork variants).
Src of research
- `concurrent.interpreters` public API (PEP 734)
- `Py_mod_multiple_interpreters` (PEP 684)
- CPython public wrapper source (`Lib/concurrent/interpreters/`): https://github.com/python/cpython/tree/main/Lib/concurrent/interpreters
- CPython private C module source (`Modules/_interpretersmodule.c`): https://github.com/python/cpython/blob/main/Modules/_interpretersmodule.c
- `msgspec` PEP 684 upstream tracker: "Use multi-phase init for c-extension initialization" (jcrist/msgspec#563)
- anyio's `to_interpreter._Worker` for a `set___main___attrs()` reference impl: https://github.com/agronholm/anyio/blob/master/src/anyio/to_interpreter.py
Summary of changes
By chronological commit (all hashes from `main..subint_spawner_backend`):
- `'subint'` spawn-method scaffold to `tractor.spawn._spawn`: register the key in `SpawnMethodKey`, gate `try_set_start_method('subint')` on the public `concurrent.interpreters` module, and wire the dispatch table.
- `subint_proc()` in `tractor.spawn._subint`: `_interpreters.create('legacy')` → bootstrap-string driven `_interpreters.exec()` on a daemon `threading.Thread` → reuse `ipc_server.wait_for_peer()` for the existing UDS/TCP handshake → graceful `Portal.cancel_actor()` teardown → `_interpreters.destroy()`.
- `_interpreters` private-API choice in the `_subint` module docstring (PEP 734 isolated mode + `msgspec` PEP 684 incompat → forced legacy mode + 3.14 floor).
- `tractor.ipc._ringbuf` import path when `cffi` is missing (now an optional `eventfd` dep).
- skips so the suite can boot under the new floor.
- `_interpreters.exec()` to a dedicated `threading.Thread` rather than `trio.to_thread.run_sync()`, since trio's thread cache recycles workers and leaves the subint tstate attached, blocking `_interpreters.destroy()` in teardown.
- `_HARD_KILL_TIMEOUT=3.0s` + `daemon=True` driver thread + `move_on_after()` shielded scopes; abandon-and-log if the subint can't be cleanly destroyed (vs. wedging the parent forever).
- `tractor.devx._debug_hangs` exposing `dump_on_hang()` (file-target `faulthandler.dump_traceback_later()`) and `track_resource_deltas()` (thread + subint counts): the actually-useful triage primitives that fell out of subint bringup.
- `subint`-floor to py3.14 in `pyproject.toml` and split deps into per-feature groups (`subints`, `eventfd`, `sync_pause`) so 3.14-only deps don't bleed into the 3.13 install path.
- `tests/test_subint_cancellation.py`: focused cancel + hard-kill audit covering the corner cases the hard-kill paths are meant to cover.
- `subint` hang classes (`subint_sigint_starvation_issue.md`, `subint_cancel_delivery_hang_issue.md`) and arm `dump_on_hang()` in the test conftests.
- `subint` audit tests via `@pytest.mark.timeout(method='thread')` so a regressing hang fails fast instead of stalling CI.
- `uv.lock` for `pytest-timeout` + py3.13-gated wheel deps.
- `200s` `pytest-timeout` ceiling in `pyproject.toml`.
- `test_stale_entry_is_deleted` under `--spawn-backend=subint` (class-A SIGINT-starvation reproducer; tracked in `subint_sigint_starvation_issue.md`).
- `subint` SIGINT-starvation hang catalog with additional reproducers / strace evidence.
- `@pytest.mark.skipon_spawn_backend('<backend>', reason='...')` marker (registered in `tractor._testing.pytest`, expanded in `pytest_collection_modifyitems`) so backend-borked tests can be cleanly gated.
- `skipon_spawn_backend('subint', ...)` (with hang-doc tracker refs as `reason=`) to the affected test modules.
- the destroy-race fix and hang-class docs (NLNet generative-AI provenance).
- `.gitignore` by skill/purpose; ignore notes & snippets subdirs.
- `/run-tests` skill polish: lastfailed-cache inspection + venv pre-flight expansion.
- `xonsh` to 0.22.8-pre and pin to GH `main` editable as a workaround for an upstream packaging gap on 3.14.
Scopes changed
- `tractor.spawn._subint` (new, ~433 LoC)
  - `subint_proc()`: full subint actor lifecycle (create / driver-thread / handshake / cancel / destroy)
  - `_HARD_KILL_TIMEOUT` + bounded `move_on_after()` shields at every teardown blocking-point
  - `msgspec` triangle + 3.14 gate rationale
- `tractor.spawn._spawn`
  - `'subint'` in `SpawnMethodKey` + `_methods` dispatch
  - `try_set_start_method('subint')` gated on the public `concurrent.interpreters` module presence
- `tractor._child`
  - `_actor_child_main()` shared entry shape used by both the CLI (trio/mp subproc backends) and the subint bootstrap string
- `tractor._testing.pytest`
  - `skipon_spawn_backend(*backends, reason=None)` marker + `pytest_collection_modifyitems()` expansion
  - `try_set_start_method` `RuntimeError` into a clean `pytest.UsageError`
- `tractor.devx._debug_hangs` (new)
  - `dump_on_hang()`: file-target `faulthandler.dump_traceback_later()` (avoids pytest's stderr capture)
  - `track_resource_deltas()`: log thread + subint counts across a block
- `pyproject.toml` + `uv.lock`
  - `requires-python` floor to `>=3.13, <3.15`
  - `subints` (`msgspec≥0.21`), `eventfd` (`cffi`), `sync_pause` (`greenback`) groups
  - `[tool.uv.dependency-groups]` → `subints = {requires-python = ">=3.14"}`
  - `pytest-timeout` = `200s` ceiling
- `ai/conc-anal/` (new)
  - `subint_sigint_starvation_issue.md`: class-A GIL-contention SIGINT-never-delivered hang
  - `subint_cancel_delivery_hang_issue.md`: orphaned-channel park after subint teardown
- `tests/test_subint_cancellation.py` (new)
- `tests.discovery.test_registrar`, `tests.test_cancellation`, `tests.test_inter_peer_cancellation`, `tests.test_pubsub`, `tests.test_ringbuf`, `tests.test_shm`
  - `pytestmark = pytest.mark.skipon_spawn_backend('subint', reason='...')` with hang-doc tracker refs
- `.claude/skills/run-tests/SKILL.md`
  - lastfailed-cache inspection + venv pre-flight coverage (incidental)
Future follow up
- Drop the dedicated-thread machinery once `msgspec` PEP 684 + isolated mode lands
  Once jcrist/msgspec#563 ("Use multi-phase init for c-extension initialization") ships and our remaining C-ext deps grow `Py_mod_multiple_interpreters`, switch from the private `_interpreters` legacy-mode driver to public `concurrent.interpreters.create()` (isolated) + `trio.to_thread.run_sync(Interpreter.exec, ...)`. Tracked in #450 ("Audit `subint_forkserver` thread constraints once msgspec PEP 684 lands").
- Surface subint-bootstrap exceptions to the parent task via a `nonlocal err` slot
  In `tractor.spawn._subint`, `_subint_target()` currently only `log.exception()`s a hard bootstrap failure (e.g. an `ImportError` of the actor module inside the subint, or a syntax error in the bootstrap string). Adopt anyio's `(retval, is_exception)` tuple pattern (and the equivalent already used in the `_subint_forkserver.py:480-494` block on the stacked PR #447) to re-raise into the parent; this needs to coordinate with the existing `trio.Cancelled` paths around `subint_exited.wait()`. (`?TODO` in `_subint.py`.)
- Switch bootstrap arg-passing to `_interpreters.set___main___attrs()`
  The current `repr()`-into-bootstrap-string approach only roundtrips literals. For a pre-built `SpawnSpec` struct, credentials, or callables we'd want anyio's `set___main___attrs(interp_id, {...})` pattern. Tracked in #379 ("Trying out sub-interpreters (subints), maybe `fork()` can be hacked now?"). (`?TODO` in `_subint.py`.)
- Resolve the class-A SIGINT-starvation hang catalog
  `ai/conc-anal/subint_sigint_starvation_issue.md` documents reproducers where an abandoned-subint thread starves the parent's trio loop hard enough that SIGINT is dropped at the kernel↔Python boundary. Currently mitigated via `skipon_spawn_backend('subint')` on affected tests + the `_HARD_KILL_TIMEOUT` shields. The real fix is upstream-CPython-shaped (per-interp GIL / true subint cancellation primitive).
- Fix the orphaned-channel park after subint teardown
  `ai/conc-anal/subint_cancel_delivery_hang_issue.md`: Ctrl-C-able (so not the GIL-hostage class), but the parent-side `process_messages` loop parks on a now-dead subint's IPC channel without ever seeing a clean EOF/`BrokenResourceError`. Tractor-shaped fix; needs an explicit channel-tear from the subint side as part of the soft-kill sequence.
- Cross-link the cancel-cascade hang from PR #447
  The `test_nested_multierrors` cancel-cascade hang gated by pytest `--capture=fd` is mostly a `subint_forkserver` symptom (#449), but the underlying mechanism may also affect single-process subints under specific schedules; worth re-running the audit once #447 lands.
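The `nonlocal err` slot / `(retval, is_exception)` follow-up can be sketched with plain threads. This is a hypothetical `run_in_driver()` helper, not tractor or anyio code, and it deliberately ignores the `trio.Cancelled` coordination the item calls out; it only shows how a driver thread can hand a bootstrap failure back so the parent re-raises it instead of merely logging it.

```python
from __future__ import annotations

import threading
from typing import Any, Callable


def run_in_driver(
    fn: Callable[[], Any],
    *,
    timeout: float | None = None,
) -> Any:
    # filled by the driver thread: (return value or exception, is_exception)
    outcome: tuple[Any, bool] | None = None

    def _target() -> None:
        nonlocal outcome
        try:
            outcome = (fn(), False)
        except BaseException as exc:  # capture everything for the parent
            outcome = (exc, True)

    driver = threading.Thread(target=_target, name='subint-driver', daemon=True)
    driver.start()
    driver.join(timeout)
    if driver.is_alive() or outcome is None:
        raise TimeoutError('driver thread still running; abandoning it')

    retval, is_exception = outcome
    if is_exception:
        # re-raise in the parent, keeping the driver-side traceback
        raise retval
    return retval
```

Returning a tagged tuple instead of raising inside the thread is what keeps the error from dying with the thread.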
(this pr content was generated in some part by
claude-code)