Commit eae478f

Add _testing._reap + auto-reap fixture
Zombie-subactor cleanup for the test suite, SC-polite discipline
(`SIGINT` first, bounded grace, `SIGKILL` only on survivors). Two
parts: a shared reaper module + an autouse session-end fixture that
runs it.

Deats,
- new `tractor/_testing/_reap.py` (+230 LOC): Linux-only reaper using
  `/proc/<pid>/{status,cwd,cmdline}` inspection. Two detection modes:
  - `find_descendants(parent_pid)` for the in-session case
    (PPid-direct-match while pytest is still alive).
  - `find_orphans(repo_root)` for the CLI / post-mortem case
    (`PPid==1` reparented to init + `cwd` filter to repo root +
    `python` cmdline filter).
- `reap(pids, *, grace=3.0, poll=0.25)` does the signal ladder: SIGINT
  all, poll up to `grace` for exit, SIGKILL any survivors. Returns
  `(signalled, killed)` for caller-side reporting.
- new `_reap_orphaned_subactors` session-scoped autouse fixture in
  `tractor/_testing/pytest.py`: after `yield`, runs
  `find_descendants(os.getpid())` + `reap(...)` so each pytest session
  leaves no surviving forks.
- companion CLI scaffolding lives at `scripts/tractor-reap` (separate
  commit) for the pytest-died-mid-session case where the in-session
  fixture didn't get to run.

Also,
- promote `from tractor.spawn._spawn import SpawnMethodKey` to
  module-top in `pytest.py` (was inline-imported inside
  `pytest_generate_tests`), and reuse it in
  `pytest_collection_modifyitems` to assert each
  `skipon_spawn_backend` mark arg is a valid spawn-method literal,
  which catches typos at collection time.
- inline `# ?TODO` flags running these through the `try_set_backend`
  checker for stronger validation.

Cross-refs `feedback_sc_graceful_cancel_first.md` for the
SIGINT-before-SIGKILL discipline rationale.

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
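The signal ladder described in the message (SIGINT, bounded grace, SIGKILL only on survivors) can be illustrated outside of `tractor` with plain `subprocess` children. This is a minimal sketch, not the commit's implementation: `escalate` and `spawn_ready` are hypothetical names, and liveness is checked with `Popen.poll()` instead of `/proc` inspection. POSIX-only.

```python
import signal
import subprocess
import sys
import time


def spawn_ready(code: str) -> subprocess.Popen:
    '''
    Hypothetical helper: start a Python child running `code` and
    block until it prints one line, so any signal handlers it
    installs are in place before we start signalling it.
    '''
    proc = subprocess.Popen(
        [sys.executable, '-c', code],
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
        text=True,
    )
    proc.stdout.readline()  # wait for the child's "ready" line
    return proc


def escalate(
    procs: list[subprocess.Popen],
    grace: float = 3.0,
    poll: float = 0.1,
) -> tuple[list[int], list[int]]:
    '''
    SIGINT every live proc, poll up to `grace` seconds for exit,
    then SIGKILL whatever is left.

    Returns `(signalled_pids, killed_pids)`.
    '''
    signalled = []
    for proc in procs:
        if proc.poll() is None:  # still running
            proc.send_signal(signal.SIGINT)
            signalled.append(proc)

    deadline = time.monotonic() + grace
    while time.monotonic() < deadline:
        if all(proc.poll() is not None for proc in signalled):
            break
        time.sleep(poll)

    killed = [proc for proc in signalled if proc.poll() is None]
    for proc in killed:
        proc.kill()  # SIGKILL on POSIX
        proc.wait()

    return (
        [proc.pid for proc in signalled],
        [proc.pid for proc in killed],
    )
```

A child that leaves the default SIGINT handling in place should exit during the grace window, while one that sets `signal.SIG_IGN` should only show up in the second list.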
1 parent 44bdb16 commit eae478f

2 files changed

Lines changed: 273 additions & 2 deletions

tractor/_testing/_reap.py

Lines changed: 230 additions & 0 deletions
@@ -0,0 +1,230 @@
+# tractor: structured concurrent "actors".
+# Copyright 2018-eternity Tyler Goodlet.
+
+# This program is free software: you can redistribute it and/or modify
+# it under the terms of the GNU Affero General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU Affero General Public License for more details.
+
+# You should have received a copy of the GNU Affero General Public License
+# along with this program. If not, see <https://www.gnu.org/licenses/>.
+
+'''
+Zombie-subactor reaper — SC-polite (SIGINT first, SIGKILL
+as last resort with a bounded grace window).
+
+Shared implementation between the `tractor-reap` CLI
+(`scripts/tractor-reap`) and the pytest session-scoped
+auto-fixture that guards the test suite against leftover
+subactor processes.
+
+Design notes
+------------
+
+- Linux-only: reads `/proc/<pid>/{status,cwd,cmdline}`.
+- Two detection modes:
+
+  1. **descendant-mode** — when invoked from a still-live
+     parent (e.g. a pytest session-end fixture), match by
+     `PPid == parent_pid`. Direct + precise; the target
+     PIDs are still parented to the live pytest process
+     at teardown time, before pytest exits.
+
+  2. **orphan-mode** — when invoked after the parent died
+     (e.g. the `tractor-reap` CLI run post-Ctrl+C), match
+     by `PPid == 1` (reparented to init) AND `cwd ==
+     <repo-root>` AND cmdline contains `python`. The cwd
+     filter is what keeps the heuristic from sweeping up
+     unrelated init-children on the box.
+
+- Escalation: for every matched PID, SIGINT, poll for up
+  to `grace` seconds, then SIGKILL any survivors. The
+  two-phase pattern is the SC-graceful-cancel discipline
+  documented in `feedback_sc_graceful_cancel_first.md` —
+  we want the subactor runtime to run its trio cancel
+  shield + IPC teardown paths where it can.
+
+'''
+from __future__ import annotations
+
+import os
+import pathlib
+import signal
+import time
+
+
+def _read_status_ppid(pid: int) -> int | None:
+    '''
+    Return the parent-pid from `/proc/<pid>/status` or
+    `None` if the proc went away / is unreadable.
+
+    '''
+    try:
+        with open(f'/proc/{pid}/status') as f:
+            for line in f:
+                if line.startswith('PPid:'):
+                    return int(line.split()[1])
+    except (FileNotFoundError, PermissionError, ProcessLookupError):
+        return None
+    return None
+
+
+def _read_cwd(pid: int) -> str | None:
+    try:
+        return os.readlink(f'/proc/{pid}/cwd')
+    except (FileNotFoundError, PermissionError, ProcessLookupError):
+        return None
+
+
+def _read_cmdline(pid: int) -> str:
+    try:
+        with open(f'/proc/{pid}/cmdline', 'rb') as f:
+            return f.read().replace(b'\0', b' ').decode(errors='replace')
+    except (FileNotFoundError, PermissionError, ProcessLookupError):
+        return ''
+
+
+def _iter_live_pids() -> list[int]:
+    '''
+    Enumerate currently-alive pids from `/proc`.
+
+    '''
+    try:
+        entries: list[str] = os.listdir('/proc')
+    except OSError:
+        return []
+    return [int(e) for e in entries if e.isdigit()]
+
+
+def find_descendants(
+    parent_pid: int,
+) -> list[int]:
+    '''
+    PIDs whose `PPid == parent_pid` — i.e. direct
+    children of the given pid. Used by the pytest
+    session-end fixture where `parent_pid` is still
+    alive as the pytest-python process.
+
+    '''
+    return [
+        pid
+        for pid in _iter_live_pids()
+        if _read_status_ppid(pid) == parent_pid
+    ]
+
+
+def find_orphans(
+    repo_root: pathlib.Path,
+) -> list[int]:
+    '''
+    PIDs that are:
+
+    - reparented to init (`PPid == 1`),
+    - have `cwd == <repo_root>`,
+    - and have a `python` in their cmdline.
+
+    This is the "pytest-died-mid-session" case where the
+    subactor forks got reparented. The cwd filter is the
+    critical bit that keeps us from sweeping up unrelated
+    init-children on the box.
+
+    '''
+    repo: str = str(repo_root)
+    hits: list[int] = []
+    for pid in _iter_live_pids():
+        if _read_status_ppid(pid) != 1:
+            continue
+        cwd: str | None = _read_cwd(pid)
+        if cwd != repo:
+            continue
+        cmd: str = _read_cmdline(pid)
+        if 'python' not in cmd:
+            continue
+        hits.append(pid)
+    return hits
+
+
+def reap(
+    pids: list[int],
+    *,
+    grace: float = 3.0,
+    poll: float = 0.25,
+    log=print,
+) -> tuple[list[int], list[int]]:
+    '''
+    Deliver SIGINT to each pid, wait up to `grace`
+    seconds for them to exit, then SIGKILL any that
+    survive.
+
+    Returns `(signalled, survivors_killed)` so callers
+    can report / assert.
+
+    `log` is the logger function for user-visible
+    progress lines — default `print`; pytest fixture
+    swaps it for a `pytest`-friendly writer.
+
+    '''
+    if not pids:
+        return ([], [])
+
+    signalled: list[int] = []
+    for pid in pids:
+        try:
+            os.kill(pid, signal.SIGINT)
+            signalled.append(pid)
+        except ProcessLookupError:
+            # raced — already gone
+            pass
+
+    if signalled:
+        log(
+            f'[tractor-reap] SIGINT → {len(signalled)} '
+            f'proc(s): {signalled}'
+        )
+
+    deadline: float = time.monotonic() + grace
+    while time.monotonic() < deadline:
+        time.sleep(poll)
+        alive: list[int] = [
+            pid for pid in signalled if _is_alive(pid)
+        ]
+        if not alive:
+            return (signalled, [])
+
+    survivors: list[int] = [
+        pid for pid in signalled if _is_alive(pid)
+    ]
+    if survivors:
+        log(
+            f'[tractor-reap] SIGKILL (after {grace}s '
+            f'grace) → {survivors}'
+        )
+        for pid in survivors:
+            try:
+                os.kill(pid, signal.SIGKILL)
+            except ProcessLookupError:
+                pass
+
+    return (signalled, survivors)
+
+
+def _is_alive(pid: int) -> bool:
+    '''
+    True iff `/proc/<pid>` still exists AND the proc
+    isn't already a zombie (Z state).
+
+    '''
+    try:
+        with open(f'/proc/{pid}/status') as f:
+            for line in f:
+                if line.startswith('State:'):
+                    # e.g. 'State:\tZ (zombie)'
+                    return 'Z' not in line.split()[1]
+    except (FileNotFoundError, ProcessLookupError):
+        return False
+    return True
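
Both `_read_status_ppid()` and `_is_alive()` above scan `/proc/<pid>/status` line by line for a single field. As a rough illustration of what those lines look like, the sketch below parses a hand-written sample of the `Key:<tab>value` format into a dict. `parse_status` and `SAMPLE_STATUS` are hypothetical names for this example only, and the sample text is made up; real status files carry many more fields.

```python
def parse_status(text: str) -> dict[str, str]:
    '''
    Hypothetical helper: split the `Key:\\tvalue` lines of a
    `/proc/<pid>/status` dump into a `{key: value}` mapping.
    '''
    fields: dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(':')
        if sep:  # skip anything that isn't `Key: value`
            fields[key] = value.strip()
    return fields


# Made-up sample in the same layout the kernel uses.
SAMPLE_STATUS = (
    'Name:\tpython3\n'
    'State:\tZ (zombie)\n'
    'Pid:\t4242\n'
    'PPid:\t1\n'
)

fields = parse_status(SAMPLE_STATUS)
ppid = int(fields['PPid'])                    # parent pid, as an int
is_zombie = fields['State'].startswith('Z')   # first token is the state letter
```

The module in the diff avoids building a dict and returns as soon as it finds the one field it needs, which is cheaper when scanning every pid under `/proc`.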

tractor/_testing/pytest.py

Lines changed: 43 additions & 2 deletions
@@ -32,6 +32,7 @@
 
 import pytest
 import tractor
+from tractor.spawn._spawn import SpawnMethodKey
 import trio
 
 
@@ -274,7 +275,12 @@ class + module-level marks in the correct scope order (and
     default_reason: str = f'Borked on --spawn-backend={backend!r}'
     for item in items:
         for mark in item.iter_markers(name='skipon_spawn_backend'):
-            if backend in mark.args:
+            skip_backends: tuple[str] = mark.args
+            for skip_backend in skip_backends:
+                assert skip_backend in get_args(SpawnMethodKey)
+                # ?TODO, run these through the try-set-backend checker to
+                # avoid typos?
+            if backend in skip_backends:
                 reason: str = mark.kwargs.get(
                     'reason',
                     default_reason,
@@ -285,6 +291,42 @@ class + module-level marks in the correct scope order (and
                 break
 
 
+@pytest.fixture(
+    scope='session',
+    autouse=True,
+)
+def _reap_orphaned_subactors():
+    '''
+    Session-scoped autouse fixture: after the whole test
+    session finishes, SIGINT any subactor processes still
+    parented to this `pytest` process, wait a bounded
+    grace window, then SIGKILL survivors.
+
+    Rationale: under fork-based spawn backends (notably
+    `subint_forkserver`), a test that times out or bails
+    mid-teardown can leave subactor forks alive. Without
+    this reap, they linger across sessions and compete
+    for ports / inherit pytest's capture-pipe fds — which
+    flakifies later tests. SC-polite discipline: SIGINT
+    first to let the subactor's trio cancel shield + IPC
+    teardown paths run before we escalate.
+
+    Matching companion CLI: `scripts/tractor-reap` for
+    the pytest-died-mid-session case.
+
+    '''
+    import os
+    parent_pid: int = os.getpid()
+    yield
+    from tractor._testing._reap import (
+        find_descendants,
+        reap,
+    )
+    pids: list[int] = find_descendants(parent_pid)
+    if pids:
+        reap(pids, grace=3.0)
+
+
 @pytest.fixture(scope='session')
 def debug_mode(
     request: pytest.FixtureRequest,
@@ -398,7 +440,6 @@ def pytest_generate_tests(
     # drive the valid-backend set from the canonical `Literal` so
     # adding a new spawn backend (e.g. `'subint'`) doesn't require
     # touching the harness.
-    from tractor.spawn._spawn import SpawnMethodKey
     assert spawn_backend in get_args(SpawnMethodKey)
 
     # NOTE: used-to-be-used-to dynamically parametrize tests for when
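
The `skipon_spawn_backend` check in the diff relies on `typing.get_args()` returning the members of a `Literal` alias. The sketch below shows that pattern in isolation. `BackendKey` is a hypothetical stand-in for `SpawnMethodKey` (the real literal's members may differ), and `check_backends` is an illustrative name; it raises `ValueError` where the diff uses a bare `assert`.

```python
from typing import Literal, get_args

# Hypothetical stand-in for `tractor.spawn._spawn.SpawnMethodKey`;
# the member names here are assumptions for illustration only.
BackendKey = Literal['trio', 'mp_spawn', 'mp_forkserver']


def check_backends(names: tuple[str, ...]) -> None:
    '''
    Reject any name that is not a member of the `BackendKey`
    literal, so a typo in a mark argument fails loudly.
    '''
    valid: tuple[str, ...] = get_args(BackendKey)
    for name in names:
        if name not in valid:
            raise ValueError(
                f'unknown spawn backend {name!r}; '
                f'expected one of {valid}'
            )
```

Driving the valid set from the `Literal` means a newly added backend is accepted without touching the checker, which matches the rationale in the comment above.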
