Skip to content

Rewrite TestChildProcessCleanup with socket-based deterministic liveness probe#2265

Merged
maxisbey merged 9 commits intomainfrom
fix/flaky-child-process-cleanup-polling
Mar 12, 2026
Merged

Rewrite TestChildProcessCleanup with socket-based deterministic liveness probe#2265
maxisbey merged 9 commits intomainfrom
fix/flaky-child-process-cleanup-polling

Conversation

@maxisbey
Copy link
Contributor

@maxisbey maxisbey commented Mar 10, 2026

Summary

Closes #1775. Supersedes #2071.

The three TestChildProcessCleanup tests in tests/client/test_stdio.py failed intermittently on Windows/macOS CI. The root cause was a timing-dependent test design: watching a file grow on disk as a proxy for "is the child process alive?", with fixed anyio.sleep() durations to "wait long enough" for nested Python interpreters to start. On loaded CI, "long enough" wasn't.

A secondary bug compounded this: when the flaky assertion failed, proc was never terminated — the finally blocks only cleaned up tempfiles. The leaked subprocess was GC'd during a later test, triggering PytestUnraisableExceptionWarning there (observed: test_call_tool — an in-memory test that never touches subprocesses — failing with ResourceWarning: subprocess 5592 is still running).

New design — socket-based, zero sleeps, zero polling

Each subprocess in the tree connects a TCP socket back to a listener owned by the test. Two kernel-guaranteed blocking-I/O signals then tell us everything:

  1. await listener.accept() blocks until the subprocess connects → proves it started. No polling for "did the file grow yet?".
  2. await stream.receive(1) after _terminate_process_tree() raises anyio.EndOfStream (or anyio.BrokenResourceError on Windows, where TerminateJobObject causes an abrupt RST rather than clean FIN) because the kernel closes all file descriptors — including sockets — when a process terminates. This is a direct, OS-level proof that the child is dead. No polling for "did the file stop growing?".

Every synchronization point is a blocking read that unblocks on a kernel event. The anyio.fail_after() wrappers are upper bounds for catastrophic failure (subprocess never started / survived termination), not timing assumptions.

# Old: fixed sleep, indirect proxy, races against CI load
await anyio.sleep(0.5)
initial_size = os.path.getsize(marker_file)
await anyio.sleep(0.3)
assert os.path.getsize(marker_file) > initial_size  # fails: assert 0 > 0

# New: blocking accept, kernel-level EOF signal
with anyio.fail_after(10):
    stream = await sock.accept()           # blocks until child connects
    assert await stream.receive(5) == b"alive"

await _terminate_process_tree(proc)

with pytest.raises((anyio.EndOfStream, anyio.BrokenResourceError)):
    await stream.receive(1)                # child dead → kernel closed its socket

Process + transport cleanup

_terminate_process_tree kills the OS process group / Job Object but doesn't call process.wait() or close the anyio stdin/stdout pipe wrappers. On Windows, the resulting _WindowsSubprocessTransport / PipeHandle objects were GC'd later, triggering ResourceWarning in whatever test ran next. The SDK's production code at stdio.py:180 avoids this via async with process: — these tests call _terminate_process_tree directly, so they need equivalent cleanup.

Fixed with _terminate_and_reap(proc) registered via AsyncExitStack.push_async_callback() immediately after process creation: terminates the tree (no-op if already dead) → await proc.wait() → close stdin/stdout pipes. Bounded by move_on_after(5). All three tests use AsyncExitStack for RAII-style resource management — socket listener, process, and accepted streams each have their cleanup callback pushed at acquisition, so there are no if x is not None: guards in finally blocks, and no coverage pragmas.

This made the @pytest.mark.filterwarnings("ignore::ResourceWarning" if win32 ...) decorators unnecessary — removed.

Also removed

  • tempfile import and all file I/O
  • escape_path_for_python import (scripts are now injected via {code!r} repr() literals, which eliminates all nested-quote escaping)
  • ~30 lines of boilerplate textwrap.dedent blocks with triply-nested f-string scripts
  • The parent_marker check from test 1, which was a no-op: NamedTemporaryFile(delete=False) creates the file, so os.path.exists() was always True

Timing

Test main PR #2265 Speedup
test_basic_child_process_cleanup 1.91s 0.15s 12.7×
test_nested_process_tree 3.41s 0.16s 21.3×
test_early_parent_exit 1.70s 0.14s 12.1×
Total 7.04s 0.46s 15.3×

30 runs × 4 parallel workers (flake-finder): ~2.1s, 30/30 pass.

Hygiene

Metric Before After
anyio.sleep() calls in TestChildProcessCleanup 12 0
Coverage pragmas in TestChildProcessCleanup 5 0
filterwarnings("ignore::ResourceWarning") decorators 3 0
Diff +180 −256 (net −76 lines)

AI Disclaimer

…c in finally

The three TestChildProcessCleanup tests were flaky on Windows/macOS CI
because they used fixed anyio.sleep() durations (0.5s + 0.3s = 800ms)
to wait for nested Python interpreter chains to start writing to marker
files. On loaded CI runners, two Python startups + subprocess.Popen +
open() + first write() can exceed 800ms, causing assert 0 > 0.

Secondary bug: when the assertion failed, proc was never terminated
(finally only cleaned up tempfiles). The leaked subprocess was GC'd
during a later test, triggering PytestUnraisableExceptionWarning and
causing knock-on failures in unrelated tests.

Changes:

- Added _wait_for_file_size() helper: polls until getsize(path) exceeds
  a threshold, bounded by anyio.fail_after(10). Replaces all startup
  sleep+assert chains. Raises TimeoutError with a clear message instead
  of a confusing assert 0 > 0.

- Added proc cleanup to each finally block. Uses proc=None tracking
  after successful in-test termination to avoid redundant double-calls,
  and anyio.move_on_after() so cleanup timeout never masks the real
  failure.

- Removed the parent_marker check from test_basic_child_process_cleanup.
  NamedTemporaryFile(delete=False) already creates the file, so the
  os.path.exists() assertion was a no-op that never verified anything.
  Child writing proves parent started.

- Simplified shutdown verification: terminate_posix_process_tree already
  polls for process-group death, so the first post-termination sleep(0.5)
  was redundant. Reduced to a single 0.3s stability check (3x child
  write interval).

Result: 3 tests drop from ~7s to ~1.9s locally, 30/30 pass under
4-worker parallel load (flake-finder).

Closes #1775

Github-Issue: #1775
…ess probe

Replaces the file-growth-polling approach with a fundamentally different
design: each subprocess in the tree connects a TCP socket back to a listener
owned by the test. Liveness and death are then detected via blocking I/O
operations that unblock on kernel-level events — zero sleeps, zero polling:

  1. await listener.accept() blocks until the subprocess connects
     (proves it started).
  2. After _terminate_process_tree(), await stream.receive(1) raises
     EndOfStream because the kernel closes all file descriptors of a
     terminated process. This is direct, OS-guaranteed proof that the
     child is dead.

Compared to the original file-watching approach:
- No tempfiles, no file I/O, no path escaping
- No fixed sleep() durations (the root cause of flakiness)
- No polling loops
- Failure modes are clear: accept times out = child never started;
  receive times out = child survived termination (actual bug)

Also fixes the process leak on assertion failure by adding proc cleanup
to each finally block (guarded by proc=None after successful in-test
termination to avoid noisy double-terminate).

Result: 3 tests drop from ~7.1s to ~0.5s, 30/30 pass under 4-worker
parallel load. Removes 80 net lines and the tempfile/path-escape deps.

Closes #1775

Github-Issue: #1775
@maxisbey maxisbey changed the title Fix flaky TestChildProcessCleanup: poll for file growth, clean up proc in finally Rewrite TestChildProcessCleanup with socket-based deterministic liveness probe Mar 10, 2026
On Windows, TerminateJobObject causes an abrupt TCP RST (not clean FIN),
so anyio raises BrokenResourceError instead of EndOfStream when reading
from a socket held by the terminated child. Both exceptions indicate the
peer is dead — catch both.

Github-Issue: #1775
_terminate_process_tree kills the OS process group / Job Object but does
not call process.wait() or close the anyio stdin/stdout pipe wrappers.
On Windows, the resulting _WindowsSubprocessTransport / PipeHandle objects
are GC'd after the test's filterwarnings scope has exited, triggering
PytestUnraisableExceptionWarning in whatever test runs next on the same
worker.

The old file-watching tests masked this because their ~1.8s of sleep()
calls per test gave the asyncio event loop's child watcher time to fire
_process_exited() callbacks (which close the transport) during the test.
The new socket-based tests are ~0.15s each — too fast for that callback
to fire before GC.

The SDK's production path (stdio.py:180) avoids this via 'async with
process:', whose __aexit__ handles reaping + stream closure. These tests
call _terminate_process_tree directly, so they must do the same cleanup
explicitly.

Added _terminate_and_reap() helper that runs unconditionally in each
test's finally: terminate tree (no-op if already dead), await wait()
to reap, close stdin/stdout pipe wrappers. Bounded by move_on_after(5).

Github-Issue: #1775
Replaced 8 coverage pragmas with restructuring:

- pytest.fail() # pragma: no cover → pytest.raises() — if the child
  survives, pytest.raises fails with "DID NOT RAISE" (library handles it)

- if proc.stdin is not None: # pragma: no branch → assert proc.stdin is
  not None — satisfies pyright Optional narrowing; asserts are statements
  not branches, so coverage doesn't need both paths

- if proc is not None: # pragma: no branch (×3 in finally blocks)
  → AsyncExitStack.push_async_callback(_terminate_and_reap, proc)
  immediately after proc is created. Cleanup is guaranteed, no
  None-check needed.

- if stream is not None: # pragma: no branch (×2 in finally blocks)
  → Same: push stream.aclose as a callback right after accept()

Also dropped the # noqa: PERF401 — the accept loop now does stream
acquisition + cleanup registration + list append (3 ops), which doesn't
match PERF401's single-append pattern.

Net: all 3 tests are single-indent, zero try/finally boilerplate, zero
pragmas, zero noqa. Still 100% coverage, ~0.5s for all 3 tests.

Github-Issue: #1775
@maxisbey maxisbey marked this pull request as ready for review March 11, 2026 18:23
Copy link

@claude claude bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM — test-only rewrite that replaces a flaky timing-based design with deterministic kernel-level socket signals.

Extended reasoning...

Overview

This PR rewrites the three TestChildProcessCleanup tests in tests/client/test_stdio.py to fix CI flakiness (#1775). It touches a single test file (+168/−248, net −80 lines), replacing a file-growth-watching approach (with hardcoded anyio.sleep() durations that raced against slow Python interpreter startup on loaded CI) with a socket-based liveness probe: each subprocess connects a TCP socket back to a test-owned listener, and the test uses accept() blocking until connect to prove liveness, and EndOfStream/BrokenResourceError on receive() after termination as kernel-level proof of death. It also fixes a secondary process-leak bug where failed assertions left subprocesses unreaped, polluting later tests with ResourceWarning.

Security risks

None. The TCP listener binds to 127.0.0.1 on an ephemeral port, lives only for the duration of a single test method, and transports a fixed 5-byte literal b"alive". No external network exposure, no user input, no auth/crypto/permissions code touched. This is purely test-harness plumbing.

Level of scrutiny

Low. This is a test-only change with zero production code impact. The new design is strictly more deterministic than what it replaces — blocking on kernel events (accept(), socket EOF) rather than sleeping and hoping. The FallbackProcess import from mcp.os.win32.utilities is safe cross-platform (the class is defined unconditionally; only the pywin32 imports are guarded). The {code!r} script injection is a nice simplification over the old triple-nested f-string escaping. The AsyncExitStack cleanup pattern with _terminate_and_reap correctly handles both the happy path (idempotent double-terminate is bounded by move_on_after) and the failure path (subprocess is reaped even if an assertion fails early).

Other factors

No CODEOWNERS file, no prior reviews on the timeline, no bugs flagged by the bug-hunting system. The PR description includes stress-test evidence (30 runs × 4 parallel workers, 30/30 pass) and a 15× speedup. The helper functions are well-factored with clear docstrings explaining the design rationale. The only theoretical nit — stream.receive(5) returning fewer than 5 bytes — is not a practical concern for a 5-byte message on loopback TCP sent via sendall.

…rators

The old tests had filterwarnings("ignore::ResourceWarning" if win32 ...)
because _terminate_process_tree didn't call process.wait() or close pipe
transports, so GC of the anyio Process wrapper emitted ResourceWarning.

_terminate_and_reap now explicitly calls wait() + closes stdin/stdout
pipes in the test's AsyncExitStack teardown (i.e., during the test,
before GC can fire). The warnings should no longer occur — if they do,
CI will tell us and we should fix the root cause rather than suppress.

Github-Issue: #1775
…tent

Addresses review feedback: calling _terminate_process_tree on an
already-reaped POSIX process hits the fallback path in
terminate_posix_process_tree (os.getpgid on a stale PID raises
ProcessLookupError), which emits two WARNING logs and an ERROR with
traceback. These are visible because pyproject.toml sets log_cli=true.

Previously, each test called _terminate_process_tree in the body and the
AsyncExitStack called _terminate_and_reap (which re-called
_terminate_process_tree) at exit — double-terminate, nine spurious log
records per run.

Fix: _terminate_and_reap now checks proc.returncode before terminating.
On POSIX, the asyncio child watcher sets returncode during
_terminate_process_tree's polling-loop yield, so by the time cleanup
runs it's populated and the guard skips the noisy re-terminate. On
Windows/FallbackProcess (no returncode attribute), getattr returns None
and terminate runs — but the Windows path is silent on double-call.

Restructured tests to call _terminate_and_reap (which wraps
_terminate_process_tree) as the action under test, and let the ExitStack
safety-net call it again. Since wait() and stream.aclose() are
idempotent, this is safe — and it covers both branches of the guard
(first call: returncode is None → terminate; second call: returncode set
→ skip). 100% coverage, still zero pragmas.

Github-Issue: #1775
Copy link

@claude claude bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM — the returncode guard (a070356) addresses the double-terminate log spam I flagged; the remaining FallbackProcess nit is cosmetic and unreachable in CI.

Extended reasoning...

Overview

This PR rewrites three flaky tests in tests/client/test_stdio.py (TestChildProcessCleanup), replacing a file-growth-polling approach with deterministic socket-based liveness probes. The change is test-only — no production code is touched. Net −76 lines, 15× faster, zero anyio.sleep() calls, zero coverage pragmas, zero filterwarnings decorators.

What changed since my last review

I previously flagged (2026-03-11) that _terminate_and_reap unconditionally called _terminate_process_tree on cleanup, causing spurious WARNING/ERROR logs on every POSIX run due to ProcessLookupError in the fallback path. The author addressed this in commit a070356 by adding if getattr(proc, "returncode", None) is None: before the terminate call, and updated the docstring to document idempotency. This fixes the issue for anyio.abc.Process (the POSIX path where the noisy logs actually occurred), which was the substance of my comment.

Remaining nit (posted inline)

The getattr(proc, "returncode", None) guard is technically ineffective for FallbackProcess, which exposes popen.returncode rather than returncode directly — so the guard always passes and double-terminate still happens. However: (a) FallbackProcess is only constructed on Windows with SelectorEventLoop, which hasn't been the default since Python 3.8 and isn't exercised in CI; (b) terminate_windows_process_tree swallows all exceptions, so double-calling it is silent. The bug report itself rates practical impact as "near-zero." A one-line returncode property on FallbackProcess would be the cleanest fix, but it's not blocking.

Security risks

None. Test-only code spawning local Python subprocesses that connect back to a localhost-bound ephemeral TCP listener. No external inputs, no auth/crypto, no shared state.

Level of scrutiny

Low-moderate. The changes are entirely in the test suite and improve reliability/determinism of CI. The socket-based mechanism (blocking accept() → kernel closes FDs on death → EndOfStream/BrokenResourceError) is sound and well-documented inline. The AsyncExitStack pattern for resource management is idiomatic. The {child_script!r} repr-based injection correctly avoids nested-quote escaping issues.

Other factors

The maintainer already reviewed and considered my prior (more substantive) finding non-blocking for approval. The author responded within a day with a targeted fix. The flake-finder validation (30 runs × 4 workers, 30/30 pass) demonstrates the reliability claim.

One remaining leak on Windows CI: _ProactorReadPipeTransport (stdout)
was never closed, triggering ResourceWarning in a later test.

Root cause: anyio's StreamReaderWrapper.aclose() only calls
set_exception(ClosedResourceError()) on the Python-level StreamReader —
it never touches the underlying transport. On Windows,
_ProactorReadPipeTransport starts with _paused=True and only detects
pipe EOF when someone reads. The EOF-detection path is what calls
transport.close(): _eof_received() → self.close() → _sock = None.
Without a read, the transport lives until __del__.

Production stdio.py avoids this because its stdout_reader task
continuously reads stdout, so EOF is detected naturally. These tests
never read stdout, so the transport stays paused with an orphaned
overlapped read until GC.

Fix: drain stdout with a single receive() call in _terminate_and_reap.
The process is already dead, so the read immediately gets EOF
(EndOfStream on clean close, BrokenResourceError on abrupt RST,
ClosedResourceError on second call when already aclose()'d — all
caught by contextlib.suppress).

Note: the agent's suggested fix (stack.enter_async_context(proc) for
Process.__aexit__) wouldn't help — __aexit__ calls aclose() which has
the same stdout.aclose() issue. The drain is what actually closes the
transport.

Github-Issue: #1775
FallbackProcess stands in for anyio.abc.Process on Windows with
SelectorEventLoop, but was missing the returncode property that Process
declares abstractly. This meant getattr(proc, "returncode", None) in
_terminate_and_reap always returned the default None for FallbackProcess,
so the double-terminate guard never fired (the docstring's 'Idempotent'
claim was false for half the declared union type).

Practical impact was near-zero — FallbackProcess is only active under
SelectorEventLoop (not the default since Python 3.8, not used in CI),
and terminate_windows_process_tree swallows all exceptions on the second
call. But the fix is trivial and brings FallbackProcess closer to the
interface it's supposed to implement.

With returncode present on both union members, the getattr can become a
direct attribute access, which also satisfies pyright without defensive
indirection.

Github-Issue: #1775
@maxisbey maxisbey merged commit dd52713 into main Mar 12, 2026
33 checks passed
@maxisbey maxisbey deleted the fix/flaky-child-process-cleanup-polling branch March 12, 2026 12:52
Copy link

@claude claude bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM — both issues addressed, the returncode property is the right fix.

Extended reasoning...

Overview

This PR rewrites three flaky integration tests (TestChildProcessCleanup) in tests/client/test_stdio.py to use a deterministic socket-based liveness probe instead of timing-dependent file-growth polling, and adds a 5-line returncode property to FallbackProcess in src/mcp/os/win32/utilities.py. The test rewrite is +180/−256 net and eliminates all sleep() calls, coverage pragmas, and filterwarnings decorators from the test class. The tests are now ~15× faster and flake-finder-verified.

Review history on this PR

I raised two issues on earlier revisions:

  1. Double-terminate spurious logs (POSIX): the AsyncExitStack refactor dropped a guard, so _terminate_process_tree was called twice per test, emitting WARNING/ERROR logs from terminate_posix_process_tree's fallback path. Fixed in a070356 with a returncode check.
  2. FallbackProcess missing returncode: the guard from (1) was ineffective for the Windows fallback path because FallbackProcess had no returncode attribute. Fixed in 0554fe0 by adding a property delegating to self.popen.returncode — the cleaner of the two fixes I suggested, since it brings the class closer to the anyio.abc.Process interface it stands in for.

Both inline threads are resolved. felixweinberger noted the first was legitimate but non-blocking. The current diff reflects both fixes correctly: _terminate_and_reap now checks proc.returncode is None directly (no getattr, since both union members have the property), and the FallbackProcess.returncode property is in place.

Security risks

None. The source change is a read-only property on a Windows-only fallback class. The test changes bind TCP listeners on 127.0.0.1 with ephemeral ports — standard test infrastructure, no external exposure.

Level of scrutiny

Low for the source change (5-line read-only property, obvious correctness), moderate for the test rewrite. Tests are where this PR's weight is, and test-only changes have inherently limited blast radius — worst case is the tests themselves flake or fail, which CI would catch. The socket-based design is sound (kernel-guaranteed accept() blocking + FD-close-on-process-death are reliable OS primitives), and the extensive docstrings make the mechanism clear. The _terminate_and_reap helper now correctly handles the idempotency requirement it claims in its docstring.

Other factors

No CODEOWNERS on these paths. No outstanding reviewer comments. The bug hunting system found no issues on this revision. The PR description's timing table and flake-finder results (30/30 pass × 4 workers) give good confidence the new design actually fixes the CI flakiness it targets.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Fix flaky tests in TestChildProcessCleanup (test_stdio.py)

2 participants