Rewrite TestChildProcessCleanup with socket-based deterministic liveness probe by maxisbey · Pull Request #2265 · modelcontextprotocol/python-sdk

maxisbey · 2026-03-10T14:30:43Z

Summary

Closes #1775. Supersedes #2071.

The three TestChildProcessCleanup tests in tests/client/test_stdio.py failed intermittently on Windows/macOS CI. The root cause was a timing-dependent test design: watching a file grow on disk as a proxy for "is the child process alive?", with fixed anyio.sleep() durations to "wait long enough" for nested Python interpreters to start. On loaded CI, "long enough" wasn't.

A secondary bug compounded this: when the flaky assertion failed, proc was never terminated — the finally blocks only cleaned up tempfiles. The leaked subprocess was GC'd during a later test, triggering PytestUnraisableExceptionWarning there (observed: test_call_tool — an in-memory test that never touches subprocesses — failing with ResourceWarning: subprocess 5592 is still running).

New design — socket-based, zero sleeps, zero polling

Each subprocess in the tree connects a TCP socket back to a listener owned by the test. Two kernel-guaranteed blocking-I/O signals then tell us everything:

await listener.accept() blocks until the subprocess connects → proves it started. No polling for "did the file grow yet?".
await stream.receive(1) after _terminate_process_tree() raises anyio.EndOfStream (or anyio.BrokenResourceError on Windows, where TerminateJobObject causes an abrupt RST rather than clean FIN) because the kernel closes all file descriptors — including sockets — when a process terminates. This is a direct, OS-level proof that the child is dead. No polling for "did the file stop growing?".

Every synchronization point is a blocking read that unblocks on a kernel event. The anyio.fail_after() wrappers are upper bounds for catastrophic failure (subprocess never started / survived termination), not timing assumptions.

# Old: fixed sleep, indirect proxy, races against CI load
await anyio.sleep(0.5)
initial_size = os.path.getsize(marker_file)
await anyio.sleep(0.3)
assert os.path.getsize(marker_file) > initial_size  # fails: assert 0 > 0

# New: blocking accept, kernel-level EOF signal
with anyio.fail_after(10):
    stream = await sock.accept()           # blocks until child connects
    assert await stream.receive(5) == b"alive"

await _terminate_process_tree(proc)

with pytest.raises((anyio.EndOfStream, anyio.BrokenResourceError)):
    await stream.receive(1)                # child dead → kernel closed its socket

Process + transport cleanup

_terminate_process_tree kills the OS process group / Job Object but doesn't call process.wait() or close the anyio stdin/stdout pipe wrappers. On Windows, the resulting _WindowsSubprocessTransport / PipeHandle objects were GC'd later, triggering ResourceWarning in whatever test ran next. The SDK's production code at stdio.py:180 avoids this via async with process: — these tests call _terminate_process_tree directly, so they need equivalent cleanup.

Fixed with _terminate_and_reap(proc) registered via AsyncExitStack.push_async_callback() immediately after process creation: terminates the tree (no-op if already dead) → await proc.wait() → close stdin/stdout pipes. Bounded by move_on_after(5). All three tests use AsyncExitStack for RAII-style resource management — socket listener, process, and accepted streams each have their cleanup callback pushed at acquisition, so there are no if x is not None: guards in finally blocks, and no coverage pragmas.

This made the @pytest.mark.filterwarnings("ignore::ResourceWarning" if win32 ...) decorators unnecessary — removed.

Also removed

tempfile import and all file I/O
escape_path_for_python import (scripts are now injected via {code!r} repr() literals, which eliminates all nested-quote escaping)
~30 lines of boilerplate textwrap.dedent blocks with triply-nested f-string scripts
The parent_marker check from test 1, which was a no-op: NamedTemporaryFile(delete=False) creates the file, so os.path.exists() was always True

Timing

Test	main	PR #2265	Speedup
`test_basic_child_process_cleanup`	1.91s	0.15s	12.7×
`test_nested_process_tree`	3.41s	0.16s	21.3×
`test_early_parent_exit`	1.70s	0.14s	12.1×
Total	7.04s	0.46s	15.3×

30 runs × 4 parallel workers (flake-finder): ~2.1s, 30/30 pass.

Hygiene

Metric	Before	After
`anyio.sleep()` calls in `TestChildProcessCleanup`	12	0
Coverage pragmas in `TestChildProcessCleanup`	5	0
`filterwarnings("ignore::ResourceWarning")` decorators	3	0
Diff	—	+180 −256 (net −76 lines)

_{AI Disclaimer}

…c in finally The three TestChildProcessCleanup tests were flaky on Windows/macOS CI because they used fixed anyio.sleep() durations (0.5s + 0.3s = 800ms) to wait for nested Python interpreter chains to start writing to marker files. On loaded CI runners, two Python startups + subprocess.Popen + open() + first write() can exceed 800ms, causing assert 0 > 0. Secondary bug: when the assertion failed, proc was never terminated (finally only cleaned up tempfiles). The leaked subprocess was GC'd during a later test, triggering PytestUnraisableExceptionWarning and causing knock-on failures in unrelated tests. Changes: - Added _wait_for_file_size() helper: polls until getsize(path) exceeds a threshold, bounded by anyio.fail_after(10). Replaces all startup sleep+assert chains. Raises TimeoutError with a clear message instead of a confusing assert 0 > 0. - Added proc cleanup to each finally block. Uses proc=None tracking after successful in-test termination to avoid redundant double-calls, and anyio.move_on_after() so cleanup timeout never masks the real failure. - Removed the parent_marker check from test_basic_child_process_cleanup. NamedTemporaryFile(delete=False) already creates the file, so the os.path.exists() assertion was a no-op that never verified anything. Child writing proves parent started. - Simplified shutdown verification: terminate_posix_process_tree already polls for process-group death, so the first post-termination sleep(0.5) was redundant. Reduced to a single 0.3s stability check (3x child write interval). Result: 3 tests drop from ~7s to ~1.9s locally, 30/30 pass under 4-worker parallel load (flake-finder). Closes #1775 Github-Issue: #1775

…ess probe Replaces the file-growth-polling approach with a fundamentally different design: each subprocess in the tree connects a TCP socket back to a listener owned by the test. Liveness and death are then detected via blocking I/O operations that unblock on kernel-level events — zero sleeps, zero polling: 1. await listener.accept() blocks until the subprocess connects (proves it started). 2. After _terminate_process_tree(), await stream.receive(1) raises EndOfStream because the kernel closes all file descriptors of a terminated process. This is direct, OS-guaranteed proof that the child is dead. Compared to the original file-watching approach: - No tempfiles, no file I/O, no path escaping - No fixed sleep() durations (the root cause of flakiness) - No polling loops - Failure modes are clear: accept times out = child never started; receive times out = child survived termination (actual bug) Also fixes the process leak on assertion failure by adding proc cleanup to each finally block (guarded by proc=None after successful in-test termination to avoid noisy double-terminate). Result: 3 tests drop from ~7.1s to ~0.5s, 30/30 pass under 4-worker parallel load. Removes 80 net lines and the tempfile/path-escape deps. Closes #1775 Github-Issue: #1775

On Windows, TerminateJobObject causes an abrupt TCP RST (not clean FIN), so anyio raises BrokenResourceError instead of EndOfStream when reading from a socket held by the terminated child. Both exceptions indicate the peer is dead — catch both. Github-Issue: #1775

_terminate_process_tree kills the OS process group / Job Object but does not call process.wait() or close the anyio stdin/stdout pipe wrappers. On Windows, the resulting _WindowsSubprocessTransport / PipeHandle objects are GC'd after the test's filterwarnings scope has exited, triggering PytestUnraisableExceptionWarning in whatever test runs next on the same worker. The old file-watching tests masked this because their ~1.8s of sleep() calls per test gave the asyncio event loop's child watcher time to fire _process_exited() callbacks (which close the transport) during the test. The new socket-based tests are ~0.15s each — too fast for that callback to fire before GC. The SDK's production path (stdio.py:180) avoids this via 'async with process:', whose __aexit__ handles reaping + stream closure. These tests call _terminate_process_tree directly, so they must do the same cleanup explicitly. Added _terminate_and_reap() helper that runs unconditionally in each test's finally: terminate tree (no-op if already dead), await wait() to reap, close stdin/stdout pipe wrappers. Bounded by move_on_after(5). Github-Issue: #1775

Replaced 8 coverage pragmas with restructuring: - pytest.fail() # pragma: no cover → pytest.raises() — if the child survives, pytest.raises fails with "DID NOT RAISE" (library handles it) - if proc.stdin is not None: # pragma: no branch → assert proc.stdin is not None — satisfies pyright Optional narrowing; asserts are statements not branches, so coverage doesn't need both paths - if proc is not None: # pragma: no branch (×3 in finally blocks) → AsyncExitStack.push_async_callback(_terminate_and_reap, proc) immediately after proc is created. Cleanup is guaranteed, no None-check needed. - if stream is not None: # pragma: no branch (×2 in finally blocks) → Same: push stream.aclose as a callback right after accept() Also dropped the # noqa: PERF401 — the accept loop now does stream acquisition + cleanup registration + list append (3 ops), which doesn't match PERF401's single-append pattern. Net: all 3 tests are single-indent, zero try/finally boilerplate, zero pragmas, zero noqa. Still 100% coverage, ~0.5s for all 3 tests. Github-Issue: #1775

claude

LGTM — test-only rewrite that replaces a flaky timing-based design with deterministic kernel-level socket signals.

Extended reasoning...

Overview

This PR rewrites the three TestChildProcessCleanup tests in tests/client/test_stdio.py to fix CI flakiness (#1775). It touches a single test file (+168/−248, net −80 lines), replacing a file-growth-watching approach (with hardcoded anyio.sleep() durations that raced against slow Python interpreter startup on loaded CI) with a socket-based liveness probe: each subprocess connects a TCP socket back to a test-owned listener, and the test uses accept() blocking until connect to prove liveness, and EndOfStream/BrokenResourceError on receive() after termination as kernel-level proof of death. It also fixes a secondary process-leak bug where failed assertions left subprocesses unreaped, polluting later tests with ResourceWarning.

Security risks

None. The TCP listener binds to 127.0.0.1 on an ephemeral port, lives only for the duration of a single test method, and transports a fixed 5-byte literal b"alive". No external network exposure, no user input, no auth/crypto/permissions code touched. This is purely test-harness plumbing.

Level of scrutiny

Low. This is a test-only change with zero production code impact. The new design is strictly more deterministic than what it replaces — blocking on kernel events (accept(), socket EOF) rather than sleeping and hoping. The FallbackProcess import from mcp.os.win32.utilities is safe cross-platform (the class is defined unconditionally; only the pywin32 imports are guarded). The {code!r} script injection is a nice simplification over the old triple-nested f-string escaping. The AsyncExitStack cleanup pattern with _terminate_and_reap correctly handles both the happy path (idempotent double-terminate is bounded by move_on_after) and the failure path (subprocess is reaped even if an assertion fails early).

Other factors

No CODEOWNERS file, no prior reviews on the timeline, no bugs flagged by the bug-hunting system. The PR description includes stress-test evidence (30 runs × 4 parallel workers, 30/30 pass) and a 15× speedup. The helper functions are well-factored with clear docstrings explaining the design rationale. The only theoretical nit — stream.receive(5) returning fewer than 5 bytes — is not a practical concern for a 5-byte message on loopback TCP sent via sendall.

…rators The old tests had filterwarnings("ignore::ResourceWarning" if win32 ...) because _terminate_process_tree didn't call process.wait() or close pipe transports, so GC of the anyio Process wrapper emitted ResourceWarning. _terminate_and_reap now explicitly calls wait() + closes stdin/stdout pipes in the test's AsyncExitStack teardown (i.e., during the test, before GC can fire). The warnings should no longer occur — if they do, CI will tell us and we should fix the root cause rather than suppress. Github-Issue: #1775

tests/client/test_stdio.py

…tent Addresses review feedback: calling _terminate_process_tree on an already-reaped POSIX process hits the fallback path in terminate_posix_process_tree (os.getpgid on a stale PID raises ProcessLookupError), which emits two WARNING logs and an ERROR with traceback. These are visible because pyproject.toml sets log_cli=true. Previously, each test called _terminate_process_tree in the body and the AsyncExitStack called _terminate_and_reap (which re-called _terminate_process_tree) at exit — double-terminate, nine spurious log records per run. Fix: _terminate_and_reap now checks proc.returncode before terminating. On POSIX, the asyncio child watcher sets returncode during _terminate_process_tree's polling-loop yield, so by the time cleanup runs it's populated and the guard skips the noisy re-terminate. On Windows/FallbackProcess (no returncode attribute), getattr returns None and terminate runs — but the Windows path is silent on double-call. Restructured tests to call _terminate_and_reap (which wraps _terminate_process_tree) as the action under test, and let the ExitStack safety-net call it again. Since wait() and stream.aclose() are idempotent, this is safe — and it covers both branches of the guard (first call: returncode is None → terminate; second call: returncode set → skip). 100% coverage, still zero pragmas. Github-Issue: #1775

claude

LGTM — the returncode guard (a070356) addresses the double-terminate log spam I flagged; the remaining FallbackProcess nit is cosmetic and unreachable in CI.

Extended reasoning...

Overview

This PR rewrites three flaky tests in tests/client/test_stdio.py (TestChildProcessCleanup), replacing a file-growth-polling approach with deterministic socket-based liveness probes. The change is test-only — no production code is touched. Net −76 lines, 15× faster, zero anyio.sleep() calls, zero coverage pragmas, zero filterwarnings decorators.

What changed since my last review

I previously flagged (2026-03-11) that _terminate_and_reap unconditionally called _terminate_process_tree on cleanup, causing spurious WARNING/ERROR logs on every POSIX run due to ProcessLookupError in the fallback path. The author addressed this in commit a070356 by adding if getattr(proc, "returncode", None) is None: before the terminate call, and updated the docstring to document idempotency. This fixes the issue for anyio.abc.Process (the POSIX path where the noisy logs actually occurred), which was the substance of my comment.

Remaining nit (posted inline)

The getattr(proc, "returncode", None) guard is technically ineffective for FallbackProcess, which exposes popen.returncode rather than returncode directly — so the guard always passes and double-terminate still happens. However: (a) FallbackProcess is only constructed on Windows with SelectorEventLoop, which hasn't been the default since Python 3.8 and isn't exercised in CI; (b) terminate_windows_process_tree swallows all exceptions, so double-calling it is silent. The bug report itself rates practical impact as "near-zero." A one-line returncode property on FallbackProcess would be the cleanest fix, but it's not blocking.

Security risks

None. Test-only code spawning local Python subprocesses that connect back to a localhost-bound ephemeral TCP listener. No external inputs, no auth/crypto, no shared state.

Level of scrutiny

Low-moderate. The changes are entirely in the test suite and improve reliability/determinism of CI. The socket-based mechanism (blocking accept() → kernel closes FDs on death → EndOfStream/BrokenResourceError) is sound and well-documented inline. The AsyncExitStack pattern for resource management is idiomatic. The {child_script!r} repr-based injection correctly avoids nested-quote escaping issues.

Other factors

The maintainer already reviewed and considered my prior (more substantive) finding non-blocking for approval. The author responded within a day with a targeted fix. The flake-finder validation (30 runs × 4 workers, 30/30 pass) demonstrates the reliability claim.

tests/client/test_stdio.py

One remaining leak on Windows CI: _ProactorReadPipeTransport (stdout) was never closed, triggering ResourceWarning in a later test. Root cause: anyio's StreamReaderWrapper.aclose() only calls set_exception(ClosedResourceError()) on the Python-level StreamReader — it never touches the underlying transport. On Windows, _ProactorReadPipeTransport starts with _paused=True and only detects pipe EOF when someone reads. The EOF-detection path is what calls transport.close(): _eof_received() → self.close() → _sock = None. Without a read, the transport lives until __del__. Production stdio.py avoids this because its stdout_reader task continuously reads stdout, so EOF is detected naturally. These tests never read stdout, so the transport stays paused with an orphaned overlapped read until GC. Fix: drain stdout with a single receive() call in _terminate_and_reap. The process is already dead, so the read immediately gets EOF (EndOfStream on clean close, BrokenResourceError on abrupt RST, ClosedResourceError on second call when already aclose()'d — all caught by contextlib.suppress). Note: the agent's suggested fix (stack.enter_async_context(proc) for Process.__aexit__) wouldn't help — __aexit__ calls aclose() which has the same stdout.aclose() issue. The drain is what actually closes the transport. Github-Issue: #1775

FallbackProcess stands in for anyio.abc.Process on Windows with SelectorEventLoop, but was missing the returncode property that Process declares abstractly. This meant getattr(proc, "returncode", None) in _terminate_and_reap always returned the default None for FallbackProcess, so the double-terminate guard never fired (the docstring's 'Idempotent' claim was false for half the declared union type). Practical impact was near-zero — FallbackProcess is only active under SelectorEventLoop (not the default since Python 3.8, not used in CI), and terminate_windows_process_tree swallows all exceptions on the second call. But the fix is trivial and brings FallbackProcess closer to the interface it's supposed to implement. With returncode present on both union members, the getattr can become a direct attribute access, which also satisfies pyright without defensive indirection. Github-Issue: #1775

claude

LGTM — both issues addressed, the returncode property is the right fix.

Extended reasoning...

Overview

This PR rewrites three flaky integration tests (TestChildProcessCleanup) in tests/client/test_stdio.py to use a deterministic socket-based liveness probe instead of timing-dependent file-growth polling, and adds a 5-line returncode property to FallbackProcess in src/mcp/os/win32/utilities.py. The test rewrite is +180/−256 net and eliminates all sleep() calls, coverage pragmas, and filterwarnings decorators from the test class. The tests are now ~15× faster and flake-finder-verified.

Review history on this PR

I raised two issues on earlier revisions:

Double-terminate spurious logs (POSIX): the AsyncExitStack refactor dropped a guard, so _terminate_process_tree was called twice per test, emitting WARNING/ERROR logs from terminate_posix_process_tree's fallback path. Fixed in a070356 with a returncode check.
FallbackProcess missing returncode: the guard from (1) was ineffective for the Windows fallback path because FallbackProcess had no returncode attribute. Fixed in 0554fe0 by adding a property delegating to self.popen.returncode — the cleaner of the two fixes I suggested, since it brings the class closer to the anyio.abc.Process interface it stands in for.

Both inline threads are resolved. felixweinberger noted the first was legitimate but non-blocking. The current diff reflects both fixes correctly: _terminate_and_reap now checks proc.returncode is None directly (no getattr, since both union members have the property), and the FallbackProcess.returncode property is in place.

Security risks

None. The source change is a read-only property on a Windows-only fallback class. The test changes bind TCP listeners on 127.0.0.1 with ephemeral ports — standard test infrastructure, no external exposure.

Level of scrutiny

Low for the source change (5-line read-only property, obvious correctness), moderate for the test rewrite. Tests are where this PR's weight is, and test-only changes have inherently limited blast radius — worst case is the tests themselves flake or fail, which CI would catch. The socket-based design is sound (kernel-guaranteed accept() blocking + FD-close-on-process-death are reliable OS primitives), and the extensive docstrings make the mechanism clear. The _terminate_and_reap helper now correctly handles the idempotency requirement it claims in its docstring.

Other factors

No CODEOWNERS on these paths. No outstanding reviewer comments. The bug hunting system found no issues on this revision. The PR description's timing table and flake-finder results (30/30 pass × 4 workers) give good confidence the new design actually fixes the CI flakiness it targets.

maxisbey added 2 commits March 10, 2026 14:30

maxisbey changed the title ~~Fix flaky TestChildProcessCleanup: poll for file growth, clean up proc in finally~~ Rewrite TestChildProcessCleanup with socket-based deterministic liveness probe Mar 10, 2026

maxisbey added 3 commits March 10, 2026 17:38

maxisbey marked this pull request as ready for review March 11, 2026 18:23

claude bot reviewed Mar 11, 2026

View reviewed changes

tests/client/test_stdio.py Outdated Show resolved Hide resolved

felixweinberger previously approved these changes Mar 11, 2026

View reviewed changes

maxisbey dismissed felixweinberger’s stale review via a070356 March 12, 2026 11:07

claude bot reviewed Mar 12, 2026

View reviewed changes

tests/client/test_stdio.py Show resolved Hide resolved

maxisbey added 2 commits March 12, 2026 11:50

felixweinberger approved these changes Mar 12, 2026

View reviewed changes

maxisbey merged commit dd52713 into main Mar 12, 2026
33 checks passed

maxisbey deleted the fix/flaky-child-process-cleanup-polling branch March 12, 2026 12:52

claude bot reviewed Mar 12, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Rewrite TestChildProcessCleanup with socket-based deterministic liveness probe#2265

Rewrite TestChildProcessCleanup with socket-based deterministic liveness probe#2265
maxisbey merged 9 commits intomainfrom
fix/flaky-child-process-cleanup-polling

maxisbey commented Mar 10, 2026 •

edited

Loading

Uh oh!

claude bot left a comment

Uh oh!

Uh oh!

claude bot left a comment

Uh oh!

Uh oh!

Uh oh!

claude bot left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

maxisbey commented Mar 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

New design — socket-based, zero sleeps, zero polling

Process + transport cleanup

Also removed

Timing

Hygiene

Uh oh!

claude bot left a comment

Choose a reason for hiding this comment

Overview

Security risks

Level of scrutiny

Other factors

Uh oh!

Uh oh!

claude bot left a comment

Choose a reason for hiding this comment

Overview

What changed since my last review

Remaining nit (posted inline)

Security risks

Level of scrutiny

Other factors

Uh oh!

Uh oh!

Uh oh!

claude bot left a comment

Choose a reason for hiding this comment

Overview

Review history on this PR

Security risks

Level of scrutiny

Other factors

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

maxisbey commented Mar 10, 2026 •

edited

Loading