35 changes: 35 additions & 0 deletions docs/migration.md
@@ -105,6 +105,41 @@ The `headers`, `timeout`, `sse_read_timeout`, and `auth` parameters have been re

Note: `sse_client` retains its `headers`, `timeout`, `sse_read_timeout`, and `auth` parameters — only the streamable HTTP transport changed.

### `terminate_windows_process` removed

The deprecated `mcp.os.win32.utilities.terminate_windows_process` function has been
removed. Process termination is handled internally by the `stdio_client` context
manager; there is no replacement API. The Windows tree-termination helper
`terminate_windows_process_tree` no longer accepts a `timeout_seconds` argument —
the value was never used (Job Object termination is immediate).

### `stdio_client` no longer kills children of a gracefully-exited server on POSIX

When a server exits on its own after `stdio_client` closes its stdin, background
child processes the server leaves behind are no longer killed on POSIX — their
lifetime is the server's business. The old behavior was a side effect of a shutdown
wait gated on the stdio pipes closing rather than on process exit: a child holding
an inherited pipe made a well-behaved server look hung, so its whole process tree
was killed. (That gating is an asyncio behavior specific to Python 3.11+ — on
Python 3.10 and the trio backend the old wait already resolved on process exit, so
the spurious kill never fired there.) A server that does not exit within the grace
period is still terminated along with its entire process group. On Windows,
children stay in the server's Job
Object and are still killed at shutdown — now deterministically when the job handle
is closed, rather than whenever the handle happened to be garbage-collected.

If you relied on `stdio_client` killing everything the server spawned, make the
server terminate its own children on shutdown (its stdin reaching EOF is the
shutdown signal), or clean up the process tree from the host application after
`stdio_client` exits.
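
A minimal sketch of the first option, in which the server treats stdin EOF as its shutdown signal and stops the children it started. The names `serve_until_eof` and `stop_children` are hypothetical and not part of the SDK; a real server would dispatch messages where the loop body is a no-op.

```python
import subprocess
from collections.abc import Iterable
from typing import TextIO


def stop_children(children: Iterable[subprocess.Popen], grace: float = 2.0) -> None:
    """Ask each child to exit, then force-kill any that outlive the grace period."""
    pending = [child for child in children if child.poll() is None]
    for child in pending:
        child.terminate()  # SIGTERM on POSIX, TerminateProcess on Windows
    for child in pending:
        try:
            child.wait(timeout=grace)
        except subprocess.TimeoutExpired:
            child.kill()
            child.wait()  # reap so no zombie is left behind


def serve_until_eof(stdin: TextIO, children: Iterable[subprocess.Popen]) -> None:
    """Read messages until stdin reaches EOF, then clean up the children."""
    try:
        for _line in stdin:
            pass  # hypothetical: handle one message here
    finally:
        stop_children(children)
```

The grace period is applied to each child in turn, so the worst case grows with the number of children; a server with many children might prefer a shared deadline.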

Two related shutdown refinements: `stdio_client` now closes its end of the pipes
deterministically at shutdown, so a surviving child that keeps writing to an
inherited stdout receives `EPIPE`/`SIGPIPE` once the client is gone (previously the
pipe lingered until garbage collection); and a failed write to a server that is
still running now surfaces as a closed connection (`CONNECTION_CLOSED`) on the read
side instead of leaving requests waiting indefinitely.
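
The `EPIPE` half of this can be illustrated without the SDK. The sketch below uses a bare `os.pipe()` as a stand-in for the inherited stdout (an assumption for illustration only): once the read end is closed, a write fails immediately. CPython ignores `SIGPIPE` at startup, so a Python child sees `BrokenPipeError`; a child with the default signal disposition would be terminated by `SIGPIPE` instead.

```python
import os


def write_after_reader_closed() -> str:
    """Write to a pipe whose read end is already closed and report the result."""
    read_fd, write_fd = os.pipe()
    os.close(read_fd)  # the reader (standing in for the client) goes away
    try:
        os.write(write_fd, b"late output\n")
    except BrokenPipeError:
        return "EPIPE"
    finally:
        os.close(write_fd)
    return "written"
```

A surviving child that must outlive the client should therefore redirect its stdout elsewhere, or handle this error, rather than keep writing to the inherited pipe.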

### Removed type aliases and classes

The following deprecated type aliases and classes have been removed from `mcp.types`:
365 changes: 278 additions & 87 deletions src/mcp/client/stdio.py

Large diffs are not rendered by default.

110 changes: 74 additions & 36 deletions src/mcp/os/posix/utilities.py
@@ -9,49 +9,87 @@

 logger = logging.getLogger(__name__)

+# How often to probe for surviving process-group members between SIGTERM and SIGKILL.
+_GROUP_POLL_INTERVAL = 0.01
+

 async def terminate_posix_process_tree(process: Process, timeout_seconds: float = 2.0) -> None:
-    """Terminate a process and all its children on POSIX systems.
+    """Terminate a process and all its descendants on POSIX systems.

+    The process was spawned with `start_new_session=True`, so it leads its own
+    process group and its pgid equals its pid. `os.killpg` on that group reaches
+    every descendant in one atomic call — including descendants whose parent (even
+    the group leader itself) has already exited, which a walk of the process tree
+    would miss.
+
-    Uses os.killpg() for atomic process group termination.
+    Sends SIGTERM to the group, waits up to `timeout_seconds` for the group to
+    disappear, then SIGKILLs whatever remains.

-    Args:
-        process: The process to terminate
-        timeout_seconds: Timeout in seconds before force killing (default: 2.0)
+    Descendants that move themselves into a new session or process group
+    (daemonizers) escape a group kill by design. And a process group only
+    disappears once every member is dead *and reaped*: if this client runs as
+    PID 1 or a subreaper without reaping orphans, dead descendants reparent to
+    it as zombies that keep the group occupied, so the wait below always runs
+    to its full timeout. Run such clients under an init shim (e.g.
+    `docker run --init`) to get the fast path back.
     """
-    pid = getattr(process, "pid", None) or getattr(getattr(process, "popen", None), "pid", None)
-    if not pid:
-        # No PID means there's no process to terminate - it either never started,
-        # already exited, or we have an invalid process object
-        return
+    # start_new_session=True at spawn makes the leader's pid the pgid; do not ask the
+    # OS via getpgid(), which fails with ProcessLookupError once the leader has been
+    # reaped even while other group members are still alive.
+    pgid = process.pid

     try:
-        pgid = os.getpgid(pid)
         os.killpg(pgid, signal.SIGTERM)
+    except ProcessLookupError:
+        # The entire group is already gone; nothing to terminate.
+        return
+    except PermissionError:
+        # What EPERM proves differs by platform. Linux killpg(2): no "permission to
+        # send the signal to any of the target processes" — every member was denied,
+        # but those members are still alive. macOS kill(2): "when signaling a process
+        # group, this error is returned if any members of the group could not be
+        # signaled" — one foreign-euid member is enough, and the rest of the group may
+        # well have been signalled (current XNU also raises it when only unreaped
+        # zombies remain, where Linux would succeed). On no platform does it mean the
+        # group is gone, so fall through to the grace wait and SIGKILL escalation —
+        # both tolerate EPERM — instead of giving up: members that exit (or get
+        # reaped) end the wait early, and permitted members still get the KILL
+        # wherever the platform delivers it.
+        logger.exception("No permission to signal some of process group %d; waiting for it to exit anyway", pgid)

-        with anyio.move_on_after(timeout_seconds):
-            while True:
-                try:
-                    # Check if process group still exists (signal 0 = check only)
-                    os.killpg(pgid, 0)
-                    await anyio.sleep(0.1)
-                except ProcessLookupError:
-                    return
-
-        try:
-            os.killpg(pgid, signal.SIGKILL)
-        except ProcessLookupError:
-            pass
-
-    except (ProcessLookupError, PermissionError, OSError) as e:
-        logger.warning(f"Process group termination failed for PID {pid}: {e}, falling back to simple terminate")
-        try:
-            process.terminate()
-            with anyio.fail_after(timeout_seconds):
-                await process.wait()
-        except Exception:
-            logger.warning(f"Process termination failed for PID {pid}, attempting force kill")
+    with anyio.move_on_after(timeout_seconds):
+        while True:
             try:
-                process.kill()
-            except Exception:
-                logger.exception(f"Failed to kill process {pid}")
+                # Probe for surviving group members (signal 0 checks without
+                # signalling). Only ESRCH proves the group is gone: on Linux the
+                # probe keeps succeeding while live members or unreaped zombies
+                # remain (so it waits out reaping rather than racing it), and EPERM
+                # is ambiguous on every platform.
+                os.killpg(pgid, 0)
+            except ProcessLookupError:
+                return
+            except PermissionError:
+                # Live members we may not signal (Linux), or a group with foreign
+                # members or nothing but zombies (macOS). Keep waiting: reaping
+                # turns an all-zombie group into ESRCH above, and unsignalable
+                # survivors may still exit on their own within the timeout.
+                pass
+            # Touching returncode reaps the leader on trio (the property calls
+            # Popen.poll()); without it nothing reaps during this loop and the
+            # leader's zombie keeps the group alive for the full timeout. On
+            # asyncio it is a cheap attribute read. Dead non-leader descendants
+            # are reaped by init once orphaned — except under a non-reaping
+            # PID-1/subreaper client, where their zombies hold the group here for
+            # the full timeout (see docstring).
+            _ = process.returncode
+            await anyio.sleep(_GROUP_POLL_INTERVAL)
+
+    try:
+        os.killpg(pgid, signal.SIGKILL)
+    except ProcessLookupError:
+        # The group died between the last probe and the kill.
+        pass
+    except PermissionError:
+        # Same per-platform ambiguity as the SIGTERM above: whatever the platform
+        # let us signal has now been KILLed; the rest is not ours to touch.
+        pass
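
The group-kill pattern in this hunk can be tried in isolation. The following is a synchronous, stdlib-only sketch and not the SDK's implementation: `spawn_in_new_session` and `terminate_group` are hypothetical names, `subprocess` and `time.sleep` stand in for the anyio process and `anyio.sleep`, and the code is POSIX-only. It shows why the leader's pid can serve as the pgid, and why the leader has to be reaped inside the wait loop before the signal-0 probe can report the group gone.

```python
import os
import signal
import subprocess
import time


def spawn_in_new_session(argv: list[str]) -> subprocess.Popen:
    """Start argv as the leader of a new session, so its pgid equals its pid."""
    return subprocess.Popen(argv, start_new_session=True)


def terminate_group(proc: subprocess.Popen, timeout: float = 2.0, poll: float = 0.01) -> bool:
    """SIGTERM the group led by proc, wait, then SIGKILL whatever remains.

    Returns True if the group was observed to disappear within the timeout.
    """
    pgid = proc.pid  # valid only because of start_new_session=True
    try:
        os.killpg(pgid, signal.SIGTERM)
    except ProcessLookupError:
        return True  # the whole group is already gone
    except PermissionError:
        pass  # ambiguous; fall through to the wait

    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        proc.poll()  # reap the leader; an unreaped zombie keeps the group alive
        try:
            os.killpg(pgid, 0)  # signal 0 probes without signalling
        except ProcessLookupError:
            return True  # only ESRCH proves the group is gone
        except PermissionError:
            pass  # keep waiting
        time.sleep(poll)

    try:
        os.killpg(pgid, signal.SIGKILL)
    except (ProcessLookupError, PermissionError):
        pass
    return False
```

Removing the `proc.poll()` call is a quick way to reproduce the full-timeout wait that the comments in the hunk describe for a leader nobody reaps.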