Summary
`Client::stop()` in `src/lib.rs` SIGKILLs the agent runtime CLI immediately after `session.destroy` returns, with no SIGTERM grace period. This races with the runtime's own MCP cleanup and causes orphaned MCP stdio child processes to accumulate across normal app restarts in every downstream consumer (notably the GitHub Copilot Tauri app).
Current behavior
```rust
// crates/copilot-sdk/src/lib.rs:1894 (vendored into github/github-app)
if let Some(mut child) = child
    && let Err(e) = child.kill().await // tokio Child::kill = SIGKILL on Unix
{
    errors.push(Error::Io(e));
}
```
Tokio's `Child::kill()` always sends SIGKILL on Unix. SIGKILL cannot be caught, so the runtime's MCP cleanup (which runs a synchronous `pgrep -P` enumeration inside the Node process before signaling descendants via `process.kill`) is interrupted mid-flight whenever it takes longer than the few milliseconds between `session.destroy` returning and the SIGKILL landing.
This nullifies the protections from:
- copilot-agent-runtime PR #7517 (`killProcessTree` on transport close)
- copilot-agent-runtime PR #8103 (fire-and-forget MCP shutdown in `dispose()`)
Both have shipped and both are verified present in the running runtime, yet the leak still happens.
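The practical difference between the two signals can be reproduced without the runtime at all. The sketch below is illustrative only: a `sh` script with a TERM trap is a hypothetical stand-in for the runtime's MCP cleanup, and `Stop` / `stop_runtime_standin` are made-up names. It uses std's blocking `Child::kill()`, which, like tokio's, sends SIGKILL on Unix; SIGTERM goes through `kill(1)` because std has no API for it.

```rust
use std::io::{self, BufRead, BufReader, Read};
use std::process::{Command, ExitStatus, Stdio};

/// How to stop the stand-in "runtime" process.
#[derive(Clone, Copy)]
pub enum Stop {
    Term, // catchable: the trap (stand-in for MCP cleanup) gets to run
    Kill, // uncatchable: the process dies without running any handler
}

/// Spawns a shell whose TERM trap prints "cleanup" (a hypothetical stand-in for
/// the runtime's MCP cleanup), waits until the trap is installed, then stops it.
/// Returns the exit status plus everything the shell printed after "ready".
pub fn stop_runtime_standin(how: Stop) -> io::Result<(ExitStatus, String)> {
    let mut child = Command::new("sh")
        .arg("-c")
        .arg("trap 'echo cleanup; exit 0' TERM; echo ready; while :; do sleep 0.1; done")
        .stdout(Stdio::piped())
        .spawn()?;

    let mut out = BufReader::new(child.stdout.take().expect("stdout is piped"));
    let mut line = String::new();
    out.read_line(&mut line)?; // "ready\n": the trap is now installed

    match how {
        // std has no SIGTERM API, so deliver it with kill(1).
        Stop::Term => {
            Command::new("sh")
                .arg("-c")
                .arg(format!("kill -TERM {}", child.id()))
                .status()?;
        }
        // std's Child::kill(), like tokio's, sends SIGKILL on Unix.
        Stop::Kill => child.kill()?,
    }

    let status = child.wait()?;
    let mut rest = String::new();
    out.read_to_string(&mut rest)?; // EOF once the shell (and its last `sleep`) exit
    Ok((status, rest))
}
```

With `Stop::Term` the trap runs and prints `cleanup`; with `Stop::Kill` the shell dies from signal 9 and the trap never runs, which is the situation the runtime's `killProcessTree` is in today.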
Evidence
On my machine, running github-app `0aa3a6b41` (May 21 build) with copilot-agent-runtime built from HEAD (May 20), after 3 days of normal Copilot usage I found:
- 11 orphaned `uv tool uvx microsoft-fabric-rti-mcp` processes
- ~220 MB resident in total
- the oldest 3 days old, the newest 1 hour old (well after #8103 merged)
- all with ppid=1 (reparented to init)
- each with a live python child that is also orphaned (22 leaked processes in total)
Detection:
```sh
ps -Ao pid,ppid,etime,rss,command | awk '$2==1 && /uvx|fabric-rti/'
```
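When checking repeatedly (for example before and after a restart), the same filter can be applied in code. This is a sketch: `Proc`, `find_orphans` and `total_rss_mib` are hypothetical names, and it assumes `ps` reports RSS in KiB, as procps and BSD `ps` do. It feeds on the text printed by the `ps` command above.

```rust
/// One row of `ps -Ao pid,ppid,etime,rss,command` output.
#[derive(Debug, PartialEq)]
pub struct Proc {
    pub pid: u32,
    pub ppid: u32,
    pub etime: String,
    pub rss_kib: u64, // assumed to be KiB
    pub command: String,
}

/// Roughly mirrors the awk filter: keep rows whose parent is init (ppid 1)
/// and whose command line mentions `uvx` or `fabric-rti`.
/// The header row fails to parse (PPID is not a number) and is dropped.
pub fn find_orphans(ps_output: &str) -> Vec<Proc> {
    ps_output
        .lines()
        .filter_map(parse_row)
        .filter(|p| p.ppid == 1 && (p.command.contains("uvx") || p.command.contains("fabric-rti")))
        .collect()
}

fn parse_row(line: &str) -> Option<Proc> {
    let mut fields = line.split_whitespace();
    let pid = fields.next()?.parse().ok()?;
    let ppid = fields.next()?.parse().ok()?;
    let etime = fields.next()?.to_string();
    let rss_kib = fields.next()?.parse().ok()?;
    // The command may contain spaces; rejoin the remaining fields.
    let command = fields.collect::<Vec<_>>().join(" ");
    Some(Proc { pid, ppid, etime, rss_kib, command })
}

/// Total resident memory of the given processes, in MiB.
pub fn total_rss_mib(procs: &[Proc]) -> f64 {
    procs.iter().map(|p| p.rss_kib).sum::<u64>() as f64 / 1024.0
}
```

Like the awk one-liner, this only reports the reparented `uv` processes; their python children still have the orphaned `uv` process as parent, so counting them needs a second pass over the rows.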
The leak reproduces reliably by quitting GitHub Copilot. Node-based MCP servers (launched via `npm exec`) clean up correctly because they exit voluntarily on stdin EOF. uvx-launched servers do not: `uv tool uvx` is a Rust supervisor that holds the child's stdin pipe open from its own side, so the python child never sees EOF. The only reliable cleanup path for uvx-launched servers is the runtime's `killProcessTree`, which is racing the SDK's SIGKILL and losing.
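The stdin-EOF distinction can be shown with pipe file descriptors alone. In this Unix-only sketch, `cat` stands in for an MCP server that exits when its stdin reaches EOF, and a duplicated write end models the claim that `uv tool uvx` keeps the pipe open from its own side; that modelling is an assumption for illustration, not taken from uv's source. All function names are hypothetical.

```rust
use std::io;
use std::os::fd::OwnedFd;
use std::process::{Child, Command, Stdio};
use std::thread::sleep;
use std::time::Duration;

/// `cat` stands in for an MCP server that exits voluntarily on stdin EOF.
fn spawn_stdin_reader() -> io::Result<Child> {
    Command::new("cat").stdin(Stdio::piped()).stdout(Stdio::null()).spawn()
}

/// Polls until the child exits or `limit` elapses; true if it exited.
fn exits_within(child: &mut Child, limit: Duration) -> io::Result<bool> {
    let step = Duration::from_millis(20);
    let mut waited = Duration::ZERO;
    loop {
        if child.try_wait()?.is_some() {
            return Ok(true);
        }
        if waited >= limit {
            return Ok(false);
        }
        sleep(step);
        waited += step;
    }
}

/// The parent holds the only write end: closing it delivers EOF.
pub fn sole_writer_closes() -> io::Result<bool> {
    let mut child = spawn_stdin_reader()?;
    drop(child.stdin.take());
    let exited = exits_within(&mut child, Duration::from_millis(500))?;
    let _ = child.kill(); // clean up if it somehow did not exit
    child.wait()?;
    Ok(exited)
}

/// A duplicated write end (modelling a supervisor that keeps the pipe open)
/// holds back EOF until every copy is closed.
/// Returns (exited while the copy was held, exited after it was dropped).
pub fn extra_writer_holds_pipe() -> io::Result<(bool, bool)> {
    let mut child = spawn_stdin_reader()?;
    let parent_end: OwnedFd = child.stdin.take().expect("stdin is piped").into();
    let supervisor_end = parent_end.try_clone()?; // a dup() of the same write end
    drop(parent_end); // the parent closes its end...
    let while_held = exits_within(&mut child, Duration::from_millis(300))?;
    drop(supervisor_end); // ...but EOF arrives only once the last copy closes
    let after_release = exits_within(&mut child, Duration::from_millis(500))?;
    let _ = child.kill();
    child.wait()?;
    Ok((while_held, after_release))
}
```

The reader sees EOF only after every copy of the write end is closed, which is why closing the runtime's end of the pipe alone does not stop a server whose supervisor still holds another copy.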
Proposed fix
In `Client::stop()`, replace `child.kill().await` with a SIGTERM-then-SIGKILL escalation:
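```rust
if let Some(mut child) = child {
    #[cfg(unix)]
    {
        use nix::sys::signal::{self, Signal};
        use nix::unistd::Pid;
        use std::time::Duration;
        // Ask the runtime to exit so its killProcessTree can run.
        // `id()` returns None once the child has been reaped; then there is nothing to signal.
        if let Some(pid_raw) = child.id() {
            let _ = signal::kill(Pid::from_raw(pid_raw as i32), Signal::SIGTERM);
        }
        match tokio::time::timeout(Duration::from_secs(3), child.wait()).await {
            Ok(Ok(_status)) => {} // graceful exit, killProcessTree had time to run
            Ok(Err(e)) => errors.push(Error::Io(e)),
            Err(_elapsed) => {
                // Grace period expired: escalate to SIGKILL (tokio's kill() also waits for exit).
                if let Err(e) = child.kill().await {
                    errors.push(Error::Io(e));
                }
            }
        }
    }
    #[cfg(not(unix))]
    {
        if let Err(e) = child.kill().await {
            errors.push(Error::Io(e));
        }
    }
}
```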
Pair this with a runtime-side fix that installs `process.on("SIGTERM"|"SIGINT")` handlers in the CLI entrypoint and awaits `session.dispose()`. That fix is needed regardless, because today the runtime has no signal handlers at all (`grep -r "process.on.*SIG" src --include="*.ts"` returns zero hits). Tracked at github/copilot-agent-runtime#8598.
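For a blocking caller, or one that would rather not depend on `nix`, the SDK-side SIGTERM-then-SIGKILL escalation can be sketched with the standard library alone. `terminate_gracefully` is a hypothetical helper, not part of the SDK; it sends SIGTERM through `kill(1)` and polls `Child::try_wait`, because std offers neither a signal API nor a wait-with-timeout.

```rust
use std::io;
use std::process::{Child, Command, ExitStatus};
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Synchronous SIGTERM-then-SIGKILL escalation using only std (Unix).
/// Returns the exit status and whether escalation to SIGKILL was needed.
pub fn terminate_gracefully(child: &mut Child, grace: Duration) -> io::Result<(ExitStatus, bool)> {
    // Already exited? Nothing to signal.
    if let Some(status) = child.try_wait()? {
        return Ok((status, false));
    }
    // std has no SIGTERM API; kill(1) stands in for nix::sys::signal::kill.
    Command::new("sh")
        .arg("-c")
        .arg(format!("kill -TERM {}", child.id()))
        .status()?;

    let deadline = Instant::now() + grace;
    while Instant::now() < deadline {
        if let Some(status) = child.try_wait()? {
            return Ok((status, false)); // exited within the grace period
        }
        sleep(Duration::from_millis(25));
    }
    // Grace period expired: SIGKILL, then reap.
    child.kill()?;
    Ok((child.wait()?, true))
}
```

Polling every 25 ms adds at most that much latency to a graceful exit; the tokio version awaits `child.wait()` under `tokio::time::timeout` and needs no polling.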
References