Orphaned simctl log-stream processes when MCP server exits abnormally #382

@cameroncooke

Summary

Long-lived simctl spawn … log stream child processes spawned by the simulator log-capture flow can outlive the MCP server and become permanent orphans (reparented to launchd, PPID=1) when the server is killed or crashes via a path not covered by the existing shutdown coordinator. The processes are intentionally detached so that tool calls don't block, which is correct, but cleanup is incomplete in two ways: log_capture.ts sessions live only in an in-memory Map, and neither the MCP server nor the daemon registers a last-chance process.on('exit') or 'beforeExit' hook.

Evidence: real orphan observed in the wild

After a launch_app_logs_sim-style call from a previous session, the following processes were found alive ~9.5 hours later:

PID    PPID  ELAPSED   %CPU   COMMAND
17111  1     09:36:37  0.0    simctl launch --console-pty --terminate-running-process <udid> com.sentry.weather.Weather
17163  1     09:36:35  0.0    simctl spawn <udid> log stream --level=debug --predicate subsystem == "com.sentry.weather.Weather"
17167  16813 09:36:35  0.0    log stream --level=debug --predicate subsystem == "com.sentry.weather.Weather"   (in-sim, child of launchd_sim)
17138  16813 09:36:36  12.3   Weather.app/Weather                                                              (in-sim app, stuck in CPU loop)
  • 17111 and 17163 had PPID=1 — i.e., genuinely orphaned, reparented to launchd after their original spawning xcodebuildmcp tool process exited without cleaning them up.
  • 17167 and 17138 were correctly parented to launchd_sim (16813) and were collected automatically when the user shut the simulator down.
  • No xcodebuild, swift-frontend, or xctest was running anywhere — the MCP servers were idle (verified via sample: main thread parked on kevent, all workers parked on condvars).

The --predicate subsystem == "<bundleId>" shape matches both code paths (simulator-steps.ts and log_capture.ts), so the orphan is consistent with either flow.

How log streams are spawned (intentional detach — this part is fine)

Simulator-launch OSLog flow: src/utils/simulator-steps.ts:298-335

const child = spawner('xcrun', ['simctl','spawn', simulatorUuid,'log','stream','--level=debug','--predicate',`subsystem == \"${bundleId}\"`], {
  stdio: ['ignore', fd, fd],
  detached: true,        // <-- line 312
});
await registerSimulatorLaunchOsLogSession({ process: child, simulatorUuid, bundleId, logFilePath });  // <-- line 317
child.unref();           // <-- line 332

Log-capture flow: src/utils/log_capture.ts:186-233

const osLogResult = await executor(osLogCommand, 'OS Log Capture', false, undefined, true);  // detached=true (line 200)
// …
process.unref?.();
(process.stdout as any)?.unref?.();
(process.stderr as any)?.unref?.();
// …
activeLogSessions.set(logSessionId, { processes, logFilePath, simulatorUuid, bundleId, logStream, releaseActivity });  // line 226

Both detach via detached: true + unref(). The fire-and-forget executor (src/utils/command.ts:181-211) resolves ~100ms after spawn so the tool call returns immediately. This is the correct design.

Cleanup paths that exist

src/server/mcp-shutdown.ts:198-208 — the shutdown coordinator runs all three stops:

{ operation: () => stopAllLogCaptures(STEP_TIMEOUT_MS) },                     // line 198
{ operation: () => stopOwnedSimulatorLaunchOsLogSessions(STEP_TIMEOUT_MS) },  // line 203
{ operation: () => stopAllDeviceLogCaptures(STEP_TIMEOUT_MS) },               // line 208

src/server/mcp-lifecycle.ts:379-386 — the MCP server registers handlers covering most abnormal-exit cases:

processRef.once('SIGTERM', handleSigterm);
processRef.once('SIGINT', handleSigint);
processRef.stdin.once('end', handleStdinEnd);
processRef.stdin.once('close', handleStdinClose);
processRef.stdout?.once('error', handleStdoutError);
processRef.stderr?.once('error', handleStderrError);
processRef.once('uncaughtException', handleUncaughtException);
processRef.once('unhandledRejection', handleUnhandledRejection);

src/utils/simulator-steps.ts sessions are also persisted to a filesystem registry (registerSimulatorLaunchOsLogSession at line 317) with owner instanceId, so they can be reconciled across crashes.

Gaps

1. log_capture.ts sessions are in-memory only

src/utils/log_capture.ts:73

export const activeLogSessions: Map<string, LogSession> = new Map();

There's no filesystem registry equivalent to simulator-launch-oslog-registry. If the MCP server is SIGKILL'd, the OS hard-kills the host, or the process exits via a path that doesn't run the shutdown coordinator, the PIDs are lost forever — they cannot be reconciled on next startup the way simulator-launch sessions can.
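A minimal sketch of what a registry for log_capture.ts could look like, assuming one JSON file per session. Everything here is hypothetical and not taken from the project: the record fields, the directory, the function names, and the use of an owner PID in place of the owner instanceId that the simulator-launch registry uses. PIDs can be reused by the OS, so a real implementation would probably also verify the command line before killing anything.

```typescript
import * as fs from 'node:fs';
import * as os from 'node:os';
import * as path from 'node:path';

// Hypothetical record shape; field names are illustrative only.
interface LogCaptureRecord {
  pid: number;        // PID of the detached simctl child
  ownerPid: number;   // PID of the server that spawned it (assumption)
  simulatorUuid: string;
  bundleId: string;
  logFilePath: string;
}

// Hypothetical default location for the registry files.
const DEFAULT_REGISTRY_DIR = path.join(os.tmpdir(), 'log-capture-registry');

// Signal 0 sends nothing; it only checks whether the PID exists.
function isAlive(pid: number): boolean {
  try {
    process.kill(pid, 0);
    return true;
  } catch (err) {
    // EPERM means the process exists but belongs to another user.
    return (err as NodeJS.ErrnoException).code === 'EPERM';
  }
}

// Called when a capture session starts.
function writeRecord(dir: string, sessionId: string, record: LogCaptureRecord): void {
  fs.mkdirSync(dir, { recursive: true });
  fs.writeFileSync(path.join(dir, `${sessionId}.json`), JSON.stringify(record));
}

// Called when a capture session stops cleanly.
function removeRecord(dir: string, sessionId: string): void {
  fs.rmSync(path.join(dir, `${sessionId}.json`), { force: true });
}

// Called on startup: terminate children whose owner is gone and drop
// their records. Returns the session IDs that were cleaned up.
function reconcile(dir: string): string[] {
  const reaped: string[] = [];
  if (!fs.existsSync(dir)) return reaped;
  for (const file of fs.readdirSync(dir)) {
    if (!file.endsWith('.json')) continue;
    const fullPath = path.join(dir, file);
    const record = JSON.parse(fs.readFileSync(fullPath, 'utf8')) as LogCaptureRecord;
    if (isAlive(record.ownerPid)) continue; // owner still running; not ours to clean
    if (isAlive(record.pid)) {
      try {
        process.kill(record.pid, 'SIGTERM');
      } catch {
        // The child exited between the check and the kill.
      }
    }
    fs.rmSync(fullPath, { force: true });
    reaped.push(path.basename(file, '.json'));
  }
  return reaped;
}
```

With something like this in place, a server started after a SIGKILL could call reconcile(DEFAULT_REGISTRY_DIR) before accepting tool calls, which is the reconciliation step the in-memory Map cannot provide.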

2. No process.on('exit') / beforeExit last-chance hook

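Neither mcp-lifecycle.ts nor daemon.ts registers a synchronous 'exit' (or 'beforeExit') handler that walks the active session Maps and calls process.kill on tracked PIDs. Anything that reaches process.exit(...) without going through the coordinator (third-party library calls, native exits, certain failure paths) leaks. 'exit' is the only listener that fires both when the event loop drains and on an explicit process.exit(); 'beforeExit' is skipped on process.exit() and on uncaught exceptions. Neither fires on SIGKILL or on a signal that terminates the process through its default action, so an 'exit' hook complements a filesystem registry rather than replacing it.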
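A last-chance hook along these lines might look like the sketch below. The trackedPids set and the killTrackedPids helper are hypothetical stand-ins for whatever the session Maps actually hold; the kill function is injectable only so the logic can be exercised without signalling real processes.

```typescript
// Hypothetical stand-in for the PIDs held in activeLogSessions and the
// simulator-launch session Map; the real session shapes are not shown here.
const trackedPids = new Set<number>();

type KillFn = (pid: number, signal: NodeJS.Signals) => void;

// Sends SIGTERM to each PID and returns the ones that were signalled.
// Synchronous on purpose: 'exit' listeners cannot wait for async work.
function killTrackedPids(
  pids: Iterable<number>,
  kill: KillFn = (pid, signal) => {
    process.kill(pid, signal);
  },
): number[] {
  const signalled: number[] = [];
  for (const pid of pids) {
    try {
      kill(pid, 'SIGTERM');
      signalled.push(pid);
    } catch {
      // Process already exited (ESRCH) or is not ours (EPERM); nothing to do.
    }
  }
  return signalled;
}

// Runs on event-loop drain and on explicit process.exit().
process.on('exit', () => {
  killTrackedPids(trackedPids);
});
```

Because the listener cannot await anything, it can only send the signal and return; it cannot confirm that the children exited or escalate to SIGKILL after a timeout the way the shutdown coordinator can.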

3. Daemon path has thinner signal coverage than the MCP lifecycle path

src/daemon.ts:396-397

process.on('SIGTERM', shutdown);
process.on('SIGINT', shutdown);

Compared with mcp-lifecycle.ts:379-386, the daemon does not register uncaughtException or unhandledRejection handlers. Crashes inside the daemon (where simulator-launch-oslog and log_capture PIDs are tracked) skip the shutdown cascade entirely.
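One way to bring the daemon to parity, sketched under the assumption that its existing shutdown function can be called with a reason. registerShutdownHandlers and its signature are hypothetical, not the project's API. Note that once 'uncaughtException' or 'unhandledRejection' listeners are registered, Node no longer terminates the process by default for those cases, so the shutdown callback has to exit explicitly with a non-zero code.

```typescript
import { EventEmitter } from 'node:events';

type ShutdownFn = (reason: string, payload?: unknown) => void;

const SHUTDOWN_EVENTS = [
  'SIGTERM',
  'SIGINT',
  'uncaughtException',
  'unhandledRejection',
] as const;

// Registers one guarded listener per event so that shutdown runs at most
// once, no matter how many signals or errors arrive.
function registerShutdownHandlers(proc: EventEmitter, shutdown: ShutdownFn): void {
  let started = false;
  for (const event of SHUTDOWN_EVENTS) {
    proc.on(event, (payload?: unknown) => {
      if (started) return; // a second signal or error must not restart shutdown
      started = true;
      shutdown(event, payload);
    });
  }
}
```

In the daemon this would be called as registerShutdownHandlers(process, ...), with the callback running the same stop functions as the signal path and then calling process.exit with 1 for the two error events.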

Suggested fixes (not prescriptive — owner's call)

  1. Mirror the simulator-launch registry pattern for log_capture.ts: write {pid, udid, predicate, ownerInstanceId, logFilePath} to disk on start, remove on stop. On startup, scan the registry and kill any sessions whose owner instance is no longer alive.
  2. Add a process.on('exit', ...) last-chance hook in both mcp-lifecycle.ts and daemon.ts that walks activeLogSessions and the simulator-launch sessions Map and synchronously sends SIGTERM to tracked PIDs. 'exit' handlers must be sync, so this is best-effort, but it covers the process.exit() and unhandled-shutdown paths. A 'beforeExit' listener can be added as well, but it is not a substitute because it does not run on process.exit().
  3. Bring daemon.ts signal coverage to parity with mcp-lifecycle.ts by also registering uncaughtException and unhandledRejection.

Repro / verification

After running a simulator log-capture tool call and then killing the MCP server with SIGKILL (instead of letting it shut down cleanly), check for orphans:

ps -ef | grep -E "simctl spawn .* log stream" | grep -v grep

Any rows with PPID=1 are leaked. They will keep running until the simulator is shut down or they are killed manually.
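The same check can be scripted, for example as a regression test helper. This is a sketch with hypothetical function names; it passes a separate -o flag per column because in ps an = header override runs to the end of the argument, and the empty headers suppress the header row.

```typescript
import { execFileSync } from 'node:child_process';

interface ProcessRow {
  pid: number;
  ppid: number;
  command: string;
}

// Parses lines of the form "<pid> <ppid> <command...>".
function parsePsOutput(text: string): ProcessRow[] {
  const rows: ProcessRow[] = [];
  for (const line of text.split('\n')) {
    const match = /^\s*(\d+)\s+(\d+)\s+(.+)$/.exec(line);
    if (match) {
      rows.push({ pid: Number(match[1]), ppid: Number(match[2]), command: match[3] });
    }
  }
  return rows;
}

// Keeps host-side log-stream processes that have been reparented to launchd.
function findOrphanedLogStreams(rows: ProcessRow[]): ProcessRow[] {
  return rows.filter(
    (row) => row.ppid === 1 && /simctl spawn .* log stream/.test(row.command),
  );
}

// Runs ps and returns the leaked rows, if any.
function listOrphanedLogStreams(): ProcessRow[] {
  const output = execFileSync(
    'ps',
    ['-ax', '-o', 'pid=', '-o', 'ppid=', '-o', 'command='],
    { encoding: 'utf8' },
  );
  return findOrphanedLogStreams(parsePsOutput(output));
}
```

Calling listOrphanedLogStreams() after the SIGKILL step should return an empty array once the leak is fixed and a startup reconciliation has run.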

Environment

  • macOS 26.3.1 (25D2128)
  • Observed in a long-running session against getsentry/XcodeBuildMCP main branch
  • Affected tools: anything that uses log_capture.ts (e.g., start_sim_log_cap-style flows) and, to a lesser extent, simulator-steps.ts (which is registry-backed but still vulnerable to the missing 'exit' hook).
