Skip to content

integration-test skill exercises stale binary; stdio MCP server and daemon are split-brain #147

@obj-p

Description

@obj-p

Problem

The integration-test skill (.claude/skills/integration-test/SKILL.md) does swift build in step 2 and then validates behavior via mcp__previewsmcp__* tool calls. But those calls hit the stdio MCP server that Claude Code spawned at session start — not the freshly built binary. The build step is essentially decorative from the test's perspective: any code change made during the session is invisible to the tools the skill is using to validate it.

Mechanism:

  • .mcp.json declares command: ".build/debug/previewsmcp" with args: ["serve"].
  • Claude Code spawns that process once, at session start, and keeps it resident for the whole session.
  • swift build overwrites the on-disk binary, but the resident process keeps running the old code.
  • All mcp__previewsmcp__* calls go to the resident process.

So a contributor can edit code, run the skill, see green, and ship a regression — the test never exercised the change.

Architectural footgun (broader than the skill)

PreviewsMCP has two completely separate server processes with independent session state:

  1. Stdio MCP serverpreviewsmcp serve (no flag). Self-contained: own PreviewHost, IOSSessionManager, ConfigCache (Sources/PreviewsCLI/ServeCommand.swift:57-73). This is what .mcp.json spawns.
  2. Unix-socket daemonpreviewsmcp serve --daemon at ~/.previewsmcp/serve.sock. Auto-spawned by every CLI subcommand other than serve. Persists across Claude Code restarts.

A session started via the MCP tools lives in process (1) and is invisible to previewsmcp list, previewsmcp snapshot --session-id …, etc. (which talk to process (2)). A contributor debugging an MCP-tool issue with the CLI will see an empty session list and waste time figuring out why.

This split-brain isn't documented in AGENTS.md beyond the bullet that says CLI subcommands talk to a daemon. The implication for testing and debugging is non-obvious.

Relationship to #142

#142 covers CLI-vs-daemon version mismatch on upgrade (Homebrew/source). This issue is adjacent but distinct: it's about the stdio MCP server going stale within a single Claude Code session, and about the two-process architecture being a footgun beyond just upgrades. A version-handshake fix shaped like the one proposed in #142 would not catch this case, because the stdio server has no peer to handshake with — the staleness is purely between the resident process and the on-disk binary.

Proposed fixes

1. Skill changes (small, immediate)

Update .claude/skills/integration-test/SKILL.md:

  • Step 0: previewsmcp kill-daemon || true — hermetic reset of the unix-socket daemon. Strictly speaking the skill currently only uses MCP tools, so this is belt-and-suspenders, but it prevents surprises if any future step shells out to a CLI subcommand.
  • After step 2 (swift build): explicit instruction to /exit and relaunch Claude Code so the stdio MCP server respawns from the new binary. Without this, every subsequent step is testing the wrong code.
  • Loud failure mode: add a build-version check — see (2).

2. Build-hash MCP tool (recommended)

Expose a tiny MCP tool, e.g. preview_build_info, that returns the binary's build identity (commit SHA + dirty bit, baked in at build time via GenerateVersionTool or equivalent). The skill calls this as the first MCP call after swift build and compares against git rev-parse HEAD. If they differ, the skill aborts with a clear message:

Stdio MCP server is running build abc1234 but repo HEAD is def5678. Restart Claude Code so it respawns the server from the new binary, then re-run.

This makes the staleness impossible to miss and self-diagnosing.

3. Document the two-process model

Add a short section to AGENTS.md covering:

  • The stdio server (used by Claude Code / Cursor via .mcp.json) and the unix-socket daemon (used by CLI subcommands) are separate processes with separate session state.
  • A session created via MCP tools is not visible to CLI subcommands like list / snapshot --session-id, and vice versa.
  • For integration testing or debugging, restart Claude Code to refresh the stdio server; kill-daemon to refresh the unix-socket daemon. Both, if both paths are in play.

4. (Stretch) Unify the two backends

Longer-term: have the stdio serve mode forward to the unix-socket daemon instead of running its own self-contained engine. Then there's one source of truth for sessions, one binary to keep current, and #142's version-handshake fix covers everything. Bigger lift; out of scope for this issue but worth flagging as the durable answer.

Suggested scope

  • PR 1: skill edits in (1). Trivial, ships the safety today.
  • PR 2: preview_build_info tool + skill check in (2).
  • PR 3: doc update in (3).
  • (4) tracked separately if/when the team wants to tackle it.

🤖 Filed from a Claude Code session after noticing that the integration-test skill silently exercises whatever binary was current at session start, regardless of intervening swift builds.

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions