integration-test skill exercises stale binary; stdio MCP server and daemon are split-brain #147

## Problem

The `integration-test` skill ([.claude/skills/integration-test/SKILL.md](.claude/skills/integration-test/SKILL.md)) runs `swift build` in step 2 and then validates behavior via `mcp__previewsmcp__*` tool calls. But those calls hit the **stdio MCP server that Claude Code spawned at session start**, not the freshly built binary. From the test's perspective the build step is decorative: any code change made during the session is invisible to the tools the skill uses to validate it.

Mechanism:
- `.mcp.json` declares `command: ".build/debug/previewsmcp"` with `args: ["serve"]`.
- Claude Code spawns that process **once**, at session start, and keeps it resident for the whole session.
- `swift build` overwrites the on-disk binary, but the resident process keeps running the old code.
- All `mcp__previewsmcp__*` calls go to the resident process.

So a contributor can edit code, run the skill, see green, and ship a regression — the test never exercised the change.
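
The mechanism above suggests a rough staleness heuristic: if the binary on disk was modified after the resident server started, the server is probably running old code. The sketch below only simulates that comparison with a temporary file and made-up timestamps; `binary_is_newer` is a hypothetical helper, not part of PreviewsMCP, and obtaining the real start time of the resident process is platform-specific and left out.

```python
import os
import tempfile


def binary_is_newer(binary_path: str, process_started_at: float) -> bool:
    """Return True if the file at binary_path was modified after the given
    process start time (seconds since the epoch)."""
    return os.stat(binary_path).st_mtime > process_started_at


# Simulation with made-up timestamps: a "server" starts from a build,
# then a rebuild overwrites the binary while the server stays resident.
with tempfile.TemporaryDirectory() as tmp:
    binary = os.path.join(tmp, "previewsmcp")  # stand-in for .build/debug/previewsmcp

    with open(binary, "w") as f:
        f.write("old build")
    os.utime(binary, (1000.0, 1000.0))  # first build at t=1000

    server_started_at = 2000.0  # session start, after the first build
    stale_at_start = binary_is_newer(binary, server_started_at)

    with open(binary, "w") as f:
        f.write("new build")
    os.utime(binary, (3000.0, 3000.0))  # rebuild during the session at t=3000

    stale_after_rebuild = binary_is_newer(binary, server_started_at)
```

A modification-time check is only a heuristic (a rebuild that changes nothing may still touch the file), which is part of why an explicit build-identity check is the more robust option.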

## Architectural footgun (broader than the skill)

PreviewsMCP has **two completely separate server processes** with **independent session state**:

1. **Stdio MCP server** — `previewsmcp serve` (no flag). Self-contained: own `PreviewHost`, `IOSSessionManager`, `ConfigCache` (`Sources/PreviewsCLI/ServeCommand.swift:57-73`). This is what `.mcp.json` spawns.
2. **Unix-socket daemon** — `previewsmcp serve --daemon` at `~/.previewsmcp/serve.sock`. Auto-spawned by every CLI subcommand other than `serve`. Persists across Claude Code restarts.

A session started via the MCP tools lives in process (1) and is invisible to `previewsmcp list`, `previewsmcp snapshot --session-id …`, and the other CLI subcommands, all of which talk to process (2). A contributor debugging an MCP-tool issue with the CLI will see an empty session list and waste time figuring out why.

This split-brain isn't documented in `AGENTS.md` beyond the bullet that says CLI subcommands talk to a daemon. The implication for testing and debugging is non-obvious.
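
As a toy model of the split-brain (not the actual Swift implementation), each process can be pictured as owning its own in-memory session table. `SessionStore`, its methods, and the `ContentView` preview name are hypothetical names used only for illustration.

```python
import itertools


class SessionStore:
    """Stand-in for the session state owned by a single server process."""

    def __init__(self, name: str):
        self.name = name
        self._sessions = {}
        self._ids = itertools.count(1)

    def start(self, preview: str) -> str:
        """Register a session in this process only and return its id."""
        session_id = f"{self.name}-{next(self._ids)}"
        self._sessions[session_id] = preview
        return session_id

    def session_ids(self) -> list:
        """Return the ids of the sessions known to this process."""
        return sorted(self._sessions)


# Two processes, two independent stores: nothing is shared between them.
stdio_server = SessionStore("stdio")  # what the MCP tools talk to
daemon = SessionStore("daemon")       # what the CLI subcommands talk to

stdio_server.start("ContentView")

visible_to_mcp_tools = stdio_server.session_ids()
visible_to_cli = daemon.session_ids()
```

In this model the session exists only in `stdio_server`, so a listing made through `daemon` comes back empty, which mirrors the empty `previewsmcp list` described above.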

## Relationship to #142

#142 covers CLI-vs-daemon version mismatch on upgrade (Homebrew/source). This issue is adjacent but distinct: it's about the **stdio MCP server** going stale within a single Claude Code session, and about the two-process architecture being a footgun beyond just upgrades. A version-handshake fix shaped like the one proposed in #142 would not catch this case, because the stdio server has no peer to handshake with — the staleness is purely between the resident process and the on-disk binary.

## Proposed fixes

### 1. Skill changes (small, immediate)

Update `.claude/skills/integration-test/SKILL.md`:

- **Step 0:** `previewsmcp kill-daemon || true` — hermetic reset of the unix-socket daemon. Strictly speaking the skill currently only uses MCP tools, so this is belt-and-suspenders, but it prevents surprises if any future step shells out to a CLI subcommand.
- **After step 2 (`swift build`):** add an explicit instruction to `/exit` and relaunch Claude Code so the stdio MCP server respawns from the new binary. Without this, every subsequent step is testing the wrong code.
- **Loud failure mode:** add a build-version check — see (2).

### 2. Build-hash MCP tool (recommended)

Expose a tiny MCP tool, e.g. `preview_build_info`, that returns the binary's build identity (commit SHA + dirty bit, baked in at build time via `GenerateVersionTool` or equivalent). The skill calls this as the *first* MCP call after `swift build` and compares against `git rev-parse HEAD`. If they differ, the skill aborts with a clear message:

> Stdio MCP server is running build `abc1234` but repo HEAD is `def5678`. Restart Claude Code so it respawns the server from the new binary, then re-run.

This makes the staleness impossible to miss and self-diagnosing.
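
A minimal sketch of the comparison the skill could perform, assuming the proposed `preview_build_info` tool returns a commit SHA as a string. `repo_head` and `stale_build_message` are hypothetical helpers; the tool does not exist yet, so the server SHA is passed in as a plain argument.

```python
import subprocess
from typing import Optional


def repo_head(repo_dir: str = ".") -> str:
    """Return the full commit SHA of HEAD via `git rev-parse HEAD`."""
    result = subprocess.run(
        ["git", "rev-parse", "HEAD"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def stale_build_message(server_sha: str, head_sha: str) -> Optional[str]:
    """Return an abort message if the server's build SHA does not match HEAD.

    Either SHA may be abbreviated, so the shorter one is compared as a
    prefix of the longer one. Returns None when they match.
    """
    a = server_sha.strip().lower()
    b = head_sha.strip().lower()
    n = min(len(a), len(b))
    if n > 0 and a[:n] == b[:n]:
        return None
    return (
        f"Stdio MCP server is running build `{a[:7]}` but repo HEAD is `{b[:7]}`. "
        "Restart Claude Code so it respawns the server from the new binary, "
        "then re-run."
    )


# Example with the placeholder SHAs from the message above.
message = stale_build_message("abc1234", "def5678")
```

In the skill, the second argument would come from `repo_head()`. A SHA match alone cannot prove freshness when the working tree has uncommitted edits, so the dirty bit would need its own handling, for example treating a dirty build as inconclusive.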

### 3. Document the two-process model

Add a short section to `AGENTS.md` covering:

- The stdio server (used by Claude Code / Cursor via `.mcp.json`) and the unix-socket daemon (used by CLI subcommands) are separate processes with separate session state.
- A session created via MCP tools is **not** visible to CLI subcommands like `list` / `snapshot --session-id`, and vice versa.
- For integration testing or debugging, restart Claude Code to refresh the stdio server, and run `previewsmcp kill-daemon` to refresh the unix-socket daemon. Do both if both paths are in play.

### 4. (Stretch) Unify the two backends

Longer-term: have the stdio `serve` mode forward to the unix-socket daemon instead of running its own self-contained engine. Then there's one source of truth for sessions, one binary to keep current, and #142's version-handshake fix covers everything. Bigger lift; out of scope for this issue but worth flagging as the durable answer.

## Suggested scope

- PR 1: skill edits in (1). Trivial, and it ships the safeguard today.
- PR 2: `preview_build_info` tool + skill check in (2).
- PR 3: doc update in (3).
- (4) tracked separately if/when the team wants to tackle it.

🤖 Filed from a Claude Code session after noticing that the integration-test skill silently exercises whatever binary was current at session start, regardless of intervening `swift build`s.
