docs: design spec for MCP support #4277

Open

balegas wants to merge 5 commits into main from balegas/mcp-support

Conversation

@balegas (Contributor) commented May 5, 2026

Summary

Adds an internal design spec (PRD) for MCP support in Electric Agents. Documentation only — no runtime or package changes.

Reviewer guidance

What this is

A new design spec at docs/superpowers/specs/2026-05-05-mcp-support-design.md capturing requirements, scope, and architecture for MCP support. Distinct from the 2026-04-24 design and implementation on #4165 — this is a fresh design pass focused on aligning scope before iterating on implementation.

Approach

A standard internal PRD covering:

  • goals and non-goals (with rationale);
  • three software-factory user stories (incident response, coding agent, continuous knowledge);
  • architecture (Registry / Vault / OAuth Coordinator / Bridge);
  • the credential model (apiKey / clientCredentials / authorizationCode with browser and device flows);
  • SDK shape (mcp.json + code escape hatch + per-agent allowlist + KeyVault interface);
  • the Connected Services UI;
  • MCP spec conformance (OAuth 2.1, PKCE, DCR, RFC 8628);
  • failure handling and a phased rollout;
  • a prior-art appendix from a research scan of ten popular coding agents.

Key invariants (deliberate scope decisions)

  • App-scoped credentials only in v1; no user identity layer.
  • Auth failures resolve as structured errors to the agent's model — no durable pause-on-reauth in v1.
  • Tool calls are synchronous within the wake; no wake-level suspend in v1. Per-call timeouts prevent a misbehaving server from hanging a wake.
  • Two transports: stdio (runtime subprocess) and Streamable HTTP. Legacy SSE and WebSocket are out.
  • Two retained differentiators: Connected Services catalog (no popular agent has one — see Claude Code #30272, #18442) and runtime-owned credentials with serialized refresh (avoids the Claude Code #24317 single-use-refresh-token race).

Trade-offs

  • Pause-on-reauth. Considered, rejected: complicates the runtime and risks agents blocked on humans who don't return.
  • Wake-level suspend for long-running calls. Considered, deferred: realistic v1 MCP servers (Sentry, Honeycomb, GitHub, Linear, Notion, codebase stdio) all return in seconds; the durable runtime makes adding suspend cheap when a real long-running MCP server appears.
  • Declaration surface. Chose mcp.json (committed, file-watched) + code escape hatch over either alone, mirroring Claude Code's .mcp.json / ~/.claude.json split.
  • Tool naming. Chose always-prefixed names (sentry.search) for stable identifiers over terseness.
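To make the declaration-surface trade-off concrete, here is an illustrative sketch of what a committed `mcp.json` could look like. The field names (`servers`, `transport`, `auth`, `timeoutMs`) are hypothetical, not the spec's actual schema:

```json
{
  "servers": {
    "sentry": {
      "transport": "http",
      "url": "https://mcp.example.com/sentry",
      "auth": "authorizationCode",
      "timeoutMs": 30000
    },
    "codebase": {
      "transport": "stdio",
      "command": "npx",
      "args": ["my-codebase-mcp"]
    }
  }
}
```

Under the always-prefixed naming decision, the first entry's tools would surface to agents as `sentry.search`, `sentry.issues`, and so on.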

Non-goals

Pause-on-reauth, wake-level suspend, user identity, spawn-scoped credentials, active background token refresher, in-process resource limits for stdio.

Test plan

  • Read the spec end-to-end.
  • Confirm the user stories describe workflows we want to build.
  • Confirm the non-goals' rationale stands up.
  • Confirm the rollout phases are deliverable.

Files changed

  • docs/superpowers/specs/2026-05-05-mcp-support-design.md — new spec (~315 lines).

🤖 Generated with Claude Code

balegas and others added 3 commits May 6, 2026 00:05
Standard internal PRD covering MCP transport (stdio + Streamable HTTP),
key vault, OAuth flows including device-code, durable pause-on-reauth,
Connected Services UI, hot-reload semantics, and a phased rollout.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Split runtime failure behavior from project-level risks. Adds explicit
sections for transport/process, authentication (pause-on-reauth),
tool-call results, vault, durable pause/resume, hot-reload, security,
and concurrency. Establishes the rule: only auth failures pause; other
failures return errors to the model.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Reframe user stories around software-factory patterns (incident response,
coding agent, continuous knowledge) instead of OAuth mechanics. Move
pause-on-reauth and wake-level suspend to non-goals: auth failures resolve
as structured errors the model handles; tool calls are synchronous with
per-call timeouts. Simplify failure handling, collapse rollout to four
phases, and drop the differentiators framing from the summary.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@KyleAMathews (Contributor) left a comment

Have some questions but in general, looks good & I'm excited to see a prototype!


### Per-call timeouts

Every MCP tool call has a timeout (default 30s, overridable per server in `mcp.json`). When exceeded, the bridge cancels the call (JSON-RPC cancellation for stdio servers; HTTP request abort for HTTP servers) and resolves it with a `timeout` error result. The agent's model decides what to do.
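A minimal sketch of the per-call timeout described above, with the bridge's real cancellation mechanics (JSON-RPC cancellation / HTTP abort) stood in for by an `AbortSignal` handed to the call. `ToolResult` and `callWithTimeout` are illustrative names, not the spec's API:

```typescript
type ToolResult =
  | { kind: "ok"; value: unknown }
  | { kind: "error"; code: "timeout" | "server_error"; message: string };

async function callWithTimeout(
  call: (signal: AbortSignal) => Promise<unknown>,
  timeoutMs = 30_000,
): Promise<ToolResult> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const value = await call(controller.signal);
    return { kind: "ok", value };
  } catch (err) {
    // Distinguish our own cancellation from a genuine server failure.
    return controller.signal.aborted
      ? { kind: "error", code: "timeout", message: `exceeded ${timeoutMs}ms` }
      : { kind: "error", code: "server_error", message: String(err) };
  } finally {
    clearTimeout(timer); // don't leak the timer when the call finishes early
  }
}
```

Either way the call resolves with a structured result rather than throwing, which matches the rule that the agent's model decides what to do next.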
Contributor commented:
30 seconds seems low as a default — I'd start much higher

@balegas (author) replied on May 6, 2026:
yeah, I thought the same, but my second thought was that this might be okay as a time-to-first-token/byte budget, so I was going to revisit the number after testing it.

## Non-goals (v1)

- **Wake-level suspend for long-running tool calls.** Tool calls are synchronous within the wake; if a call exceeds the per-call timeout it fails with `timeout`. The MCP servers in the v1 use cases (Sentry, Honeycomb, GitHub, Linear, Notion, internal docs, codebase stdio servers) all return in seconds. Genuinely long-running MCP servers (CI orchestrators, deep-research, LLM-wrapping servers) are rare/emerging; we'll add wake suspension when a concrete use case shows up. The durable runtime makes it cheap to add later.
- **User identity / per-user credentials.** Electric Agents has no user record today. Credentials are app-scoped: one set per registered server.
Contributor commented:
so you'd just make a bot user account or something?

@balegas (author) replied:
yeah, for remote environments I think you'd use bots/service tokens. In local deployment, you use your own creds.

Once we have user support we can extend the model further.

| Risk | Mitigation |
|---|---|
| Refresh-token race across wakes | Per-`(server, scope)` mutex around the refresh exchange; runtime owns the credential. |
| Vault file leaks if file permissions wrong | Default implementation enforces `chmod 600`; refuses to read wider modes; encryption-at-rest where OS keychain is available. |
| Hot-reload causes user confusion when tool list changes mid-wake | Manifest snapshot at compose time records what the agent saw; catalog shows live truth. |
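The refresh-token mitigation in the table above can be sketched as a per-`(server, scope)` in-flight map: concurrent wakes that need a new access token share one refresh exchange instead of racing on a single-use refresh token. This is a hypothetical sketch; `refreshExchange` stands in for the real OAuth token-endpoint call:

```typescript
// One in-flight refresh promise per (server, scope) pair.
const inflightRefresh = new Map<string, Promise<string>>();

function refreshSerialized(
  server: string,
  scope: string,
  refreshExchange: () => Promise<string>,
): Promise<string> {
  const key = `${server}\u0000${scope}`;
  let pending = inflightRefresh.get(key);
  if (!pending) {
    // First caller starts the exchange; clean up once it settles.
    pending = refreshExchange().finally(() => inflightRefresh.delete(key));
    inflightRefresh.set(key, pending);
  }
  return pending; // every concurrent caller awaits the same token
}
```

Because the runtime owns the credential, serializing here is enough; no agent ever holds its own copy of the refresh token.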
Contributor commented:
what is "compose time"?


## Connected Services UI

A new page in agents-server-ui listing all registered servers. Each row shows:
Contributor commented:
lots of UI examples out there to copy, of course


| Transport | Where it runs | Notes |
|---|---|---|
| **stdio** | Subprocess of the agents-server runtime, on the runtime host | Lazy-spawned on first tool call; one process per server; multiplexed via JSON-RPC `id`; restarted on crash |
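The "multiplexed via JSON-RPC `id`" note in the table can be sketched as follows. Process spawning and stdio framing are elided; `send` stands in for writing one line to the child's stdin, and `handleLine` would be fed each line of its stdout. All names here are illustrative:

```typescript
let nextId = 1;
const pendingCalls = new Map<number, (result: unknown) => void>();

// Issue one JSON-RPC request over the shared subprocess channel.
function request(
  method: string,
  params: unknown,
  send: (line: string) => void,
): Promise<unknown> {
  const id = nextId++;
  send(JSON.stringify({ jsonrpc: "2.0", id, method, params }));
  return new Promise((resolve) => pendingCalls.set(id, resolve));
}

// Route each response line back to its caller by id.
function handleLine(line: string): void {
  const msg = JSON.parse(line) as { id: number; result: unknown };
  const resolve = pendingCalls.get(msg.id);
  if (resolve) {
    pendingCalls.delete(msg.id); // responses may arrive in any order
    resolve(msg.result);
  }
}
```

One process per server is enough under this scheme, since any number of concurrent tool calls can share the channel without their responses getting crossed.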
Contributor commented:
Is the MCP code generic? Like, nothing needs to be installed in an agents-server? An agents-server can be running lots of agent spaces — my assumption has been that the agents-server is just a REST API. If it's running subprocesses, etc., that complicates it quite a bit. I think I'd prefer MCP servers to run elsewhere.

@balegas (author) replied:

Yeah, users have to deploy their servers somewhere or call remote MCP servers.

stdio would be useful for local environments.

balegas and others added 2 commits May 6, 2026 08:26
Greenfield TDD plan covering all four phases of the spec: registry+bridge
with apiKey auth, OAuth (clientCredentials + authorizationCode browser),
RFC 9728 discovery + RFC 7591 DCR, Connected Services UI, and device-code
flow. Per-task TDD with verification commands and commit cadence.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Adds tasks 15a-15i: a hermetic mock MCP server fixture (stdio + HTTP modes
with scenarios), resources/prompts bridges exposed as agent tools, progress
notification passthrough, cancellation, capability negotiation checks, and
E2E suites against both transports.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>