Skip to content

Add MCP support to Electric Agents#4289

Merged
balegas merged 95 commits into
mainfrom
balegas/mcp-impl-v2
May 8, 2026
Merged

Add MCP support to Electric Agents#4289
balegas merged 95 commits into
mainfrom
balegas/mcp-impl-v2

Conversation

@balegas
Copy link
Copy Markdown
Contributor

@balegas balegas commented May 7, 2026

Summary

Adds Model Context Protocol (MCP) support to Electric Agents so agents can call tools and read data from external MCP servers — both locally-spawned stdio servers and remote Streamable HTTP servers, with credentials managed by the runtime. Targets the Electron desktop app as the v1 host; the registry and bridges are embedder-agnostic for future deployed-runtime use.

For the architectural overview, see the spec at docs/superpowers/specs/2026-05-05-mcp-support-design.md. User-facing documentation lives under website/docs/agents/usage/mcp-servers.md and website/docs/agents/reference/mcp-{registry,server-config}.md.

How apps use MCP

Registry is the primary API. mcp.json and the desktop's settings.json mcp.servers block are file-based convenience layers that the runtime turns into the same Registry.applyConfig() calls.

Programmatic — Registry.addServer() / applyConfig()

BuiltinAgentsServer.mcpRegistry exposes the registry. Agent hosts add servers from code wherever it's the right shape — at boot from their own config source, in response to user actions, or per-session for tools an agent should only see during a specific task:

import { BuiltinAgentsServer } from "@electric-ax/agents"

const server = new BuiltinAgentsServer({
  agentServerUrl: "http://localhost:4437",
  port: 4448,
  workingDirectory: process.cwd(),
})

await server.start()

await server.mcpRegistry?.addServer({
  name: "stripe",
  transport: "http",
  url: "https://mcp.stripe.com/mcp",
  auth: {
    mode: "apiKey",
    headerName: "Authorization",
    key: process.env.STRIPE_MCP_KEY!,
  },
})

addServer returns a discriminated AddServerResult (ready / authenticating / error). applyConfig(cfg) is the bulk version — idempotent on unchanged entries, removes anything not in the supplied config; this is what file-based layers compile down to. subscribe(handler) exposes the live state stream, and reauthorize / disable / enable / removeServer cover single-server lifecycle.

File-based — mcp.json

For static, project-scoped configuration the runtime auto-loads <workingDirectory>/mcp.json on boot, watches it for changes, and hot-reloads adds, removes, and reconfigurations through applyConfig. mcp.json carries structural shape only — no secrets:

{
  "servers": {
    "honeycomb": {
      "transport": "http",
      "url": "https://mcp.honeycomb.io/mcp",
      "auth": { "mode": "authorizationCode", "scopes": ["mcp:read"] }
    },
    "git-local": {
      "transport": "stdio",
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-git", "--repository", "${workspaceRoot}"]
    }
  }
}

For authorizationCode servers loaded from mcp.json, the runtime auto-wires keychainPersistence so OAuth tokens survive process restarts via the OS keychain.

Desktop settings layer

The Electron desktop adds a global mcp.servers block in its settings.json, applied to every workspace and shaped exactly like mcp.json (keyed by server name) so entries copy-paste between the two files. It composes with the workspace mcp.json instead of replacing it: non-conflicting servers from both files load together; on a name collision, the workspace mcp.json wins. Programmatic embedders (other than the desktop) pass the resolved array via BuiltinAgentsServer({ extraMcpServers }) — that's the in-memory shape settings.json is rewritten into when the desktop loads it.

Per-agent allowlist

Entity definitions opt into MCP servers explicitly via mcp.tools() from @electric-ax/agents-runtime:

ctx.useAgent({
  systemPrompt: "...",
  tools: [
    ...ctx.electricTools,
    ...mcp.tools(["sentry", "github"]), // explicit list
    // or: ...mcp.tools("*")              // every registered server
  ],
})

Tools are exposed to the model with always-prefixed names: mcp__<server>__<tool>. Built-in horton and worker opt in to all registered servers via mcp.tools("*").

Test plan

  • pnpm --filter @electric-ax/agents-mcp test — 82 tests (subscribe + reauthorize coverage, stdio + HTTP e2e against @modelcontextprotocol/server-everything).
  • pnpm --filter @electric-ax/agents test — 58 tests (covers the extraMcpServers + workspace-mcp.json merge in bootstrap-mcp.test.ts).
  • pnpm --filter @electric-ax/agents-mcp typecheck; same for agents, agents-desktop, agents-server-ui.
  • Smoke the desktop app: pnpm --filter @electric-ax/agents-desktop dev, configure an mcp.json (or settings.json mcp.servers) with an authorizationCode server, click Authorize → BrowserWindow opens → sign in → page flips to ready. Verify Disable + Enable round-trips without losing tokens.
  • Verify the Settings → MCP Servers entry is hidden in the web build (no electronAPI).
  • Confirm mcp.json hot-reload: edit the file, server is added/removed/reconfigured without a restart; in-flight tool calls finish on the old config.

Notes for reviewers

  • Spec is the source of truth, not the early commit messages. docs/superpowers/specs/2026-05-05-mcp-support-design.md reflects the shipped architecture (desktop-IPC + operator-owned persistence). An earlier HTTP-mounted runtime + public CredentialStore was prototyped on this branch and removed; that history lives in git but isn't load-bearing for review.
  • Public surface is in @electric-ax/agents-mcp — registry interface, transports, persistence helpers, bridges. @electric-ax/agents re-exports the types embedders need (McpRegistry, RegistrySnapshot, McpServerConfig, etc.).
  • mcp.json resolution is relative to workingDirectory, not process.cwd(). This matters for the desktop, where the chosen workspace folder is the natural place to look. Headless embedders that don't pass a workingDirectory fall back to process.cwd().
  • macOS keychain known limitation: keychainPersistence shells out to /usr/bin/security add-generic-password -w <value>, which puts the secret in argv (visible briefly via ps). Same posture as Claude Code's keychain integration. Documented inline in packages/agents-mcp/src/persistence/keychain.ts. The hardening path is a Rust sidecar over stdin using the keyring crate; not in scope here.
  • Settings.json plaintext. API keys and the new mcp.servers block are stored unencrypted in <userData>/settings.json (mode 0644). Same posture as Claude Code's Linux fallback. Worth flagging if you want to push this through safeStorage.encryptString later — out of scope for this PR.

🤖 Generated with Claude Code

@netlify
Copy link
Copy Markdown

netlify Bot commented May 7, 2026

Deploy Preview for electric-next ready!

Name Link
🔨 Latest commit 4f996e5
🔍 Latest deploy log https://app.netlify.com/projects/electric-next/deploys/69fe07e42471f300084a8b64
😎 Deploy Preview https://deploy-preview-4289--electric-next.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify project configuration.

balegas and others added 29 commits May 7, 2026 22:51
…l design)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…tions + apiKey)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ancel/caps) + E2E

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…horizationCode)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…olling)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… scope notes

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…, consolidated interfaces, T41 ordering

- Add Task 16b: mcp.tools(allowlist|'*') sentinel factory (spec §"Per-agent allowlist")
- Update Task 17 to expand sentinels at wake-time and tag BridgedTool with .server
- Extend Task 33 to wire authorize + disable + enable (was 501 forever)
- Add consolidated RegistryOpts/Entry/ListedEntry/Registry block at top of Task 31
- Enumerate transportFactoryOverride callsite updates in Task 31 step 6
- Reorder Task 41 steps so deviceFlowFetch opt is added before the test that uses it

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ntion)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…lt + idempotency

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…gent allowlists

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…mposition

Implements Task 17: registerToolProvider process-global registry in agents-runtime,
sentinel-aware flatMap in runAgent (isMcpToolsSentinel + filterByAllowlist), and
BridgedTool.server field so provider tools carry their originating server name.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… composeToolsWithProviders helper

Extracts the inline sentinel-expansion flatMap from context-factory.ts into a
new exported composeToolsWithProviders helper in tool-providers.ts, then adds
three focused integration tests covering explicit allowlist, wildcard, and
no-sentinel cases to satisfy the Task 17 plan requirements (lines 2584–2589).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ntimes (experimental)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
balegas and others added 6 commits May 8, 2026 12:09
- Drop the dead `keytar` peer-dependency block. The keychain backend
  shells out to /usr/bin/security (macOS) and secret-tool (Linux);
  keytar was removed during the persistence-helper refactor and the
  declaration has been misleading anyone who reads peer-deps. Lockfile
  loses the keytar entry as a result.
- Add description, keywords, repository (with monorepo `directory`),
  and license fields to match @electric-ax/agents-runtime + agents.
  These show up on the package's npm landing page.

The package was already publish-ready (no `private: true`, scoped
name, `files: ["dist"]`, exports block); this fills in the metadata
gaps so the next ci:publish:npm run produces a clean release.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Covers the new @electric-ax/agents-mcp package (minor) plus the
agents / agents-desktop / agents-server-ui changes that wire it
through (minor) and the small agents-runtime / agents-server touches
(patch). Required by the Check Changeset CI job.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1) packages/agents-mcp/vitest.config.ts: add the JUnit reporter +
   outputFile so codecov's test-results upload finds
   junit/test-report.junit.xml. Without it the action's
   `fail_ci_if_error: true` failed the job after the test run had
   succeeded.

2) packages/agents/test/bootstrap-mcp.test.ts: stand up a tiny
   no-op http.createServer in beforeEach and feed its URL through
   `agentServerUrl`. BuiltinAgentsServer.start() POSTs entity-type
   registration to the agents-server during bootstrap; CI has
   nothing on :4437, so all four merge tests were ECONNREFUSED-ing.
   Locally the tests had been passing because we have a dev
   agents-server running.

Both fixes verified locally (agents: 58/58, agents-mcp: 82/82,
JUnit XML now produced for both).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…sitions

Two changes that together kill the visible flicker when an entry
transitions through connecting / authenticating / ready / error:

1) Always render an info line under each row instead of conditionally
   mounting the chevron toggle. Content swaps with status:
     - error    → existing inline `kind: message`
     - connecting → "Loading tools…"
     - authenticating → "Sign in to load tools"
     - disabled → "Disabled — click Enable to resume"
     - ready, tools.length > 0 → existing chevron toggle
     - ready, no tools → "No tools advertised"
   The expanded tool list still shows below when the user toggles.

2) Always render the action buttons in the same slot, toggling
   `disabled` rather than mounting/unmounting:
     - Authorize: rendered only for authorizationCode servers,
       enabled only when status is `authenticating` (kept dimmed
       outside of that state so the row width doesn't jump).
     - Reconnect: always rendered, disabled while connecting,
       authenticating, or disabled.
     - Enable/Disable: still swap on the disabled flag, with Disable
       dimmed during the in-flight close transition.

The user-visible result is that reauth, hot-reload, and toggle
sequences keep the row at a constant height + width.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two `entries.delete(name) + entries.set(name, ...)` patterns in
addServer (existing-entry branch) and enable were moving rows to
the end of the Map's iteration order on every reconfigure /
hot-reload / Disable+Enable cycle. The Settings → MCP Servers page
renders snapshots in `list()` order, so the user saw rows visibly
reorder themselves after any state-changing action.

JS Maps preserve insertion order under `set` for an existing key,
so dropping the leading delete is enough — the entry stays put.
removeServer's delete is unchanged (legitimately remove).

New regression test in registry-subscribe.test.ts: register three
servers, reconfigure the middle one, then disable + enable it.
Order must still be [alpha, beta, gamma].

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…tates

When an OAuth server's refresh-token exchange fails (revoked,
expired, client mismatch, etc.) the SDK throws without producing a
fresh authorize URL, so the registry lands in `error` state with
the stale tokens still cached. Reconnect would just retry against
the same broken refresh; the only escape is a fresh OAuth flow,
which is exactly what Authorize triggers via `Registry.reauthorize`
(closes transport → clears tokens + DCR client → rebuilds → SDK
produces a new authorize URL).

Authorize is now enabled when status is `authenticating` OR `error`
(for `authorizationCode` servers). Other states still keep it
dimmed so the row layout stays stable.

Note: auto-recovering from `invalid_grant` in the registry itself
would save a click but adds loop / down-server edge cases; manual
Authorize keeps the user in control and surfaces the error message
so they understand why reauth is needed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@samwillis samwillis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed with GPT-5.5 in an interactive session with me.

Overall, I’m happy with the architecture and design split here. The main tradeoff looks right to me: agent definitions declare the MCP capabilities they want, while the runner owns the concrete MCP server connections, config, auth, and environment-specific implementation. That keeps agent definitions portable across runners and lets different embedders provide different MCP implementations.

I also like that agents-mcp is a fairly distinct package. Keeping the core agents-runtime free of a direct dependency on MCP/SDK transport code feels like the right boundary, especially given browser/UI bundling concerns. The sentinel/tool-provider approach seems like a reasonable composition point.

A few things I think we should tighten before merging, or at least track clearly:

  1. mcp.json should be explicit opt-in for BuiltinAgentsServer.

    Right now the built-in runner automatically looks for <workingDirectory>/mcp.json and applies it if present. Since stdio MCP servers can spawn local commands, I think project-scoped mcp.json loading should require an explicit flag or explicit path from the embedder/runner. I’m happy with supporting project config, but I don’t think simply selecting/running in a working directory should automatically execute MCP server config from that directory.

  2. BuiltinAgentsServer.stop() should tear down MCP resources.

    The server creates an MCP registry, starts MCP transports, starts a config watcher, and registers the process-global mcp tool provider. On stop, it currently aborts wakes/closes HTTP/nulls refs, but I don’t think it closes MCP transports, disposes the watcher, or unregisters the tool provider. This could leak stdio child processes/watchers and leave stale provider state across desktop runtime restarts.

  3. /api/runtimes appears to drop earlier entity types for a runtime.

    RuntimeRegistry.register() replaces by runtime name, while entity-type registration calls it once per type with types: [normalized.name]. Since the built-in runtime registers horton and worker separately under the same runtime name, the last registration can overwrite the earlier type. We should merge/accumulate types for a runtime or register the full type set in one go.

  4. MCP config application errors should be surfaced/caught more reliably.

    In BuiltinAgentsServer, applyMerged fire-and-forgets mcpRegistry.applyConfig(...) inside a .then, so failures from applyConfig won’t be caught by the surrounding catch. That can produce unhandled rejections and leave the UI/embedder without a useful structured error state.

  5. Missing requested MCP servers are currently silent.

    If an agent declares mcp.tools(['github']) but the runner has no ready github server, the sentinel expands to no tools and the agent runs under-equipped. I don’t think we need a full “required MCP dependency” model right now, but a warning for named unavailable servers would make this much easier to debug.

  6. Minor follow-ups:

    • hashConfig() should probably include timeoutMs, otherwise timeout-only edits may not reconnect/reconfigure.
    • I’m not completely sold on mcp.tools('*') as the public syntax for “all MCP tools”; mcp.tools() or another default might read better. Not blocking.
    • Longer term, it would be nice to expose a helper for custom runners so embedders don’t have to copy the for (const entry of mcpRegistry.list()) ... bridgeMcpTool(...) loop.

None of the above changes my view that the main architecture is sound. I’m positive on the package split and on the decision for the runner, not the agent definition, to own MCP server connections/config/auth.

balegas and others added 9 commits May 8, 2026 14:12
Embedders that vary the runtime port across sessions (the desktop
listens on port 0 → OS-assigned) ended up with a persisted DCR
client whose redirect_uris recorded an old port. Re-seeding from
keychain on the next session pointed the SDK at a client the auth
server only knew under a different URI, so the next refresh /
authorize / token-exchange came back as `invalid_grant`. Users
saw the row stuck in `error` after every restart.

addServer now compares the cached client's redirectUris against
the redirect_uri it would build right now (`publicUrl + /oauth/
callback/<server>`, mirroring sdk-provider.ts). When they don't
match — including the legacy case where redirectUris isn't
recorded — we skip seeding both the client and the tokens. The
connect that follows triggers fresh DCR + an authorize URL, the
operator's onClientRegistered / onTokensChanged hooks overwrite
the stale keychain entries, and subsequent silent refresh works.

Inline operator tokens still seed when there's no cached client
to compare against (the operator owns reset semantics in that
path).

3 new tests in registry-oauth.test.ts cover: matching URI seeds
normally, mismatched URI drops both, and the legacy
no-redirectUris-recorded case is conservatively cleared.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Reactive blackboard architecture for shepherding GitHub PRs through
gates (template, CI, conflicts, threads, docs) using independent
observer agents that wake on signals. Phase 1 polls; phase 2 will
swap in webhooks without changing the observer contracts.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Re-authorize replaces Authorize once the server is ready (still the
full OAuth code flow — refresh-token rotation stays transparent inside
the SDK). When a server is disabled, hide Authorize/Reconnect/Disable
entirely and surface Enable as the sole action — they were permanently
dead in that state.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace the dedicated pr-observer entity with the existing generic
worker entity. Each role becomes a skill under skills/pr/ that the
worker loads via use_skill on spawn. pr-manager owns the only
persistent subscription to the signals collection and dispatches a
fresh worker per signal+role pair, with iteration counters persisted
in agent_state so caps work across spawns.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Consolidate seven roles into three long-lived worker entities
(pr-reviewer, pr-build-doctor, pr-doc-editor) plus pr-manager that
absorbs the four mechanical roles (sync, description, gates,
lifecycle). All five entities are hybrid: small TS shell wires
subscriptions/tools/prelude, agent reasoning lives in a markdown
skill at packages/agents/skills/pr/<role>.md. Each entity has its
own persistent timeline so it can reason about prior work on the PR.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Drop historical references to prior design drafts.
- Add per-watcher state schema (managed_prs ledger).
- Clarify reviewer skill decision tree to handle new_human_comment
  and continue_granted signals correctly (review pass and address
  pass decided independently).
- Document signal payload shapes (head_sha_changed, ci_failed,
  new_human_comment, commits_pushed, human_input_required,
  continue_granted).
- Fix role-naming convention: entity = pr-<role>, role name in
  state/payloads/slash-commands = short form.
- Detect human-authored pushes by checking head_sha against the
  agent-authored commits table.
- Consistency: 'observers' -> 'workers'; 'emit/write a signal' ->
  'insert' where appropriate.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- §15 Templates defines the six concrete artifacts agents produce or
  consume: gate computation rules, PR description template (default
  + repo override path), review-thread comment, status comment,
  commit message, thread reply, slash-command grammar.
- Spell out what ready_to_merge does: apply 'agents:ready' label,
  update status comment. No auto-merge, no LGTM, no draft removal.
- Add 2s wake debounce to the manager so chatty PRs don't burn
  through one agent run per signal.
- Add mergeable + status_comment_id to pr_meta schema since they're
  now referenced.
- Drop the now-defined items from §14 Items deferred.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
#1 — `mcp.json` is now explicit opt-in. New `loadProjectMcpConfig`
option (boolean | string) on `BuiltinAgentsServer`; defaults to off
because stdio servers spawn local commands. The Electron desktop and
the `electric-ax` CLI opt in; library embedders get the safe default.

#2 — `stop()` tears down MCP resources. Added `Registry.close()`
(closes every transport, forgets auth state, emits a final empty
snapshot). `BuiltinAgentsServer.stop()` now also disposes the
`mcp.json` chokidar watcher and unregisters the `mcp` tool provider.

#3 — `RuntimeRegistry.register()` accumulates types per runtime
instead of last-write-wins, fixing `/api/runtimes` losing earlier
types when entity-type registration POSTs land in parallel.

#4 — `applyMerged` is async/await so `mcpRegistry.applyConfig`
rejections actually reach the catch (previously voided inside a
`.then`, escaping as unhandled rejections). Optional
`onConfigError` callback exposed for embedders.

#5 — `composeToolsWithProviders` warns when a named MCP server in
`mcp.tools(['x'])` is unavailable (unknown or not yet ready). Wildcard
sentinels stay silent; missing names dedupe within a single call.

#6a — `hashConfig()` includes `timeoutMs` so timeout-only edits no
longer skip the reconfigure path and leave the entry's stale.

#6b — `mcp.tools()` (no arg) is the canonical "every registered
server" form. `mcp.tools('*')` kept for back-compat. Built-ins
`horton`/`worker` and the docs use the no-arg form.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The primary goal is exercising the Electric Agents framework on a
real workload to find platform bugs, not building a fully autonomous
PR-shipping system. Spell out the deliberate limits up front:
no coding agent, one repo per watcher, no webhooks, no automerge.

Also drop redundant non-goals now covered by §1's limits.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@balegas
Copy link
Copy Markdown
Contributor Author

balegas commented May 8, 2026

Thanks for the thorough review @samwillis. All six points addressed in c71b96a:

1. mcp.json is now explicit opt-in. New loadProjectMcpConfig?: boolean | string option on BuiltinAgentsServer, defaulting to off. The Electron desktop and the electric-ax CLI opt in (their use cases assume project scope); library embedders that construct BuiltinAgentsServer directly get the safer default. true reads <workingDirectory>/mcp.json; a string reads that exact path.

2. stop() tears down MCP resources. Added Registry.close() (closes every transport in parallel, forgets auth state, emits one final empty snapshot). BuiltinAgentsServer.stop() now also disposes the chokidar watcher (the closer was previously dropped on the floor by .catch(...)) and unregisters the mcp tool provider. Order is watcher → tool provider → registry → bootstrap → HTTP, so nothing fires against torn-down state during shutdown.

3. RuntimeRegistry.register() accumulates types. Producer-side stays as-is — entity-type POSTs run with concurrency, and any ordering-dependent fix would still race. The registry now merges types for an existing runtime (deduped, first-seen order) while letting the latest publicUrl win so a port-changed restart still updates correctly. The previous test that documented "replaces on re-registration" has been flipped.

4. applyMerged failures are now caught. Refactored to async/await so mcpRegistry.applyConfig rejections actually reach the catch (they were previously voided inside a .then callback, escaping as unhandled-rejection warnings while looking handled). Also exposed an opt-in onConfigError?: (error: unknown) => void callback for embedders to surface failures programmatically — the desktop can hook this if/when we want banner UI.

5. Missing requested MCP servers now warn. composeToolsWithProviders emits a single runtimeLog.warn per call listing any non-'*' allowlist names that aren't currently available (unknown or not yet ready). Wildcard sentinels stay silent; missing names dedupe within a call. Per-wake warning, self-correcting once config is fixed or servers come online.

6.

  • a. hashConfig() now includes timeoutMs so timeout-only edits force a reconfigure instead of leaving entry.config.timeoutMs stale (the bridges read it per-call).
  • b. mcp.tools() (no arg) is the canonical "every registered server" form. mcp.tools('*') kept as a back-compat alias. Built-ins horton/worker and the docs use the no-arg form.
  • c. Helper for custom runners — deferred to a follow-up.

Test deltas:

  • agents-mcp: +2 (Registry.close() happy path & error swallow, timeoutMs reconfigure, mcp.tools() default)
  • agents-runtime: +5 (missing-server warnings)
  • agents: +3 (bootstrap-mcp.test.ts — default-off ignores mcp.json, full start/stop/start round-trip, explicit-path opt-in)
  • agents-server: +3 (runtime-registry merge, dedupe, port-change cases)

balegas and others added 9 commits May 8, 2026 16:10
The runtime listens on `port: 0`, so its publicUrl-derived redirect
URI changes every launch. The stale-DCR-client detector then sees
the cached `redirectUris` no longer match the current redirect URI
and drops the cached client AND tokens — forcing the user to
re-authorize on every desktop launch.

Decouple the redirect URI from the listening port: new
`mcpOAuthRedirectBase` option on `BuiltinAgentsServer`, set by the
desktop to a stable `http://localhost`. The BrowserWindow that hosts
the OAuth flow intercepts the URI by prefix — nothing real listens
there — so the URI just needs to be consistent across launches and
between DCR registration and the interceptor.

Existing keychain entries with the old port-bearing redirect URI
will trip the stale detector once more and re-authorize on next
launch; from then on it stays stable.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Switch from `http://localhost` to `http://127.0.0.1:53117` — the
RFC 8252 §7.3 recommended loopback-literal shape. Avoids edge cases
where stricter DCR validators reject hostname-form or no-port URIs.
Nothing listens at the port; the BrowserWindow still intercepts the
redirect before any network resolution.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three races on the boundary between start, watcher reload, and stop:

1. `watchConfig` did its own eager reload, which raced with the
   caller's explicit initial load: two `applyConfig` calls landed
   on a fresh registry simultaneously, triggering close-and-reopen
   mid-handshake. Drop the eager reload — callers own the initial
   apply, the watcher only fires on subsequent file changes.

2. The watcher closer was captured via `.then(...)`, so a fast
   `start()` → `stop()` could miss it: the watcher kept firing,
   `applyConfig` ran against a torn-down registry, stdio children
   leaked. Await the watcher setup so the closer is always held
   before `start()` returns.

3. `stop()` cleared the registry while a fire-and-forget
   `applyMerged` was mid-flight (slow keychain reads, biometric
   prompts). Track in-flight applies, latch a `mcpStopping` flag
   that short-circuits new ones, and drain the set during
   teardown before closing the registry.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Drop back-compat affordances and historical narration in comments;
the feature is shipping fresh and shouldn't carry incremental-edit
artifacts:

- `mcp.tools('*')` removed in favour of the no-arg form. Sentinel
  type narrowed to `string[] | undefined`.
- Stale-DCR redirect-URI detection removed. The redirect base is
  now stable per `mcpOAuthRedirectBase`, so the safety net (and
  its `redirectUriFor` helper + tests) was guarding a condition
  that can no longer arise.
- `MCP_RUNTIME_PUBLIC_URL` env fallback removed — undocumented and
  unused; the option-then-publicBaseUrl chain is enough.
- `loadProjectMcpConfig` collapsed from `boolean | string` to
  `boolean`. No caller used the string form.
- "War story" comments stripped: layout-stability rationale,
  insertion-order rationale, the `// no longer …` block, the
  earlier `.then(...)` story, etc.
- JSDoc trimmed to one-liners where the option name already says
  what the value is for.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Docs changes:
- mcp-servers usage: drop the `loadProjectMcpConfig: '/abs/path'`
  string form (collapsed to boolean), document the stable
  `mcpOAuthRedirectBase` requirement for embedders on `port: 0`,
  and update the Authorize/Re-authorize button description.
- mcp-registry reference: clarify `publicUrl` must be stable across
  restarts and link the redirect-URI shape.
- embedded-builtins: add `extraMcpServers`, `loadProjectMcpConfig`,
  `mcpOAuthRedirectBase`, `openAuthorizeUrl`, `onConfigError` to
  the BuiltinAgentsServerOptions table.

Settings UI: render the server's transport / auth / tool-count
meta inline next to the name (was a separate `description` line).

Also drop the leftover PR Shepherds spec — that work lives on a
different branch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The CLI gained a `loadProjectMcpConfig: true` opt-in for MCP server
loading; needs to be released alongside the other MCP packages.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CI's "Test packages/agents-mcp" job runs `pnpm run coverage --run`,
which all the other workspace packages have. Without it the step
errors with "Missing script: coverage" and the JUnit upload step has
no report to ship.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CI's tsdown DTS bundle step couldn't resolve `src/credentials/types.d.ts`
under parallel-workspace build conditions, even though the file
existed and the local single-package build succeeded. Moving the
shared `AuthStore` / `AuthStoreHooks` types alongside their only
implementation (`auth-store.ts`) sidesteps the cross-file DTS
resolution and keeps the surface tighter — only one importer
(`auth/sdk-provider`) needed updating.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@balegas balegas merged commit 1df7cce into main May 8, 2026
57 checks passed
@balegas balegas deleted the balegas/mcp-impl-v2 branch May 8, 2026 16:55
robacourt added a commit that referenced this pull request May 12, 2026
## Problem

`@electric-ax/agents-mcp` was added as a workspace dependency of
`agents-runtime` in #4289, but was never added to the `agents-server`
Dockerfile. This caused the Docker build to fail with:

```
ERR_PNPM_WORKSPACE_PKG_NOT_FOUND  In packages/agents-runtime: "@electric-ax/agents-mcp@workspace:*" is in the dependencies but no package named "@electric-ax/agents-mcp" is present in the workspace
```

This broke the `publish-agent-server-to-dockerhub` job in the Changesets
release run triggered by #4252.

## Fix

Add `agents-mcp/package.json` to the manifest-copy stage and add a
`COPY` + build step for `agents-mcp` before `agents-runtime` (which
depends on it).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants