Skip to content

Qurator: wire in Platform MCP Server tools#4840

Merged
nl0 merged 53 commits into
masterfrom
feat/qurator-mcp
Apr 29, 2026
Merged

Qurator: wire in Platform MCP Server tools#4840
nl0 merged 53 commits into
masterfrom
feat/qurator-mcp

Conversation

@nl0
Copy link
Copy Markdown
Member

@nl0 nl0 commented Apr 21, 2026

Summary

Wires the Quilt Platform MCP Server into Qurator. Catalog users get the same ~25 platform tools (packages, search, S3, Athena, tabulator, get_resource) that external MCP clients see, with full lifecycle handling and a persistent status surface tied to the chat input.

What changes for users

The catalog's prior in-app catalog_global_getObject is reshaped into catalog_preview (single s3_uri input), and the platform server's read surface (platform__object_read, platform__s3_object_info, platform__package_browse, etc.) is added alongside it. Two read paths, two purposes:

  • catalog_preview — for content the model needs to interpret. Returns thumbnail-resized images (via the /thumbnail API gateway, ≤1024×768), native Bedrock Document blocks for ≤500 KiB PDFs / Office docs, and language-tagged text. Use when the goal is summarize / describe / extract.
  • platform__object_read — for raw bytes, scripting, or any case where the model doesn't need to interpret content. Returns text decoded UTF-8 (capped at 16 KiB) or base64-encoded binary.

A <reading-objects> guidance message is added to the prompt context: call platform__s3_object_info first to inspect ContentLength / ContentType, then pick the right tool from the metadata.

Trade-offs vs. the prior in-app tool

  • Two read paths to choose from. The model now picks; the tool descriptions and the guidance message steer it. Risk: it picks the wrong one (e.g. raw-bytes a PDF instead of preview-rendering it). Mitigation: explicit when-to-use language on both tools, and the metadata-first guidance.
  • catalog_preview is catalog-only. External MCP clients (non-catalog) only see the platform server, so they get platform__object_read but not the rich preview path. Native-document support on the platform server is a separate follow-up.
  • >500 KiB documents still can't go through Bedrock natively. Same as before this PR. Not a new regression.
  • Image formats outside the thumbnail lambda's coverage (jpg|jpeg|png|gif|webp|bmp|tiff|tif|czi) won't preview-resize and will fall through to the platform path. Same as before this PR.

Other UX changes

  • Destructive tool calls execute without an in-chat confirmation step. Unchanged in mechanism from before, but the platform server's surface includes more mutating tools than the prior in-app surface (S3 puts, package create, tabulator set/rename), so the blast radius is wider. A confirmation UX using MCP's destructiveHint / readOnlyHint annotations is a separate follow-up.
  • Connector status appears in the chat input's helper-text region (severity-tinted, with inline reconnect / continue-without / acknowledge actions). No persistent banner. Detail in the UI section below.

Architecture

Model/
├── Connectors/
│   ├── Connectors.ts       Backend interface, ConnectorState machine,
│   │                       lifecycle (manageConnector), service builder,
│   │                       contextContribution, React bridges
│   ├── Mcp.ts              MCP wire client + bearerPassthru adapter
│   ├── Connectors.spec.ts  30 lifecycle / service-primitive tests
│   ├── Mcp.spec.ts         15 wire-level tests
│   └── index.ts            barrel
└── Tool.ts                 (unchanged shape; Bedrock toolSpec built here)

A connector pairs a configuration (id, title, hint) with a Backend — the protocol contract (initialize, listTools, callTool, ping). The lifecycle (Connecting → Ready → Disconnected → Failed, heartbeat, reconnect budget) is generic in Backend; the platform-specific bits live in one factory call: Mcp.bearerPassthru({ url, getToken }). Adding a different backend (an OAuth-flavored MCP, an in-process tool source) is a new factory; the lifecycle doesn't move.

Per-connector lifecycle

Connecting ──success──→ Ready
Connecting ──failure──→ Failed{acked:false}
Ready ──health-flip───→ Disconnected{retrying:true}
Disconnected ──recovery→ Ready
Disconnected ──exhaust→ Failed{acked:false}
Failed ──user retry───→ Connecting
Failed{!acked} ──ack──→ Failed{acked:true}
  • Heartbeat: ping every 30 s with a 5 s timeout. Two consecutive transport failures (from heartbeat or tool calls — they share a Health ref) flip the state to Disconnected. The threshold-crossing watcher subscribes to the SubscriptionRef stream so cross-source crossings fire immediately.
  • Reconnect: probe-driven. While Disconnected, the lifecycle pings at escalating cadence (5 s → 10 s → 20 s → 30 s cap); each successful ping triggers a bootstrap (cap: 3 attempts). The whole window is bounded by an outer 60 s timeout so the loop can't stay disconnected forever.
  • Read-only-tool retry-once: tools annotated readOnlyHint: true retry once on transport error before counting toward the threshold. Destructive / unannotated tools fail immediately.
  • No silent degradation: when the budget is exhausted, the conversation gates on Failed{!acked} until the user clicks reconnect or "continue without". Failed{acked:true} is sticky — the connector reads as unavailable in the prompt and the LLM stops trying its tools.
  • Force-reconnect from healthy states: connector.retry from Ready or Disconnected degrades health to threshold so the lifecycle naturally falls into the reconnect probe loop. From Failed it drains the controls queue. From Connecting it's a no-op (bootstrap is already in flight).
  • Sole writer: each lifecycle fiber owns one SubscriptionRef<ConnectorState>. Aggregate predicates (isBlocked, awaitUnblocked, contextContribution) merge across all per-connector refs. Lifecycle fibers are scope-bound and interrupted on Assistant unmount.

Conversation integration

Two new state-machine elements:

  • State.AwaitingConnector { events, timestamp, waiter } — only the waiter fiber handle, no per-connector snapshot.
  • Action.ConnectorReady.

A small advanceFromEvents helper centralizes the gating decision at both entry points (Idle.Ask and ToolUse.ToolResult last-call):

  1. Reads connectors.isBlocked synchronously (handlers don't observe asynchronously).
  2. If blocked, forks connectors.awaitUnblocked; that fires ConnectorReady and the handler re-runs.
  3. Otherwise forks the LLM request as before.

llmRequest folds connectors.contextContribution into the prompt context at request time. No React-side bridge.

Tool dispatch

Each connector tool's executor goes through connectors.byId[id].callTool(name, input):

  1. Reads the connector's state — fast-fails with a Tool.fail text block if not Ready.
  2. Fires the wire call. On a transport error from a readOnlyHint tool, the call is retried once.
  3. Updates the shared Health ref on transport-level outcome (success → reset, failure → bump). Non-transport errors (server-side application errors, schema decode failures) don't disturb health.

Wire client (Mcp.ts)

  • @effect/platform HttpClient over FetchHttpClient.layer. Cancellation via fiber interrupt aborts the underlying fetch.
  • Eff.Schema decode at every response boundary; failures map to McpProtocolError.
  • Tagged errors via _tag discriminator: McpAuthError, McpTransportError, McpProtocolError, McpRpcError. The connector layer maps these onto an abstract BackendError (Transport/Auth/Protocol/Application); the wire-level tag is preserved on cause for telemetry but the lifecycle and UI never speak MCP vocabulary.
  • Methods: initialize + notifications/initialized, tools/list, tools/call, resources/read, ping.
  • tools/list.inputSchema is forwarded to Bedrock as-is. Server is authoritative on its own schema's draft and structure; we don't reshape.
  • SSE handling for FastMCP stateless_http=True mode (text/event-stream responses with a single message event).
  • Trace-context propagation explicitly disabled on each request (HttpClient.withTracerPropagation(false)). Effect's HttpClient auto-injects W3C traceparent + Zipkin b3 headers from the active span; the platform MCP server's CORS allow-list rejects unknown headers (preflight returns 400 "Disallowed CORS headers"), and we don't have a tracing collector on the receiving end either.
  • A small dedup in Tool.make lifts description out of the JSON schema body so Bedrock doesn't ship it twice (once on toolSpec.description, once in inputSchema.json.description). The MCP path already keeps these separate.

UI

The user-facing surface is the chat input's helper-text region. When all connectors are Ready, the helper falls back to the default disclaimer. When any connector is non-Ready, the helper-text region carries:

  • <title>: connecting… / reconnecting… (warning, no actions — already auto-progressing)
  • <title>: couldn't connect • reconnect • continue without (error)
  • <title>: unavailable • reconnect (warning, input re-enabled — acked is sticky)

Aggregate severity picks the worst category across connectors. Inline actions use the existing MessageAction style (clickable span, opacity transition on hover).

The chat panel is otherwise empty at the top — there's no persistent banner. The prior banner pushed history down and competed with the absolute-positioned hamburger menu; consolidating into the helper-text region makes the "input is disabled because…" relationship spatial.

The DevTools panel exposes a per-connector subsection: heading (Connector: <title>), hint (muted), state line + inline reconnect/acknowledge actions, error detail row, and a live contextContribution preview that hides when nothing is contributable.

Auth

Mcp.bearerPassthru({ getToken }) is the 1st-party adapter — caller resolves the bearer on every call (the adapter never manages refresh, expiry, or session state). Catalog wires getToken to re-read the redux session token via a memoized selector, so token rotation is handled without explicit plumbing. A future Mcp.oauth(...) factory (or any other backend) would be a sibling, not a fork of the lifecycle.

Test plan

  • ESLint / Prettier — clean.
  • LSP workspace diagnostics — zero errors / warnings across all touched files.
  • Vitest under Node 20 — 68 Assistant tests pass:
    • Mcp.spec.ts (17) — wire-level make() integration, content mapping, parseSseToJson.
    • Connectors.spec.ts (36) — state predicates, lifecycle transitions, retry/ack from Failed, callTool gating, transport-error → threshold-crossing, TestClock-driven heartbeat / probe-driven reconnect, read-only retry-once, layer-level service primitives + multi-connector aggregation, stale-control drain on Failed entry, autoload of reference-grade resources.
    • Conversation.spec.ts (8) — Idle / AwaitingConnector / Abort / Discard transitions, ToolUse → blocked → AwaitingConnector path, AwaitingConnector waiter freshness across cycles.
    • navigation.spec.ts (7) — unchanged; snapshot still valid.
  • Manual verification on the qur dev stack (local catalog at :3000, deployed platform MCP server) — Ready, Disconnected → reconnect → Ready, Failed{!acked} → reconnect / continue-without, Failed{acked} unavailable; input gating; helper-text severity; DevTools panel.
  • End-to-end smoke through 9 design job stories on the qur dev stack — list/count, search packages, object search, package browse, read text, read image, athena (databases + tables), destructive write, catalog link.
  • catalog_preview end-to-end (local catalog → dev stack PMS) — model follows the metadata-first guidance: platform__search_objectsplatform__s3_object_infocatalog_preview for an image; catalog_preview returns a metadata text block + Image content block; the model renders a description of the rendered image. Error fallback path also exercised (graceful text-block error when the thumbnail fetch fails).
  • End-to-end on nightly post-merge — search/browse/read (incl. image and a PDF), one read-only and one destructive platform op, catalog_preview on text + image + small PDF.

Depends on

  • The Quilt Platform MCP Server's CORS + HTTP entry-point release — already deployed to the qur dev stack.
  • A deployment-side change that routes registry-<stack>.domain.com/mcp/platform/* and sets CORS_ORIGINS on the MCP task — open, dev-stack-deployed; auto-deploys to nightly on merge.

Greptile Summary

This PR wires the Quilt Platform MCP Server into Qurator, adding ~25 platform tools (packages, search, S3, Athena, tabulator) to the catalog's chat UI. It introduces a complete connector abstraction layer (Connectors.ts, Mcp.ts) with lifecycle state machine, heartbeat, probe-driven reconnect, and an AwaitingConnector conversation state that gates LLM turns when connectors are unresolved. The catalog_global_getObject tool is reshaped into catalog_preview with a two-path read strategy guided by a new system message.

Confidence Score: 4/5

PR is safe to merge; no P0/P1 findings. All P2 issues are accounting subtleties and documentation gaps that don't break any current code path.

No critical or blocking bugs found. The four P2 comments cover: a nuanced health-accounting difference for read-only retries, a missing WEBP case in the thumbnail format switch that causes a graceful text fallback rather than a crash, a documentation gap on why HTTP 5xx errors are not retry-eligible, and a frozen-URL note for the dev override. All lifecycle, state-machine, auth, and wire-protocol paths look correct.

catalog/app/components/Assistant/Model/GlobalContext/preview.ts (WEBP format switch), catalog/app/components/Assistant/Model/Connectors/Connectors.ts (retry/health accounting), catalog/app/components/Assistant/Model/Assistant.tsx (fiber-leak on aborted render)

Important Files Changed

Filename Overview
catalog/app/components/Assistant/Model/Connectors/Connectors.ts New file: full connector lifecycle (Connecting → Ready → Disconnected → Failed), heartbeat, probe-driven reconnect, callTool gating, health ref, React bridges. Well-structured; one minor retry-health accounting subtlety.
catalog/app/components/Assistant/Model/Connectors/Mcp.ts New file: MCP wire client over Effect HttpClient; SSE parsing, error taxonomy, bearerPassthru adapter. Solid schema validation at every response boundary; tracerPropagation disabled to avoid CORS rejection.
catalog/app/components/Assistant/Model/Conversation.ts Adds AwaitingConnector state + ConnectorReady action; centralizes gating in advanceFromEvents; merges connector contextContribution into every LLM prompt at request time.
catalog/app/components/Assistant/Model/Assistant.tsx Wires platform connector config, allocates ConnectorsService via useConst with documented fiber-leak risk on aborted renders, provides it to ConversationActor layer.
catalog/app/components/Assistant/UI/Chat/Chat.tsx Adds AwaitingConnectorState renderer, connector helper-text with severity coloring, per-connector state subscriptions via Actor.useState in a map (eslint suppressed, documented as stable).
catalog/app/components/Assistant/UI/Chat/DevTools.tsx Adds ConnectorsPanel with per-connector state display, manual retry/ack actions, and live contextContribution preview. Effect stream cleanup is correct.
catalog/app/components/Assistant/Model/GlobalContext/preview.ts Renames getObject → useCatalogPreview; adds single s3_uri input schema; keeps two-path read logic (image thumbnail vs. native Document block). Missing WEBP case in thumbnail format switch.
catalog/app/components/Assistant/UI/Chat/Input.tsx Adds helperText/helperSeverity props; severity-tints FormHelperText via MUI JSS specificity trick. Clean change.
catalog/app/utils/Actor.ts Implements the previously TODO'd actor listener interrupt on unmount; exports Actor interface. Correct use of useEffect cleanup.
catalog/app/containers/Auth/selectors.js Adds token selector reading the bearer string directly; consistent with existing exp selector pattern; PFSCookieManager refactored to use it.

Sequence Diagram

sequenceDiagram
    participant UI as Chat UI
    participant Conv as ConversationActor
    participant CS as ConnectorsService
    participant LC as Lifecycle Fiber
    participant MCP as Platform MCP Server

    UI->>Conv: Action.Ask
    Conv->>CS: isBlocked
    alt blocked
        CS-->>Conv: true
        Conv->>CS: fork awaitUnblocked
        Conv-->>UI: State.AwaitingConnector
        LC-->>CS: connector flips Ready
        CS->>Conv: Action.ConnectorReady
    end
    Conv->>CS: contextContribution
    CS-->>Conv: tools + connector overview messages
    Conv->>MCP: LLM request via Bedrock
    MCP-->>Conv: LLMResponse with toolUses
    loop tool calls
        Conv->>CS: byId[connId].callTool
        CS->>LC: gate on Ready state
        LC->>MCP: POST bearer token
        MCP-->>LC: McpToolResult
        LC-->>Conv: Tool.Result
    end
    Conv->>MCP: LLM request with tool results
    MCP-->>Conv: final text response
    Conv-->>UI: State.Idle
Loading

Comments Outside Diff (1)

  1. catalog/app/components/Assistant/UI/Chat/Chat.tsx, line 519-533 (link)

    P1 Conditional Actor.useState calls violate the Rules of Hooks

    The comment acknowledges this but marks it safe because "connectors.byId is built once … and never re-keyed." That guarantee holds today, but the comment is the only guard. The ESLint rule is suppressed rather than satisfied. If connectors (the ConnectorsService) ever comes from a re-run of useConnectors (e.g., after a future hot-reload, StrictMode double-invoke, or a different connector list), the hook call-count will differ between renders and React will throw. Object.values(connectors.byId).map(c => Actor.useState(c.state)) is the concrete violation line.

    A safe alternative is to lift the per-connector subscription into its own small component (<ConnectorRow connector={c} />) so each hook call is unconditionally inside its own render, or to aggregate state via the existing useIsBlocked + a single merged stream hook, avoiding the per-connector useState calls at the Chat level entirely.

Reviews (6): Last reviewed commit: "Connectors.spec: restore precondition as..." | Re-trigger Greptile

@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 21, 2026

Codecov Report

❌ Patch coverage is 67.06282% with 194 lines in your changes missing coverage. Please review.
✅ Project coverage is 46.37%. Comparing base (cea497e) to head (34c862a).
⚠️ Report is 1 commits behind head on master.

Files with missing lines Patch % Lines
...alog/app/components/Assistant/UI/Chat/DevTools.tsx 2.00% 45 Missing and 4 partials ⚠️
...g/app/components/Assistant/Model/Connectors/Mcp.ts 74.83% 37 Missing and 2 partials ⚠️
catalog/app/components/Assistant/UI/Chat/Chat.tsx 5.12% 31 Missing and 6 partials ⚠️
...talog/app/components/Assistant/Model/Assistant.tsx 11.53% 20 Missing and 3 partials ⚠️
...omponents/Assistant/Model/Connectors/Connectors.ts 92.57% 17 Missing and 2 partials ⚠️
...omponents/Assistant/Model/GlobalContext/preview.ts 13.33% 10 Missing and 3 partials ⚠️
...log/app/components/Assistant/Model/Conversation.ts 88.23% 4 Missing ⚠️
catalog/app/components/Assistant/UI/Chat/Input.tsx 25.00% 2 Missing and 1 partial ⚠️
...app/components/Assistant/UI/Chat/MessageAction.tsx 25.00% 3 Missing ⚠️
catalog/app/utils/Actor.ts 0.00% 3 Missing ⚠️
... and 1 more
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #4840      +/-   ##
==========================================
+ Coverage   45.69%   46.37%   +0.68%     
==========================================
  Files         829      832       +3     
  Lines       33532    34097     +565     
  Branches     5698     5824     +126     
==========================================
+ Hits        15323    15814     +491     
- Misses      16212    16277      +65     
- Partials     1997     2006       +9     
Flag Coverage Δ
api-python 93.14% <ø> (ø)
catalog 21.26% <67.06%> (+1.70%) ⬆️
lambda 96.63% <ø> (ø)
py-shared 98.18% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Comment thread catalog/app/components/Assistant/Model/PlatformMcpContext/client.ts Outdated
Comment thread catalog/app/components/Assistant/Model/PlatformMcpContext/client.ts Outdated
@nl0 nl0 force-pushed the feat/qurator-mcp branch from ba80253 to 65db5f5 Compare April 23, 2026 05:30
Adds a `PlatformMcpContext` provider that connects to the Quilt Platform
MCP Server (PMS), discovers its tools via `tools/list`, and exposes them
to Qurator's existing tool system. Each MCP tool is namespaced
`mcp__platform__<name>` to avoid collisions with catalog-native tools.

This makes PMS's 25 tools (packages, search, S3, Athena, tabulator)
available inside the catalog chat experience, matching what external MCP
clients like Claude Web get via the Connect Server.

Implementation notes:
- Thin JSON-RPC client over Effect. MCP's official JS SDK doesn't
  officially support browsers (README lists Node/Bun/Deno) and the
  protocol over stateless HTTP is trivial enough to implement directly.
  client.ts + bridge.ts collapsed into a single Mcp.ts; tagged errors
  (McpAuthError, McpTransportError, McpProtocolError, McpRpcError)
  drive per-kind handling via Effect.catchTags. index.ts uses
  runtime.runFork to drive bootstrap and Fiber.interrupt for cleanup —
  in-flight fetches are actually aborted on unmount.
- MCP server runs stateless HTTP, so there's no persistent connection —
  each tool call is an independent POST carrying the current session
  token.
- Auth: catalog session token (HS256 JWT) is forwarded as
  `Authorization: Bearer <token>`. PMS doesn't validate — it passes
  through to Registry, which authenticates on every call, same as
  `/auth/get_credentials` for Bedrock creds.
- Bridge bypasses `Tool.make()`'s Effect-Schema decoding and passes
  MCP's JSON Schema straight through to Bedrock; server-side FastMCP +
  Pydantic is authoritative on input validation.
- MCP URL: `${cfg.registryUrl}/mcp/platform/mcp`, overridable via
  `localStorage.QUILT_MCP_URL` (same pattern as `QUILT_BEDROCK_MODEL_ID`).
- Reads `quilt-platform://search_syntax` as a context message to help
  the LLM construct better search queries.
- SSE parser normalizes CRLF → LF before splitting on blank lines
  (FastMCP emits `\r\n\r\n` as event separator).
- tsconfig.target=ES5 downlevel emit breaks
  `class X extends Eff.Data.TaggedError(...)(...)` — plain classes with
  `_tag` field are the Effect-compatible workaround (catchTags only
  inspects `_tag`).
- Drops `catalog_global_getObject` (superseded by MCP's `object_read`)
  and deletes the now-unused `preview.ts`.

Tests: 14 cases in Mcp.spec.ts against stub McpClient records — covers
mapContent, buildTool, loadContext success/error paths.

Part of qhq-5d0 (Qurator ↔ Platform MCP Server integration).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@nl0 nl0 force-pushed the feat/qurator-mcp branch from cf0804f to 867dd36 Compare April 27, 2026 10:33
nl0 and others added 21 commits April 27, 2026 13:49
The module's semantic is "platform tools and context for the assistant",
mirroring `GlobalContext`. MCP is the wire transport — its prefix on the
folder name leaks an implementation detail.

Wire-level types and files in `Mcp.ts` keep the `Mcp*` prefix because they
really do describe the MCP transport (`McpClient`, `McpToolDescriptor`,
`McpContent`, `Mcp*Error`).

- Folder rename via git mv.
- Public hook: usePlatformMcpContext → usePlatformContext (re-exported as `use`).
- Module logger tag updated to 'PlatformContext'.
- Header comment trimmed.

Tracked in qhq-382.1.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three coordinated refinements that share the same files; ship together.

1. `getToken` is now `() => Effect<string, McpAuthError>`. The hook reads
   the live token from `redux.useStore()` per call via a memoized
   `selectToken = createSelector(authSelectors.tokens, t => t?.token)`,
   following the pattern in `utils/PFSCookieManager.tsx`. The
   `tokenRef` indirection and stale-closure-by-effect-deps risk are
   gone; `Mcp.ts` uses `yield* options.getToken()` directly.

2. JSON-RPC IDs come from `uuid.v4` (already a catalog dep, used in
   `Conversation.ts:16`). Drops the in-closure `let nextId = 0`. IDs
   are strings — JSON-RPC permits this.

3. `McpContextState` → `PlatformContextState` as a `_tag`-discriminated
   union (Loading / Ready / Error). Eliminates the optional-fields
   shape. Constructors via plain object factories rather than
   `Eff.Data.taggedEnum` because the babel ES5 transform that broke
   `Eff.Data.TaggedError` for our error classes has the same problem
   here. `Match.tag` and conditional narrowing both work.
   `Context.usePushContext` only fires non-empty payloads on `Ready`.

Tests: Mcp.spec.ts grows from 14 to 21 cases. New coverage:
- Wire-level `make()` behavior via stubbed global fetch:
  - Two consecutive RPCs produce different IDs (uniqueness, not format).
  - Bearer token sourced from the supplied `getToken` effect.
  - `McpAuthError` short-circuits without firing fetch.
- `PlatformContextState` constructor shape per tag.
- Existing `loadContext` cases updated to assert `_tag` instead of
  optional fields, with type narrowing via `if (state._tag !== 'Ready')`.

Tracked in qhq-382.2.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replaces raw `fetch` with `@effect/platform`'s HttpClient and validates
every wire response through `Eff.Schema` decoders.

Wire transport
- `HttpClientRequest.post` + `bearerToken` + `setHeaders` + `bodyText`
  builds the request; `httpClient.execute` sends it. Cancellation and
  AbortSignal still work — they're integrated by the layer.
- `Eff.Effect.provide(FetchHttpClient.layer)` is scoped to `post()`
  inside `make()` rather than baked into the runtime; PlatformContext
  is the only consumer of HttpClient in catalog so far. Internal-only;
  the public `McpClient` interface stays `R = never`.
- `HttpClientError`s map to `McpTransportError` with the underlying
  cause merged into `detail` for triage.

Wire validation
- Schemas for the JSON-RPC envelope, `tools/list` result,
  `tools/call` result, `resources/read` result, and the discriminated
  `McpContent` union (Text / Image / Resource + permissive `{ type }`
  fallback for forward compat with future content types).
- `inputSchema` on `McpToolDescriptor` stays `Schema.Unknown` —
  passthrough to Bedrock as designed.
- `decodeWith(schema, label)` helper maps any `ParseError` to
  `McpProtocolError` carrying the label so failures are
  source-identified ("envelope: …", "tools/list: …", etc.).
- `rpc()` no longer takes a generic `<T>` and casts; each operation
  decodes its own result envelope.

SSE
- Pulled into a pure `parseSseToJson(text)` function. Whether the
  response comes back as JSON or `text/event-stream`, we collect the
  full body via `resp.text` then route through the same Schema decode.
- Removed the streaming reader that owned the `Response` directly —
  HttpClient gives us `resp.text` and that's enough for the one-shot
  SSE responses FastMCP emits under `stateless_http=True`.

Tests: Mcp.spec.ts grows from 21 to 28. New coverage:
- Malformed envelope → `McpProtocolError` with `envelope` label.
- `tools/list` shape mismatch (tools: string instead of array) →
  `McpProtocolError` with `tools/list` label.
- SSE happy path: `Content-Type: text/event-stream`, CRLF-separated
  `data:` line, decodes through the schema.
- `parseSseToJson` standalone: CRLF parsing, multi-line data, empty
  stream returns null, malformed JSON throws.
- Stub fetch updated to decode `Uint8Array` request bodies (the form
  `bodyText` produces) before JSON.parse — this caught a real-world
  papercut for anyone wiring fetch stubs against this layer.

Tracked in qhq-382.3.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Tool collection keys go from `mcp__platform__<name>` to
`platform__<name>`. MCP is the wire protocol, not a name the LLM (or
the chat UI showing tool calls) needs to see. Future MCP servers
would use their own service prefix, e.g. `catalog__<name>`.

Single-line catalog change: extract the prefix as `TOOL_PREFIX`
constant exported from `Mcp.ts` so tests reference the same source
of truth.

Tracked in qhq-9sh.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a status banner that renders inline at the top of the conversation,
alongside the welcome message:

- Loading: "Loading platform tools…"
- Ready:   "Platform tools ready · N tools available"
- Error:   "Couldn’t reach platform tools" + retry action + collapsible
  Details (the tagged error's `.message`).

Wrapped in `aria-live="polite"` so screen readers announce state
transitions without interrupting the user.

Hook & retry plumbing
- `usePlatformContext` now returns a `PlatformContextHandle` =
  `{ state, retry }` rather than the bare state. Manual retry bumps
  an internal token; the existing `useEffect` keys on it, killing
  the in-flight fiber and re-running `loadContext` from scratch.
  Setting state back to `Loading` at the start of each effect run
  gives the user a consistent loading affordance after retry.
- Assistant API exposes the handle as `platform`. Threaded through
  `UI.tsx` → `Chat.tsx` → `<PlatformStatus>`.

Auto-retry inside `loadContext`
- Transport-class failures (`McpTransportError`) auto-retry up to
  twice with `Schedule.exponential('500 millis')` — total ~1.5s
  before surfacing as Error so the UI lands within a reasonable
  window for a chat session.
- Auth / Protocol / RPC errors do NOT auto-retry. Auth needs user
  action (re-login); the others are code/server mismatches that
  won't change on a re-attempt.

Helper
- `matchState(state, { Loading, Ready, Error })` exported from
  `Mcp.ts` (and re-exported via `PlatformContext`) for ergonomic
  pattern-match in render paths. Pure data dispatch — no Effect
  import overhead at the call site.

Tests: Mcp.spec.ts grows from 28 to 34. New coverage:
- Retry-recovery: stub initialize fails twice with `McpTransportError`
  then succeeds → `loadContext` returns `Ready`, observed 3 attempts.
- No retry on `McpAuthError` — exactly 1 attempt.
- No retry on `McpProtocolError` — exactly 1 attempt.
- `matchState` dispatch for each branch with payload assertions.

Manual verification still applies for the chat-UI rendering itself:
- Force the three states on the qur dev stack (normal load, stop
  local PMS for transport-error, log out for auth-error) and
  confirm the inline banner renders + retry button refires bootstrap.

Tracked in qhq-382.4.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…esource

Three things that were wrong or partial in the previous pass:

1. **`PlatformContextState` now uses `Eff.Data.taggedEnum`.**
   The earlier pass sidestepped `taggedEnum` on the (incorrect) belief that
   the catalog's ES5 target broke the same way it does for
   `Data.TaggedError`. Wrong: catalog uses `ts-loader` (no babel), and
   `taggedEnum`'s runtime is a `Proxy` over plain functions — no class
   extension. `Data.taggedEnum` is used extensively elsewhere in the
   Assistant module (Content.ts, Conversation.ts, LLM.ts).

   Constructors and `$match` come for free; the manual factories +
   `matchState` helper are gone. UI consumers use
   `PlatformContextState.$match(state, { Loading, Ready, Error })`.

2. **Tests inject fetch only via `FetchHttpClient.Fetch` service tag.**
   The previous "belt + suspenders" (service tag AND `vi.stubGlobal`) was
   a misdiagnosis — the actual bug was the test stub not decoding
   Uint8Array request bodies before `JSON.parse`. Once that was fixed,
   the canonical Effect injection works on its own. No globalThis
   pollution, no `afterEach` cleanup needed.

3. **`quilt-platform://athena` surfaced as Qurator context.**
   Bootstrap now reads both `search_syntax` and `athena` resources
   (concurrent, best-effort — failures fall through to no message).
   Audit decisions per the qhq-5d0 design: `buckets` is dropped
   (already in `GlobalContext/stack.ts`), `me` is dropped (already in
   Qurator core context). XML tag dropped the redundant `mcp-` token:
   `platform-search-syntax` / `platform-athena`.

Tests: 35 pass (was 34). New case for the athena resource. Renamed the
existing `matchState` describe to `PlatformContextState.$match`.

Tracked in qhq-382.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Mechanical first step of the Connectors refactor (qhq-5d0.2 step 1):
the wire client moves out of the soon-to-be-deleted PlatformContext
folder into its new home. PlatformContext/index.ts re-points its imports
at ../Connectors/Mcp; tests still green. Subsequent steps build the
Connectors service around the wire client and delete PlatformContext
entirely.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Step 2 of qhq-5d0.2. Defines the agent-level abstraction surface:
ConnectorId, TransportConfig, ConnectorConfig, ConnectorState, the
ConnectorRuntime + ConnectorsService interfaces, and a Connectors Tag.

ConnectorState is a tagged enum (Connecting / Ready / Disconnected /
Failed) covering both transient (auto-retry) and ack-needed
transitions; helper predicates (stateIsTransient, stateRequiresAck,
stateIsBlocked) give callers the same view aggregate methods will use.

Layer factory is a no-op placeholder. Step 3 fills in the per-connector
lifecycle fiber; step 4 wires service primitives.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…Runtime)

Step 3 of qhq-5d0.2. Adds the long-running per-connector state machine
the layer needs to flip Connecting → Ready → Disconnected → Failed.

Wire client (Mcp.ts) gains a ping() method (MCP base protocol). The
Connectors module then owns:

- bootstrap (initialize + tools/list, no resource autoload — D21).
  Tools are namespaced `<id>__<name>` and built with executors that
  route through the connector's gated callTool (D27).
- manageConnector — the lifecycle effect. Sole writer to the
  per-connector state ref (D23). Loops Connecting → Ready (heartbeat
  loop with reconnect budget) → Failed → wait-for-retry → repeat.
- runHeartbeat — pings every 30s with 5s timeout, updates a shared
  Health ref. Threshold of 2 consecutive failures (from ping or tool
  calls) flips state to Disconnected via a SubscriptionRef.changes
  watcher (D24).
- runReconnectBudget — 3 attempts at 1s/3s/9s; success returns to
  Ready, exhaustion → Failed{acked:false} (no silent degradation,
  D22).
- makeConnectorCallTool — gates on Ready, fast-fails otherwise. On
  transport error bumps the same Health counter heartbeat watches; on
  success resets it. Server-application errors don't disturb health.
- buildConnectorRuntime — factory that allocates state/health/controls
  refs, wires callTool, forks the lifecycle fiber under the surrounding
  Scope. Optional client param for tests.

Connectors.spec.ts covers state predicates, bootstrap success/failure,
retry from Failed, acknowledge → Failed{acked:true}, callTool gating
in non-Ready states, mapping of tool-side errors, and TestClock-driven
heartbeat / reconnect / threshold-crossing scenarios.

The Layer factory is still the no-op placeholder; step 4 wires aggregate
service primitives across all per-connector runtimes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Step 4 of qhq-5d0.2. The empty Layer placeholder is replaced with the
real composition:

- Allocate one ConnectorRuntime per ConnectorConfig (fork lifecycle
  fibers under the layer scope; D32).
- isTransient / requiresAck / isBlocked: read all per-connector states
  via SubscriptionRef.get and aggregate with the matching predicates.
- awaitUnblocked: merge state.changes streams across all connectors;
  re-evaluate isBlocked on each emission; complete on first false.
  Zero-config layer short-circuits to Effect.void.
- contextContribution (D28): walk byId, contribute Ready connectors'
  tools, render an XML <connectors> overview message with
  <connector state="ready|unavailable"> per stable connector
  (D29). Transient / acked-false states are skipped — the
  Conversation actor blocks on AwaitingConnector before reaching this.

Layer factory takes an optional clientById map (test seam) so the
spec can drive the real layer end-to-end with stub wire clients.
buildConnectorRuntime grew the same seam.

Connectors.spec.ts gains 6 layer-level tests: initial blocked →
unblocked transition, awaitUnblocked completion, contextContribution
shape for Ready and Failed{acked:true}, multi-connector OR
aggregation, zero-config short-circuit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Step 5 of qhq-5d0.2. The Connectors module replaces PlatformContext at
every catalog wiring point.

Connectors (index.ts):
- buildService extracted from layer so React can build the service via
  runtime.runSync into a manually-managed Scope (the Layer.scoped wrapper
  is kept for tests / pure-Effect callers).
- Re-exports McpAuthError, McpError + family so the catalog wires auth
  without importing the wire module directly.

Mcp.ts: drops PlatformContextState, loadContext, RESOURCES, TOOL_PREFIX,
buildTool, makeExecutor, AnyRecord, the bootstrap retry policy, and the
XML/Tool imports they pulled in. The Connectors lifecycle owns
bootstrap and the Connectors module owns tool-descriptor construction.
mapContent stays — Connectors uses it. Tests for the removed code go
with it; the wire-level make() integration / parseSseToJson / mapContent
suites remain.

Assistant.tsx:
- usePlatformConnectorConfig: the platform connector config built at
  React level (URL from cfg.registryUrl with localStorage override; auth
  Effect that re-reads the redux session token on every invocation per
  D33).
- useConnectors: allocates a manual Scope, runSync's buildService into
  it, returns the ConnectorsService. Scope.close on unmount interrupts
  every per-connector lifecycle fiber (D32).
- Connectors.layer is composed alongside Bedrock + Context layers via
  Layer.succeed(Connectors, service) so the actor sees the service. The
  same service object is exposed on AssistantAPI as `connectors`
  (replacing the previous `platform` PlatformContextHandle).
- The PlatformContext folder is deleted; index.ts re-export drops it
  and adds Connectors.

UI: Chat.tsx no longer renders the inline PlatformStatus banner — the
persistent fixed status region (D30) lands in step 7. UI.tsx stops
threading platform through. Aside from the missing status banner, chat
behavior is unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Step 6 of qhq-5d0.2.

State + Action additions:
- State.AwaitingConnector { events, timestamp, waiter } — D25 + D26.
  Carries no per-connector snapshot; UI reads connector states live
  from the SubscriptionRefs in connectors.byId.
- Action.ConnectorReady — fired by the waiter fiber on awaitUnblocked.

advanceFromEvents helper centralizes the gating decision used at the
two decision points (D25):
1. Reads connectors.isBlocked synchronously (handler discipline — no
   async observation in handlers).
2. If blocked, forks a waiter that yields connectors.awaitUnblocked
   then dispatches ConnectorReady. Stores the fiber on AwaitingConnector
   so Abort can interrupt it.
3. Otherwise forks the LLM request and transitions to
   WaitingForAssistant.

Wired into:
- Idle.Ask (existing append-event-then-forkRequest flow goes through
  advanceFromEvents instead).
- ToolUse.ToolResult when the last call resolves (replaces the inline
  forkRequest branch).
- AwaitingConnector handlers: ConnectorReady fires advanceFromEvents
  again (state has typically transitioned to unblocked by the time
  this lands); Abort interrupts the waiter and returns to Idle;
  Discard mirrors Idle's behavior.

llmRequest folds in connectors.contextContribution at request time
(D28): no React-side bridge, no Context.usePushContext from
Connectors.

Conversation.spec.ts covers all five paths under a stub connectors
service whose isBlocked is a SubscriptionRef tests can flip mid-run.
Confirms LLM is NOT invoked after Abort even if the connectors
unblock.

Actor R type extended to include Connectors.Connectors.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Step 7 of qhq-5d0.2. The previous in-chat `<PlatformStatus>` banner
(removed in step 5) is replaced by a persistent fixed status region in
the chat layout (D30).

Connectors module gains two React bridges over its SubscriptionRefs:
- useConnectorState(connector): subscribes to one connector's state
  ref via Stream.runForEach, mirrors changes into React state.
- useIsBlocked(service): re-evaluates the aggregate isBlocked
  predicate whenever any connector's state changes.

Chat.tsx gains:
- ConnectorStatusRow: per-connector row that pattern-matches state and
  renders the appropriate affordance — spinner for transient, ready /
  unavailable badges, Retry / Continue-without buttons for failed-not-
  acked, Retry-only for failed-acked. Buttons fork connector.retry /
  connector.acknowledge.
- ConnectorStatusRegion: stacks rows above the message history, always
  visible regardless of scroll. Renders nothing when no connectors are
  configured.
- Input gating: inputDisabled = state !== 'Idle' || connectors.isBlocked.
  Failed{acked:true} unblocks the input so the user can ask questions
  even though tools from that connector won't run (D29 'unavailable'
  state — they reach the prompt and the LLM is told they're out).

UI.tsx threads connectors through to Chat.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Step 8 of qhq-5d0.2. The Assistant DevTools section grows a Connectors
panel between Recording controls and the Context/State/Prompt JSON
displays (D31).

Per-connector row: id + a one-line state summary (Failed shows
acked + tagged-error detail; Disconnected shows retrying flag +
error; Ready shows tool count) + manual Retry / Ack buttons that fork
the corresponding effects on the runtime.

Live preview of `connectors.contextContribution` rendered as a JSON
tree. Re-evaluates whenever any connector's state changes via the same
mergeAll pattern useIsBlocked uses, so the panel matches what the LLM
would see if a turn fired right now.

DevToolsProps gains `connectors`; Chat threads it through.

Closes the UI work scope of qhq-5d0.3 (the bead is subsumed by this
panel; the dedicated 'platform-tools state in DevTools' work no longer
needs its own tracking).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Final type-safety pass — discovered via tsc audit beyond what the LSP
incremental cache surfaced.

Chat.tsx: add an AwaitingConnector branch to Conversation.State.$match
in the message history. Renders a 'Waiting for connectors…'
MessageContainer with an abort action, mirroring the WaitingState shape
the user sees while the LLM is mid-call. The persistent connector status
region (D30) handles the actual connector status; this entry is just
the in-history placeholder so the abort action stays where the user
expects it.

utils/Actor.ts: export the Actor interface so Conversation.spec.ts can
type the actor handle returned by Actor.start. The interface was
already part of the module's public shape via start's return type;
exporting it makes that explicit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Addresses BLOCKERs surfaced by the fresh-eyes review of qhq-5d0.2.

1. Stale-control drain (D22 user-action gate). Retry / acknowledge
   buttons in DevTools and elsewhere can be clicked while the lifecycle
   is in Connecting / Ready / Disconnected. The unbounded controls
   queue used to buffer those tokens and consume them on the NEXT
   Failed entry — meaning a stray click during Ready could short-circuit
   the next user-action gate (e.g. exiting Failed{acked:false} without
   the user actually clicking Retry; or skipping the gate entirely with
   a buffered acknowledge).
   Fix: takeAll-drain the queue at entry to awaitRetryControl. Tokens
   offered while parked at Failed are processed normally; tokens from
   prior cycles get cleared.

2. Read-only-tool auto-retry (D24). Read-only tools auto-retry once on
   transport error before the failure counts toward the health
   threshold; destructive (or unannotated) tools fail immediately.
   - Mcp.McpToolDescriptor + schema gain optional `annotations`
     (readOnlyHint, destructiveHint).
   - ConnectorRuntime.callTool grows an optional CallToolOptions arg
     with `retryOnTransport`.
   - makeConnectorCallTool re-attempts once on McpTransportError when
     the option is set; only the second attempt's outcome touches
     health.
   - buildConnectorTool reads mcp.annotations.readOnlyHint and passes
     retryOnTransport accordingly.

Tests added to Connectors.spec.ts (27 total, +5 new):
- retryOnTransport on a single transport error → second attempt wins,
  status=success, attempts=2.
- retryOnTransport off (default) → single transport error fails,
  attempts=1.
- readOnlyHint annotation routes through buildConnectorTool to set
  retryOnTransport — verifies the wiring end-to-end through a tool's
  Tool.Descriptor executor.
- stale retry/acknowledge offered while Ready do NOT prematurely apply
  on transition (state stays Ready, no extra bootstrap).
- stale retry/acknowledge before first Failed entry are drained on
  Failed entry — connector parks at Failed{acked:false} and waits
  rather than auto-exiting.

Conversation.spec.ts (6 total, +1):
- AwaitingConnector re-enters fresh on each Ask cycle: aborting and
  re-asking yields a different waiter handle (no stale-fiber reuse).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Promotes the implicit McpClient interface to a named Backend contract
in Connectors/index.ts. Lifecycle becomes generic in Backend; concrete
backends adapt their wire protocol into it.

The TransportConfig single-variant taggedEnum was theatre — the real
abstraction was always McpClient (production via Mcp.make, tests via
stubClient). Names it correctly and drops the indirection.

Backend (Connectors/index.ts):
- BackendError: { _tag, message, transient? } — plain interface, no
  classes. transient drives the read-only-tool retry-once branch and
  the health-threshold bump (D24).
- BackendToolDescriptor: name, description, inputSchema, readOnly?
  (generic version of MCP's readOnlyHint).
- Backend: { initialize, listTools, callTool, ping } — callTool returns
  Tool.Result directly (mapContent moves to backend impl).

Mcp.bearerPassthru (Connectors/Mcp.ts):
- 1st-party MCP backend with catalog auth passthrough. Caller resolves
  the bearer token (typically a catalog session JWT) on every call;
  backend forwards as Authorization: Bearer, no token management.
- Distinct from a hypothetical Mcp.oauth(...) (would manage the OAuth
  flow) along the auth-scheme variation axis.
- adaptDescriptor lifts McpToolDescriptor → BackendToolDescriptor.
- adaptResult maps wire content via mapContent and folds isError into
  Tool.fail / Tool.succeed.
- adaptError preserves the wire error _tag and marks
  McpTransportError as transient (only).
- getToken: () => Effect<string|null> — null short-circuits to
  internal McpAuthError; cleaner than nested suspend-with-fail-class
  at the catalog wiring site.

Drops:
- TransportConfig taggedEnum + makeTransportClient + $match.
- clientById parameter on buildConnectorRuntime / buildService / layer.
- All Mcp.* runtime references in lifecycle code.
- Connectors.McpAuthError re-export and the Assistant.tsx
  construction site.
- McpError / McpTransportError / McpProtocolError / McpRpcError type
  re-exports from Connectors (no consumers).

Touches:
- Assistant.tsx: TransportConfig.Mcp({ url, auth }) → Mcp.bearerPassthru
  ({ url, getToken }). Adds a direct Mcp import — backend factories
  are concrete by definition; that's the right abstraction line.
- Connectors.spec.ts: stubClient → stubBackend (Backend shape, drops
  readResource); transportError / authError become plain BackendError
  literals; tests construct config-with-backend per-test (no clientById
  parameter); replaces "delegates to client.callTool / maps content"
  with "delegates to backend.callTool / forwards content" (mapContent
  moved out).
- New test: "non-transient backend errors do NOT bump health" — covers
  the auth-vs-transport split now that BackendError.transient is
  explicit instead of the McpTransportError _tag check.
- Mcp.spec.ts: unchanged (still tests the wire client via Mcp.make,
  Mcp.mapContent, error classes).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Fresh-eyes review of qhq-5d0.4 (5decfa3) flagged three concrete issues
plus two known-limitation items. Addresses what's clearly in-scope;
the rest filed as qhq-5d0.5 and qhq-5d0.6.

BackendError: abstract _tag, wire detail in `cause`.
- BackendError._tag is now BackendErrorTag = 'Transport' | 'Auth' |
  'Protocol' | 'Application' — connector-layer vocabulary, not wire.
- `cause?: string` carries the wire-level tag (McpTransportError, etc.)
  for telemetry / DevTools detail.
- Mcp.bearerPassthru's adaptError remaps via ERROR_TAG_MAP and sets
  cause = e._tag.
- Lifecycle's transientError factory uses _tag: 'Transport' with
  cause = the lifecycle-internal kind (PingTimeout, HealthDegraded,
  ReconnectExhausted).
- Tests updated to assert abstract tags ('Transport', 'Auth').

Conversation.spec.ts: ToolUse → blocked → AwaitingConnector.
- Covers the second `advanceFromEvents` gate point (ToolResult last
  call), which was previously only exercised at the Idle.Ask gate.
- Gates the LLM stub on a deferred so test sequencing is deterministic:
  Ask → WaitingForAssistant → block → release LLM → ToolUse →
  Tool.execute (unknown tool → fail) → ToolResult → AwaitingConnector.

manageConnector: reset health BEFORE flipping to Ready.
- Removes a microtask hazard where a tool call observing the Ready
  transition could see a stale `consecutiveFailures` from a prior
  Disconnected cycle. Free fix; preserves same observable behavior.

Documented limitations (filed as follow-ups):
- qhq-5d0.5 — useConnectors allocates lifecycle fibers during render.
  React abort-before-commit (Suspense unwind / ErrorBoundary catch /
  concurrent-mode discard) skips the cleanup useEffect and leaks
  fibers. Mitigated today by AssistantProvider mounting at app root
  above Suspense boundaries; proper fix defers allocation into
  useEffect at the cost of a Loading state on AssistantAPI.
- qhq-5d0.6 — D24 escalating heartbeat (5s → 30s during Disconnected)
  not implemented. Only the 3-attempt bootstrap budget runs; no
  transport-only probe between attempts.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three parallel reviewers (reuse / quality / efficiency) of the qhq-5d0
diff converged on a small set of clear wins. Applied:

- Drop `useConnectorState` — `Actor.useState(connector.state)` already
  does the SubscriptionRef-into-React-state subscription. Two call
  sites (Chat.tsx ConnectorStatusRow, DevTools.tsx ConnectorPanelRow)
  switch over.
- Drop dead `effectJsonSchema` build in `buildConnectorTool`. Effect's
  permissive AnyRecord schema was generated on every descriptor build,
  then immediately overwritten by `descriptor.inputSchema` spread; only
  `description` survived. Inline directly. Drops the AnyRecord constant
  too (unused).
- `resetHealth` helper: skip SubscriptionRef emission when health is
  already INITIAL_HEALTH. Used in the hot paths (per-call success in
  `makeConnectorCallTool`, per-tick success in `runHeartbeat`); the
  state-transition sites in `manageConnector` keep `set` since they're
  at-most-once-per-cycle.
- Slim `manageConnector(config, backend, …)` and `bootstrap(config,
  backend, …)` and `runReconnectBudget(config, backend, …)` — `backend`
  was always `config.backend`, since the qhq-5d0.4 Backend extraction
  collapsed transport into the config field.
- Flatten `ConnectorStatusRow`'s 5-branch JSX (Connecting / Ready /
  Disconnected / Failed-acked / Failed-unacked) into data-driven
  variant + single render. Five copies of `<div className={classes.row}>
  <icon><title><text></div>` → one. Saves ~20 LOC; same a11y semantics.
- Drop orphan jsdoc + stale `// Layer (skeleton)` divider in index.ts.

Tests: 57/57 green under Node 20.

Skipped from review:
- `selectToken` promotion to `containers/Auth/selectors` — touches an
  unrelated module; out of PR scope.
- `BackendErrorTag` collapse — was just added to fix the Mcp-tag leak
  flagged in qhq-5d0.4; collapsing would re-introduce it.
- `Stream.map(stateIsBlocked)` for aggregate predicates — incorrect:
  blocked aggregates across all connectors, not the emitting one.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replaces the fixed-schedule reconnect budget (1s/3s/9s, no probing
between attempts) with a probe-driven loop. During Disconnected, ping
at escalating cadence (5s start, ×2 backoff, 30s cap); each successful
ping resets cadence and triggers a bootstrap attempt. Cap at 3
bootstraps. Outer 60s timeout when ping never succeeds → Failed with
synthetic `Transport` error, cause: 'DisconnectedTimeout'.

Why this changes things:
- Before: 13s wall-clock window, no probing, full bootstrap on 1s/3s/9s
  schedule regardless of network state.
- After: probe-driven, so a network blip recovering at e.g. 4s triggers
  a bootstrap immediately rather than waiting for the next sleep tick.
  Up to 60s window if ping stays down (vs. 13s before).

Cadence resets to 5s after each bootstrap attempt (success OR failure)
since the immediately-preceding ping confirmed transport was up;
escalating doesn't reflect reality once that signal exists.

Constants:
- PROBE_CADENCE_INITIAL = 5s
- PROBE_CADENCE_MAX = 30s
- RECONNECT_MAX_ATTEMPTS = 3
- DISCONNECTED_MAX_DURATION = 60s
- HEARTBEAT_TIMEOUT (5s) reused for probe pings

Tests:
- Replaced "reconnect succeeds" → "reconnect succeeds via probe":
  asserts probe-driven path (60s heartbeat → 5s probe → success).
- Replaced "reconnect exhaustion" → split into two tests:
  - "ping never succeeds → DisconnectedTimeout → Failed{!acked}":
    verifies outer 60s timeout fires with the synthetic cause and
    correct ping count under the escalating schedule (2 heartbeat
    + 3 probes at clock 65/75/95).
  - "3 bootstrap failures after successful pings → Failed{!acked}":
    verifies budget exhaustion with the LAST bootstrap error
    preserved (not the synthetic timeout).
- New: "probe cadence escalates 5s → 10s → 20s → 30s" — direct
  schedule verification via interleaved TestClock.adjust + ping count
  assertions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ith `as const`

Webpack/ts-loader inferred each $match callback's `role` literal as
`string` rather than the `'status' | 'alert'` union RowVariant
declares, so the union of branch returns failed to assign to the
declared variable type. Vitest's esbuild path doesn't type-check, so
the regression slipped past local tests.

`as const` on each branch's return narrows role to its literal and
makes the union assign cleanly. Comment notes the divergence so future
edits know to keep the annotation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@nl0
Copy link
Copy Markdown
Member Author

nl0 commented Apr 28, 2026

@greptileai have a look pls

nl0 added 3 commits April 28, 2026 13:19
…calls

Effect's HttpClient auto-injects W3C `traceparent` + Zipkin `b3` headers
from the active span (httpClient.js:154; gated on the
`currentTracerPropagation` FiberRef, default true). The platform MCP
server's CORS allow-list doesn't include these, so the browser preflight
returns 400 "Disallowed CORS headers" and the actual POST never fires.

`HttpClient.withTracerPropagation(false)` on the `execute` Effect locally
sets the FiberRef to false for the request, dropping the headers. We
don't have a tracing collector on the receiving end either, so this is
strictly subtractive.
… from healthy states

Two small model-layer fixes surfaced during DevTools polish:

* `buildConnectorTool` was setting `description` on the Tool.Descriptor
  AND splatting it into the inputSchema. Bedrock then sent the same
  string twice — once as `toolSpec.description`, once inside
  `inputSchema.json` — pure context pollution. The schema is now passed
  through raw.

* `connector.retry` only offered to the controls queue, which is only
  drained from `Failed{!acked}` via `awaitRetryControl`. Clicking
  reconnect from Ready or Disconnected went silently nowhere. Now,
  Ready / Disconnected force a reconnect by degrading health to the
  threshold; the Ready loop's heartbeat-vs-threshold race takes the
  threshold branch and `runReconnectWithProbe` runs. Connecting is
  still a no-op (already in flight); Failed stays on the queue path.
…ne link actions

Replaces the single-line `<id> | state | [Retry][Ack]` row with a flat,
header-led layout that scales to multiple connectors and reads cleanly
when the connector is healthy:

  Connector: <title>
  <hint in muted caption>
  <state> · reconnect · acknowledge
  <error detail, if any>

Specifics:
- Drop the "Connectors" group heading and the bordered card around each
  connector — each connector is now a top-level section in the panel.
- Drop the `(<id>)` suffix on the heading; `<title>` carries the
  user-facing label, `<id>` is dev-prefix detail (still visible via the
  `contextContribution` block).
- No "Hint:" / "State:" labels — content is self-explanatory.
- Inline action affordances styled like the existing `MessageAction`
  pattern (cursor:pointer span, opacity 0.7→1 on hover) — grey + 500
  weight, separated from state with `·`. Avoids `M.Link`'s blue
  default and the baseline misalignment of `component="button"`.
- "Retry" → "reconnect" (matches the lifecycle's actual semantics now
  that retry forces a reconnect from healthy states).
- `reconnect` hidden during transient states (Connecting,
  Disconnected{retrying}); `acknowledge` hidden unless `Failed{!acked}`.
- `Disconnected` rendered with `warning.dark` instead of `warning.main`
  for legibility on the panel's paper background.
- `contextContribution` block hidden when nothing to contribute (no
  Ready or Failed{acked} connectors), and no longer auto-expanded.
- Bottom padding belongs at the end of the section (under the
  contextContribution box), not between state and contribution.
fiskus
fiskus previously approved these changes Apr 29, 2026
Comment thread catalog/app/components/Assistant/UI/Chat/Input.tsx Outdated
@nl0
Copy link
Copy Markdown
Member Author

nl0 commented Apr 29, 2026

Thanks @fiskus, @drernie, @sir-sigurd for the reviews. Here's the disposition.

Landing in this PR

  • drernie's Qurator MCP: tighten retry and bootstrap failure handling #4864 — pulling in the retry/bootstrap fix (split transient from retryable; bound initial Connecting bootstrap; retry() from Connecting is a true no-op). Will square the lifecycle-synthesized retryable flag so it matches adaptError's explicit false.
  • drernie's Docs: CHANGELOG + Qurator.md entries for #4840 #4865 — pulling in the CHANGELOG + Qurator.md docs, with the missing "acknowledge" control mentioned in the Connector Status section.
  • fiskus inline (Input.tsx helper severity) — applying as a class-lookup, restructured so warning/error stack on top of the base hint class (specificity via &$hint).
  • fiskus minor (Mcp.ts:421) — defensive ?.toLowerCase() on the content-type lookup.
  • sir-sigurd Tier-1 Add the .pylintrc file #2useActorLayer listener fiber leak on unmount (mirror the cleanup pattern from Actor.useState).
  • sir-sigurd Tier-1 Improve sessions #5 — JSON-RPC response id echo-validation in Mcp.ts rpc().

Explicitly skipping

  • sir-sigurd Tier-1 Print out the server's error messages #8 (kill-switch / feature flag for the platform connector) — the platform connector is the integration layer between Qurator and the stack. Operating without a kill-switch is the intended posture; falling back to "Qurator drives the catalog only" isn't useful.

Deferred — tracked in qhq-5d0.16

Everything else from the three reviews is captured in the post-merge follow-up epic, with reviewer attribution and pointers to existing siblings (qhq-ekc perf, qhq-mfo destructive UX, qhq-5d0.5 scope leak, qhq-5d0.8 merge dup, qhq-5d0.10 PM truncation, qhq-5d0.1 PMS preview parity).

High-level groups deferred:

Greptile threads (the 04-21 P2 inline pair) are obsolete — both targeted PlatformMcpContext/client.ts which was deleted in the Connectors refactor (the rewrite uses @effect/platform/HttpClient + Eff.Schema envelope decoding).

nl0 added 2 commits April 29, 2026 17:51
Pulls in @drernie's child PR addressing two concerns from the multi-agent
review:

- Split BackendError.transient (health signal) from .retryable (retry-once
  eligibility). HTTP non-2xx: transient:true, retryable:false. Network
  failures: both true. read-only-tool retry now checks .retryable.
- Bound initial Connecting bootstrap with a 60s timeoutFail; on hang,
  transition to Failed{acked:false} with cause ConnectingTimeout.
- retry() during Connecting is a true no-op (no stale control token).

Co-Authored-By: drernie <ernest.prabhakar@gmail.com>

* origin/codex/qurator-mcp-risk-fixes:
  Document retryable default for connector errors
  Fix Qurator MCP retry and bootstrap timeout risks
Pulls in @drernie's docs PR with the user-visible documentation that was
missing from #4840: catalog/CHANGELOG.md entries (Added/Changed) and a
new 'Platform Tools via MCP' section + 'Connector Status' subsection in
docs/Catalog/Qurator.md.

Co-Authored-By: drernie <ernest.prabhakar@gmail.com>

* origin/codex/qurator-mcp-changelog:
  Docs: drop nonexistent "acknowledge" control; clarify continue-without flow
  Docs: CHANGELOG + Qurator.md entries for #4840
drernie
drernie previously approved these changes Apr 29, 2026
Copy link
Copy Markdown
Member

@drernie drernie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

nl0 and others added 4 commits April 29, 2026 18:01
The listener fiber is forkDaemon'd inside `start` (decoupled from the
calling fiber's scope), so its lifetime has to be owned explicitly by
the React hook. Without a cleanup, every Assistant remount accumulates
a daemon. Mirrors the existing cleanup pattern in `useState`. Resolves
the long-standing TODO at the top of useActorLayer.

Surfaced by sir-sigurd's multi-agent review of #4840 (Tier-1 #2).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Defensive lowercase on the response content-type lookup. @effect/platform
  lowercases header keys, but the value can still arrive in mixed case.
- Validate the JSON-RPC response id against the request id (spec §5).
  Stateless HTTP makes this unlikely to trip in practice, but a proxy or
  buggy server emitting a stale id silently returned the wrong result
  before — now fails as McpProtocolError.
- Map HTTP 401 / 403 to the abstract `Auth` tag (transient: false) instead
  of `Transport`. Token rotation is handled at the redux/auth layer; we
  don't want auth failures to bump the connector health counter and push
  us into a reconnect loop that won't help.

Test fixtures updated to echo the request id rather than hardcoding 'echo'
or 'x', plus two new wire-level tests (id mismatch, 401→Auth).

From the multi-agent review of #4840 (sir-sigurd Tier-1 #5, fiskus minor,
sir-sigurd Tier-2 #16).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When a tab is hidden or the laptop is closed, browser timers freeze.
The 30s heartbeat sleep that was scheduled before suspend can fire much
later than its nominal duration on resume — so a connector that lost
the server during suspend stays Ready (and gates blocked tool calls)
for up to 30s + 5s timeout × 2 thresholds + 60s probe before the user
sees it as Disconnected.

Add a wake stream fed by `document.visibilitychange → visible` and
`window.online`. Heartbeat and reconnect-probe sleeps are race-cancelled
on wake so connectivity is re-checked immediately on resume. Single
PubSub per Connectors layer fans out via `Stream.fromPubSub` to
per-connector consumers; DOM listeners attached via `acquireRelease`
and detached on layer release (Assistant unmount).

`wake` is threaded through manageConnector / runHeartbeat /
runReconnectWithProbe; defaults to `Stream.never` so existing tests
(and any direct callers) keep their behavior. `buildService` allocates
the wake once and passes it to each `buildConnectorRuntime`. SSR-safe:
falls back to `Stream.never` if `document` / `window` aren't in scope.

Test added: a Queue-backed wake stream with stub backend; firing the
queue causes the heartbeat to ping immediately, well before the 30s
cadence would elapse.

Surfaced by sir-sigurd's review (Tier-2 #11 — was deferred to qhq-ekc;
brought forward as part of this connectivity-on-resume work).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace the nested ternary on `helperSeverity` with
`cx(classes.hint, classes[severity])`. The severity styles use `&$hint`
parent selectors so the override compiles to `.warning.hint` (specificity
0,2,0) and wins over the base `.hint` (0,1,0) without resorting to
`extend`. Adding a third severity later is one rule, no new branch.

From fiskus's review of #4840 (also flagged as a nit in sir-sigurd's
multi-agent pass).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@nl0 nl0 dismissed stale reviews from drernie and fiskus via a8c6673 April 29, 2026 16:08
@nl0
Copy link
Copy Markdown
Member Author

nl0 commented Apr 29, 2026

Status update — pushed 95f2148c..a8c6673f (10 commits ahead, all CI should re-run).

Landed

Commit Source
ead37665 Merge #4864 drernie's retry/bootstrap fix
296c6734 Merge #4865 drernie's CHANGELOG + Qurator.md docs
208ace11 useActorLayer interrupts listener fiber on unmount sir-sigurd Tier-1 #2
db43b2e0 Mcp: content-type lowercase + JSON-RPC id echo + 401/403→Auth fiskus minor + sir-sigurd Tier-1 #5 + Tier-2 #16
6e231266 Connectors: race heartbeat + probe sleeps against page-activity wake sir-sigurd Tier-2 #11 (pulled forward from qhq-ekc)
a8c6673f Chat/Input: helper severity classes via lookup fiskus inline

The wake-stream change is broader than originally planned for this session — it addresses the laptop-closed / tab-hidden case where browser timers freeze and the heartbeat sleep can fire arbitrarily late on resume, leaving Qurator gated for up to ~95s before reflecting reality. Now wakes immediately on visibilitychange→visible / online.

Tests

npm run test:only -- app/components/Assistant/74 / 74 green (was 65 pre-session). Added: HTTP id-mismatch (Mcp.spec.ts), 401→Auth mapping (Mcp.spec.ts), wake-cancels-heartbeat (Connectors.spec.ts). Plus the 4 from #4864.

Still deferred → tracked in qhq-5d0.16

Updated the follow-up bead with done-marks on the items landed today. Notable remaining: Tier-1 #4 (READ_GUIDANCE conditional), Tier-2 #9-10 (Sentry/transition logs), #13-14 (a11y / hooks-in-loop), #17-18 (Bedrock schema sanity, image size cap), Tier-3 module restructure / test gaps. None are merge-blockers.

Greptile's two 04-21 inline P2s remain obsolete — they targeted the now-deleted PlatformMcpContext/client.ts.

nl0 and others added 2 commits April 29, 2026 18:26
…ents

Strip references to internal design-document anchors (D1..D33) and bead
IDs (qhq-*) from code comments and docstrings. These were author
navigation hints into a private design doc; they don't help external
contributors reading the catalog source. The remaining prose stands
on its own — the WHY is preserved, the link to the design just isn't.

Also trims a few comments that just restate the code (e.g. "loop back
to Ready") and consolidates over-long doc blocks.

No behavior change. 74/74 Assistant tests green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… comments

Multi-agent simplify-pass over the PR. Scoped, low-risk fixes:

- Mcp.ts adaptError: drop the special case for HTTP 401/403 → Auth.
  The wire layer now raises McpAuthError directly when post() sees
  401/403, and adaptError stays a clean tag-table mapping. McpAuthError
  gains an optional `detail` so the wire-side instance carries the
  HTTP status; the existing "no session token" path keeps its default
  message.

- Connectors.ts: extract pingOrTimeout helper (heartbeat + reconnect
  probe shared the same ping → timeoutFail → either block).

- Connectors.ts: bumpHealth helper caps consecutiveFailures at
  HEALTH_THRESHOLD + 1 and skips emission when nothing changed —
  prevents a sustained outage from spamming subscribers tick after
  tick. Replaces the inline SubscriptionRef.update at heartbeat and
  callTool sites.

- Connectors.ts: expose stateIsUnready predicate (s._tag !== 'Ready')
  alongside stateIsBlocked. Chat.tsx now uses it instead of the raw
  _tag literal at two call sites (helperSeverityFor + helperLines
  filter).

- Comment trims across Connectors.ts / Mcp.ts / Tool.ts / Input.tsx /
  DevTools.tsx / Actor.ts: shorter docstrings on retry, buildConnectorTool,
  rpc-id check, withTracerPropagation, severity classes, and the
  useActorLayer cleanup. Drops what restated the code; keeps the WHY.

74/74 Assistant tests still green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@nl0
Copy link
Copy Markdown
Member Author

nl0 commented Apr 29, 2026

Cleanup pass — pushed a8c6673f..5a97edc2 (2 commits).

d1a795a9 — strip internal design-doc anchors and bead refs from comments. They were author navigation hints into a private design doc; not useful for catalog contributors. Trimmed a few "narrate the code" comments in the same pass. -41 lines net, no behavior change.

5a97edc2 — multi-agent simplify pass:

  • Mcp.ts adaptError: stop special-casing HTTP 401/403 in the connector adapter. The wire layer now raises McpAuthError directly on 401/403, so adaptError collapses back to a clean tag-table mapping. McpAuthError gains an optional detail field; existing "no session token" message preserved as the default.
  • Extract pingOrTimeout helper — heartbeat and reconnect-probe shared the same ping → timeoutFail → either block.
  • Add bumpHealth helper that caps consecutiveFailures at HEALTH_THRESHOLD + 1 and skips emission when nothing changed. Prevents a sustained outage from spamming subscribers every tick.
  • Expose stateIsUnready(s) = s._tag !== 'Ready' predicate; replaces two raw _tag literals in Chat.tsx.
  • Comment trims (~22 lines) across Connectors.ts, Mcp.ts, Tool.ts, Input.tsx, DevTools.tsx, Actor.ts.

Tests: 74/74 Assistant green throughout.

Skipped (per agent triage): wake-stream → Context.Tag (over-engineering for one ambient stream), contextContribution memoization + useIsBlocked dedupe + DevTools debounce (all qhq-ekc perf-hardening territory), connector-id constant extraction (cross-file, deferred), FetchHttpClient.layer lift-out (cost is negligible — layer is a static Layer.succeed).

@nl0
Copy link
Copy Markdown
Member Author

nl0 commented Apr 29, 2026

@greptileai one more time pls

Comment thread catalog/app/components/Assistant/Model/Connectors/Connectors.ts
nl0 and others added 3 commits April 29, 2026 19:45
Greptile P1 on commit 5a97edc flagged a "race in awaitUnblocked":
the conversation actor reads `isBlocked` synchronously and forks a
waiter; if connectors flip to a non-blocking state between the read
and the waiter subscribing to `state.changes`, the suggestion was
that the waiter could miss the transition.

Sir-sigurd's earlier multi-agent review already triaged this as
"did not survive verification" — `SubscriptionRef.changes` replays
the current value on subscribe, so the merged stream's first
emission re-evaluates `isBlocked` against the post-flip state and
take(1) exits immediately. Adding:

- An inline comment on the `awaitUnblocked` definition documenting
  the no-race rationale.
- A spec that pins the scenario explicitly: bring a connector to
  Ready first, then call `awaitUnblocked` and verify it returns.

40/40 Connectors tests green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Cut the multi-line rationale to one line and dropped the verbose
test scaffolding — the test name + the inline replay-semantics line
are enough.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Without `expect(isBlocked).toBe(false)` before the awaitUnblocked
call, the test only proves "doesn't throw" — making the precondition
explicit guards against an awaitState mismatch silently passing the
test.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@nl0
Copy link
Copy Markdown
Member Author

nl0 commented Apr 29, 2026

@greptileai reassess

@nl0 nl0 enabled auto-merge April 29, 2026 17:59
@nl0 nl0 requested review from drernie and fiskus April 29, 2026 17:59
@nl0 nl0 added this pull request to the merge queue Apr 29, 2026
Merged via the queue into master with commit 958d310 Apr 29, 2026
46 checks passed
@nl0 nl0 deleted the feat/qurator-mcp branch April 29, 2026 18:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants