feat(agents-server-ui): stream model reasoning into the UI by kevin-dp · Pull Request #4508 · electric-sql/electric

kevin-dp · 2026-06-04T13:36:02Z

Summary

While the model is "thinking" (Anthropic extended thinking, DeepSeek-R1 reasoning, Moonshot K2, OpenAI Responses summaries) the agent response now shows the reasoning text faded above the answer, with the existing Thinking shimmer heading plus an elapsed-time ticker. Once the reasoning settles it collapses to ▸ Thought for 12s — click to expand. Multiple reasoning rows per run render independently in order (one per LLM step in tool-using turns). UX intentionally mirrors Claude Code + OpenCode patterns.

Implementation (end-to-end)

Schema — reasoning row gains run_id, encrypted (Anthropic redacted-thinking opaque payload, must round-trip back to the model verbatim), and summary_title (extracted at write time). New reasoningDeltas collection mirrors textDeltas. Strictly additive.
Bridge — OutboundBridge gains onReasoningStart / onReasoningDelta / onReasoningEnd, parallel to the text path. Reasoning counter added to OutboundIdSeed.
Adapter — pi-adapter.ts routes pi-ai's thinking_start / thinking_delta / thinking_end events to the bridge. Parses a **Title**\n\n<body> heading once at write time (OpenAI Responses; no-op for Anthropic / DeepSeek / Moonshot). Defensive: handles late thinking_delta without a preceding thinking_start, and closes an open reasoning row on message_end (e.g. provider abort).
Timeline — live reasoning: Collection<EntityTimelineReasoningItem> on EntityTimelineRunRow, content built via the same delta-join pattern as EntityTimelineTextItem.content.
UI — New <ReasoningSection> renders above items in AgentResponseLive:
- Live: faded markdown via Streamdown with ThinkingIndicator heading + summary title + elapsed-time ticker
- Settled: ▸ Thought for Ns with click-to-expand. The duration is snapshotted as Date.now() - timestamp at the moment the row settles, using the same sawStreamingRef trick as the elapsed-time PR. This is accurate for rows that settle during the session; rows already settled on first mount stay a bare Thought, since no real end timestamp is available client-side.
- Redacted: Anthropic safety-filter payloads render ⊘ Reasoning redacted by provider safety filters. The encrypted payload is still persisted server-side so the model gets it back on the next turn.
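The settled-label rule in the UI bullets can be sketched as a small framework-free function. `settledLabel` and `SettledLabelInput` are hypothetical names; in the component the streaming flag would live in `sawStreamingRef`, and this sketch assumes `timestamp` is the row's creation time in milliseconds.

```typescript
// Hypothetical input shape: only what this sketch needs, not the real row type.
interface SettledLabelInput {
  /** Row creation time in ms since the epoch (the `timestamp` being subtracted). */
  timestamp: number
  /** True if this view observed the row while it was still streaming. */
  sawStreaming: boolean
  /** `Date.now()` captured once, when the flip to settled was observed. */
  settledAt: number
}

function settledLabel({ timestamp, sawStreaming, settledAt }: SettledLabelInput): string {
  // Already settled on first mount: no trustworthy end time exists client-side.
  if (!sawStreaming) return "Thought"
  // Clamp at zero in case the server timestamp is ahead of the client clock.
  const seconds = Math.max(0, Math.round((settledAt - timestamp) / 1000))
  return `Thought for ${seconds}s`
}
```

Snapshotting `settledAt` once, rather than recomputing on every render, is what keeps the label from drifting after the row settles.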

Reference

Patterns informed by reading OpenCode's reasoning implementation:

3-event streaming protocol (reasoning-start / reasoning-delta / reasoning-end)
ReasoningPart storage shape including encrypted for Anthropic round-trip
reasoningSummary() headline parser (5-line regex, OpenAI Responses only)
Collapsed-by-default UX with click-to-expand
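A minimal sketch of the three-event protocol as a client-side accumulator, including the two defensive cases the adapter handles (a delta with no preceding start, and rows still open when the message ends). The event and row shapes and the `ReasoningAccumulator` name are assumptions for illustration, not pi-ai's or OpenCode's actual types.

```typescript
// Hypothetical event shapes modelled on reasoning-start / -delta / -end.
type ReasoningEvent =
  | { type: "reasoning-start"; id: string }
  | { type: "reasoning-delta"; id: string; delta: string }
  | { type: "reasoning-end"; id: string }
  | { type: "message-end" }

interface ReasoningRow {
  id: string
  status: "streaming" | "completed"
  content: string
}

class ReasoningAccumulator {
  // Map preserves insertion order, so rows come back in the order they opened.
  private readonly rows = new Map<string, ReasoningRow>()

  apply(event: ReasoningEvent): void {
    switch (event.type) {
      case "reasoning-start":
        this.open(event.id)
        break
      case "reasoning-delta":
        // Defensive: open the row if the start event never arrived.
        this.open(event.id).content += event.delta
        break
      case "reasoning-end": {
        const row = this.rows.get(event.id)
        if (row) row.status = "completed"
        break
      }
      case "message-end":
        // Defensive: close anything still open (e.g. after a provider abort).
        this.rows.forEach((row) => {
          row.status = "completed"
        })
        break
    }
  }

  /** Copies of the current rows, so callers cannot mutate internal state. */
  snapshot(): ReasoningRow[] {
    return Array.from(this.rows.values(), (row) => ({ ...row }))
  }

  private open(id: string): ReasoningRow {
    let row = this.rows.get(id)
    if (!row) {
      row = { id, status: "streaming", content: "" }
      this.rows.set(id, row)
    }
    return row
  }
}
```

Keying rows by id is what lets a tool-using turn produce several independent reasoning rows, one per LLM step.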

Test plan

pnpm typecheck clean in agents-runtime + agents-server-ui
pnpm test outbound-bridge pi-adapter entity-timeline in agents-runtime (95 passed: 18 bridge + 21 adapter + 56 timeline)
pnpm test in agents-server-ui (66 passed)
pnpm -C packages/agents-runtime build — dist artifacts emit cleanly
Manual: prompt Anthropic Claude with extended-thinking enabled; verify streaming reasoning appears faded above the answer with elapsed ticker, then collapses to Thought for Ns on settle
Manual: multi-step tool-using turn; verify each step's reasoning renders as a separate collapsible row

Notes

Cached AgentResponse (the non-Live path used for old scrollback sections) doesn't yet surface reasoning — historical rows recorded before this PR lack the data anyway. Follow-up if we discover sessions where this matters.
The pre-existing runtime-dsl.test.ts 401 failures (and dispatch-policy-routing.test.ts 500 failures) reproduce identically on clean main and were not introduced by this PR.

🤖 Generated with Claude Code

While the model is "thinking" (Anthropic extended thinking, DeepSeek-R1 reasoning_content, Moonshot K2, OpenAI Responses summaries) the agent response now shows the reasoning text faded above the answer, with the existing `Thinking` shimmer heading + elapsed-time ticker. Once the reasoning settles, it collapses to `▸ Thought for 12s` — click to expand. Multiple reasoning rows per run render independently in order (one per LLM step in tool-using turns).

End-to-end plumbing:

- Schema: `reasoning` row gains `run_id`, `encrypted` (Anthropic redacted blocks must round-trip back to the model), and `summary_title` (extracted at write time). New `reasoningDeltas` collection mirrors `textDeltas` for streamed content.
- Bridge: `OutboundBridge` gains `onReasoningStart` / `onReasoningDelta` / `onReasoningEnd`, parallel to text.
- Adapter: `pi-adapter.ts` routes `thinking_start` / `thinking_delta` / `thinking_end` from pi-ai. Parses a `**Title**\n\n<body>` heading once at write time (OpenAI Responses; no-op for others).
- Timeline: live `reasoning: Collection<EntityTimelineReasoningItem>` on `EntityTimelineRunRow`, content built via delta-join.
- UI: new `<ReasoningSection>` renders above items in `AgentResponseLive`. Streamdown body, click-to-expand on settle, redacted-block placeholder for opaque Anthropic payloads.
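The write-time title extraction can be sketched with a single anchored regex. `parseSummaryTitle` and its pattern are assumptions for illustration, not OpenCode's `reasoningSummary()` or the PR's parser; text without a leading bold heading (the Anthropic / DeepSeek / Moonshot case) passes through unchanged.

```typescript
interface ParsedReasoning {
  /** Extracted heading, or null when the text has no leading `**Title**` block. */
  title: string | null
  /** Remaining reasoning text with the heading removed. */
  body: string
}

// A bold heading at the very start of the text, followed by one blank line.
const LEADING_TITLE = /^\*\*([^*\n]+)\*\*\n\n/

function parseSummaryTitle(text: string): ParsedReasoning {
  const match = LEADING_TITLE.exec(text)
  if (!match) return { title: null, body: text }
  return { title: (match[1] ?? "").trim(), body: text.slice(match[0].length) }
}
```

Because the pattern is anchored with `^` and requires the blank line, inline bold text later in a paragraph is never mistaken for a title.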

github-actions · 2026-06-04T13:36:48Z

Electric Agents Desktop Builds

Build artifacts for commit aef3aab.

Platform	Status	Artifact
macOS Apple Silicon	Passed	DMG
macOS Intel	Passed	DMG
Windows x64	Passed	Installer
Linux x64	Passed	AppImage / deb

Workflow run

codecov · 2026-06-04T13:38:13Z

Codecov Report

❌ Patch coverage is 46.00000% with 135 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (main@12f1d17).
⚠️ Report is 28 commits behind head on main.

Files with missing lines	Patch %	Lines
packages/agents-runtime/src/outbound-bridge.ts	24.13%	44 Missing ⚠️
packages/agents-runtime/src/pi-adapter.ts	10.86%	41 Missing ⚠️
...ents-server-ui/src/components/ReasoningSection.tsx	0.00%	39 Missing ⚠️
.../agents-server-ui/src/components/AgentResponse.tsx	0.00%	9 Missing ⚠️
packages/agents/src/model-catalog.ts	93.54%	2 Missing ⚠️

Additional details and impacted files

@@           Coverage Diff           @@
##             main    #4508   +/-   ##
=======================================
  Coverage        ?   56.86%           
=======================================
  Files           ?      359           
  Lines           ?    39304           
  Branches        ?    11049           
=======================================
  Hits            ?    22351           
  Misses          ?    16882           
  Partials        ?       71

Flag	Coverage Δ
packages/agents	`71.14% <93.54%> (?)`
packages/agents-mcp	`77.54% <ø> (?)`
packages/agents-mobile	`66.92% <ø> (?)`
packages/agents-runtime	`80.88% <50.29%> (?)`
packages/agents-server	`73.98% <ø> (?)`
packages/agents-server-ui	`6.19% <0.00%> (?)`
packages/electric-ax	`46.42% <ø> (?)`
packages/experimental	`87.73% <ø> (?)`
packages/react-hooks	`86.48% <ø> (?)`
packages/start	`82.83% <ø> (?)`
packages/typescript-client	`91.83% <ø> (?)`
packages/y-electric	`56.05% <ø> (?)`
typescript	`56.86% <46.00%> (?)`
unit-tests	`56.86% <46.00%> (?)`

Flags with carried forward coverage won't be shown.

github-actions · 2026-06-04T13:40:19Z

Electric Agents Mobile Build

Local mobile checks ran for commit aef3aab.

The EAS Android preview build was skipped because the mobile-eas-build label is not present.
Add the mobile-eas-build label to this PR to produce an installable preview build.

Workflow run

Previously `withProviderPayloadDefaults` short-circuited for any provider other than OpenAI / OpenAI-Codex, so picking Claude with a `reasoningEffort` higher than `auto` had no effect: no `thinking` parameter was added to the request, Anthropic ran in standard mode, and the model emitted no `thinking_delta` events. The inbound reasoning plumbing that landed in the same PR was correct but unreachable from Anthropic without this change. Now, when the chosen model is Anthropic-capable for reasoning AND `reasoningEffort` is explicit (`minimal`/`low`/`medium`/`high`), `thinking: { type: "enabled", budget_tokens: <by effort> }` is injected into the payload. Budgets follow Anthropic's docs (1024-token floor): minimal=1024, low=2048, medium=8192, high=24576. `auto` stays opted out of thinking so default sessions don't silently incur the extra reasoning tokens.
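A minimal sketch of the effort-to-budget mapping this commit describes. `anthropicThinkingFor` and `THINKING_BUDGET` are hypothetical names, not the internals of `withProviderPayloadDefaults`; the sketch covers only the Anthropic branch and assumes the caller has already checked that the model supports reasoning.

```typescript
type ReasoningEffort = "auto" | "minimal" | "low" | "medium" | "high"

// Budgets quoted in the commit message (1024 is Anthropic's documented minimum).
const THINKING_BUDGET: Record<Exclude<ReasoningEffort, "auto">, number> = {
  minimal: 1024,
  low: 2048,
  medium: 8192,
  high: 24576,
}

interface EnabledThinking {
  type: "enabled"
  budget_tokens: number
}

// Returns the `thinking` field to inject, or undefined to leave thinking off.
function anthropicThinkingFor(effort: ReasoningEffort): EnabledThinking | undefined {
  // `auto` stays opted out here, so default sessions incur no extra reasoning tokens.
  if (effort === "auto") return undefined
  return { type: "enabled", budget_tokens: THINKING_BUDGET[effort] }
}
```

A later commit on this branch switches `auto` to the `minimal` budget; in this sketch that would mean returning the 1024-token object for `auto` instead of `undefined`.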

KyleAMathews

Lovely! Could you add a screenshot of the UI to the PR body?

Three latent bugs in the reasoning-content branch that together made extended thinking and the assistant's answer text fail to render:

1. **Alias collision in the timeline live query**: `entity-timeline.ts` had two correlated sub-queries (one for `items.text.content`, one for `reasoning.content`) that both used `chunk` as the `from({...})` alias. TanStack DB silently mis-bound the correlation when both were active in the same run projection, so `items.text.content` came back as an empty string even though the deltas were present in `db.collections.textDeltas`. Reasoning won the binding; the answer didn't render at all. Fix: rename the inner alias to `textChunk`, and hoist the union row's text fields to top-level scalars (`text_key`, `text_run_id`, …) so the correlation references a top-level field instead of a nested `item.text.key` (also a source of empty joins).
2. **Anthropic thinking always-on instead of opt-in**: `withProviderPayloadDefaults` short-circuited for Anthropic when `reasoningEffort` was `auto`, so no `thinking` parameter ever reached the API. The OpenAI branch already defaulted `auto` to `minimal`; Anthropic now does the same (1024-token budget). `low` / `medium` / `high` scale the budget exactly as before.
3. **Anthropic `thinking` merge order**: pi-ai writes `thinking: { type: "disabled" }` into the request body by default. Our `onPayload` was merging `existingThinking` _last_, so the default `type: "disabled"` clobbered our `type: "enabled"` and the API rejected `budget_tokens` with `thinking.disabled.budget_tokens: Extra inputs are not permitted`. `existingThinking` is now spread first, followed by `type` + `budget_tokens`.

Tests:

- `entity-timeline.test.ts`: a regression test exercises `createEntityTimelineQuery` end-to-end with text and reasoning rows in the same run; it fails on the alias collision and passes with the rename + flat-field projection.
- `model-catalog.test.ts`: adds Anthropic-side coverage mirroring the existing OpenAI tests: always-on minimal budget on `auto`, scaled budget on explicit effort, and `type: "enabled"` overriding a pre-existing `thinking: { type: "disabled" }` in the payload.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
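The merge-order fix in item 3 comes down to spread order. A minimal sketch, assuming the request body is a plain object; `withThinking` and `Payload` are hypothetical names, not pi-ai's `onPayload` signature.

```typescript
// Loose stand-in for a provider request body.
type Payload = { [key: string]: unknown; thinking?: Record<string, unknown> }

function withThinking(payload: Payload, budgetTokens: number): Payload {
  const existingThinking = payload.thinking ?? {}
  return {
    ...payload,
    thinking: {
      // Spread the existing value first so a default `type: "disabled"` gets overwritten.
      // Spreading it last (the bug) would let "disabled" clobber "enabled".
      ...existingThinking,
      type: "enabled",
      budget_tokens: budgetTokens,
    },
  }
}
```

Any other keys already present on `thinking` are still preserved; only `type` and `budget_tokens` are forced.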

…eltas

The reasoning sub-collection's `content` field, projected via `concat(toArray(<correlated delta-join>))`, went stale in the running app after the row's status flipped to `completed`, surfacing `content: null` in the live query even though the deltas were still present in the local DB. The expand-thought-block view rendered an empty body until the user navigated away and back (forcing a fresh live-query subscription), at which point the join evaluated cleanly.

Unit tests for the same projection pattern all pass; the bug only reproduces in the running app, against an established live-query graph with overlapping text/reasoning subscriptions. The sub-query itself is correct (data is there after a fresh subscription), but something about the long-lived subscription state makes the correlated row binding stale.

Sidestep the unreliable projection entirely:

- **Timeline query**: drop the `content` field from `EntityTimelineReasoningItem`. Expose `run.reasoningDeltas` as a parallel sub-collection (mirroring `run.reasoning`), surfacing the raw deltas keyed by `reasoning_id`.
- **UI**: `AgentResponseLive` subscribes to both `run.reasoning` and `run.reasoningDeltas`, builds a `Map<reasoning_id, content>` from the deltas client-side, and merges it onto the reasoning rows before handing them to `<ReasoningSection>`. Reactive on every delta arrival, no stale state.
- **State lift**: `expanded` for the collapsed "Thought for Ns" toggle moves from `ReasoningEntryView` (per-entry) up to `ReasoningSection` (keyed by `entry.key`), so the user's choice survives any spurious unmount of the entry view (virtualizer measurement passes, brief entries-empty states, etc.).

Tests:

- New regressions in `entity-timeline.test.ts` exercise the deltas sub-collection with the same shape as the failing production scenario: reasoning + text together, multi-step run-row updates, status transitions.

Follow-up: investigate why the original correlated sub-query goes stale only against long-lived live-query graphs (passes in tests). The `content` projection has been left commented out in case we want to restore it after fixing the underlying TanStack DB issue.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
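A minimal sketch of the client-side workaround: group deltas by `reasoning_id`, concatenate them in order, and merge the result onto the rows. The `seq` and `text` fields on deltas and the `id` field on rows are assumptions (the commit only says the deltas are keyed by `reasoning_id`), and both helpers are hypothetical, not the PR's code.

```typescript
// Hypothetical shapes; only `reasoning_id` is named in the commit message.
interface ReasoningDelta {
  reasoning_id: string
  seq: number
  text: string
}

interface ReasoningRowBase {
  id: string
}

function contentByReasoningId(deltas: readonly ReasoningDelta[]): Map<string, string> {
  // Sort a copy so arrival order does not affect the assembled text.
  const ordered = [...deltas].sort((a, b) => a.seq - b.seq)
  const content = new Map<string, string>()
  for (const d of ordered) {
    content.set(d.reasoning_id, (content.get(d.reasoning_id) ?? "") + d.text)
  }
  return content
}

function mergeContent<R extends ReasoningRowBase>(
  rows: readonly R[],
  deltas: readonly ReasoningDelta[],
): Array<R & { content: string }> {
  const content = contentByReasoningId(deltas)
  // Rows with no deltas yet get an empty body rather than null.
  return rows.map((row) => ({ ...row, content: content.get(row.id) ?? "" }))
}
```

Recomputing this from both subscriptions on every change is what makes the result reactive to each delta arrival without depending on the correlated projection.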

The original `reasoning.content` projection used `concat(toArray(<correlated delta-join>))`, which TanStack DB compiles to a `buildIncludesSubquery(..., 'concat')` node, a specialized differential-dataflow operator that incrementally maintains a string-concatenation of a child query's projection.

Unit tests of the same projection shape pass cleanly: a fresh `createLiveQueryCollection` evaluates the join correctly on initial preload, and again after status flips. Tests do not reproduce the production failure mode (a long-lived subscription where `content` silently goes from populated → null after the row's status flips, recovering only after a full live-query teardown).

Leaving a placeholder test as a marker: when we have a repro, drop the body in here and restore the `content` field in `entity-timeline.ts:buildEntityTimelineQuery`. The current fix sidesteps the issue by exposing `run.reasoningDeltas` and assembling content client-side, which is reliable but bypasses what should be a working server-side projection.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Restore the original nested-text shape on `runItemsSource` (`text: caseWhen(text.key, {...})` and `textContent: concat(toArray(...))` projected together on the union row) and undo the flat-scalar hoist (`text_key`, `text_run_id`, `text_order`, `text_status`). The `textChunk` alias on the delta-join stays, since that's the load-bearing change that actually fixed the original `chunk` alias collision with the reasoning sub-query.

When fixing the original alias-collision bug I made two changes in one commit:

1. Renamed the text delta-join alias `chunk` → `textChunk` so it no longer collided with the `chunk` used in reasoning content.
2. Hoisted text fields to flat scalars on the union row so the join could move out of `runItemsSource`'s select and into the items consumer's select.

I never bisected the two. Turns out (1) alone is sufficient: the nested `text: caseWhen(text.key, {...})` + co-located `textContent` projection works fine once the alias collision is gone. The flat-scalar hoist was unnecessary churn that just made the code harder to read for no behavioral benefit.

Tested by reverting (2), running unit tests (60 still pass), and verifying in the running app that text content still streams in and renders correctly through a full Claude exchange.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ection

Reverts the client-side `run.reasoningDeltas` workaround in favor of the server-side `concat(toArray(...))` projection on `run.reasoning.content`. Currently broken in production against `@tanstack/db@0.6.7`, as documented in `packages/agents-runtime/test/entity-timeline.test.ts`'s `reasoning content remains populated after status flips to completed` and friends. Unit tests against the projection pass cleanly; the bug only surfaces in a long-lived stream-backed live query after the parent row's `.update()`, with the field silently becoming `null` even though deltas are present in the local DB. A fresh subscription (navigate-away + back, or reload) recovers.

Holding this branch as a draft PR so the work isn't lost. Merge once TanStack DB ships an upstream fix that makes the placeholder tests pass against a long-lived production live query.

Diff vs `kevin/reasoning-content`:

- `entity-timeline.ts`: add `content: concat(toArray(<delta-join>))` back to `reasoning.select(...)`, drop the parallel `reasoningDeltas` sub-collection. Alias stays `reasoningChunk` (not the generic `chunk`) to avoid the alias-collision class of bug.
- `EntityTimelineReasoningItem`: `content: string` reinstated; `EntityTimelineReasoningDeltaItem` removed.
- `client.ts`: drop `EntityTimelineReasoningDeltaItem` export.
- `AgentResponseLive`: drop the `run.reasoningDeltas` subscription + client-side concat; `reasoningEntries` reads `content` straight off the projected row.
- Tests: three reasoning-content tests assert `reasoning[0].content` (rather than concatenating raw deltas).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

netlify · 2026-06-09T08:01:41Z

✅ Deploy Preview for electric-next ready!

Name	Link
🔨 Latest commit	`7d8ef81`
🔍 Latest deploy log	https://app.netlify.com/projects/electric-next/deploys/6a27c7654807820008d20557
😎 Deploy Preview	https://deploy-preview-4508--electric-next.netlify.app

Tracks down and fixes the bug that's been driving the client-side-concat workaround in #4508 and blocking #4532.

## Root cause

TanStack DB's "includes" (fields whose value is a sub-query like `concat(toArray(...))`) are deferred. A row carrying an include arrives with the field set to `null` and a hidden `Symbol(includesRouting)` marker describing how to compute it. The include is only materialized when something downstream reads it *in the right way*.

The empirical rule (figured out via DevTools probes: `.toArray` on the sub-collection always showed the populated string, `useLiveQuery` output had `content: null`):

**An include is materialized only when it's referenced inside a `caseWhen` object body in a downstream `.select(...)`. A bare top-level reference doesn't trigger it; the include is just aliased forward, still deferred.**

This is why `items.text.content` has always worked and reasoning hasn't. The items consumer derefs `item.textContent` inside the `text: caseWhen(item.text.key, { ..., content: item.textContent })` body. The reasoning consumer had `content: concat(toArray(...))` (or, after the source/consumer split, `content: r.reasoningContent`) at the top level of its select. `useLiveQuery` handed the row to React with `content: null`.

## Fix

Wrap the include reference inside a `caseWhen` object body, mirroring items:

```ts
reasoning: q
  .from({ r: runReasoningSource })
  ...
  .select(({ r }) => ({
    key: r.key,
    run_id: r.run_id,
    order: r.order,
    status: r.status,
    body: caseWhen(r.key, {
      content: r.reasoningContent,
    }),
    summary_title: r.summary_title,
    encrypted: r.encrypted,
  }))
```

`r.key` is always truthy on a real row, so the `caseWhen` is effectively unconditional; its only purpose is being an object body that forces the include reference to materialize. The UI reads `entry.body?.content` (via the type) and `AgentResponseLive` maps it back into a flat `content: string` on `ReasoningEntry`, so `ReasoningSection`'s API is unchanged.

This drops the need for the client-side concat workaround that was the original target of #4532.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
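The mapping from the projected row back to a flat `ReasoningEntry` is small. A sketch, assuming simplified row and entry shapes (the real types carry more fields that the commit does not show); `toReasoningEntry`, `ProjectedReasoningRow`, and the `summaryTitle` field name are hypothetical.

```typescript
// Hypothetical projected row: `body` stands in for the caseWhen object in the `.select(...)`.
interface ProjectedReasoningRow {
  key: string
  status: string
  summary_title: string | null
  body?: { content?: string | null } | null
}

// Hypothetical flat shape handed to <ReasoningSection>.
interface ReasoningEntry {
  key: string
  status: string
  summaryTitle: string | null
  content: string
}

function toReasoningEntry(row: ProjectedReasoningRow): ReasoningEntry {
  return {
    key: row.key,
    status: row.status,
    summaryTitle: row.summary_title,
    // Treat a missing or not-yet-materialized body as empty text rather than null.
    content: row.body?.content ?? "",
  }
}
```

Flattening at this boundary keeps the `caseWhen` wrapper an implementation detail of the query, so the section component never sees it.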

kevin-dp · 2026-06-09T10:13:41Z

@KyleAMathews here are some screenshots showing how it displays while it's thinking and how it displays when it's done thinking (the "Thought for 2s" block is expandable on click).

The entity-stream-db mock omitted the reasoning and reasoningDeltas collections, so loadOutboundIdSeed crashed when reading db.collections.reasoning.toArray under three process-wake scenarios. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

KyleAMathews approved these changes Jun 4, 2026


kevin-dp and others added 3 commits June 8, 2026 14:53

kevin-dp mentioned this pull request Jun 8, 2026

Restore server-side reasoning content projection (pending upstream TanStack DB fix) #4532

Merged

3 tasks

kevin-dp and others added 2 commits June 9, 2026 09:57

kevin-dp commented Jun 9, 2026


Comment thread packages/agents-runtime/src/outbound-bridge.ts

kevin-dp wants to merge 9 commits into main from kevin/reasoning-content

2 participants
