feat(agents-server-ui): stream model reasoning into the UI #4508
kevin-dp wants to merge 9 commits into
While the model is "thinking" (Anthropic extended thinking, DeepSeek-R1 reasoning_content, Moonshot K2, OpenAI Responses summaries) the agent response now shows the reasoning text faded above the answer, with the existing `Thinking` shimmer heading + elapsed-time ticker. Once the reasoning settles, it collapses to `▸ Thought for 12s`; click to expand. Multiple reasoning rows per run render independently in order (one per LLM step in tool-using turns).

End-to-end plumbing:
- Schema: `reasoning` row gains `run_id`, `encrypted` (Anthropic redacted blocks must round-trip back to the model), and `summary_title` (extracted at write time). New `reasoningDeltas` collection mirrors `textDeltas` for streamed content.
- Bridge: `OutboundBridge` gains `onReasoningStart` / `onReasoningDelta` / `onReasoningEnd`, parallel to text.
- Adapter: `pi-adapter.ts` routes `thinking_start` / `thinking_delta` / `thinking_end` from pi-ai. Parses a `**Title**\n\n<body>` heading once at write time (OpenAI Responses; no-op for others).
- Timeline: live `reasoning: Collection<EntityTimelineReasoningItem>` on `EntityTimelineRunRow`, content built via delta-join.
- UI: new `<ReasoningSection>` renders above items in `AgentResponseLive`. Streamdown body, click-to-expand on settle, redacted-block placeholder for opaque Anthropic payloads.
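As a rough illustration of the adapter's title extraction described above, the sketch below splits a leading `**Title**\n\n<body>` heading from the reasoning text. The helper name `parseSummaryTitle`, the regex, and the return shape are assumptions made for this example, not the code in `pi-adapter.ts`.

```typescript
// Hypothetical helper: names and return shape are illustrative assumptions.
interface ParsedReasoning {
  summaryTitle: string | null
  body: string
}

// A bold heading on the first line, one blank line, then the body.
const TITLE_HEADING = /^\*\*([^*\n]+)\*\*\n\n([\s\S]*)$/

function parseSummaryTitle(text: string): ParsedReasoning {
  const match = TITLE_HEADING.exec(text)
  if (!match) {
    // Providers that emit no heading fall through untouched (the "no-op" case).
    return { summaryTitle: null, body: text }
  }
  const [, title = '', body = ''] = match
  return { summaryTitle: title.trim(), body }
}
```

Doing this once at write time means the UI can read `summary_title` directly instead of re-parsing the text on every render.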
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #4508 +/- ##
=======================================
Coverage ? 56.86%
=======================================
Files ? 359
Lines ? 39304
Branches ? 11049
=======================================
Hits ? 22351
Misses ? 16882
Partials ? 71
Previously `withProviderPayloadDefaults` short-circuited for any
provider other than OpenAI / OpenAI-Codex, so picking Claude with a
`reasoningEffort` higher than `auto` produced no effect — no
`thinking` parameter was added to the request, so Anthropic ran in
standard mode and the model emitted no `thinking_delta` events. The
inbound reasoning plumbing that landed in the same PR was correct but
unreachable from Anthropic without this.
Now: when the chosen model is an Anthropic model that supports reasoning AND
`reasoningEffort` is explicit (minimal/low/medium/high), inject
thinking: { type: "enabled", budget_tokens: <by effort> }
into the payload. Budgets follow Anthropic's docs (≥ 1024 floor):
minimal=1024, low=2048, medium=8192, high=24576. `auto` stays opted out
of thinking so default sessions don't silently incur the extra
reasoning tokens.
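A minimal sketch of the effort-to-budget mapping this commit message describes. The budget values come from the text above; the type and function names (`ReasoningEffort`, `anthropicThinkingFor`) are assumptions for illustration, not the actual `withProviderPayloadDefaults` internals.

```typescript
// Hypothetical names; only the budget numbers are taken from the commit message.
type ReasoningEffort = 'auto' | 'minimal' | 'low' | 'medium' | 'high'

interface AnthropicThinking {
  type: 'enabled'
  budget_tokens: number
}

const ANTHROPIC_THINKING_BUDGETS: Record<Exclude<ReasoningEffort, 'auto'>, number> = {
  minimal: 1024, // the floor mentioned in the commit message
  low: 2048,
  medium: 8192,
  high: 24576,
}

function anthropicThinkingFor(effort: ReasoningEffort): AnthropicThinking | undefined {
  // In this commit, `auto` adds no `thinking` parameter at all.
  if (effort === 'auto') return undefined
  return { type: 'enabled', budget_tokens: ANTHROPIC_THINKING_BUDGETS[effort] }
}
```

Returning `undefined` for `auto` lets the caller skip the field entirely rather than send an explicit "disabled" value.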
KyleAMathews
left a comment
Lovely! Could you add a screenshot of the UI to the PR body?
Three latent bugs in the reasoning-content branch that together made
extended thinking and the assistant's answer text fail to render:
1. **Alias collision in the timeline live query** —
`entity-timeline.ts` had two correlated sub-queries (one for
`items.text.content`, one for `reasoning.content`) both using
`chunk` as the `from({...})` alias. TanStack DB silently
mis-bound the correlation when both were active in the same run
projection, so `items.text.content` came back as an empty string
even though the deltas were present in `db.collections.textDeltas`.
Reasoning won the binding; the answer didn't render at all.
Fix: rename the inner alias to `textChunk`, and hoist the union
row's text fields to top-level scalars (`text_key`, `text_run_id`,
…) so the correlation references a top-level field instead of a
nested `item.text.key` (also a source of empty joins).
2. **Anthropic thinking was opt-in; it is now always-on**:
`withProviderPayloadDefaults` short-circuited for Anthropic when
`reasoningEffort` was `auto`, so no `thinking` parameter ever
reached the API. The OpenAI branch already defaulted `auto` to
`minimal`; Anthropic now does the same (1024-token budget). `low`
/ `medium` / `high` scale the budget exactly as before.
3. **Anthropic `thinking` merge order** — pi-ai writes
`thinking: { type: "disabled" }` into the request body by default.
Our `onPayload` was merging `existingThinking` _last_, so the
default `type: "disabled"` clobbered our `type: "enabled"` and
the API rejected `budget_tokens` with
`thinking.disabled.budget_tokens: Extra inputs are not permitted`.
Spread `existingThinking` first now, then `type` + `budget_tokens`.
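The merge-order bug in item 3 comes down to object-spread precedence: later keys win. The sketch below contrasts the two orders; the function names and the loose `Record<string, unknown>` payload type are assumptions for illustration, not the real `onPayload` code.

```typescript
// Hypothetical helpers showing why spread order matters; not the actual onPayload.
type ThinkingPayload = Record<string, unknown>

// Buggy order: the pre-existing `type: "disabled"` is spread last and wins.
function mergeThinkingBuggy(existing: ThinkingPayload | undefined, budgetTokens: number): ThinkingPayload {
  return { type: 'enabled', budget_tokens: budgetTokens, ...existing }
}

// Fixed order: spread the existing object first, then override `type` and `budget_tokens`.
function mergeThinkingFixed(existing: ThinkingPayload | undefined, budgetTokens: number): ThinkingPayload {
  return { ...existing, type: 'enabled', budget_tokens: budgetTokens }
}

const defaultFromClient: ThinkingPayload = { type: 'disabled' }
const buggy = mergeThinkingBuggy(defaultFromClient, 1024) // type stays "disabled" alongside budget_tokens
const fixed = mergeThinkingFixed(defaultFromClient, 1024) // type becomes "enabled"
```

With the buggy order the request carries `type: "disabled"` together with `budget_tokens`, which is the combination the API rejected.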
Tests:
- `entity-timeline.test.ts` — regression test exercises
`createEntityTimelineQuery` end-to-end with text and reasoning rows
in the same run; fails on the alias collision, passes with the
rename + flat-field projection.
- `model-catalog.test.ts` — adds Anthropic-side coverage that mirrors
the existing OpenAI tests: always-on minimal budget on `auto`,
scaled budget on explicit effort, and `type: disabled` override
for pre-existing `thinking` in the payload.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…eltas

The reasoning sub-collection's `content` field, projected via `concat(toArray(<correlated delta-join>))`, went stale in the running app after the row's status flipped to `completed`, surfacing `content: null` in the live query even though the deltas were still present in the local DB. The expand-thought-block view rendered an empty body until the user navigated away and back (forcing a fresh live-query subscription), at which point the join evaluated cleanly.

Unit tests for the same projection pattern all pass. The bug only reproduces in the running app, against an established live-query graph with overlapping text/reasoning subscriptions. The sub-query itself is correct (data is there after a fresh subscription), but something about the long-lived subscription state makes the correlated row binding stale.

Sidestep the unreliable projection entirely:
- **Timeline query**: drop the `content` field from `EntityTimelineReasoningItem`. Expose `run.reasoningDeltas` as a parallel sub-collection (mirroring `run.reasoning`), surfacing the raw deltas keyed by `reasoning_id`.
- **UI**: `AgentResponseLive` subscribes to both `run.reasoning` and `run.reasoningDeltas`, builds a `Map<reasoning_id, content>` from the deltas client-side, and merges it onto the reasoning rows before handing them to `<ReasoningSection>`. Reactive on every delta arrival, no stale state.
- **State lift**: `expanded` for the collapsed "Thought for Ns" toggle moves from `ReasoningEntryView` (per-entry) up to `ReasoningSection` (keyed by `entry.key`), so the user's choice survives any spurious unmount of the entry view (virtualizer measurement passes, brief entries-empty states, etc.).

Tests:
- New regressions in `entity-timeline.test.ts` exercise the deltas sub-collection with the same shape as the failing production scenario: reasoning + text together, multi-step run-row updates, status transitions.

Follow-up: investigate why the original correlated sub-query goes stale only against long-lived live-query graphs (passes in tests). The `content` projection has been left commented-out in case we want to restore it after fixing the underlying TanStack DB issue.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
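The client-side assembly described in this commit can be sketched as a plain fold over the deltas. The field names on the delta and row types (`order`, `delta`, `key`) and both function names are assumptions for illustration; only `reasoning_id` and the `Map<reasoning_id, content>` idea come from the commit message.

```typescript
// Hypothetical row shapes; only `reasoning_id` is named in the commit message.
interface ReasoningDelta {
  reasoning_id: string
  order: number
  delta: string
}

interface ReasoningRow {
  key: string
  status: string
}

// Concatenate deltas per reasoning row, in delta order.
function buildContentById(deltas: ReasoningDelta[]): Map<string, string> {
  const sorted = [...deltas].sort((a, b) => a.order - b.order)
  const contentById = new Map<string, string>()
  for (const d of sorted) {
    contentById.set(d.reasoning_id, (contentById.get(d.reasoning_id) ?? '') + d.delta)
  }
  return contentById
}

// Merge the assembled content onto the reasoning rows before rendering.
function mergeContent(rows: ReasoningRow[], deltas: ReasoningDelta[]) {
  const contentById = buildContentById(deltas)
  return rows.map((row) => ({ ...row, content: contentById.get(row.key) ?? '' }))
}
```

Because the merge is recomputed from the raw deltas each time, it never depends on a projected field that could go stale.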
The original `reasoning.content` projection used `concat(toArray(<correlated delta-join>))`, which TanStack DB compiles to a `buildIncludesSubquery(..., 'concat')` node: a specialized differential-dataflow operator that incrementally maintains a string-concatenation of a child query's projection.

Unit tests of the same projection shape pass cleanly: a fresh `createLiveQueryCollection` evaluates the join correctly on initial preload, and again after status flips. Tests do not reproduce the production failure mode (long-lived subscription where `content` silently goes from populated → null after the row's status flips, recovering only after a full live-query teardown).

Leaving a placeholder test as a marker: when we have a repro, drop the body in here and restore the `content` field in `entity-timeline.ts:buildEntityTimelineQuery`. The current fix sidesteps the issue by exposing `run.reasoningDeltas` and assembling content client-side, which is reliable but bypasses what should be a working server-side projection.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Restore the original nested-text shape on `runItemsSource`
(`text: caseWhen(text.key, {...})` and `textContent: concat(toArray(...))`
projected together on the union row) and undo the flat-scalar
hoist (`text_key`, `text_run_id`, `text_order`, `text_status`).
The `textChunk` alias on the delta-join stays, since that's the
load-bearing change that actually fixed the original `chunk`
alias collision with the reasoning sub-query.
When fixing the original alias-collision bug I made two changes in
one commit:
1. Renamed the text delta-join alias `chunk` → `textChunk` so it
no longer collided with the `chunk` used in reasoning content.
2. Hoisted text fields to flat scalars on the union row so the join
could move out of `runItemsSource`'s select and into the items
consumer's select.
I never bisected the two. Turns out (1) alone is sufficient: the
nested `text: caseWhen(text.key, {...})` + co-located `textContent`
projection works fine once the alias collision is gone. The
flat-scalar hoist was unnecessary churn that just made the code harder
to read for no behavioral benefit.
Tested by reverting (2), running unit tests (60 still pass), and
verifying in the running app that text content still streams in
and renders correctly through a full Claude exchange.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ection

Reverts the client-side `run.reasoningDeltas` workaround in favor of the server-side `concat(toArray(...))` projection on `run.reasoning.content`.

Currently broken in production against `@tanstack/db@0.6.7`, as documented in `packages/agents-runtime/test/entity-timeline.test.ts`'s `reasoning content remains populated after status flips to completed` and friends. Unit tests against the projection pass cleanly; the bug only surfaces in a long-lived stream-backed live query after the parent row's `.update()`, with the field silently becoming `null` even though deltas are present in the local DB. A fresh subscription (navigate-away + back, or reload) recovers.

Holding this branch as a draft PR so the work isn't lost. Merge once TanStack DB ships an upstream fix that makes the placeholder tests pass against a long-lived production live query.

Diff vs `kevin/reasoning-content`:
- `entity-timeline.ts`: add `content: concat(toArray(<delta-join>))` back to `reasoning.select(...)`, drop the parallel `reasoningDeltas` sub-collection. Alias stays `reasoningChunk` (not the generic `chunk`) to avoid the alias-collision class of bug.
- `EntityTimelineReasoningItem`: `content: string` reinstated; `EntityTimelineReasoningDeltaItem` removed.
- `client.ts`: drop `EntityTimelineReasoningDeltaItem` export.
- `AgentResponseLive`: drop the `run.reasoningDeltas` subscription + client-side concat; `reasoningEntries` reads `content` straight off the projected row.
- Tests: three reasoning-content tests assert `reasoning[0].content` (rather than concatenating raw deltas).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Tracks down and fixes the bug that's been driving the client-side-concat workaround in #4508 and blocking #4532.

## Root cause

TanStack DB's "includes" (fields whose value is a sub-query like `concat(toArray(...))`) are deferred. A row carrying an include arrives with the field set to `null` and a hidden `Symbol(includesRouting)` marker describing how to compute it. The include is only materialized when something downstream reads it *in the right way*.

The empirical rule (figured out via DevTools probes: `.toArray` on the sub-collection always showed the populated string, `useLiveQuery` output had `content: null`):

**An include is materialized only when it's referenced inside a `caseWhen` object body in a downstream `.select(...)`. A bare top-level reference doesn't trigger it; the include is just aliased forward, still deferred.**

This is why `items.text.content` has always worked and reasoning hasn't. The items consumer derefs `item.textContent` inside the `text: caseWhen(item.text.key, { ..., content: item.textContent })` body. The reasoning consumer had `content: concat(toArray(...))` (or, after the source/consumer split, `content: r.reasoningContent`) at the top level of its select. `useLiveQuery` handed the row to React with `content: null`.

## Fix

Wrap the include reference inside a `caseWhen` object body, mirroring items:

```ts
reasoning: q
  .from({ r: runReasoningSource })
  ...
  .select(({ r }) => ({
    key: r.key,
    run_id: r.run_id,
    order: r.order,
    status: r.status,
    body: caseWhen(r.key, {
      content: r.reasoningContent,
    }),
    summary_title: r.summary_title,
    encrypted: r.encrypted,
  }))
```

`r.key` is always truthy on a real row, so the `caseWhen` is effectively unconditional; its only purpose is being an object body that forces the include reference to materialize.

UI reads `entry.body?.content` (via the type) and `AgentResponseLive` maps it back into a flat `content: string` on `ReasoningEntry` so `ReasoningSection`'s API is unchanged. This drops the need for the client-side concat workaround that was the original target of #4532.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
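On the UI side, flattening the nested `body.content` back into `content: string` might look roughly like the sketch below. The interface shapes and the `toReasoningEntry` name are assumptions for illustration; the real `EntityTimelineReasoningItem` and `ReasoningEntry` types may carry more fields.

```typescript
// Hypothetical, trimmed-down shapes for illustration only.
interface TimelineReasoningItem {
  key: string
  status: string
  body?: { content: string | null } | null
  summary_title?: string | null
}

interface ReasoningEntry {
  key: string
  status: string
  content: string
  summaryTitle: string | null
}

// Flatten the caseWhen-wrapped body so consumers keep a plain `content` string.
function toReasoningEntry(item: TimelineReasoningItem): ReasoningEntry {
  return {
    key: item.key,
    status: item.status,
    // Fall back to an empty string while the body has not materialized yet.
    content: item.body?.content ?? '',
    summaryTitle: item.summary_title ?? null,
  }
}
```

Keeping the flattening in one mapping step is what lets the section component's props stay unchanged.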
@KyleAMathews here are some screenshots showing how it displays while it's thinking and how it displays when it's done thinking (the "Thought for 2s" block is expandable on click).
The entity-stream-db mock omitted the `reasoning` and `reasoningDeltas` collections, so `loadOutboundIdSeed` crashed when reading `db.collections.reasoning.toArray` under three process-wake scenarios.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>


Summary
While the model is "thinking" (Anthropic extended thinking, DeepSeek-R1 reasoning, Moonshot K2, OpenAI Responses summaries) the agent response now shows the reasoning text faded above the answer, with the existing `Thinking` shimmer heading plus elapsed-time ticker. Once the reasoning settles it collapses to `▸ Thought for 12s`; click to expand. Multiple reasoning rows per run render independently in order (one per LLM step in tool-using turns). UX intentionally mirrors Claude Code + OpenCode patterns.

Implementation (end-to-end)
- Schema: `reasoning` row gains `run_id`, `encrypted` (Anthropic redacted-thinking opaque payload, must round-trip back to the model verbatim), and `summary_title` (extracted at write time). New `reasoningDeltas` collection mirrors `textDeltas`. Strictly additive.
- Bridge: `OutboundBridge` gains `onReasoningStart` / `onReasoningDelta` / `onReasoningEnd`, parallel to the text path. Reasoning counter added to `OutboundIdSeed`.
- Adapter: `pi-adapter.ts` routes pi-ai's `thinking_start` / `thinking_delta` / `thinking_end` events to the bridge. Parses a `**Title**\n\n<body>` heading once at write time (OpenAI Responses; no-op for Anthropic / DeepSeek / Moonshot). Defensive: handles late `thinking_delta` without a preceding `thinking_start`, and closes an open reasoning row on `message_end` (e.g. provider abort).
- Timeline: `reasoning: Collection<EntityTimelineReasoningItem>` on `EntityTimelineRunRow`, content built via the same delta-join pattern as `EntityTimelineTextItem.content`.
- UI: `<ReasoningSection>` renders above items in `AgentResponseLive`:
  - `Streamdown` with `ThinkingIndicator` heading + summary title + elapsed-time ticker
  - `▸ Thought for Ns` with click-to-expand. Closure duration snapshotted from `Date.now() - timestamp` using the same `sawStreamingRef` trick from the elapsed-time PR: accurate for in-session settles, stays a bare `Thought` for rows already settled on first mount (no real end timestamp available client-side).
  - `⊘ Reasoning redacted by provider safety filters`. The encrypted payload is still persisted server-side so the model gets it back on the next turn.

Reference
Patterns informed by reading OpenCode's reasoning implementation:
- (`reasoning-start` / `reasoning-delta` / `reasoning-end`)
- `ReasoningPart` storage shape including `encrypted` for Anthropic round-trip
- `reasoningSummary()` headline parser (5-line regex, OpenAI Responses only)

Test plan
- `pnpm typecheck` clean in `agents-runtime` + `agents-server-ui`
- `pnpm test outbound-bridge pi-adapter entity-timeline` in `agents-runtime` (95 passed: 18 bridge + 21 adapter + 56 timeline)
- `pnpm test` in `agents-server-ui` (66 passed)
- `pnpm -C packages/agents-runtime build`: dist artifacts emit cleanly
- `Thought for Ns` on settle

Notes
- `AgentResponse` (the non-Live path used for old scrollback sections) doesn't yet surface reasoning; historical rows recorded before this PR lack the data anyway. Follow-up if we discover sessions where this matters.
- `runtime-dsl.test.ts` 401 failures (and `dispatch-policy-routing.test.ts` 500 failures) reproduce identically on clean `main` and were not introduced by this PR.

🤖 Generated with Claude Code