
feat: unify Web UI and TUI conversation stream via v2 protocol#1250

Merged
geoffjay merged 4 commits into main from feature/history-conversation
May 26, 2026

@geoffjay
Owner

Summary

  • Replace the diverging history+live stitching in Web UI / TUI with a single /v2/stream/{agent_id} WebSocket that delivers a deterministic snapshot followed by live events, keyed by a new per-agent monotonic seq column.
  • Make record_and_seq await the DB insert before broadcast, closing a real race where clients subscribing between broadcast and persist could miss the row in their snapshot.
  • Both Web UI (useAgentStream) and TUI (control/stream, control/app) migrate; v1 /stream and GET /agents/{id}/conversation stay for one release.
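The ordering fix in the second bullet can be illustrated with a minimal, synchronous sketch. Everything here is a hypothetical stand-in: `AgentLog`, its in-memory `Vec` in place of the database, and the `std::sync::mpsc` channels in place of the real broadcast channel are assumptions for illustration, and the real `record_and_seq` is async. The point is only the ordering: the row is stored before any subscriber is notified, so a seq seen live is always visible to a snapshot query.

```rust
use std::sync::mpsc::{self, Receiver, Sender};
use std::sync::Mutex;

/// Hypothetical stand-in for the per-agent event log plus its broadcast channel.
struct AgentLog {
    rows: Mutex<Vec<(u64, String)>>, // persisted (seq, payload) rows
    subscribers: Mutex<Vec<Sender<(u64, String)>>>,
}

impl AgentLog {
    fn new() -> Self {
        AgentLog {
            rows: Mutex::new(Vec::new()),
            subscribers: Mutex::new(Vec::new()),
        }
    }

    /// Register a live subscriber.
    fn subscribe(&self) -> Receiver<(u64, String)> {
        let (tx, rx) = mpsc::channel();
        self.subscribers.lock().unwrap().push(tx);
        rx
    }

    /// Persist first, then broadcast. The rows lock is held across the
    /// broadcast so that events also go out in seq order.
    fn record_and_seq(&self, payload: &str) -> u64 {
        let mut rows = self.rows.lock().unwrap();
        let seq = rows.len() as u64 + 1; // monotonic per log
        rows.push((seq, payload.to_string())); // "insert" completes here
        self.subscribers
            .lock()
            .unwrap()
            .retain(|tx| tx.send((seq, payload.to_string())).is_ok());
        seq
    }

    /// Rows with seq strictly greater than `since_seq`.
    fn snapshot(&self, since_seq: u64) -> Vec<(u64, String)> {
        let rows = self.rows.lock().unwrap();
        rows.iter().filter(|(s, _)| *s > since_seq).cloned().collect()
    }
}

fn main() {
    let log = AgentLog::new();
    let rx = log.subscribe();
    let seq = log.record_and_seq("hello");
    let (live_seq, _) = rx.recv().unwrap();
    // Anything observed live is already present in the persisted rows.
    assert!(log.snapshot(0).iter().any(|(s, _)| *s == live_seq));
    println!("seq {seq} was persisted before it was broadcast");
}
```

With the old fire-and-forget ordering, the assertion in `main` could fail for a client that subscribed and queried between the broadcast and the insert.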

Why the views diverged

Both clients already consumed the same orchestrator endpoints, but each stitched REST history with the live WS independently. The TUI fetched history filtered to output,prompt_sent,tool_use,result — dropping thinking, activity_changed, usage_update, context_cleared from backfill while still receiving them live. The Web UI capped at 5000 lines and cached in sessionStorage. Neither client had a sequence number to dedupe across the history/live boundary, so the same conversation could render differently in each app.

Wire protocol (/v2/stream/{agent_id})

Client subscribes with {frame: "subscribe", since_seq: N}. Server then emits:

  • {frame: "snapshot_begin", cursor: N, agent_id: ...}
  • {frame: "event", seq: K, type: "agent:output", ...} (in both snapshot and live phases — identical shape)
  • {frame: "snapshot_end", seq: <last replayed>}
  • {frame: "gap", skipped: N, reason: "broadcast_lagged"} on receiver lag
  • {frame: "error", code, message} on bad subscribe / storage failure

The server subscribes to the broadcast channel before the snapshot query so live events arriving during the query are buffered in the receiver and replayed after snapshot_end, deduped against last_replayed.
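A minimal sketch of the snapshot-then-live sequence described above, using only the standard library. The `Event` and `Frame` types are trimmed, hypothetical versions of the frames listed, `serve` is an illustrative name, and treating `since_seq` as an exclusive cursor (replay rows with `seq > since_seq`) is an assumption. The `store` slice stands for the result of the snapshot query, and the receiver is assumed to have been subscribed before that query ran.

```rust
use std::sync::mpsc;

/// Hypothetical, simplified event: a per-agent sequence number and a payload.
#[derive(Debug, Clone, PartialEq)]
struct Event {
    seq: u64,
    payload: String,
}

/// Trimmed versions of the frames listed above.
#[derive(Debug, Clone, PartialEq)]
enum Frame {
    SnapshotBegin { cursor: u64 },
    Event(Event),
    SnapshotEnd { seq: u64 },
}

/// Small constructor used by the example.
fn ev(seq: u64) -> Event {
    Event { seq, payload: format!("event {seq}") }
}

/// Serve one subscription. `live` must already be subscribed so that events
/// arriving during the snapshot query are buffered in it.
fn serve(store: &[Event], live: &mpsc::Receiver<Event>, since_seq: u64) -> Vec<Frame> {
    let mut out = vec![Frame::SnapshotBegin { cursor: since_seq }];

    // Snapshot phase: replay stored rows after the cursor (store is in seq order).
    let mut last_replayed = since_seq;
    for e in store.iter().filter(|e| e.seq > since_seq) {
        last_replayed = e.seq;
        out.push(Frame::Event(e.clone()));
    }
    out.push(Frame::SnapshotEnd { seq: last_replayed });

    // Live phase: drain the buffer, skipping rows the snapshot already delivered.
    for e in live.try_iter() {
        if e.seq > last_replayed {
            last_replayed = e.seq;
            out.push(Frame::Event(e));
        }
    }
    out
}

fn main() {
    let (tx, rx) = mpsc::channel();
    // The snapshot query returned rows 1..=3; rows 3 and 4 were also
    // broadcast after the subscription was taken.
    let store = vec![ev(1), ev(2), ev(3)];
    tx.send(ev(3)).unwrap();
    tx.send(ev(4)).unwrap();

    for frame in serve(&store, &rx, 1) {
        println!("{frame:?}");
    }
}
```

Here row 3 appears once, in the snapshot, and row 4 follows `SnapshotEnd` as a live event. Because the same `Event` shape is used in both phases, a client does not need to distinguish them to render correctly.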

Test plan

  • cargo test -p orchestrator --test conversation_stream_v2 — 6 tests covering monotonic seq, since-window query, snapshot+live correctness, two-client agreement, since_seq resume, migration backfill.
  • cargo test -p orchestrator — 1047 tests pass.
  • cargo clippy --workspace --all-targets -- -D warnings clean.
  • cargo fmt --all -- --check clean.
  • bunx tsc --noEmit on ui/ clean.
  • Manual: open Web UI agent detail in two tabs, attach TUI to same agent mid-conversation, verify identical event ordering across all three.
  • Manual: kill+restart TUI mid-conversation, verify since_seq resume replays only the delta with no duplicates.

Notes

  • One pre-existing test failure in agentd-core::config::tests::test_defaults (asserts port 17000, default is now 7000 after b3aed926 feat: fix port settings) reproduces on main and is unrelated.
  • v1 broadcast frames now also carry seq — additive change, ignored by existing v1 consumers.

🤖 Generated with Claude Code

…rotocol

Web UI agent-details and TUI conversation views diverged because each client
stitched REST history with the live WebSocket independently — the TUI dropped
several event types from its backfill and neither client could dedupe across
the history/live boundary. Replaces that with `/v2/stream/{agent_id}`: a
single WebSocket that emits a deterministic snapshot followed by live events,
keyed by a per-agent monotonic `seq`. Both clients migrate; v1 endpoints stay
for one release.

The fix surfaced a real race in the persistence pipeline: persist was
fire-and-forget while broadcast was synchronous, so a client subscribing
between broadcast and DB write would miss the row in its snapshot. Made
`record_and_seq` async so that it awaits the insert before returning the seq;
the broadcast now goes out only after the row exists, closing the gap.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@codecov

codecov Bot commented May 25, 2026

Codecov Report

❌ Patch coverage is 0% with 63 lines in your changes missing coverage. Please review.
✅ Project coverage is 64.03%. Comparing base (e4953ae) to head (c745f87).
⚠️ Report is 5 commits behind head on main.

Files with missing lines          Patch %   Lines
ui/src/hooks/useAgentStream.ts    0.00%     63 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1250      +/-   ##
==========================================
+ Coverage   63.77%   64.03%   +0.25%     
==========================================
  Files         173      173              
  Lines        7733     7699      -34     
  Branches     2614     2610       -4     
==========================================
- Hits         4932     4930       -2     
+ Misses       2780     2748      -32     
  Partials       21       21              
Flag        Coverage Δ
frontend    64.03% <0.00%> (+0.25%) ⬆️

geoffjay and others added 3 commits May 25, 2026 17:38
…ts on mount

Two follow-ups to the v2 stream rollout, both surfacing as "view shows less
than the agent has produced":

- TUI silently exited its read loop on any WebSocket close or error and had
  no reconnect logic. Symptom: `Conversation (N)` count freezes and
  scrolling to the bottom shows nothing new even though the agent is still
  producing events. Replace the single-pass read with a reconnect loop that
  tracks the highest `seq` it has observed (from `event` and `snapshot_end`
  frames) and resubscribes with `since_seq = last_seq` on disconnect. Uses
  exponential backoff (200 ms → 5 s).

- Web UI persisted `last_seq` to sessionStorage, so a fresh visit to an
  agent detail page resumed from the previous visit's cursor and saw an
  empty snapshot — no history rendered. Drop the persistence; keep
  `last_seq` in memory only so the underlying WebSocketManager's
  auto-reconnects still resume cleanly, but every fresh mount requests the
  full snapshot. Matches the TUI's behaviour (`conversation_last_seq = 0`
  on `enter_agent_detail`).
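The two behaviours above (resume from the highest observed `seq`, and start from zero on a fresh view) can be sketched as follows. This is not the TUI or Web UI code: `Cursor`, `Frame`, and `backoff` are hypothetical names, the frame type is trimmed to the fields that matter, and doubling the delay on each attempt is an assumption; the commit message only states exponential backoff from 200 ms to 5 s.

```rust
use std::time::Duration;

const INITIAL_BACKOFF: Duration = Duration::from_millis(200);
const MAX_BACKOFF: Duration = Duration::from_secs(5);

/// Delay before reconnect attempt `attempt` (0-based): 200 ms, doubled per
/// attempt (assumed factor), capped at 5 s.
fn backoff(attempt: u32) -> Duration {
    let factor = 2u32.saturating_pow(attempt);
    INITIAL_BACKOFF.saturating_mul(factor).min(MAX_BACKOFF)
}

/// Hypothetical trimmed view of incoming frames.
enum Frame {
    Event { seq: u64 },
    SnapshotEnd { seq: u64 },
    Other,
}

/// In-memory resume cursor. A fresh view starts from `Cursor::default()`
/// (last_seq = 0) and therefore requests the full snapshot.
#[derive(Default)]
struct Cursor {
    last_seq: u64,
}

impl Cursor {
    /// Track the highest seq seen on `event` and `snapshot_end` frames.
    fn observe(&mut self, frame: &Frame) {
        if let Frame::Event { seq } | Frame::SnapshotEnd { seq } = frame {
            self.last_seq = self.last_seq.max(*seq);
        }
    }

    /// Subscribe frame to send after (re)connecting.
    fn subscribe_frame(&self) -> String {
        format!(r#"{{"frame":"subscribe","since_seq":{}}}"#, self.last_seq)
    }
}

fn main() {
    let mut cursor = Cursor::default();
    for frame in [Frame::Event { seq: 7 }, Frame::Other, Frame::SnapshotEnd { seq: 5 }] {
        cursor.observe(&frame);
    }
    println!("resubscribe with {}", cursor.subscribe_frame());
    for attempt in 0..6 {
        println!("attempt {attempt}: wait {:?}", backoff(attempt));
    }
}
```

Keeping the cursor in memory only means a reconnect within the same view resumes from the delta, while a new view never inherits a stale cursor.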

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… tail

`render_conversation` computed total wrapped rows with a hand-rolled
`display_rows` that estimated wrapping as `chars / width`. That ignores
ratatui's word-wrap behaviour (and miscounts wide/zero-width characters),
so on a long conversation the undercount made `max_scroll` smaller than
the true tail. Follow mode then silently parked the viewport above the
last events — Conversation (N) kept growing but the bottom row showed an
older event.

Use `Paragraph::line_count` (gated behind ratatui's
`unstable-rendered-line-info` feature, ratatui#293) to get ratatui's own
wrapped row count, and drop `display_rows`. The feature flag is only an
opt-in to an API whose stability is not yet guaranteed; the underlying call
is the same machinery ratatui uses to render the paragraph.
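To see why a character-count estimate undercounts, here is a small standalone comparison. Neither function is the removed `display_rows` nor ratatui's wrapping algorithm: `estimated_rows` is an assumed reconstruction of a `chars / width` estimate (rounded up, which is the generous case), and `word_wrapped_rows` is a simplified greedy word wrap that never splits a word and ignores wide or zero-width characters.

```rust
/// Character-count estimate: total chars divided by width, rounded up.
/// `width` must be non-zero.
fn estimated_rows(text: &str, width: usize) -> usize {
    let chars = text.chars().count();
    (chars + width - 1) / width
}

/// Simplified greedy word wrap: a word that does not fit in the remaining
/// columns moves to a new row. Words longer than `width` are not split.
fn word_wrapped_rows(text: &str, width: usize) -> usize {
    let mut rows = 0;
    let mut used = 0; // columns used on the current row
    for word in text.split_whitespace() {
        let len = word.chars().count();
        if rows == 0 {
            // First word opens the first row.
            rows = 1;
            used = len;
        } else if used + 1 + len <= width {
            // Fits after a single separating space.
            used += 1 + len;
        } else {
            // Does not fit: wrap to a new row.
            rows += 1;
            used = len;
        }
    }
    rows
}

fn main() {
    let text = "aaaaaa bbbbbb cccccc dddddd"; // 27 characters
    let width = 10;
    println!("estimate:  {}", estimated_rows(text, width));
    println!("word wrap: {}", word_wrapped_rows(text, width));
}
```

For this input the estimate gives 3 rows while the word wrap needs 4, because each row leaves unused columns that the character count assumes are filled. A scroll limit derived from the smaller number stops short of the true last row, which matches the symptom described above.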

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@geoffjay geoffjay merged commit 8e27af8 into main May 26, 2026
6 of 10 checks passed
@geoffjay geoffjay deleted the feature/history-conversation branch May 26, 2026 21:45