PRD: agent-conversation-log
Executive Summary
Today shipcode stores per-phase agent output in fragmented places: plans.raw_output, reviews.raw_output, verifications.raw_output, terminal_events, and pipeline_step_log (V38). The executor's transcript is explicitly NOT persisted (apps/desktop/src/renderer/components/issue-detail/helpers.ts:257-260). There is no threaded conversation view — only flat byte streams in the terminal drawer. As shipcode adds dual-plan dialogue (#TBD) and as users want to audit / replay what each agent actually said, we need a single, structured record of every agent turn that is queryable, threaded, and viewable in the desktop app.
This PRD adds an agent_conversations table (migrateV39), wraps runProviderPhase to log every prompt + response, exposes the rows over IPC, and ships a new Conversations tab in IssueDetail.
Implementation Checklist
The system must add agent_conversations (migrateV39) with columns: id, thread_id (FK threads(id) ON DELETE CASCADE), phase, round, speaker, role ('prompt' | 'response'), parent_id (nullable self-FK), provider, model, content (TEXT), tokens_in, tokens_out, cost_usd, created_at. Indexes on (thread_id, created_at) and (thread_id, phase, round)
The system must wrap runProviderPhase (packages/pipeline/src/pipeline/runtime.ts:313) so every provider call produces exactly two rows: one prompt, one response. Failures still write the prompt row plus a synthetic error response row containing the failure message
The system must add pipeline_step_log.conversation_id (nullable FK) so each step-log row links to its conversation prompt/response pair. Update insertion sites accordingly
The system must expose IPC channel agent-conversations:list-by-thread returning rows sorted by created_at ASC, with optional phase + role filters
The system must add a new ConversationsTab to IssueDetail rendering phase-grouped turns. Default tab order: Overview / Plan History / Pipeline / Conversations / History
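As a rough illustration of the first and third checklist items, the V39 DDL might look like the sketch below. Only the column list, the two indexes, the cascade on thread_id, and the nullable conversation_id come from this PRD. The SQL column types, the NOT NULL choices, the index names, the epoch-millisecond created_at, and the constant name MIGRATE_V39_SQL are assumptions made for the example, and the real type of threads(id) should be checked before copying any of it.

```typescript
// Hypothetical sketch of the V39 migration SQL. Column types, nullability,
// and index names are assumptions; the PRD only fixes the column list.
const MIGRATE_V39_SQL = `
CREATE TABLE agent_conversations (
  id          INTEGER PRIMARY KEY,
  thread_id   INTEGER NOT NULL REFERENCES threads(id) ON DELETE CASCADE,
  phase       TEXT    NOT NULL,
  round       INTEGER NOT NULL DEFAULT 0,
  speaker     TEXT    NOT NULL,
  role        TEXT    NOT NULL CHECK (role IN ('prompt', 'response')),
  parent_id   INTEGER REFERENCES agent_conversations(id),
  provider    TEXT,
  model       TEXT,
  content     TEXT    NOT NULL,
  tokens_in   INTEGER,
  tokens_out  INTEGER,
  cost_usd    REAL,
  created_at  INTEGER NOT NULL
);
CREATE INDEX idx_agent_conversations_thread_created
  ON agent_conversations (thread_id, created_at);
CREATE INDEX idx_agent_conversations_thread_phase_round
  ON agent_conversations (thread_id, phase, round);
-- Adding a nullable column does not rewrite the existing table.
ALTER TABLE pipeline_step_log
  ADD COLUMN conversation_id INTEGER REFERENCES agent_conversations(id);
`;
```

Keeping conversation_id nullable with no default is what lets SQLite add the column in place, which matters for the migration-time budget stated under Non-Functional Requirements.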
Problem Statement
Fragmented persistence. Each phase stores its raw output on a different table; review/plan/verify rows are queryable, but execute output lives only in terminal_events, which is a byte-level pty stream, not a sequence of logical turns.
Repo-blind reviewer cannot be audited. The reviewer skill operates without repo file context. To improve the reviewer prompt over time we need to read what it actually said vs. what it should have caught — that requires a durable, structured record per turn.
Dual-plan dialogue has nowhere to live. The upcoming dual-plan feature produces 5 turns per plan phase (2 planners + 2 critics + 1 adjudicator). Without a conversation table this dialogue would be invisible or shoved into raw_output blobs.
No replay for failures. When a phase fails, the user sees the failure_reason banner but cannot inspect the prompt that triggered it without grepping the logs.
Goals
Every agent turn (prompt + response) is persisted with phase, round, speaker, provider, and model metadata.
A new Conversations tab in IssueDetail renders threaded turns per thread, grouped by phase, ordered by time.
No backfill is needed: the feature is forward-only, so existing threads have no conversation rows.
Dual-plan dialogue (when shipped) lands in the same table with no schema change.
Non-Goals
Editing or deleting turns. Append-only audit trail.
Cross-thread search / global conversation timeline.
Conversation export to a share link or file (read-only view in v1; copy-as-markdown is the only export path).
Replacing terminal_events. Both stay; one is byte-level pty, the other is logical turns.
Per-turn surgical re-run / rewind. Out of scope.
Side-by-side plan diff viewer (Claude plan vs Codex plan). Different feature.
Migration of historical raw_output blobs into the new table.
User Stories
As a developer reviewing why the executor produced a wrong diff, I want to see exactly what the planner said, what the reviewer approved, and what prompt the executor received, in one threaded view.
Acceptance:
Open IssueDetail → Conversations tab → see prompt + response turns for plan, review, execute, verify, ordered by time.
Each turn shows speaker (e.g., "claude-planner"), phase, model, tokens, cost.
As Vincent debugging why dual-plan adjudication chose plan A over plan B, I want to read both planner outputs and both critiques in one place.
Acceptance:
All 5 dual-plan turns appear under the Plan group with parent_id threading from critiques back to the plan they critiqued.
As a user copying a useful agent response into a separate doc, I want a one-click "Copy as markdown" on each turn.
Acceptance:
Hover on a turn → copy icon → clicking copies the speaker + phase + content as markdown to clipboard.
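The copy-as-markdown acceptance criterion could be met with a small pure formatter like the sketch below. The exact markdown layout (a level-3 heading with the speaker and the phase in parentheses) and the names TurnForCopy, turnToMarkdown, and threadToMarkdown are assumptions; the PRD only requires that speaker, phase, and content end up in the copied text.

```typescript
// Hypothetical shape: just the fields the copy action needs.
interface TurnForCopy {
  speaker: string;
  phase: string;
  content: string;
}

// Render one turn as markdown: a heading line, a blank line, then the content.
function turnToMarkdown(turn: TurnForCopy): string {
  const body = turn.content.replace(/\s+$/, ""); // drop trailing whitespace
  return `### ${turn.speaker} (${turn.phase})\n\n${body}\n`;
}

// Render a whole thread by joining turns with a blank line between them.
function threadToMarkdown(turns: TurnForCopy[]): string {
  return turns.map(turnToMarkdown).join("\n");
}
```

In the renderer, the resulting string would typically be handed to the standard `navigator.clipboard.writeText` API from the copy icon's click handler. Keeping the formatter free of clipboard calls makes it easy to assert on the text in the ConversationsTab tests.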
Functional Requirements
The system must add agent_conversations (migrateV39) with columns: id, thread_id (FK threads(id) ON DELETE CASCADE), phase, round, speaker, role ('prompt' | 'response'), parent_id (nullable self-FK), provider, model, content (TEXT), tokens_in, tokens_out, cost_usd, created_at. Indexes on (thread_id, created_at) and (thread_id, phase, round).
The system must wrap runProviderPhase (packages/pipeline/src/pipeline/runtime.ts:313) so every provider call produces exactly two rows: one prompt, one response. Failures still write the prompt row plus a synthetic error response row containing the failure message.
The system must add pipeline_step_log.conversation_id (nullable FK) so each step-log row links to its conversation prompt/response pair. Update insertion sites accordingly.
The system must expose IPC channel agent-conversations:list-by-thread returning rows sorted by created_at ASC, with optional phase + role filters.
The system must add a new ConversationsTab to IssueDetail rendering phase-grouped turns. Default tab order: Overview / Plan History / Pipeline / Conversations / History.
The system must support phase filter chips (Plan / Review / Execute / Verify / Revision / Critique / Adjudicate), multi-select, default all on.
The system must support free-text search over content and speaker.
Each turn must display: speaker badge (color-coded by provider), phase pill, model badge, token + cost row, collapsible content. Content longer than 30 lines is truncated behind a "Show full" toggle.
"Copy as markdown" must exist per turn and "Copy whole thread as markdown" at the tab top.
The system must rehydrate conversations on app reload from the SQLite source of truth.
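The wrapping requirement above can be sketched as a higher-order function around the provider call. This is a minimal sketch under stated assumptions, not the actual runProviderPhase signature: withConversationLog, ConversationStore, PhaseMeta, and the "ERROR: " prefix are hypothetical, insertTurn is borrowed from the planned query module with an assumed signature, and linking the response row to its prompt row through parent_id is an illustrative choice the PRD does not mandate.

```typescript
type Role = "prompt" | "response";

interface TurnInput {
  threadId: string;
  phase: string;
  round: number;
  speaker: string;
  role: Role;
  parentId: number | null;
  content: string;
}

// Hypothetical store interface; insertTurn returns the new row id.
interface ConversationStore {
  insertTurn(turn: TurnInput): number;
}

interface PhaseMeta {
  threadId: string;
  phase: string;
  round: number;
  speaker: string;
}

// Logging must never break the pipeline: swallow and report write errors.
function safeInsert(
  store: ConversationStore,
  turn: TurnInput,
  onLogError: (err: unknown) => void,
): number | null {
  try {
    return store.insertTurn(turn);
  } catch (err) {
    onLogError(err);
    return null;
  }
}

// Wraps one provider call so it yields exactly two rows: prompt + response.
async function withConversationLog(
  store: ConversationStore,
  meta: PhaseMeta,
  prompt: string,
  callProvider: (prompt: string) => Promise<string>,
  onLogError: (err: unknown) => void = () => {},
): Promise<string> {
  const promptId = safeInsert(
    store,
    { ...meta, role: "prompt", parentId: null, content: prompt },
    onLogError,
  );
  try {
    const response = await callProvider(prompt);
    safeInsert(store, { ...meta, role: "response", parentId: promptId, content: response }, onLogError);
    return response;
  } catch (err) {
    // Synthetic error response row carrying the failure message.
    const message = err instanceof Error ? err.message : String(err);
    safeInsert(store, { ...meta, role: "response", parentId: promptId, content: `ERROR: ${message}` }, onLogError);
    throw err; // the provider failure itself propagates unchanged
  }
}
```

The provider error is rethrown so existing failure handling is untouched, while a failing store only reaches onLogError. That split is what lets a fault-injection test assert that a broken database never turns into a phase failure.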
Non-Functional Requirements
The IPC list call must paginate or stream when a thread exceeds 500 turns. This is expected to be rare; v1 caps the response at 1000 and returns the tail (the most recent entries).
A single turn's content may be up to 5 MB. SQLite handles this fine; UI must virtualize the list to avoid render hangs.
Schema migration must complete in <1 s on a 1 GB database. No table rewrites.
Write failures must not block the pipeline: log and continue, and never propagate them as a phase failure.
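To pin down what the list call is expected to return, the in-memory sketch below applies the optional phase and role filters, sorts by creation time ascending, and keeps only the tail when the cap is exceeded. A real implementation would more likely push this into SQL; the names ListedRow, ListFilters, listByThread, and MAX_ROWS_V1, and the tie-break on id, are assumptions for illustration.

```typescript
interface ListedRow {
  id: number;
  phase: string;
  role: "prompt" | "response";
  createdAt: number; // assumed epoch milliseconds
}

interface ListFilters {
  phases?: string[];
  roles?: Array<"prompt" | "response">;
}

const MAX_ROWS_V1 = 1000; // v1 cap from the requirement above

function listByThread<T extends ListedRow>(
  rows: T[],
  filters: ListFilters = {},
  cap: number = MAX_ROWS_V1,
): T[] {
  const phases = filters.phases;
  const roles = filters.roles;
  // filter() copies, so the caller's array is never reordered.
  const matching = rows.filter(
    (r) =>
      (!phases || phases.indexOf(r.phase) !== -1) &&
      (!roles || roles.indexOf(r.role) !== -1),
  );
  // created_at ASC, with id as a tie-break for rows written in the same ms.
  matching.sort((a, b) => a.createdAt - b.createdAt || a.id - b.id);
  // Keep the most recent `cap` rows, still in ascending order.
  return cap > 0 ? matching.slice(-cap) : [];
}
```

Returning the tail in ascending order means the UI can render the result directly and the newest turns are never the ones dropped.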
Success Criteria
Running a full plan → review → execute → verify cycle produces one prompt + one response row per phase, visible in Conversations tab in time order.
Running with dual-plan enabled produces exactly 5 prompt-response pairs under the Plan group with correct parent_id threading.
Killing a planner mid-run produces the prompt row plus an error response row carrying the failure message.
App reload rehydrates the tab without re-running the pipeline.
sqlite3 ~/.shipcode/db ".schema agent_conversations" matches the migration.
Schema test (packages/db/src/schema.test.ts) covers V39 with at least 3 assertions (table shape, indexes, FK cascade).
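The "visible in time order, per phase" criterion implies a grouping step between the flat row list and the rendered tab. One possible shape is sketched below. The names GroupableRow, PhaseGroup, and groupByPhase are hypothetical, and ordering the groups by the time of their first row is an assumption the PRD leaves open.

```typescript
interface GroupableRow {
  id: number;
  phase: string;
  createdAt: number; // assumed epoch milliseconds
}

interface PhaseGroup<T> {
  phase: string;
  rows: T[];
}

// Groups rows by phase. Rows stay in time order inside each group, and
// groups appear in the order their first row was written.
function groupByPhase<T extends GroupableRow>(rows: T[]): PhaseGroup<T>[] {
  const sorted = rows.slice().sort((a, b) => a.createdAt - b.createdAt || a.id - b.id);
  const groups: PhaseGroup<T>[] = [];
  for (const row of sorted) {
    // Linear scan is fine here: there are only a handful of phases.
    let group = groups.filter((g) => g.phase === row.phase)[0];
    if (!group) {
      group = { phase: row.phase, rows: [] };
      groups.push(group);
    }
    group.rows.push(row);
  }
  return groups;
}
```

With this shape, revision turns written after a review still land in their own phase group, so the round number is what distinguishes them within the group.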
Out of Scope
Cross-thread global conversation timeline.
Searching across all threads from a single page.
Editing or deleting turns post-hoc.
Replaying a turn against a different model.
Side-by-side plan diff viewer.
Migration of historical raw_output blobs.
Sidecar files for content > 5 MB (v1 stores everything inline in SQLite).
Dependencies
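packages/db/src/schema.ts: new migrateV39, and bump EXPECTED_SCHEMA_VERSION to 39.
packages/db/src/queries/agent-conversations.ts (new): insertTurn, listForThread, listForThreadByPhase.
packages/pipeline/src/pipeline/runtime.ts:313: wrap runProviderPhase.
packages/shared/src/ipc-channels.ts: new agent-conversations:list-by-thread channel.
apps/desktop/src/main/ipc/register-conversation-handlers.ts (new).
apps/desktop/src/renderer/features/issue-detail/ConversationsTab.tsx (new).
The IssueDetail tabs definition (locate via grep on existing tab labels): add Conversations entry.
packages/db/src/schema.ts:1094 (pipeline_step_log): add conversation_id column in V39, wire from runtime wrapper.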
Verification Plan
tests:
packages/db/src/schema.test.ts — V39 block: table shape, indexes, FK cascade behavior.
packages/db/src/queries/agent-conversations.test.ts (new) — round-trip insert + list + filter by phase.
packages/pipeline/src/pipeline/runtime.test.ts — wrapper writes prompt + response for happy path; writes prompt + error response on provider failure; never throws into the pipeline path.
apps/desktop/src/renderer/features/issue-detail/ConversationsTab.test.tsx — renders phase groups, filter chips toggle, search filters rows, copy-as-markdown writes correct text to clipboard.
manual:
Run a full pipeline cycle. Open Conversations tab. Verify all phases present in time order.
Run with revisionCount=1. Verify revision turns appear with correct round number.
Search "TypeError" — only turns containing it remain.
Filter to "Execute" only — only execution turns visible.
Click "Copy whole thread as markdown" — paste into a scratch file, confirm full transcript with speaker headers.
Quit the app mid-pipeline (Cmd+Q during an active phase). Reopen, confirm partial conversation rows from before the quit are present.
sqlite3 ~/.shipcode/db ".schema agent_conversations" returns the migration shape.
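The search and filter checks in the manual steps above could be backed by a single pure function in the renderer, which also makes them straightforward to cover in ConversationsTab.test.tsx. The sketch below assumes case-insensitive substring matching and treats an empty query as "match everything"; neither choice is specified by this PRD, and the names SearchableTurn and filterTurns are hypothetical.

```typescript
interface SearchableTurn {
  speaker: string;
  phase: string;
  content: string;
}

// Keeps turns whose phase chip is active and whose content or speaker
// contains the query (case-insensitive substring match, an assumption).
function filterTurns<T extends SearchableTurn>(
  turns: T[],
  activePhases: string[],
  query: string,
): T[] {
  const needle = query.trim().toLowerCase();
  return turns.filter((t) => {
    if (activePhases.indexOf(t.phase) === -1) return false;
    if (needle === "") return true;
    return (
      t.content.toLowerCase().indexOf(needle) !== -1 ||
      t.speaker.toLowerCase().indexOf(needle) !== -1
    );
  });
}
```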
Risks & Open Questions
Content size blowup on long executor runs. A multi-hour executor may produce a multi-megabyte response. SQLite handles it but the UI must virtualize. Decision: default truncate at 30 lines with "Show full" toggle; full content stays in DB.
Wrapper failures must not break the pipeline. Any DB write error in the wrapper is logged + swallowed. Pipeline phases continue. Verify with a fault-injection test.
Token + cost data availability. OpenRouter responses include token counts; claude/codex CLIs do not always. v1 stores nullable tokens/cost; UI shows "—" when missing.
Conversations tab vs Plan History tab overlap. PlanHistoryTab shows plan versions. Conversations tab shows individual prompt+response turns. Both stay; PlanHistoryTab links to the corresponding conversation row when possible (link out, no merge).
Schema FK cascade on thread deletion. ON DELETE CASCADE means deleting a thread removes its conversation history. Acceptable since threads represent a single pipeline run; bulk thread deletion is rare.
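Two of the decisions above (truncating at 30 lines with a "Show full" toggle, and showing a dash when token or cost data is missing) are easy to express as pure helpers. The following is a sketch under assumptions: the names previewContent and formatUsage, the "in / out" label layout, and the four-decimal cost format are all illustrative rather than specified.

```typescript
const PREVIEW_LINES = 30; // truncation threshold from the decision above
const MISSING = "\u2014"; // the dash shown when a value is unavailable

// Returns the collapsed preview plus a flag telling the UI whether to
// offer the "Show full" toggle. The full content stays in the database.
function previewContent(
  content: string,
  maxLines: number = PREVIEW_LINES,
): { text: string; truncated: boolean } {
  const lines = content.split("\n");
  if (lines.length <= maxLines) {
    return { text: content, truncated: false };
  }
  return { text: lines.slice(0, maxLines).join("\n"), truncated: true };
}

// Formats the token + cost row; null means the provider reported nothing.
function formatUsage(
  tokensIn: number | null,
  tokensOut: number | null,
  costUsd: number | null,
): string {
  const show = (n: number | null): string => (n === null ? MISSING : String(n));
  const cost = costUsd === null ? MISSING : `$${costUsd.toFixed(4)}`;
  return `${show(tokensIn)} in / ${show(tokensOut)} out / ${cost}`;
}
```

Truncating the rendered text bounds the DOM size of a single turn, but a list of many turns still needs virtualization, so the two measures complement each other.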