Agent conversation log: persist + view inter-agent dialogue across all pipeline phases #90

@VincentShipsIt

PRD: agent-conversation-log

Executive Summary

Today shipcode stores per-phase agent output in fragmented places: plans.raw_output, reviews.raw_output, verifications.raw_output, terminal_events, and pipeline_step_log (V38). The executor's transcript is explicitly NOT persisted (apps/desktop/src/renderer/components/issue-detail/helpers.ts:257-260). There is no threaded conversation view — only flat byte streams in the terminal drawer. As shipcode adds dual-plan dialogue (#TBD) and as users want to audit / replay what each agent actually said, we need a single, structured record of every agent turn that is queryable, threaded, and viewable in the desktop app.

This PRD adds an agent_conversations table (migrateV39), wraps runProviderPhase to log every prompt + response, exposes the rows over IPC, and ships a new Conversations tab in IssueDetail.

Implementation Checklist

  • The system must add agent_conversations (migrateV39) with columns: id, thread_id (FK threads(id) ON DELETE CASCADE), phase, round, speaker, role ('prompt' | 'response'), parent_id (nullable self-FK), provider, model, content (TEXT), tokens_in, tokens_out, cost_usd, created_at. Indexes on (thread_id, created_at) and (thread_id, phase, round)
  • The system must wrap runProviderPhase (packages/pipeline/src/pipeline/runtime.ts:313) so every provider call produces exactly two rows: one prompt, one response. Failures still write the prompt row plus a synthetic error response row containing the failure message
  • The system must add pipeline_step_log.conversation_id (nullable FK) so each step-log row links to its conversation prompt/response pair. Update insertion sites accordingly
  • The system must expose IPC channel agent-conversations:list-by-thread returning rows sorted by created_at ASC, with optional phase + role filters
  • The system must add a new ConversationsTab to IssueDetail rendering phase-grouped turns. Default tab order: Overview / Plan History / Pipeline / Conversations / History
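
The schema items in the checklist could be sketched roughly as below. This is a non-authoritative illustration: the column types, NOT NULL choices, defaults, index names, the `MIGRATE_V39_SQL` constant, and the `AgentConversationRow` interface are all assumptions, since the PRD lists only column names, the foreign keys, and the two index key sets. The `ALTER TABLE ... ADD COLUMN` form is used for `pipeline_step_log` because SQLite can add a nullable column without rewriting the table.

```typescript
// Hypothetical TypeScript view of one agent_conversations row.
// Numeric ids are an assumption; threads(id) may be typed differently.
interface AgentConversationRow {
  id: number;
  thread_id: number;
  phase: string;
  round: number;
  speaker: string;
  role: "prompt" | "response";
  parent_id: number | null;
  provider: string;
  model: string;
  content: string;
  tokens_in: number | null;  // nullable: some providers omit token counts
  tokens_out: number | null;
  cost_usd: number | null;
  created_at: string;        // ISO-8601 text, assumed
}

// Hypothetical statement list for migrateV39; types and defaults are assumptions.
const MIGRATE_V39_SQL: string[] = [
  `CREATE TABLE agent_conversations (
     id INTEGER PRIMARY KEY AUTOINCREMENT,
     thread_id INTEGER NOT NULL REFERENCES threads(id) ON DELETE CASCADE,
     phase TEXT NOT NULL,
     round INTEGER NOT NULL DEFAULT 0,
     speaker TEXT NOT NULL,
     role TEXT NOT NULL CHECK (role IN ('prompt', 'response')),
     parent_id INTEGER REFERENCES agent_conversations(id),
     provider TEXT NOT NULL,
     model TEXT NOT NULL,
     content TEXT NOT NULL,
     tokens_in INTEGER,
     tokens_out INTEGER,
     cost_usd REAL,
     created_at TEXT NOT NULL DEFAULT (strftime('%Y-%m-%dT%H:%M:%fZ', 'now'))
   )`,
  `CREATE INDEX idx_agent_conversations_thread_created
     ON agent_conversations (thread_id, created_at)`,
  `CREATE INDEX idx_agent_conversations_thread_phase_round
     ON agent_conversations (thread_id, phase, round)`,
  // Nullable column with no non-NULL default, so no table rewrite is needed.
  `ALTER TABLE pipeline_step_log
     ADD COLUMN conversation_id INTEGER REFERENCES agent_conversations(id)`,
];

// Example row, for illustration only.
const exampleRow: AgentConversationRow = {
  id: 1, thread_id: 7, phase: "plan", round: 0, speaker: "claude-planner",
  role: "prompt", parent_id: null, provider: "claude", model: "example-model",
  content: "Plan the change.", tokens_in: null, tokens_out: null, cost_usd: null,
  created_at: "2025-01-01T00:00:00.000Z",
};
```

The millisecond-resolution `created_at` default is one way to keep a prompt and its response ordered when both are written within the same second; an application-supplied timestamp would serve equally well.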

Problem Statement

  • Fragmented persistence. Each phase stores its raw output on a different table; review/plan/verify rows are queryable but execute output is only in terminal_events, which is a byte-level pty stream, not logical turns.
  • Repo-blind reviewer cannot be audited. The reviewer skill operates without repo file context. To improve the reviewer prompt over time we need to read what it actually said vs. what it should have caught — that requires a durable, structured record per turn.
  • Dual-plan dialogue has nowhere to live. The upcoming dual-plan feature produces 5 turns per plan phase (2 planners + 2 critics + 1 adjudicator). Without a conversation table this dialogue would be invisible or shoved into raw_output blobs.
  • No replay for failures. When a phase fails, the user sees the failure_reason banner but cannot inspect the prompt that triggered it without grep'ing logs.

Goals

  • Every agent turn (prompt + response) is persisted with phase, round, speaker, provider, and model metadata.
  • A new Conversations tab in IssueDetail renders threaded turns per thread, grouped by phase, ordered by time.
  • Backfill is unnecessary — feature is forward-only. Existing threads have no conversation rows.
  • Dual-plan dialogue (when shipped) lands in the same table with no schema change.
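
As a rough illustration of "grouped by phase, ordered by time", the renderer could derive its groups from the flat row list along these lines. The `PhaseTurn` shape and the `groupByPhase` helper are hypothetical names, and breaking timestamp ties by `id` is an assumption.

```typescript
// Minimal fields needed for grouping; hypothetical subset of a conversation row.
interface PhaseTurn {
  id: number;
  phase: string;
  created_at: string; // ISO-8601 text, so string comparison matches time order
}

// Sort turns by time (ties broken by id), then bucket them by phase.
// A Map keeps insertion order, so groups appear in order of their first turn.
function groupByPhase<T extends PhaseTurn>(turns: T[]): Map<string, T[]> {
  const ordered = turns.slice().sort((a, b) => {
    if (a.created_at < b.created_at) return -1;
    if (a.created_at > b.created_at) return 1;
    return a.id - b.id;
  });
  const groups = new Map<string, T[]>();
  for (const turn of ordered) {
    const bucket = groups.get(turn.phase);
    if (bucket) {
      bucket.push(turn);
    } else {
      groups.set(turn.phase, [turn]);
    }
  }
  return groups;
}
```

Because the input array is copied before sorting, the helper can be called on cached IPC results without mutating them.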

Non-Goals

  • Editing or deleting turns. Append-only audit trail.
  • Cross-thread search / global conversation timeline.
  • Conversation export to share link or file (read-only view in v1; copy-as-markdown is the only export path).
  • Replacing terminal_events. Both stay; one is byte-level pty, the other is logical turns.
  • Per-turn surgical re-run / rewind. Out of scope.
  • Side-by-side plan diff viewer (Claude plan vs Codex plan). Different feature.
  • Migration of historical raw_output blobs into the new table.

User Stories

  • As a developer reviewing why the executor produced a wrong diff, I want to see exactly what the planner said, what the reviewer approved, and what prompt the executor received, in one threaded view.
    Acceptance:

    • Open IssueDetail → Conversations tab → see prompt + response turns for plan, review, execute, verify, ordered by time.
    • Each turn shows speaker (e.g., "claude-planner"), phase, model, tokens, cost.
  • As Vincent debugging why dual-plan adjudication chose plan A over plan B, I want to read both planner outputs and both critiques in one place.
    Acceptance:

    • All 5 dual-plan turns appear under the Plan group with parent_id threading from critiques back to the plan they critiqued.
  • As a user copying a useful agent response into a separate doc, I want a one-click "Copy as markdown" on each turn.
    Acceptance:

    • Hover on a turn → copy icon → clicking copies the speaker + phase + content as markdown to clipboard.
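
The "Copy as markdown" story could be backed by a small pure formatter like the one below. This is only a sketch: the heading layout, the `TurnForCopy` shape, and the function names are assumptions, since the PRD only requires that speaker, phase, and content end up in the copied text.

```typescript
// Hypothetical subset of a conversation row needed for copying.
interface TurnForCopy {
  speaker: string;
  phase: string;
  role: "prompt" | "response";
  content: string;
}

// Render one turn as a markdown section: heading with speaker, phase and role,
// then the content with trailing whitespace trimmed.
function turnToMarkdown(turn: TurnForCopy): string {
  const heading = `### ${turn.speaker} (${turn.phase}, ${turn.role})`;
  return `${heading}\n\n${turn.content.trimEnd()}\n`;
}

// Render a whole thread by joining per-turn sections with a blank line.
function threadToMarkdown(turns: TurnForCopy[]): string {
  return turns.map(turnToMarkdown).join("\n");
}
```

In the renderer, the resulting string would typically be passed to `navigator.clipboard.writeText`; keeping the formatter pure makes it straightforward to cover in ConversationsTab.test.tsx without a real clipboard.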

Functional Requirements

  1. The system must add agent_conversations (migrateV39) with columns: id, thread_id (FK threads(id) ON DELETE CASCADE), phase, round, speaker, role ('prompt' | 'response'), parent_id (nullable self-FK), provider, model, content (TEXT), tokens_in, tokens_out, cost_usd, created_at. Indexes on (thread_id, created_at) and (thread_id, phase, round).
  2. The system must wrap runProviderPhase (packages/pipeline/src/pipeline/runtime.ts:313) so every provider call produces exactly two rows: one prompt, one response. Failures still write the prompt row plus a synthetic error response row containing the failure message.
  3. The system must add pipeline_step_log.conversation_id (nullable FK) so each step-log row links to its conversation prompt/response pair. Update insertion sites accordingly.
  4. The system must expose IPC channel agent-conversations:list-by-thread returning rows sorted by created_at ASC, with optional phase + role filters.
  5. The system must add a new ConversationsTab to IssueDetail rendering phase-grouped turns. Default tab order: Overview / Plan History / Pipeline / Conversations / History.
  6. The system must support phase filter chips (Plan / Review / Execute / Verify / Revision / Critique / Adjudicate), multi-select, default all on.
  7. The system must support free-text search over content and speaker.
  8. Each turn must display: speaker badge (color-coded by provider), phase pill, model badge, token + cost row, collapsible content. Content > 30 lines truncates with "Show full" toggle.
  9. "Copy as markdown" must exist per turn and "Copy whole thread as markdown" at the tab top.
  10. The system must rehydrate conversations on app reload from the SQLite source of truth.
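
Requirement 2 could be sketched as a wrapper along these lines. The real signature of runProviderPhase is not shown in this PRD, so the `call` callback, the `ConversationStore` interface, the `insertTurn` signature, and the `ERROR:` prefix are all assumptions. The sketch writes the prompt row first, then either the response row or a synthetic error row, and swallows any write failure so that logging never fails a phase. Rethrowing the provider error after logging it is also an assumption about the desired behavior.

```typescript
type Role = "prompt" | "response";

interface TurnInput {
  threadId: number;
  phase: string;
  round: number;
  speaker: string;
  provider: string;
  model: string;
  parentId: number | null;
  role: Role;
  content: string;
}

// Hypothetical persistence interface; the PRD names insertTurn but not its shape.
interface ConversationStore {
  insertTurn(turn: TurnInput): number; // returns the new row id
}

type TurnMeta = Omit<TurnInput, "role" | "content">;

// Write a row, logging and swallowing any store error.
function safeInsert(store: ConversationStore, turn: TurnInput): number | null {
  try {
    return store.insertTurn(turn);
  } catch (err) {
    console.error("agent_conversations write failed; continuing", err);
    return null;
  }
}

// Wrap one provider call so it yields exactly two rows: prompt, then response.
async function withConversationLog(
  store: ConversationStore,
  meta: TurnMeta,
  prompt: string,
  call: (prompt: string) => Promise<string>,
): Promise<string> {
  safeInsert(store, { ...meta, role: "prompt", content: prompt });
  try {
    const response = await call(prompt);
    safeInsert(store, { ...meta, role: "response", content: response });
    return response;
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    // Synthetic error response row carrying the failure message.
    safeInsert(store, { ...meta, role: "response", content: `ERROR: ${message}` });
    throw err; // the provider failure itself still reaches the pipeline
  }
}
```

Injecting the store as an interface keeps the wrapper testable with an in-memory fake, which is also a convenient place to inject write faults.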

Non-Functional Requirements

  • The IPC list must paginate or stream once a thread exceeds 500 turns. This is rare, so v1 caps the result at 1000 rows and returns the tail (the most recent turns).
  • A single turn's content may be up to 5 MB. SQLite handles this fine; UI must virtualize the list to avoid render hangs.
  • Schema migration must complete in <1 s on a 1 GB database. No table rewrites.
  • Failure modes during writes must not block the pipeline — log + continue, never propagate as a phase failure.
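
The list semantics (ascending by created_at, optional phase and role filters, tail cap) can be illustrated with an in-memory stand-in for the query. In practice this would be a SQL query in the new queries module; the `listByThread` helper, its parameters, and the `id` tie-breaker here are hypothetical.

```typescript
interface ListedTurn {
  id: number;
  thread_id: number;
  phase: string;
  role: "prompt" | "response";
  created_at: string; // ISO-8601 text
}

interface ListFilter {
  phase?: string;
  role?: "prompt" | "response";
}

const MAX_TURNS_V1 = 1000; // v1 cap from the requirement above

// Filter to one thread, apply optional filters, sort ascending by time,
// then keep only the last `cap` rows (the tail).
function listByThread<T extends ListedTurn>(
  rows: T[],
  threadId: number,
  filter: ListFilter = {},
  cap: number = MAX_TURNS_V1,
): T[] {
  const matching = rows
    .filter((r) => r.thread_id === threadId)
    .filter((r) => filter.phase === undefined || r.phase === filter.phase)
    .filter((r) => filter.role === undefined || r.role === filter.role)
    .sort((a, b) => {
      if (a.created_at < b.created_at) return -1;
      if (a.created_at > b.created_at) return 1;
      return a.id - b.id;
    });
  return matching.slice(Math.max(0, matching.length - cap));
}
```

Returning the tail rather than the head means a very long thread still shows its most recent turns, which is usually where a failure under investigation sits.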

Success Criteria

  • Running a full plan → review → execute → verify cycle produces one prompt + one response row per phase, visible in Conversations tab in time order.
  • Running with dual-plan enabled produces exactly 5 prompt-response pairs under the Plan group with correct parent_id threading.
  • Killing a planner mid-run produces the prompt row plus an error response row carrying the failure message.
  • App reload rehydrates the tab without re-running the pipeline.
  • sqlite3 ~/.shipcode/db ".schema agent_conversations" matches the migration.
  • Schema test (packages/db/src/schema.test.ts) covers V39 with at least 3 assertions (table shape, indexes, FK cascade).

Out of Scope

  • Cross-thread global conversation timeline.
  • Searching across all threads from a single page.
  • Editing or deleting turns post-hoc.
  • Replaying a turn against a different model.
  • Side-by-side plan diff viewer.
  • Migration of historical raw_output blobs.
  • Sidecar files for content > 5 MB (v1 stores everything inline in SQLite).

Dependencies

  • packages/db/src/schema.ts — new migrateV39 + bump EXPECTED_SCHEMA_VERSION to 39.
  • packages/db/src/queries/agent-conversations.ts (new) — insertTurn, listForThread, listForThreadByPhase.
  • packages/pipeline/src/pipeline/runtime.ts:313 — wrap runProviderPhase.
  • packages/shared/src/ipc-channels.ts — new agent-conversations:list-by-thread channel.
  • apps/desktop/src/main/ipc/register-conversation-handlers.ts (new).
  • apps/desktop/src/renderer/features/issue-detail/ConversationsTab.tsx (new).
  • The IssueDetail tabs definition (locate via grep on existing tab labels) — add Conversations entry.
  • packages/db/src/schema.ts:1094 (pipeline_step_log) — add conversation_id column in V39, wire from runtime wrapper.

Verification Plan

  • tests:
    • packages/db/src/schema.test.ts — V39 block: table shape, indexes, FK cascade behavior.
    • packages/db/src/queries/agent-conversations.test.ts (new) — round-trip insert + list + filter by phase.
    • packages/pipeline/src/pipeline/runtime.test.ts — wrapper writes prompt + response for happy path; writes prompt + error response on provider failure; never throws into the pipeline path.
    • apps/desktop/src/renderer/features/issue-detail/ConversationsTab.test.tsx — renders phase groups, filter chips toggle, search filters rows, copy-as-markdown writes correct text to clipboard.
  • manual:
    • Run a full pipeline cycle. Open Conversations tab. Verify all phases present in time order.
    • Run with revisionCount=1. Verify revision turns appear with correct round number.
    • Search "TypeError" — only turns containing it remain.
    • Filter to "Execute" only — only execution turns visible.
    • Click "Copy whole thread as markdown" — paste into a scratch file, confirm full transcript with speaker headers.
    • Quit the app mid-pipeline (Cmd+Q during an active phase). Reopen, confirm partial conversation rows from before the quit are present.
    • sqlite3 ~/.shipcode/db ".schema agent_conversations" returns the migration shape.

Risks & Open Questions

  • Content size blowup on long executor runs. A multi-hour executor may produce a multi-megabyte response. SQLite handles it but the UI must virtualize. Decision: default truncate at 30 lines with "Show full" toggle; full content stays in DB.
  • Wrapper failures must not break the pipeline. Any DB write error in the wrapper is logged + swallowed. Pipeline phases continue. Verify with a fault-injection test.
  • Token + cost data availability. OpenRouter responses include token counts; claude/codex CLIs do not always. v1 stores nullable tokens/cost; UI shows "—" when missing.
  • Conversations tab vs Plan History tab overlap. PlanHistoryTab shows plan versions. Conversations tab shows individual prompt+response turns. Both stay; PlanHistoryTab links to the corresponding conversation row when possible (link out, no merge).
  • Schema FK cascade on thread deletion. ON DELETE CASCADE means deleting a thread removes its conversation history. Acceptable since threads represent a single pipeline run; bulk thread deletion is rare.
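
Two of the decisions above (30-line truncation and the "—" placeholder for missing token or cost data) can be expressed as small display helpers. This is a sketch only: `previewContent`, `formatUsage`, the output layout, and the four-decimal cost format are assumptions, not part of the PRD.

```typescript
const PREVIEW_LINES = 30; // truncation threshold from the decision above

// Return the first `maxLines` lines and whether anything was cut off.
// The full content stays in the database; this only affects display.
function previewContent(
  content: string,
  maxLines: number = PREVIEW_LINES,
): { text: string; truncated: boolean } {
  const lines = content.split("\n");
  if (lines.length <= maxLines) {
    return { text: content, truncated: false };
  }
  return { text: lines.slice(0, maxLines).join("\n"), truncated: true };
}

// Format the token + cost row, showing "—" for any missing value.
function formatUsage(
  tokensIn: number | null,
  tokensOut: number | null,
  costUsd: number | null,
): string {
  const show = (n: number | null): string => (n === null ? "—" : String(n));
  const cost = costUsd === null ? "—" : `$${costUsd.toFixed(4)}`;
  return `${show(tokensIn)} in / ${show(tokensOut)} out / ${cost}`;
}
```

The `truncated` flag is what would drive the "Show full" toggle, so the component never needs to count lines itself.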
