
Pipeline Design 26

Seth Ford edited this page Feb 13, 2026 · 1 revision



Design: Pipeline replay and DVR — rewatch any pipeline run frame-by-frame

Context

Shipwright pipelines run 12 stages (intake → plan → design → build → test → review → compound_quality → pr → merge → deploy → validate → monitor) autonomously. Once a pipeline completes, there is no way to understand what happened beyond reading raw events.jsonl entries or examining leftover artifacts. Operators need to:

  1. Debug failed pipelines by understanding the sequence of events leading to failure
  2. Review completed pipelines to assess agent decision quality
  3. Share pipeline run summaries with stakeholders who weren't watching live

Key constraint: The event infrastructure already exists. events.jsonl captures 40+ event types with timestamps, issue numbers, stages, durations, and custom fields (scripts/sw-pipeline.sh:160-179). The dashboard already has readEvents() (server.ts:604-623), getPipelineDetail() (server.ts:898-1011), getTimeline() (server.ts:1329-1411), pipeline SVG rendering (app.js:243-374), and a 9-tab navigation system. No new data collection mechanism is needed — this is pure event-sourcing reconstruction.

Critical route ordering issue: The existing /api/pipeline/{issue} route uses pathname.startsWith("/api/pipeline/") matching (server.ts:2221), which will swallow any sub-routes like /api/pipeline/{issue}/replay. New sub-routes must be registered before the catch-all.
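One way to keep the sub-routes ahead of the catch-all is to classify the path before dispatching. The sketch below is illustrative only: `matchPipelineRoute` and its return type are hypothetical names, and the real dispatch code at server.ts:2221 is not shown in this document.

```typescript
// Hypothetical route classifier (not the actual server.ts code).
type PipelineRoute =
  | { kind: "replay" | "events" | "export"; issue: number }
  | { kind: "detail"; issue: number }
  | null;

const SUB_ROUTES = ["replay", "events", "export"] as const;

function matchPipelineRoute(pathname: string): PipelineRoute {
  if (!pathname.startsWith("/api/pipeline/")) return null;

  // "/api/pipeline/42/replay".split("/") gives ["", "api", "pipeline", "42", "replay"]
  const parts = pathname.split("/");
  const issue = Number(parts[3]);
  if (!Number.isInteger(issue) || issue <= 0) return null;

  // Sub-routes are checked first so the catch-all cannot swallow them.
  if (parts.length === 5) {
    const sub = SUB_ROUTES.find((s) => s === parts[4]);
    return sub ? { kind: sub, issue } : null;
  }

  // Only a bare /api/pipeline/{issue} path falls through to the detail route.
  return parts.length === 4 ? { kind: "detail", issue } : null;
}
```

With this shape, `/api/pipeline/42/replay` resolves to the replay handler and `/api/pipeline/42` still resolves to the existing detail handler; unknown sub-paths return `null` rather than falling into the detail route.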

Performance boundary: readEvents() reads the entire events.jsonl into memory on every call (server.ts:604-623). For a daemon that has been running for weeks, this file may contain 10K+ events. Replay endpoints filter by issue, but only after the whole file has been parsed, so the full parse cost is paid on every request.

Decision

Pattern: Event-sourced frame reconstruction

Implement replay as a server-side event-sourcing walk that builds an ordered sequence of ReplayFrame objects from filtered events.jsonl entries. Each frame represents a point-in-time snapshot of pipeline state. The client scrubs through frames, not raw events.

Data flow

events.jsonl → readEvents() → filter by issue → walk events → build ReplayFrame[] → API response
                                                                    ↓
Browser: fetch /api/pipeline/{issue}/replay → ReplayFrame[] → scrubber + frame viewer + narrative

TypeScript interfaces (added to dashboard/server.ts)

interface ReplayFrame {
  index: number;              // 0-based frame position
  ts: string;                 // ISO timestamp of the event that produced this frame
  ts_epoch: number;           // Unix epoch (for scrubber positioning)
  event_type: string;         // Source event type (e.g. "stage.completed")
  stage: string;              // Current active stage
  stages_completed: string[]; // All stages completed so far
  iteration: number;          // Build loop iteration count (0 if not in build)
  test_status: "unknown" | "passing" | "failing";
  result: string;             // "" until pipeline.completed, then "success"/"failed"
  activity: string;           // Human-readable description of what happened
  details: Record<string, unknown>; // Raw event fields for drill-down
  is_decision: boolean;       // True for key decision events (retries, skips, failures)
}

interface PipelineNarrative {
  summary: string;            // "Pipeline ran 8 stages in 12m 15s..."
  key_decisions: Array<{
    frame_index: number;      // Links to scrubber position
    ts: string;
    description: string;      // "Stage 'test' failed — retried with escalation"
  }>;
  stage_breakdown: Array<{
    stage: string;
    duration_s: number;
    status: "complete" | "failed" | "skipped";
    events_count: number;
  }>;
}

interface PipelineReplay {
  issue: number;
  title: string;
  branch: string;
  total_duration_s: number;
  frames: ReplayFrame[];
  narrative: PipelineNarrative;
}

API endpoints (3 new routes, registered before the /api/pipeline/{issue} catch-all at server.ts:2221)

Route | Response | Purpose
--- | --- | ---
GET /api/pipeline/{issue}/replay | PipelineReplay | Full replay data (frames + narrative)
GET /api/pipeline/{issue}/events | DaemonEvent[] | Raw filtered events for this issue
GET /api/pipeline/{issue}/export | text/markdown | Downloadable markdown report
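For the export route, the response needs a markdown body plus download headers. The following is a rough sketch under stated assumptions: `exportHeaders`, `renderReplayMarkdown`, and the `ExportInput` shape are hypothetical, and the report layout is a guess; only the `pipeline-{issue}-replay.md` filename and the `Content-Disposition: attachment` header come from this design.

```typescript
// Hypothetical input: a subset of the PipelineReplay fields the report uses.
interface StageRow {
  stage: string;
  duration_s: number;
  status: "complete" | "failed" | "skipped";
  events_count: number;
}

interface ExportInput {
  issue: number;
  title: string;
  summary: string;
  stage_breakdown: StageRow[];
}

// Headers that make the browser download the report instead of rendering it.
function exportHeaders(issue: number): Record<string, string> {
  return {
    "Content-Type": "text/markdown; charset=utf-8",
    "Content-Disposition": `attachment; filename="pipeline-${issue}-replay.md"`,
  };
}

// Assumed report layout: a title, the narrative summary, then a stage table.
function renderReplayMarkdown(input: ExportInput): string {
  const rows = input.stage_breakdown.map(
    (r) => `| ${r.stage} | ${r.status} | ${r.duration_s} | ${r.events_count} |`,
  );
  const lines = [
    `# Pipeline replay: #${input.issue} ${input.title}`,
    "",
    input.summary,
    "",
    "| Stage | Status | Duration (s) | Events |",
    "| --- | --- | --- | --- |",
    ...rows,
  ];
  return lines.join("\n") + "\n";
}
```

A handler could pass both results to the standard `Response` constructor, for example `new Response(renderReplayMarkdown(data), { headers: exportHeaders(data.issue) })`.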

Frame construction algorithm (getPipelineReplay())

  1. Filter events to issue, sort by ts_epoch ascending
  2. Initialize state: stage="", stages_completed=[], iteration=0, test_status="unknown"
  3. Walk each event — update state based on type, emit ReplayFrame with snapshot
  4. Mark decision points for: retry.*, stage.failed, intelligence.*, pipeline.quality_gate_failed
  5. Generate narrative from accumulated frames
  6. Return PipelineReplay envelope
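The walk in steps 1 to 4 can be sketched as a single pass that carries mutable state and snapshots it per event. This is a simplified illustration, not the final getPipelineReplay(): `buildFrames` is a hypothetical name, the `DaemonEvent` field names other than the index signature are assumptions, the `iteration` and `result` event fields are guesses, and test_status is left at its initial value because the test-related event types are not specified here.

```typescript
// Assumed event shape; only the index signature is confirmed by this design.
interface DaemonEvent {
  ts: string;
  ts_epoch: number;
  type: string;
  issue?: number;
  stage?: string;
  [key: string]: unknown;
}

interface ReplayFrame {
  index: number;
  ts: string;
  ts_epoch: number;
  event_type: string;
  stage: string;
  stages_completed: string[];
  iteration: number;
  test_status: "unknown" | "passing" | "failing";
  result: string;
  activity: string;
  details: Record<string, unknown>;
  is_decision: boolean;
}

// Decision rules from step 4: retry.*, stage.failed, intelligence.*, pipeline.quality_gate_failed
function isDecision(type: string): boolean {
  const exact = ["stage.failed", "pipeline.quality_gate_failed"];
  const prefixes = ["retry.", "intelligence."];
  return exact.some((t) => t === type) || prefixes.some((p) => type.startsWith(p));
}

function buildFrames(events: DaemonEvent[], issue: number): ReplayFrame[] {
  // Step 1: filter to the issue and sort oldest first (filter returns a copy).
  const own = events.filter((e) => e.issue === issue).sort((a, b) => a.ts_epoch - b.ts_epoch);

  // Step 2: initial state.
  let stage = "";
  let iteration = 0;
  let result = "";
  const completed: string[] = [];

  // Step 3: update state per event, then emit a snapshot.
  return own.map((e, index): ReplayFrame => {
    if (typeof e.stage === "string" && e.stage !== "") stage = e.stage;

    const iter = e["iteration"]; // assumed optional numeric field
    if (typeof iter === "number") iteration = iter;

    if (e.type === "stage.completed" && stage !== "" && completed.indexOf(stage) === -1) {
      completed.push(stage);
    }

    const res = e["result"]; // assumed field on pipeline.completed
    if (e.type === "pipeline.completed" && typeof res === "string") result = res;

    const act = e["activity"];
    return {
      index,
      ts: e.ts,
      ts_epoch: e.ts_epoch,
      event_type: e.type,
      stage,
      stages_completed: [...completed], // copy so later pushes do not alter this frame
      iteration,
      test_status: "unknown", // not updated in this sketch
      result,
      activity: typeof act === "string" ? act : e.type,
      details: { ...e },
      is_decision: isDecision(e.type),
    };
  });
}
```

Copying `stages_completed` into each frame is what makes every frame an independent snapshot; sharing one array would make earlier frames appear to know about stages that completed later.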

Frontend: Contextual replay view within pipeline detail panel

  • Timeline scrubber: SVG bar with color-coded stage segments (using existing STAGE_HEX from app.js:43-55), draggable handle, keyboard arrows, play/pause auto-advance at 500ms
  • Frame viewer: Stage progress bar, event details card, iteration counter, test status badge, amber-bordered decision highlights
  • Narrative panel: Summary paragraph, clickable key decisions (seeks scrubber), stage breakdown table
  • Controls: Permalink (/#replay/{issue}), export download, copy link
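The scrubber's stage segments need a width rule that keeps short stages visible. One possible approach, shown here in TypeScript for consistency with the server code even though app.js is plain JavaScript, is to reserve the 20px minimum for every stage and split only the leftover width by duration. `segmentWidths` is a hypothetical helper, not an existing function.

```typescript
// Hypothetical layout helper: returns one pixel width per stage.
// Every stage gets minWidth; the remaining width is shared in proportion
// to duration. If barWidth is smaller than stages * minWidth, the segments
// overflow the bar rather than shrinking below the minimum.
function segmentWidths(durations: number[], barWidth: number, minWidth = 20): number[] {
  const count = durations.length;
  if (count === 0) return [];

  const spare = Math.max(0, barWidth - count * minWidth);
  const total = durations.reduce((sum, d) => sum + Math.max(0, d), 0);

  return durations.map((d) => {
    // With no measurable duration at all, share the spare width evenly.
    const share = total > 0 ? (Math.max(0, d) / total) * spare : spare / count;
    return minWidth + share;
  });
}
```

Reserving the minimum up front keeps the segment widths summing to the bar width whenever the bar is wide enough, which a simple `Math.max(20, proportionalWidth)` clamp would not.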

Event enrichment (scripts/sw-pipeline.sh)

Add optional activity field to key emit_event calls — backward-compatible via [key: string]: unknown in DaemonEvent (server.ts:70). getPipelineReplay() generates fallback activity text from event.type when activity is absent.
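A minimal sketch of the fallback follows, assuming the description is derived purely from the event type string and an optional stage. `activityFor` is a hypothetical name and the exact wording of the generated text is a guess.

```typescript
// Hypothetical fallback: prefer the emitted activity, otherwise humanize the type.
function activityFor(ev: { type: string; activity?: unknown; stage?: unknown }): string {
  // Newer events carry an explicit activity string; use it as-is.
  if (typeof ev.activity === "string" && ev.activity.trim() !== "") {
    return ev.activity;
  }

  // "pipeline.quality_gate_failed" becomes "Pipeline quality gate failed".
  const words = ev.type.replace(/[._]+/g, " ").trim();
  const text = words === "" ? "Unknown event" : words.charAt(0).toUpperCase() + words.slice(1);

  // Append the stage when the event has one, e.g. "Stage completed (test)".
  return typeof ev.stage === "string" && ev.stage !== "" ? `${text} (${ev.stage})` : text;
}
```

Because the fallback depends only on fields older events already have, replays of runs recorded before the enrichment still get a readable activity line.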

Error handling

  • No events → 200 with empty frames + "No events found" narrative
  • Running pipeline → partial frames + "still running" narrative
  • Malformed events → already skipped by readEvents() (server.ts:615-617)
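The first two cases can be decided from the frame list alone. The sketch below assumes that a run counts as finished once a pipeline.completed frame exists; `summaryFor` is a hypothetical helper and the exact sentences are placeholders apart from the "No events found" and "still running" phrases.

```typescript
// Hypothetical summary selector covering the empty and still-running cases.
function summaryFor(frames: Array<{ event_type: string; result: string }>): string {
  // No events for this issue: the route still answers 200 with this text.
  if (frames.length === 0) return "No events found for this issue.";

  // Assumption: a pipeline.completed frame marks the end of the run.
  const done = frames.filter((f) => f.event_type === "pipeline.completed").pop();
  if (!done) {
    return `Pipeline is still running (${frames.length} events so far).`;
  }

  return `Pipeline finished with result "${done.result}" after ${frames.length} events.`;
}
```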

Alternatives Considered

  1. WebSocket-based live streaming replay — Pros: immersive real-time feel, native for running pipelines / Cons: complex session state, scrubbing backward requires re-computation, adds WebSocket protocol complexity. Rejected: REST is simpler, cacheable, and shareable via permalink.

  2. Git-based state reconstruction — Pros: shows code changes per frame / Cons: requires worktree to exist (cleaned up after merge), git ops expensive for dashboard, commits don't align 1:1 with stages. Rejected: events.jsonl already provides stage-level granularity; git data can be added later as enrichment.

  3. Per-issue replay event store — Pros: no global file scan, self-contained / Cons: duplicates event infrastructure, requires dual-emit, adds cleanup concerns. Rejected: filtering by issue on the existing file is sufficient; if >100K events become a problem, the fix is indexing the global file.

Implementation Plan

  • Files to create: None
  • Files to modify:
    • dashboard/server.ts — interfaces, getPipelineReplay(), generateNarrative(), exportReplayMarkdown(), 3 API routes (~235 lines)
    • dashboard/public/app.js — replay tab, scrubber, frame viewer, narrative panel, permalink/export handlers (~300 lines)
    • dashboard/public/styles.css — scrubber/viewer/narrative styles (~80 lines)
    • dashboard/public/index.html — replay tab panel container (~10 lines)
    • scripts/sw-pipeline.sh — add activity field to ~8 emit_event calls
    • scripts/sw-pipeline-test.sh — replay test cases (~60 lines)
  • Dependencies: None (Bun APIs, native DOM, SVG only)
  • Risk areas:
    1. Route ordering (server.ts:2221): the startsWith("/api/pipeline/") catch-all will swallow sub-routes if they are registered after it. Mitigation: match sub-routes first by checking the segment count of pathname.split("/"), which is 5 for /api/pipeline/42/replay vs 4 for /api/pipeline/42 (the leading empty string counts as a segment).
    2. Repeated readEvents() parsing: Full file read per request. Acceptable — same cost as existing /api/status and /api/timeline. Future: TTL cache.
    3. Scrubber precision: Short stages invisible next to long ones. Mitigation: minimum 20px width per stage.
    4. Backwards compatibility: Older events lack activity field. Mitigation: fallback activity text generated from event.type.
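The TTL cache listed as a future option for risk 2 could be as small as a memoizing wrapper around the loader. This is only a sketch: `ttlCached` is a hypothetical helper, the injectable clock exists purely to make it testable, and any TTL value would be a tuning choice this design does not make.

```typescript
// Hypothetical TTL wrapper: re-runs `load` only when the cached value is older than ttlMs.
function ttlCached<T>(load: () => T, ttlMs: number, now: () => number = Date.now): () => T {
  let value: T | undefined;
  let loadedAt = 0;
  let hasValue = false;

  return () => {
    const t = now();
    if (!hasValue || t - loadedAt >= ttlMs) {
      value = load();
      loadedAt = t;
      hasValue = true;
    }
    return value as T;
  };
}
```

Wrapping the existing reader, for example `const cachedEvents = ttlCached(() => readEvents(), 2000)`, would let several endpoints hit within the same window share one parse, at the cost of serving events that are up to one TTL stale.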

Validation Criteria

  • GET /api/pipeline/{issue}/replay returns valid PipelineReplay JSON with frame count matching issue-specific event count
  • GET /api/pipeline/{issue}/events returns only events for the specified issue, sorted by timestamp
  • GET /api/pipeline/{issue}/export returns markdown with Content-Disposition: attachment header
  • Route ordering: /api/pipeline/42/replay does NOT return PipelineDetail JSON (catch-all bypass verified)
  • Frames reconstruct correct stage progression: first frame has empty stages_completed, final frame includes all completed stages
  • is_decision flag correctly marks retry, failure, skip, and quality gate events
  • Narrative key_decisions[].frame_index values point to actual is_decision frames
  • Narrative stage_breakdown durations match stage.completed event duration_s values
  • Scrubber renders all stages with minimum 20px width
  • Keyboard left/right moves by 1 frame; play/pause auto-advances at 500ms
  • Permalink /#replay/{issue} loads replay view on fresh page load
  • Export triggers download of pipeline-{issue}-replay.md
  • No-events case returns 200 with empty frames and "No events found" narrative
  • Running pipeline returns partial frames with "still running" narrative
  • Events without activity field get fallback descriptions
  • All 22 existing test suites pass (npm test)
  • New replay tests pass: frame count, stage order, narrative keywords, markdown structure
  • Bash 3.2 compatible in all sw-pipeline.sh changes
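Some of these criteria are structural invariants that a test can check directly on the response. As one example, the following sketch checks that every key_decisions[].frame_index points at a frame flagged is_decision; `danglingDecisions` is a hypothetical test helper, not part of the planned server code.

```typescript
// Minimal shapes: only the fields this check needs.
interface FrameLike {
  index: number;
  is_decision: boolean;
}

interface DecisionLike {
  frame_index: number;
}

// Returns the frame_index values that do not refer to an is_decision frame.
// An empty result means the invariant holds.
function danglingDecisions(frames: FrameLike[], decisions: DecisionLike[]): number[] {
  const decisionFrames = new Set(frames.filter((f) => f.is_decision).map((f) => f.index));
  return decisions.map((d) => d.frame_index).filter((i) => !decisionFrames.has(i));
}
```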

