
Pipeline Design 178

ezigus edited this page Mar 17, 2026 · 3 revisions



Design: Pipeline cost forecast and budget gate with early warning

Context

Shipwright pipelines consume budget via LLM token usage across 12 stages. Operators need visibility into expected costs before committing to a run, and guardrails to prevent budget exhaustion. The core forecasting, budget gate, variance tracking, and dashboard integration were implemented in commit abd44a8. Four gaps remain:

  1. No early warning — the gate is binary (block or pass); no warning when a run will consume 50–100% of remaining budget.
  2. No confidence range — forecast shows a point estimate ($X.XX) but not low/high bounds reflecting data quality.
  3. No estimated_cost on queued dashboard items — the QueueItem type has the field but daemon dispatch doesn't populate it.
  4. cost.forecast event missing data_points — reduces observability.

Constraints: Bash 3.2 compatibility, set -euo pipefail, awk for floating-point, no new dependencies. Shell-first architecture — dashboard is read-only over events.jsonl.

Decision

Enhance the existing implementation in place across 3 shell scripts and 1 TypeScript file, with tests added to the existing scripts/sw-cost-test.sh. No new files for core logic.

Confidence Range

Add low_usd and high_usd to the cost_forecast() JSON output, computed from total_usd and a confidence-dependent spread:

| Confidence | Spread | Example ($50 point estimate) |
|------------|--------|------------------------------|
| high       | ±15%   | $42.50 – $57.50              |
| medium     | ±30%   | $35.00 – $65.00              |
| low        | ±50%   | $25.00 – $75.00              |

cost_forecast_display() shows the range for medium/low confidence: $35.00 - $65.00 (medium confidence, 8 runs). High confidence keeps the point estimate since the range is narrow enough to be noise.
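
The spread arithmetic above can be sketched in Bash 3.2 with awk, in line with the stated constraints. This is a minimal illustration, not the actual cost_forecast() code: the helper name forecast_range and its "low high" output format are assumptions, and treating an unknown confidence label as low is a choice made here for safety.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper (not part of sw-cost.sh): prints "low high" for a
# point estimate and a confidence label.
# Spreads follow the table above: high ±15%, medium ±30%, low ±50%.
forecast_range() {
    local total="$1" confidence="$2" spread
    case "$confidence" in
        high)   spread="0.15" ;;
        medium) spread="0.30" ;;
        *)      spread="0.50" ;;   # assumption: unknown labels are treated as low
    esac
    # awk handles the floating-point math; bash only does integers.
    awk -v t="$total" -v s="$spread" 'BEGIN {
        low = t * (1 - s)
        high = t * (1 + s)
        if (low < 0) low = 0
        printf "%.2f %.2f\n", low, high
    }'
}

forecast_range 50 high     # 42.50 57.50
forecast_range 50 medium   # 35.00 65.00
forecast_range 50 low      # 25.00 75.00
```

Keeping the spread lookup in a single case statement means the percentages can be retuned in one place if the heuristic changes.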

Early Warning

After the existing "exceeds budget" block in pipeline_start(), add a consumption-percentage check:

if forecast > 50% of remaining (but ≤ remaining):
    warn "Forecast $X will consume Y% of remaining budget $Z"
    emit_event "cost.budget_high_usage" ...

Single warning per pipeline start — no repeated noise. Does not block, only informs.
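
A runnable sketch of the check above, under several assumptions: warn and emit_event are replaced by stand-ins so the snippet runs on its own, the key=value argument style for emit_event is a guess at the real calling convention, and check_budget_high_usage is a hypothetical name. A non-numeric remaining budget (for example "unlimited") is treated as "no warning".

```shell
#!/usr/bin/env bash
set -euo pipefail

# Stand-ins for the pipeline's existing helpers, so the sketch runs alone.
warn() { echo "WARN: $*" >&2; }
emit_event() { echo "EVENT: $*"; }

# Hypothetical helper: warns when the forecast is more than 50% of the
# remaining budget but does not exceed it. Returns 0 if it warned, 1 otherwise.
check_budget_high_usage() {
    local forecast="$1" remaining="$2" pct
    # No numeric budget (empty, "unlimited", ...) means nothing to compare.
    case "$remaining" in
        ''|*[!0-9.]*) return 1 ;;
    esac
    # awk prints the percentage only when the ratio falls in (0.5, 1.0].
    pct=$(awk -v f="$forecast" -v r="$remaining" 'BEGIN {
        if (r <= 0) exit 1
        ratio = f / r
        if (ratio > 0.5 && ratio <= 1.0) { printf "%.1f", ratio * 100; exit 0 }
        exit 1
    }') || return 1
    warn "Forecast \$${forecast} will consume ${pct}% of remaining budget \$${remaining}"
    emit_event "cost.budget_high_usage" \
        "forecast_usd=${forecast}" "remaining_usd=${remaining}" "consumption_pct=${pct}"
}

# Never blocks: a non-zero return only means "no warning was needed".
check_budget_high_usage 36.25 50 || true
check_budget_high_usage 10.00 50 || true
```

Because the function returns non-zero when no warning applies, callers running under set -e need the `|| true` (or an if) so the pipeline proceeds either way.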

Dashboard Queue Enrichment

In daemon-dispatch.sh, after the existing pre-spawn budget check, write forecast_usd to the job's metadata file. server.ts reads this when building queue items for /api/status.
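
One way the metadata write could look is sketched below. The file name (.forecast inside the job directory), the key=value layout, and the helper name write_job_forecast are all assumptions made for illustration; the actual metadata format is not specified here. The sketch also assumes a failed write should not stop dispatch.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: records the forecast next to the job so a reader
# (for example the dashboard) can pick it up later.
# File name and key=value layout are assumptions, not the real format.
write_job_forecast() {
    local job_dir="$1" forecast_usd="$2" tmp
    [ -d "$job_dir" ] || return 1
    tmp="${job_dir}/.forecast.tmp.$$"
    # Write to a temp file, then rename, so a reader never sees a partial file.
    printf 'forecast_usd=%s\n' "$forecast_usd" > "$tmp" || return 1
    mv "$tmp" "${job_dir}/.forecast"
}

# Demo in a throwaway directory; a failed write is logged and dispatch continues.
job_dir=$(mktemp -d)
write_job_forecast "$job_dir" "12.34" || echo "forecast write skipped" >&2
cat "${job_dir}/.forecast"
```

The write-then-rename step matters because the dashboard may poll while the daemon is dispatching.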

Event Completeness

Add data_points=$(echo "$FORECAST_JSON" | jq -r '.data_points') to the existing emit_event "cost.forecast" call.
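
Under set -euo pipefail, a failing jq inside a command substitution can abort the script, so the extraction may benefit from a guarded form. The sketch below is one possible shape; extract_data_points is a hypothetical helper, and defaulting to 0 on missing or malformed input is an assumption rather than documented behavior.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: pulls data_points out of the forecast JSON and
# falls back to 0 if jq fails or the field is missing or non-numeric.
extract_data_points() {
    local json="$1" value
    # '// 0' is jq's alternative operator: use 0 when the field is null or absent.
    value=$(printf '%s' "$json" | jq -r '.data_points // 0' 2>/dev/null) || value=0
    case "$value" in
        ''|*[!0-9]*) value=0 ;;   # reject anything that is not a plain integer
    esac
    printf '%s\n' "$value"
}

FORECAST_JSON='{"total_usd":"50.00","confidence":"medium","data_points":8}'
data_points=$(extract_data_points "$FORECAST_JSON")
echo "data_points=${data_points}"
```

The resulting value can then be passed along with the other fields of the cost.forecast event.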

Component Diagram

┌─────────────────────────────────────────────────────┐
│                   CLI / Pipeline                     │
│  sw-pipeline.sh                                      │
│  ┌──────────────┐  ┌──────────────┐  ┌────────────┐ │
│  │ Budget Gate   │  │ Early Warning│  │ Variance   │ │
│  │ (block/force) │  │ (warn >50%) │  │ Recording  │ │
│  └──────┬───────┘  └──────┬───────┘  └─────┬──────┘ │
│         │                 │                │         │
└─────────┼─────────────────┼────────────────┼─────────┘
          │ calls           │ calls          │ calls
          ▼                 ▼                ▼
┌─────────────────────────────────────────────────────┐
│                 Cost Engine                           │
│  sw-cost.sh                                          │
│  ┌──────────────┐  ┌──────────────┐  ┌────────────┐ │
│  │cost_forecast()│  │ _display()   │  │_variance() │ │
│  │ +low/high_usd│  │ +range fmt   │  │            │ │
│  └──────┬───────┘  └──────────────┘  └────────────┘ │
│         │ reads                                      │
│         ▼                                            │
│  ┌────────────────┐  ┌───────────────────────┐       │
│  │ Pricing rates  │  │ events.jsonl (history) │       │
│  └────────────────┘  └───────────────────────┘       │
└─────────────────────────────────────────────────────┘

┌──────────────────┐          ┌──────────────────────┐
│ daemon-dispatch  │──writes──▶ job metadata file     │
│ (forecast_usd)   │          │ (worktree/.forecast)  │
└──────────────────┘          └──────────┬───────────┘
                                         │ reads
                              ┌──────────▼───────────┐
                              │ dashboard/server.ts   │
                              │ /api/costs/forecast   │
                              │ /api/status (queue)   │
                              └──────────────────────┘

Interface Contracts

// cost_forecast() JSON output (bash → stdout)
interface CostForecast {
  total_usd: string;        // "50.00"
  low_usd: string;          // "35.00" — NEW
  high_usd: string;         // "65.00" — NEW
  stages: Array<{
    id: string;
    model: string;
    est_duration_s: number;
    est_tokens_in: number;
    est_tokens_out: number;
    est_cost: string;
  }>;
  confidence: "high" | "medium" | "low";
  data_points: number;
  complexity_multiplier: string;
}

// emit_event "cost.forecast" fields
interface CostForecastEvent {
  type: "cost.forecast";
  forecast_usd: string;
  template: string;
  confidence: string;
  issue: string;
  data_points: string;     // NEW
}

// emit_event "cost.budget_high_usage" fields — NEW
interface BudgetHighUsageEvent {
  type: "cost.budget_high_usage";
  forecast_usd: string;
  remaining_usd: string;
  consumption_pct: string;  // e.g. "72.5"
  template: string;
  issue: string;
}

// GET /api/costs/forecast response (unchanged structure, data richer)
interface ForecastResponse {
  recent_forecasts: Array<{
    issue: string;
    template: string;
    forecast_usd: number;
    confidence: string;
    ts: string;
  }>;
  variance_history: Array<{
    forecast_usd: number;
    actual_usd: number;
    variance_pct: number;
    template: string;
    ts: string;
  }>;
}

// QueueItem in /api/status (existing optional field, now populated)
interface QueueItem {
  // ... existing fields ...
  estimated_cost?: number;  // populated from job metadata
}

Data Flow

  1. Pre-start forecast: pipeline_start() → cost_forecast(template_config, complexity) → reads events.jsonl history + template stages → returns JSON with total_usd, low_usd, high_usd, confidence
  2. Display: cost_forecast_display(json) → renders table with range to stdout
  3. Budget gate: compares total_usd vs cost_remaining_budget() → blocks or warns
  4. Early warning: if total_usd / remaining > 0.5 but ≤ 1.0 → warn() + emit_event "cost.budget_high_usage"
  5. Variance: on pipeline completion → cost_record_variance(forecast, actual, template, issue) → emits cost.forecast_variance event
  6. Dashboard: server.ts reads events.jsonl → filters cost.forecast and cost.forecast_variance → returns to frontend
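
For step 5, the variance percentage carried by the cost.forecast_variance event can be computed with awk. The formula below, (actual − forecast) / forecast × 100 with positive values meaning the run cost more than forecast, is an assumption; the sign convention used by cost_record_variance() is not stated here. The helper name variance_pct is hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: signed variance of actual vs forecast, in percent.
# Assumed formula: (actual - forecast) / forecast * 100; positive = over forecast.
variance_pct() {
    local forecast="$1" actual="$2"
    awk -v f="$forecast" -v a="$actual" 'BEGIN {
        if (f <= 0) exit 1          # undefined without a positive forecast
        printf "%.1f\n", (a - f) / f * 100
    }'
}

variance_pct 50.00 62.50   # 25.0
variance_pct 50.00 40.00   # -20.0
```

A zero forecast makes the function return non-zero rather than print a misleading number, so callers can skip the event in that case.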

Error Boundaries

| Component | Error | Handling |
|-----------|-------|----------|
| cost_forecast() | Missing template file | Returns empty string to stderr, caller skips forecast (existing) |
| cost_forecast() | No events.jsonl | Uses default durations, confidence="low" (existing) |
| cost_forecast() | jq parse failure | ` |
| Budget gate | cost_remaining_budget fails | Falls back to "unlimited", skips gate (existing) |
| Early warning | awk error | Non-fatal: wrapped in conditional, pipeline proceeds |
| Dashboard /api/costs/forecast | events.jsonl unreadable | Returns 500 (existing) |
| daemon-dispatch | Forecast write failure | ` |
Alternatives Considered

  1. Statistical confidence intervals (std dev) — Pros: mathematically rigorous, self-calibrating / Cons: needs ≥20 data points per template to be meaningful, complex bash math, overkill for current data volumes. Deferred — can iterate when more history exists; the percentage-spread approach is a pragmatic first step.

  2. Node.js forecast module — Pros: proper floating point, easier unit testing, natural dashboard integration / Cons: breaks shell-first convention, requires Node at pipeline start time, two systems to maintain. Rejected — inconsistent with codebase architecture.

  3. Do nothing — Pros: zero risk / Cons: misses acceptance criteria for early warning and confidence range. Rejected — the gaps are small but material for operator experience.

Implementation Plan

  • Files to create: None
  • Files to modify:
    • scripts/sw-cost.sh — add low_usd/high_usd to cost_forecast() output; update cost_forecast_display() for range format
    • scripts/sw-pipeline.sh — add early warning check after budget gate; add data_points to cost.forecast event
    • scripts/lib/daemon-dispatch.sh — write forecast_usd to job metadata
    • dashboard/server.ts — include estimated_cost in queue item response
    • scripts/sw-cost-test.sh — tests for confidence range and early warning
  • Dependencies: None (no new dependencies)
  • Risk areas:
    • daemon-dispatch.sh is critical path for daemon — change is additive (write one metadata field), low risk
    • awk floating-point precision — acceptable at 4 decimal places for cost estimates
    • Confidence spread percentages are heuristic, not statistical — clearly labeled as such

Endpoint Specification

GET /api/costs/forecast

  • Method: GET
  • Path: /api/costs/forecast
  • Query params: period (integer, days, default 30)
  • Response (200): { recent_forecasts: [...], variance_history: [...] } (see interface above)
  • Response (200, empty): { recent_forecasts: [], variance_history: [] } when no data exists
  • Response (500): Internal error reading events.jsonl
  • Rate limiting: N/A — local developer tool
  • Versioning: N/A — internal API, no external consumers
  • Auth: None required (local dashboard)

Error Codes

| Code | Condition |
|------|-----------|
| 200  | Success (may have empty arrays) |
| 500  | events.jsonl read failure |

Validation Criteria

  • cost_forecast() output JSON includes low_usd and high_usd fields
  • Low confidence range is wider than high confidence range (±50% vs ±15%)
  • cost_forecast_display() shows range format for medium/low confidence
  • Pipeline emits warning (not block) when forecast consumes 50–100% of remaining budget
  • Pipeline still blocks when forecast exceeds 100% of remaining budget (regression)
  • cost.forecast event includes data_points field
  • --force-start continues to bypass budget gate (regression)
  • Dashboard queue items include estimated_cost when available
  • npm test passes — no regressions
  • All new behavior tested in sw-cost-test.sh
