Pipeline Plan 178

ezigus edited this page Mar 16, 2026 · 4 revisions

Implementation Plan: Pipeline Cost Forecast and Budget Gate

Socratic Design Refinement

Requirements Clarity

Minimum viable change: Add a cost_forecast() function to sw-cost.sh that estimates pipeline cost from template stages × model pricing × historical duration, display the forecast before pipeline_start() proceeds, and gate on budget. Emit variance events at completion.

Implicit requirements: Sensible defaults are needed when no historical data exists (cold start). The reported confidence level (low/medium/high) is only meaningful with enough historical data points. The --force-start override must bypass the gate cleanly.

Acceptance criteria (from issue): forecast display, budget gate with override, variance event, dashboard display, confidence intervals.

Design Alternatives

Approach A: Inline in sw-pipeline.sh — Add forecast logic directly into pipeline_start(). Simple but mixes concerns and makes testing harder.

Approach B: New functions in sw-cost.sh + integration point in sw-pipeline.sh — Add cost_forecast() and cost_forecast_display() to sw-cost.sh (where all cost logic lives), call from pipeline_start() after template is loaded. Clean separation, testable, follows existing patterns.

Chosen: Approach B — Minimal blast radius, follows existing code organization (cost logic in sw-cost.sh, pipeline orchestration in sw-pipeline.sh). The dashboard integration adds a new API endpoint and view component.

Risk Assessment

  • Cold-start with no history: Mitigated by hardcoded default durations per stage (based on typical pipeline runs)
  • Inaccurate forecasts blocking legitimate work: Mitigated by --force-start override and configurable gate
  • Breaking existing budget checks: Low risk — new functions, existing cost_check_budget() unchanged
  • Dashboard changes: Isolated to a new renderCostForecast() section in metrics.ts and a new API handler; the output of existing views is unchanged

Simplicity Check

  • 4 source files modified (sw-cost.sh, sw-pipeline.sh, dashboard metrics.ts and api.ts)
  • 1 test file modified (sw-cost-test.sh)
  • Reuses existing cost_calculate(), cost_check_budget(), cost_remaining_budget(), model pricing, and event emission infrastructure
  • No new dependencies

Architecture Decision Record

Component Diagram

┌─────────────────────┐     ┌───────────────────────┐
│   sw-pipeline.sh    │     │    sw-cost.sh         │
│                     │     │                       │
│  pipeline_start()   │────▶│  cost_forecast()      │
│    ├─ load config   │     │    ├─ count stages    │
│    ├─ FORECAST ◀────│─────│    ├─ get history     │
│    ├─ budget gate   │     │    ├─ calc per-stage  │
│    └─ run_pipeline  │     │    └─ return estimate │
│                     │     │                       │
│  pipeline complete  │────▶│  cost_record_variance │
│                     │     │    └─ emit event      │
└─────────────────────┘     └───────────────────────┘
         │                            │
         ▼                            ▼
┌─────────────────────┐     ┌───────────────────────┐
│  events.jsonl       │     │  dashboard/metrics.ts │
│  cost.forecast      │     │  renderCostForecast() │
│  cost.forecast_var  │     │  forecast card + bar  │
└─────────────────────┘     └───────────────────────┘

Interface Contracts

// cost_forecast <template_config_path> [complexity]
// Returns JSON: { total_usd, stages: [{id, model, est_duration_s, est_cost}], confidence, data_points }
// confidence: "low" (<5 data points), "medium" (5-20), "high" (>20)

// cost_forecast_display <forecast_json>
// Renders formatted forecast to stdout (table of stages, total, confidence)

// cost_record_variance <forecast_usd> <actual_usd> <template> <issue>
// Emits cost.forecast_variance event to events.jsonl

// --force-start flag on pipeline start
// Bypasses budget gate when forecast exceeds remaining budget
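The confidence thresholds in the contract above can be sketched as a small helper. This is a minimal illustration, not existing code: confidence_for_count is a hypothetical name, and the boundaries (fewer than 5 is low, 5 to 20 is medium, more than 20 is high) are taken directly from the contract comment.

```shell
# Hypothetical helper: map a historical data-point count to a confidence label.
# Thresholds follow the interface contract: <5 low, 5-20 medium, >20 high.
confidence_for_count() {
    local n="${1:-0}"
    if [ "$n" -lt 5 ]; then
        echo "low"
    elif [ "$n" -le 20 ]; then
        echo "medium"
    else
        echo "high"
    fi
}

confidence_for_count 12   # prints "medium"
```

Keeping this in one place means the CLI table and the cost.forecast event report the same label.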

Data Flow

  1. pipeline_start() → loads template config → calls cost_forecast "$PIPELINE_CONFIG" "$INTELLIGENCE_COMPLEXITY"
  2. cost_forecast() → reads template stages → looks up historical durations from events.jsonl → computes per-stage cost using cost_calculate() → returns JSON
  3. cost_forecast_display() → renders table to CLI
  4. Budget gate: compares forecast.total_usd against cost_remaining_budget() → blocks or warns
  5. On pipeline completion: cost_record_variance "$FORECAST_USD" "$total_cost" "$PIPELINE_NAME" "${ISSUE_NUMBER:-}" → emits cost.forecast_variance event

Error Boundaries

  • cost_forecast() failures are non-fatal — pipeline proceeds with a warning
  • Budget gate respects --force-start — never hard-blocks when override is present
  • Dashboard forecast display degrades gracefully when no forecast data exists

Files to Modify

Modified Files

  1. scripts/sw-cost.sh — Add cost_forecast(), cost_forecast_display(), cost_record_variance(), CLI subcommand forecast
  2. scripts/sw-pipeline.sh — Add --force-start flag parsing, call forecast before run_pipeline, record variance at completion
  3. scripts/sw-cost-test.sh — Add tests for forecast functions
  4. dashboard/src/views/metrics.ts — Add renderCostForecast() section
  5. dashboard/src/core/api.ts — Add /api/costs/forecast endpoint handler

No New Files Created

All logic fits naturally into existing files following the project's established patterns.


Implementation Steps

Step 1: Add default stage duration constants to sw-cost.sh

Add hardcoded default durations (seconds) per stage for cold-start scenarios:

# Default stage durations (seconds) — used when no historical data
_DEFAULT_STAGE_DURATIONS='{"intake":60,"plan":300,"design":300,"build":1200,"test":180,"review":300,"compound_quality":600,"audit":120,"pr":60,"merge":60,"deploy":120,"validate":60,"monitor":300}'
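If the lookup should not depend on jq, a case statement can stand in for an associative array, which Bash 3.2 lacks. The sketch below is one possible shape only: default_stage_duration is a hypothetical name, the values mirror the JSON constant above, and the 300-second fallback for unknown stages is an assumption.

```shell
# Hypothetical helper: cold-start duration (seconds) for a stage.
# Values mirror _DEFAULT_STAGE_DURATIONS; the case statement replaces an
# associative array so the code stays Bash 3.2 compatible.
default_stage_duration() {
    case "$1" in
        intake|pr|merge|validate)   echo 60 ;;
        audit|deploy)               echo 120 ;;
        test)                       echo 180 ;;
        plan|design|review|monitor) echo 300 ;;
        compound_quality)           echo 600 ;;
        build)                      echo 1200 ;;
        *)                          echo 300 ;;  # assumed fallback for unknown stages
    esac
}

default_stage_duration build   # prints 1200
```

The trade-off is that the defaults then live in two places, so only one of the two representations should be kept in the final implementation.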

Step 2: Add cost_forecast() function to sw-cost.sh

# cost_forecast <template_config_path> [complexity]
# Returns JSON with per-stage estimates, total, and confidence level
cost_forecast() {
    local template_config="$1"
    local complexity="${2:-5}"

    # 1. Read enabled stages + their model from template
    # 2. For each stage, look up historical avg duration from events.jsonl
    #    (filter stage.completed events by template, average duration_s per stage)
    # 3. Fall back to _DEFAULT_STAGE_DURATIONS if no history
    # 4. Estimate tokens from duration (heuristic: ~50 input + ~20 output tokens/sec for active stages)
    # 5. Calculate cost per stage using cost_calculate()
    # 6. Determine confidence from data point count
    # 7. Apply complexity multiplier (complexity/5 — normalized around 1.0)
    # 8. Output JSON
}

Historical lookup: query events.jsonl for stage.completed events matching the template, compute average duration_s per stage. Count data points for confidence.
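A rough sketch of the historical lookup, using awk only so it runs without jq. Everything here is an assumption to be checked against the real event format: avg_stage_duration is a hypothetical name, and the stage.completed events are assumed to be single-line JSON carrying "template", "stage" and "duration_s" fields with no spaces around the colons.

```shell
# Hypothetical helper: print "<avg_duration_s> <data_points>" for one stage.
# Assumes one JSON object per line with "type", "template", "stage" and
# "duration_s" fields (field names are assumptions, not confirmed by the plan).
avg_stage_duration() {
    local events_file="$1" template="$2" stage="$3"
    [ -f "$events_file" ] || { echo "0 0"; return 0; }
    # Only scan the most recent events to keep the read cheap.
    tail -n 1000 "$events_file" | awk -v t="$template" -v s="$stage" '
        index($0, "\"type\":\"stage.completed\"") &&
        index($0, "\"template\":\"" t "\"") &&
        index($0, "\"stage\":\"" s "\"") {
            if (match($0, /"duration_s":"?[0-9.]+/)) {
                v = substr($0, RSTART, RLENGTH)
                gsub(/[^0-9.]/, "", v)      # keep only the numeric part
                sum += v; n++
            }
        }
        END {
            if (n > 0) printf "%d %d\n", sum / n + 0.5, n
            else print "0 0"
        }'
}
```

A result of "0 0" signals that no history exists, at which point the caller would fall back to the default durations and report confidence "low".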

Token estimation heuristic: Based on typical Claude Code usage patterns:

  • Active stages (build): ~50 input + ~20 output tokens/second
  • Light stages (intake, pr, merge): ~20 input + ~10 output tokens/second
  • Review stages (review, compound_quality): ~40 input + ~30 output tokens/second
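The heuristic above, together with the complexity multiplier from the cost_forecast() outline, could look roughly like this. estimate_stage_tokens and apply_complexity are hypothetical names. The plan does not classify stages such as plan, design or test, so treating them at the active-stage rate is an assumption.

```shell
# Hypothetical helper: print "<input_tokens> <output_tokens>" for a stage
# that runs for duration_s seconds, using the per-second rates listed above.
estimate_stage_tokens() {
    local stage="$1" duration_s="$2"
    local in_rate out_rate
    case "$stage" in
        intake|pr|merge)         in_rate=20; out_rate=10 ;;  # light stages
        review|compound_quality) in_rate=40; out_rate=30 ;;  # review stages
        *)                       in_rate=50; out_rate=20 ;;  # build; assumed for unlisted stages
    esac
    echo "$((in_rate * duration_s)) $((out_rate * duration_s))"
}

# Hypothetical helper: scale a cost by complexity/5, so complexity 5 is neutral.
apply_complexity() {
    awk -v c="$1" -v k="$2" 'BEGIN { printf "%.4f", c * k / 5 }'
}

estimate_stage_tokens build 1200   # prints "60000 24000"
```

With the default 1200-second build, the estimate is 60,000 input and 24,000 output tokens before pricing is applied.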

Step 3: Add cost_forecast_display() to sw-cost.sh

Render a formatted table:

╔══════════════════════════════════════════════════════╗
║  Pipeline Cost Forecast                              ║
╠══════════════════════════════════════════════════════╣
║  Stage              Model     Duration    Est. Cost  ║
║  ─────────────────  ────────  ──────────  ─────────  ║
║  intake             haiku     1m 0s       $0.02      ║
║  plan               sonnet    5m 0s       $0.45      ║
║  build              sonnet    20m 0s      $3.60      ║
║  test               haiku     3m 0s       $0.04      ║
║  review             opus      5m 0s       $2.25      ║
║  ─────────────────  ────────  ──────────  ─────────  ║
║  TOTAL                        34m 0s      $6.36      ║
║  Confidence: medium (12 historical runs)             ║
║  Budget remaining: $43.64 / $50.00                   ║
╚══════════════════════════════════════════════════════╝
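A minimal sketch of how the Duration column and one table row might be produced. format_duration and format_forecast_row are hypothetical helper names, and the column widths (17, 8, 10, 9) are read off the mock-up above rather than taken from existing code.

```shell
# Hypothetical helper: render seconds as "<m>m <s>s", as in the Duration column.
format_duration() {
    local total="${1:-0}"
    echo "$((total / 60))m $((total % 60))s"
}

# Hypothetical helper: one table row (stage, model, duration in seconds, cost in USD).
format_forecast_row() {
    printf '║  %-17s  %-8s  %-10s  %-9s  ║\n' \
        "$1" "$2" "$(format_duration "$3")" "\$$4"
}

format_forecast_row build sonnet 1200 3.60
```

Left-aligned fixed-width fields keep the right-hand border aligned as long as every value is ASCII and fits its column.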

Step 4: Add cost_record_variance() to sw-cost.sh

# cost_record_variance <forecast_usd> <actual_usd> <template> <issue>
cost_record_variance() {
    local forecast="$1" actual="$2" template="$3" issue="${4:-}"
    local variance pct_variance
    variance=$(awk -v f="$forecast" -v a="$actual" 'BEGIN{printf "%.4f", a - f}')
    pct_variance=$(awk -v f="$forecast" -v a="$actual" 'BEGIN{if(f>0) printf "%.1f", ((a-f)/f)*100; else print "0"}')

    emit_event "cost.forecast_variance" \
        "forecast_usd=$forecast" \
        "actual_usd=$actual" \
        "variance_usd=$variance" \
        "variance_pct=$pct_variance" \
        "template=$template" \
        "issue=$issue"
}
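To see what the function emits, it can be run against a stand-in for emit_event. The stub below is an assumption made only so the example is self-contained; the project's real emit_event writes to events.jsonl and may format values differently. The inputs are the figures used in the event schema example later in this plan.

```shell
# Stand-in for the project's emit_event (assumption): print type and key=value pairs.
emit_event() {
    local type="$1"; shift
    local kv
    printf '%s' "$type"
    for kv in "$@"; do printf ' %s' "$kv"; done
    printf '\n'
}

# Same body as the function above, repeated so the example runs on its own.
cost_record_variance() {
    local forecast="$1" actual="$2" template="$3" issue="${4:-}"
    local variance pct_variance
    variance=$(awk -v f="$forecast" -v a="$actual" 'BEGIN{printf "%.4f", a - f}')
    pct_variance=$(awk -v f="$forecast" -v a="$actual" 'BEGIN{if(f>0) printf "%.1f", ((a-f)/f)*100; else print "0"}')
    emit_event "cost.forecast_variance" \
        "forecast_usd=$forecast" "actual_usd=$actual" \
        "variance_usd=$variance" "variance_pct=$pct_variance" \
        "template=$template" "issue=$issue"
}

cost_record_variance 6.36 7.12 standard 178
# prints: cost.forecast_variance forecast_usd=6.36 actual_usd=7.12 variance_usd=0.7600 variance_pct=11.9 template=standard issue=178
```

Note that variance_usd carries four decimal places because of the "%.4f" format, while variance_pct carries one.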

Step 5: Add forecast CLI subcommand to sw-cost.sh

Add to the case statement:

forecast) cost_forecast_cli "$@" ;;

This calls cost_forecast and pipes the result through cost_forecast_display for CLI usage:

shipwright cost forecast --pipeline standard

Step 6: Integrate forecast into sw-pipeline.sh — flag parsing

Add --force-start to the argument parser (alongside existing --pipeline, --goal, etc.):

FORCE_START=false
# In the argument-parsing while/case loop (getopts cannot handle long options):
--force-start) FORCE_START=true ;;

Step 7: Integrate forecast into sw-pipeline.sh — pre-start gate

After load_pipeline_config and before run_pipeline, insert:

# Cost forecast and budget gate
if [[ -f "$SCRIPT_DIR/sw-cost.sh" ]]; then
    source "$SCRIPT_DIR/sw-cost.sh"
    FORECAST_JSON=$(cost_forecast "$PIPELINE_CONFIG" "${INTELLIGENCE_COMPLEXITY:-5}" 2>/dev/null || echo "")
    if [[ -n "$FORECAST_JSON" ]]; then
        FORECAST_USD=$(echo "$FORECAST_JSON" | jq -r '.total_usd // 0')
        cost_forecast_display "$FORECAST_JSON"

        # Budget gate check
        local remaining
        remaining=$(cost_remaining_budget 2>/dev/null || echo "unlimited")
        if [[ "$remaining" != "unlimited" ]]; then
            if awk -v f="$FORECAST_USD" -v r="$remaining" 'BEGIN{exit !(f > r)}'; then
                if [[ "$FORCE_START" == "true" ]]; then
                    warn "Forecast \$${FORECAST_USD} exceeds remaining budget \$${remaining} — proceeding (--force-start)"
                else
                    error "Forecast \$${FORECAST_USD} exceeds remaining budget \$${remaining}"
                    echo -e "  Override: ${DIM}shipwright pipeline start ... --force-start${RESET}"
                    exit 1
                fi
            fi
        fi

        emit_event "cost.forecast" \
            "forecast_usd=$FORECAST_USD" \
            "template=$PIPELINE_NAME" \
            "confidence=$(echo "$FORECAST_JSON" | jq -r '.confidence')" \
            "issue=${ISSUE_NUMBER:-0}"
    else
        warn "Cost forecast unavailable; proceeding without a budget gate check"
    fi
fi
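For the tests in Step 11 it may help to isolate the gate decision from the surrounding display and exit handling. The sketch below does that under stated assumptions: budget_gate_allows is a hypothetical name, and the comparison is the same awk test used in the snippet above.

```shell
# Hypothetical helper: return 0 if the pipeline may start, 1 if it must be blocked.
#   $1 forecast in USD, $2 remaining budget in USD or "unlimited", $3 force flag
budget_gate_allows() {
    local forecast="$1" remaining="$2" force="${3:-false}"
    # No budget configured: the gate is skipped.
    [ "$remaining" = "unlimited" ] && return 0
    # awk compares numerically, so 9.50 is correctly treated as less than 10.
    if awk -v f="$forecast" -v r="$remaining" 'BEGIN { exit !(f > r) }'; then
        [ "$force" = "true" ] && return 0
        return 1
    fi
    return 0
}

if budget_gate_allows 15 10 false; then echo "start"; else echo "blocked"; fi   # prints "blocked"
```

Using awk rather than a shell string or integer comparison matters here because the amounts are decimal values.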

Step 8: Record variance at pipeline completion

After the pipeline.completed event emission (around line 2700), add:

if [[ -n "${FORECAST_USD:-}" && "${FORECAST_USD}" != "0" ]]; then
    cost_record_variance "$FORECAST_USD" "$total_cost" "$PIPELINE_NAME" "${ISSUE_NUMBER:-}"
fi

Step 9: Add dashboard API endpoint

In dashboard/src/core/api.ts, add handler for /api/costs/forecast:

  • Read recent cost.forecast and cost.forecast_variance events from events.jsonl
  • Return forecast data for display

Step 10: Add dashboard forecast display

In dashboard/src/views/metrics.ts, add renderCostForecast():

  • Show last forecast with confidence indicator
  • Show forecast vs actual variance trend (bar chart)
  • Color-code: green (within 20%), yellow (20-50% off), red (>50% off)

Step 11: Add tests to sw-cost-test.sh

Test cases:

  • cost_forecast with a template file returns valid JSON with expected fields
  • cost_forecast with no historical data uses defaults and returns confidence "low"
  • cost_forecast_display produces formatted output
  • cost_record_variance emits correct event
  • Budget gate blocks when forecast > remaining
  • Budget gate allows with --force-start
  • Variance calculation correctness (positive/negative)

Task Checklist

  • Task 1: Add default stage duration constants and token-rate heuristics to sw-cost.sh
  • Task 2: Implement cost_forecast() function with historical lookup and cold-start defaults
  • Task 3: Implement cost_forecast_display() formatted CLI output
  • Task 4: Implement cost_record_variance() with event emission
  • Task 5: Add forecast CLI subcommand to sw-cost.sh case statement
  • Task 6: Add --force-start flag parsing to sw-pipeline.sh
  • Task 7: Integrate forecast display and budget gate into pipeline_start()
  • Task 8: Record forecast variance at pipeline completion in sw-pipeline.sh
  • Task 9: Add dashboard API endpoint for forecast data
  • Task 10: Add dashboard forecast display component
  • Task 11: Write tests for all forecast functions in sw-cost-test.sh
  • Task 12: Run full test suite and fix any failures

Testing Approach

Unit Tests (sw-cost-test.sh)

Following existing test patterns in the repo (mock binaries, PASS/FAIL counters):

  1. cost_forecast basic: Create a minimal template JSON, call cost_forecast, verify JSON output has total_usd, stages, confidence, data_points
  2. cost_forecast cold-start: No events.jsonl → uses defaults, confidence = "low"
  3. cost_forecast with history: Seed events.jsonl with stage.completed events, verify durations are averaged
  4. cost_forecast_display: Verify output contains stage names, costs, total line
  5. cost_record_variance: Verify event written to events.jsonl with correct fields
  6. Variance math: forecast=$5.00 actual=$6.50 → variance=$1.50, pct=30.0%
  7. Budget gate integration: Mock budget at $10, forecast at $15, verify exit code 1; with FORCE_START=true, verify exit code 0
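The PASS/FAIL counter pattern mentioned above might look like the following for the variance-math case. This is a sketch only: assert_eq and variance_pct are hypothetical names, not the helpers already present in sw-cost-test.sh, and variance_pct simply repeats the percentage arithmetic from cost_record_variance.

```shell
PASS=0
FAIL=0

# Hypothetical assertion helper: compare expected and actual, update counters.
assert_eq() {
    local label="$1" expected="$2" actual="$3"
    if [ "$expected" = "$actual" ]; then
        PASS=$((PASS + 1))
    else
        FAIL=$((FAIL + 1))
        echo "FAIL: $label (expected '$expected', got '$actual')"
    fi
}

# Same percentage arithmetic as cost_record_variance.
variance_pct() {
    awk -v f="$1" -v a="$2" 'BEGIN { if (f > 0) printf "%.1f", ((a - f) / f) * 100; else print "0" }'
}

assert_eq "over-run"      "30.0"  "$(variance_pct 5.00 6.50)"
assert_eq "under-run"     "-20.0" "$(variance_pct 10.00 8.00)"
assert_eq "zero forecast" "0"     "$(variance_pct 0 4.00)"

echo "passed=$PASS failed=$FAIL"
```

The zero-forecast case is worth covering explicitly, since it exercises the division guard.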

Integration Tests

  • npm test — run existing suite to verify no regressions
  • Manual: shipwright cost forecast --pipeline cost-aware to verify CLI output

Definition of Done

  • cost_forecast() returns JSON with total_usd, stages[], confidence, data_points for any template
  • Forecast is displayed before pipeline starts in CLI with formatted table
  • Pipeline start is blocked when forecast exceeds remaining budget (exit 1)
  • --force-start flag overrides the budget gate with a warning
  • cost.forecast event emitted to events.jsonl before pipeline runs
  • cost.forecast_variance event emitted after pipeline completes with forecast vs actual
  • Dashboard shows cost forecast when pipeline is queued/starting
  • Confidence level shown as low/medium/high based on historical data count
  • All new functions have tests in sw-cost-test.sh
  • npm test passes with no regressions
  • Bash 3.2 compatible (no associative arrays, no readarray)

Endpoint Specification

CLI Endpoint: shipwright cost forecast

  • Input: --pipeline <name> (template name, default: standard), --complexity <1-10> (default: 5)
  • Output: Formatted forecast table to stdout; --json flag for raw JSON
  • Exit codes: 0 (success), 1 (error loading template)
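A possible shape for the option handling behind this subcommand, as a sketch. parse_forecast_args and the FORECAST_* variable names are hypothetical, and returning 1 for an unknown option or an out-of-range complexity is an assumption that goes beyond the single error case listed above.

```shell
# Hypothetical sketch of option parsing for `shipwright cost forecast`.
parse_forecast_args() {
    FORECAST_PIPELINE="standard"   # default template
    FORECAST_COMPLEXITY=5          # default complexity
    FORECAST_JSON_OUTPUT=false
    while [ $# -gt 0 ]; do
        case "$1" in
            --pipeline)   FORECAST_PIPELINE="${2:-}";   [ $# -gt 1 ] && shift ;;
            --complexity) FORECAST_COMPLEXITY="${2:-}"; [ $# -gt 1 ] && shift ;;
            --json)       FORECAST_JSON_OUTPUT=true ;;
            *)            echo "unknown option: $1" >&2; return 1 ;;
        esac
        shift
    done
    # Accept only the integers 1 through 10.
    case "$FORECAST_COMPLEXITY" in
        [1-9]|10) ;;
        *) echo "complexity must be an integer from 1 to 10" >&2; return 1 ;;
    esac
}

parse_forecast_args --pipeline cost-aware --complexity 8 --json
```

Validating complexity up front keeps a malformed value from silently skewing the complexity/5 multiplier.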

Event Schemas

// cost.forecast (emitted before pipeline start)
{"ts":"...","type":"cost.forecast","forecast_usd":"6.36","template":"standard","confidence":"medium","issue":"178"}

// cost.forecast_variance (emitted after pipeline completion)
{"ts":"...","type":"cost.forecast_variance","forecast_usd":"6.36","actual_usd":"7.12","variance_usd":"0.7600","variance_pct":"11.9","template":"standard","issue":"178"}

Dashboard API: GET /api/costs/forecast

  • Response: { recent_forecasts: [{issue, template, forecast_usd, confidence, ts}], variance_history: [{forecast, actual, variance_pct, template, ts}] }
  • Error: { error: { code: "NO_DATA", message: "No forecast data available" } }

Rate Limiting: Not applicable (local dashboard, single-user).

Versioning: Not applicable (internal API, no external consumers).


User Stories

Primary: As a pipeline operator, I want to see estimated pipeline cost before it starts, so that I can make informed go/no-go decisions and avoid budget surprises.

Secondary: As a team lead reviewing daemon activity, I want to see forecast accuracy over time in the dashboard, so that I can trust the estimates and tune budgets appropriately.

Acceptance Criteria (Given/When/Then)

  • Given a budget is set and a pipeline template is selected, when I run shipwright pipeline start, then I see a cost forecast table before execution begins
  • Given the forecast exceeds remaining budget, when I run pipeline start without --force-start, then the pipeline is blocked with an error message showing the override command
  • Given the forecast exceeds remaining budget, when I run with --force-start, then the pipeline proceeds with a warning
  • Given a pipeline completes, when I check events.jsonl, then I see a cost.forecast_variance event with forecast, actual, and percentage difference

Edge Cases from User Perspective

  1. No historical data (cold start): Uses hardcoded defaults, shows confidence as "low" — user knows estimate is rough
  2. Budget not configured: Forecast displays but gate is skipped — shows "Budget: unlimited"
  3. Forecast is $0 (all stages disabled): Shows "No billable stages" message, skips gate

Component Hierarchy (Dashboard)

MetricsView
├── renderMetrics() (existing)
│   ├── Success Rate Donut
│   ├── Duration / Throughput
│   └── Cost Breakdown
└── renderCostForecast() (new)
    ├── ForecastCard (last forecast summary)
    │   ├── Total estimate
    │   ├── Confidence badge (low/medium/high)
    │   └── Budget status bar
    └── VarianceChart (forecast vs actual history)
        └── Bar chart with color-coded variance

State Management: Forecast data fetched via api.fetchForecastData() → stored in store → rendered by renderCostForecast(). Same pattern as existing cost views.

Accessibility: Semantic HTML (table for forecast data), color + text for confidence indicators, keyboard-accessible chart tooltips.

Responsive Breakpoints: Single-column layout at 320px-768px, two-column at 1024px+ (forecast card beside variance chart). Follows existing metrics grid pattern.


Monitoring Checklist

P0 — Immediate

  • Forecast function doesn't crash or block pipeline start (non-fatal wrapper)
  • Budget gate correctly blocks/allows based on comparison
  • --force-start override works

P1 — Short-term

  • Forecast accuracy: track variance_pct in events, alert if consistently >50%
  • No regressions in existing cost tracking

P2 — Medium-term

  • Forecast model accuracy improves as more historical data accumulates
  • Dashboard variance chart shows trend toward tighter estimates

Anomaly Detection Triggers

  • Forecast variance consistently >100% for 5+ pipelines → heuristic needs recalibration
  • cost.forecast events stop appearing → integration broken
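The first trigger could be checked with a small script over events.jsonl. The sketch below is an illustration under assumptions: variance_needs_recalibration is a hypothetical name, "consistently" is read as "each of the last N variance events", the 100% limit is applied to the absolute value, and events are assumed to be single-line JSON in the shape shown under Event Schemas.

```shell
# Hypothetical check: succeed (return 0) when the last <n> cost.forecast_variance
# events all have |variance_pct| above <limit>.
variance_needs_recalibration() {
    local events_file="$1" n="${2:-5}" limit="${3:-100}"
    [ -f "$events_file" ] || return 1
    grep '"type":"cost.forecast_variance"' "$events_file" | tail -n "$n" |
        awk -v n="$n" -v limit="$limit" '
            match($0, /"variance_pct":"?-?[0-9.]+/) {
                v = substr($0, RSTART, RLENGTH)
                sub(/^"variance_pct":"?/, "", v)
                v = v + 0                 # force numeric comparison
                if (v < 0) v = -v         # use the absolute variance
                seen++
                if (v > limit) over++
            }
            END { exit !(seen >= n && over == seen) }'
}
```

Fewer than N variance events never trigger the check, which avoids false alarms during the cold-start period.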

Log Analysis

  • Check for cost_forecast errors in pipeline output
  • Monitor for jq parse failures on events.jsonl queries

Auto-Rollback Decision Criteria

Not applicable — this is a pre-start gate feature, not a deployment. If the forecast function fails, the pipeline proceeds normally (fail-open design).


Alternatives Considered

| Approach | Pros | Cons |
|---|---|---|
| A: Inline in sw-pipeline.sh | Single file change | Mixes concerns, hard to test, bloats already large file (3000+ lines) |
| B: Functions in sw-cost.sh (chosen) | Clean separation, testable, follows existing patterns | Requires sourcing sw-cost.sh in pipeline (already done for some paths) |
| C: Separate sw-forecast.sh script | Maximum isolation | Adds a new file unnecessarily, more complex wiring |

Risk Analysis

| Risk | Impact | Mitigation |
|---|---|---|
| Forecast crashes pipeline start | High: blocks all pipelines | Wrap in ` |
| Inaccurate estimates frustrate users | Medium: trust erosion | Show confidence level, improve with data |
| Budget gate too aggressive | Medium: blocks legitimate work | --force-start override, configurable gate |
| Historical data query slow on large events.jsonl | Low: one-time read at start | Use tail -1000 to limit scan, cache result |
| Breaking Bash 3.2 compat | Medium: fails on macOS | No associative arrays, test on macOS |
