
Pipeline Plan 178

ezigus edited this page Mar 17, 2026 · 4 revisions


Implementation Plan: Pipeline Cost Forecast and Budget Gate

Brainstorming / Design Decisions

Alternatives Considered

Approach A: Simple multiplier (stage_count × flat_rate)

  • Pros: Trivial to implement, no historical data needed
  • Cons: Inaccurate — ignores model tiers, stage duration variance, complexity
  • Verdict: Too simplistic for useful go/no-go decisions

Approach B: Per-stage forecast using template model assignments + historical durations (CHOSEN)

  • Pros: Leverages existing per-stage model config from templates, uses real event history, provides confidence intervals, minimal new infrastructure
  • Cons: Requires parsing events.jsonl for history; cold-start needs defaults
  • Verdict: Best accuracy/complexity tradeoff. Builds on existing estimate_pipeline_cost() pattern but makes it per-stage aware

Approach C: ML regression model trained on historical runs

  • Pros: Most accurate long-term
  • Cons: Massive over-engineering for current data volume; requires training pipeline
  • Verdict: Future enhancement when data volume justifies it

Minimum Viable Change

  • cost_forecast() in sw-cost.sh → per-stage JSON output with confidence
  • Budget gate in pipeline_start() → block or warn before stages run
  • --force-start override flag
  • Variance event emission at pipeline completion
  • shipwright cost forecast CLI command
  • Dashboard: show forecast on queued items

Risk Assessment

  • Breaking existing pipelines: Low risk — forecast is advisory pre-start; budget gate respects --ignore-budget and new --force-start
  • Cold start (no history): Handled by default stage durations + "low" confidence
  • Bash 3.2 compat: Must avoid associative arrays; use jq for all JSON manipulation
  • Performance: Scanning events.jsonl could be slow with large files → limit the scan to recent events with tail -n 1000 plus a grep filter
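
The bounded history scan mentioned under Performance could look roughly like the sketch below. It is only an illustration: the function name stage_history and the event field names ("stage", "duration_s") are assumptions, since the plan does not spell out the events.jsonl format. The plan calls for jq for JSON manipulation; awk is used here only to keep the sketch free of dependencies.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints "<count> <avg_duration_s>" for one stage.
# ASSUMPTION: events look like
#   {"type":"stage.completed","stage":"build","duration_s":100}
# The real events.jsonl field names may differ.
stage_history() {
  local events_file="$1" stage="$2"

  # Cold start: no history file means zero data points.
  [ -f "$events_file" ] || { echo "0 0"; return 0; }

  # Only look at the most recent 1000 events to bound the scan cost.
  tail -n 1000 "$events_file" \
    | grep '"stage.completed"' \
    | grep "\"stage\":\"$stage\"" \
    | awk '
        # Pull the integer after "duration_s": (the key plus colon is 13 chars).
        match($0, /"duration_s":[0-9]+/) {
          sum += substr($0, RSTART + 13, RLENGTH - 13)
          n++
        }
        END {
          if (n > 0) printf "%d %d\n", n, sum / n
          else print "0 0"
        }'
}
```

A caller would fall back to the default stage duration whenever the reported count is 0.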

Files to Modify

| File | Action | Purpose |
| --- | --- | --- |
| scripts/sw-cost.sh | Modify | Add cost_forecast(), cost_forecast_display(), cost_record_variance(), forecast CLI subcommand |
| scripts/sw-pipeline.sh | Modify | Add --force-start flag, hook forecast + budget gate into pipeline_start(), emit variance at completion |
| config/event-schema.json | Modify | Add cost.forecast and cost.forecast_variance event types |
| dashboard/src/types/api.ts | Modify | Add CostForecast interface, extend QueueItem with forecast fields |
| dashboard/server.ts | Modify | Add /api/costs/forecast endpoint, include forecast in queue data |
| dashboard/src/views/pipelines.ts | Modify | Display forecast for queued pipelines |
| src/cost-forecast.test.js | Create | Unit tests for forecast logic |
| scripts/sw-pipeline-test.sh | Modify | Add forecast + budget gate integration tests |

Implementation Steps

Step 1: Add cost_forecast() engine to sw-cost.sh

Add after cost_remaining_budget() (~line 310):

  • Default stage durations JSON constant (seconds): intake=60, plan=300, design=300, build=1200, test=180, review=300, compound_quality=600, audit=120, pr=60, merge=60, deploy=120, validate=60, monitor=300

  • Token rate constants per stage type (tokens/second): build=50in/20out, review/compound_quality=40in/30out, test=10in/5out, default=20in/10out

  • cost_forecast(pipeline_config_path, complexity) function:

    1. Read template JSON → extract enabled stages with their model assignments
    2. Query historical durations from events.jsonl: grep "stage.completed" → group by stage → compute avg duration and count
    3. For each enabled stage: use historical avg duration (or default), apply complexity multiplier (complexity / 5.0), estimate tokens from duration × token rates, calculate cost via cost_calculate()
    4. Sum total cost, determine confidence level (≥20 data points=high, 5-19=medium, <5=low)
    5. Output JSON: {total_usd, low_usd, high_usd, stages: [{id, model, est_duration_s, est_cost}], confidence, data_points, complexity_multiplier}
    6. Interval bounds for low_usd and high_usd: at medium confidence, low_usd = total × 0.7 and high_usd = total × 1.5; use a narrower band at high confidence
  • cost_forecast_display(forecast_json) function: Pretty-print forecast table with per-stage breakdown, total, confidence level, and budget comparison

  • cost_record_variance(forecast_usd, actual_usd, confidence, template, issue) function: Emit cost.forecast_variance event with forecast, actual, variance USD, variance %, template, issue

  • CLI subcommand forecast: shipwright cost forecast [--pipeline standard] [--complexity 5] [--json]
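
The arithmetic in Step 1 can be sketched with a few small helpers. This is a minimal illustration under stated assumptions, not the actual sw-cost.sh code: all function names below are hypothetical, the pricing step (cost_calculate()) is not reproduced, and only the medium-confidence interval is shown because the plan gives exact factors for that case alone. A case statement stands in for a lookup table so the sketch stays Bash 3.2 compatible (no associative arrays).

```shell
#!/usr/bin/env bash
# Hypothetical helpers illustrating the Step 1 arithmetic.

# Map the number of historical data points to a confidence label
# (thresholds from the plan: >=20 high, 5-19 medium, <5 low).
forecast_confidence() {
  local points="$1"
  if [ "$points" -ge 20 ]; then
    echo "high"
  elif [ "$points" -ge 5 ]; then
    echo "medium"
  else
    echo "low"
  fi
}

# Token rates per stage type as "<input_per_s> <output_per_s>" (values from the plan).
stage_token_rates() {
  case "$1" in
    build)                   echo "50 20" ;;
    review|compound_quality) echo "40 30" ;;
    test)                    echo "10 5" ;;
    *)                       echo "20 10" ;;
  esac
}

# Scale a duration (seconds) by complexity / 5.0, so complexity 5 is neutral.
scale_duration() {
  local duration_s="$1" complexity="$2"
  awk -v d="$duration_s" -v c="$complexity" 'BEGIN { printf "%.0f\n", d * c / 5.0 }'
}

# Estimate "<input_tokens> <output_tokens>" from a whole-second duration.
estimate_tokens() {
  local stage="$1" duration_s="$2" rates in_rate out_rate
  rates="$(stage_token_rates "$stage")"
  in_rate="${rates% *}"    # text before the space
  out_rate="${rates#* }"   # text after the space
  echo "$((duration_s * in_rate)) $((duration_s * out_rate))"
}

# Medium-confidence interval from the plan: low = total * 0.7, high = total * 1.5.
forecast_interval_medium() {
  local total="$1"
  awk -v t="$total" 'BEGIN { printf "%.2f %.2f\n", t * 0.7, t * 1.5 }'
}
```

For example, a build stage at the default 1200 seconds and complexity 8 scales to 1920 seconds, and estimate_tokens build 1920 would then feed the per-model pricing step.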

Step 2: Add --force-start flag to sw-pipeline.sh

  • Add FORCE_START=false to defaults (~line 299)
  • Add --force-start) FORCE_START=true; shift ;; to argument parser (~line 460)
  • Add help text for --force-start
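
As a rough sketch of the flag handling, the parser case could sit alongside the existing --ignore-budget flag. The parse_args wrapper and the catch-all branch are assumptions for illustration; the real parser in sw-pipeline.sh handles many more options.

```shell
#!/usr/bin/env bash
# Defaults, as described in Step 2.
FORCE_START=false
IGNORE_BUDGET=false

# Hypothetical wrapper around the argument loop; the real script's
# parser structure may differ.
parse_args() {
  while [ $# -gt 0 ]; do
    case "$1" in
      --force-start)   FORCE_START=true; shift ;;
      --ignore-budget) IGNORE_BUDGET=true; shift ;;
      *)               shift ;;  # other options are out of scope for this sketch
    esac
  done
}
```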

Step 3: Hook forecast + budget gate into pipeline_start()

After load_pipeline_config (~line 2438), before state file creation:

  1. Call cost_forecast "$PIPELINE_CONFIG" "${INTELLIGENCE_COMPLEXITY:-5}"
  2. Save forecast JSON to $ARTIFACTS_DIR/cost-forecast.json
  3. Display forecast via cost_forecast_display
  4. Get remaining budget via cost_remaining_budget
  5. Budget gate logic:
    • If budget is "unlimited" → skip gate
    • If forecast high_usd > remaining budget AND FORCE_START != true AND IGNORE_BUDGET != true → block with error message showing forecast vs budget, suggest --force-start
    • If forecast total_usd > 50% of remaining budget → warn (don't block)
  6. Emit cost.forecast event with forecast_usd, confidence, template, issue
  7. Store PIPELINE_FORECAST_USD and FORECAST_CONFIDENCE for variance tracking at completion
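
The gate decision in item 5 can be expressed as one pure function, which also makes it easy to unit test. The sketch below is an assumption about shape only: budget_gate and its positional arguments are hypothetical, and it prints a decision rather than exiting so the caller can choose the message and exit code.

```shell
#!/usr/bin/env bash
# Hypothetical decision helper. Prints one of: skip, block, warn, pass.
#   $1 total_usd  $2 high_usd  $3 remaining budget (number or "unlimited")
#   $4 FORCE_START (true/false)  $5 IGNORE_BUDGET (true/false)
budget_gate() {
  local total_usd="$1" high_usd="$2" remaining="$3"
  local force_start="${4:-false}" ignore_budget="${5:-false}"

  # No budget configured: the gate does not apply.
  if [ "$remaining" = "unlimited" ]; then
    echo "skip"
    return 0
  fi

  # awk handles the floating-point comparisons; "+ 0" forces numeric context.
  awk -v t="$total_usd" -v h="$high_usd" -v r="$remaining" \
      -v force="$force_start" -v ignore="$ignore_budget" 'BEGIN {
    if (h + 0 > r + 0 && force != "true" && ignore != "true") { print "block"; exit }
    if (t + 0 > (r + 0) * 0.5) { print "warn"; exit }
    print "pass"
  }'
}
```

With a forecast of $45 (high bound $60) against $50 remaining, budget_gate 45 60 50 reports block, while adding true as the fourth argument downgrades the result to warn because the total still exceeds half the remaining budget.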

Step 4: Emit variance at pipeline completion

At pipeline completion (~line 2700 for success, ~line 2752 for failure):

  1. If PIPELINE_FORECAST_USD is set, call cost_record_variance "$PIPELINE_FORECAST_USD" "$total_cost" "$FORECAST_CONFIDENCE" "$PIPELINE_NAME" "${ISSUE_NUMBER:-0}"
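
The variance fields could be derived as in the sketch below. The plan names variance USD and variance % but does not define them, so this assumes variance_usd = actual − forecast and variance_pct = variance_usd / forecast × 100, with a zero forecast reported as 0%. The function name forecast_variance is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints "<variance_usd> <variance_pct>".
# ASSUMPTION: positive values mean the run cost more than forecast.
forecast_variance() {
  local forecast="$1" actual="$2"
  awk -v f="$forecast" -v a="$actual" 'BEGIN {
    v = a - f
    # Guard against division by zero when the forecast was $0.00.
    pct = (f + 0 == 0) ? 0 : v / f * 100
    printf "%.2f %.1f\n", v, pct
  }'
}
```

A $50 forecast with a $60 actual yields a variance of $10.00, or 20.0% over.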

Step 5: Update event schema

Add to config/event-schema.json:

  • cost.forecast: fields = forecast_usd, low_usd, high_usd, confidence, template, issue, complexity, data_points
  • cost.forecast_variance: fields = forecast_usd, actual_usd, variance_usd, variance_pct, confidence, template, issue

Step 6: Dashboard — API endpoint

In dashboard/server.ts, add /api/costs/forecast endpoint:

  • Accept query param ?pipeline=standard&complexity=5
  • Shell out to shipwright cost forecast --pipeline X --complexity Y --json
  • Return JSON response

Extend the /api/state queue items to include forecast data when available (read from pipeline artifacts).

Step 7: Dashboard — types and UI

In dashboard/src/types/api.ts:

  • Add CostForecast interface: {total_usd, low_usd, high_usd, confidence, stages, data_points}
  • Extend QueueItem with forecast?: CostForecast

In dashboard/src/views/pipelines.ts:

  • When rendering queued items, show forecast if available: "Est: $45-$60 (medium confidence)"

Step 8: Tests

Unit tests (src/cost-forecast.test.js):

  • cost_forecast with mock template and no history → returns defaults with low confidence
  • cost_forecast with mock history → returns historical averages with appropriate confidence
  • cost_forecast with complexity multiplier → scales durations correctly
  • cost_record_variance → emits correct event
  • Budget gate logic: blocks when high_usd exceeds the remaining budget, warns when total_usd exceeds 50% of the remaining budget, passes otherwise

Integration tests (add to scripts/sw-pipeline-test.sh):

  • Pipeline start with forecast display
  • Pipeline blocked by budget gate → verify exit code + message
  • Pipeline with --force-start overrides gate
  • Variance event emitted after completion

Task Checklist

  • Task 1: Add default stage durations and token rate constants to sw-cost.sh
  • Task 2: Implement cost_forecast() function with historical data lookup and confidence levels
  • Task 3: Implement cost_forecast_display() CLI table renderer
  • Task 4: Implement cost_record_variance() function
  • Task 5: Add forecast CLI subcommand to sw-cost.sh router
  • Task 6: Add --force-start flag and FORCE_START variable to sw-pipeline.sh
  • Task 7: Hook forecast generation + budget gate into pipeline_start() before stage execution
  • Task 8: Hook variance tracking into pipeline completion (success + failure paths)
  • Task 9: Update config/event-schema.json with new event types
  • Task 10: Add CostForecast type and /api/costs/forecast endpoint to dashboard
  • Task 11: Update dashboard pipelines view to display forecast for queued items
  • Task 12: Write unit tests for forecast engine and variance tracking
  • Task 13: Add integration tests to sw-pipeline-test.sh for budget gate behavior
  • Task 14: Run full test suite and fix any regressions

Testing Approach

  1. Unit tests: Mock events.jsonl with known data, verify forecast calculations match expected values. Test confidence thresholds at boundaries (4, 5, 19, 20 data points). Test cold-start defaults.
  2. Integration tests: Use mock binaries from existing test harness. Create temp budget.json with low limit, verify pipeline start is blocked. Verify --force-start override. Verify variance event in events.jsonl after completion.
  3. Dashboard tests: Verify /api/costs/forecast returns valid JSON. Verify queue rendering includes forecast display.
  4. Manual verification: shipwright cost forecast --pipeline cost-aware --json produces valid output. shipwright pipeline start --issue N --dry-run shows forecast.

Definition of Done

  • shipwright cost forecast CLI command works with all template types
  • Forecast displayed before pipeline start (both interactive and headless)
  • Pipeline blocked when forecast exceeds remaining budget (with clear error message)
  • --force-start override bypasses budget gate with acknowledgment
  • --ignore-budget also bypasses forecast gate (backward compatible)
  • cost.forecast event emitted to events.jsonl at pipeline start
  • cost.forecast_variance event emitted at pipeline completion
  • Confidence levels: low (<5 data points), medium (5-19), high (≥20)
  • Dashboard shows forecast for queued pipelines
  • All existing tests pass (npm test)
  • New tests cover forecast calculation, budget gate, variance tracking
  • Bash 3.2 compatible (no associative arrays, no bash 4+ features)

User Stories

Primary: As a pipeline operator, I want to see estimated cost before a pipeline starts, so that I can make an informed go/no-go decision and avoid surprise budget overruns.

Secondary: As a team lead monitoring costs, I want the system to automatically block pipelines that would exceed our daily budget, so that runaway spending is prevented without manual oversight.

Edge Cases

  1. No historical data (cold start): Uses conservative defaults with "low" confidence label — user knows estimate is rough
  2. Budget not configured: Forecast still displayed but gate is skipped (matches existing cost_remaining_budget behavior)
  3. Forecast far exceeds budget but --force-start used: Pipeline proceeds with warning logged and event emitted for audit
  4. Template with all stages disabled: Forecast returns $0.00 with note
  5. Model pricing changed mid-day: Forecast uses current pricing; variance tracking captures the drift

Endpoint Specification

GET /api/costs/forecast?pipeline=standard&complexity=5

  • Response 200: {total_usd, low_usd, high_usd, confidence, stages: [{id, model, est_duration_s, est_cost}], data_points}
  • Response 400: {error: {code: "invalid_template", message: "Unknown pipeline template: foo"}}
  • No auth required (local dashboard)
  • No rate limiting (local-only)
  • No versioning needed (internal API)

This plan adds ~250 lines to sw-cost.sh, ~50 lines to sw-pipeline.sh, and ~100 lines of tests. The blast radius is contained to cost infrastructure and the pipeline start path — no existing stage execution logic is modified.
