Pipeline Plan 178

ezigus edited this page Mar 17, 2026 · 4 revisions

Implementation Plan: Pipeline Cost Forecast and Budget Gate with Early Warning

Issue: #178

Pipeline: full | Branch: feat/pipeline-cost-forecast-and-budget-gate-w-178


Socratic Design Refinement

Requirements Clarity

Minimum viable change: Estimate pipeline cost pre-start, display it, block if over budget, emit variance post-completion. The acceptance criteria are explicit and well-scoped.

Implicit requirements:

  • Cold-start behavior when no historical data exists (use conservative defaults)
  • The forecast CLI subcommand (shipwright cost forecast) for standalone use
  • Dashboard integration for queued pipeline visibility

Acceptance criteria (from issue):

  1. Estimate pipeline cost using: template stage count × avg duration × model tier cost
  2. Display forecast before pipeline start in CLI and dashboard
  3. Block start if forecast exceeds remaining budget (configurable, with --force-start override)
  4. Emit forecast vs actual cost variance to events.jsonl after pipeline completes
  5. Show cost forecast in dashboard when pipeline is queued
  6. Include confidence interval (low/medium/high) based on historical data quality
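
The estimation rule in criterion 1 can be illustrated with a small awk-backed helper. This is only a sketch: `estimate_pipeline_cost` is a hypothetical name, and the units (minutes per stage, USD per minute) are assumptions, since the issue does not state them.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of sw-cost.sh.
#   $1 = number of stages in the template
#   $2 = average stage duration in minutes (assumed unit)
#   $3 = model tier cost in USD per minute (assumed unit)
estimate_pipeline_cost() {
  awk -v n="$1" -v dur="$2" -v rate="$3" \
    'BEGIN { printf "%.4f\n", n * dur * rate }'
}

# 8 stages x 5 min x $0.25/min
estimate_pipeline_cost 8 5 0.25   # prints 10.0000
```

The four-decimal output matches the precision the plan accepts for cost estimates.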

Design Alternatives

Approach A: Shell-native in sw-cost.sh (chosen)

  • Add cost_forecast(), cost_forecast_display(), cost_record_variance() to existing sw-cost.sh
  • Integrate budget gate directly into pipeline_start() in sw-pipeline.sh
  • Tradeoffs: +consistent with codebase patterns, +reuses existing pricing/budget infra, +no new dependencies, -bash arithmetic limitations (mitigated by awk)

Approach B: Node.js forecast module

  • Implement in TypeScript alongside dashboard
  • Tradeoffs: +better floating point, +easier testing, -breaks shell-first convention, -requires Node.js at pipeline start, -two systems to maintain

Decision: Approach A -- minimal blast radius, reuses existing cost_calculate(), cost_check_budget(), cost_remaining_budget(), pricing system, and event emission. Dashboard gets a read-only API endpoint.

Risk Assessment

  • Risk: Inaccurate forecasts with no history -> Mitigated by conservative default stage durations and "low" confidence label
  • Risk: Budget gate blocks legitimate work -> Mitigated by --force-start and --ignore-budget overrides
  • Risk: awk floating-point drift -> Acceptable for cost estimates (4 decimal places)
  • Risk: events.jsonl query slowness -> Mitigated by reading only last 1000 lines
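
The last mitigation, bounding the events.jsonl scan to the most recent 1000 lines, might look roughly like the following. `count_recent_events` is a hypothetical helper, and it assumes each event line carries its event name as a quoted JSON string; the actual event layout is not shown in this plan.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count events of one type in the tail of a JSONL file.
#   $1 = path to events.jsonl
#   $2 = event name, e.g. cost.forecast_variance
count_recent_events() {
  local file="$1" event_type="$2"
  # A missing file means no history yet (cold start).
  [ -f "$file" ] || { echo 0; return 0; }
  # Only the last 1000 lines are scanned, so cost stays bounded as the log grows.
  # grep -c exits non-zero when the count is 0, hence the "|| true".
  tail -n 1000 "$file" | grep -c -F "\"$event_type\"" || true
}
```

Matching on the quoted name keeps `cost.forecast` from also counting `cost.forecast_variance` lines.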

Simplicity Check

All changes stay within three existing areas (sw-cost.sh, sw-pipeline.sh, and the dashboard) plus tests. No new files needed for core logic. Reuses existing pricing, budget, and event infrastructure entirely.


Current Implementation Status

The feature was implemented in a prior pipeline iteration (commits abd44a8 through 055448a). All 12 tasks were marked complete. The implementation includes:

What Exists (verified by code review)

| Component | File | Status |
|---|---|---|
| cost_forecast() | sw-cost.sh:941-1058 | Complete -- historical lookup, cold-start defaults, complexity multiplier, confidence levels |
| cost_forecast_display() | sw-cost.sh:1062-1121 | Complete -- formatted table with stage breakdown, confidence indicator, budget status |
| cost_record_variance() | sw-cost.sh:1123-1138 | Complete -- emits cost.forecast_variance event with USD and % variance |
| _forecast_token_rates() | sw-cost.sh:925-937 | Complete -- per-stage-category token rate heuristics |
| Default stage durations | sw-cost.sh:921 | Complete -- JSON constant for cold-start fallback |
| forecast CLI subcommand | sw-cost.sh (cost_forecast_cli) | Complete -- shipwright cost forecast --pipeline <name> |
| --force-start flag | sw-pipeline.sh:300,463 | Complete -- parsed and stored as FORCE_START variable |
| Budget gate in pipeline_start | sw-pipeline.sh:2597-2627 | Complete -- blocks if forecast > remaining, supports override |
| Forecast event emission | sw-pipeline.sh:2620-2624 | Complete -- cost.forecast event with USD, template, confidence, issue |
| Variance recording at completion | sw-pipeline.sh:2957-2962 | Complete -- calls cost_record_variance with forecast vs actual |
| Dashboard API /api/costs/forecast | dashboard/server.ts:3721-3769 | Complete -- returns recent_forecasts + variance_history |
| Dashboard renderCostForecast() | dashboard/src/views/metrics.ts:479-542 | Complete -- forecast card + variance accuracy table |
| Dashboard API client | dashboard/src/core/api.ts | Complete -- fetchCostForecast() method |
| Tests | sw-cost-test.sh:230-400 | Complete -- forecast JSON, display, variance, --force-start, pipeline integration |

Gaps Identified

  1. No early warning at 50-100% budget consumption -- The cost intelligence design guidance says "Warn (don't block) if forecast is 50-100% of remaining budget." Current code only blocks when forecast > remaining. No warning for high-consumption scenarios.

  2. No confidence range display -- Forecast shows point estimate ($X.XX) but the issue requests confidence intervals (e.g., "$50-$70 (medium confidence)"). Currently only shows confidence level as a label.

  3. No estimated_cost on queued dashboard items -- The QueueItem type in dashboard has an estimated_cost? field, but the daemon dispatch doesn't populate it when queuing issues. Dashboard shows forecast in the metrics view but not inline on queued items.

  4. Forecast event missing data_points field -- The cost.forecast event should include data_points for observability completeness.


Files to Modify

| File | Action | Purpose |
|---|---|---|
| scripts/sw-cost.sh | Modify | Add confidence range (low/high bounds) to forecast JSON output |
| scripts/sw-cost.sh | Modify | Update cost_forecast_display() to show cost range |
| scripts/sw-pipeline.sh | Modify | Add early warning when forecast consumes 50-100% of remaining budget |
| scripts/sw-pipeline.sh | Modify | Add data_points field to cost.forecast event |
| scripts/lib/daemon-dispatch.sh | Modify | Attach forecast_usd to queued job metadata |
| dashboard/server.ts | Modify | Include estimated_cost in queue item response |
| scripts/sw-cost-test.sh | Modify | Add tests for confidence range and early warning |

Implementation Steps

Step 1: Add confidence range bounds to cost_forecast()

In sw-cost.sh, enhance the forecast JSON output to include low_usd and high_usd bounds:

  • High confidence: +/-15% range (tight)
  • Medium confidence: +/-30% range
  • Low confidence: +/-50% range (wide, conservative)

Modify the final jq -n call in cost_forecast() (line 1052) to compute and add low_usd and high_usd fields based on confidence level and total_usd.
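
A minimal sketch of the bound calculation, using the percentages listed above. `confidence_bounds` is a hypothetical standalone helper; the real change would live inside cost_forecast() and feed the values into its jq call. Treating any unrecognised confidence label as "low" is an assumption chosen to keep the range conservative.

```shell
#!/usr/bin/env bash
# Hypothetical helper: derive low/high bounds from a point estimate.
#   $1 = total_usd (point estimate)
#   $2 = confidence label: high | medium | low
# Prints "LOW_USD HIGH_USD" with four decimal places.
confidence_bounds() {
  local total="$1" confidence="$2" pct
  case "$confidence" in
    high)   pct=15 ;;
    medium) pct=30 ;;
    *)      pct=50 ;;   # low, or anything unrecognised: widest range
  esac
  awk -v t="$total" -v p="$pct" \
    'BEGIN { printf "%.4f %.4f\n", t * (100 - p) / 100, t * (100 + p) / 100 }'
}

confidence_bounds 60 medium   # prints 42.0000 78.0000
```

The two values can then be read into shell variables, for example with `read -r low_usd high_usd < <(confidence_bounds "$total_usd" "$confidence")`.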

Step 2: Update cost_forecast_display() to show range

Change the TOTAL line from $X.XX to $LOW - $HIGH format when confidence is not "high". For high confidence, keep the point estimate. Update the Confidence line to include the range, e.g., medium (5 historical runs) -- est. $42-$78 (a $60 point estimate at +/-30%).
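
The TOTAL formatting rule could be sketched as below. `format_forecast_total` is a hypothetical helper, and the two-decimal dollar format is an assumption based on the $X.XX notation used in this plan.

```shell
#!/usr/bin/env bash
# Hypothetical helper: choose between point estimate and range for the TOTAL line.
#   $1 = total_usd, $2 = low_usd, $3 = high_usd, $4 = confidence label
format_forecast_total() {
  local total="$1" low="$2" high="$3" confidence="$4"
  if [ "$confidence" = "high" ]; then
    # High confidence: the point estimate is kept.
    awk -v t="$total" 'BEGIN { printf "$%.2f\n", t }'
  else
    # Medium or low confidence: show the range instead.
    awk -v l="$low" -v h="$high" 'BEGIN { printf "$%.2f - $%.2f\n", l, h }'
  fi
}

format_forecast_total 60 42 78 medium   # prints $42.00 - $78.00
format_forecast_total 60 51 69 high     # prints $60.00
```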

Step 3: Add early warning in pipeline budget gate

In sw-pipeline.sh lines 2604-2627, after the "exceeds budget" check, add a warning check:

  • If forecast consumes >50% of remaining budget but doesn't exceed it, emit warn with message like "Forecast $X will consume Y% of remaining budget $Z"
  • Emit cost.budget_high_usage event for tracking
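
The graduated gate (block above 100% of remaining budget, warn above 50%, otherwise proceed) can be sketched as a three-way classification. `classify_forecast` is a hypothetical helper; the actual gate in pipeline_start() would also honour the --force-start and --ignore-budget overrides, which are left out here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify a forecast against the remaining budget.
#   $1 = forecast in USD
#   $2 = remaining budget in USD
# Prints one of: block | warn | ok
classify_forecast() {
  awk -v f="$1" -v r="$2" 'BEGIN {
    f += 0; r += 0                      # force numeric comparison
    if (f > r)            print "block" # exceeds remaining budget
    else if (f > 0.5 * r) print "warn"  # more than half, but still fits
    else                  print "ok"
  }'
}

case "$(classify_forecast 60 100)" in
  block) echo "forecast exceeds remaining budget" ;;
  warn)  echo "forecast will consume more than half of remaining budget" ;;
  ok)    echo "forecast fits comfortably" ;;
esac
```

A forecast equal to the remaining budget lands in the warn branch rather than block, which is consistent with the existing gate blocking only when forecast > remaining.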

Step 4: Add data_points to forecast event

In sw-pipeline.sh lines 2620-2624, add a data_points= field to the emit_event "cost.forecast" call.

Step 5: Populate estimated_cost for queued dashboard items

In daemon-dispatch.sh, after computing the forecast for pre-spawn budget check, attach forecast_usd to the job metadata so the dashboard /api/status endpoint can include it in queue items.

Step 6: Include estimated_cost in dashboard queue response

In dashboard/server.ts, when building queue items for /api/status, read the forecast_usd from job metadata and include it as estimated_cost in the response.

Step 7: Update tests

Add to sw-cost-test.sh:

  • Test that forecast JSON includes low_usd and high_usd fields
  • Test that confidence range width varies by confidence level
  • Verify early warning behavior via grep in sw-pipeline.sh
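
The grep-based check in the last bullet might be written along these lines. `assert_contains` is a hypothetical helper, and the existing test harness in sw-cost-test.sh may already provide an equivalent assertion.

```shell
#!/usr/bin/env bash
# Hypothetical helper: pass when a file contains a fixed string.
#   $1 = file to inspect
#   $2 = literal text expected somewhere in the file
assert_contains() {
  local file="$1" needle="$2"
  # -F treats the needle literally, so the dots in event names are not wildcards.
  if grep -qF -- "$needle" "$file" 2>/dev/null; then
    echo "PASS: $needle"
  else
    echo "FAIL: $needle"
    return 1
  fi
}

# Intended use, run from the repository root:
#   assert_contains scripts/sw-pipeline.sh "cost.budget_high_usage"
```

A check like this only proves the code is present, not that the warning fires, so it complements rather than replaces the behavioural tests for cost_forecast().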

Step 8: Run full test suite

Execute npm test and fix any failures.


Task Checklist

  • Task 1: Add low_usd and high_usd confidence range fields to cost_forecast() JSON output in sw-cost.sh
  • Task 2: Update cost_forecast_display() to show cost range instead of point estimate
  • Task 3: Add early warning in sw-pipeline.sh when forecast consumes 50-100% of remaining budget
  • Task 4: Add data_points field to cost.forecast event emission in sw-pipeline.sh
  • Task 5: Attach forecast_usd to queued job metadata in daemon-dispatch.sh for dashboard
  • Task 6: Include estimated_cost in dashboard queue item response in server.ts
  • Task 7: Add tests for confidence range fields and early warning in sw-cost-test.sh
  • Task 8: Run full test suite and verify all tests pass

Dependencies: Task 1 blocks Tasks 2 and 7. Task 3 also blocks Task 7, because the early warning test greps sw-pipeline.sh. Task 5 blocks Task 6. Task 8 runs last. All other tasks are independent.


Testing Approach

  1. Unit tests (sw-cost-test.sh):

    • Verify cost_forecast() JSON output includes low_usd, high_usd fields
    • Verify range width varies: low confidence has wider range than high
    • Verify cost_forecast_display() output includes range format
    • Grep-based verification that early warning code exists in sw-pipeline.sh
  2. Integration verification:

    • shipwright cost forecast --pipeline standard --json returns complete JSON with new fields
    • shipwright cost forecast --pipeline standard displays range in formatted output
  3. Regression:

    • npm test -- full test suite must pass
    • Existing cost tests must continue to pass unchanged

Definition of Done

  • cost_forecast() returns JSON with total_usd, low_usd, high_usd, stages[], confidence, data_points, complexity_multiplier
  • Confidence range displayed in CLI forecast table (e.g., "$42 - $78 (medium confidence)" for a $60 point estimate at +/-30%)
  • Pipeline warns (but doesn't block) when forecast is 50-100% of remaining budget
  • cost.forecast event includes data_points field
  • Dashboard queue items show estimated_cost when available
  • All existing tests continue to pass
  • New tests cover confidence range and early warning

Alternatives Considered

| Approach | Pros | Cons | Decision |
|---|---|---|---|
| A: Enhance existing implementation | Minimal changes, builds on working code, low risk | Incremental, not a rewrite | Chosen -- feature is 90% done, only gaps remain |
| B: Statistical confidence intervals | More rigorous (std dev based), adapts automatically | Requires sufficient historical data, complex bash math, overkill for current data volume | Deferred -- can iterate later when more data exists |
| C: Do nothing (accept current state) | Zero risk | Misses early warning and range display from acceptance criteria | Rejected -- gaps are small but meaningful for user experience |

Risk Analysis

| Risk | Impact | Likelihood | Mitigation |
|---|---|---|---|
| Confidence range math error in awk | Low -- cosmetic only, doesn't affect blocking | Low | Unit test verifies bounds |
| Early warning spam when budget is tight | Medium -- annoying repeated warnings | Low | Only warn once per pipeline start |
| daemon-dispatch.sh changes break spawning | High -- daemon stops working | Low | Minimal change (add metadata field), test daemon start |
| Dashboard type mismatch | Low -- TypeScript catches at build | Very low | Existing QueueItem type already has estimated_cost? optional field |

User Stories

Primary: As a pipeline operator, I want to see the estimated cost range before a pipeline starts, so that I can make informed go/no-go decisions based on my budget.

Secondary: As a team lead monitoring multiple daemon pipelines, I want early warning when a pipeline will consume a large portion of my remaining budget, so that I can intervene before the budget is exhausted.

Acceptance Criteria (Given/When/Then)

  • Given a configured budget and a pipeline about to start, When the forecast exceeds remaining budget, Then the pipeline is blocked with an error and a --force-start override hint
  • Given a configured budget, When the forecast consumes more than 50% of remaining budget without exceeding it, Then a warning is displayed (but the pipeline proceeds)
  • Given historical pipeline data, When a forecast is generated, Then confidence range bounds reflect data quality (tight for high confidence, wide for low)
  • Given a completed pipeline with a pre-start forecast, When the pipeline finishes, Then a cost.forecast_variance event is emitted with forecast, actual, and percentage variance
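
The variance in the last criterion can be worked through with a short sketch. `forecast_variance` is a hypothetical helper; the sign convention (positive means the pipeline cost more than forecast) and the zero-forecast guard are assumptions, since the plan does not spell out how cost_record_variance() handles them.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compute forecast-vs-actual variance.
#   $1 = forecast in USD
#   $2 = actual cost in USD
# Prints "VARIANCE_USD VARIANCE_PCT" (assumed convention: actual minus forecast).
forecast_variance() {
  awk -v f="$1" -v a="$2" 'BEGIN {
    f += 0; a += 0
    diff = a - f
    pct = (f > 0) ? diff / f * 100 : 0   # guard against a zero forecast
    printf "%.4f %.1f\n", diff, pct
  }'
}

forecast_variance 60 75   # prints 15.0000 25.0
forecast_variance 60 45   # prints -15.0000 -25.0
```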

Edge Cases from User Perspective

  1. First run (no history): Shows "low confidence" with wide range ($X - $Y), uses conservative defaults -- user sees worst-case estimate
  2. Budget nearly exhausted: Warns at 50%+ consumption, blocks at 100%+ -- user gets graduated feedback
  3. Force override: --force-start bypasses budget gate with explicit warning -- user maintains control

Endpoint Specification

GET /api/costs/forecast

  • Request: ?period=30 (days, optional, default 30)
  • Response: { recent_forecasts: [{issue, template, forecast_usd, confidence, ts}], variance_history: [{forecast_usd, actual_usd, variance_pct, template, ts}] }
  • Status codes: 200 OK, 500 Internal Server Error
  • No auth required (local dashboard)

Error Codes

  • 200: Success with data
  • 200 with empty arrays: No forecast data available yet
  • 500: Internal error reading events.jsonl

Rate Limiting

Not applicable -- local CLI tool and dashboard, not a public API.

Versioning

Not applicable -- internal API, no external consumers.


Component Hierarchy (Dashboard)

MetricsView
  |-- renderCostBreakdown()       -- existing
  |-- renderCostTrend()           -- existing
  |-- renderCostForecast()        -- enhanced (shows range)
  |     |-- Forecast Card         -- latest forecast with confidence range
  |     +-- Variance Table        -- historical accuracy
  +-- ... other metrics

State Management Approach

  • Dashboard fetches forecast data via api.fetchCostForecast() -> REST endpoint
  • Data flows: server.ts reads events.jsonl -> filters forecast/variance events -> returns JSON -> metrics.ts renders
  • No local state needed -- pure render from server data

Accessibility Checklist

  • Semantic HTML tables for variance data (existing)
  • Color-coded confidence uses text labels alongside color (existing: "high"/"medium"/"low" text)
  • Keyboard navigable (standard HTML elements)

Responsive Breakpoints

Not applicable -- dashboard is a developer tool used on desktop monitors. Existing responsive behavior from dashboard framework applies.
