Pipeline Plan 178

ezigus edited this page Mar 16, 2026 · 4 revisions

Implementation Plan: Pipeline Cost Forecast and Budget Gate (#178)

Socratic Design Refinement

Requirements Clarity

Minimum viable change: Add cost estimation before pipeline start using template stages, historical durations, and model pricing. Display forecast, gate on budget, emit variance events.

Implicit requirements: Cold-start behavior when no historical data exists; graceful degradation with confidence levels; CLI and dashboard parity.

Acceptance criteria (from issue):

  1. Estimate cost using: template stage count x avg duration x model tier cost
  2. Display forecast before pipeline start in CLI and dashboard
  3. Block start if forecast exceeds remaining budget (configurable, --force-start override)
  4. Emit forecast vs actual cost variance to events.jsonl after pipeline completes
  5. Show cost forecast in dashboard when pipeline is queued
  6. Include confidence interval (low/medium/high) based on historical data quality

Alternatives Considered

  • Approach A: Inline calculation in sw-pipeline.sh - Simple but mixes concerns, harder to test standalone.
  • Approach B: Dedicated functions in sw-cost.sh with CLI subcommand - Clean separation, independently testable, reusable from daemon and CLI. Chosen.
  • Approach C: Separate forecast script - Over-engineering for the scope; cost functions belong with cost module.

Trade-offs: Approach B adds ~270 lines to sw-cost.sh but keeps all cost logic co-located. The pipeline integration is minimal (~42 lines), maintaining separation of concerns.

Risk Assessment

  • Cold-start estimates may be wildly inaccurate: Mitigated by confidence levels (low when <4 data points) and default duration constants.
  • Budget gate could block legitimate pipelines: Mitigated by --force-start override flag.
  • Historical data query could be slow: Mitigated by reading only last 1000 events.jsonl lines.

Architecture

Component Diagram

                    CLI / Daemon
                         |
                    sw-pipeline.sh
                    (budget gate)
                         |
              +----------+----------+
              |                     |
         sw-cost.sh            dashboard/
     (forecast engine)      (forecast display)
              |                     |
     +--------+--------+     server.ts (API)
     |        |        |          |
  forecast  display  variance   metrics.ts
  function  function function   (view)
              |
       events.jsonl
    (historical data)

Components

  1. Forecast Engine (sw-cost.sh) - Calculates estimated cost from template stages, historical durations, model pricing
  2. Budget Gate (sw-pipeline.sh) - Checks forecast against remaining budget at pipeline start
  3. Variance Tracker (sw-cost.sh) - Records forecast vs actual after pipeline completion
  4. CLI Interface (sw-cost.sh case statement) - shipwright cost forecast subcommand
  5. Dashboard Display (dashboard/) - API endpoint + metrics view for forecast data

Interface Contracts

// cost_forecast(template_config_path, complexity) → JSON
interface ForecastResult {
  total_usd: number;
  stages: Array<{
    stage: string;
    model: string;
    duration_s: number;
    estimated_cost_usd: number;
  }>;
  confidence: "low" | "medium" | "high";
  data_points: number;
  complexity_multiplier: number;
}

// cost_forecast_display(forecast_json) → formatted CLI output
// cost_record_variance(forecast_usd, actual_usd, template, issue) → event emission
// cost_check_budget(estimated_cost) → exit code: 0=ok, 1=warning, 2=blocked
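
The contract above could be exercised with a sketch like the following. It is written in TypeScript to match the ForecastResult interface, although the plan places the real implementation in sw-cost.sh. The per-second rates, the tier names, the TemplateStage shape, the use of a Map for history, and counting data points as the total number of historical samples are all assumptions for illustration. The plan's token-rate heuristics are collapsed here into a single per-second rate.

```typescript
interface StageForecast {
  stage: string;
  model: string;
  duration_s: number;
  estimated_cost_usd: number;
}

interface ForecastResult {
  total_usd: number;
  stages: StageForecast[];
  confidence: "low" | "medium" | "high";
  data_points: number;
  complexity_multiplier: number;
}

// Hypothetical template stage shape; the real template JSON may differ.
interface TemplateStage {
  id: string;
  model: string;
}

// Cold-start baseline from the plan: 120 s per stage when no history exists.
const DEFAULT_DURATION_S = 120;

// Hypothetical USD-per-second rates by model tier (placeholder values only).
const COST_PER_SECOND_USD = new Map<string, number>([
  ["small", 0.002],
  ["large", 0.01],
]);
const FALLBACK_RATE_USD = 0.01; // assumed rate for tiers missing from the map

function costForecast(
  stages: TemplateStage[],
  history: Map<string, number[]>, // stage id -> observed durations in seconds
  complexityMultiplier: number = 1,
): ForecastResult {
  let dataPoints = 0;
  const rows: StageForecast[] = stages.map((s) => {
    const samples = history.get(s.id) ?? [];
    dataPoints += samples.length;
    // Use the historical average when available, else the cold-start default.
    const avg =
      samples.length > 0
        ? samples.reduce((sum, d) => sum + d, 0) / samples.length
        : DEFAULT_DURATION_S;
    const duration = avg * complexityMultiplier;
    const rate = COST_PER_SECOND_USD.get(s.model) ?? FALLBACK_RATE_USD;
    return {
      stage: s.id,
      model: s.model,
      duration_s: duration,
      estimated_cost_usd: duration * rate,
    };
  });
  // Thresholds from the plan: high (>20), medium (4-20), low (<4).
  const confidence = dataPoints > 20 ? "high" : dataPoints >= 4 ? "medium" : "low";
  return {
    total_usd: rows.reduce((sum, r) => sum + r.estimated_cost_usd, 0),
    stages: rows,
    confidence,
    data_points: dataPoints,
    complexity_multiplier: complexityMultiplier,
  };
}
```

With two stages, one of which has history, the stage without history falls back to the 120 s default and the result is reported with low confidence.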

Data Flow

  1. Pipeline start → load template JSON → call cost_forecast()
  2. cost_forecast() → query events.jsonl for historical stage.completed durations → compute per-stage cost → return JSON
  3. Display forecast via cost_forecast_display()
  4. Check cost_check_budget(total_usd) → block or proceed
  5. Emit cost.forecast event
  6. Pipeline runs...
  7. Pipeline completes → cost_record_variance(forecast, actual, template, issue) → emit cost.forecast_variance event
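
The gate decision in steps 3 to 5 might be sketched as below. The sketch follows the cost_check_budget contract from Interface Contracts (0=ok, 1=warning, 2=blocked) and uses exit code 1 for a blocked start. Passing the remaining budget as an explicit argument, measuring the 80% warning ratio against the remaining budget, and the GateDecision shape are assumptions made for illustration.

```typescript
type BudgetStatus = 0 | 1 | 2; // 0=ok, 1=warning, 2=blocked

// Assumption: the warning ratio is estimate / remaining budget.
function costCheckBudget(
  estimatedUsd: number,
  remainingUsd: number,
  warnRatio: number = 0.8,
): BudgetStatus {
  if (estimatedUsd > remainingUsd) return 2;
  if (remainingUsd > 0 && estimatedUsd / remainingUsd >= warnRatio) return 1;
  return 0;
}

// Hypothetical result shape for the pipeline_start() integration.
interface GateDecision {
  proceed: boolean;
  exitCode: 0 | 1;
  message: string;
}

function budgetGate(
  estimatedUsd: number,
  remainingUsd: number,
  forceStart: boolean,
): GateDecision {
  const status = costCheckBudget(estimatedUsd, remainingUsd);
  const est = estimatedUsd.toFixed(2);
  const rem = remainingUsd.toFixed(2);
  if (status === 2) {
    // Over budget: block unless --force-start was given.
    return forceStart
      ? {
          proceed: true,
          exitCode: 0,
          message: `warning: forecast USD ${est} exceeds remaining USD ${rem}; continuing due to --force-start`,
        }
      : {
          proceed: false,
          exitCode: 1,
          message: `blocked: forecast USD ${est} exceeds remaining USD ${rem}`,
        };
  }
  if (status === 1) {
    return {
      proceed: true,
      exitCode: 0,
      message: `warning: forecast USD ${est} uses most of remaining USD ${rem}`,
    };
  }
  return { proceed: true, exitCode: 0, message: "ok: forecast within budget" };
}
```

Keeping the status check separate from the override logic means the --force-start behavior can be tested without touching the budget arithmetic.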

Error Boundaries

  • Forecast engine: returns defaults on missing data (no hard failure)
  • Budget gate: exits with code 1 if over budget (overridable)
  • Variance tracker: best-effort, failure doesn't block pipeline completion
  • Dashboard: returns empty arrays if no forecast data exists

Files to Modify

| File | Action | Purpose |
| --- | --- | --- |
| scripts/sw-cost.sh | Modify (+271 lines) | Add cost_forecast(), cost_forecast_display(), cost_record_variance(), forecast CLI subcommand |
| scripts/sw-pipeline.sh | Modify (+42 lines) | Add --force-start flag, forecast display, budget gate in pipeline_start(), variance recording at completion |
| scripts/sw-cost-test.sh | Modify (+165 lines) | Tests for forecast, display, variance, and budget gate functions |
| dashboard/server.ts | Modify (+50 lines) | /api/costs/forecast endpoint serving recent forecasts and variance history |
| dashboard/src/core/api.ts | Modify (+19 lines) | fetchCostForecast() client function |
| dashboard/src/views/metrics.ts | Modify (+66 lines) | renderCostForecast() component showing forecast table and variance chart |

Implementation Steps

  1. Add default stage duration constants to sw-cost.sh — JSON map of stage → default seconds (120s baseline)
  2. Add token rate heuristics by stage category — intake/review are read-heavy (high input), build is write-heavy (high output)
  3. Implement cost_forecast() — reads template stages, queries historical durations from events.jsonl, applies complexity multiplier, computes per-stage cost using model pricing
  4. Implement confidence calculation — high (>20 data points), medium (4-20), low (<4)
  5. Implement cost_forecast_display() — formatted table with stage, model, duration, estimated cost, total, confidence, budget status
  6. Implement cost_record_variance() — computes variance USD and percentage, emits cost.forecast_variance event
  7. Add forecast CLI subcommand: shipwright cost forecast [--pipeline <template>] [--complexity <N>] [--json]
  8. Parse --force-start flag in sw-pipeline.sh argument handling
  9. Integrate forecast gate into pipeline_start() — forecast, display, check budget, emit event or exit
  10. Record variance at pipeline end — capture actual cost, call cost_record_variance()
  11. Add dashboard API endpoint: /api/costs/forecast returns recent forecasts and variance history from events.jsonl
  12. Add dashboard forecast view — table of recent forecasts, variance trend visualization
  13. Write tests — unit tests for forecast calculation, display output, variance computation, budget gate interaction
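
Steps 4 and 6 are small enough to sketch directly. The thresholds come from step 4. The variance formula, (actual - forecast) / forecast, is inferred from the sample values in the endpoint specification (5.42 forecast, 6.12 actual, 12.9%). The function names, the rounding, and the null result for a zero forecast are assumptions, and the sketch is in TypeScript rather than the bash planned for sw-cost.sh.

```typescript
type Confidence = "low" | "medium" | "high";

// Step 4 thresholds: high (>20 data points), medium (4-20), low (<4).
function confidenceFor(dataPoints: number): Confidence {
  if (dataPoints > 20) return "high";
  if (dataPoints >= 4) return "medium";
  return "low";
}

interface Variance {
  variance_usd: number;
  variance_pct: number | null; // null when the forecast was zero (assumption)
}

// Step 6: variance in USD (rounded to cents) and in percent (one decimal).
function computeVariance(forecastUsd: number, actualUsd: number): Variance {
  const varianceUsd = actualUsd - forecastUsd;
  const pct = forecastUsd === 0 ? null : (varianceUsd / forecastUsd) * 100;
  return {
    variance_usd: Math.round(varianceUsd * 100) / 100,
    variance_pct: pct === null ? null : Math.round(pct * 10) / 10,
  };
}
```

A positive variance means the pipeline cost more than forecast; a negative one means it came in under.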

Task Checklist

  • Task 1: Add default stage duration constants and token-rate heuristics to sw-cost.sh
  • Task 2: Implement cost_forecast() function with historical lookup and cold-start defaults
  • Task 3: Implement cost_forecast_display() formatted CLI output
  • Task 4: Implement cost_record_variance() with event emission
  • Task 5: Add forecast CLI subcommand to sw-cost.sh case statement
  • Task 6: Add --force-start flag parsing to sw-pipeline.sh
  • Task 7: Integrate forecast display and budget gate into pipeline_start()
  • Task 8: Record forecast variance at pipeline completion in sw-pipeline.sh
  • Task 9: Add dashboard API endpoint for forecast data
  • Task 10: Add dashboard forecast display component
  • Task 11: Write tests for all forecast functions in sw-cost-test.sh
  • Task 12: Run full test suite and fix any failures

Testing Approach

  • Unit tests (sw-cost-test.sh): Test forecast calculation with mock events.jsonl data, display output format, variance computation accuracy, budget gate return codes
  • Integration: Verify shipwright cost forecast --pipeline standard produces valid JSON output
  • Regression: Run the full npm test suite to ensure no existing functionality is broken
  • Manual validation: shipwright cost forecast --json with and without historical data

Performance

Baseline Metrics

  • Pipeline start time: <2s (should not add noticeable latency)
  • Historical query: reads last 1000 lines of events.jsonl (typically <100KB)

Optimization Targets

  • Forecast calculation: <500ms added to pipeline start
  • No blocking I/O on the critical path beyond the single events.jsonl read

Profiling Strategy

  • Not applicable — the forecast is a one-time calculation at pipeline start, not a hot path

Benchmark Plan

  • Not applicable for this scope — the operation is bounded by a single tail -1000 | jq call
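
For illustration, the bounded historical read could be modeled as follows. This is a TypeScript stand-in for the tail and jq pipeline, not the planned shell code. The event field names (type, stage, duration_s) are assumptions about the events.jsonl schema, and malformed lines are skipped so that a bad record does not fail the forecast.

```typescript
interface DurationStats {
  avg_s: number;
  count: number;
}

// Average stage.completed durations per stage over the last maxLines lines.
function averageStageDurations(
  jsonl: string,
  maxLines: number = 1000,
): Map<string, DurationStats> {
  const lines = jsonl
    .split("\n")
    .filter((l) => l.trim() !== "")
    .slice(-maxLines); // keep only the most recent lines, like tail
  const sums = new Map<string, { total: number; count: number }>();
  for (const line of lines) {
    let parsed: unknown;
    try {
      parsed = JSON.parse(line);
    } catch {
      continue; // skip malformed lines instead of failing
    }
    if (typeof parsed !== "object" || parsed === null) continue;
    const ev = parsed as Record<string, unknown>;
    const stage = ev["stage"];
    const duration = ev["duration_s"];
    if (
      ev["type"] !== "stage.completed" ||
      typeof stage !== "string" ||
      typeof duration !== "number"
    ) {
      continue;
    }
    const acc = sums.get(stage) ?? { total: 0, count: 0 };
    acc.total += duration;
    acc.count += 1;
    sums.set(stage, acc);
  }
  const result = new Map<string, DurationStats>();
  sums.forEach((acc, stage) => {
    result.set(stage, { avg_s: acc.total / acc.count, count: acc.count });
  });
  return result;
}
```

Because the input is capped at a fixed number of lines, the work done stays constant no matter how large events.jsonl grows.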

Definition of Done

  • cost_forecast() returns JSON with total_usd, stages[], confidence, data_points for any template
  • Forecast displayed before pipeline start with formatted table
  • Pipeline blocked when forecast exceeds remaining budget (exit 1)
  • --force-start overrides budget gate with warning
  • cost.forecast event emitted to events.jsonl
  • cost.forecast_variance event emitted after pipeline completes
  • Dashboard shows cost forecast data
  • Confidence level shown as low/medium/high
  • All tests pass (npm test)

Endpoint Specification

GET /api/costs/forecast?period=30

  • Response 200:
    {
      "recent_forecasts": [{"issue": "178", "template": "standard", "forecast_usd": 5.42, "confidence": "medium", "ts": "..."}],
      "variance_history": [{"forecast_usd": 5.42, "actual_usd": 6.12, "variance_pct": 12.9, "template": "standard", "ts": "..."}]
    }
  • Error 500: {"error": "Failed to read forecast data"}
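
One way the response body might be assembled from already-parsed events is sketched below. The response field names come from the example above. Treating period as a number of days, assuming the event payloads carry the same fields as the response entries, and the function name buildForecastResponse are all assumptions.

```typescript
interface ForecastEntry {
  issue: string;
  template: string;
  forecast_usd: number;
  confidence: "low" | "medium" | "high";
  ts: string;
}

interface VarianceEntry {
  forecast_usd: number;
  actual_usd: number;
  variance_pct: number;
  template: string;
  ts: string;
}

interface ForecastResponse {
  recent_forecasts: ForecastEntry[];
  variance_history: VarianceEntry[];
}

// Assumed event shapes: the event type plus the fields returned by the API.
type CostEvent =
  | ({ type: "cost.forecast" } & ForecastEntry)
  | ({ type: "cost.forecast_variance" } & VarianceEntry);

function buildForecastResponse(
  events: CostEvent[],
  periodDays: number,
  now: Date,
): ForecastResponse {
  const cutoffMs = now.getTime() - periodDays * 24 * 60 * 60 * 1000;
  // Empty arrays are the result when no forecast data exists.
  const response: ForecastResponse = { recent_forecasts: [], variance_history: [] };
  for (const ev of events) {
    const tsMs = Date.parse(ev.ts);
    if (isNaN(tsMs) || tsMs < cutoffMs) continue; // outside the period or bad timestamp
    if (ev.type === "cost.forecast") {
      response.recent_forecasts.push({
        issue: ev.issue,
        template: ev.template,
        forecast_usd: ev.forecast_usd,
        confidence: ev.confidence,
        ts: ev.ts,
      });
    } else {
      response.variance_history.push({
        forecast_usd: ev.forecast_usd,
        actual_usd: ev.actual_usd,
        variance_pct: ev.variance_pct,
        template: ev.template,
        ts: ev.ts,
      });
    }
  }
  return response;
}
```

Passing now as a parameter keeps the period filter deterministic in tests.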

Rate Limiting

Not applicable — internal dashboard API, single-user access.

Versioning

No API versioning needed — internal tooling, not public API.

Error Codes

  • Exit 0: Forecast within budget, pipeline proceeds
  • Exit 1: Forecast exceeds budget, pipeline blocked (override with --force-start)
  • Exit 2: Budget check warning (>=80% utilization), pipeline proceeds with warning

Responsive Breakpoints

Not applicable — dashboard uses terminal-width-aware rendering, not responsive CSS breakpoints.

Accessibility Checklist

Not applicable — CLI output and terminal-based dashboard, no web accessibility requirements.

Component Hierarchy

Dashboard cost forecast is a leaf component within the metrics view — no complex state management needed.

State Management Approach

  • Forecast data flows: events.jsonl → server API → client fetch → render
  • No client-side state persistence needed; data is read-only from server
