Pipeline Design 178

ezigus edited this page Mar 16, 2026 · 3 revisions

Design: Pipeline Cost Forecast and Budget Gate with Early Warning

Context

Shipwright pipelines can cost $5-$50+ per run depending on template, model routing, and complexity. Today, cost visibility is retrospective only — operators see spend after the pipeline completes via cost_dashboard() and cost.record events. The existing budget gate in sw-pipeline.sh (lines ~1692-1702) checks remaining budget per-stage during execution, but by the time it triggers, significant cost has already been incurred.

Problem: There is no pre-start cost estimate. Operators cannot make informed go/no-go decisions before committing resources. The daemon can silently exhaust budgets by queuing expensive pipelines.

Constraints:

  • Bash 3.2 compatible (macOS default) — no associative arrays, no readarray, no ${var,,}
  • All cost logic lives in scripts/sw-cost.sh; pipeline orchestration in scripts/sw-pipeline.sh
  • Events flow through ~/.shipwright/events.jsonl (newline-delimited JSON)
  • Dashboard reads from API endpoints backed by events.jsonl queries
  • Existing cost_calculate(), cost_remaining_budget(), cost_check_budget(), model pricing, and emit_event() infrastructure are reusable
  • Templates define per-stage model assignments in templates/pipelines/*.json

Decision

Chosen approach (option 3 under Alternatives Considered): new functions in sw-cost.sh plus an integration hook in sw-pipeline.sh.

Add three functions to sw-cost.sh:

  1. cost_forecast() — estimates total pipeline cost from template stages x model pricing x historical/default durations
  2. cost_forecast_display() — renders a formatted CLI table
  3. cost_record_variance() — emits forecast-vs-actual event at completion

Integrate into sw-pipeline.sh at two points:

  • Pre-start (after template load, before run_pipeline): display forecast, enforce budget gate
  • Post-completion (after pipeline.completed event): record variance

Component Diagram

┌──────────────────────────┐        ┌─────────────────────────────┐
│    sw-pipeline.sh        │        │       sw-cost.sh            │
│                          │        │                             │
│  pipeline_start()        │        │  cost_forecast()            │
│    ├─ load_pipeline_cfg  │        │    ├─ read template stages  │
│    ├─ ─── NEW ──────────▶│───────▶│    ├─ query history         │
│    │  forecast + gate    │◀───────│    ├─ fallback defaults     │
│    ├─ run_pipeline       │        │    ├─ cost_calculate()      │
│    │   ├─ stage loop     │        │    └─ return JSON           │
│    │   └─ budget checks  │        │                             │
│    └─ ─── NEW ──────────▶│───────▶│  cost_forecast_display()    │
│       record variance    │        │    └─ formatted table       │
│                          │        │                             │
│  --force-start flag      │        │  cost_record_variance()     │
│    └─ bypass gate        │        │    └─ emit_event            │
└──────────┬───────────────┘        └──────────┬──────────────────┘
           │                                   │
           ▼                                   ▼
┌──────────────────────────┐        ┌─────────────────────────────┐
│   events.jsonl           │        │  dashboard/                 │
│   cost.forecast          │        │   api.ts  → /api/costs/     │
│   cost.forecast_variance │        │              forecast       │
│   pipeline.completed     │◀───────│   metrics.ts → forecast     │
│   (existing events)      │        │              card + chart   │
└──────────────────────────┘        └─────────────────────────────┘

Interface Contracts

# cost_forecast <template_config_path> [complexity]
# Input:  Path to template JSON (e.g., templates/pipelines/standard.json)
#         Optional complexity score 1-10 (default: 5)
# Output: JSON to stdout
# {
#   "total_usd": "6.36",
#   "stages": [
#     {"id": "intake", "model": "haiku", "est_duration_s": 60, "est_tokens_in": 1200, "est_tokens_out": 600, "est_cost": "0.02"},
#     ...
#   ],
#   "confidence": "low|medium|high",
#   "data_points": 12,
#   "complexity_multiplier": "1.0"
# }
# Exit: 0 on success, 1 on error (caller wraps in || true)
# Errors: Writes warnings to stderr; never blocks pipeline on failure

# cost_forecast_display <forecast_json>
# Input:  JSON string (output of cost_forecast)
# Output: Formatted Unicode table to stdout
# Exit: 0 always (display-only, non-fatal)

# cost_record_variance <forecast_usd> <actual_usd> <template> [issue]
# Input:  Forecast total, actual total, template name, optional issue number
# Output: None (emits event to events.jsonl)
# Event:  cost.forecast_variance with forecast_usd, actual_usd, variance_usd, variance_pct, template, issue
# Exit: 0 always (non-fatal)

# CLI: shipwright cost forecast --pipeline <name> [--complexity <1-10>] [--json]
# Input:  Template name, optional complexity, optional raw JSON flag
# Output: Formatted table (default) or raw JSON (--json)
# Exit: 0 success, 1 template not found

# Pipeline flag: --force-start
# Bypasses forecast budget gate when forecast > remaining budget
# Emits warning via warn() but does not block

// Dashboard API: GET /api/costs/forecast
// Response (200):
interface ForecastResponse {
  recent_forecasts: Array<{
    issue: string;
    template: string;
    forecast_usd: number;
    confidence: "low" | "medium" | "high";
    ts: string;
  }>;
  variance_history: Array<{
    forecast_usd: number;
    actual_usd: number;
    variance_pct: number;
    template: string;
    ts: string;
  }>;
}
// Response (200, no data): { recent_forecasts: [], variance_history: [] }

Data Flow

  1. User runs shipwright pipeline start --issue 178
  2. pipeline_start() loads template config (load_pipeline_config -> templates/pipelines/standard.json)
  3. NEW: cost_forecast "$PIPELINE_CONFIG" "${INTELLIGENCE_COMPLEXITY:-5}" called
    • Reads template JSON -> extracts enabled stages + per-stage model assignments
    • Queries events.jsonl for stage.completed events matching template -> computes average duration_s per stage (limited to last 1000 lines via tail)
    • Falls back to _DEFAULT_STAGE_DURATIONS for stages with no history
    • Applies token-rate heuristics per stage type (build: 50in+20out/sec, review: 40in+30out/sec, light: 20in+10out/sec, test: 10in+5out/sec)
    • Calls existing cost_calculate() per stage with estimated tokens + model
    • Applies complexity multiplier (complexity / 5, normalized around 1.0)
    • Counts data points -> confidence: low (<5), medium (5-20), high (>20)
    • Returns JSON to stdout
  4. NEW: cost_forecast_display "$FORECAST_JSON" renders table
  5. NEW: Budget gate -- compares forecast.total_usd vs cost_remaining_budget()
    • If forecast > remaining and no --force-start: error() + exit 1
    • If forecast > remaining and --force-start: warn() + continue
    • If unlimited budget or forecast <= remaining: continue
  6. NEW: emit_event "cost.forecast" with forecast_usd, template, confidence, issue
  7. Pipeline executes normally (existing flow unchanged)
  8. NEW: At completion (after pipeline.completed event), cost_record_variance "$FORECAST_USD" "$total_cost" "$PIPELINE_NAME" "$ISSUE_NUMBER"
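
The complexity multiplier and confidence rules from step 3 can be sketched in Bash 3.2-compatible shell. This is a minimal illustration, not the actual sw-cost.sh code: the helper names (_forecast_complexity_multiplier, _forecast_confidence, _forecast_apply_multiplier) are hypothetical, and awk handles the floating-point math because Bash arithmetic is integer-only.

```shell
# Hypothetical helpers; names are illustrative and not part of sw-cost.sh.

# complexity / 5, so the default complexity of 5 yields 1.0
_forecast_complexity_multiplier() {
  awk -v c="${1:-5}" 'BEGIN { printf "%.1f", c / 5 }'
}

# low (<5 data points), medium (5-20), high (>20)
_forecast_confidence() {
  local n="${1:-0}"
  if [ "$n" -lt 5 ]; then
    echo low
  elif [ "$n" -le 20 ]; then
    echo medium
  else
    echo high
  fi
}

# Scale a base estimate in USD by the multiplier, rounded to cents
_forecast_apply_multiplier() {
  awk -v b="$1" -v m="$2" 'BEGIN { printf "%.2f", b * m }'
}

mult=$(_forecast_complexity_multiplier 8)   # 1.6
_forecast_apply_multiplier 6.36 "$mult"     # scaled estimate
echo
_forecast_confidence 12                     # medium
```

With the default complexity of 5 the multiplier is 1.0 and the base estimate passes through unchanged, which matches the "complexity_multiplier": "1.0" field in the interface contract.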

Error Boundaries

| Component | Failure Mode | Behavior |
| cost_forecast() | jq parse error, missing template | Returns empty string; caller skips forecast display + gate |
| cost_forecast() | events.jsonl missing | Falls back to default stage durations; confidence reported as "low" |
| cost_forecast_display() | Malformed JSON input | Prints warning, continues |
| Budget gate | cost_remaining_budget() fails | Returns "unlimited", gate skipped |
| cost_record_variance() | emit_event fails | Non-fatal, pipeline result unaffected |
| Historical query | Large events.jsonl (>100MB) | tail -1000 limits scan; stale data acceptable |
| Dashboard endpoint | No forecast events exist | Returns empty arrays, UI shows "No forecast data" |

Design principle: Fail-open. The forecast system is advisory; it MUST never prevent a pipeline from running due to its own bugs. All new code paths are wrapped in || true or guarded by -n checks.
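
The gate decision combined with the fail-open rule might look like the sketch below. It is an illustration under assumptions: _forecast_gate is a hypothetical name, the echo calls to stderr stand in for the existing warn() and error() helpers, and the caller is assumed to pass the output of cost_remaining_budget() as the second argument.

```shell
# Hypothetical gate helper. Returns 0 to proceed, 1 to block.
# Args: <forecast_usd> <remaining_budget> <force_start: 0|1>
_forecast_gate() {
  local forecast="$1" remaining="$2" force="${3:-0}"

  # Fail-open: a missing forecast or missing budget value never blocks.
  [ -n "$forecast" ] || return 0
  [ -n "$remaining" ] || return 0
  if [ "$remaining" = "unlimited" ]; then
    return 0
  fi

  # awk performs the floating-point comparison; it exits 0 when forecast > remaining.
  if awk -v f="$forecast" -v r="$remaining" 'BEGIN { exit !(f + 0 > r + 0) }'; then
    if [ "$force" = "1" ]; then
      echo "WARN: forecast \$$forecast exceeds remaining budget \$$remaining (--force-start set)" >&2
      return 0
    fi
    echo "ERROR: forecast \$$forecast exceeds remaining budget \$$remaining; pass --force-start to override" >&2
    return 1
  fi
  return 0
}

# Usage: only an over-budget forecast without the override blocks the start.
if _forecast_gate "6.36" "5.00" 0; then
  echo "proceed"
else
  echo "blocked"
fi
```

Only the explicit over-budget case returns non-zero; every error or missing-data path falls through to "proceed", which keeps the gate advisory.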

Token Estimation Heuristics

Stage categories and estimated tokens per second of active execution:

| Category | Stages | Input tok/s | Output tok/s | Rationale |
| Heavy | build | 50 | 20 | Extended code generation, iteration |
| Review | review, compound_quality, audit | 40 | 30 | Code reading + detailed feedback |
| Light | intake, plan, design, pr, merge, deploy, validate, monitor | 20 | 10 | Short prompts, structured output |
| Minimal | test | 10 | 5 | Test execution is mostly compute, not tokens |

These are calibration starting points. As cost.forecast_variance events accumulate, heuristics can be tuned (future work, not in scope).
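
A case-statement lookup keeps the rate table Bash 3.2-compatible without associative arrays. The sketch below is illustrative: the function names are hypothetical, and treating unknown stage ids as "light" is an assumption rather than documented behavior.

```shell
# Hypothetical helper: map a stage id to "<input_tok_per_s> <output_tok_per_s>".
_forecast_token_rates() {
  case "$1" in
    build)                          echo "50 20" ;;
    review|compound_quality|audit)  echo "40 30" ;;
    test)                           echo "10 5"  ;;
    *)                              echo "20 10" ;;  # light stages; unknown ids also land here (assumption)
  esac
}

# Estimated tokens for a stage = rate x duration in seconds.
# Prints "<est_tokens_in> <est_tokens_out>".
_forecast_stage_tokens() {
  local stage="$1" duration_s="$2" rates in_rate out_rate
  rates=$(_forecast_token_rates "$stage")
  in_rate=${rates% *}    # text before the space
  out_rate=${rates#* }   # text after the space
  echo "$((in_rate * duration_s)) $((out_rate * duration_s))"
}

_forecast_stage_tokens intake 60   # prints "1200 600"
```

The intake result agrees with the example stage entry in the cost_forecast contract (est_tokens_in 1200, est_tokens_out 600 for a 60-second stage).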

Cold-Start Strategy

When no historical stage.completed events exist for a template:

{
  "intake": 60, "plan": 300, "design": 300, "build": 1200,
  "test": 180, "review": 300, "compound_quality": 600, "audit": 120,
  "pr": 60, "merge": 60, "deploy": 120, "validate": 60, "monitor": 300
}

These defaults are conservative (sum: 3660 s, about 61 minutes). Confidence is reported as "low" with data_points: 0, clearly signaling to the operator that the estimate is rough.
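
The fallback from historical averages to these defaults could be sketched as follows. The design keeps the defaults as JSON read with jq; this illustration uses a case statement instead so it runs without jq. Both function names are hypothetical, and the 300-second value for unknown stage ids is an assumption.

```shell
# Hypothetical helper: default duration in seconds per stage,
# using the cold-start values listed above.
_forecast_default_duration() {
  case "$1" in
    intake|pr|merge|validate)    echo 60 ;;
    audit|deploy)                echo 120 ;;
    test)                        echo 180 ;;
    plan|design|review|monitor)  echo 300 ;;
    compound_quality)            echo 600 ;;
    build)                       echo 1200 ;;
    *)                           echo 300 ;;  # unknown stage id (assumption)
  esac
}

# Prefer the historical average when one exists; otherwise use the default.
# Args: <stage> [historical_avg_s]
_forecast_duration() {
  local stage="$1" hist_avg="${2:-}"
  if [ -n "$hist_avg" ]; then
    echo "$hist_avg"
  else
    _forecast_default_duration "$stage"
  fi
}

_forecast_duration build        # prints 1200 (no history)
_forecast_duration build 900    # prints 900 (history wins)
```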

Alternatives Considered

  1. Inline in sw-pipeline.sh -- Pros: single file change, no cross-file sourcing / Cons: sw-pipeline.sh is already 3000+ lines; mixes cost logic with orchestration; harder to unit test forecast math independently; violates existing separation (cost logic in sw-cost.sh)

  2. Separate sw-forecast.sh script -- Pros: maximum isolation, independent versioning / Cons: creates a new file unnecessarily; would need to duplicate or re-source pricing and cost_calculate(); more complex wiring for a feature that is fundamentally a cost concern

  3. Functions in sw-cost.sh (chosen) -- Pros: follows existing pattern (all cost logic in one file); reuses cost_calculate(), pricing data, cost_remaining_budget() directly; testable via existing sw-cost-test.sh harness; minimal new wiring (sw-pipeline.sh already sources sw-cost.sh for budget checks) / Cons: sw-cost.sh grows, but the functions are cohesive with existing cost concerns

Implementation Plan

Files to modify (with full paths)

  1. scripts/sw-cost.sh -- Add cost_forecast(), cost_forecast_display(), cost_record_variance(), forecast CLI subcommand, default stage duration constants, token-rate heuristics
  2. scripts/sw-pipeline.sh -- Add --force-start flag parsing, pre-start forecast call + budget gate, post-completion variance recording
  3. scripts/sw-cost-test.sh -- Add test cases for forecast functions, variance math, budget gate behavior
  4. dashboard/src/views/metrics.ts -- Add renderCostForecast() section (forecast card + variance chart)
  5. dashboard/src/core/api.ts -- Add fetchForecastData() and /api/costs/forecast endpoint handler

Files to create

None. All logic fits into existing files following established patterns.

Dependencies

None new. Uses existing jq, awk, tail, and Shipwright infrastructure.

Risk areas

  • Large events.jsonl performance: Mitigated by tail -1000 limit on historical queries
  • Bash 3.2 compatibility: No associative arrays; use JSON + jq for structured data. Token-rate lookup uses a case statement, not arrays.
  • Forecast accuracy at low data points: Clearly communicated via the confidence level; if a rough forecast trips the budget gate, the operator can override it with --force-start
  • Variable scope in sw-pipeline.sh: FORECAST_USD must be visible at completion time. Declare at function scope alongside existing TOTAL_INPUT_TOKENS etc.
  • Template model field availability: Some templates may not specify per-stage models. Fall back to template defaults.model (always present).
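
The bounded historical query could be sketched as below. The field names on stage.completed events (template, stage, duration_s) are assumptions inferred from the data-flow description, and _forecast_stage_history is a hypothetical name. The function prints "<avg_duration_s> <data_points>" and falls back to "0 0" on any failure, so the caller can switch to default durations when the count is 0.

```shell
# Hypothetical helper: average duration for one stage over the last 1000 events.
# Args: <events_file> <template> <stage>
# Assumes stage.completed events carry .template, .stage and .duration_s.
_forecast_stage_history() {
  local events_file="$1" template="$2" stage="$3"

  # Missing file: no history, caller uses defaults.
  [ -f "$events_file" ] || { echo "0 0"; return 0; }

  # tail bounds the scan; jq -s slurps those lines into one array.
  tail -n 1000 "$events_file" 2>/dev/null \
    | jq -r -s --arg t "$template" --arg s "$stage" '
        [ .[]
          | select(.type == "stage.completed" and .template == $t and .stage == $s)
          | .duration_s | tonumber? ] as $d
        | if ($d | length) == 0 then "0 0"
          else "\($d | add / length | floor) \($d | length)"
          end' 2>/dev/null \
    || echo "0 0"   # jq missing or malformed input: fail open to defaults
}

# Usage: split the two fields with parameter expansion (no arrays needed).
result=$(_forecast_stage_history "$HOME/.shipwright/events.jsonl" standard build)
avg_s=${result% *}
data_points=${result#* }
echo "avg=${avg_s}s data_points=${data_points}"
```

Because the scan is capped at the last 1000 lines, the cost of the query stays roughly constant as events.jsonl grows, at the price of ignoring older history.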

Dashboard Component Hierarchy

MetricsView
  renderMetrics() (existing -- unchanged)
    Success Rate Donut
    Duration / Throughput
    Cost Breakdown (by model, stage, issue, budget)
  renderCostForecast() (NEW)
    ForecastCard
      Total estimate ($X.XX)
      Confidence badge (low=gray, medium=yellow, high=green)
      Budget status bar (reuses existing budget bar pattern)
    VarianceChart
      Horizontal bars: forecast vs actual per pipeline
      Color: green (|variance| <= 20%), yellow (20-50%), red (>50%)

State management: api.fetchForecastData() -> stored in component-local state -> rendered. Same pattern as fetchCostBreakdown() in existing metrics view.

Accessibility: Semantic <table> for forecast data. Confidence uses color + text label. Variance bars include aria-label with numeric values. Keyboard-navigable chart tooltips.

Responsive breakpoints: Single-column stack at 320-768px. Two-column (forecast card | variance chart) at 1024px+. Follows existing metrics grid CSS.

Data Pipeline: Schema and Flow

Event Schemas (new events)

// cost.forecast -- emitted before pipeline starts
{
  "ts": "2026-03-16T08:00:00Z",
  "ts_epoch": 1773648000,
  "type": "cost.forecast",
  "forecast_usd": "6.36",
  "template": "standard",
  "confidence": "medium",
  "data_points": "12",
  "issue": "178"
}

// cost.forecast_variance -- emitted after pipeline completes
{
  "ts": "2026-03-16T08:45:00Z",
  "ts_epoch": 1773650700,
  "type": "cost.forecast_variance",
  "forecast_usd": "6.36",
  "actual_usd": "7.12",
  "variance_usd": "0.7600",
  "variance_pct": "11.9",
  "template": "standard",
  "issue": "178"
}
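
The variance fields in the second event can be derived from the two totals with awk. This is a sketch under assumptions: _forecast_variance is a hypothetical name, and computing variance_pct relative to the forecast (not the actual) is inferred from the sample values above, where 0.76 / 6.36 rounds to 11.9.

```shell
# Hypothetical helper: prints "<variance_usd> <variance_pct>".
# Args: <forecast_usd> <actual_usd>
# variance_usd = actual - forecast; variance_pct is relative to the forecast.
_forecast_variance() {
  awk -v f="$1" -v a="$2" 'BEGIN {
    v = a - f
    pct = (f + 0 > 0) ? (v / f) * 100 : 0   # guard against a zero forecast
    printf "%.4f %.1f\n", v, pct
  }'
}

_forecast_variance 6.36 7.12   # prints "0.7600 11.9"
```

A negative result means the pipeline came in under forecast; a zero forecast reports 0 percent rather than dividing by zero.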

Data Flow Diagram

Template JSON ──┐
                ├──> cost_forecast() ──> JSON stdout ──> cost_forecast_display()
events.jsonl ───┘         |                                      |
  (stage.completed         |                                     v
   history query)          v                              CLI table output
                    emit_event                                   |
                    "cost.forecast" ──> events.jsonl              |
                                          |                      v
                                          |              Budget gate check
                                          |              (cost_remaining_budget)
                                          |                      |
                                          |         ┌────────────┴────────────┐
                                          |         | forecast <= budget      | forecast > budget
                                          |         | -> proceed              | -> --force-start?
                                          |         |                         |   yes: warn + proceed
                                          |         |                         |   no: exit 1
                                          |         └────────────┬────────────┘
                                          |                      |
                                          |              run_pipeline()
                                          |                      |
                                          |              pipeline.completed
                                          |                      |
                                          |                      v
                                          |         cost_record_variance()
                                          |                      |
                                          └──────── emit_event <─┘
                                                   "cost.forecast_variance"
                                                         |
                                                         v
                                              Dashboard API query
                                              /api/costs/forecast

Failure points: Template not found (-> skip forecast), events.jsonl missing (-> use defaults), jq unavailable (-> skip forecast), budget.json missing (-> "unlimited").

Idempotency Strategy

  • cost_forecast() is a pure read -- no side effects, safe to call multiple times
  • emit_event appends to events.jsonl -- duplicate events are tolerable (timestamped, queryable)
  • cost_record_variance() is fire-and-forget -- duplicate variance events don't corrupt state
  • No schema migrations needed -- events.jsonl is append-only with no schema versioning

Rollback Plan

  1. Revert the commits touching sw-cost.sh, sw-pipeline.sh, sw-cost-test.sh, dashboard files
  2. No data migration needed -- cost.forecast and cost.forecast_variance events in events.jsonl are inert (nothing reads them after rollback)
  3. No config changes to undo
  4. --force-start flag becomes unrecognized (harmless -- falls through existing parser)

Validation Criteria

  • cost_forecast returns valid JSON with total_usd, stages[], confidence, data_points for every template in templates/pipelines/
  • Cold-start (no events.jsonl history) returns confidence "low" with default durations
  • With seeded stage.completed events, historical averages are used and confidence reflects data point count
  • cost_forecast_display renders readable table with stage breakdown, total, confidence, and budget status
  • shipwright cost forecast --pipeline standard works as CLI command
  • shipwright cost forecast --pipeline standard --json outputs raw JSON
  • Pipeline start shows forecast table before execution
  • When forecast > remaining budget: exit 1 with override hint
  • When forecast > remaining budget + --force-start: warn + proceed
  • When budget is unlimited: forecast displays, gate skipped
  • cost.forecast event emitted to events.jsonl with correct fields
  • cost.forecast_variance event emitted after completion with correct variance math
  • Dashboard /api/costs/forecast returns forecast and variance data
  • Dashboard renders forecast card and variance chart
  • All new functions have tests in sw-cost-test.sh
  • npm test passes with no regressions
  • No Bash 3.2 incompatibilities (no associative arrays, no readarray)
  • Forecast function failure does not block pipeline start (fail-open verified)
