Pipeline Design 178
Shipwright pipelines can run 12 stages, each consuming model tokens at different rates (Opus at $15/$75 per million input/output tokens, Sonnet at $3/$15, Haiku at $0.25/$1.25). Today, estimate_pipeline_cost() in sw-pipeline.sh provides a rough aggregate estimate (~8K input / ~4K output tokens per stage), and cost_check_budget() in sw-cost.sh:197 only checks whether the daily budget is already exceeded. It cannot predict whether a pipeline about to start will blow the budget, so operators discover cost overruns after the fact.
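The per-model arithmetic shows how wide the spread between tiers is. The sketch below is minimal TypeScript under two assumptions: rates are quoted per million input/output tokens, and each stage uses the ~8K input / ~4K output figure from estimate_pipeline_cost(). `PRICING` and `stageCostUsd` are hypothetical names. The real cost_calculate() is shell/awk and may be structured differently.

```typescript
// Hypothetical pricing table in USD per million tokens, copied from the rates quoted above.
type Model = "opus" | "sonnet" | "haiku";

const PRICING: Record<Model, { inputPerM: number; outputPerM: number }> = {
  opus: { inputPerM: 15, outputPerM: 75 },
  sonnet: { inputPerM: 3, outputPerM: 15 },
  haiku: { inputPerM: 0.25, outputPerM: 1.25 },
};

// Stage cost = input tokens × input rate + output tokens × output rate, scaled from per-million.
function stageCostUsd(model: Model, inputTokens: number, outputTokens: number): number {
  const rate = PRICING[model];
  return (inputTokens * rate.inputPerM + outputTokens * rate.outputPerM) / 1e6;
}

// Per-stage cost under the flat ~8K input / ~4K output assumption.
const models: Model[] = ["opus", "sonnet", "haiku"];
for (const model of models) {
  console.log(model, stageCostUsd(model, 8000, 4000).toFixed(3));
}
```

At these rates one stage costs about $0.42 on Opus, $0.084 on Sonnet and $0.007 on Haiku. A single flat per-stage rate therefore cannot represent a template that mixes tiers.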
Constraints from the codebase:
- Bash 3.2 compatibility: no associative arrays, no `readarray`, no `${var,,}`
- All JSON manipulation via `jq --arg` (no string interpolation)
- Events written to `~/.shipwright/events.jsonl` via `emit_event`
- `set -euo pipefail` in all scripts. On no match, `grep -c` prints `0` and exits 1, so an `|| echo 0` fallback produces double output (use `|| true` + `${var:-0}`)
- Dashboard is TypeScript/Bun with vitest; shell tests use `lib/test-helpers.sh` assertions
- Existing `--ignore-budget` flag on pipeline start (line ~460 of `sw-pipeline.sh`)
Approach B: Per-stage forecast using template model assignments + historical durations.
The forecast engine lives in sw-cost.sh as new functions, called by sw-pipeline.sh before stage execution begins. This keeps cost logic centralized (single responsibility) and reuses the existing cost_calculate() function for per-model pricing.
┌─────────────────────────────┐
│ Pipeline Template JSON │
│ (enabled stages + models) │
└──────────────┬──────────────┘
│
▼
┌──────────────┐ ┌─────────────────────────────┐ ┌──────────────┐
│ events.jsonl │────▶│ cost_forecast() engine │────▶│ forecast.json│
│ (history) │ │ in sw-cost.sh │ │ (artifact) │
└──────────────┘ └──────────────┬──────────────┘ └──────────────┘
│
┌─────────┴─────────┐
▼ ▼
┌──────────────┐ ┌────────────────┐
│ Budget Gate │ │ CLI / Dashboard │
│ (block/warn) │ │ (display) │
└──────────────┘ └────────────────┘
│
▼
┌────────────────────────┐
│ Pipeline runs stages │
│ ... │
│ On completion: │
│ cost_record_variance() │
└────────────────────────┘
┌─────────────────────────────────────────────────────────┐
│ sw-cost.sh │
│ │
│ ┌──────────────────┐ ┌──────────────────────────────┐ │
│ │ cost_calculate() │ │ cost_forecast() │ │
│ │ (existing) │◀─│ - reads template stages │ │
│ └──────────────────┘ │ - queries event history │ │
│ │ - applies complexity mult │ │
│ ┌──────────────────┐ │ - computes confidence │ │
│ │ cost_remaining_ │ └──────────────────────────────┘ │
│ │ budget() (exists) │ │
│ └──────────────────┘ ┌──────────────────────────────┐ │
│ │ cost_forecast_display() │ │
│ │ - renders table to stdout │ │
│ └──────────────────────────────┘ │
│ ┌──────────────────────────────┐ │
│ │ cost_record_variance() │ │
│ │ - emits forecast vs actual │ │
│ └──────────────────────────────┘ │
└─────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────┐
│ sw-pipeline.sh │
│ │
│ pipeline_start(): │
│ 1. load_pipeline_config │
│ 2. cost_forecast → save artifact │
│ 3. budget_gate (block | warn | pass) │
│ 4. run stages... │
│ 5. cost_record_variance on completion │
└─────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────┐
│ dashboard/server.ts │
│ GET /api/costs/forecast — shells to sw cost forecast │
│ GET /api/status — enriched with forecast from artifact │
└─────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────┐
│ dashboard/src/views/pipelines.ts │
│ Queue items display: "Est: $45–$60 (medium confidence)" │
└─────────────────────────────────────────────────────────┘
// -- sw-cost.sh outputs (JSON to stdout) --
// cost_forecast(pipeline_config_path, complexity) → stdout
interface CostForecast {
total_usd: number; // point estimate
low_usd: number; // lower bound (total × 0.8 high, × 0.7 medium, × 0.5 low)
high_usd: number; // upper bound (total × 1.2 high, × 1.5 medium, × 2.0 low)
confidence: "high" | "medium" | "low";
data_points: number; // historical runs used
complexity_multiplier: number;
stages: Array<{
id: string; // e.g. "build", "review"
model: string; // e.g. "sonnet", "opus"
est_duration_s: number;
est_cost_usd: number;
}>;
}
// cost_record_variance(forecast_usd, actual_usd, confidence, template, issue) → event emitted
// No return value; writes to events.jsonl
// -- Budget gate return codes (in pipeline_start context) --
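// 0 = proceed (under budget, budget unlimited, or --force-start / --ignore-budget given)
// 1 = warn (forecast.total_usd > 50% of remaining but forecast.high_usd <= remaining; pipeline proceeds with warning)
// 2 = block (forecast.high_usd > remaining and neither --force-start nor --ignore-budget given; pipeline exits)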
// -- Dashboard API --
// GET /api/costs/forecast?pipeline=standard&complexity=5
// Response 200: CostForecast
// Response 400: { error: { code: string, message: string } }
// -- Dashboard types (additions) --
interface QueueItem {
issue: number;
title: string;
score?: number;
estimated_cost?: number; // existing field
factors?: unknown; // existing field
forecast?: CostForecast; // NEW
}

| Component | Error | Handling |
|---|---|---|
| `cost_forecast()` | No `events.jsonl` or empty | Falls back to default durations; sets `confidence="low"` |
| `cost_forecast()` | Invalid template JSON | Returns error JSON `{"error": "..."}`; pipeline logs warning and skips gate |
| `cost_forecast()` | `jq` not available | Detected at script top; forecast skipped with warning |
| Budget gate | `cost_remaining_budget` returns `"unlimited"` | Gate skipped entirely |
| Budget gate | Forecast fails | Pipeline proceeds with warning (forecast is advisory, not blocking-critical) |
| `cost_record_variance()` | Missing forecast data at completion | Skipped silently (no-op if `PIPELINE_FORECAST_USD` unset) |
| Dashboard endpoint | `sw cost forecast` shell-out fails | Returns 500 with error message |
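On the dashboard side, the endpoint must distinguish three cases: a real forecast, the `{"error": "..."}` JSON that cost_forecast() emits for an invalid template, and output it cannot parse at all. The sketch below is minimal TypeScript and assumes the shell-out's stdout arrives as a string. `parseForecastOutput`, `isCostForecast` and the error codes are hypothetical. Every failure maps to the 500 response from the table; the 400 response for bad query parameters is not covered.

```typescript
interface CostForecast {
  total_usd: number;
  low_usd: number;
  high_usd: number;
  confidence: "high" | "medium" | "low";
  data_points: number;
  complexity_multiplier: number;
  stages: Array<{ id: string; model: string; est_duration_s: number; est_cost_usd: number }>;
}

type ApiError = { error: { code: string; message: string } };
type ParseResult = { ok: true; forecast: CostForecast } | { ok: false; status: number; body: ApiError };

// Hypothetical error codes; only the { error: { code, message } } shape comes from the API contract.
function fail(code: string, message: string): ParseResult {
  return { ok: false, status: 500, body: { error: { code, message } } };
}

// Structural check of the top-level fields; individual stage entries are not inspected.
function isCostForecast(value: unknown): value is CostForecast {
  if (typeof value !== "object" || value === null) return false;
  const f = value as Record<string, unknown>;
  return (
    typeof f.total_usd === "number" &&
    typeof f.low_usd === "number" &&
    typeof f.high_usd === "number" &&
    (f.confidence === "high" || f.confidence === "medium" || f.confidence === "low") &&
    typeof f.data_points === "number" &&
    Array.isArray(f.stages)
  );
}

function parseForecastOutput(stdout: string): ParseResult {
  let parsed: unknown;
  try {
    parsed = JSON.parse(stdout);
  } catch {
    return fail("FORECAST_UNPARSEABLE", "sw cost forecast returned non-JSON output");
  }
  // cost_forecast() reports an invalid template as {"error": "..."}.
  if (typeof parsed === "object" && parsed !== null && "error" in parsed) {
    return fail("FORECAST_FAILED", String((parsed as { error: unknown }).error));
  }
  if (!isCostForecast(parsed)) {
    return fail("FORECAST_INVALID", "forecast JSON is missing required fields");
  }
  return { ok: true, forecast: parsed };
}
```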
| Level | Data Points | Interval Width | Rationale |
|---|---|---|---|
| High | >= 20 runs | ±20% (×0.8 / ×1.2) | Enough data for stable averages |
| Medium | 5–19 runs | -30% / +50% (×0.7 / ×1.5) | Moderate uncertainty |
| Low | < 5 runs | -50% / +100% (×0.5 / ×2.0) | Cold start, conservative bounds |
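The thresholds and multipliers in this table translate directly into code. The sketch below is minimal TypeScript; `confidenceFor` and `forecastBounds` are hypothetical names. The shell implementation may count either runs or stage events as data points, so `dataPoints` is simply whatever count the engine supplies.

```typescript
type Confidence = "high" | "medium" | "low";

// Thresholds from the table: >= 20 runs is high, 5-19 medium, < 5 low.
function confidenceFor(dataPoints: number): Confidence {
  if (dataPoints >= 20) return "high";
  if (dataPoints >= 5) return "medium";
  return "low";
}

// [lower multiplier, upper multiplier] applied to the point estimate.
const INTERVAL: Record<Confidence, [number, number]> = {
  high: [0.8, 1.2],
  medium: [0.7, 1.5],
  low: [0.5, 2.0],
};

function forecastBounds(totalUsd: number, dataPoints: number) {
  const confidence = confidenceFor(dataPoints);
  const [lower, upper] = INTERVAL[confidence];
  return { confidence, low_usd: totalUsd * lower, high_usd: totalUsd * upper };
}

// The boundary values named in the acceptance criteria.
for (const n of [4, 5, 19, 20]) {
  console.log(n, confidenceFor(n));
}
```

For example, a $50 point estimate backed by 12 runs gets medium confidence and an interval of $35 to $75.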
remaining = cost_remaining_budget()
if remaining == "unlimited" → PASS
if FORCE_START || IGNORE_BUDGET → PASS (with audit event)
if forecast.high_usd > remaining → BLOCK (exit 2, suggest --force-start)
if forecast.total_usd > remaining × 0.5 → WARN (continue)
else → PASS
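The gate pseudocode maps onto the 0/1/2 return codes defined in the contract. The sketch below is minimal TypeScript and assumes cost_remaining_budget() yields either a dollar amount or the string `"unlimited"`. `budgetGate` and `GateInput` are hypothetical. Emitting the audit event on an override is left to the caller.

```typescript
// Return codes from the budget gate contract: 0 = pass, 1 = warn, 2 = block.
type GateResult = 0 | 1 | 2;

interface GateInput {
  remaining: number | "unlimited"; // as reported by cost_remaining_budget()
  totalUsd: number; // forecast.total_usd
  highUsd: number; // forecast.high_usd
  forceStart: boolean; // --force-start given
  ignoreBudget: boolean; // --ignore-budget given
}

function budgetGate(g: GateInput): GateResult {
  const remaining = g.remaining;
  if (remaining === "unlimited") return 0;
  // Overrides pass; the caller is expected to emit the audit event.
  if (g.forceStart || g.ignoreBudget) return 0;
  // Block on the upper bound so a plausible overrun is caught before it happens.
  if (g.highUsd > remaining) return 2;
  // Warn when the point estimate would consume more than half of what is left.
  if (g.totalUsd > remaining * 0.5) return 1;
  return 0;
}
```

The checks run in the pseudocode's order. An override short-circuits before any comparison, and the block test on `high_usd` takes precedence over the warn test on `total_usd`.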
Scan events.jsonl for stage.completed events, extract duration per stage name, compute running averages. Limit scan to tail -1000 lines for performance. Group by stage ID. This reuses the existing event format — no new data collection needed.
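The history scan is sketched below in minimal TypeScript; the shell version would do the equivalent with `tail`, `grep` and `jq`/`awk`. The event field names `type`, `stage` and `duration_s` are assumptions, not the confirmed events.jsonl schema. `averageStageDurations` is a hypothetical name.

```typescript
// Assumed shape of a stage.completed line; the real events.jsonl field names may differ.
interface StageCompletedEvent {
  type: string;
  stage: string;
  duration_s: number;
}

interface StageStats {
  avg_s: number;
  count: number;
}

// Average duration per stage ID over the last `maxLines` non-empty lines (mirrors `tail -1000`).
function averageStageDurations(jsonl: string, maxLines = 1000): Record<string, StageStats> {
  const lines = jsonl.split("\n").filter((line) => line.trim() !== "");
  const sums: Record<string, { total: number; count: number }> = {};
  for (const line of lines.slice(-maxLines)) {
    let parsed: unknown;
    try {
      parsed = JSON.parse(line);
    } catch {
      continue; // a malformed line should not break the forecast
    }
    if (typeof parsed !== "object" || parsed === null) continue;
    const ev = parsed as Partial<StageCompletedEvent>;
    if (ev.type !== "stage.completed" || typeof ev.stage !== "string" || typeof ev.duration_s !== "number") {
      continue;
    }
    const entry = sums[ev.stage] || { total: 0, count: 0 };
    entry.total += ev.duration_s;
    entry.count += 1;
    sums[ev.stage] = entry;
  }
  // Turn running sums into per-stage averages.
  const result: Record<string, StageStats> = {};
  for (const stage of Object.keys(sums)) {
    result[stage] = { avg_s: sums[stage].total / sums[stage].count, count: sums[stage].count };
  }
  return result;
}
```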
- Simple multiplier (stage_count × flat_rate): Pros: trivial, zero dependencies. Cons: ignores model tier differences (Opus is 60× more expensive than Haiku), ignores stage duration variance (build averages 20min vs. intake at 1min), and produces estimates so inaccurate they erode trust in the gate. Rejected: too coarse for meaningful go/no-go decisions.
- ML regression on historical runs: Pros: could capture non-linear relationships (e.g., complexity × model × time-of-day). Cons: requires training infrastructure, needs a minimum of ~100 runs for stable regression, adds a Python dependency to a shell-native project, and is massive over-engineering for the current data volume (~dozens of runs). Rejected: future enhancement when data justifies it.
| File | Lines Changed (est.) | Purpose |
|---|---|---|
| `scripts/sw-cost.sh` | +200 | `cost_forecast()`, `cost_forecast_display()`, `cost_record_variance()`, `forecast` CLI subcommand |
| `scripts/sw-pipeline.sh` | +50 | `--force-start` flag, forecast + budget gate in `pipeline_start()`, variance at completion |
| `config/event-schema.json` | +20 | `cost.forecast` and `cost.forecast_variance` event type definitions |
| `dashboard/server.ts` | +30 | `/api/costs/forecast` endpoint, forecast in queue enrichment |
| `dashboard/src/types/api.ts` | +15 | `CostForecast` interface, extend `QueueItem` |
| `dashboard/src/views/pipelines.ts` | +15 | Forecast display on queued items |
| `scripts/sw-pipeline-test.sh` | +60 | Integration tests for budget gate |
| File | Purpose |
|---|---|
| `src/cost-forecast.test.js` | Unit tests for forecast math and variance tracking |
- None new. Uses existing `jq`, `awk`, `bash`, and vitest.
- `events.jsonl` scan performance: mitigated by `tail -1000` + `grep` filter. If the file exceeds ~100K lines, consider an indexed lookup (future).
- `pipeline_start()` is already ~300 lines: forecast + gate adds ~50 lines of sequential logic. It is inserted as a discrete block after `load_pipeline_config` and before state file creation, for minimal entanglement with the existing flow.
- Bash 3.2 float arithmetic: all cost math uses `awk` (already the pattern in `cost_calculate()`). No `bc` dependency.
- Race condition on budget check: between the forecast check and actual spend, another pipeline could start. This is acceptable because the gate is advisory, not transactional, and `--force-start` exists as an escape valve.
- `shipwright cost forecast --pipeline standard --json` returns valid `CostForecast` JSON
- `shipwright cost forecast --pipeline standard` renders a human-readable table with per-stage breakdown
- Cold start (empty `events.jsonl`): forecast uses defaults, shows "low" confidence
- With 25+ historical `stage.completed` events: shows "high" confidence with narrow interval
- Pipeline start displays forecast before executing stages
- Pipeline blocked when `forecast.high_usd > remaining_budget` (exit code 2, message includes `--force-start` hint)
- `--force-start` bypasses gate with audit event emitted
- `--ignore-budget` also bypasses forecast gate (backward compatible)
- `cost.forecast` event emitted at pipeline start
- `cost.forecast_variance` event emitted at pipeline completion with forecast/actual/variance fields
- All existing tests pass (`npm test`)
- New unit tests cover: forecast calculation, confidence thresholds at boundaries (4/5/19/20 data points), complexity multiplier scaling, variance recording
- New integration tests cover: gate blocks over budget, gate warns at 50-100%, `--force-start` override, variance event in `events.jsonl`
- No Bash 4+ features used (verified by `shellcheck` or manual review)
- Dashboard `/api/costs/forecast` returns valid JSON for all template types
- Dashboard queue view shows forecast inline for queued items
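The variance event in these criteria could be assembled as shown below; the sketch is minimal TypeScript. The field names `variance_usd` and `variance_pct` are assumptions, not the schema in config/event-schema.json, and so is defining variance as actual minus forecast. `buildVarianceEvent` is a hypothetical name. Returning `null` mirrors the no-op when `PIPELINE_FORECAST_USD` is unset.

```typescript
type Confidence = "high" | "medium" | "low";

interface VarianceEvent {
  type: "cost.forecast_variance";
  forecast_usd: number;
  actual_usd: number;
  variance_usd: number; // actual - forecast; positive means the pipeline overspent its forecast
  variance_pct: number; // variance relative to the forecast, in percent
  confidence: Confidence;
  template: string;
  issue: number;
}

function buildVarianceEvent(
  forecastUsd: number | undefined,
  actualUsd: number,
  confidence: Confidence,
  template: string,
  issue: number,
): VarianceEvent | null {
  // No usable forecast (e.g. PIPELINE_FORECAST_USD unset, or zero): skip silently.
  if (forecastUsd === undefined || forecastUsd <= 0) return null;
  const variance = actualUsd - forecastUsd;
  return {
    type: "cost.forecast_variance",
    forecast_usd: forecastUsd,
    actual_usd: actualUsd,
    variance_usd: variance,
    variance_pct: (variance / forecastUsd) * 100,
    confidence,
    template,
    issue,
  };
}
```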
pipelines.ts (view)
└─ renderQueueTable()
└─ renderQueueRow(item: QueueItem)
└─ renderForecastBadge(forecast?: CostForecast) // NEW
- "Est: $45–$60 (medium confidence)"
- Color-coded: green (under 50% budget), yellow (50-100%), red (over)
State lives in `FleetState.queue[].forecast` and is fetched from the server, so no local state management is needed; the forecast data flows from `GET /api/status` through to render.
No new state stores. Forecast data is embedded in the existing `FleetState` response from `/api/status`, and the queue enrichment in `server.ts` reads `cost-forecast.json` from pipeline artifacts when available. Pure props-down data flow.
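The enrichment step amounts to attaching an artifact to each queue item when one exists. The sketch below is minimal TypeScript. `enrichQueue` is a hypothetical name, and so is the `loadForecast` callback, which stands in for reading `cost-forecast.json` from a pipeline's artifacts.

```typescript
interface QueueItem<F> {
  issue: number;
  title: string;
  forecast?: F;
}

// Attach a forecast to each queued item whose pipeline has one. Items without an artifact
// pass through unchanged, so the view simply renders no badge for them.
function enrichQueue<F>(items: QueueItem<F>[], loadForecast: (issue: number) => F | undefined): QueueItem<F>[] {
  return items.map((item) => {
    const forecast = loadForecast(item.issue);
    return forecast === undefined ? item : { ...item, forecast };
  });
}
```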
- Forecast badge uses a semantic `<span>` with `aria-label="Estimated cost: $45 to $60, medium confidence"`
- Color coding supplemented with text labels (not color-only)
- Budget warning uses `role="alert"` for screen reader announcement
- Table cells use `<td>` with column headers in `<th>` (existing pattern)
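The color thresholds and accessibility rules can be combined in a single badge renderer. The sketch below is minimal TypeScript under several assumptions: the color is chosen from `forecast.total_usd` relative to the remaining budget (mirroring the gate's 50% threshold), and the view returns HTML strings. `renderForecastBadge`, `budgetLevel` and the CSS class names are hypothetical. Appending the budget level to the `aria-label` is also an assumption, going beyond the example label above.

```typescript
type Confidence = "high" | "medium" | "low";
type BudgetLevel = "ok" | "warn" | "over";

interface ForecastSummary {
  total_usd: number;
  low_usd: number;
  high_usd: number;
  confidence: Confidence;
}

// green (ok): under 50% of remaining budget, yellow (warn): 50-100%, red (over): above it.
function budgetLevel(f: ForecastSummary, remainingUsd: number | "unlimited"): BudgetLevel {
  if (remainingUsd === "unlimited") return "ok";
  if (remainingUsd <= 0) return "over"; // budget already exhausted
  const ratio = f.total_usd / remainingUsd;
  if (ratio > 1) return "over";
  if (ratio >= 0.5) return "warn";
  return "ok";
}

const LEVEL_TEXT: Record<BudgetLevel, string> = {
  ok: "within budget",
  warn: "near budget",
  over: "over budget",
};

function renderForecastBadge(f: ForecastSummary | undefined, remainingUsd: number | "unlimited"): string {
  if (!f) return ""; // no forecast artifact yet: render nothing
  const low = `$${Math.round(f.low_usd)}`;
  const high = `$${Math.round(f.high_usd)}`;
  const level = budgetLevel(f, remainingUsd);
  const label = `Estimated cost: ${low} to ${high}, ${f.confidence} confidence, ${LEVEL_TEXT[level]}`;
  // role="alert" only when there is a budget warning to announce.
  const role = level === "ok" ? "" : ' role="alert"';
  // Visible text also names the level, so the badge never relies on color alone.
  const levelSuffix = level === "ok" ? "" : `, ${LEVEL_TEXT[level]}`;
  return `<span class="forecast-badge forecast-${level}" aria-label="${label}"${role}>Est: ${low}–${high} (${f.confidence} confidence${levelSuffix})</span>`;
}
```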
- 320px: Forecast column hidden; available via row expansion (existing mobile pattern)
- 768px+: Forecast shown as compact badge: "$45–$60 (M)"
- 1024px+: Full forecast text: "Est: $45–$60 (medium confidence, 12 runs)"
- 1440px+: No change from 1024px
The dashboard already uses a responsive table pattern — forecast column follows the same hide/show behavior as existing optional columns.
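The breakpoint rules reduce to a small label function. The sketch below is minimal TypeScript and makes two assumptions: widths below 768px follow the hidden-column behavior described for 320px, and the compact form abbreviates confidence to its initial. `forecastLabel` is a hypothetical name. The existing responsive table pattern may handle visibility in CSS instead.

```typescript
type Confidence = "high" | "medium" | "low";

interface ForecastLike {
  low_usd: number;
  high_usd: number;
  confidence: Confidence;
  data_points: number;
}

// Returns null when the column is hidden (the forecast is then reached via row expansion).
function forecastLabel(f: ForecastLike, viewportPx: number): string | null {
  if (viewportPx < 768) return null;
  const range = `$${Math.round(f.low_usd)}–$${Math.round(f.high_usd)}`;
  if (viewportPx < 1024) {
    // Compact badge, e.g. "$45–$60 (M)"
    return `${range} (${f.confidence.charAt(0).toUpperCase()})`;
  }
  // 1024px and up (1440px+ unchanged), e.g. "Est: $45–$60 (medium confidence, 12 runs)"
  return `Est: ${range} (${f.confidence} confidence, ${f.data_points} runs)`;
}
```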