Pipeline Design 178
Shipwright pipelines consume budget via LLM token usage across 12 stages. Operators need visibility into expected costs before committing to a run, and guardrails to prevent budget exhaustion. The core forecasting, budget gate, variance tracking, and dashboard integration were implemented in commit abd44a8. Four gaps remain:
- No early warning: the gate is binary (block or pass), with no warning when a run will consume 50–100% of the remaining budget.
- No confidence range: the forecast shows a point estimate (`$X.XX`) but no low/high bounds reflecting data quality.
- No `estimated_cost` on queued dashboard items: the `QueueItem` type has the field, but daemon dispatch doesn't populate it.
- The `cost.forecast` event is missing `data_points`, which reduces observability.
Constraints: Bash 3.2 compatibility, `set -euo pipefail`, `awk` for floating-point math, no new dependencies. Shell-first architecture: the dashboard is read-only over `events.jsonl`.
Enhance the existing implementation in-place across 3 shell scripts and 1 TypeScript file. No new files for core logic.
Add low_usd and high_usd to the cost_forecast() JSON output, computed from total_usd and a confidence-dependent spread:
| Confidence | Spread | Example ($50 point estimate) |
|---|---|---|
| high | ±15% | $42.50 – $57.50 |
| medium | ±30% | $35.00 – $65.00 |
| low | ±50% | $25.00 – $75.00 |
`cost_forecast_display()` shows the range for medium/low confidence, e.g. `$35.00 - $65.00 (medium confidence, 8 runs)`. High confidence keeps the point estimate, since a ±15% band is narrow enough to be mostly noise.
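The spread arithmetic and the display rule can be checked with a small sketch. It is written in TypeScript only so the numbers are easy to verify; the real logic stays in `sw-cost.sh` with `awk`, per the shell-first constraint. The helper names `forecastRange` and `formatForecast`, rounding to cents via `toFixed(2)`, and the exact wording of the high-confidence line are assumptions.

```typescript
type Confidence = "high" | "medium" | "low";

// Spread fractions from the table above (heuristic, not statistical).
const SPREAD: Record<Confidence, number> = { high: 0.15, medium: 0.3, low: 0.5 };

interface ForecastRange {
  total_usd: string;
  low_usd: string;
  high_usd: string;
}

// Hypothetical helper: derive low/high bounds from the point estimate.
function forecastRange(totalUsd: number, confidence: Confidence): ForecastRange {
  const spread = SPREAD[confidence];
  return {
    total_usd: totalUsd.toFixed(2),
    low_usd: (totalUsd * (1 - spread)).toFixed(2),
    high_usd: (totalUsd * (1 + spread)).toFixed(2),
  };
}

// Hypothetical helper: high confidence shows the point estimate, medium/low show the range.
function formatForecast(totalUsd: number, confidence: Confidence, runs: number): string {
  const r = forecastRange(totalUsd, confidence);
  if (confidence === "high") {
    return `$${r.total_usd} (high confidence, ${runs} runs)`;
  }
  return `$${r.low_usd} - $${r.high_usd} (${confidence} confidence, ${runs} runs)`;
}

console.log(formatForecast(50, "medium", 8)); // prints "$35.00 - $65.00 (medium confidence, 8 runs)"
```

Keeping the bounds as strings mirrors `total_usd` in the `cost_forecast()` JSON, so consumers never see a mix of numeric and string money fields.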
After the existing "exceeds budget" block in pipeline_start(), add a consumption-percentage check:
```
if forecast > 50% of remaining (but ≤ remaining):
    warn "Forecast $X will consume Y% of remaining budget $Z"
    emit_event "cost.budget_high_usage" ...
```
The warning fires once per pipeline start, so there is no repeated noise. It informs only and never blocks.
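Together with the existing gate, the early warning yields a small set of outcomes. The sketch below expresses that decision in TypeScript for clarity only (the actual check lives in `sw-pipeline.sh` using `awk` comparisons). `remainingUsd = null` stands in for the "unlimited" fallback, `force` models `--force-start`, and the function and type names are hypothetical.

```typescript
type BudgetDecision =
  | { kind: "skip" }                         // no budget configured ("unlimited"): gate skipped
  | { kind: "pass" }                         // forecast ≤ 50% of remaining
  | { kind: "warn"; consumptionPct: string } // 50% < forecast ≤ 100%: informs, never blocks
  | { kind: "block" }                        // forecast > remaining, no --force-start
  | { kind: "forced" };                      // forecast > remaining, bypassed by --force-start

function budgetDecision(forecastUsd: number, remainingUsd: number | null, force: boolean): BudgetDecision {
  if (remainingUsd === null) return { kind: "skip" };
  // Existing gate: exceeding the remaining budget blocks unless forced.
  if (forecastUsd > remainingUsd) return force ? { kind: "forced" } : { kind: "block" };
  // New check: strictly more than half of the remaining budget triggers one warning.
  if (remainingUsd > 0 && forecastUsd / remainingUsd > 0.5) {
    return { kind: "warn", consumptionPct: ((forecastUsd / remainingUsd) * 100).toFixed(1) };
  }
  return { kind: "pass" };
}
```

For example, a $30 forecast against $40 remaining yields `warn` with `consumptionPct` of `"75.0"`, while a forecast of exactly 50% passes silently because the comparison is strictly greater than.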
In daemon-dispatch.sh, after the existing pre-spawn budget check, write forecast_usd to the job's metadata file. server.ts reads this when building queue items for /api/status.
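A sketch of the `server.ts` side follows. It assumes the metadata file is the `.forecast` file in the job's worktree (as labeled in the architecture diagram) and that it holds the forecast as a bare decimal string such as `12.34`; that file format, the helper names, and the minimal `QueueItem` shape here are all assumptions. A missing or malformed file leaves `estimated_cost` undefined instead of failing `/api/status`.

```typescript
import { readFileSync } from "node:fs";
import { join } from "node:path";

interface QueueItem {
  issue: string;           // hypothetical subset of the real fields
  worktree: string;        // hypothetical: path to the job's worktree
  estimated_cost?: number; // populated from job metadata when present
}

// Parse "12.34\n" into 12.34; anything empty or non-numeric yields undefined.
function parseForecast(raw: string): number | undefined {
  const trimmed = raw.trim();
  const value = Number(trimmed);
  return trimmed !== "" && Number.isFinite(value) ? value : undefined;
}

// Read the hypothetical <worktree>/.forecast file written by daemon-dispatch.sh.
function readEstimatedCost(worktree: string): number | undefined {
  try {
    return parseForecast(readFileSync(join(worktree, ".forecast"), "utf8"));
  } catch {
    return undefined; // missing/unreadable file: the forecast is optional
  }
}

function withEstimatedCost(item: QueueItem): QueueItem {
  const cost = readEstimatedCost(item.worktree);
  return cost === undefined ? item : { ...item, estimated_cost: cost };
}
```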
Add `data_points=$(echo "$FORECAST_JSON" | jq -r '.data_points')` to the existing `emit_event "cost.forecast"` call.
```
┌─────────────────────────────────────────────────────────────┐
│ CLI / Pipeline                                              │
│ sw-pipeline.sh                                              │
│ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ │
│ │ Budget Gate     │ │ Early Warning   │ │ Variance        │ │
│ │ (block/force)   │ │ (warn >50%)     │ │ Recording       │ │
│ └────────┬────────┘ └────────┬────────┘ └────────┬────────┘ │
│          │                   │                   │          │
└──────────┼───────────────────┼───────────────────┼──────────┘
           │ calls             │ calls             │ calls
           ▼                   ▼                   ▼
┌─────────────────────────────────────────────────────────────┐
│ Cost Engine                                                 │
│ sw-cost.sh                                                  │
│ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ │
│ │ cost_forecast() │ │ _display()      │ │ _variance()     │ │
│ │ +low/high_usd   │ │ +range fmt      │ │                 │ │
│ └────────┬────────┘ └─────────────────┘ └─────────────────┘ │
│          │ reads                                            │
│          ▼                                                  │
│ ┌─────────────────┐ ┌────────────────────────┐              │
│ │ Pricing rates   │ │ events.jsonl (history) │              │
│ └─────────────────┘ └────────────────────────┘              │
└─────────────────────────────────────────────────────────────┘

┌──────────────────┐          ┌──────────────────────┐
│ daemon-dispatch  │──writes──▶ job metadata file    │
│ (forecast_usd)   │          │ (worktree/.forecast) │
└──────────────────┘          └──────────┬───────────┘
                                         │ reads
                              ┌──────────▼───────────┐
                              │ dashboard/server.ts  │
                              │ /api/costs/forecast  │
                              │ /api/status (queue)  │
                              └──────────────────────┘
```
```typescript
// cost_forecast() JSON output (bash → stdout)
interface CostForecast {
  total_usd: string; // "50.00"
  low_usd: string;   // "35.00" (NEW)
  high_usd: string;  // "65.00" (NEW)
  stages: Array<{
    id: string;
    model: string;
    est_duration_s: number;
    est_tokens_in: number;
    est_tokens_out: number;
    est_cost: string;
  }>;
  confidence: "high" | "medium" | "low";
  data_points: number;
  complexity_multiplier: string;
}

// emit_event "cost.forecast" fields
interface CostForecastEvent {
  type: "cost.forecast";
  forecast_usd: string;
  template: string;
  confidence: string;
  issue: string;
  data_points: string; // NEW
}

// emit_event "cost.budget_high_usage" fields (NEW)
interface BudgetHighUsageEvent {
  type: "cost.budget_high_usage";
  forecast_usd: string;
  remaining_usd: string;
  consumption_pct: string; // e.g. "72.5"
  template: string;
  issue: string;
}

// GET /api/costs/forecast response (unchanged structure, richer data)
interface ForecastResponse {
  recent_forecasts: Array<{
    issue: string;
    template: string;
    forecast_usd: number;
    confidence: string;
    ts: string;
  }>;
  variance_history: Array<{
    forecast_usd: number;
    actual_usd: number;
    variance_pct: number;
    template: string;
    ts: string;
  }>;
}

// QueueItem in /api/status (existing optional field, now populated)
interface QueueItem {
  // ... existing fields ...
  estimated_cost?: number; // populated from job metadata
}
```
- Pre-start forecast: `pipeline_start()` → `cost_forecast(template_config, complexity)` → reads `events.jsonl` history + template stages → returns JSON with `total_usd`, `low_usd`, `high_usd`, `confidence`
- Display: `cost_forecast_display(json)` → renders a table with the range to stdout
- Budget gate: compares `total_usd` vs `cost_remaining_budget()` → blocks or warns
- Early warning: if `total_usd / remaining > 0.5` but `≤ 1.0` → `warn()` + `emit_event "cost.budget_high_usage"`
- Variance: on pipeline completion → `cost_record_variance(forecast, actual, template, issue)` → emits a `cost.forecast_variance` event
- Dashboard: `server.ts` reads `events.jsonl` → filters `cost.forecast` and `cost.forecast_variance` → returns them to the frontend
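The dashboard step can be sketched as a pure function over the text of `events.jsonl`, producing the `ForecastResponse` shape defined above. This assumes each line is a JSON object with `type` and an ISO-8601 `ts` field, and that numeric values may be stored as strings (as in the `emit_event` interfaces), so they are coerced with `Number()`. The function name is hypothetical, and skipping malformed lines (rather than returning 500) is an assumption.

```typescript
interface RecentForecast {
  issue: string;
  template: string;
  forecast_usd: number;
  confidence: string;
  ts: string;
}

interface VarianceEntry {
  forecast_usd: number;
  actual_usd: number;
  variance_pct: number;
  template: string;
  ts: string;
}

interface ForecastResponse {
  recent_forecasts: RecentForecast[];
  variance_history: VarianceEntry[];
}

// Hypothetical: build the /api/costs/forecast payload from raw events.jsonl text.
function buildForecastResponse(jsonl: string, periodDays: number, now: Date = new Date()): ForecastResponse {
  const cutoff = now.getTime() - periodDays * 24 * 60 * 60 * 1000;
  const out: ForecastResponse = { recent_forecasts: [], variance_history: [] };

  for (const line of jsonl.split("\n")) {
    if (line.trim() === "") continue;
    let parsed: unknown;
    try {
      parsed = JSON.parse(line);
    } catch {
      continue; // skip a malformed line instead of failing the whole response
    }
    if (typeof parsed !== "object" || parsed === null) continue;
    const ev = parsed as Record<string, unknown>;

    const ts = String(ev.ts ?? "");
    const t = Date.parse(ts);
    if (Number.isNaN(t) || t < cutoff) continue; // outside the requested period

    if (ev.type === "cost.forecast") {
      out.recent_forecasts.push({
        issue: String(ev.issue ?? ""),
        template: String(ev.template ?? ""),
        forecast_usd: Number(ev.forecast_usd),
        confidence: String(ev.confidence ?? ""),
        ts,
      });
    } else if (ev.type === "cost.forecast_variance") {
      out.variance_history.push({
        forecast_usd: Number(ev.forecast_usd),
        actual_usd: Number(ev.actual_usd),
        variance_pct: Number(ev.variance_pct),
        template: String(ev.template ?? ""),
        ts,
      });
    }
  }
  return out;
}
```

With no matching events the function returns two empty arrays, which lines up with the "200, empty" response.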
| Component | Error | Handling |
|---|---|---|
| `cost_forecast()` | Missing template file | Returns empty string to stderr, caller skips forecast (existing) |
| `cost_forecast()` | No `events.jsonl` | Uses default durations, `confidence="low"` (existing) |
| `cost_forecast()` | `jq` parse failure | |
| Budget gate | `cost_remaining_budget` fails | Falls back to "unlimited", skips gate (existing) |
| Early warning | `awk` error | Non-fatal: wrapped in a conditional, so the pipeline proceeds |
| Dashboard `/api/costs/forecast` | `events.jsonl` unreadable | Returns 500 (existing) |
| `daemon-dispatch` | Forecast write failure | |
- Statistical confidence intervals (std dev). Pros: mathematically rigorous, self-calibrating. Cons: needs ≥20 data points per template to be meaningful, complex bash math, overkill for current data volumes. Deferred: we can iterate once more history exists; the percentage-spread approach is a pragmatic first step.
- Node.js forecast module. Pros: proper floating point, easier unit testing, natural dashboard integration. Cons: breaks the shell-first convention, requires Node at pipeline start time, two systems to maintain. Rejected: inconsistent with the codebase architecture.
- Do nothing. Pros: zero risk. Cons: misses the acceptance criteria for early warning and confidence range. Rejected: the gaps are small but material for operator experience.
- Files to create: none
- Files to modify:
  - `scripts/sw-cost.sh`: add `low_usd`/`high_usd` to `cost_forecast()` output; update `cost_forecast_display()` for the range format
  - `scripts/sw-pipeline.sh`: add the early warning check after the budget gate; add `data_points` to the `cost.forecast` event
  - `scripts/lib/daemon-dispatch.sh`: write `forecast_usd` to job metadata
  - `dashboard/server.ts`: include `estimated_cost` in the queue item response
  - `scripts/sw-cost-test.sh`: tests for the confidence range and early warning
- Dependencies: none new
- Risk areas:
  - `daemon-dispatch.sh` is on the daemon's critical path, but the change is additive (it writes one metadata field), so risk is low
  - `awk` floating-point precision is acceptable at 4 decimal places for cost estimates
  - Confidence spread percentages are heuristic, not statistical, and are clearly labeled as such
- Method: `GET`
- Path: `/api/costs/forecast`
- Query params: `period` (integer, days, default 30)
- Response (200): `{ recent_forecasts: [...], variance_history: [...] }` (see interface above)
- Response (200, empty): `{ recent_forecasts: [], variance_history: [] }` when no data exists
- Response (500): internal error reading `events.jsonl`
- Rate limiting: N/A (local developer tool)
- Versioning: N/A (internal API, no external consumers)
- Auth: none required (local dashboard)
| Code | Condition |
|---|---|
| 200 | Success (may have empty arrays) |
| 500 | events.jsonl read failure |
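The spec gives `period` as "integer, days, default 30" and lists no 400 response, so one plausible reading is that invalid values fall back to the default. The sketch below assumes that behavior; the helper name is hypothetical, and it uses the standard WHATWG `URL`/`URLSearchParams` API with a dummy base because Node's `req.url` is path-only.

```typescript
const DEFAULT_PERIOD_DAYS = 30;

// Hypothetical helper: read ?period=N from a request URL, falling back to the default.
function parsePeriodDays(requestUrl: string): number {
  const url = new URL(requestUrl, "http://localhost"); // base only needed to resolve a bare path
  const raw = url.searchParams.get("period");
  if (raw === null || !/^\d+$/.test(raw)) return DEFAULT_PERIOD_DAYS; // missing, negative, or non-integer
  const days = Number.parseInt(raw, 10);
  return days > 0 ? days : DEFAULT_PERIOD_DAYS;
}
```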
- `cost_forecast()` output JSON includes `low_usd` and `high_usd` fields
- The low-confidence range is wider than the high-confidence range (±50% vs ±15%)
- `cost_forecast_display()` shows the range format for medium/low confidence
- The pipeline emits a warning (not a block) when the forecast consumes 50–100% of the remaining budget
- The pipeline still blocks when the forecast exceeds 100% of the remaining budget (regression)
- The `cost.forecast` event includes the `data_points` field
- `--force-start` continues to bypass the budget gate (regression)
- Dashboard queue items include `estimated_cost` when available
- `npm test` passes with no regressions
- All new behavior is tested in `sw-cost-test.sh`