-
Notifications
You must be signed in to change notification settings - Fork 1
Pipeline Design 56
Shipwright currently routes all pipeline stages to a single model (typically opus) regardless of task complexity. The intelligence layer already has intelligence_recommend_model() with SPRT-based evidence testing and per-model pricing in sw-intelligence.sh, plus an A/B testing gate in sw-pipeline.sh (~L7267) that defaults to opus with intelligent routing as the experimental arm (20%).
The problem: this wastes budget on simple tasks (haiku could handle intake/test stages) and under-provisions complex ones. The A/B gate is inverted — the smarter routing should be the default, not the experiment.
Constraints:
- Bash 3.2 compatibility (no associative arrays, no
readarray, no${var,,}) -
set -euo pipefailin all scripts - Atomic file writes (tmp + mv)
- JSON via
jq --arg, never string interpolation - Existing
intelligence_recommend_model()and SPRT infrastructure must be preserved, not replaced - Pipeline templates are JSON; changes must be backward-compatible (stages without
config.modelshould fall back gracefully)
Three-tier model routing with automatic escalation on failure.
Stage Start
→ intelligence_stage_defaults(stage, complexity) returns default model
→ Pipeline checks template override (config.model per stage)
→ Template override wins if set; otherwise use stage default
→ On retry failure: escalate_model(current) bumps haiku→sonnet→opus
→ On loop stall (CONSECUTIVE_FAILURES >= 2): escalate from sonnet→opus
→ All routing decisions logged to .claude/pipeline-artifacts/model-routing.log
→ cost_pipeline_summary() reads routing log + costs.json for per-stage breakdown
| Stage | Low Complexity | Medium | High |
|---|---|---|---|
| intake | haiku | haiku | sonnet |
| plan | sonnet | sonnet | opus |
| design | sonnet | opus | opus |
| build | sonnet | sonnet | opus |
| test | haiku | sonnet | sonnet |
| review | sonnet | opus | opus |
| compound_quality | sonnet | opus | opus |
| pr | haiku | haiku | sonnet |
| merge | haiku | haiku | haiku |
| deploy | haiku | sonnet | sonnet |
| validate | haiku | sonnet | sonnet |
| monitor | haiku | haiku | sonnet |
escalate_model(current_model) returns next tier: haiku → sonnet → opus → opus (opus is ceiling). Pure function, no side effects. Caller logs the escalation event.
In run_pipeline() (~L7267), flip the ratio: intelligent routing becomes 80% (default), opus-everywhere becomes the 20% control group. The existing SPRT evidence framework continues to measure which arm performs better, and the ab_test_ratio config flag still controls the split.
-
intelligence_stage_defaults()returns"sonnet"if stage or complexity is unrecognized (safe middle-ground) -
escalate_model()returns"opus"for any unrecognized input (fail to most capable) - Template
config.modelis optional; missing key → fall through tointelligence_stage_defaults() -
cost_pipeline_summary()gracefully handles missingmodel-routing.log(prints "no routing data") - Loop stall escalation only triggers if
MODEL_ESCALATION_ENABLEDis not explicitly"false"(opt-out, not opt-in)
Each routing decision appends to model-routing.log:
timestamp|stage|complexity|selected_model|source(default|template|escalation)|attempt_number
-
Central routing service / separate script — Pros: clean separation, independently testable. Cons: adds another script to source/maintain, introduces IPC overhead, and the existing
sw-intelligence.shalready owns model selection logic. Adding two functions to an existing script is simpler than a new component. -
Associative array for stage→model mapping — Pros: cleaner lookup syntax. Cons: requires Bash 4+, violates the Bash 3.2 compatibility constraint. Case statements are verbose but compatible.
-
Always-escalate on any failure (no threshold) — Pros: faster recovery. Cons: expensive — a single flaky test would immediately jump to opus. The threshold (
CONSECUTIVE_FAILURES >= 2in loop, per-retry in pipeline) balances cost against recovery speed. -
Model routing as a daemon-config-only setting (no per-stage templates) — Pros: simpler config. Cons: loses the ability to override per-stage in specific pipeline templates (e.g.,
cost-awaretemplate could force haiku everywhere).
- None (all changes are additions to existing files)
-
scripts/sw-intelligence.sh— Addintelligence_stage_defaults()andescalate_model() -
scripts/sw-pipeline.sh— Model escalation inrun_stage_with_retry()(~L6755), invert A/B gate inrun_pipeline()(~L7267) -
scripts/sw-loop.sh— Stall-based escalation whenCONSECUTIVE_FAILURES >= 2(~L2150) -
scripts/sw-cost.sh— Addcost_pipeline_summary()function -
templates/pipelines/standard.json— Add per-stageconfig.modelkeys -
templates/pipelines/full.json— Add per-stageconfig.modelkeys -
templates/pipelines/autonomous.json— Add per-stageconfig.modelkeys -
templates/pipelines/deployed.json— Add per-stageconfig.modelkeys -
scripts/sw-intelligence-test.sh— Unit tests for new functions -
scripts/sw-e2e-smoke-test.sh— Smoke tests for routing integration
- None new. Uses existing
jq, existing intelligence infrastructure, existing template loading.
-
run_stage_with_retry()modification (~L6755 insw-pipeline.sh): This is a hot path — every stage passes through it. The model escalation must not break the existing retry logic. The change should be additive: read current model, callescalate_model(), set the new model env var, then proceed with existing retry flow. - A/B gate inversion: Swapping the default/control percentages could affect running daemon pipelines mid-flight. Mitigation: the gate is evaluated once per pipeline run at startup, so in-flight pipelines keep their original assignment.
-
Template backward compatibility: Existing templates without
config.modelmust continue to work. The pipeline code must usejq -r '.stages[].config.model // empty'(not.config.modelwhich would error on missing key). -
model-routing.loggrowth: Log file could grow unbounded across many pipeline runs.cost_pipeline_summary()should scope to the current pipeline run (filter by run ID or timestamp).
-
intelligence_stage_defaults "build" "low"returns"sonnet";"intake" "low"returns"haiku";"design" "high"returns"opus" -
escalate_model "haiku"returns"sonnet";"sonnet"returns"opus";"opus"returns"opus" -
escalate_model "unknown"returns"opus"(fail-safe) -
intelligence_stage_defaults "nonexistent_stage" "low"returns"sonnet"(safe default) - Pipeline retry escalates model: first attempt uses stage default, second attempt uses next tier
- Loop stall with
CONSECUTIVE_FAILURES=2triggers model escalation log entry - Loop stall with
MODEL_ESCALATION_ENABLED=falsedoes NOT escalate - Templates with
config.modeloverrideintelligence_stage_defaults()return value - Templates without
config.modelfall through tointelligence_stage_defaults()cleanly -
cost_pipeline_summaryproduces per-stage cost breakdown when routing log exists -
cost_pipeline_summaryhandles missing routing log gracefully (no error, informative message) - A/B gate now assigns 80% to intelligent routing, 20% to opus control
- Full test suite passes:
npm testexits 0 - No Bash 4+ features used (no
declare -A, noreadarray, no${var,,})