
Pipeline Design 189

Seth Ford edited this page Apr 5, 2026 · 2 revisions

ADR written to .claude/pipeline-artifacts/design.md.

Summary of key decisions:

  1. Composable middleware (Option B) over 14 per-stage wrappers — most stages share identical orchestration, only build/test are special
  2. 4 new functions in pipeline-stages.sh: check_human_directives(), select_stage_model(), broadcast_stage_discovery(), run_stage()
  3. Self-healing build+test stays inline in run_pipeline() — counter coupling makes extraction risky for no testability gain
  4. Error boundaries: cross-cutting concerns never cause stage failure (fail-open/fire-and-forget), only run_stage() propagates actual stage failures
  5. ~200 lines removed from run_pipeline(), ~180 added to pipeline-stages.sh as testable functions
  6. 1 new test file (sw-lib-pipeline-execution-test.sh) + ~20 new tests in existing sw-lib-pipeline-stages-test.sh

Constraints:

  • Functions share state via global variables (ARTIFACTS_DIR, ISSUE_NUMBER, CLAUDE_MODEL, etc.)
  • Optional modules (audit_emit, gh_checks_stage_update, ucb1_select_model) may or may not be loaded; every call must use a type funcname >/dev/null 2>&1 guard
  • The self-healing build+test loop tightly couples two stages through counter management; it cannot be cleanly extracted without breaking the completed counter
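The guard rule for optional modules can be captured in one small helper. This is a sketch, not code from pipeline-stages.sh. `call_optional` is a hypothetical name. The arguments passed to `audit_emit` in the usage line are assumed, because its real signature is not given here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a function only when `type` can resolve the name
# (note that `type` also succeeds for builtins and external commands).
# A missing name and a failing call are both swallowed, so the caller never
# fails because an optional module is absent (fail-open).
call_optional() {
    local fn="$1"
    shift
    if type "$fn" >/dev/null 2>&1; then
        "$fn" "$@" 2>/dev/null || true
    fi
    return 0
}

# Usage (audit_emit's arguments are assumed for illustration):
call_optional audit_emit "stage.start" "build"
```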

Decision

Extract 4 composable middleware functions into scripts/lib/pipeline-stages.sh, then simplify run_pipeline() to call them. This is Option B from the plan — composable functions rather than 14 per-stage wrappers.

New Functions

1. check_human_directives(stage_id) — returns 0 (proceed) or 1 (skipped)

Extracts lines 542-569 from run_pipeline(). Handles two file-based intervention mechanisms:

  • skip-stage.txt: grep for stage ID, remove from file if found, emit stage.skipped event
  • human-message.txt: display message, emit pipeline.human_message event, delete file

Fail-safe: all file reads guarded with 2>/dev/null || true. If files are missing or malformed, stage proceeds normally.
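Below is a minimal sketch of the two directive checks. It rests on these assumptions:

- Both files live directly in ${ARTIFACTS_DIR}.
- skip-stage.txt holds one stage ID per line.
- `emit_event` is a hypothetical stand-in for the pipeline's event emitter.

The sketch consumes human-message.txt before evaluating the skip, so a message is not lost when its stage is skipped. The order in the original lines 542-569 may differ.

```shell
#!/usr/bin/env bash
# Hypothetical event emitter; the real pipeline has its own.
emit_event() { printf 'event %s\n' "$*"; }

# Returns 0 to proceed, 1 if a human directive skipped the stage.
check_human_directives() {
    local stage_id="$1"
    local dir="${ARTIFACTS_DIR:-.}"   # assumed location of the directive files
    local skip_file="$dir/skip-stage.txt"
    local msg_file="$dir/human-message.txt"

    # human-message.txt: display once, record an event, then consume the file.
    if [ -s "$msg_file" ]; then
        local msg
        msg="$(cat "$msg_file" 2>/dev/null || true)"
        printf 'Human message: %s\n' "$msg"
        emit_event "pipeline.human_message" "$msg"
        rm -f "$msg_file" 2>/dev/null || true
    fi

    # skip-stage.txt: exact, fixed-string line match on the stage ID.
    # A missing file makes grep fail, so the stage simply proceeds (fail-open).
    if grep -qxF "$stage_id" "$skip_file" 2>/dev/null; then
        # Drop the consumed entry so the skip applies only once.
        # grep -v exits 1 when no lines remain, hence the || true.
        grep -vxF "$stage_id" "$skip_file" >"$skip_file.tmp" 2>/dev/null || true
        mv -f "$skip_file.tmp" "$skip_file" 2>/dev/null || true
        emit_event "stage.skipped" "$stage_id"
        return 1
    fi
    return 0
}
```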

2. select_stage_model(stage_id) — returns 0 always (best-effort)

Extracts lines 694-763 from run_pipeline(). Three-tier model selection:

  1. UCB1 (when ucb1_select_model is available and has data): Direct model recommendation from multi-armed bandit
  2. A/B testing (when intelligence_recommend_model is available): Randomized experiment/control split with configurable ratio from daemon-config.json
  3. Graduated (when routing file shows >=50 samples): Bypass A/B, use recommended model directly

Side effect: exports CLAUDE_MODEL, emits intelligence.model_ucb1 or intelligence.model_ab event.
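The tier order can be sketched as a pure decision helper plus a thin wrapper. Everything here beyond the function names taken from the text is an assumption:

- `pick_model` is hypothetical.
- `ucb1_select_model` and `intelligence_recommend_model` are assumed to print a model name, or nothing, for a stage ID.
- The sample count and A/B percentage arrive as hypothetical `ROUTING_SAMPLES` and `AB_PERCENT` variables, rather than being read from the routing file and daemon-config.json.
- Event emission is omitted.

```shell
#!/usr/bin/env bash
# Hypothetical pure helper. Prints "<model> <tier>".
#   $1 UCB1 pick (empty if no data)   $2 recommended model (empty if none)
#   $3 routing sample count           $4 experiment share in percent
#   $5 random roll 0-99 (injectable for tests)
pick_model() {
    local ucb1_pick="$1" recommended="$2"
    local samples="${3:-0}" ab_percent="${4:-50}" roll="${5:-$((RANDOM % 100))}"
    local fallback="${CLAUDE_MODEL:-sonnet}"

    if [ -n "$ucb1_pick" ]; then                 # tier 1: bandit has data
        echo "$ucb1_pick ucb1"
    elif [ -z "$recommended" ]; then             # nothing to test: keep default
        echo "$fallback default"
    elif [ "$samples" -ge 50 ]; then             # graduated: bypass the A/B split
        echo "$recommended graduated"
    elif [ "$roll" -lt "$ab_percent" ]; then     # A/B: experiment arm
        echo "$recommended experiment"
    else                                         # A/B: control arm
        echo "$fallback control"
    fi
}

# Best-effort wrapper: always returns 0 and exports CLAUDE_MODEL.
select_stage_model() {
    local stage_id="$1" ucb1="" rec="" result
    if type ucb1_select_model >/dev/null 2>&1; then
        ucb1="$(ucb1_select_model "$stage_id" 2>/dev/null || true)"
    fi
    if type intelligence_recommend_model >/dev/null 2>&1; then
        rec="$(intelligence_recommend_model "$stage_id" 2>/dev/null || true)"
    fi
    result="$(pick_model "$ucb1" "$rec" "${ROUTING_SAMPLES:-0}" "${AB_PERCENT:-50}")"
    CLAUDE_MODEL="${result%% *}"   # keep the model, drop the tier label
    export CLAUDE_MODEL
    return 0
}
```

The random roll is an optional fifth argument, so tests can pin it. This keeps the bandit/experiment logic checkable without stubbing `$RANDOM`.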

3. broadcast_stage_discovery(stage_id) — returns 0 always (fire-and-forget)

Extracts lines 809-821 from run_pipeline(). Maps stage ID to discovery category and file patterns:

  • plan -> *.md
  • design -> *.md,*.ts,*.tsx,*.js
  • build -> src/*,*.ts,*.tsx,*.js
  • test -> *.test.*,*_test.*
  • review -> *.md,*.ts,*.tsx
  • default -> *

Calls sw-discovery.sh broadcast as a subprocess. All errors suppressed.
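The stage-to-pattern mapping above fits a plain `case` statement. This also keeps it Bash 3.2 compatible, since it needs no associative arrays. In this sketch the stage ID doubles as the discovery category. The `SCRIPT_DIR` variable and the argument order passed to `sw-discovery.sh broadcast` are assumptions.

```shell
#!/usr/bin/env bash
# Comma-separated file patterns for a stage (mapping from the list above).
stage_discovery_patterns() {
    case "$1" in
        plan)   echo "*.md" ;;
        design) echo "*.md,*.ts,*.tsx,*.js" ;;
        build)  echo "src/*,*.ts,*.tsx,*.js" ;;
        test)   echo "*.test.*,*_test.*" ;;
        review) echo "*.md,*.ts,*.tsx" ;;
        *)      echo "*" ;;
    esac
}

# Fire-and-forget: always returns 0, whatever the subprocess does.
broadcast_stage_discovery() {
    local stage_id="$1" patterns discovery
    patterns="$(stage_discovery_patterns "$stage_id")"
    discovery="${SCRIPT_DIR:-.}/sw-discovery.sh"   # assumed location
    if [ -x "$discovery" ]; then
        # Hypothetical argument order: category, then patterns.
        "$discovery" broadcast "$stage_id" "$patterns" >/dev/null 2>&1 || true
    fi
    return 0
}
```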

4. run_stage(stage_id, enabled_count, completed_count) — returns 0 (success) or 1 (failure)

Wraps lines 766-852 from run_pipeline() into a composite function that orchestrates:

  1. Progress display (Stage: id [n/total])
  2. Status update to running
  3. Start time recording + event emission
  4. GitHub Check Run in_progress update
  5. Audit trail stage.start emission
  6. Delegate to run_stage_with_retry(stage_id)
  7. On success: mark complete, capture patterns (intake), timing, events, audit, vitals, UCB1 outcome, discovery broadcast, model routing log
  8. On failure: mark failed, error events, audit, vitals, UCB1 outcome, cancel remaining check runs

Sets LAST_STAGE_ERROR and LAST_STAGE_ERROR_CLASS on failure for caller consumption.
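The orchestration order above reduces to a short skeleton around `run_stage_with_retry`. This is a hedged sketch:

- Progress numbering assumes n = completed_count + 1.
- Every hook other than `run_stage_with_retry` goes through a `type` guard.
- `set_stage_status` and `_hook` are hypothetical names.
- The arguments passed to `audit_emit` and `gh_checks_stage_update` are assumed.
- Vitals, UCB1 outcome, discovery broadcast and check-run cancellation are left out for brevity.

```shell
#!/usr/bin/env bash
# Hypothetical fail-open hook runner (the type-guard pattern for optional modules).
_hook() { type "$1" >/dev/null 2>&1 && { "$@" >/dev/null 2>&1 || true; }; return 0; }

# Returns 0 on success, 1 on failure; sets LAST_STAGE_ERROR(_CLASS) on failure.
run_stage() {
    local stage_id="$1" enabled_count="${2:-0}" completed_count="${3:-0}"
    local start rc=0
    LAST_STAGE_ERROR=""
    LAST_STAGE_ERROR_CLASS=""

    echo "Stage: $stage_id [$((completed_count + 1))/$enabled_count]"   # 1. progress
    _hook set_stage_status "$stage_id" running                          # 2. status
    start="$(date +%s)"                                                 # 3. start time
    _hook gh_checks_stage_update "$stage_id" in_progress                # 4. check run
    _hook audit_emit stage.start "$stage_id"                            # 5. audit

    run_stage_with_retry "$stage_id" || rc=$?                           # 6. delegate

    local elapsed=$(( $(date +%s) - start ))
    if [ "$rc" -eq 0 ]; then                                            # 7. success
        _hook set_stage_status "$stage_id" complete
        _hook audit_emit stage.complete "$stage_id" "$elapsed"
        return 0
    fi
    _hook set_stage_status "$stage_id" failed                           # 8. failure
    _hook audit_emit stage.failed "$stage_id" "$elapsed"
    # Keep any details run_stage_with_retry recorded (assumption); else a generic one.
    LAST_STAGE_ERROR="${LAST_STAGE_ERROR:-stage $stage_id failed (exit $rc)}"
    LAST_STAGE_ERROR_CLASS="${LAST_STAGE_ERROR_CLASS:-unknown}"
    return 1
}
```

`rc` is captured with `|| rc=$?` rather than a bare call followed by `$?`. That way a caller running under `set -e` still reaches the failure branch.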

What stays inline in run_pipeline()

  • Self-healing build+test loop (lines 612-648): Tightly couples two stages with counter management (completed += 2). Extracting this would require passing mutable counter state through function boundaries, adding complexity for no testability gain.
  • Gate checks (lines 664-679): Interactive read prompt that controls pipeline pause/resume flow. It must stay in the loop because pausing works by returning 0 directly from run_pipeline(), which an extracted helper cannot do for its caller.
  • Budget enforcement (lines 681-692): Similar flow control — needs to pause the pipeline, not just skip a stage.
  • Intelligence skip evaluation (lines 577-586): Already a clean single function call; wrapping it adds nothing.
  • CI resume logic (lines 596-609): Artifact verification that may fall through to stage execution.
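The sketch below shows how the extracted helpers and the inline self-healing block could fit together. The stubs exist only so the snippet runs on its own. These parts are assumptions:

- The interface of `self_healing_build_test`.
- The hypothetical `SELF_HEALING` switch.
- Skipping the `test` iteration after self-healing has run.

Gate, budget, intelligence-skip and CI-resume checks are left out.

```shell
#!/usr/bin/env bash
# Stubs so the sketch runs standalone; the real functions live in the pipeline libs.
check_human_directives() { return 0; }
select_stage_model() { return 0; }
run_stage() { echo "run $1 [$(( $3 + 1 ))/$2]"; }
self_healing_build_test() { echo "self-heal build+test"; }

run_pipeline_sketch() {
    local enabled_count=$# completed=0 ran_test=false stage
    for stage in "$@"; do
        check_human_directives "$stage" || continue

        # Inline self-healing: one iteration runs two stages, so the counter
        # advances by 2 and `continue` keeps run_stage() from being reached.
        if [ "$stage" = "build" ] && [ "${SELF_HEALING:-true}" = "true" ]; then
            self_healing_build_test || return 1
            completed=$((completed + 2))
            ran_test=true
            continue
        fi
        # Assumption: test already ran inside self-healing, so skip it here.
        if [ "$stage" = "test" ] && [ "$ran_test" = "true" ]; then
            continue
        fi

        select_stage_model "$stage"
        run_stage "$stage" "$enabled_count" "$completed" || return 1
        completed=$((completed + 1))
    done
    echo "completed=$completed/$enabled_count"
}

run_pipeline_sketch intake plan build test review
```

With the stubs above, the final call prints `run intake [1/5]`, `run plan [2/5]`, `self-heal build+test`, `run review [5/5]`, then `completed=5/5`. If `run_stage()` were also invoked inside the build branch, the counter would overshoot. That is the coupling that keeps this block inline.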

Error Handling Strategy

Each function's error boundary and failure mode:

  • check_human_directives (fail-open): file errors are suppressed and the stage proceeds.
  • select_stage_model (fail-open): every path is guarded; it falls back to the _smart_model default, sonnet.
  • broadcast_stage_discovery (fire-and-forget): subprocess errors are suppressed with 2>/dev/null || true.
  • run_stage (fail-propagate): returns 1 on stage failure; the caller handles the pipeline-level response.

Variable Scope Contract

All new functions operate on globals already set by sw-pipeline.sh and pipeline-execution.sh. Each function uses ${VAR:-default} for every global reference, matching the existing convention in pipeline-stages.sh (lines 31-48). No new globals introduced.
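The ${VAR:-default} convention keeps these functions safe under set -u, where expanding an unset variable is a fatal error. `describe_stage_context` is a hypothetical helper used only to show the pattern. Its fallback values are illustrative, not the pipeline's real defaults.

```shell
#!/usr/bin/env bash
set -u   # treat unset variables as errors, as strict-mode scripts do

# Every global is read through ${VAR:-default}, so missing globals never abort.
describe_stage_context() {
    local stage_id="$1"
    echo "stage=$stage_id artifacts=${ARTIFACTS_DIR:-none} issue=${ISSUE_NUMBER:-none} model=${CLAUDE_MODEL:-sonnet}"
}

# A bare reference to an unset variable aborts the shell under set -u;
# running it in a subshell contains that failure so the demo keeps going.
if ( : "$SOME_UNSET_GLOBAL" ) 2>/dev/null; then
    echo "bare reference: ok"
else
    echo "bare reference: aborted (unbound variable)"
fi
```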

Alternatives Considered

  1. Per-Stage Wrappers (run_intake_stage(), etc.) — Pros: Matches issue acceptance criteria literally; each stage wrapper independently testable. Cons: 14 wrapper functions where 12 are identical boilerplate (only build/test have special logic); violates DRY; ~300 lines of duplicated orchestration. Rejected because the orchestration is stage-agnostic.

  2. Do Nothing — Pros: Zero regression risk; stages are already in separate files. Cons: run_pipeline() remains 390 lines mixing concerns; cross-cutting logic untestable in isolation. Rejected because the opportunity to improve testability is worth the moderate risk.

  3. Event-Driven Stage Lifecycle — Pros: Maximum decoupling via event emitters/listeners. Cons: Bash has no native event system; implementing one adds significant complexity for only 4 cross-cutting concerns. Rejected as overkill.

Implementation Plan

Files to create

  • scripts/sw-lib-pipeline-execution-test.sh — Unit tests for run_stage_with_retry, self_healing_build_test, and orchestration integration

Files to modify

  • scripts/lib/pipeline-stages.sh — Add 4 new functions (~180 lines)
  • scripts/lib/pipeline-execution.sh — Simplify run_pipeline() (~200 lines removed, ~40 added)
  • scripts/sw-lib-pipeline-stages-test.sh — Add ~20 unit tests (~250 lines)
  • package.json — Register sw-lib-pipeline-execution-test.sh

Dependencies

  • No new external dependencies
  • New functions depend on existing loaded modules: pipeline-state.sh, helpers.sh, compat.sh
  • All dependencies already sourced before pipeline-stages.sh in the load chain

Risk areas

1. Variable Scope Breakage (HIGH likelihood, MEDIUM impact). Extracted functions reference 15+ globals. If any of them is unset in a test context, the function aborts under set -u. Mitigation: every global uses ${VAR:-default}, and test setup mirrors the existing pipeline-stages-test.sh.

2. Self-Healing Counter Coupling (MEDIUM likelihood, HIGH impact). The build+test self-healing loop increments completed by 2. If run_stage() were accidentally called during self-healing, the counts would break. Mitigation: the self-healing block stays inline, and the existing continue on line 648 prevents run_stage() from being reached.

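3. Return vs Exit in Subshells (LOW likelihood, HIGH impact). If a new function is called inside a pipeline (|) or a command substitution ($(...)), it runs in a subshell. There, return and exit end only that subshell, and any globals it sets (CLAUDE_MODEL, LAST_STAGE_ERROR, LAST_STAGE_ERROR_CLASS) never reach the caller. Mitigation: all new functions are called directly, with no pipes or command substitution, and code review enforces this.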

4. Module Load Order (LOW likelihood, MEDIUM impact). New functions may call audit_emit and gh_checks_stage_update, which are loaded conditionally. Mitigation: all optional calls use type funcname >/dev/null 2>&1 && guards.
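Risk 3 comes down to a Bash rule. Command substitution always runs in a subshell, and so does each pipeline segment unless `shopt -s lastpipe` is in effect. A global set there never reaches the caller. `set_model_demo` is a hypothetical function used only to show the effect.

```shell
#!/usr/bin/env bash
# Hypothetical function that sets a global, as select_stage_model does.
set_model_demo() { CLAUDE_MODEL="$1"; return 0; }

CLAUDE_MODEL=sonnet

: "$(set_model_demo opus)"             # command substitution: runs in a subshell
echo "after \$(...): $CLAUDE_MODEL"    # prints: after $(...): sonnet

echo opus | { read -r m; set_model_demo "$m"; }   # pipeline segment: also a subshell
echo "after pipe: $CLAUDE_MODEL"       # prints: after pipe: sonnet

set_model_demo opus                     # direct call: runs in the current shell
echo "direct call: $CLAUDE_MODEL"      # prints: direct call: opus
```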

Validation Criteria

  • check_human_directives(), select_stage_model(), broadcast_stage_discovery(), run_stage() exist in scripts/lib/pipeline-stages.sh
  • run_pipeline() reduced by ~200 lines (from ~390 to ~190 in the stage loop)
  • run_pipeline() calls all 4 new functions (verified by grep)
  • All functions use ${VAR:-default} for every global variable reference
  • All optional module calls use type funcname >/dev/null 2>&1 guards
  • Self-healing build+test loop remains inline (not extracted)
  • No new subshells introduced for extracted functions
  • Unit tests cover happy path, error path, and missing dependencies for each function
  • sw-lib-pipeline-stages-test.sh passes with ~20 new tests
  • sw-lib-pipeline-execution-test.sh registered in package.json
  • sw-pipeline-test.sh (58 tests) passes without modification
  • sw-e2e-smoke-test.sh (19 tests) passes without modification
  • No Bash 3.2 incompatibilities introduced
  • CLAUDE.md Shared Libraries table updated
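Two of the grep-based criteria (all 4 calls present, no new subshells) can be scripted. This is a sketch that assumes the calls live in scripts/lib/pipeline-execution.sh, the file named in the implementation plan. `verify_wiring` is hypothetical. It scans the whole file rather than only the body of run_pipeline(). It detects $(...) subshell calls but not pipes.

```shell
#!/usr/bin/env bash
# Hypothetical check: every extracted function is called, and none via $(...).
verify_wiring() {
    local file="${1:-scripts/lib/pipeline-execution.sh}" fn rc=0
    [ -r "$file" ] || { echo "cannot read $file"; return 1; }
    for fn in check_human_directives select_stage_model broadcast_stage_discovery run_stage; do
        # -w: whole-word match, so run_stage_with_retry does not count as run_stage
        if ! grep -qw "$fn" "$file"; then
            echo "missing call: $fn"
            rc=1
        fi
        # A literal "$(fn" means the function would run in a subshell.
        if grep -qF "\$($fn" "$file"; then
            echo "subshell call: $fn"
            rc=1
        fi
    done
    return "$rc"
}
```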
