Pipeline Plan 189
Plan written to .claude/pipeline-artifacts/plan.md.
Summary of the plan:
The issue assumed sw-pipeline.sh was ~2839 lines, but prior work already decomposed it to 279 lines with stages in lib/pipeline-stages-*.sh. The remaining work is extracting cross-cutting concerns from run_pipeline() in pipeline-execution.sh (~390 lines of mixed orchestration):
4 new functions in pipeline-stages.sh:
- check_human_directives(): skip-stage/human-message file handling
- select_stage_model(): UCB1/A/B intelligence model routing
- broadcast_stage_discovery(): post-success cross-pipeline learning
- run_stage(): unified stage runner (timing, audit, checks, retry, events)
Key design decision: Extract composable middleware functions (Option B) rather than 14 per-stage wrappers (Option A), since most stages share identical orchestration — only build/test have special self-healing logic.
10 implementation tasks, 2 new/modified test files, ~20 new unit tests, ~200 lines removed from run_pipeline().
| File | Lines | Functions |
|---|---|---|
| … | … | …mpound_quality(), stage_audit() |
| lib/pipeline-stages-delivery.sh | 965 | stage_pr(), stage_merge(), stage_deploy() |
| lib/pipeline-stages-monitor.sh | 407 | stage_validate(), stage_monitor() |
Stage implementations are already in domain-specific lib files. The acceptance criteria "New lib/pipeline-stages.sh with functions: run_intake_stage, run_plan_stage, etc." and "sw-pipeline.sh reduced to <1500 lines" are already satisfied in spirit.
What's NOT yet decomposed is in pipeline-execution.sh — the run_pipeline() function (lines 507-897, ~390 lines) mixes generic orchestration with stage-specific logic:
- Build-specific self-healing (lines 612-648): Self-healing build+test loop with TDD integration
- Stage-specific test skip (lines 656-662): Skip test stage when handled by self-healing
- Model routing (lines 694-763): ~70 lines of intelligence model routing per stage
- Post-success callbacks (lines 789-821): Stage-specific discovery broadcasting, memory capture
- Human intervention (lines 542-569): Skip-stage directives, human messages
These can be extracted into composable functions in lib/pipeline-stages.sh, creating the run_<stage>_stage() wrappers the issue requests.
Option A (14 per-stage wrappers):
- Pros: Matches issue acceptance criteria exactly; each stage independently testable
- Cons: Lots of boilerplate; most wrappers would just call stage_<id>() since only build/test have special logic
- Rejected: Too much duplication for minimal benefit

Option B (composable middleware functions):
- Pros: DRY; each concern (model routing, human intervention, GitHub checks, discovery) becomes a testable function; run_pipeline() becomes a clean loop
- Cons: Slightly more complex function signatures
- Chosen: This approach

Option C (leave run_pipeline() as is):
- Pros: No risk of regression
- Cons: Doesn't satisfy issue acceptance criteria; misses opportunity to simplify run_pipeline()
- Rejected: Issue requests specific deliverables
sw-pipeline.sh (CLI router, 279 lines)
|
+-- lib/pipeline-cli.sh (argument parsing)
+-- lib/pipeline-commands.sh (start/resume/status/abort)
| |
| +-- lib/pipeline-execution.sh (orchestration loop)
| |
| +-- lib/pipeline-stages.sh (stage runner + helpers) <-- MODIFY
| | |
| | +-- run_stage() <-- NEW: unified stage runner
| | +-- select_stage_model() <-- NEW: model routing extract
| | +-- check_human_directives() <-- NEW: human intervention
| | +-- broadcast_stage_discovery() <-- NEW: post-success hook
| | |
| | +-- lib/pipeline-stages-intake.sh (stage_intake, stage_plan, ...)
| | +-- lib/pipeline-stages-build.sh (stage_build, stage_test, ...)
| | +-- lib/pipeline-stages-review.sh (stage_review, ...)
| | +-- lib/pipeline-stages-delivery.sh (stage_pr, stage_merge, ...)
| | +-- lib/pipeline-stages-monitor.sh (stage_validate, stage_monitor)
| |
| +-- lib/pipeline-state.sh (state persistence)
|
+-- (intelligence, GitHub, memory modules)
# NEW: Unified stage runner; encapsulates timing, checks, audit, discovery
# Called by run_pipeline() for each stage. Returns 0 on success, 1 on failure.
# Side effects: updates stage status, emits events, writes model-routing.log
run_stage() {
    local stage_id="$1"          # e.g. "intake", "build", "plan"
    local enabled_count="$2"     # total enabled stages (for progress display)
    local completed_count="$3"   # stages completed so far
    # Returns: 0=success, 1=failure
    # Sets: LAST_STAGE_ERROR, LAST_STAGE_ERROR_CLASS (on failure)
}

# NEW: Intelligence model routing for a stage
# Extracts the ~70 lines of UCB1/A/B model routing from run_pipeline()
select_stage_model() {
    local stage_id="$1"
    # Side effect: exports CLAUDE_MODEL, emits intelligence.model_* events
    # Returns: 0 always (model selection is best-effort)
}

# NEW: Check for human intervention directives (skip-stage, human-message)
# Returns 0 if stage should proceed, 1 if skipped
check_human_directives() {
    local stage_id="$1"
    # Returns: 0=proceed, 1=skipped (emits stage.skipped event)
}

# NEW: Post-success discovery broadcast
broadcast_stage_discovery() {
    local stage_id="$1"
    # Side effect: calls sw-discovery.sh broadcast with stage-appropriate patterns
}

run_pipeline() iterates stages from PIPELINE_CONFIG
|
+-- For each stage:
| +-- check_human_directives(stage_id) -> skip or proceed
| +-- check intelligence skip (existing pipeline_should_skip_stage)
| +-- check already complete (get_stage_status)
| +-- Handle self-healing build+test special case
| +-- Gate check (approve/auto)
| +-- Budget check
| +-- select_stage_model(stage_id) -> sets CLAUDE_MODEL
| +-- run_stage(stage_id, counts) -> timing, retry, events, audit, discovery
|
+-- Pipeline summary on completion
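The flow above can be sketched as a plain loop. This is a minimal sketch under stated assumptions: the three plan functions are replaced by hypothetical stubs, the intelligence-skip, gate, budget, and self-healing steps are left out, and run_pipeline_sketch with its space-separated stage list is an illustrative name rather than anything in the codebase. Only the call order reflects the plan.

```shell
# Hypothetical stubs standing in for the real functions.
check_human_directives() { [ "$1" != "review" ]; }   # pretend a human asked to skip "review"
select_stage_model()     { CLAUDE_MODEL="sonnet"; export CLAUDE_MODEL; return 0; }
run_stage()              { echo "ran $1 ($3/$2 done) with $CLAUDE_MODEL"; }

# Illustrative orchestration loop: directives, then model routing, then the stage itself.
run_pipeline_sketch() {
    local stages="$1" total=0 completed=0 id
    for id in $stages; do total=$((total + 1)); done
    for id in $stages; do
        if ! check_human_directives "$id"; then
            echo "skipped $id"
            continue
        fi
        select_stage_model "$id"                            # best-effort, always returns 0
        run_stage "$id" "$total" "$completed" || return 1   # failure is handled by the caller
        completed=$((completed + 1))
    done
    echo "completed $completed of $total"
}

run_pipeline_sketch "intake review build"
```

Because every step is a direct function call, each stub can be swapped for a mock in a test without touching the loop.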
- run_stage() catches stage failures via run_stage_with_retry(), records the error class/message, and emits events. It returns 1 on failure; run_pipeline() handles the pipeline-level failure response.
- select_stage_model() is best-effort: all failures are suppressed with || true. Defaults to _smart_model default sonnet.
- check_human_directives() is fail-safe: file read errors are suppressed. Worst case: the stage proceeds normally.
- broadcast_stage_discovery() is fire-and-forget: all calls are guarded with 2>/dev/null || true.
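The best-effort contract can be illustrated with a small sketch. Everything here is an assumption made for illustration: route_model_or_fail is a hypothetical router that always fails, and the fallback is hard-coded to sonnet instead of going through _smart_model. The point is only that a failing helper neither aborts the caller under set -e nor leaves CLAUDE_MODEL unset.

```shell
# Hypothetical router that always fails, standing in for the UCB1/A/B logic.
route_model_or_fail() { echo "router unavailable" >&2; return 1; }

select_stage_model_sketch() {
    local stage_id="$1" model
    # Failure and stderr noise are both suppressed; model stays empty on error.
    model=$(route_model_or_fail "$stage_id" 2>/dev/null) || true
    CLAUDE_MODEL="${model:-sonnet}"   # assumed default for this sketch
    export CLAUDE_MODEL
    return 0                          # best-effort: never fails the stage
}

select_stage_model_sketch build
echo "model=$CLAUDE_MODEL"
```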
| File | Action | Lines Changed (est.) |
|---|---|---|
| scripts/lib/pipeline-stages.sh | Add run_stage(), select_stage_model(), check_human_directives(), broadcast_stage_discovery() | +180 |
| scripts/lib/pipeline-execution.sh | Simplify run_pipeline() to use new functions | -200, +40 |
| scripts/sw-lib-pipeline-stages-test.sh | Add unit tests for all new functions | +250 |
| scripts/sw-lib-pipeline-execution-test.sh | New test file for execution orchestration | +200 (new file) |
- check_human_directives(): Move lines 542-569 from run_pipeline() into a standalone function. This is the lowest-risk extraction: pure file reads with no complex state.
- select_stage_model(): Move lines 694-763 from run_pipeline(). This is the most complex extraction because it carries the UCB1 and A/B testing logic.
- broadcast_stage_discovery(): Move lines 809-821 from run_pipeline(). A simple stage-to-pattern mapping plus a subprocess call.
- run_stage(): Composite function that wraps timing, status updates, GitHub check runs, audit, retry, model outcome recording, vitals, and discovery broadcast. This is lines 766-852 of run_pipeline() refactored into a self-contained function.
- run_pipeline(): Replace inline logic with calls to the new functions. The main loop calls check_human_directives, select_stage_model, and run_stage instead of carrying that logic inline.
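As a rough sketch of the "stage-to-pattern mapping plus a subprocess call" shape, the mapping can live in a case statement with the external call injected. The patterns below are invented for illustration, and DISCOVERY_CMD is a hypothetical hook that defaults to echo; the real code calls sw-discovery.sh broadcast, whose arguments are not shown in this plan.

```shell
# Illustrative mapping only; the real patterns live in run_pipeline() today.
stage_discovery_patterns() {
    case "$1" in
        plan)  echo "*.md" ;;
        build) echo "src/*" ;;
        test)  echo "*test*" ;;
        *)     return 1 ;;      # unknown stage: no patterns
    esac
}

broadcast_stage_discovery_sketch() {
    local stage_id="$1" patterns
    patterns=$(stage_discovery_patterns "$stage_id") || return 0   # nothing to broadcast
    # Fire-and-forget: errors from the broadcaster never fail the stage.
    ${DISCOVERY_CMD:-echo} "$stage_id" "$patterns" 2>/dev/null || true
}

broadcast_stage_discovery_sketch plan
broadcast_stage_discovery_sketch deploy   # unknown in this sketch, prints nothing
```

Injecting the command keeps the mapping testable without running a subprocess.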
- check_human_directives: skip-stage file, human-message file, no files, malformed files
- select_stage_model: UCB1 path, A/B test path, no intelligence functions, graduated path
- broadcast_stage_discovery: each stage type maps to correct patterns
- run_stage: success path, failure path, timing recorded, events emitted
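A unit test for the skip-stage case might look roughly like the following. This is a sketch, not the project's test harness: the skip-stage.txt file name, its one-stage-per-line format, the check_human_directives_sketch body, and the assert_rc helper are all assumptions made so the example is self-contained.

```shell
# Assumed behaviour: a stage listed in $ARTIFACTS_DIR/skip-stage.txt is skipped once,
# and the directive is removed from the file when it is consumed.
check_human_directives_sketch() {
    local stage_id="$1"
    local skip_file="${ARTIFACTS_DIR:-.}/skip-stage.txt"
    [ -f "$skip_file" ] || return 0                         # no directives: proceed
    if grep -qx "$stage_id" "$skip_file" 2>/dev/null; then
        grep -vx "$stage_id" "$skip_file" > "$skip_file.tmp" 2>/dev/null || true
        mv "$skip_file.tmp" "$skip_file"
        return 1                                            # skipped
    fi
    return 0
}

# usage: assert_rc <expected-status> <label> <command...>
assert_rc() {
    local expected="$1" label="$2" rc=0
    shift 2
    "$@" || rc=$?
    if [ "$rc" -eq "$expected" ]; then
        echo "PASS: $label"
    else
        echo "FAIL: $label (rc=$rc)"
        return 1
    fi
}

ARTIFACTS_DIR=$(mktemp -d)
assert_rc 0 "no files: stage proceeds" check_human_directives_sketch build
printf 'review\n' > "$ARTIFACTS_DIR/skip-stage.txt"
assert_rc 1 "skip file: listed stage is skipped" check_human_directives_sketch review
assert_rc 0 "directive consumed: second call proceeds" check_human_directives_sketch review
rm -rf "$ARTIFACTS_DIR"
```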
New test file covering:
- run_stage_with_retry: plan artifact skip, error classification, retry backoff
- self_healing_build_test: convergence detection, cycle limits
./scripts/sw-lib-pipeline-stages-test.sh
./scripts/sw-pipeline-test.sh
./scripts/sw-e2e-smoke-test.sh

Add sw-lib-pipeline-execution-test.sh to the test scripts in package.json.
Update the Architecture section and the Shared Libraries table in CLAUDE.md to reflect the new function boundaries.
- Task 1: Extract check_human_directives() from run_pipeline() into pipeline-stages.sh
- Task 2: Extract select_stage_model() from run_pipeline() into pipeline-stages.sh
- Task 3: Extract broadcast_stage_discovery() from run_pipeline() into pipeline-stages.sh
- Task 4: Create run_stage() composite function in pipeline-stages.sh
- Task 5: Refactor run_pipeline() in pipeline-execution.sh to use new functions
- Task 6: Add unit tests for new functions to sw-lib-pipeline-stages-test.sh
- Task 7: Create sw-lib-pipeline-execution-test.sh with orchestration tests
- Task 8: Register new test file in package.json
- Task 9: Run existing test suites to verify no regression
- Task 10: Update CLAUDE.md documentation with new architecture
- Unit tests (~20 tests): Each new function tested in isolation with mocked dependencies
  - check_human_directives: 4 tests (skip file, human message, no files, cleanup)
  - select_stage_model: 5 tests (UCB1, A/B experiment, A/B control, graduated, no intelligence)
  - broadcast_stage_discovery: 4 tests (plan, build, test, unknown stage)
  - run_stage: 5 tests (success, failure, timing, audit, GitHub checks)
- Integration tests (~5 tests): run_pipeline() with mocked stage functions to verify sequencing
- E2E tests (existing): sw-pipeline-test.sh and sw-e2e-smoke-test.sh verify full pipeline flow
- 100% of new function code paths
- Existing test suites pass without modification (critical requirement)
- Happy path: Stage runs, succeeds, timing/events recorded, discovery broadcast
- Error case 1: Stage fails, error classified, pipeline halts with correct status
- Error case 2: Human skip directive removes stage from skip file
- Edge case 1: select_stage_model() with no intelligence functions available
- Edge case 2: run_stage() when audit/vitals/checks functions don't exist (optional modules)
- pipeline-stages.sh exports: check_human_directives, select_stage_model, broadcast_stage_discovery, run_stage
- run_pipeline() in pipeline-execution.sh uses new functions (reduced by ~200 lines)
- All 102+ existing test suites pass without modification
- New unit tests pass for each extracted function
- New sw-lib-pipeline-execution-test.sh registered in package.json
- No performance regression: function call overhead is negligible for shell (no new subshells)
- CLAUDE.md updated with new architecture
What could break: Extracted functions reference global variables (ARTIFACTS_DIR, ISSUE_NUMBER, CLAUDE_MODEL, etc.) that are set in sw-pipeline.sh. If a function is sourced in a different context (e.g., test env), variables may be unset.
Mitigation: Each new function will use ${VAR:-default} patterns for all global variable references, matching existing lib file conventions. Test environment sets all required globals (already shown in existing test setup).
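The ${VAR:-default} convention can be shown with a short sketch. The function name and the specific defaults are assumptions chosen for illustration (the artifacts path and the sonnet default echo values mentioned elsewhere in this plan); the point is that the function stays safe under set -u when sourced in a bare test environment.

```shell
# Hypothetical helper: reads globals defensively so it works even when
# sw-pipeline.sh has not set them (for example inside a unit test).
describe_context() {
    local artifacts="${ARTIFACTS_DIR:-.claude/pipeline-artifacts}"   # assumed default
    local issue="${ISSUE_NUMBER:-unknown}"                           # assumed default
    local model="${CLAUDE_MODEL:-sonnet}"                            # assumed default
    echo "artifacts=$artifacts issue=$issue model=$model"
}

# Runs in a subshell so the unset/set -u do not leak into the caller.
( set -u; unset ARTIFACTS_DIR ISSUE_NUMBER CLAUDE_MODEL; describe_context )
# prints: artifacts=.claude/pipeline-artifacts issue=unknown model=sonnet
```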
What could break: The build+test self-healing loop (lines 612-648) tightly couples build and test stages with counter management. Extracting this incorrectly could break the completed counter.
Mitigation: Leave the self-healing block inline in run_pipeline() — it's already a special case that interleaves two stages. Only extract the per-stage concerns (model routing, timing, discovery).
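What could break: A function called as part of a pipeline (|) or inside a subshell runs in a child shell, so its return only leaves that child. Variables it sets (such as LAST_STAGE_ERROR) are lost, and in a pipeline its status is masked by that of the last command unless pipefail is set.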
Mitigation: Maintain direct function calls (no pipes). All new functions use return 0/return 1 consistently. Test both success and failure paths.
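The difference between a piped call and a direct call is easy to demonstrate. In this sketch set_error is a hypothetical stand-in for a stage function that records an error and returns 1; it assumes pipefail is not enabled.

```shell
# Hypothetical failing stage: records an error message and returns non-zero.
set_error() {
    LAST_STAGE_ERROR="build failed"
    return 1
}

LAST_STAGE_ERROR=""
set_error | cat                 # the function runs in a child shell of the pipeline
piped_rc=$?                     # status of cat, not of set_error
piped_err="$LAST_STAGE_ERROR"   # still empty: the assignment happened in the child

direct_rc=0
set_error || direct_rc=$?       # direct call: status and variable both reach the caller
direct_err="$LAST_STAGE_ERROR"

echo "piped:  rc=$piped_rc err='$piped_err'"
echo "direct: rc=$direct_rc err='$direct_err'"
# prints:
# piped:  rc=0 err=''
# direct: rc=1 err='build failed'
```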
What could break: New functions in pipeline-stages.sh reference functions from pipeline-state.sh, pipeline-intelligence-skip.sh, and helper modules. If sourced before those modules, functions will be undefined.
Mitigation: pipeline-stages.sh is already sourced AFTER all dependency modules in sw-pipeline.sh (line 52, after lines 37-50). New functions use type funcname >/dev/null 2>&1 guards before calling optional functions, matching existing patterns.
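The type guard can be wrapped in a small helper, sketched below. call_if_defined and record_vitals are hypothetical names used only for this example. Note that type also succeeds for builtins and external commands, so the guard protects against missing names rather than verifying that the name is a shell function.

```shell
# Hypothetical wrapper: call an optional function only if the name resolves.
call_if_defined() {
    local fn="$1"
    shift
    if type "$fn" >/dev/null 2>&1; then
        "$fn" "$@"
    else
        return 0        # optional module not loaded: silently skip
    fi
}

record_vitals() { echo "vitals recorded for $1"; }   # stand-in for an optional module

call_if_defined record_vitals build                  # prints: vitals recorded for build
call_if_defined audit_module_not_loaded_xyz build    # prints nothing, returns 0
```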
Performance profiling is not applicable for this refactoring because:
- Shell function calls have negligible overhead (~0.1ms per call)
- No new subshells, subprocesses, or file I/O are introduced
- The refactoring is a code reorganization, not an algorithm change
- Pipeline stage latency is dominated by Claude API calls (30s-300s per stage), not shell orchestration
The existing STAGE_TIMINGS mechanism already captures per-stage timing. We'll verify no regression by comparing before/after timings in the existing test suite.
Not applicable — see Baseline Metrics above. The refactoring moves code between files without changing execution paths.
Not applicable — this is a refactoring for testability and maintainability, not performance.
Run sw-pipeline-test.sh (58 tests) and sw-e2e-smoke-test.sh (19 tests) before and after. Both suites complete in <60s and exercise the full pipeline orchestration path. Any timing regression >5% would indicate an issue.
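The 5% threshold can be checked with integer arithmetic, since shell has no floating point. This is a sketch with a hypothetical helper name; it treats durations as whole seconds and flags a regression only when the slowdown is strictly greater than the threshold.

```shell
# usage: timing_regressed <before_seconds> <after_seconds> [threshold_percent]
# Returns 0 (true) when after > before * (1 + threshold/100).
timing_regressed() {
    local before="$1" after="$2" threshold="${3:-5}"
    [ $((after * 100)) -gt $((before * (100 + threshold))) ]
}

if timing_regressed 40 43; then
    echo "regression: 40s -> 43s exceeds 5%"
fi
if ! timing_regressed 40 42; then
    echo "ok: 40s -> 42s is within 5%"
fi
```

The before and after values would come from timing each suite, for example by recording date +%s immediately before and after the run.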
- Stale issue metadata (Likelihood: HIGH): The issue was created when sw-pipeline.sh was monolithic. Prior work already extracted stages. Evidence: sw-pipeline.sh is 279 lines, stage files exist.
- Partial extraction (Likelihood: HIGH): Stages were extracted but orchestration concerns weren't. Evidence: run_pipeline() still has ~390 lines of mixed concerns.
- Scope creep in orchestrator (Likelihood: MEDIUM): New features (model routing, discovery, human intervention) were added to run_pipeline() after the initial extraction, re-bloating it.
- sw-pipeline.sh: 279 lines (confirmed via wc -l)
- lib/pipeline-execution.sh: 897 lines, with run_pipeline() spanning lines 507-897
- Stage functions (stage_intake, stage_plan, etc.) already exist in lib/pipeline-stages-*.sh
- Existing sw-lib-pipeline-stages-test.sh tests helper functions but not orchestration
- No existing sw-lib-pipeline-execution-test.sh
Complete the decomposition by extracting cross-cutting concerns from run_pipeline() into testable functions in pipeline-stages.sh. This differs from the original issue's assumption (extract stages from sw-pipeline.sh) because stages are already extracted — the remaining work is orchestration cleanup.
- Run existing tests BEFORE changes to establish baseline
- Extract one function at a time, running tests after each
- Run full test suite after all extractions
- Verify run_pipeline() line count decreased by ~200 lines
- Verify new functions are independently testable