Pipeline Plan 189

Plan written to .claude/pipeline-artifacts/plan.md.

Summary of the plan:

The issue assumed sw-pipeline.sh was ~2839 lines, but prior work already decomposed it to 279 lines with stages in lib/pipeline-stages-*.sh. The remaining work is extracting cross-cutting concerns from run_pipeline() in pipeline-execution.sh (~390 lines of mixed orchestration):

4 new functions in pipeline-stages.sh:

check_human_directives() — skip-stage/human-message file handling
select_stage_model() — UCB1/A/B intelligence model routing
broadcast_stage_discovery() — post-success cross-pipeline learning
run_stage() — unified stage runner (timing, audit, checks, retry, events)

Key design decision: Extract composable middleware functions (Option B) rather than 14 per-stage wrappers (Option A), since most stages share identical orchestration — only build/test have special self-healing logic.

10 implementation tasks, 2 new/modified test files, ~20 new unit tests, ~200 lines removed from run_pipeline().

File	Lines	Functions
...	...	...mpound_quality(), stage_audit()
lib/pipeline-stages-delivery.sh	965	stage_pr(), stage_merge(), stage_deploy()
lib/pipeline-stages-monitor.sh	407	stage_validate(), stage_monitor()

Stage implementations are already in domain-specific lib files. The acceptance criteria "New lib/pipeline-stages.sh with functions: run_intake_stage, run_plan_stage, etc." and "sw-pipeline.sh reduced to <1500 lines" are already satisfied in spirit.

Remaining Work

What's NOT yet decomposed is in pipeline-execution.sh — the run_pipeline() function (lines 507-897, ~390 lines) mixes generic orchestration with stage-specific logic:

Build-specific self-healing (lines 612-648): Self-healing build+test loop with TDD integration
Stage-specific test skip (lines 656-662): Skip test stage when handled by self-healing
Model routing (lines 694-763): ~70 lines of intelligence model routing per stage
Post-success callbacks (lines 789-821): Stage-specific discovery broadcasting, memory capture
Human intervention (lines 542-569): Skip-stage directives, human messages

These can be extracted into composable functions in lib/pipeline-stages.sh, creating the run_<stage>_stage() wrappers the issue requests.

Design Alternatives Considered

Option A: Create `run_<stage>_stage()` wrappers per stage (14 functions)

Pros: Matches issue acceptance criteria exactly; each stage independently testable
Cons: Lots of boilerplate — most wrappers would just call stage_<id>() since only build/test have special logic
Rejected: Too much duplication for minimal benefit

Option B: Extract cross-cutting concerns into composable middleware functions

Pros: DRY; each concern (model routing, human intervention, GitHub checks, discovery) becomes a testable function; run_pipeline() becomes a clean loop
Cons: Slightly more complex function signatures
Chosen: This approach

Option C: Do nothing — decomposition is already done

Pros: No risk of regression
Cons: Doesn't satisfy issue acceptance criteria; misses opportunity to simplify run_pipeline()
Rejected: Issue requests specific deliverables

Component Diagram

sw-pipeline.sh (CLI router, 279 lines)
    |
    +-- lib/pipeline-cli.sh (argument parsing)
    +-- lib/pipeline-commands.sh (start/resume/status/abort)
    |       |
    |       +-- lib/pipeline-execution.sh (orchestration loop)
    |               |
    |               +-- lib/pipeline-stages.sh (stage runner + helpers) <-- MODIFY
    |               |       |
    |               |       +-- run_stage()          <-- NEW: unified stage runner
    |               |       +-- select_stage_model() <-- NEW: model routing extract
    |               |       +-- check_human_directives() <-- NEW: human intervention
    |               |       +-- broadcast_stage_discovery() <-- NEW: post-success hook
    |               |       |
    |               |       +-- lib/pipeline-stages-intake.sh (stage_intake, stage_plan, ...)
    |               |       +-- lib/pipeline-stages-build.sh (stage_build, stage_test, ...)
    |               |       +-- lib/pipeline-stages-review.sh (stage_review, ...)
    |               |       +-- lib/pipeline-stages-delivery.sh (stage_pr, stage_merge, ...)
    |               |       +-- lib/pipeline-stages-monitor.sh (stage_validate, stage_monitor)
    |               |
    |               +-- lib/pipeline-state.sh (state persistence)
    |
    +-- (intelligence, GitHub, memory modules)

Interface Contracts

# NEW: Unified stage runner — encapsulates timing, checks, audit, discovery
# Called by run_pipeline() for each stage. Returns 0 on success, 1 on failure.
# Side effects: updates stage status, emits events, writes model-routing.log
run_stage() {
    local stage_id="$1"          # e.g. "intake", "build", "plan"
    local enabled_count="$2"     # total enabled stages (for progress display)
    local completed_count="$3"   # stages completed so far
    # Returns: 0=success, 1=failure
    # Sets: LAST_STAGE_ERROR, LAST_STAGE_ERROR_CLASS (on failure)
}

# NEW: Intelligence model routing for a stage
# Extracts the ~70 lines of UCB1/A/B model routing from run_pipeline()
select_stage_model() {
    local stage_id="$1"
    # Side effect: exports CLAUDE_MODEL, emits intelligence.model_* events
    # Returns: 0 always (model selection is best-effort)
}

# NEW: Check for human intervention directives (skip-stage, human-message)
# Returns 0 if stage should proceed, 1 if skipped
check_human_directives() {
    local stage_id="$1"
    # Returns: 0=proceed, 1=skipped (emits stage.skipped event)
}

# NEW: Post-success discovery broadcast
broadcast_stage_discovery() {
    local stage_id="$1"
    # Side effect: calls sw-discovery.sh broadcast with stage-appropriate patterns
}

Data Flow

run_pipeline() iterates stages from PIPELINE_CONFIG
    |
    +-- For each stage:
    |   +-- check_human_directives(stage_id) -> skip or proceed
    |   +-- check intelligence skip (existing pipeline_should_skip_stage)
    |   +-- check already complete (get_stage_status)
    |   +-- Handle self-healing build+test special case
    |   +-- Gate check (approve/auto)
    |   +-- Budget check
    |   +-- select_stage_model(stage_id) -> sets CLAUDE_MODEL
    |   +-- run_stage(stage_id, counts) -> timing, retry, events, audit, discovery
    |
    +-- Pipeline summary on completion
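
As a rough illustration of the flow above, the simplified loop might look like the sketch below. It is not the actual run_pipeline(): the three stubs are hypothetical stand-ins so the sketch runs on its own, and the intelligence-skip, already-complete, self-healing, gate, and budget checks are left out for brevity.

```shell
#!/usr/bin/env bash
# Hypothetical stubs; the real functions would live in lib/pipeline-stages.sh.
check_human_directives() { [ "$1" != "review" ]; }   # pretend a human skipped "review"
select_stage_model()     { CLAUDE_MODEL="sonnet"; export CLAUDE_MODEL; return 0; }
run_stage()              { echo "running $1 ($3/$2 done) with $CLAUDE_MODEL"; }

run_pipeline_sketch() {
    local stages="$1" stage_id enabled=0 completed=0
    for stage_id in $stages; do enabled=$((enabled + 1)); done
    for stage_id in $stages; do
        # A skipped stage is not a failure; move on to the next one.
        if ! check_human_directives "$stage_id"; then
            echo "skipped $stage_id"
            continue
        fi
        select_stage_model "$stage_id"
        # A failed stage stops the loop; the caller decides how to respond.
        run_stage "$stage_id" "$enabled" "$completed" || return 1
        completed=$((completed + 1))
    done
    echo "completed $completed of $enabled stages"
}

run_pipeline_sketch "intake plan review"
```

The point of the shape is that each concern is one call, so the loop body stays short enough to read in one pass.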

Error Boundaries

run_stage() catches stage failures via run_stage_with_retry(), records error class/message, emits events. Returns 1 on failure — run_pipeline() handles pipeline-level failure response.
select_stage_model() is best-effort — all failures suppressed with || true. Defaults to _smart_model default sonnet.
check_human_directives() is fail-safe — file read errors suppressed. Worst case: stage proceeds normally.
broadcast_stage_discovery() is fire-and-forget — all calls guarded with 2>/dev/null || true.
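
The fire-and-forget guard described above can be sketched as follows. flaky_broadcast and notify_best_effort are hypothetical names used only for illustration; the pattern itself (2>/dev/null || true) is the one the plan names.

```shell
# Hypothetical helper that always fails, standing in for an optional module.
flaky_broadcast() { echo "broadcast failed" >&2; return 1; }

notify_best_effort() {
    # stderr is discarded and the failure is swallowed, so the caller never sees it.
    flaky_broadcast "$1" 2>/dev/null || true
}

(
    set -e                      # even with errexit on, the guarded call cannot abort the shell
    notify_best_effort "build"
    echo "pipeline continues"
)
```

The trade-off is that real failures become invisible, which is acceptable only for concerns the pipeline can run without.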

Files to Modify

File	Action	Lines Changed (est.)
`scripts/lib/pipeline-stages.sh`	Add `run_stage()`, `select_stage_model()`, `check_human_directives()`, `broadcast_stage_discovery()`	+180
`scripts/lib/pipeline-execution.sh`	Simplify `run_pipeline()` to use new functions	-200, +40
`scripts/sw-lib-pipeline-stages-test.sh`	Add unit tests for all new functions	+250
`scripts/sw-lib-pipeline-execution-test.sh`	New test file for execution orchestration	+200 (new file)

Implementation Steps

Step 1: Extract `check_human_directives()` into `pipeline-stages.sh`

Move lines 542-569 from run_pipeline() into a standalone function. Lowest-risk extraction — pure file reads with no complex state.
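
A minimal sketch of what the extracted function could look like, assuming one stage name per line in the skip file. The file names skip-stage.txt and human-message.txt and the default directory are assumptions for illustration; the actual names used in run_pipeline() are not shown in this plan.

```shell
check_human_directives_sketch() {
    local stage_id="$1"
    local dir="${ARTIFACTS_DIR:-.claude/pipeline-artifacts}"
    local skip_file="$dir/skip-stage.txt"        # hypothetical file name
    local msg_file="$dir/human-message.txt"      # hypothetical file name

    # Surface any human message; read errors are ignored (fail-safe).
    if [ -s "$msg_file" ]; then
        echo "human message: $(cat "$msg_file" 2>/dev/null || true)"
    fi

    # If the stage is listed, consume the directive and report "skipped".
    if [ -f "$skip_file" ] && grep -qxF "$stage_id" "$skip_file" 2>/dev/null; then
        grep -vxF "$stage_id" "$skip_file" > "$skip_file.tmp" 2>/dev/null || true
        mv "$skip_file.tmp" "$skip_file" 2>/dev/null || true
        return 1
    fi
    return 0
}
```

Removing the stage from the skip file makes the directive one-shot: a second run of the same stage proceeds normally.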

Step 2: Extract `select_stage_model()` into `pipeline-stages.sh`

Move lines 694-763 from run_pipeline(). Most complex extraction — UCB1 + A/B testing logic.
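
For readers unfamiliar with UCB1, the sketch below shows the textbook selection rule (mean reward plus an exploration bonus of sqrt(2 ln N / n)) wrapped in a best-effort function. It is not the existing routing code: the stats-file format ("name pulls total_reward" per line), the function names, and the model names other than sonnet are assumptions, and the A/B path is omitted.

```shell
# Pick a model with UCB1 from lines of "name pulls total_reward" on stdin.
ucb1_pick() {
    awk '
        { name[NR] = $1; pulls[NR] = $2; reward[NR] = $3; total += $2 }
        END {
            best = ""; best_score = -1
            for (i = 1; i <= NR; i++) {
                if (pulls[i] == 0) { print name[i]; exit }   # try untested arms first
                score = reward[i] / pulls[i] + sqrt(2 * log(total) / pulls[i])
                if (score > best_score) { best_score = score; best = name[i] }
            }
            print best
        }'
}

select_stage_model_sketch() {
    local stats_file="$1" picked
    # Best-effort: a missing or empty stats file falls back to the default model.
    picked="$(ucb1_pick 2>/dev/null < "$stats_file" || true)"
    CLAUDE_MODEL="${picked:-sonnet}"
    export CLAUDE_MODEL
    return 0
}
```

Because the function always returns 0 and always leaves CLAUDE_MODEL set, the caller needs no error handling around it.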

Step 3: Extract `broadcast_stage_discovery()` into `pipeline-stages.sh`

Move lines 809-821 from run_pipeline(). Simple stage-to-pattern mapping + subprocess call.
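
The mapping-plus-subprocess shape could be sketched like this. The patterns, the argument order passed to the broadcast subcommand, and the DISCOVERY_CMD override are all assumptions made so the sketch can run without sw-discovery.sh; the real mapping lives in the lines being moved.

```shell
# Map a stage to the file patterns worth sharing (patterns are illustrative guesses).
stage_discovery_patterns() {
    case "$1" in
        plan)  echo "*.md" ;;
        build) echo "src/*" ;;
        test)  echo "*test*" ;;
        *)     echo "" ;;
    esac
}

broadcast_stage_discovery_sketch() {
    local stage_id="$1" patterns
    patterns="$(stage_discovery_patterns "$stage_id")"
    [ -n "$patterns" ] || return 0            # unknown stage: nothing to broadcast
    # DISCOVERY_CMD is a hypothetical override used for testing.
    "${DISCOVERY_CMD:-sw-discovery.sh}" broadcast "$stage_id" "$patterns" 2>/dev/null || true
}
```

Keeping the mapping in its own function lets the unit tests check each stage's patterns without spawning a subprocess.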

Step 4: Create `run_stage()` in `pipeline-stages.sh`

Composite function that wraps: timing, status updates, GitHub check runs, audit, retry, model outcome recording, vitals, and discovery broadcast. This is lines 766-852 of run_pipeline() refactored into a self-contained function.
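
A reduced sketch of the composite, covering only timing, an optional hook, the stage call, and error recording. stage_audit_hook and STAGE_DURATION are hypothetical names, and the single-argument call to run_stage_with_retry is an assumption; GitHub checks, vitals, and events are omitted.

```shell
run_stage_sketch() {
    local stage_id="$1" start end rc=0
    start="$(date +%s)"

    # Optional modules are called only when they are loaded.
    if type stage_audit_hook >/dev/null 2>&1; then   # hypothetical hook name
        stage_audit_hook "$stage_id" || true
    fi

    # Prefer the retry wrapper when present, else call stage_<id> directly.
    if type run_stage_with_retry >/dev/null 2>&1; then
        run_stage_with_retry "$stage_id" || rc=$?
    else
        "stage_${stage_id}" || rc=$?
    fi

    end="$(date +%s)"
    STAGE_DURATION=$((end - start))   # hypothetical; the STAGE_TIMINGS format is not shown here
    if [ "$rc" -ne 0 ]; then
        LAST_STAGE_ERROR="stage ${stage_id} exited with ${rc}"
        return 1
    fi
    return 0
}
```

Collapsing every non-zero status to 1 keeps the contract simple for run_pipeline(), while the original status survives in LAST_STAGE_ERROR.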

Step 5: Simplify `run_pipeline()` in `pipeline-execution.sh`

Replace inline logic with calls to new functions. Main loop calls check_human_directives, select_stage_model, and run_stage instead of having all that logic inline.

Step 6: Write unit tests for new functions in `sw-lib-pipeline-stages-test.sh`

check_human_directives: skip-stage file, human-message file, no files, malformed files
select_stage_model: UCB1 path, A/B test path, no intelligence functions, graduated path
broadcast_stage_discovery: each stage type maps to correct patterns
run_stage: success path, failure path, timing recorded, events emitted
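
One way to test such functions in isolation is to redefine their dependencies as recording mocks before calling them. The sketch below shows the pattern on a stand-in function; assert_eq, finish_stage, and the stage.completed event name are hypothetical and not taken from the existing test file.

```shell
# Minimal harness; assert_eq and the counters are hypothetical, not the project's helpers.
PASS=0; FAIL=0
assert_eq() {
    if [ "$2" = "$3" ]; then
        PASS=$((PASS + 1))
    else
        FAIL=$((FAIL + 1)); echo "FAIL: $1 (expected '$2', got '$3')"
    fi
}

# Stand-in function under test: emits an event only when the stage succeeds.
finish_stage() { "stage_$1" && emit_event "stage.completed" "$1"; }

# Mocks: record events instead of sending them, and fake the stage outcomes.
EVENTS=""
emit_event()  { EVENTS="${EVENTS}$1:$2 "; }
stage_plan()  { return 0; }
stage_build() { return 1; }

rc=0; finish_stage plan  || rc=$?; assert_eq "plan succeeds" "0" "$rc"
rc=0; finish_stage build || rc=$?; assert_eq "build fails"   "1" "$rc"
assert_eq "only plan emitted an event" "stage.completed:plan " "$EVENTS"
echo "passed=$PASS failed=$FAIL"
```

Because shell resolves function names at call time, the mocks can be defined after the function under test, which is what makes this style workable for sourced libraries.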

Step 7: Create `sw-lib-pipeline-execution-test.sh` for orchestration tests

New test file covering:

run_stage_with_retry: plan artifact skip, error classification, retry backoff
self_healing_build_test: convergence detection, cycle limits

Step 8: Run existing tests to verify no regression

./scripts/sw-lib-pipeline-stages-test.sh
./scripts/sw-pipeline-test.sh
./scripts/sw-e2e-smoke-test.sh

Step 9: Register new test in package.json

Add sw-lib-pipeline-execution-test.sh to the test scripts in package.json.

Step 10: Update CLAUDE.md documentation

Update the Architecture section to reflect the new function boundaries and the Shared Libraries table.

Task Checklist

Testing Approach

Test Pyramid Breakdown

Unit tests (~20 tests): Each new function tested in isolation with mocked dependencies
- check_human_directives: 4 tests (skip file, human message, no files, cleanup)
- select_stage_model: 5 tests (UCB1, A/B experiment, A/B control, graduated, no intelligence)
- broadcast_stage_discovery: 4 tests (plan, build, test, unknown stage)
- run_stage: 5 tests (success, failure, timing, audit, GitHub checks)
Integration tests (~5 tests): run_pipeline() with mocked stage functions to verify sequencing
E2E tests (existing): sw-pipeline-test.sh and sw-e2e-smoke-test.sh verify full pipeline flow

Coverage Targets

100% of new function code paths
Existing test suites pass without modification (critical requirement)

Critical Paths to Test

Happy path: Stage runs, succeeds, timing/events recorded, discovery broadcast
Error case 1: Stage fails, error classified, pipeline halts with correct status
Error case 2: Human skip directive removes stage from skip file
Edge case 1: select_stage_model() with no intelligence functions available
Edge case 2: run_stage() when audit/vitals/checks functions don't exist (optional modules)

Definition of Done

pipeline-stages.sh exports: check_human_directives, select_stage_model, broadcast_stage_discovery, run_stage
run_pipeline() in pipeline-execution.sh uses new functions (reduced by ~200 lines)
All 102+ existing test suites pass without modification
New unit tests pass for each extracted function
New sw-lib-pipeline-execution-test.sh registered in package.json
No performance regression: function call overhead is negligible for shell (no new subshells)
CLAUDE.md updated with new architecture

Failure Mode Analysis

1. Variable Scope Breakage (CRITICAL)

What could break: Extracted functions reference global variables (ARTIFACTS_DIR, ISSUE_NUMBER, CLAUDE_MODEL, etc.) that are set in sw-pipeline.sh. If a function is sourced in a different context (e.g., test env), variables may be unset.
Mitigation: Each new function will use ${VAR:-default} patterns for all global variable references, matching existing lib file conventions. Test environment sets all required globals (already shown in existing test setup).
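
A small sketch of the ${VAR:-default} pattern under set -u, where an unguarded reference to an unset variable would abort the shell. describe_context and the fallback values are hypothetical.

```shell
describe_context() {
    # Each global is read with a fallback, so sourcing this in a bare test shell is safe.
    local dir="${ARTIFACTS_DIR:-.claude/pipeline-artifacts}"
    local issue="${ISSUE_NUMBER:-unknown}"
    echo "artifacts=$dir issue=$issue"
}

(
    set -u                       # unset variables are normally fatal under nounset
    unset ARTIFACTS_DIR ISSUE_NUMBER
    describe_context             # prints: artifacts=.claude/pipeline-artifacts issue=unknown
)
```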

2. Self-Healing Build+Test Coupling

What could break: The build+test self-healing loop (lines 612-648) tightly couples build and test stages with counter management. Extracting this incorrectly could break the completed counter.
Mitigation: Leave the self-healing block inline in run_pipeline(); it's already a special case that interleaves two stages. Only extract the per-stage concerns (model routing, timing, discovery).

3. `return` vs `exit` in Sourced Functions

What could break: If a new function is invoked as part of a pipeline (|) or inside a subshell, exit terminates only that subshell rather than the pipeline script, and any globals the function sets (such as LAST_STAGE_ERROR) never reach the caller.
Mitigation: Maintain direct function calls (no pipes). All new functions use return 0/return 1 consistently. Test both success and failure paths.
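
The difference between a direct call and a piped call can be seen in a few lines. record_failure is a hypothetical function; the behaviour shown is standard bash, where the left side of a pipe runs in a subshell.

```shell
record_failure() {
    LAST_STAGE_ERROR="boom"
    return 1
}

# Direct call: the status and the variable both reach the caller.
LAST_STAGE_ERROR=""
direct_status=0
record_failure || direct_status=$?
direct_value="$LAST_STAGE_ERROR"

# Piped call: the function runs in a subshell, so the assignment is lost.
LAST_STAGE_ERROR=""
record_failure | cat || true
piped_value="$LAST_STAGE_ERROR"

echo "direct: status=$direct_status value='$direct_value'"
echo "piped:  value='$piped_value'"
```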

4. Module Load Order

What could break: New functions in pipeline-stages.sh reference functions from pipeline-state.sh, pipeline-intelligence-skip.sh, and helper modules. If sourced before those modules, functions will be undefined.
Mitigation: pipeline-stages.sh is already sourced AFTER all dependency modules in sw-pipeline.sh (line 52, after lines 37-50). New functions use type funcname >/dev/null 2>&1 guards before calling optional functions, matching existing patterns.
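
The type guard can be wrapped in a small helper, sketched below. call_if_defined, record_vitals, and record_audit_entry are hypothetical names used only to show the defined and undefined cases.

```shell
call_if_defined() {
    local fn="$1"; shift
    # `type` succeeds only if the name resolves to a function, builtin, or command.
    if type "$fn" >/dev/null 2>&1; then
        "$fn" "$@"
    else
        return 0        # optional module not loaded: skip quietly
    fi
}

record_vitals() { echo "vitals recorded for $1"; }   # hypothetical optional function

call_if_defined record_vitals build        # defined: runs
call_if_defined record_audit_entry build   # hypothetical, never defined: skipped
```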

Baseline Metrics

Performance profiling is not applicable for this refactoring because:

Shell function calls have negligible overhead (~0.1ms per call)
No new subshells, subprocesses, or file I/O are introduced
The refactoring is a code reorganization, not an algorithm change
Pipeline stage latency is dominated by Claude API calls (30s-300s per stage), not shell orchestration

The existing STAGE_TIMINGS mechanism already captures per-stage timing. We'll verify no regression by comparing before/after timings in the existing test suite.

Profiling Strategy

Not applicable — see Baseline Metrics above. The refactoring moves code between files without changing execution paths.

Optimization Targets

Not applicable — this is a refactoring for testability and maintainability, not performance.

Benchmark Plan

Run sw-pipeline-test.sh (58 tests) and sw-e2e-smoke-test.sh (19 tests) before and after. Both suites complete in <60s and exercise the full pipeline orchestration path. Any timing regression >5% would indicate an issue.
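
The 5% threshold could be checked with integer arithmetic, as in this sketch. within_five_percent and time_suite are hypothetical helpers; with whole-second resolution from date +%s, short suites will be noisy, so several runs per side would be needed for a meaningful comparison.

```shell
# Succeeds when `after` is within 5% of `before` (both in whole seconds).
within_five_percent() {
    local before="$1" after="$2"
    # Integer form of after <= before * 1.05, avoiding floating point.
    [ $((after * 100)) -le $((before * 105)) ]
}

# Print the wall-clock seconds a command takes, ignoring its output and status.
time_suite() {
    local start end
    start="$(date +%s)"
    "$@" >/dev/null 2>&1 || true
    end="$(date +%s)"
    echo $((end - start))
}
```

Usage would be along the lines of before="$(time_suite ./scripts/sw-pipeline-test.sh)" on the base branch, the same on the refactored branch, then within_five_percent "$before" "$after".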

Root Cause Hypothesis

Stale issue metadata (Likelihood: HIGH) — The issue was created when sw-pipeline.sh was monolithic. Prior work already extracted stages. Evidence: sw-pipeline.sh is 279 lines, stage files exist.
Partial extraction (Likelihood: HIGH) — Stages were extracted but orchestration concerns weren't. Evidence: run_pipeline() still has ~390 lines of mixed concerns.
Scope creep in orchestrator (Likelihood: MEDIUM) — New features (model routing, discovery, human intervention) were added to run_pipeline() after the initial extraction, re-bloating it.

Evidence Gathered

sw-pipeline.sh: 279 lines (confirmed via wc -l)
lib/pipeline-execution.sh: 897 lines, with run_pipeline() spanning lines 507-897
Stage functions (stage_intake, stage_plan, etc.) already exist in lib/pipeline-stages-*.sh
Existing sw-lib-pipeline-stages-test.sh tests helper functions but not orchestration
No existing sw-lib-pipeline-execution-test.sh

Fix Strategy

Complete the decomposition by extracting cross-cutting concerns from run_pipeline() into testable functions in pipeline-stages.sh. This differs from the original issue's assumption (extract stages from sw-pipeline.sh) because stages are already extracted — the remaining work is orchestration cleanup.

Verification Plan

Run existing tests BEFORE changes to establish baseline
Extract one function at a time, running tests after each
Run full test suite after all extractions
Verify run_pipeline() line count decreased by ~200 lines
Verify new functions are independently testable
