
Pipeline Plan 189

Seth Ford edited this page Apr 5, 2026 · 2 revisions

Plan written to .claude/pipeline-artifacts/plan.md.

Summary of the plan:

The issue assumed sw-pipeline.sh was ~2839 lines, but prior work already decomposed it to 279 lines with stages in lib/pipeline-stages-*.sh. The remaining work is extracting cross-cutting concerns from run_pipeline() in pipeline-execution.sh (~390 lines of mixed orchestration):

4 new functions in pipeline-stages.sh:

  1. check_human_directives() — skip-stage/human-message file handling
  2. select_stage_model() — UCB1/A/B intelligence model routing
  3. broadcast_stage_discovery() — post-success cross-pipeline learning
  4. run_stage() — unified stage runner (timing, audit, checks, retry, events)

Key design decision: Extract composable middleware functions (Option B) rather than 14 per-stage wrappers (Option A), since most stages share identical orchestration — only build/test have special self-healing logic.

10 implementation tasks, 2 new/modified test files, ~20 new unit tests, ~200 lines removed from run_pipeline().

…mpound_quality(), stage_audit()
lib/pipeline-stages-delivery.sh | 965 | stage_pr(), stage_merge(), stage_deploy()
lib/pipeline-stages-monitor.sh | 407 | stage_validate(), stage_monitor()

Stage implementations are already in domain-specific lib files. The acceptance criteria "New lib/pipeline-stages.sh with functions: run_intake_stage, run_plan_stage, etc." and "sw-pipeline.sh reduced to <1500 lines" are already satisfied in spirit.

Remaining Work

What's NOT yet decomposed is in pipeline-execution.sh — the run_pipeline() function (lines 507-897, ~390 lines) mixes generic orchestration with stage-specific logic:

  1. Build-specific self-healing (lines 612-648): Self-healing build+test loop with TDD integration
  2. Stage-specific test skip (lines 656-662): Skip test stage when handled by self-healing
  3. Model routing (lines 694-763): ~70 lines of intelligence model routing per stage
  4. Post-success callbacks (lines 789-821): Stage-specific discovery broadcasting, memory capture
  5. Human intervention (lines 542-569): Skip-stage directives, human messages

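Items 3-5 can be extracted into composable functions in lib/pipeline-stages.sh, which serves the intent of the run_<stage>_stage() wrappers the issue requests. The self-healing logic (items 1-2) stays inline in run_pipeline(), for the reasons given under Failure Mode Analysis.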

Design Alternatives Considered

Option A: Create run_<stage>_stage() wrappers per stage (14 functions)

  • Pros: Matches issue acceptance criteria exactly; each stage independently testable
  • Cons: Lots of boilerplate — most wrappers would just call stage_<id>() since only build/test have special logic
  • Rejected: Too much duplication for minimal benefit

Option B: Extract cross-cutting concerns into composable middleware functions

  • Pros: DRY; each concern (model routing, human intervention, GitHub checks, discovery) becomes a testable function; run_pipeline() becomes a clean loop
  • Cons: Slightly more complex function signatures
  • Chosen: This approach

Option C: Do nothing — decomposition is already done

  • Pros: No risk of regression
  • Cons: Doesn't satisfy issue acceptance criteria; misses opportunity to simplify run_pipeline()
  • Rejected: Issue requests specific deliverables

Component Diagram

sw-pipeline.sh (CLI router, 279 lines)
    |
    +-- lib/pipeline-cli.sh (argument parsing)
    +-- lib/pipeline-commands.sh (start/resume/status/abort)
    |       |
    |       +-- lib/pipeline-execution.sh (orchestration loop)
    |               |
    |               +-- lib/pipeline-stages.sh (stage runner + helpers) <-- MODIFY
    |               |       |
    |               |       +-- run_stage()          <-- NEW: unified stage runner
    |               |       +-- select_stage_model() <-- NEW: model routing extract
    |               |       +-- check_human_directives() <-- NEW: human intervention
    |               |       +-- broadcast_stage_discovery() <-- NEW: post-success hook
    |               |       |
    |               |       +-- lib/pipeline-stages-intake.sh (stage_intake, stage_plan, ...)
    |               |       +-- lib/pipeline-stages-build.sh (stage_build, stage_test, ...)
    |               |       +-- lib/pipeline-stages-review.sh (stage_review, ...)
    |               |       +-- lib/pipeline-stages-delivery.sh (stage_pr, stage_merge, ...)
    |               |       +-- lib/pipeline-stages-monitor.sh (stage_validate, stage_monitor)
    |               |
    |               +-- lib/pipeline-state.sh (state persistence)
    |
    +-- (intelligence, GitHub, memory modules)

Interface Contracts

# NEW: Unified stage runner — encapsulates timing, checks, audit, discovery
# Called by run_pipeline() for each stage. Returns 0 on success, 1 on failure.
# Side effects: updates stage status, emits events, writes model-routing.log
run_stage() {
    local stage_id="$1"          # e.g. "intake", "build", "plan"
    local enabled_count="$2"     # total enabled stages (for progress display)
    local completed_count="$3"   # stages completed so far
    # Returns: 0=success, 1=failure
    # Sets: LAST_STAGE_ERROR, LAST_STAGE_ERROR_CLASS (on failure)
}

# NEW: Intelligence model routing for a stage
# Extracts the ~70 lines of UCB1/A/B model routing from run_pipeline()
select_stage_model() {
    local stage_id="$1"
    # Side effect: exports CLAUDE_MODEL, emits intelligence.model_* events
    # Returns: 0 always (model selection is best-effort)
}

# NEW: Check for human intervention directives (skip-stage, human-message)
# Returns 0 if stage should proceed, 1 if skipped
check_human_directives() {
    local stage_id="$1"
    # Returns: 0=proceed, 1=skipped (emits stage.skipped event)
}

# NEW: Post-success discovery broadcast
broadcast_stage_discovery() {
    local stage_id="$1"
    # Side effect: calls sw-discovery.sh broadcast with stage-appropriate patterns
}

Data Flow

run_pipeline() iterates stages from PIPELINE_CONFIG
    |
    +-- For each stage:
    |   +-- check_human_directives(stage_id) -> skip or proceed
    |   +-- check intelligence skip (existing pipeline_should_skip_stage)
    |   +-- check already complete (get_stage_status)
    |   +-- Handle self-healing build+test special case
    |   +-- Gate check (approve/auto)
    |   +-- Budget check
    |   +-- select_stage_model(stage_id) -> sets CLAUDE_MODEL
    |   +-- run_stage(stage_id, counts) -> timing, retry, events, audit, discovery
    |
    +-- Pipeline summary on completion
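
The flow above can be sketched as a loop. This is a rough illustration only: the three helper bodies are hypothetical stand-ins so the snippet runs on its own, the stage list is passed as arguments rather than read from PIPELINE_CONFIG, and the intelligence-skip, gate, budget, and self-healing steps are left out.

```shell
# Stand-ins for the functions planned for lib/pipeline-stages.sh.
check_human_directives() { [ "$1" != "review" ]; }   # pretend a human asked to skip "review"
select_stage_model()     { CLAUDE_MODEL="sonnet"; export CLAUDE_MODEL; return 0; }
run_stage() {
    echo "ran $1 ($3/$2 done) with $CLAUDE_MODEL"
    [ "$1" != "deploy" ]                             # pretend "deploy" fails
}

# Simplified orchestration loop: skip check, model selection, stage run.
run_pipeline_sketch() {
    local enabled_count=$# completed=0 stage_id
    for stage_id in "$@"; do
        if ! check_human_directives "$stage_id"; then
            echo "skipped $stage_id"
            continue
        fi
        select_stage_model "$stage_id"
        if ! run_stage "$stage_id" "$enabled_count" "$completed"; then
            echo "pipeline failed at $stage_id"
            return 1
        fi
        completed=$((completed + 1))
    done
    echo "pipeline complete: $completed stage(s) run"
}

run_pipeline_sketch intake review build
```

Keeping the failure response in the loop, rather than inside run_stage(), matches the error boundary described below: run_stage() only reports 0 or 1.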

Error Boundaries

  • run_stage() catches stage failures via run_stage_with_retry(), records error class/message, emits events. Returns 1 on failure — run_pipeline() handles pipeline-level failure response.
  • select_stage_model() is best-effort: all failures are suppressed with || true, and the model falls back to the _smart_model default (sonnet).
  • check_human_directives() is fail-safe — file read errors suppressed. Worst case: stage proceeds normally.
  • broadcast_stage_discovery() is fire-and-forget — all calls guarded with 2>/dev/null || true.
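
A minimal sketch of the best-effort pattern these boundaries rely on. The names best_effort and flaky_broadcast are hypothetical and exist only for this illustration; the plan itself applies 2>/dev/null || true at each call site.

```shell
# Hypothetical helper: run a command, discard its stderr, never fail the caller.
best_effort() {
    "$@" 2>/dev/null || true
}

# Stand-in for a side channel that fails noisily.
flaky_broadcast() { echo "boom" >&2; return 3; }

best_effort flaky_broadcast
echo "status after best-effort call: $?"   # prints 0: the failure was suppressed
```

The trade-off is that genuine faults in these side channels become invisible, which is acceptable only because none of them affects stage success.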

Files to Modify

File | Action | Lines Changed (est.)
scripts/lib/pipeline-stages.sh | Add run_stage(), select_stage_model(), check_human_directives(), broadcast_stage_discovery() | +180
scripts/lib/pipeline-execution.sh | Simplify run_pipeline() to use new functions | -200, +40
scripts/sw-lib-pipeline-stages-test.sh | Add unit tests for all new functions | +250
scripts/sw-lib-pipeline-execution-test.sh | New test file for execution orchestration | +200 (new file)

Implementation Steps

Step 1: Extract check_human_directives() into pipeline-stages.sh

Move lines 542-569 from run_pipeline() into a standalone function. Lowest-risk extraction — pure file reads with no complex state.
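
A sketch of what the extracted function might look like. The plan does not give the file names or their format, so skip-stage.txt, human-message.txt, the one-stage-id-per-line layout, and the ARTIFACTS_DIR fallback are all assumptions made for illustration.

```shell
# Returns 0 if the stage should proceed, 1 if a human asked to skip it.
check_human_directives() {
    local stage_id="$1"
    local dir="${ARTIFACTS_DIR:-.claude/pipeline-artifacts}"   # assumed default
    local skip_file="$dir/skip-stage.txt"                      # assumed file name
    local msg_file="$dir/human-message.txt"                    # assumed file name

    # Surface any human message; read errors are ignored (fail-safe).
    if [ -s "$msg_file" ]; then
        echo "human message: $(cat "$msg_file" 2>/dev/null)"
    fi

    # Skip when the stage id appears as a whole line in the skip file.
    if [ -f "$skip_file" ] && grep -qxF -- "$stage_id" "$skip_file" 2>/dev/null; then
        # Consume the directive so it applies only once.
        grep -vxF -- "$stage_id" "$skip_file" > "$skip_file.tmp" 2>/dev/null || true
        mv "$skip_file.tmp" "$skip_file" 2>/dev/null || true
        return 1
    fi
    return 0
}
```

If either file is missing or unreadable, the function falls through to return 0, which is the "stage proceeds normally" worst case named under Error Boundaries.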

Step 2: Extract select_stage_model() into pipeline-stages.sh

Move lines 694-763 from run_pipeline(). Most complex extraction — UCB1 + A/B testing logic.
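
The plan names UCB1 but does not show the routing code. As a rough guide to what the extracted function has to preserve, the sketch below implements the textbook UCB1 rule (mean reward plus sqrt(2 ln N / n) per arm). The function name ucb1_pick_model and the "model successes trials" input format are assumptions, and the existing code in run_pipeline() may score arms differently.

```shell
# Reads "model successes trials" lines on stdin and prints the chosen model.
ucb1_pick_model() {
    awk '
        { name[NR] = $1; wins[NR] = $2; n[NR] = $3; total += $3 }
        END {
            best = ""; best_score = -1
            for (i = 1; i <= NR; i++) {
                # An arm with no trials is always explored first.
                if (n[i] == 0) { print name[i]; exit }
                # Exploitation term plus exploration bonus.
                score = wins[i] / n[i] + sqrt(2 * log(total) / n[i])
                if (score > best_score) { best_score = score; best = name[i] }
            }
            print best
        }'
}

# Two arms with equal trial counts: the higher success rate wins.
printf 'haiku 1 100\nsonnet 90 100\n' | ucb1_pick_model   # prints sonnet
```

A rarely tried arm receives a large exploration bonus, so unit tests for the UCB1 path should fix the trial counts rather than assume the arm with the best success rate is always chosen.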

Step 3: Extract broadcast_stage_discovery() into pipeline-stages.sh

Move lines 809-821 from run_pipeline(). Simple stage-to-pattern mapping + subprocess call.
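
One possible shape for the mapping plus subprocess call. The patterns are invented placeholders, and DISCOVERY_CMD is a hypothetical hook standing in for the sw-discovery.sh broadcast call, whose arguments the plan does not specify.

```shell
# Placeholder stage-to-pattern mapping; the real patterns live in run_pipeline().
stage_discovery_patterns() {
    case "$1" in
        plan)  echo "*.md" ;;
        build) echo "src/*" ;;
        test)  echo "*test*" ;;
        *)     return 1 ;;      # unknown stage: nothing to broadcast
    esac
}

broadcast_stage_discovery() {
    local stage_id="$1" patterns
    patterns="$(stage_discovery_patterns "$stage_id")" || return 0
    # Fire-and-forget: errors are discarded and never reach the caller.
    ${DISCOVERY_CMD:-echo} broadcast "$stage_id" "$patterns" 2>/dev/null || true
}

broadcast_stage_discovery build   # prints: broadcast build src/*
```

Splitting the mapping from the call lets the "each stage type maps to correct patterns" tests run without spawning a subprocess.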

Step 4: Create run_stage() in pipeline-stages.sh

Composite function that wraps: timing, status updates, GitHub check runs, audit, retry, model outcome recording, vitals, and discovery broadcast. This is lines 766-852 of run_pipeline() refactored into a self-contained function.
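
A reduced sketch of the composite, covering only timing, error recording, and the optional post-success hook. It calls stage_<id>() directly where the plan uses run_stage_with_retry(), and the STAGE_TIMINGS string format is an assumption; GitHub checks, audit, vitals, and events are omitted.

```shell
run_stage() {
    local stage_id="$1" enabled_count="$2" completed_count="$3"
    local fn="stage_${stage_id}" rc=0 start end

    echo "[$((completed_count + 1))/$enabled_count] $stage_id: running"
    start="$(date +%s)"
    if type "$fn" >/dev/null 2>&1; then
        "$fn" || rc=$?
    else
        rc=127                                   # stage function not loaded
    fi
    end="$(date +%s)"
    STAGE_TIMINGS="${STAGE_TIMINGS:-}${stage_id}:$((end - start))s "   # assumed format

    if [ "$rc" -ne 0 ]; then
        LAST_STAGE_ERROR="stage $stage_id exited with status $rc"
        return 1
    fi
    # Optional hook: only called if the function has been loaded.
    if type broadcast_stage_discovery >/dev/null 2>&1; then
        broadcast_stage_discovery "$stage_id" 2>/dev/null || true
    fi
    return 0
}
```

Because the function sets LAST_STAGE_ERROR in the caller's shell, it has to be invoked directly and not through a pipe or command substitution.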

Step 5: Simplify run_pipeline() in pipeline-execution.sh

Replace inline logic with calls to new functions. Main loop calls check_human_directives, select_stage_model, and run_stage instead of having all that logic inline.

Step 6: Write unit tests for new functions in sw-lib-pipeline-stages-test.sh

  • check_human_directives: skip-stage file, human-message file, no files, malformed files
  • select_stage_model: UCB1 path, A/B test path, no intelligence functions, graduated path
  • broadcast_stage_discovery: each stage type maps to correct patterns
  • run_stage: success path, failure path, timing recorded, events emitted
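
To illustrate the mocking style these tests imply, here is a self-contained sketch. The assert_eq helper, the stand-in select_stage_model body, and the router name intelligence_pick_model are all hypothetical; the existing test file may use different helpers.

```shell
PASS=0
FAIL=0
# Hypothetical helper: assert_eq <label> <expected> <actual>
assert_eq() {
    if [ "$2" = "$3" ]; then
        PASS=$((PASS + 1))
    else
        FAIL=$((FAIL + 1))
        echo "FAIL: $1 (expected '$2', got '$3')"
    fi
}

# Stand-in for the function under test: best-effort selection with a default.
select_stage_model() {
    if type intelligence_pick_model >/dev/null 2>&1; then
        CLAUDE_MODEL="$(intelligence_pick_model "$1" 2>/dev/null)" || CLAUDE_MODEL=""
    fi
    CLAUDE_MODEL="${CLAUDE_MODEL:-sonnet}"
    export CLAUDE_MODEL
    return 0
}

# Case 1: no intelligence functions loaded, so the default applies.
unset CLAUDE_MODEL
select_stage_model build
assert_eq "default without intelligence" "sonnet" "$CLAUDE_MODEL"

# Case 2: a mocked router decides the model.
intelligence_pick_model() { echo "opus"; }
select_stage_model build
assert_eq "uses mocked router" "opus" "$CLAUDE_MODEL"

echo "passed=$PASS failed=$FAIL"   # prints: passed=2 failed=0
```

Defining the mock as a shell function after the first case is enough to switch paths, since the function under test checks for it with type at call time.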

Step 7: Create sw-lib-pipeline-execution-test.sh for orchestration tests

New test file covering:

  • run_stage_with_retry: plan artifact skip, error classification, retry backoff
  • self_healing_build_test: convergence detection, cycle limits

Step 8: Run existing tests to verify no regression

./scripts/sw-lib-pipeline-stages-test.sh
./scripts/sw-pipeline-test.sh
./scripts/sw-e2e-smoke-test.sh

Step 9: Register new test in package.json

Add sw-lib-pipeline-execution-test.sh to the test scripts in package.json.

Step 10: Update CLAUDE.md documentation

Update the Architecture section to reflect the new function boundaries and the Shared Libraries table.

Task Checklist

  • Task 1: Extract check_human_directives() from run_pipeline() into pipeline-stages.sh
  • Task 2: Extract select_stage_model() from run_pipeline() into pipeline-stages.sh
  • Task 3: Extract broadcast_stage_discovery() from run_pipeline() into pipeline-stages.sh
  • Task 4: Create run_stage() composite function in pipeline-stages.sh
  • Task 5: Refactor run_pipeline() in pipeline-execution.sh to use new functions
  • Task 6: Add unit tests for new functions to sw-lib-pipeline-stages-test.sh
  • Task 7: Create sw-lib-pipeline-execution-test.sh with orchestration tests
  • Task 8: Run existing test suites to verify no regression
  • Task 9: Register new test file in package.json
  • Task 10: Update CLAUDE.md documentation with new architecture

Testing Approach

Test Pyramid Breakdown

  • Unit tests (~20 tests): Each new function tested in isolation with mocked dependencies
    • check_human_directives: 4 tests (skip file, human message, no files, cleanup)
    • select_stage_model: 5 tests (UCB1, A/B experiment, A/B control, graduated, no intelligence)
    • broadcast_stage_discovery: 4 tests (plan, build, test, unknown stage)
    • run_stage: 5 tests (success, failure, timing, audit, GitHub checks)
  • Integration tests (~5 tests): run_pipeline() with mocked stage functions to verify sequencing
  • E2E tests (existing): sw-pipeline-test.sh and sw-e2e-smoke-test.sh verify full pipeline flow

Coverage Targets

  • 100% of new function code paths
  • Existing test suites pass without modification (critical requirement)

Critical Paths to Test

  • Happy path: Stage runs, succeeds, timing/events recorded, discovery broadcast
  • Error case 1: Stage fails, error classified, pipeline halts with correct status
  • Error case 2: Human skip directive skips the stage and removes its entry from the skip file
  • Edge case 1: select_stage_model() with no intelligence functions available
  • Edge case 2: run_stage() when audit/vitals/checks functions don't exist (optional modules)

Definition of Done

  • pipeline-stages.sh exports: check_human_directives, select_stage_model, broadcast_stage_discovery, run_stage
  • run_pipeline() in pipeline-execution.sh uses new functions (reduced by ~200 lines)
  • All 102+ existing test suites pass without modification
  • New unit tests pass for each extracted function
  • New sw-lib-pipeline-execution-test.sh registered in package.json
  • No performance regression: function call overhead is negligible for shell (no new subshells)
  • CLAUDE.md updated with new architecture

Failure Mode Analysis

1. Variable Scope Breakage (CRITICAL)

What could break: Extracted functions reference global variables (ARTIFACTS_DIR, ISSUE_NUMBER, CLAUDE_MODEL, etc.) that are set in sw-pipeline.sh. If a function is sourced in a different context (e.g., a test environment), those variables may be unset.

Mitigation: Each new function will use ${VAR:-default} patterns for all global variable references, matching existing lib file conventions. The test environment sets all required globals (already shown in the existing test setup).
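
A small sketch of the ${VAR:-default} pattern. The function name artifact_path and the fallback values are illustrative assumptions, not the project's actual defaults.

```shell
artifact_path() {
    # Fallbacks keep the function usable when the globals were never set.
    local dir="${ARTIFACTS_DIR:-.claude/pipeline-artifacts}"
    local issue="${ISSUE_NUMBER:-unknown}"
    echo "$dir/issue-$issue.log"
}

# Even under `set -u` (unset variables are errors), unset globals are safe.
( unset ARTIFACTS_DIR ISSUE_NUMBER; set -u; artifact_path )
# prints .claude/pipeline-artifacts/issue-unknown.log

# With the globals set, the fallbacks are ignored.
( ARTIFACTS_DIR=/tmp/run; ISSUE_NUMBER=42; artifact_path )
# prints /tmp/run/issue-42.log
```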

2. Self-Healing Build+Test Coupling

What could break: The build+test self-healing loop (lines 612-648) tightly couples the build and test stages with counter management. Extracting this incorrectly could break the completed counter.

Mitigation: Leave the self-healing block inline in run_pipeline(), since it is already a special case that interleaves two stages. Only extract the per-stage concerns (model routing, timing, discovery).

3. return vs exit in Sourced Functions

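What could break: A function invoked in a pipeline (|) or subshell runs in a child shell, so its return status and any globals it sets (for example LAST_STAGE_ERROR) do not reach the caller as expected. Conversely, calling exit instead of return in a sourced function would terminate the whole pipeline process.

Mitigation: Maintain direct function calls (no pipes or subshells). All new functions use return 0/return 1 consistently rather than exit. Test both success and failure paths.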
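
The subshell behaviour behind this failure mode can be reproduced in a few lines. record_failure is a hypothetical stand-in for any function that sets a global and returns non-zero.

```shell
record_failure() {
    LAST_STAGE_ERROR="build failed"
    return 1
}

# Direct call: the caller sees both the status and the variable.
LAST_STAGE_ERROR=""
direct_rc=0
record_failure || direct_rc=$?
echo "direct: rc=$direct_rc error='$LAST_STAGE_ERROR'"   # rc=1 error='build failed'

# Piped call: record_failure runs in a subshell, so its assignment is lost
# in the parent shell.
LAST_STAGE_ERROR=""
record_failure | cat || true
echo "piped: error='$LAST_STAGE_ERROR'"                  # error=''
```

Without pipefail, the piped form also reports the status of the last command in the pipeline, so the failure itself goes unnoticed.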

4. Module Load Order

What could break: New functions in pipeline-stages.sh reference functions from pipeline-state.sh, pipeline-intelligence-skip.sh, and helper modules. If pipeline-stages.sh is sourced before those modules, the functions will be undefined.

Mitigation: pipeline-stages.sh is already sourced AFTER all dependency modules in sw-pipeline.sh (line 52, after lines 37-50). New functions use type funcname >/dev/null 2>&1 guards before calling optional functions, matching existing patterns.
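
A sketch of the guard in reusable form. call_if_defined and record_vitals are hypothetical names; the plan applies the type check inline at each call site.

```shell
# Call a function only if it is currently defined; otherwise do nothing.
call_if_defined() {
    local fn="$1"
    shift
    if type "$fn" >/dev/null 2>&1; then
        "$fn" "$@"
    else
        return 0   # optional module not loaded: treat as a no-op
    fi
}

call_if_defined record_vitals build          # no output: not defined yet
record_vitals() { echo "vitals for $1"; }    # stand-in for a late-loaded module
call_if_defined record_vitals build          # prints: vitals for build
```

Note that type also succeeds for builtins and executables on PATH, so the guard protects against a missing module, not against a name collision.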

Baseline Metrics

Performance profiling is not applicable for this refactoring because:

  • Shell function calls have negligible overhead (~0.1ms per call)
  • No new subshells, subprocesses, or file I/O are introduced
  • The refactoring is a code reorganization, not an algorithm change
  • Pipeline stage latency is dominated by Claude API calls (30s-300s per stage), not shell orchestration

The existing STAGE_TIMINGS mechanism already captures per-stage timing. We'll verify no regression by comparing before/after timings in the existing test suite.

Profiling Strategy

Not applicable — see Baseline Metrics above. The refactoring moves code between files without changing execution paths.

Optimization Targets

Not applicable — this is a refactoring for testability and maintainability, not performance.

Benchmark Plan

Run sw-pipeline-test.sh (58 tests) and sw-e2e-smoke-test.sh (19 tests) before and after. Both suites complete in <60s and exercise the full pipeline orchestration path. Any timing regression >5% would indicate an issue.
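
The 5% threshold can be checked mechanically. The helper below is a hypothetical sketch: it takes two wall-clock durations in seconds (for example, measured around each suite run with date +%s) and an optional threshold in percent.

```shell
# Returns 0 when `after` exceeds `before` by no more than threshold_pct percent.
within_regression_budget() {
    local before="$1" after="$2" threshold_pct="${3:-5}"
    awk -v b="$before" -v a="$after" -v t="$threshold_pct" \
        'BEGIN { exit !((a - b) * 100 <= t * b) }'
}

within_regression_budget 40 41 && echo "41s vs 40s: ok"
within_regression_budget 40 43 || echo "43s vs 40s: regression"
```

With suites that finish in under a minute, whole-second timings are coarse, so a single run near the threshold is weak evidence either way; repeating the measurement a few times would make the comparison more trustworthy.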

Root Cause Hypothesis

  1. Stale issue metadata (Likelihood: HIGH) — The issue was created when sw-pipeline.sh was monolithic. Prior work already extracted stages. Evidence: sw-pipeline.sh is 279 lines, stage files exist.
  2. Partial extraction (Likelihood: HIGH) — Stages were extracted but orchestration concerns weren't. Evidence: run_pipeline() still has ~390 lines of mixed concerns.
  3. Scope creep in orchestrator (Likelihood: MEDIUM) — New features (model routing, discovery, human intervention) were added to run_pipeline() after the initial extraction, re-bloating it.

Evidence Gathered

  • sw-pipeline.sh: 279 lines (confirmed via wc -l)
  • lib/pipeline-execution.sh: 897 lines, with run_pipeline() spanning lines 507-897
  • Stage functions (stage_intake, stage_plan, etc.) already exist in lib/pipeline-stages-*.sh
  • Existing sw-lib-pipeline-stages-test.sh tests helper functions but not orchestration
  • No existing sw-lib-pipeline-execution-test.sh

Fix Strategy

Complete the decomposition by extracting cross-cutting concerns from run_pipeline() into testable functions in pipeline-stages.sh. This differs from the original issue's assumption (extract stages from sw-pipeline.sh) because stages are already extracted — the remaining work is orchestration cleanup.

Verification Plan

  1. Run existing tests BEFORE changes to establish baseline
  2. Extract one function at a time, running tests after each
  3. Run full test suite after all extractions
  4. Verify run_pipeline() line count decreased by ~200 lines
  5. Verify new functions are independently testable
