
Pipeline Plan 172

ezigus edited this page Mar 17, 2026 · 2 revisions

Implementation Plan: sw-pipeline.sh Modular Extraction — Stage Executor and State Manager

Status Assessment

The core extraction work is already complete (commit 1d85a6a): sw-pipeline.sh was reduced from 3,171 to 708 lines (a 78% reduction). Both target modules exist:

  • scripts/lib/pipeline-state.sh (612 lines) — state read/write/validate
  • scripts/lib/pipeline-stage-executor.sh (645 lines) — stage execution with hooks

Remaining work: close the test coverage gaps so both modules meet the >80% coverage criterion.


Component Diagram

┌─────────────────────────────────────────────────────┐
│              sw-pipeline.sh (708 lines)             │
│         CLI dispatch, sourcing, signal setup        │
│                                                     │
│  ┌─────────────┐  ┌──────────────┐  ┌────────────┐  │
│  │  commands   │  │ orchestration│  │  stages-*  │  │
│  │  (836 ln)   │→ │  (813 ln)    │→ │ (various)  │  │
│  └─────────────┘  └──────────────┘  └────────────┘  │
│         │                │                          │
│         ▼                ▼                          │
│  ┌────────────────────────────────┐                 │
│  │   pipeline-stage-executor.sh   │ ← TARGET MODULE │
│  │   (645 lines)                  │                 │
│  │   • classify_error()           │                 │
│  │   • run_stage_with_retry()     │                 │
│  │   • self_healing_build_test()  │                 │
│  │   • self_healing_review_*()    │                 │
│  └──────────────┬─────────────────┘                 │
│                 ▼                                   │
│  ┌────────────────────────────────┐                 │
│  │   pipeline-state.sh            │ ← TARGET MODULE │
│  │   (612 lines)                  │                 │
│  │   • save_artifact()            │                 │
│  │   • get/set_stage_status()     │                 │
│  │   • mark_stage_complete()      │                 │
│  │   • mark_stage_failed()        │                 │
│  │   • write_state()              │                 │
│  │   • resume_state()             │                 │
│  │   • initialize_state()         │                 │
│  └────────────────────────────────┘                 │
└─────────────────────────────────────────────────────┘

Dependency direction: orchestration → executor → state (inward only, no cycles).


Interface Contracts

pipeline-state.sh Public API

// Artifact persistence
save_artifact(name: string, content: string): void  // writes to ARTIFACTS_DIR

// Stage status CRUD
get_stage_status(stage_id: string): string  // "pending"|"running"|"complete"|"failed"|""
set_stage_status(stage_id: string, status: string): void

// Timing
record_stage_start(stage_id: string): void
record_stage_end(stage_id: string): void
get_stage_timing(stage_id: string): string  // formatted "1m30s"
get_stage_timing_seconds(stage_id: string): number  // raw seconds, 0 if unknown
get_slowest_stage(): string  // stage_id or ""

// Descriptions & progress
get_stage_description(stage_id: string): string
build_stage_progress(): string  // "intake:complete plan:running test:pending"

// State transitions (side effects: writes state, emits events, updates GitHub)
update_status(status: string, stage: string): void
mark_stage_complete(stage_id: string): void  // Error: event emit failure (non-fatal)
mark_stage_failed(stage_id: string): void    // Error: event emit failure (non-fatal)

// Persistence
initialize_state(): void     // Resets all state, calls write_state()
write_state(): void          // Error: disk space check failure → return 1
resume_state(): void         // Error: missing state file → exit 1, missing goal → exit 1

// Validation
verify_stage_artifacts(stage_id: string): boolean  // 0=ok, 1=missing artifacts
persist_artifacts(stage: string, ...files: string[]): void  // CI-only, non-fatal

// Meta-cognition
record_stage_effectiveness(stage_id: string, outcome: string): void
get_stage_self_awareness_hint(stage_id: string): string

// Logging
log_stage(stage_id: string, message: string): void
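To make the timing contract concrete (raw seconds in, a "1m30s"-style string out, and an empty result when there is no timing data), here is a minimal sketch. It assumes timings are held as space-separated stage:seconds pairs; the real STAGE_TIMINGS layout may differ, and format_seconds and slowest_from_timings are hypothetical names, not functions from pipeline-state.sh.

```shell
#!/usr/bin/env bash
# Sketch only: assumes timings are stored as "stage:seconds" pairs
# (hypothetical layout; the real STAGE_TIMINGS format may differ).

# Hypothetical helper: 90 -> "1m30s", 45 -> "45s"
format_seconds() {
    local total="${1:-0}"
    local mins=$(( total / 60 ))
    local secs=$(( total % 60 ))
    if (( mins > 0 )); then
        printf '%dm%ds\n' "$mins" "$secs"
    else
        printf '%ds\n' "$secs"
    fi
}

# Hypothetical helper: print the stage with the largest duration,
# or an empty string when there is no timing data (ties keep the first).
slowest_from_timings() {
    local timings="$1" pair stage secs best="" best_secs=-1
    for pair in $timings; do
        stage="${pair%%:*}"
        secs="${pair##*:}"
        if (( secs > best_secs )); then
            best="$stage"
            best_secs="$secs"
        fi
    done
    printf '%s\n' "$best"
}
```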

pipeline-stage-executor.sh Public API

// Error classification
classify_error(stage_id: string): "infrastructure"|"configuration"|"logic"|"unknown"

// Execution
run_stage_with_retry(stage_id: string): boolean  // 0=success, 1=failure
  // Error: configuration errors → immediate failure (no retry)
  // Error: repeated logic errors → immediate failure

// Self-healing loops
self_healing_build_test(): boolean  // 0=tests pass, 1=exhausted
  // Error: infrastructure error (simulator not found) → immediate exit
  // Error: convergence stuck (same error 3x) → early exit
  // Error: plateau (no progress 2x) → early exit

self_healing_review_build_test(): boolean  // 0=review passes, 1=exhausted
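The two early-exit rules for self_healing_build_test (same error three times, no progress twice) can be expressed as a small state machine. This is a sketch, not the module's code: check_convergence, reset_convergence, the globals, and the notion of an "error signature" (for example, the first failing test name) are all assumptions.

```shell
#!/usr/bin/env bash
# Sketch of the convergence checks; hypothetical names, not the real module.
# "Stuck":   the same error signature STUCK_LIMIT times in a row.
# "Plateau": the failure count does not improve PLATEAU_LIMIT times in a row.

STUCK_LIMIT=3
PLATEAU_LIMIT=2

reset_convergence() {
    LAST_SIG=""
    SIG_REPEATS=0
    LAST_FAILURES=-1
    NO_PROGRESS=0
    CONVERGENCE_RESULT="continue"
}

# check_convergence <error_signature> <failure_count>
# Sets CONVERGENCE_RESULT to "stuck", "plateau", or "continue".
# The result goes in a global (not stdout) so the caller does not need $(...),
# which would run in a subshell and lose the counter updates.
check_convergence() {
    local sig="$1" failures="$2"

    if [[ "$sig" == "$LAST_SIG" ]]; then
        SIG_REPEATS=$(( SIG_REPEATS + 1 ))
    else
        LAST_SIG="$sig"
        SIG_REPEATS=1
    fi

    # No progress = failure count did not drop since the previous iteration.
    if (( LAST_FAILURES >= 0 && failures >= LAST_FAILURES )); then
        NO_PROGRESS=$(( NO_PROGRESS + 1 ))
    else
        NO_PROGRESS=0
    fi
    LAST_FAILURES="$failures"

    if (( SIG_REPEATS >= STUCK_LIMIT )); then
        CONVERGENCE_RESULT="stuck"
    elif (( NO_PROGRESS >= PLATEAU_LIMIT )); then
        CONVERGENCE_RESULT="plateau"
    else
        CONVERGENCE_RESULT="continue"
    fi
}

reset_convergence
```

Whether "no progress 2x" means two iterations in total or two non-improving iterations after a baseline (as in this sketch) is an assumption; the plateau test in Step 2 should pin down the real module's counting.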

Data Flow

CLI args → parse_args → pipeline_start()
  → initialize_state() [state module]
  → run_pipeline() [orchestration]
    → for each stage:
      → record_stage_start() [state]
      → run_stage_with_retry() [executor]
        → stage_<id>() [stage modules]
        → classify_error() on failure [executor]
      → mark_stage_complete/failed() [state]
        → write_state() → STATE_FILE
        → emit_event() → events.jsonl
        → gh_update_progress() → GitHub
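The per-stage loop in this outline can be exercised as a runnable sketch with every collaborator stubbed. The stage list, the stub bodies, and the rule of stopping at the first failed stage are assumptions for illustration; run_pipeline_sketch is hypothetical, and the stubs deliberately reuse the real function names so the call order shows up in TRACE.

```shell
#!/usr/bin/env bash
# Sketch of the data-flow loop; every collaborator is a stub.
# The real orchestration module also handles gates, events, and GitHub updates.

STAGES="intake plan build test"
TRACE=()

record_stage_start()   { TRACE+=("start:$1"); }
mark_stage_complete()  { TRACE+=("complete:$1"); }
mark_stage_failed()    { TRACE+=("failed:$1"); }
# Stub executor: succeeds for every stage except the one named in FAIL_STAGE.
run_stage_with_retry() { [[ "$1" != "${FAIL_STAGE:-}" ]]; }

run_pipeline_sketch() {
    local stage
    for stage in $STAGES; do
        record_stage_start "$stage"
        if run_stage_with_retry "$stage"; then
            mark_stage_complete "$stage"
        else
            mark_stage_failed "$stage"
            return 1   # assumed: stop at the first failed stage
        fi
    done
    return 0
}
```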

Error Boundaries

| Component | Handles | Propagates |
|---|---|---|
| state | Disk space (write_state), missing artifacts (verify_stage_artifacts), CI push failures (persist_artifacts) | Missing state file → exit 1 |
| executor | Infrastructure/configuration/logic classification, convergence detection, retry decisions | Stage failure → return 1 to orchestration |
| orchestration | Stage sequencing, gate enforcement, self-healing loop control | Pipeline failure → write final state, exit |
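The executor's classification step can be illustrated with a small log-scanning sketch. Only two signals come from this plan (MODULE_NOT_FOUND as a configuration error, "simulator not found" as an infrastructure error); classify_log, its log-file argument, and the remaining patterns are hypothetical stand-ins, and the real classify_error takes a stage_id rather than a path.

```shell
#!/usr/bin/env bash
# Sketch of log-based error classification. Function name, argument, and
# most patterns are hypothetical; see the lead-in for which come from the plan.

classify_log() {
    local log_file="$1"
    if [[ ! -s "$log_file" ]]; then
        echo "unknown"          # missing or empty log: nothing to classify
    elif grep -qiE 'simulator not found|connection refused|no space left on device' "$log_file"; then
        echo "infrastructure"
    elif grep -qE 'MODULE_NOT_FOUND|command not found' "$log_file"; then
        echo "configuration"
    elif grep -qE 'FAIL|AssertionError|[Ee]xpected' "$log_file"; then
        echo "logic"
    else
        echo "unknown"
    fi
}
```

Checking infrastructure and configuration patterns first matters because, per the contracts above, those categories end retries immediately; anything unrecognized falls through to "unknown" rather than being guessed as a logic error.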

Test Coverage Gap Analysis

pipeline-state.sh — Current: ~60% function coverage

| Function | Tested | Lines | Priority |
|---|---|---|---|
| get_slowest_stage | No | 15 | Medium |
| build_stage_progress | No | 20 | Medium |
| update_status | No | 5 | Low (simple) |
| write_state | No | 68 | High (core persistence) |
| resume_state | No | 42 | High (crash recovery) |
| mark_stage_complete | No | 94 | High (many side effects) |
| mark_stage_failed | No | 68 | High (many side effects) |

pipeline-stage-executor.sh — Current: ~40% line coverage

| Function | Tested | Lines | Priority |
|---|---|---|---|
| self_healing_build_test | No | 287 | High (core loop) |
| self_healing_review_build_test | No | 69 | Medium |
| run_stage_with_retry (edge cases) | Partial | 157 | Medium |

Alternatives Considered

Alternative 1: Further decomposition (split large functions)

  • Pros: Smaller units, easier to test
  • Cons: More files, more source calls, higher complexity for bash
  • Decision: Rejected. The current module boundaries are clean. Adding tests for existing functions is simpler and lower-risk than restructuring.

Alternative 2: Rewrite tests using bats-core

  • Pros: Better bash testing framework, TAP output
  • Cons: New dependency, rewrite all existing tests, learning curve
  • Decision: Rejected. Existing test-helpers.sh framework works well and is already used by 9 test files. Consistency matters more.

Alternative 3: Test only public API, skip internal functions

  • Pros: Fewer tests to write
  • Cons: Won't reach >80% coverage target on complex functions like mark_stage_complete
  • Decision: Rejected. Key functions have complex side-effect chains that need verification.

Risk Analysis

| Risk | Impact | Mitigation |
|---|---|---|
| Tests for mark_stage_complete/failed need many stubs | Medium (fragile test setup) | Reuse the existing stub pattern from the state test file; stub only external calls |
| self_healing_build_test tests could be slow (sleep calls) | Low (test isolation) | Override sleep in tests; mock stage functions to fail or succeed deterministically |
| write_state test could corrupt real state | Low | Tests already use TEST_TEMP_DIR isolation |
| resume_state test needs a valid state file format | Low | Generate the state file content in the test using a known-good format |
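For the slow-test risk, bash lets a function named sleep shadow the external command, so back-off delays are recorded instead of waited out. The sketch assumes the code under test calls plain sleep (not command sleep or /bin/sleep, which bypass the function) and runs in the same shell or a subshell; a separate bash process would also need export -f sleep. retry_with_backoff and flaky_step are hypothetical stand-ins, not executor functions.

```shell
#!/usr/bin/env bash
# A function named sleep shadows sleep(1) for this shell and its subshells.

SLEEP_CALLS=()
sleep() { SLEEP_CALLS+=("$1"); }   # record the requested delay, return at once

# Hypothetical stand-in for code under test that backs off between attempts.
retry_with_backoff() {
    local attempts="$1" i
    for (( i = 1; i <= attempts; i++ )); do
        flaky_step && return 0
        sleep $(( i * 5 ))
    done
    return 1
}

flaky_step() { return 1; }   # deterministic mock: always fails
```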

Files to Modify

| File | Action |
|---|---|
| scripts/sw-lib-pipeline-state-test.sh | Modify: add tests for write_state, resume_state, get_slowest_stage, build_stage_progress, update_status, mark_stage_complete, mark_stage_failed |
| scripts/sw-lib-pipeline-stage-executor-test.sh | Modify: add tests for self_healing_build_test (happy path and convergence detection) and run_stage_with_retry edge cases |

No new files needed. No production code changes required.


Implementation Steps

Step 1: Add state module tests (pipeline-state.sh)

Add to sw-lib-pipeline-state-test.sh:

  1. get_slowest_stage: Set up STAGE_TIMINGS with multiple stages and verify the slowest stage is returned. Verify the empty case returns "".

  2. build_stage_progress: Create minimal PIPELINE_CONFIG JSON, set various stage statuses, verify progress string format.

  3. update_status: Call update_status, verify PIPELINE_STATUS and CURRENT_STAGE are set. Verify write_state was called.

  4. write_state: Set up all state variables, call write_state, read STATE_FILE and verify YAML frontmatter structure contains all expected fields (pipeline, goal, status, issue, stages).

  5. resume_state: Write a known state file, call resume_state (with stubs for gh_init, load_pipeline_config, git), verify all variables are restored correctly. Test error cases: missing file (exit 1), missing goal (exit 1), already-complete pipeline (exit 0).

  6. mark_stage_complete: Stub all external calls (emit_event, gh_*, checkpoint, etc.). Call mark_stage_complete, verify: stage status set to "complete", timing recorded, log entry added, write_state called.

  7. mark_stage_failed: Same pattern as mark_stage_complete but verify "failed" status and error comment format.
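The mark_stage_complete and mark_stage_failed tests need stubs that both succeed and record how they were called. A minimal sketch follows; make_stub, stub_called_with, STUB_LOG, and the event arguments in the test are hypothetical, and test-helpers.sh may already offer an equivalent. Stubs append to a file rather than a shell array so calls made from subshells are still recorded.

```shell
#!/usr/bin/env bash
# Call-recording stubs: each stub appends "name args..." to STUB_LOG.

STUB_LOG="$(mktemp)"

# make_stub <function_name>: define a function that logs its call and returns 0.
make_stub() {
    local name="$1"
    eval "${name}() { printf '%s %s\n' '${name}' \"\$*\" >> \"\$STUB_LOG\"; }"
}

# stub_called_with "<name> <args>": succeed if that exact call line was logged.
stub_called_with() {
    grep -qxF -- "$1" "$STUB_LOG"
}
```

In the state test, source pipeline-state.sh first and create the stubs afterwards, so the stub definitions replace any real emit_event or gh_* functions loaded during sourcing.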

Step 2: Add executor module tests (pipeline-stage-executor.sh)

Add to sw-lib-pipeline-stage-executor-test.sh:

  1. run_stage_with_retry (plan artifact skip): Create a plan.md with more than 10 lines and make stage_plan fail. Verify it returns 0 (the retry is skipped because the artifact already exists).

  2. run_stage_with_retry (configuration error escalation): Create a log file containing a "MODULE_NOT_FOUND" error and make the stage fail. Verify it returns 1 immediately, without retrying.

  3. self_healing_build_test (happy path): Mock stage_build and stage_test to succeed on the first try. Verify it returns 0.

  4. self_healing_build_test (convergence stuck): Mock stage_test to fail with the same error 3 times. Verify an early exit with return 1.

  5. self_healing_build_test (plateau detection): Mock stage_test to fail with the same failure count for 2 iterations. Verify an early exit.
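The Step 2 cases need a stage function that fails a fixed number of times with identical output, plus a plan.md fixture. A minimal sketch under these assumptions: the mock replaces stage_test after the executor module is sourced, MOCK_TEST_FAILURES, MOCK_DIR, and make_plan_fixture are hypothetical test-only names, and the plan content is arbitrary filler since the plan only requires more than 10 lines. The call counter lives in a file so it stays correct if the code under test invokes the stage in a subshell.

```shell
#!/usr/bin/env bash
# Deterministic mocks for the retry and self-healing tests (hypothetical names).

MOCK_DIR="$(mktemp -d)"
echo 0 > "$MOCK_DIR/test_calls"

# stage_test mock: fail the first MOCK_TEST_FAILURES calls, then succeed.
stage_test() {
    local n
    n=$(( $(cat "$MOCK_DIR/test_calls") + 1 ))
    echo "$n" > "$MOCK_DIR/test_calls"
    if (( n <= ${MOCK_TEST_FAILURES:-0} )); then
        echo "FAIL: expected 2, got 3" >&2   # identical error text every time
        return 1
    fi
    return 0
}

# Fixture for the "plan artifact skip" case: a plan.md longer than 10 lines.
make_plan_fixture() {
    local dir="$1" i
    mkdir -p "$dir"
    for (( i = 1; i <= 12; i++ )); do
        echo "- plan step $i"
    done > "$dir/plan.md"
}
```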

Step 3: Verify all tests pass

  1. Run npm test to verify no regressions.
  2. Run individual test files to verify new tests pass in isolation.

Task Checklist

  • Task 1: Add get_slowest_stage and build_stage_progress tests to state test file
  • Task 2: Add write_state test — verify YAML output format and all fields
  • Task 3: Add resume_state tests — happy path and error cases (missing file, missing goal, complete pipeline)
  • Task 4: Add mark_stage_complete test with stubbed externals
  • Task 5: Add mark_stage_failed test with stubbed externals
  • Task 6: Add update_status test
  • Task 7: Add run_stage_with_retry edge case tests (plan artifact skip, config error escalation)
  • Task 8: Add self_healing_build_test happy path test
  • Task 9: Add self_healing_build_test convergence detection tests (stuck + plateau)
  • Task 10: Run full test suite — verify all tests pass with no regressions

Testing Approach

Test Pyramid Breakdown

  • Unit tests (target: ~25 new tests): Test each function in isolation with mocked dependencies
    • State module: ~15 new tests (write_state, resume_state, mark_stage_complete/failed, get_slowest_stage, build_stage_progress, update_status)
    • Executor module: ~10 new tests (self_healing_build_test paths, run_stage_with_retry edge cases)
  • Integration tests (existing): The 12 tests in sw-pipeline-test.sh cover end-to-end pipeline flows
  • E2E tests: Not applicable (shell scripts, no deployment)

Coverage Targets

  • pipeline-state.sh: 80%+ function coverage (from ~60% → ~95%)
  • pipeline-stage-executor.sh: 80%+ line coverage (from ~40% → ~80%)
  • Overall pipeline module coverage: >80%

Critical Paths to Test

  • Happy path: write_state produces valid YAML → resume_state restores it correctly (round-trip)
  • Error case 1: resume_state with missing/corrupt state file → exits with error
  • Error case 2: self_healing_build_test stuck on same error → early convergence exit
  • Edge case 1: get_slowest_stage with no timing data → returns ""
  • Edge case 2: mark_stage_complete with all optional integrations absent → still succeeds
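The happy-path round trip and the missing-file error case can be exercised together. This is a minimal sketch under stated assumptions: it writes a YAML-frontmatter file with the fields this plan lists (pipeline, goal, status, issue, stages), reads it back, and mirrors the documented resume_state errors (exit 1 for a missing file or goal). The file layout, the variable names other than STATE_FILE and PIPELINE_STATUS, and the *_sketch helpers are hypothetical; the real write_state output may differ.

```shell
#!/usr/bin/env bash
# Round-trip sketch; hypothetical format and helpers. Values are not escaped.

write_state_sketch() {
    {
        echo "---"
        echo "pipeline: $PIPELINE_NAME"
        echo "goal: \"$GOAL\""
        echo "status: $PIPELINE_STATUS"
        echo "issue: $ISSUE_NUMBER"
        echo "stages:"
        local pair
        for pair in $STAGE_STATUSES; do
            echo "  ${pair%%:*}: ${pair#*:}"
        done
        echo "---"
    } > "$STATE_FILE"
}

# Value of a top-level "key: value" line, surrounding double quotes removed.
read_field() {
    sed -n "s/^$2: //p" "$1" | head -n 1 | sed 's/^"\(.*\)"$/\1/'
}

# Mirrors resume_state's documented errors: exit 1 on a missing file or goal.
resume_state_sketch() {
    [[ -f "$STATE_FILE" ]] || { echo "no state file: $STATE_FILE" >&2; exit 1; }
    GOAL="$(read_field "$STATE_FILE" goal)"
    [[ -n "$GOAL" ]] || { echo "state file has no goal" >&2; exit 1; }
    PIPELINE_NAME="$(read_field "$STATE_FILE" pipeline)"
    PIPELINE_STATUS="$(read_field "$STATE_FILE" status)"
    ISSUE_NUMBER="$(read_field "$STATE_FILE" issue)"
    # Rebuild "stage:status" pairs from the indented stages block.
    STAGE_STATUSES="$(awk '/^stages:/ {s = 1; next}
        s && /^---/ {exit}
        s {sub(/^  /, ""); sub(/: /, ":"); printf "%s ", $0}' "$STATE_FILE")"
    STAGE_STATUSES="${STAGE_STATUSES% }"
}
```

Because resume_state_sketch calls exit, a test should invoke its error paths inside ( ... ) so only the subshell exits, then assert on $?; the same applies to the real resume_state.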

Definition of Done

  • sw-pipeline.sh < 1500 lines (already 708)
  • pipeline-state.sh exists as separate module (already exists, 612 lines)
  • pipeline-stage-executor.sh exists as separate module (already exists, 645 lines)
  • All existing tests pass (npm test green)
  • New tests added: 20+ additional unit tests across both modules
  • Coverage >80% for both modules (function coverage for pipeline-state.sh, line coverage for pipeline-stage-executor.sh)
  • No production code changes (test-only additions)
  • Each module can be sourced independently without side effects (include guards verified)
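The "sourced independently without side effects" criterion usually rests on an include guard. The sketch below writes a throwaway module to a temp file and sources it twice to show the guard short-circuiting; the guard variable _PIPELINE_STATE_LOADED and the SOURCE_COUNT counter are hypothetical, and the real modules may name their guards differently.

```shell
#!/usr/bin/env bash
# Include-guard sketch. _PIPELINE_STATE_LOADED and SOURCE_COUNT are
# hypothetical names used only for this demonstration.

module_file="$(mktemp)"
cat > "$module_file" <<'EOF'
# Return early (no redefinitions, no top-level side effects) if already loaded.
[[ -n "${_PIPELINE_STATE_LOADED:-}" ]] && return 0
_PIPELINE_STATE_LOADED=1
SOURCE_COUNT=$(( ${SOURCE_COUNT:-0} + 1 ))   # observable top-level effect
EOF

source "$module_file"
source "$module_file"               # second source hits the guard and returns
echo "SOURCE_COUNT=$SOURCE_COUNT"   # prints SOURCE_COUNT=1
```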
