
Pipeline Design 172

ezigus edited this page Mar 17, 2026 · 1 revision


Design: sw-pipeline.sh modular extraction — stage executor and state manager

Context

sw-pipeline.sh was a 3,171-line monolith handling CLI dispatch, state management, stage execution, orchestration, self-healing loops, and GitHub integration. This made it difficult to test individual components, led to implicit coupling between state mutations and execution logic, and created a maintenance burden where any change risked cascading breakage.

Constraints:

  • Bash 3.2 compatibility (no associative arrays, no readarray)
  • set -euo pipefail across all scripts
  • Existing test framework (lib/test-helpers.sh) with PASS/FAIL counters — no bats-core
  • State persisted as YAML frontmatter in .claude/pipeline-state.md
  • Stage status tracked via newline-delimited key:value strings in shell variables (not JSON)
  • 17 pipeline library modules already exist under scripts/lib/

Current state: The extraction is complete (commit 1d85a6a). sw-pipeline.sh is 708 lines. The two target modules exist and are functional. What remains is closing test coverage gaps.

Decision

Module Boundaries (3 layers, strict dependency direction)

┌─────────────────────────────────────────────────────────┐
│  sw-pipeline.sh (708 lines)                             │
│  Responsibility: CLI dispatch, signal setup, sourcing   │
│                                                         │
│  ┌─────────────────┐  ┌────────────────────────────┐    │
│  │ pipeline-       │  │ pipeline-orchestration.sh  │    │
│  │ commands.sh     │→ │ (stage sequencing, gates)  │    │
│  │ (CLI verbs)     │  └────────────┬───────────────┘    │
│  └─────────────────┘               │                    │
│                                    ▼                    │
│            ┌─────────────────────────────────────┐      │
│            │ pipeline-stage-executor.sh (645 ln) │      │
│            │ Error classify, retry, self-healing │      │
│            └────────────────┬────────────────────┘      │
│                             │                           │
│                             ▼                           │
│            ┌─────────────────────────────────────┐      │
│            │ pipeline-state.sh (612 ln)          │      │
│            │ State CRUD, persistence, artifacts  │      │
│            └─────────────────────────────────────┘      │
└─────────────────────────────────────────────────────────┘

Dependency direction: orchestration → executor → state (inward only, no cycles)

Include Guards

Both modules use include guards, so sourcing a module a second time returns immediately instead of re-running its initialization:

[[ -n "${_PIPELINE_STATE_LOADED:-}" ]] && return 0
_PIPELINE_STATE_LOADED=1

All required variables have safe defaults via ${VAR:-default} so modules can be sourced in test harnesses without the full sw-pipeline.sh environment.
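A minimal sketch of how the guard and the safe defaults behave when a module is sourced twice from a test harness. The module file, the `_DEMO_MODULE_LOADED` guard, and the `DEMO_LOAD_COUNT` counter are hypothetical stand-ins written to a temp directory; this is not the real pipeline-state.sh.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical stand-in for a guarded module (not the real pipeline-state.sh).
demo_dir=$(mktemp -d)

cat > "$demo_dir/demo-module.sh" <<'EOF'
[[ -n "${_DEMO_MODULE_LOADED:-}" ]] && return 0
_DEMO_MODULE_LOADED=1

# Safe defaults: keep a caller-provided value, otherwise fall back.
ARTIFACTS_DIR="${ARTIFACTS_DIR:-${TMPDIR:-/tmp}/demo-artifacts}"
DEMO_LOAD_COUNT=$(( ${DEMO_LOAD_COUNT:-0} + 1 ))
EOF

source "$demo_dir/demo-module.sh"
source "$demo_dir/demo-module.sh"   # second source returns at the guard

echo "loads=$DEMO_LOAD_COUNT artifacts=$ARTIFACTS_DIR"
rm -rf "$demo_dir"
```

Because every variable is read through `${VAR:-default}`, the sketch runs under `set -u` without any prior setup, which is the property a test harness relies on.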

State Representation

State is not JSON — it uses newline-delimited key:value strings in shell variables (STAGE_STATUSES, STAGE_TIMINGS, LOG_ENTRIES). This avoids jq round-trips for hot-path operations (status lookups happen per-stage). Persistence to disk uses YAML frontmatter in STATE_FILE via write_state().
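To illustrate the representation, here is a sketch of a lookup and an update on a newline-delimited key:value string using only Bash 3.2 features. The `demo_get_status` and `demo_set_status` helpers are hypothetical and only mirror the idea behind get_stage_status and set_stage_status; the real implementations may differ.

```shell
#!/usr/bin/env bash
set -euo pipefail

STAGE_STATUSES=""   # one "stage_id:status" entry per line

# Hypothetical helper: print the status for a stage, or nothing if unset.
demo_get_status() {
    local stage_id="$1" line
    while IFS= read -r line; do
        case "$line" in
            "$stage_id":*) printf '%s\n' "${line#*:}"; return 0 ;;
        esac
    done <<EOF
$STAGE_STATUSES
EOF
    return 0
}

# Hypothetical helper: replace (or add) the entry for a stage.
demo_set_status() {
    local stage_id="$1" status="$2" line updated=""
    while IFS= read -r line; do
        [[ -z "$line" ]] && continue
        case "$line" in
            "$stage_id":*) ;;                         # drop the old entry
            *) updated="${updated}${line}"$'\n' ;;    # keep everything else
        esac
    done <<EOF
$STAGE_STATUSES
EOF
    STAGE_STATUSES="${updated}${stage_id}:${status}"
}

demo_set_status intake complete
demo_set_status plan running
demo_set_status plan complete
echo "plan=$(demo_get_status plan)"
```

Both helpers stay inside the shell process (no jq, no subprocess per lookup), which is the motivation given above for keeping this format on the hot path.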

Atomic writes: write_state() writes the new state to a temp file and then moves it over STATE_FILE with mv, so an interrupted write cannot leave a truncated state file.
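The temp-file-then-mv pattern can be sketched as follows. `demo_atomic_write` is a hypothetical helper, and the YAML keys in the example are illustrative only; the real write_state() also performs a disk space check that is not shown here. The temp file is created next to the target because mv is only an atomic rename when source and destination are on the same filesystem.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper, not the real write_state(): write to a temp file in
# the target's directory, then rename it into place.
demo_atomic_write() {
    local target="$1" content="$2" tmp
    tmp=$(mktemp "${target}.XXXXXX") || return 1
    if ! printf '%s\n' "$content" > "$tmp"; then
        rm -f "$tmp"
        return 1
    fi
    mv -f "$tmp" "$target"
}

state_dir=$(mktemp -d)
STATE_FILE="$state_dir/pipeline-state.md"

# Illustrative frontmatter; the key names are assumptions.
demo_atomic_write "$STATE_FILE" "---
status: running
current_stage: build
---"
cat "$STATE_FILE"
```

Readers of STATE_FILE see either the previous complete file or the new complete file, never a partial write.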

Error Classification Strategy

classify_error() in the executor uses a three-tier approach:

  1. Cache lookup — check ~/.shipwright/optimization/error-classifications.json for previously-seen error signatures (cksum-based)
  2. Pattern matching — grep log tail for known patterns (infrastructure: timeout/OOM/network; configuration: MODULE_NOT_FOUND/ENOENT; logic: assertion/TypeError)
  3. Fallback — "unknown" classification

Classification determines retry behavior: infrastructure errors retry, configuration errors fail immediately, logic errors fail after repeated occurrence.
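A rough sketch of tiers 2 and 3 and of the retry decision, assuming the log is a plain text file. `demo_classify_log` and `demo_retry_action` are hypothetical names, the cache lookup is omitted, and the patterns are limited to the ones listed above. The threshold for "repeated occurrence" (two sightings) and the handling of "unknown" are assumptions, not documented behavior.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper covering pattern matching and fallback (no cache tier).
demo_classify_log() {
    local log_file="$1" tail_text
    tail_text=$(tail -n 50 "$log_file" 2>/dev/null || true)

    if grep -Eq 'timeout|OOM|network' <<< "$tail_text"; then
        echo "infrastructure"
    elif grep -Eq 'MODULE_NOT_FOUND|ENOENT' <<< "$tail_text"; then
        echo "configuration"
    elif grep -Eiq 'assertion|TypeError' <<< "$tail_text"; then
        echo "logic"
    else
        echo "unknown"
    fi
}

# Hypothetical mapping from classification to the next action.
# Assumption: a logic error seen twice counts as "repeated".
demo_retry_action() {
    local class="$1" seen_count="$2"
    case "$class" in
        infrastructure) echo "retry" ;;
        configuration)  echo "fail" ;;
        *)  # logic, and (as an assumption) unknown
            if (( seen_count >= 2 )); then echo "fail"; else echo "retry"; fi ;;
    esac
}

log=$(mktemp)
printf '%s\n' "running tests" "Error: Cannot find module (MODULE_NOT_FOUND)" > "$log"
class=$(demo_classify_log "$log")
echo "$class -> $(demo_retry_action "$class" 1)"
rm -f "$log"
```

The tiers are checked in order, so a log tail that matches both an infrastructure and a logic pattern is reported as infrastructure in this sketch.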

Self-Healing Loops

self_healing_build_test() (287 lines) implements convergence detection:

  • Same error 3x → early exit (stuck)
  • Same failure count 2x → early exit (plateau)
  • Infrastructure error → immediate exit (not code-fixable)

This is the highest-complexity function and the primary test coverage gap.
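The three exit conditions can be sketched as a small state machine that is fed one result per loop iteration. Everything here is hypothetical (`demo_check_convergence`, the counters, the `VERDICT` variable), and the sketch assumes that "3x" and "2x" mean consecutive iterations and that the stuck check runs before the plateau check; the real function may count and order these differently.

```shell
#!/usr/bin/env bash
set -euo pipefail

LAST_SIG="";   SAME_SIG_RUN=0     # consecutive iterations with the same error
LAST_FAILS=""; SAME_FAILS_RUN=0   # consecutive iterations with the same count
VERDICT=""

demo_reset_convergence() {
    LAST_SIG=""; SAME_SIG_RUN=0; LAST_FAILS=""; SAME_FAILS_RUN=0; VERDICT=""
}

# Hypothetical check; sets VERDICT to continue|stuck|plateau|infrastructure.
demo_check_convergence() {
    local error_class="$1" error_sig="$2" fail_count="$3"

    if [[ "$error_sig" == "$LAST_SIG" ]]; then
        SAME_SIG_RUN=$((SAME_SIG_RUN + 1))
    else
        SAME_SIG_RUN=1
    fi
    if [[ "$fail_count" == "$LAST_FAILS" ]]; then
        SAME_FAILS_RUN=$((SAME_FAILS_RUN + 1))
    else
        SAME_FAILS_RUN=1
    fi
    LAST_SIG="$error_sig"; LAST_FAILS="$fail_count"

    if [[ "$error_class" == "infrastructure" ]]; then
        VERDICT="infrastructure"          # not code-fixable: exit immediately
    elif (( SAME_SIG_RUN >= 3 )); then
        VERDICT="stuck"
    elif (( SAME_FAILS_RUN >= 2 )); then
        VERDICT="plateau"
    else
        VERDICT="continue"
    fi
}

# Same error three times in a row, with a changing failure count.
demo_check_convergence logic sigA 5
demo_check_convergence logic sigA 4
demo_check_convergence logic sigA 3
echo "after 3 identical errors: $VERDICT"
```

The function writes to a global instead of printing its verdict because calling it inside `$(...)` would run it in a subshell and lose the counters.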

Interface Contracts

pipeline-state.sh public API:

save_artifact(name, content) → void                    # writes to ARTIFACTS_DIR
get_stage_status(stage_id) → string                    # "pending"|"running"|"complete"|"failed"|""
set_stage_status(stage_id, status) → void
record_stage_start(stage_id) → void
record_stage_end(stage_id) → void
get_stage_timing(stage_id) → string                    # "1m30s"
get_stage_timing_seconds(stage_id) → int               # 0 if unknown
get_slowest_stage() → string                           # stage_id or ""
get_stage_description(stage_id) → string
build_stage_progress() → string                        # "intake:complete plan:running"
update_status(status, stage) → void                    # sets PIPELINE_STATUS + CURRENT_STAGE, calls write_state
mark_stage_complete(stage_id) → void                   # side effects: event, timing, GitHub, checkpoint
mark_stage_failed(stage_id) → void                     # side effects: event, error comment, GitHub
initialize_state() → void                              # resets all state
write_state() → void | return 1                        # disk space check, YAML frontmatter to STATE_FILE
resume_state() → void | exit 1                         # restores from STATE_FILE; exits on missing file/goal
verify_stage_artifacts(stage_id) → 0|1                 # checks expected artifacts exist

pipeline-stage-executor.sh public API:

classify_error(stage_id) → "infrastructure"|"configuration"|"logic"|"unknown"
run_stage_with_retry(stage_id) → 0|1                   # handles retry logic per classification
self_healing_build_test() → 0|1                        # build+test loop with convergence detection
self_healing_review_build_test() → 0|1                 # review+build+test loop

Error Boundaries

| Component | Handles locally | Propagates upward |
|---|---|---|
| state | Disk space (write_state), missing artifacts (verify), CI push failures (persist_artifacts) | Missing state file → exit 1 |
| executor | Error classification, convergence detection, retry decisions | Stage failure → return 1 |
| orchestration | Stage sequencing, gate enforcement, self-healing loop entry | Pipeline failure → writes final state, exits |

Data Flow

CLI args → parse_args → pipeline_start()
  → initialize_state()                        [state]
  → run_pipeline()                            [orchestration]
    → for each enabled stage:
      → record_stage_start(id)                [state]
      → run_stage_with_retry(id)              [executor]
        → stage_<id>()                        [stage modules]
        → classify_error(id) on failure       [executor]
      → mark_stage_complete(id)               [state → write_state → STATE_FILE]
         or mark_stage_failed(id)             [state → emit_event → events.jsonl]
                                              [state → gh_update_progress → GitHub]
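
The per-stage part of this flow can be sketched with stubs in place of the real modules. The function names match the contracts above, but every body below is a stub that only records the call; `demo_run_pipeline`, `CALL_LOG`, and the choice of a failing "test" stage are hypothetical, and stopping at the first failed stage is an assumption based on the error boundaries described earlier.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Stubs standing in for the real modules; each only records that it ran.
CALL_LOG=""
log_call() { CALL_LOG="${CALL_LOG}$1 "; }

record_stage_start()  { log_call "start:$1"; }
mark_stage_complete() { log_call "complete:$1"; }
mark_stage_failed()   { log_call "failed:$1"; }
run_stage_with_retry() {
    # Hypothetical behavior: the "test" stage fails, all others succeed.
    [[ "$1" != "test" ]]
}

# Hypothetical orchestration loop over the enabled stages.
demo_run_pipeline() {
    local stage_id
    for stage_id in "$@"; do
        record_stage_start "$stage_id"
        if run_stage_with_retry "$stage_id"; then
            mark_stage_complete "$stage_id"
        else
            mark_stage_failed "$stage_id"
            return 1
        fi
    done
}

if demo_run_pipeline intake plan build test review; then
    echo "pipeline succeeded"
else
    echo "pipeline failed"
fi
echo "$CALL_LOG"
```

The call log shows the dependency direction in action: orchestration calls the executor, and only state functions are invoked to record the outcome.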

Alternatives Considered

  1. Further decomposition (split large functions into sub-modules) — Pros: smaller units, easier to test in isolation. Cons: more files to source, higher bash overhead, more complex dependency graph. Rejected: current boundaries are clean and match single-responsibility. Adding tests for existing functions is lower-risk than restructuring.

  2. Rewrite tests using bats-core — Pros: proper TAP output, better assertion library, setup/teardown hooks. Cons: new dependency, requires rewriting 9 existing test files, team learning curve. Rejected: existing test-helpers.sh framework works well and consistency across the test suite matters more.

  3. Test only public API, skip internal functions — Pros: fewer tests to write, less maintenance. Cons: won't reach >80% coverage on complex functions like mark_stage_complete (94 lines with many side effects) and self_healing_build_test (287 lines). Rejected: these functions have complex branching that needs verification.

  4. Convert state to JSON throughout — Pros: structured data, easier validation. Cons: jq round-trips on every status lookup (hot path), Bash 3.2 compatibility concerns, large diff touching all consumers. Rejected: newline-delimited format is fast and sufficient; persistence layer already handles serialization.

Implementation Plan

  • Files to create: None
  • Files to modify:
    • scripts/sw-lib-pipeline-state-test.sh — add ~15 tests for write_state, resume_state, get_slowest_stage, build_stage_progress, mark_stage_complete, mark_stage_failed, update_status
    • scripts/sw-lib-pipeline-stage-executor-test.sh — add ~10 tests for self_healing_build_test (happy path, convergence stuck, plateau detection), run_stage_with_retry edge cases (plan artifact skip, configuration error escalation)
  • Dependencies: None new
  • Risk areas:
    • mark_stage_complete/failed tests require many stubs (emit_event, gh_*, checkpoint) — mitigated by reusing existing stub pattern already in the test file
    • self_healing_build_test contains sleep calls — override sleep in tests, mock stage functions to fail/succeed deterministically
    • write_state test must not corrupt real state — tests already use TEST_TEMP_DIR isolation
    • resume_state needs a valid YAML state file — generate content in test using known-good format from write_state
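
As an illustration of the sleep override and deterministic stage stubs mentioned in the risk areas, here is a sketch of a test written with PASS/FAIL counters. `demo_assert_eq`, `demo_stage_build`, and `demo_retry_loop` are hypothetical; the real assertion helpers live in lib/test-helpers.sh and the real loop under test is self_healing_build_test.

```shell
#!/usr/bin/env bash
set -euo pipefail

PASS=0; FAIL=0
demo_assert_eq() {   # hypothetical; not the real test-helpers.sh API
    local name="$1" expected="$2" actual="$3"
    if [[ "$expected" == "$actual" ]]; then
        PASS=$((PASS + 1)); echo "PASS: $name"
    else
        FAIL=$((FAIL + 1)); echo "FAIL: $name (expected '$expected', got '$actual')"
    fi
}

# Shadow the external sleep so retry loops do not slow the test down.
SLEEP_CALLS=0
sleep() { SLEEP_CALLS=$((SLEEP_CALLS + 1)); }

# Deterministic stub: fail on the first two calls, succeed on the third.
BUILD_ATTEMPTS=0
demo_stage_build() {
    BUILD_ATTEMPTS=$((BUILD_ATTEMPTS + 1))
    (( BUILD_ATTEMPTS >= 3 ))
}

# Hypothetical loop under test, standing in for the real self-healing loop.
demo_retry_loop() {
    local max="$1" i=1
    while (( i <= max )); do
        if demo_stage_build; then return 0; fi
        sleep 5
        i=$((i + 1))
    done
    return 1
}

rc=0
demo_retry_loop 5 || rc=$?
demo_assert_eq "loop converges" 0 "$rc"
demo_assert_eq "three build attempts" 3 "$BUILD_ATTEMPTS"
demo_assert_eq "sleep stubbed twice" 2 "$SLEEP_CALLS"
echo "passed=$PASS failed=$FAIL"
```

Defining a shell function named sleep takes precedence over the external binary, so the code under test needs no changes to become fast and deterministic.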

Validation Criteria

  • sw-pipeline.sh remains < 1,500 lines (currently 708 — already met)
  • pipeline-state.sh exists as separate module with include guard (already met, 612 lines)
  • pipeline-stage-executor.sh exists as separate module with include guard (already met, 645 lines)
  • All existing tests pass (npm test green, no regressions)
  • 20+ new unit tests added across both test files
  • Function coverage >80% for pipeline-state.sh (from ~60% → ~95%)
  • Line coverage >80% for pipeline-stage-executor.sh (from ~40% → ~80%)
  • Each module can be sourced independently without side effects (include guards + safe defaults verified)
  • No production code changes — test-only additions

Endpoint Specification / Rate Limiting / Versioning

Not applicable — these are shell modules with function-call interfaces, not HTTP APIs. The interface contracts above define the function signatures and error behavior.
