Pipeline Plan 448

ezigus edited this page Apr 30, 2026 · 1 revision


Implementation Plan: Pipeline Cycling Halt (stuck_cycling)

Files to Modify

| File | Change |
|---|---|
| scripts/sw-pipeline.sh | Add count_consecutive_test_failures(), add the cycling halt check in self_healing_build_test(), set the SW_PIPELINE_MAX_BUILD_RETRIES default, and add stuck_cycling to the status display |
| scripts/sw-pipeline-test.sh | Add a unit test for the counter function and an E2E test for the stuck_cycling exit |

Root Cause Analysis

What's happening: self_healing_build_test() runs up to N build→test cycles (BUILD_TEST_RETRIES, default 3). When it exhausts those cycles and returns 1, external automation (the daemon or the autonomous pipeline) re-invokes the pipeline from scratch, which resets STUCKNESS_COUNT, RESTART_COUNT, and EXTENSION_COUNT inside sw-loop.sh. The pipeline-state.md log persists across these restarts, but nothing reads it to count cumulative failures.

Why convergence detection doesn't save us: The same-error × 3 detector (consecutive_same_error) resets whenever the error signature changes (e.g., different timestamps or changed assertion counts). Plateau detection resets whenever prev_fail_count varies. Neither tracks failures across separate self_healing_build_test invocations.
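To make the reset behavior concrete, here is a deliberately simplified, hypothetical pair of counters. same_error_streak and stage_failure_streak are illustrative names, not the actual consecutive_same_error code in sw-loop.sh. Three failures whose messages differ only in an embedded timestamp or assertion count never build a same-signature streak toward the threshold of 3, while a stage-level count sees all three.

```shell
#!/usr/bin/env bash
# Hypothetical, simplified stand-ins for two counting strategies.
# Neither is the real sw-loop.sh implementation.

# Length of the trailing run of identical error signatures (resets on any change).
same_error_streak() {
    local prev="" streak=0 sig
    for sig in "$@"; do
        if [[ "$sig" == "$prev" ]]; then streak=$((streak + 1)); else streak=1; fi
        prev="$sig"
    done
    echo "$streak"
}

# Counts every failure, regardless of its message text.
stage_failure_streak() {
    echo "$#"
}

# Three failures that differ only by timestamp and assertion count.
errors=("FAIL at 10:00:01: 3 assertions" "FAIL at 10:05:12: 3 assertions" "FAIL at 10:10:40: 2 assertions")
echo "same-error streak: $(same_error_streak "${errors[@]}")"
echo "stage failure streak: $(stage_failure_streak "${errors[@]}")"
```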

Minimum viable fix: Before each build attempt, derive a consecutive-failure count from the pipeline-state.md log history. Because the log survives restarts, the count catches cycling across invocations.


Alternatives Considered

Option A (chosen): Persistent state-file counter. Before each build attempt, parse the test entries in pipeline-state.md, each a ### test (ts) header followed by a failed (...) or complete (...) line. No new files; reuses the existing log grammar.

Trade-offs: + Survives daemon restarts (persistent); + No new file I/O path; + Counter resets naturally when test passes; − Requires parsing the state file on each cycle start (cheap: <1ms sequential read).
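To illustrate the grammar Option A depends on, the sketch below writes a hypothetical log excerpt. The timestamps and messages are invented; only the ### <stage> (<ts>) header followed by an outcome line comes from this plan. It also shows why a plain line count is not enough: grep counts the build failure too and ignores the intervening pass, so the parser has to track both the current stage and ordering.

```shell
#!/usr/bin/env bash
# Hypothetical pipeline-state.md excerpt (invented timestamps and messages).
state_file=$(mktemp)
cat > "$state_file" <<'EOF'
## Log
### build (2026-04-30T10:00:00Z)
failed (compile error)
### test (2026-04-30T10:05:00Z)
failed (2 assertions)
### test (2026-04-30T10:20:00Z)
complete (all passed)
### test (2026-04-30T10:35:00Z)
failed (1 assertion)
EOF

# A naive count mixes stages and ignores ordering: it reports 3, whereas the
# trailing consecutive test-failure count that Option A needs is 1.
naive_count=$(grep -c '^failed' "$state_file")
echo "naive failed-line count: $naive_count"
rm -f "$state_file"
```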

Option B: External counter file. Write a consecutive-test-failures.txt file in ARTIFACTS_DIR; increment it on test failure and reset it on test pass.

Trade-offs: + Simple read/write; − Not automatically cleaned on fresh start; − New artifact that _cleanup_run_artifacts() would need to preserve; − Doesn't help if ARTIFACTS_DIR is wiped on restart.
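For comparison, a minimal sketch of Option B, assuming a plain-text counter file. The file name comes from the plan; record_test_result and read_failure_count are hypothetical helper names, and a temporary directory stands in for ARTIFACTS_DIR. The last lines show the failure mode noted above: once the file is wiped, the count silently drops back to 0.

```shell
#!/usr/bin/env bash
# Temp dir stands in for the pipeline's ARTIFACTS_DIR (assumption).
ARTIFACTS_DIR=$(mktemp -d)
counter_file="$ARTIFACTS_DIR/consecutive-test-failures.txt"

# record_test_result pass|fail: increment on failure, reset to 0 on pass.
record_test_result() {
    local n=0
    if [[ -f "$counter_file" ]]; then n=$(<"$counter_file"); fi
    if [[ "$1" == "fail" ]]; then n=$((n + 1)); else n=0; fi
    echo "$n" > "$counter_file"
}

# Prints the stored count, or 0 if the file does not exist.
read_failure_count() {
    if [[ -f "$counter_file" ]]; then cat "$counter_file"; else echo 0; fi
}

record_test_result fail
record_test_result fail
echo "after two failures: $(read_failure_count)"

# If ARTIFACTS_DIR is wiped on restart, the streak is silently lost.
rm -f "$counter_file"
echo "after wipe: $(read_failure_count)"
```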

Option C: Counter in the state file frontmatter. Add a consecutive_test_failures: N field to the pipeline state.

Trade-offs: + Clean data model; − Requires modifying both initialize_state() and write_state() in pipeline-state.sh; − More surface area, higher blast radius.

Decision: Option A is the minimum viable change: no new files and no new state fields, and it leverages a log format already parsed elsewhere (see the resume_state() recovery logic at line 769).


Risk Analysis

| Risk | What could break | Mitigation |
|---|---|---|
| Bash =~ regex and BASH_REMATCH portability | Since bash 3.2, quoted text on the right of =~ is matched literally, so a quoted pattern would never match | Keep the pattern unquoted (ideally in a variable). ^###[[:space:]]+([a-z_]+)[[:space:]]+ is POSIX ERE, and its capture group populates BASH_REMATCH in bash 3.2+ |
| State file not yet written | On the first cycle the state file is missing or empty | Guard with [[ -f "$state_file" ]] before reading; a missing file, or one without a ## Log section, yields 0 |
| Review self-healing reuses self_healing_build_test | The second call path at line 1758 also gets the check | The check fires on BOTH paths, which is the intended behavior; no regressions |
| SW_PIPELINE_MAX_BUILD_RETRIES=0 disables the check | A daemon running with an explicit 0 will cycle indefinitely | This is the documented override; the user opted in |
| stuck_cycling state blocks pipeline resume | Automated restart fails even with SW_PIPELINE_MAX_BUILD_RETRIES=0 | Resume logic does NOT treat stuck_cycling as terminal: the user sets the env var, resume proceeds, and the check is bypassed |
| Log format changes in the future | The parser stops counting correctly | The parser uses the same regex as resume_state() (line 774), so the format is already relied on there |
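A small check of the quoting behavior behind the first risk, assuming bash 3.2 or later with the compat31 shell option off (the default). The header pattern is the one from this plan; the sample line is hypothetical. An unquoted variable on the right of =~ is treated as an ERE and fills BASH_REMATCH, while a quoted string is matched literally.

```shell
#!/usr/bin/env bash
line='### test (2026-04-30T10:05:00Z)'   # hypothetical log header
header_re='^###[[:space:]]+([a-z_]+)[[:space:]]+'

# Unquoted variable on the right-hand side: parsed as an ERE, group captured.
if [[ $line =~ $header_re ]]; then
    stage="${BASH_REMATCH[1]}"
else
    stage=""
fi
echo "captured stage: $stage"

# Quoted right-hand side: matched as a literal string, so "^###"
# (with a literal caret) does not occur in the line.
if [[ $line =~ "^###" ]]; then quoted="match"; else quoted="no match"; fi
echo "quoted pattern: $quoted"
```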

Data Flow and Architecture

sw-pipeline.sh::self_healing_build_test()
  │
  ├─► [TOP OF WHILE LOOP] count_consecutive_test_failures($STATE_FILE)
  │     │
  │     └─► reads the "## Log" section of pipeline-state.md
  │           parses: ### test (ts)\ncomplete|failed
  │           returns: N (trailing consecutive "failed" count)
  │
  ├─ if N >= SW_PIPELINE_MAX_BUILD_RETRIES (and > 0):
  │     update_status("stuck_cycling", "build")
  │     log_stage("pipeline", "stuck_cycling: ...")
  │     emit_event("pipeline.stuck_cycling", ...)
  │     return 1
  │
  └─ else: run build → run test → mark result → loop
                                      │
                                      └─► mark_stage_failed("test")
                                            log_stage("test", "failed (...)")
                                            write_state()
                                            ← persists to pipeline-state.md

Counter reset mechanism: mark_stage_complete("test") calls log_stage("test", "complete (...)"). Parser sees complete → resets trailing count to 0. No explicit reset needed.
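The reset rule can be checked on a plain sequence of outcomes, oldest first. trailing_failures is a hypothetical helper that applies the same rule the parser uses, without the file handling.

```shell
#!/usr/bin/env bash
# Hypothetical helper: trailing count of "failed" outcomes, reset by "complete".
trailing_failures() {
    local count=0 outcome
    for outcome in "$@"; do
        case "$outcome" in
            failed)   count=$((count + 1)) ;;  # extends the streak
            complete) count=0 ;;               # any passing test resets it
        esac
    done
    echo "$count"
}

trailing_failures failed failed complete failed          # 1
trailing_failures complete complete failed failed failed # 3
trailing_failures failed failed complete                 # 0
```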


Implementation Steps

Step 1 — Add SW_PIPELINE_MAX_BUILD_RETRIES default near line 812 (where BUILD_TEST_RETRIES is set):

SW_PIPELINE_MAX_BUILD_RETRIES=${SW_PIPELINE_MAX_BUILD_RETRIES:-3}
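Why an explicit 0 override survives this assignment: ${VAR:-default} substitutes the default only when the variable is unset or empty. resolve_cap is a hypothetical helper that applies the Step 1 assignment in a subshell for each case.

```shell
#!/usr/bin/env bash
# resolve_cap unset|VALUE: applies the Step 1 default in an isolated subshell.
resolve_cap() {
    (
        if [[ "$1" == "unset" ]]; then
            unset SW_PIPELINE_MAX_BUILD_RETRIES
        else
            SW_PIPELINE_MAX_BUILD_RETRIES="$1"
        fi
        SW_PIPELINE_MAX_BUILD_RETRIES=${SW_PIPELINE_MAX_BUILD_RETRIES:-3}
        echo "$SW_PIPELINE_MAX_BUILD_RETRIES"
    )
}

echo "unset -> $(resolve_cap unset)"   # 3
echo "empty -> $(resolve_cap '')"      # 3
echo "0     -> $(resolve_cap 0)"       # 0
echo "5     -> $(resolve_cap 5)"       # 5
```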

Step 2 — Add count_consecutive_test_failures() function in sw-pipeline.sh, immediately before self_healing_build_test() (around line 1422). The function must be Bash 3.2 compatible (no associative arrays, no ${var,,}):

count_consecutive_test_failures() {
    # Prints the number of trailing consecutive "failed" test outcomes in the
    # "## Log" section of the state file; a "complete" test outcome resets it.
    # Prints 0 when the file is missing or has no test entries.
    local state_file="${1:-${STATE_FILE:-}}"
    if [[ -z "$state_file" || ! -f "$state_file" ]]; then
        echo 0
        return 0
    fi
    # Regex kept in a variable and used unquoted so bash 3.2+ treats it as an ERE.
    local header_re='^###[[:space:]]+([a-z_]+)[[:space:]]+'
    local line="" in_log=0 current_stage="" count=0
    # "|| [[ -n $line ]]" still processes a final line that lacks a trailing newline.
    while IFS= read -r line || [[ -n "$line" ]]; do
        if [[ "$line" == "## Log" ]]; then in_log=1; continue; fi
        [[ "$in_log" -eq 0 ]] && continue
        if [[ "$line" =~ $header_re ]]; then
            current_stage="${BASH_REMATCH[1]}"; continue
        fi
        if [[ "$current_stage" == "test" ]]; then
            case "$line" in
                complete*) count=0; current_stage="" ;;              # pass resets the streak
                failed*)   count=$((count + 1)); current_stage="" ;; # failure extends it
            esac
        fi
    done < "$state_file"
    echo "$count"
}

Step 3 — Add cycling halt check in self_healing_build_test() at the top of the while loop body (after cycle=$((cycle + 1)), before the build runs, ~line 1483):

# Outer cycling halt: persistent consecutive test failure cap
local _max_build_retries="${SW_PIPELINE_MAX_BUILD_RETRIES:-3}"
if [[ "$_max_build_retries" -gt 0 ]]; then
    local _consec_failures
    _consec_failures=$(count_consecutive_test_failures)
    if [[ "$_consec_failures" -ge "$_max_build_retries" ]]; then
        update_status "stuck_cycling" "build"
        log_stage "pipeline" "stuck_cycling: ${_consec_failures} consecutive test failures (cap=${_max_build_retries}). Override: SW_PIPELINE_MAX_BUILD_RETRIES=0"
        write_state
        error "Pipeline halted: ${_consec_failures} consecutive test failures reached cap of ${_max_build_retries}"
        warn "Override: SW_PIPELINE_MAX_BUILD_RETRIES=0 shipwright pipeline resume"
        emit_event "pipeline.stuck_cycling" \
            "issue=${ISSUE_NUMBER:-0}" \
            "consecutive_failures=${_consec_failures}" \
            "cap=${_max_build_retries}" || true
        return 1
    fi
fi
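To exercise the halt decision without running a pipeline, the sketch below stubs the project helpers (update_status, log_stage, write_state, error, warn, emit_event, and count_consecutive_test_failures) and wraps a copy of the Step 3 check in cycling_gate, a hypothetical function used only for illustration. The stubs are assumptions; the real helpers live in sw-pipeline.sh.

```shell
#!/usr/bin/env bash
# Stubs standing in for sw-pipeline.sh helpers (assumptions for this sketch).
FAKE_FAILURES=0
LAST_STATUS=""
count_consecutive_test_failures() { echo "$FAKE_FAILURES"; }
update_status() { LAST_STATUS="$1"; }
log_stage()     { :; }
write_state()   { :; }
error()         { echo "ERROR: $*" >&2; }
warn()          { echo "WARN: $*" >&2; }
emit_event()    { :; }

# Hypothetical wrapper around the Step 3 check: returns 1 to halt, 0 to continue.
cycling_gate() {
    local _max_build_retries="${SW_PIPELINE_MAX_BUILD_RETRIES:-3}"
    if [[ "$_max_build_retries" -gt 0 ]]; then
        local _consec_failures
        _consec_failures=$(count_consecutive_test_failures)
        if [[ "$_consec_failures" -ge "$_max_build_retries" ]]; then
            update_status "stuck_cycling" "build"
            log_stage "pipeline" "stuck_cycling: ${_consec_failures} consecutive test failures (cap=${_max_build_retries})"
            write_state
            error "Pipeline halted: ${_consec_failures} consecutive test failures reached cap of ${_max_build_retries}"
            warn "Override: SW_PIPELINE_MAX_BUILD_RETRIES=0 shipwright pipeline resume"
            emit_event "pipeline.stuck_cycling" \
                "issue=${ISSUE_NUMBER:-0}" \
                "consecutive_failures=${_consec_failures}" \
                "cap=${_max_build_retries}" || true
            return 1
        fi
    fi
    return 0
}

SW_PIPELINE_MAX_BUILD_RETRIES=3
for FAKE_FAILURES in 2 3; do
    if cycling_gate; then echo "failures=$FAKE_FAILURES cap=3: continue"
    else echo "failures=$FAKE_FAILURES cap=3: halt (status=$LAST_STATUS)"; fi
done

# An explicit 0 disables the cap entirely.
SW_PIPELINE_MAX_BUILD_RETRIES=0
FAKE_FAILURES=10
if cycling_gate; then echo "failures=10 cap=0: continue (cap disabled)"; fi
```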

Step 4 — Add stuck_cycling to status display in pipeline_status() around line 3407:

stuck_cycling) status_icon="${YELLOW}${RESET}" ;;

Step 5 — Add E2E and unit tests to sw-pipeline-test.sh:

Test A (test_count_consecutive_test_failures_parsing): Extract count_consecutive_test_failures into a temp script, feed synthetic state files with known failure patterns (0 failures, 3 failures, pass-then-2-fails, etc.), assert correct counts.
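A possible shape for Test A, sketched under assumptions. The real test would extract the function from scripts/sw-pipeline.sh; here an equivalent copy of count_consecutive_test_failures (counting the trailing streak directly instead of building an outcome list) is inlined so the sketch runs standalone. assert_count and the synthetic log lines are hypothetical.

```shell
#!/usr/bin/env bash
# Inlined copy of the parser so this sketch is self-contained.
count_consecutive_test_failures() {
    local state_file="${1:-${STATE_FILE:-}}"
    if [[ -z "$state_file" || ! -f "$state_file" ]]; then echo 0; return 0; fi
    local header_re='^###[[:space:]]+([a-z_]+)[[:space:]]+'
    local line="" in_log=0 current_stage="" count=0
    while IFS= read -r line || [[ -n "$line" ]]; do
        if [[ "$line" == "## Log" ]]; then in_log=1; continue; fi
        [[ "$in_log" -eq 0 ]] && continue
        if [[ "$line" =~ $header_re ]]; then current_stage="${BASH_REMATCH[1]}"; continue; fi
        if [[ "$current_stage" == "test" ]]; then
            case "$line" in
                complete*) count=0; current_stage="" ;;
                failed*)   count=$((count + 1)); current_stage="" ;;
            esac
        fi
    done < "$state_file"
    echo "$count"
}

tmp_dir=$(mktemp -d)
failures=0

# assert_count EXPECTED LABEL LOG_LINE...: writes "## Log" plus the given
# lines to a synthetic state file and compares the parsed count.
assert_count() {
    local expected="$1" label="$2" actual
    local file="$tmp_dir/$label.md"
    shift 2
    { echo "## Log"; printf '%s\n' "$@"; } > "$file"
    actual=$(count_consecutive_test_failures "$file")
    if [[ "$actual" == "$expected" ]]; then
        echo "ok   $label"
    else
        echo "FAIL $label: expected $expected, got $actual"
        failures=$((failures + 1))
    fi
}

# Missing and empty files (no "## Log" section) both yield 0.
[[ "$(count_consecutive_test_failures "$tmp_dir/missing.md")" == 0 ]] || failures=$((failures + 1))
: > "$tmp_dir/empty.md"
[[ "$(count_consecutive_test_failures "$tmp_dir/empty.md")" == 0 ]] || failures=$((failures + 1))

assert_count 0 no-test-entries "### build (t1)" "failed (compile)"
assert_count 1 one-failure "### test (t1)" "failed (x)"
assert_count 3 three-failures "### test (t1)" "failed (a)" "### test (t2)" "failed (b)" "### test (t3)" "failed (c)"
assert_count 3 two-passes-then-three-failures \
    "### test (t1)" "complete (ok)" "### test (t2)" "complete (ok)" \
    "### test (t3)" "failed (a)" "### test (t4)" "failed (b)" "### test (t5)" "failed (c)"
assert_count 0 pass-after-failures "### test (t1)" "failed (a)" "### test (t2)" "complete (ok)"

rm -rf "$tmp_dir"
echo "failures: $failures"
```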

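Test B (test_stuck_cycling_halts_after_max_build_retries): Full E2E. Set SW_PIPELINE_MAX_BUILD_RETRIES=2 and leave BUILD_TEST_RETRIES at its default of 3 so that a second cycle runs. Use a mock sw-loop that always commits while the test always fails. Pre-seed the state file with 1 prior test failure. Run the pipeline and assert that stuck_cycling appears in the state file once the second consecutive test failure is recorded (the check fires at the top of the next cycle).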

Step 6 — Register both tests in the main() tests array in sw-pipeline-test.sh.


Task Checklist

  • Task 1: Add SW_PIPELINE_MAX_BUILD_RETRIES default near line 812 in sw-pipeline.sh
  • Task 2: Implement count_consecutive_test_failures() function in sw-pipeline.sh before self_healing_build_test()
  • Task 3: Add cycling halt check inside self_healing_build_test() while loop, before each build attempt
  • Task 4: Emit pipeline.stuck_cycling event with consecutive count and cap
  • Task 5: Write diagnostic to state file via log_stage("pipeline", "stuck_cycling: ...")
  • Task 6: Add stuck_cycling case to pipeline_status() display
  • Task 7: Write unit test test_count_consecutive_test_failures_parsing — function extraction + synthetic state files
  • Task 8: Write E2E test test_stuck_cycling_halts_after_max_build_retries — mock always-failing test, pre-seeded state, verify halt
  • Task 9: Register both tests in main() tests array
  • Task 10: Run npm test and verify all tests pass

Testing Approach

Test Pyramid:

  • 1 unit test: extracts the parsing function, tests it against 5+ synthetic state files
  • 1 E2E test: runs real sw-pipeline.sh with mocked build/test environment

Unit test coverage targets:

  • Empty/missing state file → 0 (edge case)
  • State file with no test entries → 0
  • 1 failure → 1
  • 3 consecutive failures → 3
  • 2 passes then 3 failures → 3 (counter resets on pass)
  • Pass after failures → 0

E2E critical path: SW_PIPELINE_MAX_BUILD_RETRIES=2 + 1 pre-seeded failure + always-failing test → stuck_cycling status after 1 additional failure (total=2).
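The arithmetic behind this critical path can be simulated without the pipeline. simulate_halt is a hypothetical helper that assumes the check runs at the top of each cycle and that every cycle records exactly one failed test; BUILD_TEST_RETRIES=3 is the default stated in this plan. It does not run sw-pipeline.sh.

```shell
#!/usr/bin/env bash
# simulate_halt CAP PRESEEDED RETRIES: models one pipeline invocation.
simulate_halt() {
    local cap="$1" preseeded="$2" retries="$3"
    local failures="$preseeded" cycle=0
    while (( cycle < retries )); do
        cycle=$((cycle + 1))
        # Halt check at the top of each cycle, before the build runs.
        if (( cap > 0 && failures >= cap )); then
            echo "stuck_cycling at start of cycle $cycle (consecutive failures=$failures)"
            return 0
        fi
        failures=$((failures + 1))   # build runs, test fails, log records "failed"
    done
    echo "retries exhausted after $cycle cycles (consecutive failures=$failures)"
}

simulate_halt 2 1 3
```

With only one cycle per invocation (simulate_halt 2 1 1), the invocation ends by exhausting its retries and the halt would only fire on the next invocation, which is why Test B needs at least two cycles.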


Definition of Done

  • After SW_PIPELINE_MAX_BUILD_RETRIES (default 3) consecutive test failures, pipeline exits with status: stuck_cycling in pipeline-state.md
  • A diagnostic log entry explains the halt and how to override
  • SW_PIPELINE_MAX_BUILD_RETRIES=0 disables the cap (re-invocations are never halted; each individual invocation is still bounded by BUILD_TEST_RETRIES)
  • Counter resets to 0 when any test stage succeeds
  • New unit test in sw-pipeline-test.sh validates the parsing function
  • New E2E test in sw-pipeline-test.sh validates the full stuck_cycling exit path
  • npm test passes with no regressions
  • Both the self_healing_build_test path (line 1959) and the self_healing_review_build_test path (line 1758) are covered (both call self_healing_build_test, so the check fires on both)
