Skip to content

Pipeline Plan 232

ezigus edited this page Mar 27, 2026 · 2 revisions

I'll provide the comprehensive implementation plan here. This analyzes the root cause, proposes solutions, and structures the work:


Implementation Plan: Fix Stale Tasks Injected on Pipeline Resume (Issue #232)

Root Cause Analysis (Systematic Debugging)

Evidence Gathered

Primary Cause - Confirmed:

  • pipeline-tasks.md is written by intake stage (line 451-462 in pipeline-stages-intake.sh) with metadata: Issue: ${GITHUB_ISSUE}
  • File is NEVER removed in initialize_state() (line 461 in pipeline-state.sh) — only clears model-routing.log and .plan-failure-sig.txt
  • File is NEVER validated in resume_state() (line 534-610) — blindly restores state without checking metadata
  • File is NEVER validated in stage_build() before injection (line 172 in pipeline-stages-build.sh) — injects without checking issue number

Secondary Cause - Suspected:

  • TEST_CMD is set to echo FAIL; exit 1 (mentioned in issue as existing in test harness at sw-lib-pipeline-stages-test.sh:251)
  • This value somehow leaks into pipeline TEST_CMD instead of being npm test
  • Causes tests to always fail, triggering unnecessary self-healing cycles

Root Cause Hypothesis (Ranked)

  1. VERY HIGH (99%): Task file not cleared/validated on resume — STATE MANAGEMENT BUG

    • Evidence: Three independent validation gaps (initialize, resume, inject) all missing
    • Fix: Add cleanup + validation at each point
  2. HIGH (85%): TEST_CMD environment variable scoping issue

    • Evidence: Test harness defines echo FAIL; exit 1 at line 251
    • Fix: Isolate test command in local variable, not global
  3. MEDIUM (70%): Resume flow doesn't restore clean environment

    • Evidence: resume_state() never validates metadata matches
    • Fix: Add issue metadata check before restoring

Alternatives Considered

Alternative A: "Clear on Initialize + Validate on Resume" ✓ CHOSEN

Approach:

  1. Add rm -f "$TASKS_FILE" in initialize_state()
  2. Add issue validation in resume_state() — extract issue from file, compare to GITHUB_ISSUE
  3. Add safe injection check in stage_build() before using TASKS_FILE

Pros:

  • Minimal changes (~15 production lines across 2 files)
  • Defense-in-depth: three independent validation points catch stale tasks
  • Very low risk: clear, targeted fixes with no refactoring
  • Easy to test and verify

Cons:

  • Validation logic scattered across 2 files (not centralized)
  • Happens at different times in flow

Trade-offs: Simplicity wins over consolidation; perfect for isolated bug


Alternative B: "Metadata-Scoped Task Validation"

Approach:

  • Store commit hash + tree hash in task file
  • On resume, verify code hasn't changed drastically
  • Selectively inject only relevant tasks

Pros:

  • Intelligent: understands which tasks apply to current code
  • Future-proof for multi-issue scenarios

Cons:

  • Complex (~50+ lines)
  • Overkill: pipelines are single-issue
  • Hard to maintain and test

Trade-offs: Over-engineering; not chosen


Alternative C: "Helper Function Consolidation"

Approach:

  • Create validate_tasks_file() helper
  • Called from initialize_state(), resume_state(), stage_build()

Pros:

  • Single source of truth for validation
  • Easier to extend validation later

Cons:

  • More code (~30 lines for helper)
  • Still adds complexity for simple fix

Trade-offs: Good design, but overkill for localized bug


Task Decomposition (with Dependencies)

Production Code Tasks

  1. Task 1: Clear tasks on initialize (no deps)

    • File: scripts/lib/pipeline-state.sh line 461
    • Add: rm -f "$TASKS_FILE" after existing cleanup
    • Lines: 1
    • Blocks: None (independent)
  2. Task 2: Extract issue from task metadata (dep: Task 1)

    • File: scripts/lib/pipeline-state.sh after line 609 in resume_state()
    • Parse: Extract Issue: #NNN from TASKS_FILE second header section
    • Lines: ~6
    • Blocks: Task 3
  3. Task 3: Validate issue matches on resume (dep: Task 2)

    • File: scripts/lib/pipeline-state.sh after Task 2
    • Logic: If extracted issue ≠ current GITHUB_ISSUE, remove file + warn
    • Lines: ~5
    • Blocks: Task 4
  4. Task 4: Validate before task injection (dep: Task 3)

    • File: scripts/lib/pipeline-stages-build.sh line 172
    • Logic: Before injecting TASKS_FILE, verify issue metadata matches $GITHUB_ISSUE
    • Lines: ~6
    • Blocks: None (defensive check)
  5. Task 5: Investigate and fix TEST_CMD leak (no deps)

    • Files: pipeline-stages-build.sh line 447, sw-lib-pipeline-stages-test.sh line 251
    • Investigate: How does echo FAIL; exit 1 get into TEST_CMD?
    • Fix: Isolate test harness value in local variable
    • Lines: ~3-5
    • Blocks: None (independent)

Test Tasks

  1. Task 6: Unit test — initialize clears tasks (dep: Task 1)

    • File: scripts/sw-lib-pipeline-stages-test.sh
    • Test: Create TASKS_FILE, call initialize_state(), verify removed
    • Lines: ~15
  2. Task 7: Unit test — resume validates issue (dep: Task 3)

    • File: scripts/sw-lib-pipeline-stages-test.sh
    • Test: Create tasks with Issue #154, resume with #232, verify removed
    • Lines: ~25
  3. Task 8: Unit test — injection validates issue (dep: Task 4)

    • File: scripts/sw-lib-pipeline-stages-test.sh
    • Test: Create TASKS_FILE with wrong issue, verify not injected
    • Lines: ~20
  4. Task 9: Integration test — full resume flow (dep: Tasks 1-4)

    • File: scripts/sw-lib-pipeline-stages-test.sh
    • Test: Complete scenario with issue change + resume + build
    • Lines: ~35
  5. Task 10: Run full test suite (dep: Tasks 1-9)

    • Command: ./scripts/sw-pipeline-test.sh
    • Verify: All 50+ existing tests pass (no regressions)

Risk Analysis

Risk Impact Prob. Mitigation
Clear valid tasks Build loses context Very Low Issue mismatch only triggers clear; normal flow leaves tasks intact
Validation logic breaks Pipeline crashes Low Use safe string ops, `
Tasks still injected Stale tasks still visible Very Low Three independent guards make bypass impossible
TEST_CMD leak not fixed Tests keep failing Medium Direct code trace in test harness; verify in pipeline test
Existing tests break Regression Low Full test suite before commit; validation only removes mismatched tasks
Performance impact Slow operations Very Low Only string comparisons + file unlink; no loops/API calls

Mitigation Strategy: Each validation point is independent; one broken check doesn't disable others. Tests catch regressions immediately.


Files to Modify

  1. scripts/lib/pipeline-state.sh (~1200 LOC)

    • initialize_state(): +1 line (rm -f)
    • resume_state(): +11 lines (parse + validate)
    • Impact: 15 lines, very low risk
  2. scripts/lib/pipeline-stages-build.sh (~500 LOC)

    • stage_build(): +6 lines validation before injection
    • Impact: 6 lines, low risk
  3. scripts/lib/pipeline-stages.sh OR pipeline-stages-build.sh

    • TEST_CMD handling: +3-5 lines (if leak found)
    • Impact: Minimal if needed
  4. scripts/sw-lib-pipeline-stages-test.sh (~200 LOC)

    • Add 4 new test cases: ~95 lines total
    • Impact: Test-only, no production risk

Testing Approach

Critical Paths to Test

Happy Path (No Issue Change):

  • Resume pipeline for same issue → tasks remain and are injected ✓

Error Case 1 (Issue Changed):

  • Resume for issue #232 with tasks from #154 → stale tasks removed ✓

Error Case 2 (Malformed Task File):

  • Resume with corrupted TASKS_FILE → gracefully skip without crashing ✓

Edge Case 1 (Missing Task File):

  • Resume when TASKS_FILE doesn't exist → continue normally ✓

Edge Case 2 (Multiple Resumes):

  • Resume → modify code → resume again → no accumulation of stale tasks ✓

Test Coverage Targets

  • Unit tests: 4 focused tests (~70 lines) covering each validation point
  • Integration test: 1 scenario test (~35 lines) covering full flow
  • Regression tests: Full existing test suite (50+ tests) must pass

Definition of Done

  • Code Complete

    • Task 1: initialize_state() removes pipeline-tasks.md
    • Task 2: resume_state() extracts issue from metadata
    • Task 3: resume_state() validates and removes stale tasks
    • Task 4: stage_build() validates before injection
    • Task 5: TEST_CMD leak investigated and fixed
  • Tests Pass

    • Task 6: Initialize clear test passes
    • Task 7: Resume validation test passes
    • Task 8: Injection validation test passes
    • Task 9: Full integration test passes
    • Task 10: All 50+ existing tests pass (no regressions)
  • Acceptance Verified

    • Old tasks not injected when issue changes
    • New tasks ARE injected when issue matches
    • Malformed files don't crash pipeline
    • Resume flow unchanged for valid cases
    • TEST_CMD properly isolated (if issue found)
  • Quality Checks

    • Bash 3.2 compatible (no associative arrays, etc.)
    • Proper error handling (all checks use safe patterns)
    • Logging appropriate (info/warn used correctly)
    • No debug code left in

Execution Order

  1. Task 1 — Clear on initialize (1 line, lowest risk, validates immediately)
  2. Task 2-3 — Validate on resume (11 lines, core fix)
  3. Task 4 — Defensive check in build (6 lines, final guard)
  4. Task 5 — Investigate TEST_CMD (parallel, may be small or larger)
  5. Task 6-9 — Tests in order of dependencies (15+25+20+35 = 95 lines)
  6. Task 10 — Full test suite verification (pass/fail decision point)

Estimated Implementation Time: ~60 lines code + 95 lines tests = 155 total lines. Low complexity, high confidence in solution.

Clone this wiki locally