
Pipeline Plan 200

Seth Ford edited this page Apr 4, 2026 · 2 revisions

Plan written to .claude/pipeline-artifacts/plan.md.

Summary: The Test Execution Optimization feature is fully implemented and audit-passed. Key components:

  1. testopt_execute() — Facade orchestrator combining all optimization steps into a single entry point
  2. Parallel execution — CPU-aware worker detection (75% of cores, capped 2-8), independent tests run concurrently
  3. Affected-first — Git diff analysis maps changed files to affected tests, prioritizes by historical fail rate
  4. Fast-fail — Stops on first critical failure, parallel failures cascade to skip sequential bucket
  5. Shared-state detection — 6 patterns (hardcoded /tmp, port binding, SQLite, PID/lock, TMPDIR, config sourcing) partition tests into parallel-safe vs sequential buckets
  6. Pipeline integration — Wired into both stage_test() and run_test_gate() with backwards-compatible disable switches
  7. 45 total tests: 20 unit + 25 integration, all passing

Component overview:

  - testopt_init(): discover, load history, git diff, select affected, prioritize
  - testopt_partition_shared_state(): applies the 6 patterns to split tests into independent vs shared, feeding two buckets:
    - Parallel Bucket (N workers)
    - Sequential Bucket (fast-fail)
  - testopt_report(): stats summary, evidence JSON

## Interface Contracts

testopt_execute <project_root> <test_cmd> [options]

- --max-workers=N: Override CPU-detected worker count
- --fast-fail: Stop on first failure (default: true)
- --continue-on-fail: Run all tests despite failures
- --mode=auto|parallel|sequential: Execution mode (default: auto)

Returns: exit 0 (all pass) | exit 1 (any failure)

Supporting functions:

- testopt_detect_cores() -> stdout: integer (2-8)
- testopt_partition_shared_state(file...) -> stdout: "SHARED:path" | "INDEPENDENT:path"
- testopt_select_affected(changed_files...) -> sets AFFECTED_TESTS array
- testopt_prioritize(tests...) -> stdout: sorted test file paths
- testopt_run_parallel(--max-workers=N, tests...) -> exit 0|1
- testopt_run_with_fast_fail([--continue-on-fail], tests...) -> exit 0|1
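The "SHARED:path" | "INDEPENDENT:path" contract can be consumed line by line. The following is a minimal sketch of one possible consumer, not the project's code: sketch_split_buckets, SKETCH_PARALLEL, and SKETCH_SEQUENTIAL are hypothetical names, and the heredoc stands in for real testopt_partition_shared_state output.

```shell
# Hypothetical consumer of "SHARED:path" / "INDEPENDENT:path" lines.
# Reads stdin and fills two global arrays (names are illustrative only).
sketch_split_buckets() {
  SKETCH_PARALLEL=()
  SKETCH_SEQUENTIAL=()
  local line
  while IFS= read -r line; do
    case "$line" in
      INDEPENDENT:*) SKETCH_PARALLEL+=("${line#INDEPENDENT:}") ;;
      SHARED:*)      SKETCH_SEQUENTIAL+=("${line#SHARED:}") ;;
    esac
  done
}

# Example input standing in for testopt_partition_shared_state output.
sketch_split_buckets <<'EOF'
INDEPENDENT:scripts/a-test.sh
SHARED:scripts/b-test.sh
INDEPENDENT:scripts/c-test.sh
EOF

echo "parallel: ${#SKETCH_PARALLEL[@]}, sequential: ${#SKETCH_SEQUENTIAL[@]}"
```

Feeding the function through a redirect rather than a pipe keeps the loop in the current shell, so the arrays are still populated after it returns.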


## Data Flow

  1. stage_test() or run_test_gate() -> testopt_execute(project_root, test_cmd, --fast-fail)
  2. testopt_init() -> discover *-test.sh files -> load ~/.shipwright/optimization/test-history.jsonl
  3. git diff HEAD~1..HEAD -> CHANGED_FILES -> testopt_select_affected() -> AFFECTED_TESTS
  4. testopt_prioritize() -> sort by fail_rate DESC, duration ASC
  5. testopt_partition_shared_state() -> parallel[] + sequential[]
  6. Phase 1: testopt_run_parallel(parallel_tests, workers=testopt_detect_cores(), i.e. 75% of cores clamped to [2,8])
  7. Phase 2: testopt_run_with_fast_fail(sequential_tests) -- skipped if Phase 1 fails + fast_fail
  8. Record results -> test-history.jsonl, emit events, write evidence JSON
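The ordering in step 4 (fail_rate DESC, then duration ASC) can be sketched with a two-key sort. This is an illustration under assumptions: the tab-separated input layout and the name sketch_prioritize are hypothetical, and test paths are assumed to contain no tabs.

```shell
# Hypothetical input: "<fail_rate>\t<duration_seconds>\t<test_path>" per line.
# Sorts by fail rate descending, then duration ascending, and prints paths.
sketch_prioritize() {
  LC_ALL=C sort -t "$(printf '\t')" -k1,1nr -k2,2n | cut -f3
}

printf '%s\t%s\t%s\n' \
  0.10 30 scripts/a-test.sh \
  0.50 12 scripts/b-test.sh \
  0.50 4  scripts/c-test.sh \
  0.00 2  scripts/d-test.sh | sketch_prioritize
```

With this sample data the two tests at fail rate 0.50 come first, the faster one (c-test.sh) ahead of the slower one (b-test.sh), followed by a-test.sh and d-test.sh.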

## Error Boundaries

| Component | Error Handling | Fallback |
|-----------|---------------|----------|
| testopt_init | Catches discovery/history failures | Falls through to raw bash -c test_cmd |
| testopt_execute | <3 test files -> skips optimizer | Runs raw test command directly |
| Parallel runner | Subshell failures caught via temp file | grep FAIL in results file |
| Sequential runner | Exit code checked per test | Fast-fail stops on first failure |
| History recording | Errors suppressed (2>/dev/null) | Missing history = no prioritization |
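The "<3 test files" fallback in the table can be sketched as a guard in front of the optimizer path. This is a hedged illustration only: sketch_run_tests is a hypothetical wrapper, and the echo stands in for the real optimized execution.

```shell
# Hypothetical wrapper: $1 is the raw test command, remaining args are
# discovered test files. Not the project's testopt_execute implementation.
sketch_run_tests() {
  local test_cmd="$1"
  shift
  if [ "$#" -lt 3 ]; then
    # Too few test files to benefit: run the raw command unchanged
    # and propagate its exit code.
    bash -c "$test_cmd"
    return $?
  fi
  echo "optimizer path: $# test files"
}

sketch_run_tests 'echo raw command ran' one-test.sh two-test.sh
sketch_run_tests 'echo raw command ran' a-test.sh b-test.sh c-test.sh
```

The first call takes the fallback and prints the raw command's output; the second has three files and takes the optimizer path.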

## Files Modified

| File | Changes | Status |
|------|---------|--------|
| scripts/lib/test-optimizer.sh | +229 lines: testopt_detect_cores, testopt_partition_shared_state, testopt_execute | Done |
| scripts/lib/pipeline-stages-build.sh | stage_test() calls optimizer when optimization != off | Done |
| scripts/sw-loop.sh | run_test_gate() uses optimizer when SW_TEST_OPTIMIZER != false | Done |
| scripts/sw-test-optimizer-integration-test.sh | 438 lines, 25 tests covering all new functions | Done |
| templates/pipelines/*.json (9 files) | Added optimization: auto, fast_fail: true | Done |

## Implementation Steps (Completed)

1. Added testopt_detect_cores() -- CPU detection with 75% utilization, capped [2,8]
2. Added testopt_partition_shared_state() -- 6 patterns: hardcoded /tmp, port binding, SQLite, PID/lock files, singleton TMPDIR, global config sourcing
3. Added testopt_execute() -- single orchestrator entry point combining init, prioritize, partition, parallel+sequential execution, history recording, evidence output
4. Wired stage_test() to call testopt_execute when pipeline config has optimization != off
5. Wired run_test_gate() to use optimizer when SW_TEST_OPTIMIZER != false
6. Updated all 9 pipeline templates with optimization config
7. Fixed deduplication guard for empty array (Bash 3.2 compat)
8. Fixed mktemp calls to use TMPDIR prefix
9. Fixed parallel runner to use grep-based failure detection (subshell var propagation bug)
10. Created comprehensive integration test suite (25 tests)
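Step 7 refers to a known Bash 3.2 pitfall: under set -u, expanding an empty array with "${arr[@]}" raises an "unbound variable" error. The sketch below shows one common guard idiom applied to deduplication. It is an assumption about the shape of the fix, not the actual patch, and sketch_dedupe is a hypothetical name.

```shell
set -u

# Print unique arguments in first-seen order without associative arrays
# (which Bash 3.2 lacks).
sketch_dedupe() {
  local seen=()
  local item existing found
  for item in "$@"; do
    found=0
    # ${seen[@]+...} expands to nothing while seen is empty, which avoids
    # the "unbound variable" error older Bash raises under set -u.
    for existing in ${seen[@]+"${seen[@]}"}; do
      if [ "$existing" = "$item" ]; then found=1; break; fi
    done
    if [ "$found" -eq 0 ]; then seen+=("$item"); fi
  done
  for item in ${seen[@]+"${seen[@]}"}; do
    printf '%s\n' "$item"
  done
}

sketch_dedupe a-test.sh b-test.sh a-test.sh
sketch_dedupe   # empty input prints nothing and does not error
```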

## Task Checklist

- [x] Task 1: CPU core detection (testopt_detect_cores) -- Darwin/Linux/fallback, 75% of cores clamped to [2,8]
- [x] Task 2: Shared-state partitioning (testopt_partition_shared_state) -- 6 detection patterns
- [x] Task 3: Facade orchestrator (testopt_execute) -- single entry point for optimized execution
- [x] Task 4: Wire into stage_test() in pipeline-stages-build.sh
- [x] Task 5: Wire into run_test_gate() in sw-loop.sh
- [x] Task 6: Update all pipeline templates with optimization config
- [x] Task 7: Integration test suite -- 25 tests covering core detection, partitioning, orchestrator, wiring
- [x] Task 8: Bash 3.2 compatibility fixes (empty array guard, mktemp prefix)
- [x] Task 9: Subshell variable propagation fix in parallel runner
- [x] Task 10: Evidence JSON output for dashboard consumption
- [x] Task 11: Event emission for observability (testopt.parallel_done, testopt.sequential_done, testopt.fail_fast, testopt.recorded)
- [x] Task 12: Backwards compatibility (SW_TEST_OPTIMIZER=false, optimization: off)
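The two disable switches from Task 12 can be combined into a single check. This is a minimal sketch, assuming the pipeline's optimization value is passed in as an argument. sketch_optimizer_enabled is a hypothetical helper, and the defaults chosen for unset values are assumptions.

```shell
# Hypothetical helper: $1 is the pipeline config's optimization value.
# Returns 0 when the optimizer should run, 1 when either switch disables it.
sketch_optimizer_enabled() {
  local pipeline_setting="${1:-auto}"
  if [ "${SW_TEST_OPTIMIZER:-true}" = "false" ]; then return 1; fi
  if [ "$pipeline_setting" = "off" ]; then return 1; fi
  return 0
}

if sketch_optimizer_enabled "auto"; then
  echo "optimizer enabled"
else
  echo "optimizer disabled: running raw test command"
fi
```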

## Testing Approach

### Test Pyramid Breakdown
- Integration tests (25 in sw-test-optimizer-integration-test.sh): Core detection, shared-state partitioning, orchestrator execution, pipeline wiring, loop wiring
- Existing unit tests (20 in sw-test-optimizer-test.sh): Discovery, history, affected selection, prioritization, fast-fail, parallel execution
- Full suite: Verified via npm test -- all 102+ suites pass except 3 pre-existing failures in unrelated modules

### Coverage Targets
- All public functions tested: detect_cores, partition_shared_state, execute orchestrator
- Error paths: missing project root, <3 tests fallback, empty history, no changes
- Edge cases: all tests shared, all tests independent, single test file

### Critical Paths Tested
- Happy path: 5 independent + 2 shared-state tests -> parallel then sequential execution
- Error: failing test in parallel bucket -> fast-fail skips sequential
- Error: failing test with --continue-on-fail -> runs all tests
- Edge: empty project -> discovers 0 tests -> runs raw command
- Edge: <3 tests -> falls back to raw command
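The fast-fail and --continue-on-fail paths above can be sketched as a sequential loop. This is illustrative only: sketch_run_sequential and SKETCH_RAN are hypothetical names, and each argument is a shell command standing in for a test file.

```shell
# Hypothetical runner: each argument is a shell command standing in for a test.
# Sets SKETCH_RAN to the number of commands actually executed.
sketch_run_sequential() {
  local continue_on_fail=0
  if [ "${1:-}" = "--continue-on-fail" ]; then
    continue_on_fail=1
    shift
  fi
  local status=0 cmd
  SKETCH_RAN=0
  for cmd in "$@"; do
    SKETCH_RAN=$(( SKETCH_RAN + 1 ))
    if ! bash -c "$cmd"; then
      status=1
      # Fast-fail: stop at the first failure unless told to continue.
      if [ "$continue_on_fail" -eq 0 ]; then break; fi
    fi
  done
  return "$status"
}

sketch_run_sequential true false true || echo "fast-fail after $SKETCH_RAN tests"
sketch_run_sequential --continue-on-fail true false true || echo "ran all $SKETCH_RAN tests"
```

The first call stops after the second command; the second call runs all three and still reports failure through its exit code.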

## Acceptance Criteria Verification

| Criterion | Status | Evidence |
|-----------|--------|----------|
| Detect parallelizable test files by scanning for shared state | Done | testopt_partition_shared_state() with 6 patterns |
| Run independent tests in parallel (CPU-based max parallelism) | Done | testopt_run_parallel() + testopt_detect_cores() |
| Analyze git diff to identify affected modules, prioritize tests | Done | testopt_get_changed_files() + testopt_select_affected() + testopt_prioritize() |
| Fast-fail: abort on first critical failure | Done | testopt_run_with_fast_fail() + parallel fast-fail cascade |
| Track time savings in metrics | Done | Events emitted, evidence JSON written |
| Works with fast-test-cmd and full suite modes | Done | run_test_gate() uses optimizer for both modes |

## Definition of Done

- [x] All acceptance criteria met
- [x] All new code is Bash 3.2 compatible
- [x] set -euo pipefail in all scripts
- [x] Event logging via emit_event
- [x] Atomic file writes (tmp + mv pattern)
- [x] 25 new integration tests, all passing
- [x] Backwards compatible (SW_TEST_OPTIMIZER=false disables)
- [x] No regressions in existing test suites
- [x] Audit passed (iteration 1 post-audit)
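The tmp + mv pattern named in the checklist can be sketched as below. This is a hedged example rather than the project's helper: sketch_atomic_write and the JSON field names are hypothetical. The temp file is created next to the target so the final mv is a rename on the same filesystem.

```shell
# Hypothetical helper: write stdin to $1 atomically via a temp file + mv.
sketch_atomic_write() {
  local target="$1"
  local tmp
  # Create the temp file beside the target so mv is a same-filesystem rename.
  tmp="$(mktemp "${target}.XXXXXX")" || return 1
  if cat > "$tmp"; then
    mv "$tmp" "$target"
  else
    rm -f "$tmp"
    return 1
  fi
}

# Field names below are illustrative, not the real evidence schema.
out_dir="$(mktemp -d "${TMPDIR:-/tmp}/sketch-evidence.XXXXXX")"
printf '{"total":7,"parallel":5,"sequential":2,"workers":4,"exit_status":0}\n' \
  | sketch_atomic_write "$out_dir/evidence.json"
cat "$out_dir/evidence.json"
rm -rf "$out_dir"
```

Readers of the target file see either the old content or the complete new content, never a partially written file.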

## Failure Mode Analysis

### 1. Parallel Test Race Conditions (Concurrency Risk)
Risk: Tests classified as independent may actually share state not detected by the 6 patterns.
Mitigation: The shared-state detector is conservative -- any match sends the test to the sequential bucket. Users can force --mode=sequential or set SW_TEST_OPTIMIZER=false.
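A conservative detector of this kind can be sketched with a single grep pass per file. The regular expression below is a hypothetical stand-in for the six real patterns, which this page does not list, and sketch_partition is an illustrative name. Unreadable files are treated as shared, which is an assumption in the conservative direction.

```shell
# Hypothetical pattern set approximating the six categories; the real
# patterns in testopt_partition_shared_state may differ.
SKETCH_SHARED_PATTERN='/tmp/|localhost:[0-9]+|sqlite3|\.pid|\.lock|export TMPDIR=|source .*config'

# Emit "SHARED:path" or "INDEPENDENT:path" for each file argument.
sketch_partition() {
  local file
  for file in "$@"; do
    # Any match, or an unreadable file, sends the test to the sequential bucket.
    if [ ! -r "$file" ] || grep -Eq "$SKETCH_SHARED_PATTERN" "$file"; then
      echo "SHARED:$file"
    else
      echo "INDEPENDENT:$file"
    fi
  done
}

demo_dir="$(mktemp -d "${TMPDIR:-/tmp}/sketch-partition.XXXXXX")"
printf 'echo "pure logic"\n' > "$demo_dir/pure-test.sh"
printf 'sqlite3 "$DB" "select 1"\n' > "$demo_dir/db-test.sh"
sketch_partition "$demo_dir/pure-test.sh" "$demo_dir/db-test.sh"
rm -rf "$demo_dir"
```

Pattern matching on source text can only catch state it knows how to spell, which is why the manual overrides remain necessary.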

### 2. Subshell Variable Loss in Parallel Runner (Runtime Failure)
Risk: Background jobs run in subshells -- variable assignments don't propagate to the parent.
Mitigation: Already fixed. The parallel runner writes results to a temp file and uses grep -q FAIL on the results file instead of relying on the all_passed variable from subshells.
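The results-file approach can be sketched as follows. This is a simplified illustration, not the actual runner: sketch_run_parallel is a hypothetical name, each argument is a shell command standing in for a test, and jobs run in fixed batches because Bash 3.2 has no wait -n.

```shell
# Hypothetical parallel runner: $1 is the worker count, remaining arguments
# are shell commands standing in for test files.
sketch_run_parallel() {
  local max_workers="$1"
  shift
  local results
  results="$(mktemp "${TMPDIR:-/tmp}/sketch-results.XXXXXX")"
  local running=0 cmd
  for cmd in "$@"; do
    (
      # This runs in a subshell: setting a variable here would be lost,
      # so the outcome is recorded as a line in the shared results file.
      if bash -c "$cmd" >/dev/null 2>&1; then
        echo "PASS:$cmd" >> "$results"
      else
        echo "FAIL:$cmd" >> "$results"
      fi
    ) &
    running=$(( running + 1 ))
    if [ "$running" -ge "$max_workers" ]; then
      wait          # no `wait -n` in Bash 3.2, so wait for the whole batch
      running=0
    fi
  done
  wait
  local status=0
  if grep -q '^FAIL:' "$results"; then status=1; fi
  rm -f "$results"
  return "$status"
}

sketch_run_parallel 2 true true true && echo "all passed"
sketch_run_parallel 2 true false true || echo "failure detected"
```

The parent never reads a variable set by a background job; the only channel back is the file, which is what makes the grep check reliable.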

### 3. History File Corruption Under Concurrent Writes (Concurrency Risk)
Risk: Multiple parallel pipelines writing to test-history.jsonl simultaneously could interleave lines.
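Mitigation: Each write appends a single short JSONL line. On typical local filesystems a small append-mode write is not interleaved with other appends, though this is not guaranteed everywhere (for example on NFS). The JSONL format is also self-healing: corrupt lines are skipped on read.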
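The append-then-skip-corrupt-lines behaviour can be sketched as below. The record fields and function names are hypothetical, since the real test-history.jsonl schema is not shown here, and the reader uses a cheap shape check rather than real JSON validation.

```shell
# Hypothetical record shape; the real test-history.jsonl fields may differ.
# Assumes test paths contain no double quotes or backslashes.
sketch_record() {
  local history_file="$1" test_path="$2" result="$3" duration="$4"
  # One printf call emits one complete line, appended with >>.
  printf '{"test":"%s","result":"%s","duration_s":%s}\n' \
    "$test_path" "$result" "$duration" >> "$history_file"
}

# Print only lines that look like complete JSON objects. This is a shape
# check for the sketch, not JSON validation.
sketch_read_history() {
  grep -E '^\{.*\}$' "$1" || true
}

history="$(mktemp "${TMPDIR:-/tmp}/sketch-history.XXXXXX")"
sketch_record "$history" scripts/a-test.sh pass 3
printf '{"test":"scripts/b-te\n' >> "$history"   # simulate a truncated line
sketch_record "$history" scripts/c-test.sh fail 7
sketch_read_history "$history"
rm -f "$history"
```

The truncated middle line is dropped on read, so the two complete records remain usable for prioritization.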

### 4. CPU Core Over-Detection on Constrained Systems (Scale Risk)
Risk: On systems with many cores but limited memory, spawning 8 parallel bash test processes could cause OOM.
Mitigation: Hard cap at 8 workers. The 75% heuristic and [2,8] clamp provide a conservative default. Users can override with --max-workers=N.
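The heuristic can be sketched as two small functions. This is an assumption about how the 75% rule and the clamp combine, not the real testopt_detect_cores: the function names are hypothetical, and rounding down is a choice made here because the page does not specify rounding.

```shell
# Hypothetical sketch: not the project's testopt_detect_cores implementation.

# Print the number of online CPUs, falling back to 2 when detection fails.
sketch_cpu_count() {
  local n=""
  case "$(uname -s)" in
    Darwin) n="$(sysctl -n hw.ncpu 2>/dev/null || true)" ;;
    Linux)  n="$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN 2>/dev/null || true)" ;;
  esac
  case "$n" in
    ''|*[!0-9]*) n=2 ;;   # empty or non-numeric: use a safe fallback
  esac
  echo "$n"
}

# Apply the 75% heuristic (integer math, rounds down) and clamp to [2,8].
sketch_worker_count() {
  local cores="$1"
  local workers=$(( cores * 75 / 100 ))
  if [ "$workers" -lt 2 ]; then workers=2; fi
  if [ "$workers" -gt 8 ]; then workers=8; fi
  echo "$workers"
}

sketch_worker_count "$(sketch_cpu_count)"
```

Under these assumptions a 4-core machine gets 3 workers, a 16-core machine is capped at 8, and a single-core machine is raised to the floor of 2.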

## Baseline Metrics

| Metric | Before Optimization | Target |
|--------|-------------------|--------|
| Full test suite duration | ~1365s | <900s (34% reduction via parallelism) |
| Pipeline test stage | Sequential only | Parallel + affected-first + fast-fail |
| Test execution mode | Single bash -c test_cmd | Partitioned parallel/sequential with prioritization |
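The 34% figure in the table follows from the baseline and target durations. A small sketch of the arithmetic, using a hypothetical helper name and integer math that truncates toward zero:

```shell
# Hypothetical helper: percentage reduction from $1 (before) to $2 (after),
# truncated to a whole number.
sketch_pct_reduction() {
  local before="$1" after="$2"
  echo $(( (before - after) * 100 / before ))
}

sketch_pct_reduction 1365 900   # (1365 - 900) * 100 / 1365, prints 34
```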

## Optimization Targets

- Reduce test stage wall-clock time by 30-50% through parallel execution
- Reduce time-to-first-failure by 70%+ through affected-first prioritization + fast-fail
- Zero regression in test correctness

## Profiling Strategy

- Event data (testopt.parallel_done, testopt.sequential_done) captures actual wall-clock per phase
- Evidence JSON records total/parallel/sequential counts, workers, and exit status
- Historical test-history.jsonl tracks per-test duration trends
- Dashboard /api/metrics/stage-performance surfaces test stage duration trends

## Benchmark Plan

- Before: time npm test on main branch (sequential, no optimization)
- After: time npm test on this branch (with optimizer enabled)
- Success criteria: wall-clock reduction visible in evidence JSON and event data
- Realistic data: full 102+ test suite, not synthetic benchmarks
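A before/after comparison needs the same timing wrapper on both branches. The following is a minimal sketch with a hypothetical helper, sketch_time_cmd, that reports whole-second wall-clock time and the exit code. The sleep call is a stand-in for the real test command.

```shell
# Hypothetical helper: run a command, discard its output, and report
# wall-clock seconds plus the command's exit code.
sketch_time_cmd() {
  local start end status=0
  start="$(date +%s)"
  "$@" >/dev/null 2>&1 || status=$?
  end="$(date +%s)"
  echo "elapsed_s=$(( end - start )) exit=$status"
}

# Stand-in for: sketch_time_cmd npm test
sketch_time_cmd sleep 1
```

Running the wrapper once on main and once on the optimizer branch gives two comparable numbers to set beside the evidence JSON and event data.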
