Pipeline Plan 200

Plan written to .claude/pipeline-artifacts/plan.md.

Summary: The Test Execution Optimization feature is fully implemented and audit-passed. Key components:

- testopt_execute() -- Facade orchestrator combining all optimization steps into a single entry point
- Parallel execution -- CPU-aware worker detection (75% of cores, clamped to 2-8); independent tests run concurrently
- Affected-first -- Git diff analysis maps changed files to affected tests and prioritizes them by historical fail rate
- Fast-fail -- Stops on first critical failure; parallel failures cascade to skip the sequential bucket
- Shared-state detection -- 6 patterns (hardcoded /tmp, port binding, SQLite, PID/lock, TMPDIR, config sourcing) partition tests into parallel-safe vs sequential buckets
- Pipeline integration -- Wired into both stage_test() and run_test_gate() with backwards-compatible disable switches
- 45 total tests -- 20 unit + 25 integration, all passing

Architecture (components called by testopt_execute()):

- testopt_init() -- discover, load history, git diff, select affected, prioritize
- testopt_partition_shared_state() -- 6 patterns, independent vs shared; feeds two buckets:
  - Parallel Bucket (N workers)
  - Sequential Bucket (fast-fail)
- testopt_report() -- stats summary, evidence JSON


## Interface Contracts

    testopt_execute <project_root> <test_cmd> [options]
      --max-workers=N                    Override CPU-detected worker count
      --fast-fail                        Stop on first failure (default: true)
      --continue-on-fail                 Run all tests despite failures
      --mode=auto|parallel|sequential    Execution mode (default: auto)
    Returns: exit 0 (all pass) | exit 1 (any failure)

    testopt_detect_cores()                                      -> stdout: integer (2-8)
    testopt_partition_shared_state(file...)                     -> stdout: "SHARED:path" | "INDEPENDENT:path"
    testopt_select_affected(changed_files...)                   -> sets AFFECTED_TESTS array
    testopt_prioritize(tests...)                                -> stdout: sorted test file paths
    testopt_run_parallel(--max-workers=N, tests...)             -> exit 0|1
    testopt_run_with_fast_fail([--continue-on-fail], tests...)  -> exit 0|1


## Data Flow

1. stage_test() or run_test_gate() -> testopt_execute(project_root, test_cmd, --fast-fail)
2. testopt_init() -> discover *-test.sh files -> load ~/.shipwright/optimization/test-history.jsonl
3. git diff HEAD~1..HEAD -> CHANGED_FILES -> testopt_select_affected() -> AFFECTED_TESTS
4. testopt_prioritize() -> sort by fail_rate DESC, duration ASC
5. testopt_partition_shared_state() -> parallel[] + sequential[]
6. Phase 1: testopt_run_parallel(parallel_tests, workers=detect_cores*0.75)
7. Phase 2: testopt_run_with_fast_fail(sequential_tests) -- skipped if Phase 1 fails and fast-fail is enabled
8. Record results -> test-history.jsonl, emit events, write evidence JSON
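
The ordering used in the prioritization step (fail rate descending, then duration ascending) can be sketched with a two-key sort. This is a minimal illustration, not the project's testopt_prioritize(): the function name prioritize_sketch and the tab-separated input format (path, fail rate, duration in seconds) are assumptions made for the example, and the real test-history.jsonl schema is not shown in this plan.

```bash
#!/usr/bin/env bash

# Hypothetical helper (not part of test-optimizer.sh).
# Reads "path<TAB>fail_rate<TAB>duration_s" records on stdin and prints the
# paths ordered by fail_rate DESC, then duration ASC.
prioritize_sketch() {
    local tab
    tab="$(printf '\t')"
    # -k2,2nr: numeric, reversed (highest fail rate first)
    # -k3,3n : numeric, ascending (shortest duration first among ties)
    LC_ALL=C sort -t "$tab" -k2,2nr -k3,3n | cut -f1
}
```

With this ordering, tests that fail most often run first, and among equally flaky tests the quicker one runs first, which is what shortens time-to-first-failure.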


## Error Boundaries

| Component | Error Handling | Fallback |
|-----------|---------------|----------|
| testopt_init | Catches discovery/history failures | Falls through to raw bash -c test_cmd |
| testopt_execute | <3 test files -> skips optimizer | Runs raw test command directly |
| Parallel runner | Subshell failures caught via temp file | grep FAIL in results file |
| Sequential runner | exit code checked per test | Fast-fail stops on first failure |
| History recording | Errors suppressed (2>/dev/null) | Missing history = no prioritization |

## Files Modified

| File | Changes | Status |
|------|---------|--------|
| scripts/lib/test-optimizer.sh | +229 lines: testopt_detect_cores, testopt_partition_shared_state, testopt_execute | Done |
| scripts/lib/pipeline-stages-build.sh | stage_test() calls optimizer when optimization != off | Done |
| scripts/sw-loop.sh | run_test_gate() uses optimizer when SW_TEST_OPTIMIZER != false | Done |
| scripts/sw-test-optimizer-integration-test.sh | 438 lines, 25 tests covering all new functions | Done |
| templates/pipelines/*.json (9 files) | Added optimization: auto, fast_fail: true | Done |

## Implementation Steps (Completed)

1. Added testopt_detect_cores() -- CPU detection with 75% utilization, capped [2,8]
2. Added testopt_partition_shared_state() -- 6 patterns: hardcoded /tmp, port binding, SQLite, PID/lock files, singleton TMPDIR, global config sourcing
3. Added testopt_execute() -- single orchestrator entry point combining init, prioritize, partition, parallel+sequential execution, history recording, evidence output
4. Wired stage_test() to call testopt_execute when pipeline config has optimization != off
5. Wired run_test_gate() to use optimizer when SW_TEST_OPTIMIZER != false
6. Updated all 9 pipeline templates with optimization config
7. Fixed deduplication guard for empty array (Bash 3.2 compat)
8. Fixed mktemp calls to use TMPDIR prefix
9. Fixed parallel runner to use grep-based failure detection (subshell var propagation bug)
10. Created comprehensive integration test suite (25 tests)
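
Step 7 refers to a known Bash pitfall: under set -u, older Bash versions (including 3.2) treat the expansion of an empty array as an unbound variable. The sketch below shows one common guard for that case in a deduplication loop. It is an illustration only; dedupe_sketch is a hypothetical name and the project's actual guard may be written differently.

```bash
#!/usr/bin/env bash
set -u

# Hypothetical helper (not part of test-optimizer.sh).
# Prints its arguments with duplicates removed, keeping first-seen order.
# Avoids associative arrays so it also runs on Bash 3.2.
dedupe_sketch() {
    local seen=()   # starts empty: the risky case under `set -u`
    local item prev found
    for item in "$@"; do
        found=0
        # ${seen[@]+"${seen[@]}"} expands to nothing while the array is empty,
        # instead of raising "unbound variable" on older Bash versions.
        for prev in ${seen[@]+"${seen[@]}"}; do
            if [ "$prev" = "$item" ]; then
                found=1
                break
            fi
        done
        if [ "$found" -eq 0 ]; then
            seen+=("$item")
            printf '%s\n' "$item"
        fi
    done
}
```

For example, dedupe_sketch a-test.sh b-test.sh a-test.sh prints a-test.sh and b-test.sh once each, and calling it with no arguments prints nothing.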

## Task Checklist

- [x] Task 1: CPU core detection (testopt_detect_cores) -- Darwin/Linux/fallback, 75% cap
- [x] Task 2: Shared-state partitioning (testopt_partition_shared_state) -- 6 detection patterns
- [x] Task 3: Facade orchestrator (testopt_execute) -- single entry point for optimized execution
- [x] Task 4: Wire into stage_test() in pipeline-stages-build.sh
- [x] Task 5: Wire into run_test_gate() in sw-loop.sh
- [x] Task 6: Update all pipeline templates with optimization config
- [x] Task 7: Integration test suite -- 25 tests covering core detection, partitioning, orchestrator, wiring
- [x] Task 8: Bash 3.2 compatibility fixes (empty array guard, mktemp prefix)
- [x] Task 9: Subshell variable propagation fix in parallel runner
- [x] Task 10: Evidence JSON output for dashboard consumption
- [x] Task 11: Event emission for observability (testopt.parallel_done, testopt.sequential_done, testopt.fail_fast, testopt.recorded)
- [x] Task 12: Backwards compatibility (SW_TEST_OPTIMIZER=false, optimization: off)

## Testing Approach

### Test Pyramid Breakdown
- New integration tests (25 in sw-test-optimizer-integration-test.sh): Core detection, shared-state partitioning, orchestrator execution, pipeline wiring, loop wiring
- Existing unit tests (20 in sw-test-optimizer-test.sh): Discovery, history, affected selection, prioritization, fast-fail, parallel execution
- Full-suite regression: Verified via npm test -- all 102+ suites pass except 3 pre-existing failures in unrelated modules

### Coverage Targets
- All public functions tested: detect_cores, partition_shared_state, execute orchestrator
- Error paths: missing project root, <3 tests fallback, empty history, no changes
- Edge cases: all tests shared, all tests independent, single test file

### Critical Paths Tested
- Happy path: 5 independent + 2 shared-state tests -> parallel then sequential execution
- Error: failing test in parallel bucket -> fast-fail skips sequential
- Error: failing test with --continue-on-fail -> runs all tests
- Edge: empty project -> discovers 0 tests -> runs raw command
- Edge: <3 tests -> falls back to raw command
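
The two error paths above (fast-fail versus --continue-on-fail) come down to whether the sequential loop breaks on the first non-zero exit code. A minimal sketch of that control flow follows; run_with_fast_fail_sketch is a hypothetical stand-in, and the PASS/FAIL output lines are an assumption made for the example rather than the real runner's output.

```bash
#!/usr/bin/env bash

# Hypothetical sketch (not the project's testopt_run_with_fast_fail).
# Usage: run_with_fast_fail_sketch [--continue-on-fail] test-file...
# Returns 0 if every executed test passed, 1 otherwise.
run_with_fast_fail_sketch() {
    local fast_fail=1 failed=0 test_file
    if [ "${1:-}" = "--continue-on-fail" ]; then
        fast_fail=0
        shift
    fi
    for test_file in "$@"; do
        if bash "$test_file"; then
            echo "PASS: $test_file"
        else
            echo "FAIL: $test_file"
            failed=1
            # Fast-fail: stop at the first failure unless told to continue.
            if [ "$fast_fail" -eq 1 ]; then
                break
            fi
        fi
    done
    return "$failed"
}
```

In both modes the function returns 1 once any test fails; the flag only controls whether the remaining tests still run.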

## Acceptance Criteria Verification

| Criterion | Status | Evidence |
|-----------|--------|----------|
| Detect parallelizable test files by scanning for shared state | Done | testopt_partition_shared_state() with 6 patterns |
| Run independent tests in parallel (CPU-based max parallelism) | Done | testopt_run_parallel() + testopt_detect_cores() |
| Analyze git diff to identify affected modules, prioritize tests | Done | testopt_get_changed_files() + testopt_select_affected() + testopt_prioritize() |
| Fast-fail: abort on first critical failure | Done | testopt_run_with_fast_fail() + parallel fast-fail cascade |
| Track time savings in metrics | Done | Events emitted, evidence JSON written |
| Works with fast-test-cmd and full suite modes | Done | run_test_gate() uses optimizer for both modes |

## Definition of Done

- [x] All acceptance criteria met
- [x] All new code is Bash 3.2 compatible
- [x] set -euo pipefail in all scripts
- [x] Event logging via emit_event
- [x] Atomic file writes (tmp + mv pattern)
- [x] 25 new integration tests, all passing
- [x] Backwards compatible (SW_TEST_OPTIMIZER=false disables)
- [x] No regressions in existing test suites
- [x] Audit passed (iteration 1 post-audit)
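
The "tmp + mv" item in the checklist above refers to writing a complete file under a temporary name and then renaming it into place, so readers never see a half-written file. A minimal sketch of the pattern follows. The name atomic_write_sketch is hypothetical, and placing the temp file next to the target (so that mv is a same-filesystem rename) is a choice made for this example; the project's own mktemp calls use a TMPDIR prefix.

```bash
#!/usr/bin/env bash

# Hypothetical helper (not part of the project's scripts).
# Writes stdin to the file named by $1 using a temp file plus rename.
atomic_write_sketch() {
    local target="$1" tmp
    # Create the temp file beside the target so mv stays on one filesystem;
    # a rename within a single filesystem replaces the target in one step.
    tmp="$(mktemp "${target}.tmp.XXXXXX")" || return 1
    if cat > "$tmp" && mv "$tmp" "$target"; then
        return 0
    fi
    # Clean up the partial temp file if writing or renaming failed.
    rm -f "$tmp"
    return 1
}
```

A call such as printf '{"exit":0}\n' | atomic_write_sketch evidence.json leaves either the old file or the complete new one on disk, never a partial write.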

## Failure Mode Analysis

### 1. Parallel Test Race Conditions (Concurrency Risk)
Risk: Tests classified as independent may actually share state not detected by the 6 patterns.
Mitigation: The shared-state detector is conservative -- any match sends the test to the sequential bucket. Users can force --mode=sequential or set SW_TEST_OPTIMIZER=false.
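
The conservative rule described above (any match means sequential) can be sketched as a single grep per test file. The regular expression here is purely illustrative: the plan names the six pattern categories but not the actual expressions, so SHARED_STATE_PATTERN_SKETCH and partition_sketch are assumptions for the example. Only the SHARED:/INDEPENDENT: output format is taken from the interface contract.

```bash
#!/usr/bin/env bash

# Illustrative pattern only; the project's real six patterns are not shown
# in this plan. Covers a fixed /tmp path, SQLite, PID/lock files, and a
# listening netcat as rough stand-ins.
SHARED_STATE_PATTERN_SKETCH='(/tmp/[A-Za-z0-9_.-]+|sqlite3|\.pid|\.lock|nc -l)'

# Hypothetical helper: prints "SHARED:path" or "INDEPENDENT:path" per file.
partition_sketch() {
    local f
    for f in "$@"; do
        # Conservative: any match, or an unreadable file, is treated as
        # shared state and therefore routed to the sequential bucket.
        if [ ! -r "$f" ] || grep -Eq -e "$SHARED_STATE_PATTERN_SKETCH" -- "$f"; then
            echo "SHARED:$f"
        else
            echo "INDEPENDENT:$f"
        fi
    done
}
```

Treating unreadable files as shared is an extra safety choice in this sketch; false positives only cost speed, while false negatives are the race-condition risk this section describes.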

### 2. Subshell Variable Loss in Parallel Runner (Runtime Failure)
Risk: Background jobs run in subshells -- variable assignments don't propagate to the parent.
Mitigation: Already fixed. The parallel runner writes results to a temp file and uses grep -q FAIL on the results file instead of relying on the all_passed variable from subshells.
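
The fix described above can be illustrated with a small runner in which each background job appends its outcome to a results file and the parent greps that file afterwards. This is a sketch under assumptions, not the project's testopt_run_parallel(): the function name, the "PASS path" / "FAIL path" line format, and the simple batch-then-wait scheduling are all chosen for the example.

```bash
#!/usr/bin/env bash

# Hypothetical sketch (not the project's parallel runner).
# Usage: run_parallel_sketch <max_workers> test-file...
# Returns 0 if no test failed, 1 otherwise.
run_parallel_sketch() {
    local max_workers="$1"
    shift
    local results running=0 status=0 test_file
    results="$(mktemp "${TMPDIR:-/tmp}/testopt-results.XXXXXX")" || return 1
    for test_file in "$@"; do
        (
            # Runs in a subshell: variables set here never reach the parent,
            # so the outcome is recorded as a line in the results file.
            if bash "$test_file" >/dev/null 2>&1; then
                echo "PASS $test_file" >> "$results"
            else
                echo "FAIL $test_file" >> "$results"
            fi
        ) &
        running=$((running + 1))
        # Simple batching: `wait -n` is not available in Bash 3.2.
        if [ "$running" -ge "$max_workers" ]; then
            wait
            running=0
        fi
    done
    wait
    # The parent decides pass/fail from the file, not from subshell variables.
    if grep -q '^FAIL ' "$results"; then
        status=1
    fi
    rm -f "$results"
    return "$status"
}
```

Had the subshell instead set a flag such as all_passed=false, the parent would still see the original value after wait, which is the bug this section describes.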

### 3. History File Corruption Under Concurrent Writes (Concurrency Risk)
Risk: Multiple parallel pipelines writing to test-history.jsonl simultaneously could interleave lines.
Mitigation: Each write appends a single short JSONL line, and small appends like this are rarely interleaved on a local filesystem, although that is not a hard guarantee on every filesystem. The JSONL format is self-healing (corrupt lines are skipped on read).

### 4. CPU Core Over-Detection on Constrained Systems (Scale Risk)
Risk: On systems with many cores but limited memory, spawning 8 parallel bash test processes could cause OOM.
Mitigation: Hard cap at 8 workers. The 75% heuristic and [2,8] clamp provide a conservative default. Users can override with --max-workers=N.
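
The worker calculation behind this mitigation (75% of cores, clamped to [2,8]) is small enough to show in full. The sketch below is an assumption-laden illustration rather than the actual testopt_detect_cores(): the helper names are hypothetical, rounding down is assumed because the plan does not state how the 75% is rounded, and the probe order (nproc, then sysctl, then a fallback of 2) is one plausible reading of "Darwin/Linux/fallback".

```bash
#!/usr/bin/env bash

# Hypothetical helper: 75% of a given core count, clamped to [2, 8].
workers_for_cores_sketch() {
    local cores="$1" workers
    workers=$(( cores * 75 / 100 ))   # integer division rounds down (assumed)
    if [ "$workers" -lt 2 ]; then
        workers=2
    fi
    if [ "$workers" -gt 8 ]; then
        workers=8
    fi
    echo "$workers"
}

# Hypothetical helper: probe the core count, then apply the clamp.
detect_cores_sketch() {
    local cores
    # nproc (Linux coreutils), sysctl (Darwin), then a fixed fallback of 2.
    cores="$(nproc 2>/dev/null || sysctl -n hw.ncpu 2>/dev/null || echo 2)"
    workers_for_cores_sketch "$cores"
}
```

Under these assumptions a 4-core machine gets 3 workers, an 8-core machine gets 6, and anything with 11 or more cores is capped at 8, so the OOM exposure is bounded regardless of core count.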

## Baseline Metrics

| Metric | Before Optimization | Target |
|--------|-------------------|--------|
| Full test suite duration | ~1365s | <900s (34% reduction via parallelism) |
| Pipeline test stage | Sequential only | Parallel + affected-first + fast-fail |
| Test execution mode | Single bash -c test_cmd | Partitioned parallel/sequential with prioritization |

## Optimization Targets

- Reduce test stage wall-clock time by 30-50% through parallel execution
- Reduce time-to-first-failure by 70%+ through affected-first prioritization + fast-fail
- Zero regression in test correctness

## Profiling Strategy

- Event data (testopt.parallel_done, testopt.sequential_done) captures actual wall-clock per phase
- Evidence JSON records total/parallel/sequential counts, workers, and exit status
- Historical test-history.jsonl tracks per-test duration trends
- Dashboard /api/metrics/stage-performance surfaces test stage duration trends

## Benchmark Plan

- Before: time npm test on main branch (sequential, no optimization)
- After: time npm test on this branch (with optimizer enabled)
- Success criteria: wall-clock reduction visible in evidence JSON and event data
- Realistic data: full 102+ test suite, not synthetic benchmarks
