Pipeline Design 200
ADR written to .claude/pipeline-artifacts/design.md (291 lines). Key architectural decisions documented:
- Static grep-based shared-state detection (6 patterns) over runtime isolation or manual annotation — conservative, portable, zero dependencies
- Temp file + grep for parallel result propagation — solves the bash subshell variable loss problem without IPC
- CPU-aware worker cap [2, 8] — scales to the machine, doesn't overwhelm constrained systems
- Facade with raw bash -c fallback: optimizer is purely additive, never worse than baseline
Four alternatives rejected with rationale: Vitest native (wrong layer), GNU parallel (no intelligence), Makefile deps (maintenance cost), Docker isolation (overhead exceeds gains).
- Disable via SW_TEST_OPTIMIZER=false or optimization: off in pipeline config
- Must not break existing test correctness — parallel tests that share state must be detected and run sequentially
- History/prioritization must be append-only and self-healing (corrupt JSONL lines skipped on read)
Introduce a test execution optimizer (scripts/lib/test-optimizer.sh) as a library sourced by both pipeline stages and the loop harness. The optimizer implements a four-phase execution model:
                ┌──────────────────────────┐
                │       Entry Points       │
                │  stage_test()            │
                │  run_test_gate()         │
                └──────────┬───────────────┘
                           │
                ┌──────────▼───────────────┐
                │ testopt_execute()        │
                │ Facade orchestrator      │
                │ - parses options         │
                │ - gates on <3 tests      │
                │ - falls back on init err │
                └──────────┬───────────────┘
                           │
           ┌───────────────┼────────────────┐
           │               │                │
┌──────────▼──────┐ ┌──────▼──────┐ ┌───────▼────────┐
│ Phase 1: Init   │ │ Phase 2:    │ │ Phase 3:       │
│ testopt_init()  │ │ Prioritize  │ │ Partition      │
│ - discover      │ │ testopt_    │ │ testopt_       │
│   *-test.sh     │ │ prioritize()│ │ partition_     │
│ - load history  │ │ - fail_rate │ │ shared_state() │
│ - git diff      │ │   DESC      │ │ - 6 patterns   │
│ - select        │ │ - duration  │ │ → parallel[]   │
│   affected      │ │   ASC       │ │ → sequential[] │
└──────────┬──────┘ └──────┬──────┘ └───────┬────────┘
           └───────────────┼────────────────┘
                           │
           ┌───────────────┴───────────┐
           │                           │
┌──────────▼──────────┐ ┌──────────────▼──────────┐
│ Phase 4a: Parallel  │ │ Phase 4b: Sequential    │
│ testopt_run_parallel│ │ testopt_run_with_       │
│ - N workers (2-8)   │ │ fast_fail               │
│ - dir-grouped       │ │ - stop on first failure │
│ - temp-file results │ │ - skip if 4a failed +   │
└──────────┬──────────┘ │   fast_fail enabled     │
           │            └──────────────┬──────────┘
           └───────────────┬───────────┘
                           │
                ┌──────────▼───────────────┐
                │          Output          │
                │ - history → JSONL append │
                │ - evidence → JSON file   │
                │ - events → emit_event()  │
                │ - report → stdout        │
                └──────────────────────────┘
1. Shared-state detection via static grep analysis (not runtime isolation)
- Context: Need to determine which tests can run in parallel without race conditions.
- Decision: Grep each test file for 6 patterns indicating shared state: hardcoded /tmp paths, port binding, SQLite files, PID/lock files, singleton TMPDIR assignments, and global config sourcing.
- Alternatives rejected: (a) Runtime sandboxing with namespaces/cgroups: too heavy, not portable to macOS. (b) Manual annotation: requires maintainer discipline, gets stale.
- Consequences: False positives send safe tests to the sequential bucket (slower but correct). False negatives would cause flaky parallel failures. The patterns are intentionally conservative: Pattern 6 (config sourcing) captures anything that sources a config file, which over-classifies but prevents subtle global-state mutation bugs.
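A minimal sketch of the classification step described above. The six regexes are illustrative assumptions (the ADR names the six categories but not the exact expressions), and classify_test_file is a hypothetical name, not the library's testopt_partition_shared_state().

```bash
# Classify one test file as SHARED or INDEPENDENT by static grep.
# The patterns below are assumed stand-ins for the six categories named above.
classify_test_file() {
  file=$1
  # Non-existent files are reported as INDEPENDENT, matching the documented contract.
  if [ ! -f "$file" ]; then
    printf 'INDEPENDENT:%s\n' "$file"
    return 0
  fi
  for pattern in \
    '/tmp/[A-Za-z0-9_]' \
    '(nc -l|--port[= ]|listen\()' \
    '\.(sqlite|db)' \
    '\.(pid|lock)' \
    '^[[:space:]]*(export[[:space:]]+)?TMPDIR=' \
    '^[[:space:]]*(source|\.)[[:space:]]+.*(config|\.env)'
  do
    # First matching pattern wins; the file goes to the sequential bucket.
    if grep -Eq -- "$pattern" "$file"; then
      printf 'SHARED:%s\n' "$file"
      return 0
    fi
  done
  printf 'INDEPENDENT:%s\n' "$file"
}
```

Because any single match is enough to mark a file SHARED, loosening or tightening one regex only moves tests between buckets; it never changes which tests run.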
2. Subshell variable propagation via temp file + grep (not direct variable)
- Context: Background jobs in testopt_run_parallel() run in subshells. Variable assignments (all_passed=false) don't propagate back to the parent.
- Decision: Each background job writes "<file> PASS|FAIL <duration>" lines to a shared temp file. Parent checks grep -q ' FAIL ' on the results file.
- Consequences: Simple, correct, no IPC mechanisms. The temp file may see interleaved writes from concurrent jobs, but each line is a single echo which is atomic for small writes on both Linux and macOS.
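The propagation technique can be sketched as follows. This is an assumption-laden illustration: run_parallel_sketch is a hypothetical name, it starts one background job per test with no worker cap or directory grouping, and it discards test output.

```bash
# Run each test file in a background subshell and collect results via a temp file.
run_parallel_sketch() {
  results=$(mktemp) || return 1
  for test_file in "$@"; do
    (
      start=$(date +%s)
      if bash "$test_file" >/dev/null 2>&1; then status=PASS; else status=FAIL; fi
      end=$(date +%s)
      # One short line per test, appended with a single echo.
      echo "$test_file $status $((end - start))" >> "$results"
    ) &
  done
  wait  # block until every background job has written its line
  # Variables set inside the subshells are lost, so the parent reads the file instead.
  if grep -q ' FAIL ' "$results"; then rc=1; else rc=0; fi
  rm -f "$results"
  return "$rc"
}
```

The parent never inspects subshell variables; the results file is the only channel, which is why a status set inside a background job cannot be missed.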
3. CPU-aware worker detection with hard cap at 8
- Context: Need to scale parallelism to the machine without overwhelming constrained systems.
- Decision: Detect cores via sysctl -n hw.ncpu (Darwin) / /proc/cpuinfo (Linux) / nproc, use 75%, clamp to [2, 8].
- Consequences: 8-core machine gets 6 workers. 2-core CI runner gets 2 workers. Memory-constrained systems with many cores still get capped at 8.
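A sketch of the detection and clamping rule, under stated assumptions: the function names are hypothetical, the probing order is a guess, and 75% is computed with integer floor arithmetic (the ADR does not say how it rounds). The default of 4 on detection failure follows the interface contract for testopt_detect_cores().

```bash
# Clamp 75% of the core count into [2, 8] using integer (floor) arithmetic.
clamp_workers() {
  cores=$1
  workers=$((cores * 3 / 4))
  if [ "$workers" -lt 2 ]; then workers=2; fi
  if [ "$workers" -gt 8 ]; then workers=8; fi
  echo "$workers"
}

# Probe the platform for a core count, then clamp it.
detect_workers() {
  cores=""
  if [ "$(uname -s)" = "Darwin" ]; then
    cores=$(sysctl -n hw.ncpu 2>/dev/null) || cores=""
  elif [ -r /proc/cpuinfo ]; then
    cores=$(grep -c '^processor' /proc/cpuinfo 2>/dev/null) || cores=""
  elif command -v nproc >/dev/null 2>&1; then
    cores=$(nproc 2>/dev/null) || cores=""
  fi
  case "$cores" in
    ''|*[!0-9]*) echo 4; return 0 ;;  # detection failed: documented safe default
  esac
  clamp_workers "$cores"
}
```

With these rules, clamp_workers 8 prints 6 and clamp_workers 2 prints 2, matching the consequences listed above.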
4. Facade pattern with fallback to raw bash -c
- Context: The optimizer must never be worse than doing nothing.
- Decision: testopt_execute() falls back to bash -c "$test_cmd" when: (a) init fails, (b) fewer than 3 test files discovered, or (c) the optimizer is disabled via config.
- Consequences: Zero risk of regression for edge cases. The optimizer is purely additive.
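The three fallback conditions can be illustrated with a small facade. Everything here is a hypothetical stand-in: sketch_init and sketch_run_optimized replace the real init and execution phases, PROJECT_ROOT is an assumed variable, and the test files are passed as arguments rather than discovered.

```bash
# Hypothetical stand-ins so the sketch runs on its own.
sketch_init() { [ -d "${PROJECT_ROOT:-.}" ]; }
sketch_run_optimized() { echo "optimized: $# tests"; }

# Usage: execute_with_fallback <test_cmd> [test_file...]
execute_with_fallback() {
  test_cmd=$1
  shift
  # (c) optimizer disabled via the kill switch
  if [ "${SW_TEST_OPTIMIZER:-true}" = "false" ]; then
    bash -c "$test_cmd"
    return $?
  fi
  # (a) init failure, or (b) fewer than 3 test files
  if ! sketch_init || [ "$#" -lt 3 ]; then
    bash -c "$test_cmd"
    return $?
  fi
  sketch_run_optimized "$@"
}
```

Every fallback branch returns the exit code of the raw command unchanged, which is what keeps the facade from ever being worse than the baseline.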
1. stage_test() reads pipeline config: optimization != "off" → calls testopt_execute()
   run_test_gate() reads SW_TEST_OPTIMIZER != "false" → calls testopt_execute()
2. testopt_execute(".", "npm test", "--fast-fail")
   └── testopt_init(".")
       ├── find *-test.sh *_test.sh test_*.sh → DISCOVERED_TESTS[]
       ├── read ~/.shipwright/optimization/test-history.jsonl → TEST_HISTORY[]
       ├── git diff HEAD~1..HEAD → CHANGED_FILES[]
       └── testopt_select_affected() → AFFECTED_TESTS[] (directory + source matching)
3. testopt_prioritize(AFFECTED_TESTS)
   ├── for each test: score = (fail_rate * 10000) - duration_s
   └── sort -rn → highest-fail-rate, fastest-duration first
4. testopt_partition_shared_state(prioritized_tests)
   └── grep each file for 6 patterns → "SHARED:<path>" or "INDEPENDENT:<path>"
       ├── INDEPENDENT → parallel_tests[]
       └── SHARED → sequential_tests[]
5. Phase 4a: testopt_run_parallel(parallel_tests, workers=detect_cores*0.75)
   └── group tests by directory → background subshells → wait → grep results file
6. Phase 4b: testopt_run_with_fast_fail(sequential_tests)
   ├── skipped entirely if Phase 4a failed AND fast_fail=true
   └── otherwise runs one-by-one, breaks on first failure
7. Record results → test-history.jsonl (JSONL append)
   Write evidence → $ARTIFACTS_DIR/test-optimizer-evidence.json
   Emit events → testopt.parallel_done, testopt.sequential_done, testopt.fail_fast
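The scoring rule in step 3 can be sketched with awk and sort. The input format (one "path fail_rate duration_s" line per test on stdin) and the name prioritize_sketch are assumptions for illustration; the real function reads its data from the history file, and this sketch does not handle paths containing whitespace.

```bash
# stdin:  "<path> <fail_rate between 0 and 1> <duration_s>" per line (assumed format)
# stdout: paths ordered by score = (fail_rate * 10000) - duration_s, highest first
prioritize_sketch() {
  awk '{ printf "%d\t%s\n", ($2 * 10000) - $3, $1 }' |
    sort -rn |
    cut -f2
}
```

The factor of 10000 makes fail rate dominate: duration only breaks ties between tests with similar fail rates, so a flaky slow test still runs before a stable fast one.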
# Main entry point — called by pipeline and loop
testopt_execute <project_root> <test_cmd> [options...]
Options:
--max-workers=N # int, override CPU-detected worker count (2-8)
--fast-fail # bool, stop on first failure (default)
--continue-on-fail # bool, run all tests despite failures
--mode=auto|parallel|sequential # execution mode (default: auto)
Returns: exit 0 (all pass) | exit 1 (any failure)
Errors: falls back to raw bash -c on init failure
# CPU detection — platform-aware core counting
testopt_detect_cores() -> stdout: integer (2-8)
Errors: returns 4 (safe default) on detection failure
# Shared-state classification — static analysis of test files
testopt_partition_shared_state(file...) -> stdout: "SHARED:<path>" | "INDEPENDENT:<path>"
Errors: non-existent files classified as INDEPENDENT
# Affected test selection — git-diff-driven test filtering
testopt_select_affected(changed_files...) -> sets AFFECTED_TESTS global array
Errors: empty changed_files → returns all discovered tests
# Priority ordering — fail-rate weighted sort
testopt_prioritize(tests...) -> stdout: sorted test file paths (one per line)
Errors: missing history → all tests score equally
# Parallel runner — background subshell execution
testopt_run_parallel(--max-workers=N, tests...) -> exit 0|1
Errors: writes FAIL lines to temp file, checked via grep
# Sequential runner — fast-fail execution
testopt_run_with_fast_fail([--continue-on-fail], tests...) -> exit 0|1
Errors: non-existent test files silently skipped

| Component | Handles | Propagation | Fallback |
|---|---|---|---|
| testopt_execute() | Init failures, <3 tests | Returns raw bash -c exit code | Transparent fallthrough to original behavior |
| testopt_init() | Missing project root, no git, no history | Logs warning, continues with empty arrays | All tests treated as affected, no prioritization |
| testopt_run_parallel() | Subshell crashes, missing test files | Writes FAIL to results temp file | grep -q ' FAIL ' on results file catches all |
| testopt_run_with_fast_fail() | Test exit code != 0 | Breaks loop (fast-fail) or continues (flag) | Returns 1 with failed test name on stdout |
| testopt_record_history() | Write failures, missing directory | Suppressed via 2>/dev/null \|\| true | Missing history = no prioritization (graceful) |
| testopt_partition_shared_state() | Non-existent files | Classified as INDEPENDENT | Conservative: false positives go sequential |
- Pros: Vitest already supports --pool threads/forks, parallel file execution, and --reporter for structured output. Would work natively with npm test.
- Cons: This project's test suite is 102+ bash test scripts, not Vitest test files. The npm test command dispatches to these bash scripts. Vitest parallelism would only help if tests were .ts/.js files. Doesn't address affected-first or shared-state detection for bash scripts.
- Why rejected: Wrong layer. The optimization target is bash-level test file execution, not Node-level test runner internals.
- Pros: Battle-tested parallel execution. Simple: find *-test.sh | parallel -j$(nproc) bash {}.
- Cons: No shared-state detection, so it would run all tests in parallel, including those that fight over ports/files. No affected-first prioritization. No fast-fail cascade between parallel and sequential phases. Adds a dependency (parallel is not installed by default on macOS).
- Why rejected: Lacks the intelligence layer (prioritization, partitioning, history). Would need the same detection code wrapped around it anyway.
- Pros: Explicit dependency declaration between test files. make -j handles parallelism natively.
- Cons: Requires maintaining a Makefile with test dependencies, which is a high maintenance burden. Every new test file needs a rule. No automatic shared-state detection. Foreign to the existing bash-centric architecture.
- Why rejected: Maintenance cost too high for 102+ test files. Static declaration gets stale.
- Pros: Perfect isolation — no shared-state concerns. Every test gets a clean filesystem.
- Cons: Container startup overhead (~2-5s per test) would negate parallelism gains. 102 containers would require significant memory. Not available in all CI environments. Massive complexity increase.
- Why rejected: Overhead exceeds the time savings from parallelism.

| File | Purpose |
|---|---|
| scripts/lib/test-optimizer.sh (741 lines) | Core library: detection, partitioning, prioritization, execution, history, reporting |
| scripts/sw-test-optimizer-integration-test.sh (438 lines) | 25 integration tests covering all new functions |

| File | Change |
|---|---|
| scripts/lib/pipeline-stages-build.sh:569-584 | stage_test() reads optimization from pipeline config, calls testopt_execute() when not off |
| scripts/sw-loop.sh:997-1000 | run_test_gate() checks SW_TEST_OPTIMIZER, calls testopt_execute() when not false |
| templates/pipelines/*.json (9 files) | Added "optimization": "auto", "fast_fail": true to test stage config |

- None new. Uses only: bash, jq, grep, find, sort, awk, mktemp, sysctl/nproc, git
- Shared-state false negatives: Pattern 6 (config sourcing) is broad but not exhaustive. A test could share state via an unconventional mechanism (e.g., writing to a well-known path without matching any of the 6 patterns). Mitigation: --mode=sequential override and SW_TEST_OPTIMIZER=false kill switch.
- Concurrent JSONL writes: Multiple parallel pipelines (worktrees) may append to ~/.shipwright/optimization/test-history.jsonl simultaneously. Small single-line appends are atomic on Linux/macOS for typical filesystem block sizes, but this is not guaranteed. Mitigation: JSONL format is self-healing; corrupt lines are skipped on read.
- History file unbounded growth: test-history.jsonl grows indefinitely. At 102 tests * 10 runs/day * 365 days, that is roughly 372K lines per year. Mitigation: not yet implemented. Future work: add rotation or tail-N windowing.
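Since rotation is listed as future work, the following is only one possible shape for tail-N windowing, not an implemented feature. The function name trim_history and the default window of 10000 lines are assumptions. Lines appended by another pipeline between the tail and the mv would be lost, so a real implementation would need to decide whether that is acceptable.

```bash
# Keep only the newest <max_lines> lines of a JSONL history file.
# Usage: trim_history <history_file> [max_lines]
trim_history() {
  history_file=$1
  max_lines=${2:-10000}   # assumed default window size
  [ -f "$history_file" ] || return 0
  line_count=$(wc -l < "$history_file" | tr -d ' ')
  if [ "$line_count" -le "$max_lines" ]; then
    return 0              # already within the window
  fi
  # Write the tail to a sibling temp file, then rename it over the original.
  tmp=$(mktemp "${history_file}.XXXXXX") || return 1
  if tail -n "$max_lines" "$history_file" > "$tmp"; then
    mv "$tmp" "$history_file"
  else
    rm -f "$tmp"
    return 1
  fi
}
```

Creating the temp file next to the history file keeps the final mv on the same filesystem, where a rename replaces the file in one step.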
- All 9 pipeline templates contain "optimization": "auto" and "fast_fail": true in test stage config
- stage_test() calls testopt_execute() when optimization != "off" (at pipeline-stages-build.sh:576-581)
- run_test_gate() calls testopt_execute() when SW_TEST_OPTIMIZER != "false" (at sw-loop.sh:997-1000)
- Backwards compatible: SW_TEST_OPTIMIZER=false bypasses optimizer in loop
- Backwards compatible: optimization: "off" bypasses optimizer in pipeline
- Fallback on <3 test files (at test-optimizer.sh:621)
- Fallback on init failure (at test-optimizer.sh:614-618)
- CPU detection clamped to [2, 8] (at test-optimizer.sh:517-518)
- 6 shared-state patterns implemented (at test-optimizer.sh:537-567)
- Parallel runner uses temp-file + grep for result propagation (at test-optimizer.sh:382-413)
- Evidence JSON written to $ARTIFACTS_DIR/test-optimizer-evidence.json
- Events emitted: testopt.parallel_done, testopt.sequential_done, testopt.fail_fast, testopt.recorded
- All 25 integration tests pass
- All 20 existing unit tests pass
- Pipeline template config discoverable via shipwright templates list

| Layer | Count | Coverage Target | What's Tested |
|---|---|---|---|
| Unit | 20 (sw-test-optimizer-test.sh) | Core library functions | Discovery, history load/query, affected selection, prioritization sort, fast-fail, parallel execution |
| Integration | 25 (sw-test-optimizer-integration-test.sh) | New functions + wiring | testopt_detect_cores, testopt_partition_shared_state, testopt_execute orchestrator, stage_test() wiring, run_test_gate() wiring |
| E2E | 0 (covered by existing npm test) | Full pipeline flow | Optimizer activates during normal npm test runs; 102+ suites exercise the real path |

Coverage targets: 100% of public functions have direct tests. Error paths (missing root, <3 tests, empty history) explicitly tested. Edge cases (all-shared, all-independent, single file) covered.
Happy path: 5 independent + 2 shared-state tests → parallel phase runs first with N workers → sequential phase runs second → all pass → exit 0, evidence JSON written.
Error cases:
- Failing test in parallel bucket + fast-fail → sequential bucket skipped entirely → exit 1 with testopt.fail_fast event emitted
- Failing test with --continue-on-fail → all tests run regardless → exit 1 with full results
Edge cases:
- Zero test files discovered → falls through to bash -c "$test_cmd" (no optimizer overhead)
- All tests classified as SHARED → parallel bucket empty, sequential bucket gets everything → behaves like original sequential execution
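The fast-fail and continue-on-fail behaviours in the error cases can be sketched for the sequential phase alone. The name run_sequential_sketch, the boolean first argument, and the PASS/FAIL output lines are assumptions for illustration; the real testopt_run_with_fast_fail() takes a --continue-on-fail flag instead.

```bash
# Usage: run_sequential_sketch <continue_on_fail: true|false> <test_file...>
run_sequential_sketch() {
  continue_on_fail=$1
  shift
  rc=0
  for t in "$@"; do
    [ -f "$t" ] || continue          # non-existent test files are skipped silently
    if bash "$t" >/dev/null 2>&1; then
      echo "PASS $t"
    else
      echo "FAIL $t"
      rc=1
      # Fast-fail: stop at the first failure unless told to continue.
      if [ "$continue_on_fail" != "true" ]; then
        break
      fi
    fi
  done
  return "$rc"
}
```

Both modes return 1 when any test fails; they differ only in how many tests run after the first failure.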

| Metric | Current Value | Source |
|---|---|---|
| Full test suite wall-clock | ~1365s | memory/metrics.json (2026-04-04 baseline) |
| Execution mode | Sequential only | Single bash -c "$test_cmd" |
| Time to first failure | Up to ~1365s (worst case) | No early termination |

| Metric | Target | Rationale |
|---|---|---|
| Full suite wall-clock | <900s (34% reduction) | Parallelism across ~75% of tests classified as independent |
| Time to first failure | <200s (70%+ reduction) | Affected-first prioritization + fast-fail stops early |
| Test correctness | Zero regressions | Shared-state partitioning prevents parallel flakes |

- Wall-clock per phase: testopt.parallel_done and testopt.sequential_done events capture duration per phase
- Evidence JSON: test-optimizer-evidence.json records total/parallel/sequential counts, workers, mode, exit code per run
- Historical trending: test-history.jsonl accumulates per-test duration and pass/fail data across runs
- Dashboard integration: /api/metrics/stage-performance surfaces test stage duration trends from event data

| Step | Method | Success Criteria |
|---|---|---|
| Before | time npm test on main branch | Record wall-clock baseline (~1365s) |
| After | time npm test on feature branch | Wall-clock < 900s |
| Verify | Read test-optimizer-evidence.json | parallel_tests > 0, workers > 1, exit_code: 0 |
| Regression check | Compare test pass counts | Same number of PASS/FAIL as main branch |