Pipeline Design 200

ADR written to .claude/pipeline-artifacts/design.md (291 lines). Key architectural decisions documented:

Static grep-based shared-state detection (6 patterns) over runtime isolation or manual annotation: conservative, portable, zero dependencies
Temp file + grep for parallel result propagation: solves the bash subshell variable-loss problem without IPC
CPU-aware worker cap [2, 8]: scales to the machine, doesn't overwhelm constrained systems
Facade with raw bash -c fallback: optimizer is purely additive, never worse than baseline

Four alternatives rejected with rationale: Vitest native (wrong layer), GNU parallel (no intelligence), Makefile deps (maintenance cost), Docker isolation (overhead exceeds gains).

Kill switch: disable via SW_TEST_OPTIMIZER=false or optimization: off in pipeline config

Must not break existing test correctness: tests that share state must be detected and run sequentially, never in parallel
History/prioritization must be append-only and self-healing (corrupt JSONL lines skipped on read)

Decision

Introduce a test execution optimizer (scripts/lib/test-optimizer.sh) as a library sourced by both pipeline stages and the loop harness. The optimizer implements a four-phase execution model:

Component Diagram

                    ┌───────────────────────────┐
                    │   Entry Points            │
                    │  stage_test()             │
                    │  run_test_gate()          │
                    └──────────┬────────────────┘
                               │
                    ┌──────────▼────────────────┐
                    │  testopt_execute()        │
                    │  Facade orchestrator      │
                    │  - parses options         │
                    │  - gates on <3 tests      │
                    │  - falls back on init err │
                    └──────────┬────────────────┘
                               │
              ┌────────────────┼────────────────┐
              │                │                │
   ┌──────────▼──────┐ ┌───────▼─────┐ ┌────────▼───────┐
   │ Phase 1: Init   │ │ Phase 2:    │ │ Phase 3:       │
   │ testopt_init()  │ │ Prioritize  │ │ Partition      │
   │ - discover      │ │ testopt_    │ │ testopt_       │
   │   *-test.sh     │ │ prioritize()│ │ partition_     │
   │ - load history  │ │ - fail_rate │ │ shared_state() │
   │ - git diff      │ │   DESC      │ │ - 6 patterns   │
   │ - select        │ │ - duration  │ │ → parallel[]   │
   │   affected      │ │   ASC       │ │ → sequential[] │
   └──────────┬──────┘ └───────┬─────┘ └────────┬───────┘
              └────────────────┼────────────────┘
                               │
              ┌────────────────┴────────────────┐
              │                                 │
   ┌──────────▼──────────┐      ┌───────────────▼─────────┐
   │ Phase 4a: Parallel  │      │ Phase 4b: Sequential    │
   │ testopt_run_parallel│      │ testopt_run_with_       │
   │ - N workers (2-8)   │      │ fast_fail               │
   │ - dir-grouped       │      │ - stop on first failure │
   │ - temp-file results │      │ - skip if 4a failed +   │
   └──────────┬──────────┘      │   fast_fail enabled     │
              │                 └───────────────┬─────────┘
              └────────────────┬────────────────┘
                               │
                    ┌──────────▼────────────────┐
                    │  Output                   │
                    │  - history → JSONL append │
                    │  - evidence → JSON file   │
                    │  - events → emit_event()  │
                    │  - report → stdout        │
                    └───────────────────────────┘

Key Design Decisions

1. Shared-state detection via static grep analysis (not runtime isolation)

Context: Need to determine which tests can run in parallel without race conditions.
Decision: Grep each test file for 6 patterns indicating shared state: hardcoded /tmp paths, port binding, SQLite files, PID/lock files, singleton TMPDIR assignments, and global config sourcing.
Alternatives rejected: (a) Runtime sandboxing with namespaces/cgroups: too heavy, not portable to macOS. (b) Manual annotation: requires maintainer discipline and goes stale.
Consequences: False positives send safe tests to the sequential bucket (slower but correct). False negatives would cause flaky parallel failures. The patterns are intentionally conservative: Pattern 6 (config sourcing) matches any test that sources a config file, which over-classifies but prevents subtle global-state mutation bugs.
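A minimal bash sketch of the classification step follows. The function name `sketch_partition_shared_state` and all six regexes are assumptions chosen to illustrate the six categories listed above; the real patterns in scripts/lib/test-optimizer.sh may differ. It follows the documented output format (`SHARED:<path>` / `INDEPENDENT:<path>`) and treats missing files as INDEPENDENT, as the interface contract states.

```bash
# Hypothetical sketch of static shared-state classification. The six regexes
# are illustrative stand-ins for the ADR's categories, not the library's.
SKETCH_SHARED_STATE_PATTERNS=(
  '/tmp/'                                        # 1. hardcoded /tmp paths
  '(--port[= ]|PORT=|localhost:[0-9]+)'          # 2. port binding
  '(sqlite|\.db([^[:alnum:]_]|$))'               # 3. SQLite files
  '(\.pid|\.lock|flock )'                        # 4. PID/lock files
  "TMPDIR=[\"']?/"                               # 5. TMPDIR pinned to a fixed path
  '^[[:space:]]*(source|\.)[[:space:]].*config'  # 6. global config sourcing
)

# Prints "SHARED:<path>" or "INDEPENDENT:<path>" for each argument.
sketch_partition_shared_state() {
  local file pat shared
  for file in "$@"; do
    shared=false
    # Missing files fall through as INDEPENDENT, matching the contract.
    if [[ -f "$file" ]]; then
      for pat in "${SKETCH_SHARED_STATE_PATTERNS[@]}"; do
        if grep -Eq -- "$pat" "$file"; then
          shared=true   # one match is enough to force sequential execution
          break
        fi
      done
    fi
    if [[ "$shared" == true ]]; then
      echo "SHARED:$file"
    else
      echo "INDEPENDENT:$file"
    fi
  done
}
```

Because a single match is enough to push a file into the sequential bucket, broadening any regex can only trade speed for safety, which is the conservative bias described above.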

2. Subshell variable propagation via temp file + grep (not direct variable)

Context: Background jobs in testopt_run_parallel() run in subshells. Variable assignments (all_passed=false) don't propagate back to the parent.
Decision: Each background job writes "<file> PASS|FAIL <duration>" lines to a shared temp file. Parent checks grep -q ' FAIL ' on the results file.
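Consequences: Simple, correct, no IPC mechanisms. Lines from concurrent jobs may arrive in any order, but each line is produced by a single echo appended with >>, and small appends like these are not split in practice on Linux or macOS local filesystems (a practical property rather than a documented guarantee).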
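The propagation scheme can be sketched in a few lines of bash. This is an illustration under assumptions, not the library's testopt_run_parallel(): the name `sketch_run_parallel` is hypothetical, directory grouping is omitted, and throttling simply waits on the oldest job, which also works on the bash 3.2 that ships with macOS (where `wait -n` is unavailable).

```bash
# Hypothetical sketch: run test files in background subshells and recover
# their outcomes through a shared temp file (no IPC, no parent variables).
# Usage: sketch_run_parallel MAX_WORKERS TEST_FILE...
sketch_run_parallel() {
  local max_workers="$1"; shift
  local results test rc=0
  local -a pids=()
  results="$(mktemp)" || return 1

  for test in "$@"; do
    (
      # Assignments here die with the subshell, so the outcome is
      # reported as one appended line: "<file> PASS|FAIL <duration>".
      start=$SECONDS
      status=PASS
      bash "$test" >/dev/null 2>&1 || status=FAIL
      echo "$test $status $(( SECONDS - start ))" >> "$results"
    ) &
    pids+=("$!")
    # Throttle: with max_workers jobs in flight, wait for the oldest one.
    if (( ${#pids[@]} >= max_workers )); then
      wait "${pids[0]}"
      pids=("${pids[@]:1}")
    fi
  done
  wait   # collect every remaining background job

  # The parent sees results only through the file.
  grep -q ' FAIL ' "$results" && rc=1
  rm -f "$results"
  return "$rc"
}
```

The parent never reads a variable set by a job; the only channel is the results file, so the overall outcome is decided by a single `grep -q ' FAIL '`.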

3. CPU-aware worker detection with hard cap at 8

Context: Need to scale parallelism to the machine without overwhelming constrained systems.
Decision: Detect cores via sysctl -n hw.ncpu (Darwin) / /proc/cpuinfo (Linux) / nproc, use 75%, clamp to [2, 8].
Consequences: 8-core machine gets 6 workers. 2-core CI runner gets 2 workers. Memory-constrained systems with many cores still get capped at 8.
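A sketch of the sizing rule, assuming integer arithmetic (75% computed as `cores * 3 / 4`) and using the safe default of 4 from the interface contract when detection fails. The function name and its optional core-count argument, added only so the arithmetic can be tested, are hypothetical.

```bash
# Hypothetical sketch of CPU-aware worker sizing: 75% of cores, clamped to
# [2, 8], with 4 as the fallback when no core count can be detected.
# Usage: sketch_worker_count [CORES]   (CORES overrides detection, for tests)
sketch_worker_count() {
  local cores="${1:-}"
  if [[ -z "$cores" ]]; then
    if [[ "$(uname -s)" == "Darwin" ]]; then
      cores="$(sysctl -n hw.ncpu 2>/dev/null)"
    elif [[ -r /proc/cpuinfo ]]; then
      cores="$(grep -c '^processor' /proc/cpuinfo 2>/dev/null)"
    fi
    # nproc (GNU coreutils) is the last resort.
    if [[ -z "$cores" || "$cores" == "0" ]] && command -v nproc >/dev/null 2>&1; then
      cores="$(nproc)"
    fi
  fi
  # Anything non-numeric or zero means detection failed.
  if ! [[ "$cores" =~ ^[0-9]+$ ]] || (( cores == 0 )); then
    echo 4
    return 0
  fi
  local workers=$(( cores * 3 / 4 ))   # integer 75%
  (( workers < 2 )) && workers=2
  (( workers > 8 )) && workers=8
  echo "$workers"
}
```

`sketch_worker_count 8` prints 6 and `sketch_worker_count 2` prints 2, matching the examples above; 12 cores would give 9 before the cap reduces it to 8.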

4. Facade pattern with fallback to raw bash -c

Context: The optimizer must never be worse than doing nothing.
Decision: testopt_execute() falls back to bash -c "$test_cmd" when: (a) init fails, (b) fewer than 3 test files discovered, or (c) the optimizer is disabled via config.
Consequences: Zero risk of regression for edge cases. The optimizer is purely additive.
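The three fallback conditions can be sketched as a guard chain in bash. Everything here is hypothetical: `sketch_execute` stands in for testopt_execute(), `sketch_optimized_run` is a placeholder for the prioritize/partition/run phases, and "init fails" is modelled as a missing project root. Each guard hands control to bash -c "$test_cmd" and returns its exit code unchanged.

```bash
# Hypothetical stand-in for the optimized path (prioritize, partition, run).
sketch_optimized_run() {
  local t rc=0
  for t in "$@"; do bash "$t" >/dev/null 2>&1 || rc=1; done
  echo "optimized:$#"
  return "$rc"
}

# Hypothetical facade sketch: any reason not to optimize falls through to
# the original command, so the result is never worse than the baseline.
# Usage: sketch_execute PROJECT_ROOT TEST_CMD
sketch_execute() {
  local root="$1" test_cmd="$2"
  # (c) optimizer disabled by the kill switch
  if [[ "${SW_TEST_OPTIMIZER:-true}" == "false" ]]; then
    bash -c "$test_cmd"; return
  fi
  # (a) init failure, modelled here as a missing project root
  if [[ ! -d "$root" ]]; then
    bash -c "$test_cmd"; return
  fi
  local -a tests=()
  local f
  while IFS= read -r f; do
    tests+=("$f")
  done < <(find "$root" -type f \( -name '*-test.sh' -o -name '*_test.sh' \
             -o -name 'test_*.sh' \) 2>/dev/null | sort)
  # (b) fewer than 3 test files: not worth the overhead
  if (( ${#tests[@]} < 3 )); then
    bash -c "$test_cmd"; return
  fi
  sketch_optimized_run "${tests[@]}"
}
```

Because every early exit runs the caller's original command, a misconfiguration degrades to today's behavior rather than to a failure.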

Data Flow

1. stage_test() reads pipeline config: optimization != "off" → calls testopt_execute()
   run_test_gate() reads SW_TEST_OPTIMIZER != "false" → calls testopt_execute()

2. testopt_execute(".", "npm test", "--fast-fail")
   └── testopt_init(".")
       ├── find *-test.sh *_test.sh test_*.sh → DISCOVERED_TESTS[]
       ├── read ~/.shipwright/optimization/test-history.jsonl → TEST_HISTORY[]
       ├── git diff HEAD~1..HEAD → CHANGED_FILES[]
       └── testopt_select_affected() → AFFECTED_TESTS[] (directory + source matching)

3. testopt_prioritize(AFFECTED_TESTS)
   └── for each test: score = (fail_rate * 10000) - duration_s
       └── sort -rn → highest-fail-rate, fastest-duration first

4. testopt_partition_shared_state(prioritized_tests)
   └── grep each file for 6 patterns → "SHARED:<path>" or "INDEPENDENT:<path>"
       ├── INDEPENDENT → parallel_tests[]
       └── SHARED → sequential_tests[]

5. Phase 4a: testopt_run_parallel(parallel_tests, workers = 75% of detected cores, clamped to [2, 8])
   └── group tests by directory → background subshells → wait → grep results file

6. Phase 4b: testopt_run_with_fast_fail(sequential_tests)
   ├── skipped entirely if Phase 4a failed AND fast_fail=true
   └── otherwise runs one-by-one, breaks on first failure

7. Record results → test-history.jsonl (JSONL append)
   Write evidence → $ARTIFACTS_DIR/test-optimizer-evidence.json
   Emit events → testopt.parallel_done, testopt.sequential_done, testopt.fail_fast
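Step 3's ordering rule is compact enough to sketch directly. The helper below is hypothetical and assumes the per-test fail rate and duration have already been extracted from test-history.jsonl into one `<path> <fail_rate> <duration_s>` line per test (paths without spaces); it applies score = fail_rate * 10000 - duration_s and sorts descending.

```bash
# Hypothetical sketch of the step-3 ordering rule.
# Input on stdin: one "<path> <fail_rate> <duration_s>" line per test.
# Output: paths ordered by score = fail_rate * 10000 - duration_s, highest first.
sketch_prioritize() {
  # Force the C locale so "." is the decimal separator for awk and sort.
  LC_ALL=C awk '{ printf "%.3f %s\n", ($2 * 10000) - $3, $1 }' |
    LC_ALL=C sort -k1,1 -rn |
    awk '{ print $2 }'
}
```

Fed `a.sh 0.5 10`, `b.sh 0 5`, `c.sh 0 2` and `d.sh 0.5 3`, it prints d.sh, a.sh, c.sh, b.sh. With a weight of 10000, a 0.01 difference in fail rate outweighs 100 s of duration, so fail rate decides almost every comparison and duration mainly orders tests with equal rates.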

Interface Contracts

# Main entry point: called by pipeline and loop
testopt_execute <project_root> <test_cmd> [options...]
  Options:
    --max-workers=N         # int, override CPU-detected worker count (2-8)
    --fast-fail             # bool, stop on first failure (default)
    --continue-on-fail      # bool, run all tests despite failures
    --mode=auto|parallel|sequential  # execution mode (default: auto)
  Returns: exit 0 (all pass) | exit 1 (any failure)
  Errors: falls back to raw bash -c on init failure

# CPU detection: platform-aware core counting
testopt_detect_cores() -> stdout: integer (2-8)
  Errors: returns 4 (safe default) on detection failure

# Shared-state classification: static analysis of test files
testopt_partition_shared_state(file...) -> stdout: "SHARED:<path>" | "INDEPENDENT:<path>"
  Errors: non-existent files classified as INDEPENDENT

# Affected test selection: git-diff-driven test filtering
testopt_select_affected(changed_files...) -> sets AFFECTED_TESTS global array
  Errors: empty changed_files → returns all discovered tests

# Priority ordering: fail-rate weighted sort
testopt_prioritize(tests...) -> stdout: sorted test file paths (one per line)
  Errors: missing history → all tests score equally

# Parallel runner: background subshell execution
testopt_run_parallel(--max-workers=N, tests...) -> exit 0|1
  Errors: writes FAIL lines to temp file, checked via grep

# Sequential runner: fast-fail execution
testopt_run_with_fast_fail([--continue-on-fail], tests...) -> exit 0|1
  Errors: non-existent test files silently skipped
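The testopt_select_affected() contract names "directory + source matching" without spelling out the rules, so the sketch below assumes two: a test is affected if it sits in the same directory as a changed file, or if its text mentions the changed file's basename (for example, because it sources it). The function name is hypothetical; it reuses the DISCOVERED_TESTS and AFFECTED_TESTS array names from the data flow and honors the documented fallback of returning every test when no changed files are given.

```bash
# Hypothetical sketch of directory + source matching (rules are assumptions).
# Reads the DISCOVERED_TESTS array and fills AFFECTED_TESTS.
# Usage: sketch_select_affected [CHANGED_FILE...]
sketch_select_affected() {
  AFFECTED_TESTS=()
  # No changed files: fall back to every discovered test.
  if (( $# == 0 )); then
    AFFECTED_TESTS=("${DISCOVERED_TESTS[@]}")
    return 0
  fi
  local test changed tdir cdir
  for test in "${DISCOVERED_TESTS[@]}"; do
    tdir="$(dirname "${test#./}")"       # strip "./" so find and git paths compare
    for changed in "$@"; do
      cdir="$(dirname "${changed#./}")"
      # (a) same directory as a changed file, or
      # (b) the test text mentions the changed file's basename.
      if [[ "$tdir" == "$cdir" ]] ||
         grep -qF -- "$(basename "$changed")" "$test" 2>/dev/null; then
        AFFECTED_TESTS+=("$test")
        break
      fi
    done
  done
}
```

Stripping a leading ./ lets paths from find (./tests/x-test.sh) compare with paths from git diff (scripts/lib/x.sh). The basename match is deliberately loose: a false match only adds a test to the run.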

Error Boundaries

Component	Handles	Propagation	Fallback
`testopt_execute()`	Init failures, <3 tests	Returns raw `bash -c` exit code	Transparent fallthrough to original behavior
`testopt_init()`	Missing project root, no git, no history	Logs warning, continues with empty arrays	All tests treated as affected, no prioritization
`testopt_run_parallel()`	Subshell crashes, missing test files	Writes FAIL to results temp file	`grep -q ' FAIL '` on results file catches all
`testopt_run_with_fast_fail()`	Test exit code != 0	Breaks loop (fast-fail) or continues (flag)	Returns 1 with failed test name on stdout
`testopt_record_history()`	Write failures, missing directory	Suppressed via `2>/dev/null || true`	Missing history = no prioritization (graceful)
`testopt_partition_shared_state()`	Non-existent files	Classified as INDEPENDENT	Conservative: false positives go sequential
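The testopt_run_with_fast_fail() row and contract describe a loop that stops at the first failure unless --continue-on-fail is given, skips missing files silently, and names the failed test on stdout. A minimal bash sketch of that loop, under the hypothetical name `sketch_run_sequential`:

```bash
# Hypothetical sketch of a fast-fail sequential runner.
# Usage: sketch_run_sequential [--continue-on-fail] TEST_FILE...
# Prints each failing test on stdout; returns 1 if any test failed.
sketch_run_sequential() {
  local continue_on_fail=false
  if [[ "${1:-}" == "--continue-on-fail" ]]; then
    continue_on_fail=true
    shift
  fi
  local t rc=0
  for t in "$@"; do
    [[ -f "$t" ]] || continue              # missing files: skip silently
    if ! bash "$t" >/dev/null 2>&1; then
      echo "$t"                            # name the failing test
      rc=1
      # Fast-fail (the default) stops here; the flag keeps going.
      [[ "$continue_on_fail" == true ]] || break
    fi
  done
  return "$rc"
}
```

The Phase 4b skip rule sits outside this loop: a caller would not invoke it at all when Phase 4a failed and fast-fail is enabled.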

Alternatives Considered

1. Vitest Native Parallelism (Node-Level)

Pros: Vitest already supports --pool threads/forks, parallel file execution, and --reporter for structured output. Would work natively with npm test.
Cons: This project's test suite is 102+ bash test scripts, not Vitest test files. The npm test command dispatches to these bash scripts. Vitest parallelism would only help if tests were .ts/.js files. Doesn't address affected-first or shared-state detection for bash scripts.
Why rejected: Wrong layer. The optimization target is bash-level test file execution, not Node-level test runner internals.

2. GNU Parallel / xargs -P

Pros: Battle-tested parallel execution. Simple: find . -name '*-test.sh' | parallel -j"$(nproc)" bash {}.
Cons: No shared-state detection: would run all tests in parallel, including those that fight over ports/files. No affected-first prioritization. No fast-fail cascade between parallel and sequential phases. Adds a dependency (GNU parallel is not installed by default on macOS).
Why rejected: Lacks the intelligence layer (prioritization, partitioning, history). Would need the same detection code wrapped around it anyway.

3. Makefile-Based Dependency Graph

Pros: Explicit dependency declaration between test files. make -j handles parallelism natively.
Cons: Requires maintaining a Makefile with test dependencies â€” high maintenance burden. Every new test file needs a rule. No automatic shared-state detection. Foreign to the existing bash-centric architecture.
Why rejected: Maintenance cost too high for 102+ test files. Static declaration gets stale.

4. Container-Based Isolation (Docker per test)

Pros: Perfect isolation â€” no shared-state concerns. Every test gets a clean filesystem.
Cons: Container startup overhead (~2-5s per test) would negate parallelism gains. 102 containers would require significant memory. Not available in all CI environments. Massive complexity increase.
Why rejected: Overhead exceeds the time savings from parallelism.

Implementation Plan

Files Created

File	Purpose
`scripts/lib/test-optimizer.sh` (741 lines)	Core library: detection, partitioning, prioritization, execution, history, reporting
`scripts/sw-test-optimizer-integration-test.sh` (438 lines)	25 integration tests covering all new functions

Files Modified

File	Change
`scripts/lib/pipeline-stages-build.sh:569-584`	`stage_test()` reads `optimization` from pipeline config, calls `testopt_execute()` when not `off`
`scripts/sw-loop.sh:997-1000`	`run_test_gate()` checks `SW_TEST_OPTIMIZER`, calls `testopt_execute()` when not `false`
`templates/pipelines/*.json` (9 files)	Added `"optimization": "auto", "fast_fail": true` to test stage config

Dependencies

None new. Uses only: bash, jq, grep, find, sort, awk, mktemp, sysctl/nproc, git

Risk Areas

Shared-state false negatives: Pattern 6 (config sourcing) is broad but not exhaustive. A test could share state via an unconventional mechanism (e.g., writing to a well-known path without matching any of the 6 patterns). Mitigation: --mode=sequential override and SW_TEST_OPTIMIZER=false kill switch.
Concurrent JSONL writes: multiple parallel pipelines (worktrees) may append to ~/.shipwright/optimization/test-history.jsonl simultaneously. Small single-line appends are atomic in practice on Linux/macOS, but this is not guaranteed. Mitigation: the JSONL format is self-healing; corrupt lines are skipped on read.
History file unbounded growth: test-history.jsonl grows indefinitely. For 102 tests × 10 runs/day × 365 days ≈ 372K lines per year. Mitigation: not yet implemented. Future work: add rotation or tail-N windowing.
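Both history mitigations can be sketched briefly, assuming (hypothetically) that each JSONL line carries at least `test` and `status` fields. `sketch_fail_rate` shows the self-healing read: jq's `fromjson?` drops any line that does not parse, so a torn append costs one record rather than the whole file. `sketch_trim_history` shows one possible tail-N window for the not-yet-implemented growth cap; the 50000-line default is an arbitrary placeholder.

```bash
# Hypothetical sketch: self-healing history read plus tail-N trimming.
# Assumes each JSONL line looks like {"test": "...", "status": "PASS"|"FAIL", ...}.

# Prints the fail rate (0 to 1) for one test; unparseable lines are skipped.
sketch_fail_rate() {
  local history_file="$1" test="$2"
  [[ -f "$history_file" ]] || { echo 0; return 0; }
  jq -R -s --arg test "$test" '
    [split("\n")[] | fromjson? | objects | select(.test == $test)]
    | if length == 0 then 0
      else (map(select(.status == "FAIL")) | length) / length
      end' "$history_file"
}

# Keeps only the newest MAX_LINES entries (default 50000, an arbitrary choice).
sketch_trim_history() {
  local history_file="$1" max_lines="${2:-50000}" lines tmp
  [[ -f "$history_file" ]] || return 0
  lines="$(wc -l < "$history_file" | tr -d ' ')"
  (( lines > max_lines )) || return 0
  # Temp file in the same directory so the final mv is a single rename.
  tmp="$(mktemp "${history_file}.XXXXXX")" || return 0
  if tail -n "$max_lines" "$history_file" > "$tmp"; then
    mv -f "$tmp" "$history_file"
  else
    rm -f "$tmp"
  fi
}
```

The trim is not safe against concurrent appenders: a line written between `tail` and `mv` is lost. For a best-effort prioritization signal that may be acceptable; a lock would close the gap if the history ever needed to be authoritative.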

Validation Criteria

Test Pyramid Breakdown

Layer	Count	Coverage Target	What's Tested
Unit	20 (`sw-test-optimizer-test.sh`)	Core library functions	Discovery, history load/query, affected selection, prioritization sort, fast-fail, parallel execution
Integration	25 (`sw-test-optimizer-integration-test.sh`)	New functions + wiring	`testopt_detect_cores`, `testopt_partition_shared_state`, `testopt_execute` orchestrator, `stage_test()` wiring, `run_test_gate()` wiring
E2E	0 (covered by existing `npm test`)	Full pipeline flow	Optimizer activates during normal `npm test` runs; 102+ suites exercise the real path

Coverage targets: 100% of public functions have direct tests. Error paths (missing root, <3 tests, empty history) explicitly tested. Edge cases (all-shared, all-independent, single file) covered.

Critical Paths Tested

Happy path: 5 independent + 2 shared-state tests → parallel phase runs first with N workers → sequential phase runs second → all pass → exit 0, evidence JSON written.

Error cases:

Failing test in parallel bucket + fast-fail → sequential bucket skipped entirely → exit 1 with testopt.fail_fast event emitted
Failing test with --continue-on-fail → all tests run regardless → exit 1 with full results

Edge cases:

Zero test files discovered → falls through to bash -c "$test_cmd" (no optimizer overhead)
All tests classified as SHARED → parallel bucket empty, sequential bucket gets everything → behaves like original sequential execution

Baseline Metrics

Metric	Current Value	Source
Full test suite wall-clock	~1365s	`memory/metrics.json` (2026-04-04 baseline)
Execution mode	Sequential only	Single `bash -c "$test_cmd"`
Time to first failure	Up to ~1365s (worst case)	No early termination

Optimization Targets

Metric	Target	Rationale
Full suite wall-clock	<900s (34% reduction)	Parallelism across ~75% of tests classified as independent
Time to first failure	<200s (70%+ reduction)	Affected-first prioritization + fast-fail stops early
Test correctness	Zero regressions	Shared-state partitioning prevents parallel flakes

Profiling Strategy

Wall-clock per phase: testopt.parallel_done and testopt.sequential_done events capture duration per phase
Evidence JSON: test-optimizer-evidence.json records total/parallel/sequential counts, workers, mode, exit code per run
Historical trending: test-history.jsonl accumulates per-test duration and pass/fail data across runs
Dashboard integration: /api/metrics/stage-performance surfaces test stage duration trends from event data

Benchmark Plan

Step	Method	Success Criteria
Before	`time npm test` on main branch	Record wall-clock baseline (~1365s)
After	`time npm test` on feature branch	Wall-clock < 900s
Verify	Read `test-optimizer-evidence.json`	`parallel_tests > 0`, `workers > 1`, `exit_code: 0`
Regression check	Compare test pass counts	Same number of PASS/FAIL as main branch
