
Pipeline Design 200

Seth Ford edited this page Apr 4, 2026 · 2 revisions

ADR written to .claude/pipeline-artifacts/design.md.

Key design decisions:

  1. Facade Orchestrator — Single new testopt_execute() function as the sole integration point, called by both stage_test() and run_test_gate(). Falls back to raw bash -c when the optimizer can't help (non-shell tests, tiny suites).

  2. Shared-state partitioning — 6 grep patterns classify tests as SHARED (sequential) or INDEPENDENT (parallel). False negatives self-heal via the history system.

  3. Fork exhaustion prevention — 75% of cores, hard capped at [2, 8], using the existing jobs -r throttle.

  4. Full backwards compatibility: SW_TEST_OPTIMIZER=false bypasses everything. Non-shell test commands pass through untouched.

  5. Three new functions in test-optimizer.sh: testopt_detect_cores(), testopt_partition_shared_state(), testopt_execute(). Five files modified total, one new test file created.

Expected gains: ~2.5x speedup via parallelism, 50% reduction in failure-case duration via fast-fail, and 80-90% reduction for incremental changes via affected-test selection.

  • Fast-test/full-test alternation (FAST_TEST_INTERVAL) must be preserved

  • Parallel test execution must avoid fork-bombing (historical failure pattern)

Decision

Architecture: Facade Orchestrator Pattern

Add a single new function testopt_execute() to test-optimizer.sh that serves as the sole integration point. Both stage_test() and run_test_gate() call this one function instead of raw bash -c. The orchestrator internally sequences: init → discover → select-affected → prioritize → partition (shared-state vs independent) → execute (parallel for independent, sequential for shared-state) → record history → report.

Component Diagram

┌─────────────────────────────────────────────────────────────────┐
│                        Callers                                   │
│  ┌──────────────────────┐    ┌────────────────────────────────┐ │
│  │ stage_test()          │    │ run_test_gate()                │ │
│  │ pipeline-stages-      │    │ sw-loop.sh:961                 │ │
│  │ build.sh:546          │    │                                │ │
│  └──────────┬───────────┘    └──────────────┬─────────────────┘ │
│             │                                │                   │
│             ▼                                ▼                   │
│  ┌──────────────────────────────────────────────────────────┐   │
│  │              testopt_execute()  [NEW]                     │   │
│  │  Entry point: project_root, test_cmd, options             │   │
│  └──────────┬──────────┬──────────┬──────────┬──────────────┘   │
│             │          │          │          │                   │
│             ▼          ▼          ▼          ▼                   │
│  ┌─────────┐ ┌────────┐ ┌───────┐ ┌────────────────────────┐   │
│  │ detect  │ │ select │ │ prio- │ │ partition & execute     │   │
│  │ cores   │ │ affect-│ │ ritize│ │ ┌──────┐ ┌───────────┐ │   │
│  │ [NEW]   │ │ ed     │ │       │ │ │parall│ │sequential │ │   │
│  └─────────┘ └────────┘ └───────┘ │ │ (ind)│ │(shared st)│ │   │
│                                    │ └──────┘ └───────────┘ │   │
│                                    └────────────────────────┘   │
│                                              │                   │
│                                              ▼                   │
│                               ┌──────────────────────┐          │
│                               │ record_history +     │          │
│                               │ report + evidence    │          │
│                               └──────────────────────┘          │
└─────────────────────────────────────────────────────────────────┘

Interface Contracts

// New functions added to test-optimizer.sh (bash functions; signatures shown in TypeScript notation)

/**
 * Detect available CPU cores and print 75% of them, clamped to [2, 8]
 * @returns worker count, printed to stdout as an integer
 * Errors: prints 4 (safe default) if detection fails
 */
declare function testopt_detect_cores(): number; // stdout: integer

/**
 * Scan test files for shared-state indicators
 * @param test_files - space-separated list of test file paths
 * @returns one line per file, "SHARED:<file>" or "INDEPENDENT:<file>", on stdout
 * Errors: treats a file that cannot be scanned as INDEPENDENT (safe default)
 */
declare function testopt_partition_shared_state(test_files: string): string[]; // stdout lines

/**
 * Single entry point for optimized test execution
 * @param project_root - path to project root
 * @param test_cmd - fallback command (used if no shell test files are found)
 * @param --max-workers=N - override core detection (optional)
 * @param --fast-fail - stop on first failure (default: true)
 * @param --continue-on-fail - run all tests even after a failure
 * @param --mode=parallel|sequential|auto - execution mode (default: auto)
 * @returns 0 if all tests pass, 1 if any fail; in the fallback path, the raw command's exit code
 * Side effects: writes test evidence JSON, updates history JSONL
 * Errors: falls back to raw bash -c "$test_cmd" on any internal error
 */
declare function testopt_execute(project_root: string, test_cmd: string, ...opts: string[]): number; // exit code
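The first contract can be sketched in bash as follows. This is a minimal sketch under stated assumptions, not the shipped function:

  • uname -s chooses between sysctl (macOS) and nproc (Linux).
  • The helper testopt_clamp_workers is a hypothetical name, not part of the existing module.
  • The arithmetic is integer-only, so 75% is rounded down before clamping.

```bash
#!/usr/bin/env bash
# Hypothetical helper: clamp a worker count to the documented [2, 8] range.
testopt_clamp_workers() {
  local n=$1
  if (( n < 2 )); then n=2; fi
  if (( n > 8 )); then n=8; fi
  echo "$n"
}

# Sketch of testopt_detect_cores: print 75% of the CPU count, clamped to [2, 8].
# Assumption: uname -s decides which detection tool to call.
testopt_detect_cores() {
  local cores
  if [[ "$(uname -s)" == "Darwin" ]]; then
    cores=$(sysctl -n hw.ncpu 2>/dev/null) || cores=""
  else
    cores=$(nproc 2>/dev/null) || cores=""
  fi
  # Detection failure (missing tool, empty or non-numeric output): safe default of 4.
  if ! [[ "$cores" =~ ^[0-9]+$ ]] || (( cores == 0 )); then
    echo 4
    return 0
  fi
  # Integer arithmetic rounds 75% down before clamping.
  testopt_clamp_workers $(( cores * 3 / 4 ))
}
```

On a 4-core machine this prints 3. On a 2-core machine the 75% value of 1 is raised to the floor of 2.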

Data Flow

  1. Caller invokes testopt_execute "." "$test_cmd" --fast-fail
  2. Core detection: sysctl -n hw.ncpu (macOS) / nproc (Linux) → multiply by 0.75 → clamp to [2, 8]
  3. Init: testopt_init discovers test files, loads history, identifies changed files via git diff
  4. Decision gate: If discovered test count < 3 OR test files are not shell scripts (.sh), fall through to raw bash -c "$test_cmd" — the optimizer adds no value for small suites or non-shell runners (vitest, jest, etc.)
  5. Prioritize: score each test from its history as (fail_rate * 100) * 100 - duration, and run the highest-risk tests first
  6. Partition: Scan each test file for shared-state patterns: hardcoded /tmp/ paths without $$ or mktemp, port bind/listen, sqlite3 file locks, PID or lock files, filesystem singletons such as fixed TMPDIR assignments, and sourced shared config. Files matching any pattern go to the sequential bucket; the rest go to the parallel bucket
  7. Execute parallel bucket via existing testopt_run_parallel --max-workers=$cores
  8. Execute sequential bucket via existing testopt_run_with_fast_fail
  9. If --fast-fail is set and the parallel bucket or any sequential test fails, skip the remaining sequential tests
  10. Record results to ~/.shipwright/optimization/test-history.jsonl
  11. Write evidence to $ARTIFACTS_DIR/test-optimizer-evidence.json or $LOG_DIR/test-evidence-iter-${ITERATION}.json
  12. Return aggregate exit code
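Steps 4 and 5 can be sketched in bash as below. The helper names are hypothetical. The sketch makes two assumptions, neither stated in the design:

  • A single non-.sh file is enough to make the decision gate fall through.
  • In the score, fail_rate is a fraction in [0, 1] and duration is in seconds.

```bash
#!/usr/bin/env bash
# Step 4 (hypothetical helper): succeed only when at least 3 test files were
# discovered and every one of them is a .sh file; otherwise the caller falls
# through to raw bash -c "$test_cmd".
testopt_should_optimize() {
  local f count=0
  for f in "$@"; do
    [[ "$f" == *.sh ]] || return 1   # non-shell runner detected (assumption: one is enough)
    count=$((count + 1))
  done
  (( count >= 3 ))
}

# Step 5 (hypothetical helper): score = (fail_rate * 100) * 100 - duration,
# truncated to an integer. Assumes fail_rate in [0, 1] and duration in seconds.
testopt_priority_score() {
  awk -v r="$1" -v d="$2" 'BEGIN { printf "%d\n", (r * 100) * 100 - d }'
}

# Read "file fail_rate duration" lines on stdin; print files, highest score first.
testopt_prioritize() {
  local file rate dur
  while read -r file rate dur; do
    printf '%s %s\n' "$(testopt_priority_score "$rate" "$dur")" "$file"
  done | sort -k1,1nr | awk '{ print $2 }'
}
```

Consider a test with fail_rate 0.25 and a 5 s duration. It scores 2495, which puts it ahead of a never-failing 2 s test (-2) and a never-failing 30 s test (-30). Among tests with no recorded failures, the fastest therefore run first.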

Error Boundaries

| Component | Error handling | Propagation |
|---|---|---|
| testopt_detect_cores | Returns default 4 on any failure | Silent; caller gets safe default |
| testopt_partition_shared_state | Treats unreadable files as INDEPENDENT | Silent; worst case, a flaky test runs in parallel (the test itself should handle this) |
| testopt_execute (init failure) | Falls back to raw bash -c "$test_cmd" | Logs warning, runs unoptimized |
| testopt_execute (parallel failure) | Collects exit codes from wait | Propagates as exit code 1 |
| History write failure | … | … |
| stage_test integration | Catches testopt_execute failure, proceeds to dark factory features | Failure propagates normally after holdout/mutation/causal analysis |
| run_test_gate integration | Sets TEST_PASSED=false on failure | Same as current behavior |
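The init-failure row can be read as the wrapper sketched below. The wrapper is hypothetical: in the design, this logic lives inside testopt_execute. It takes the init step and the optimized-run step as function names, so the boundary can be exercised in isolation. The two paths return differently:

  • The optimized path is normalized to 0/1, per the contract.
  • The fallback returns the raw command's own exit code, as in error case 2 of the testing strategy.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper for the init-failure boundary. INIT_FN and OPTIMIZED_FN
# are names of shell functions standing in for testopt_init and the optimized
# run; TEST_CMD is the caller's raw command.
testopt_run_or_fallback() {
  local init_fn=$1 optimized_fn=$2 test_cmd=$3
  if ! "$init_fn"; then
    echo "WARN: test optimizer init failed; running unoptimized: $test_cmd" >&2
    bash -c "$test_cmd"   # the raw command's exit code is returned unchanged
    return
  fi
  # Optimized path: normalize any failure to 1, per the testopt_execute contract.
  if "$optimized_fn"; then
    return 0
  fi
  return 1
}
```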

Shared-State Detection Patterns (6 patterns)

# Each check exits 0 when "$file" shows the indicator (SHARED). Checks 1 and 5
# filter the matching lines themselves, so a line that uses $$ or mktemp does not count.

# 1. Hardcoded /tmp paths on lines without $$ or mktemp (collision risk)
grep -E '/tmp/[a-zA-Z]' "$file" | grep -qvE '\$\$|mktemp'

# 2. Port binding (tests fighting for same port)
grep -qE 'bind|listen|EADDRINUSE|:808[0-9]' "$file"

# 3. SQLite file locks (non-WAL concurrent access)
grep -qE 'sqlite3.*\.db|\.sqlite' "$file"

# 4. PID files or lock files
grep -qE '\.pid|\.lock|flock' "$file"

# 5. Singleton temp directories on lines not created with mktemp -d (shared across tests)
grep -E 'TMPDIR=|TMP_DIR=' "$file" | grep -qv 'mktemp -d'

# 6. Global state mutation (sourcing shared config that sets globals)
grep -qE 'source.*config|\..*config\.sh' "$file"
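Combining the six patterns into testopt_partition_shared_state could look like the sketch below. It makes some assumptions:

  • The file list arrives as separate arguments, whereas the interface contract describes a space-separated string.
  • Readability is checked first, so unreadable files fall into INDEPENDENT, as the error-boundary table specifies.
  • The helper testopt_is_shared_state is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: exit 0 if the file matches any of the six shared-state patterns.
testopt_is_shared_state() {
  local file=$1
  # 1. /tmp paths on lines that use neither $$ nor mktemp
  grep -E '/tmp/[a-zA-Z]' "$file" | grep -qvE '\$\$|mktemp' && return 0
  # 2. Port binding
  grep -qE 'bind|listen|EADDRINUSE|:808[0-9]' "$file" && return 0
  # 3. SQLite database files
  grep -qE 'sqlite3.*\.db|\.sqlite' "$file" && return 0
  # 4. PID files or lock files
  grep -qE '\.pid|\.lock|flock' "$file" && return 0
  # 5. Fixed temp-dir assignments on lines not using mktemp -d
  grep -E 'TMPDIR=|TMP_DIR=' "$file" | grep -qv 'mktemp -d' && return 0
  # 6. Sourcing shared config that sets globals
  grep -qE 'source.*config|\..*config\.sh' "$file" && return 0
  return 1
}

# Sketch: print SHARED:<file> or INDEPENDENT:<file> for each argument.
# Assumption: files are passed as separate arguments, so paths with spaces survive.
testopt_partition_shared_state() {
  local f
  for f in "$@"; do
    if [[ -r "$f" ]] && testopt_is_shared_state "$f"; then
      echo "SHARED:$f"
    else
      echo "INDEPENDENT:$f"   # includes unreadable files (safe default per the design)
    fi
  done
}
```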

Configuration

Pipeline config (templates/pipelines/*.json):

{
  "stages": [{
    "id": "test",
    "config": {
      "optimization": "auto",
      "max_workers": 0,
      "fast_fail": true
    }
  }]
}

Environment variables (for loop/ad-hoc):

  • SW_TEST_OPTIMIZER=true|false|auto — master switch (default: auto, meaning enabled when test-optimizer.sh is available)
  • SW_TEST_MAX_WORKERS=N — override core detection
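The design does not fully specify how the switch and the worker override combine. The sketch below is one reading, under these assumptions:

  • "auto" means testopt_execute is already defined in the shell, i.e. test-optimizer.sh was sourced.
  • A pipeline max_workers of 0 means "detect".
  • SW_TEST_MAX_WORKERS takes precedence over the pipeline config.

Both function names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical: decide whether the optimizer is enabled.
#   false -> disabled; true -> enabled
#   auto (default) -> enabled if testopt_execute is defined in this shell,
#   i.e. test-optimizer.sh has been sourced (assumed meaning of "available")
testopt_enabled() {
  case "${SW_TEST_OPTIMIZER:-auto}" in
    false) return 1 ;;
    true) return 0 ;;
    auto) declare -F testopt_execute >/dev/null ;;
    *) return 1 ;;   # unrecognized value: stay unoptimized (assumption)
  esac
}

# Hypothetical: pick the worker count. Assumed precedence:
# SW_TEST_MAX_WORKERS, then a non-zero pipeline max_workers, then the detected value.
# Whether overrides are also clamped to [2, 8] is not specified; they pass through here.
testopt_resolve_workers() {
  local config_max=${1:-0} detected=${2:-4}
  if [[ "${SW_TEST_MAX_WORKERS:-}" =~ ^[1-9][0-9]*$ ]]; then
    echo "$SW_TEST_MAX_WORKERS"
  elif [[ "$config_max" =~ ^[1-9][0-9]*$ ]]; then
    echo "$config_max"
  else
    echo "$detected"
  fi
}
```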

Pipeline template discoverability: The optimization field will be present in all templates' test stage config. shipwright templates list (which reads templates/pipelines/*.json) will surface this naturally.

Alternatives Considered

1. Replace test-optimizer.sh with vitest's native parallel mode

  • Pros: vitest already handles parallelism, worker isolation, and affected-test selection (--changed). Zero new code.
  • Cons: Only works for JS/TS projects using vitest. Shipwright is polyglot — it orchestrates bash test suites across any language. The 121+ bash test suites cannot use vitest parallelism. Does not solve the pipeline integration problem.

2. GNU parallel as the execution engine

  • Pros: Battle-tested parallelism, job control, retry, structured output.
  • Cons: Not installed by default on macOS. Adds an external dependency. The current jobs -r approach in testopt_run_parallel works for the scale we need (2-8 workers, dozens of test files). Over-engineering for the problem size.

3. Full rewrite of run_test_gate/stage_test to always use optimizer

  • Pros: Simpler code path — one way to run tests everywhere.
  • Cons: Breaks backwards compatibility. Many users pass custom --test-cmd strings that are not shell scripts (e.g., npm test, cargo test). The optimizer's file-level discovery doesn't apply to these. A facade that falls through to raw execution is safer.

4. Move test optimization into the pipeline composer (intelligence layer)

  • Pros: Centralized configuration, could use ML-based predictions.
  • Cons: The composer generates static pipeline JSON at spawn time. Test optimization needs runtime decisions (which files changed THIS iteration, current CPU load). Wrong abstraction layer.

Implementation Plan

Files to modify

  • scripts/lib/test-optimizer.sh — Add testopt_detect_cores(), testopt_partition_shared_state(), testopt_execute()
  • scripts/lib/pipeline-stages-build.sh — Wire stage_test() to call testopt_execute when optimization is enabled
  • scripts/sw-loop.sh — Wire run_test_gate() to call testopt_execute for shell test suites when SW_TEST_OPTIMIZER is set
  • templates/pipelines/*.json — Add optimization config field to test stage in all 8 templates

Files to create

  • scripts/sw-test-optimizer-integration-test.sh — Integration tests for the new testopt_execute orchestrator and wiring into pipeline/loop

Dependencies

  • None new. Uses only sysctl/nproc (already available on target platforms), grep, jq, bash builtins.

Risk areas

  1. Fork exhaustion (HIGH): The historical failures.json documents fork failures under parallel load. Mitigation: hard cap at 8 workers, default 75% of cores (capped at 2 minimum). The while [[ $(jobs -r | wc -l) -ge "$max_workers" ]] throttle in testopt_run_parallel already prevents unbounded forking.
  2. Shared-state false negatives (MEDIUM): The 6 grep patterns may miss custom shared-state patterns (e.g., a test that uses a named pipe or System V semaphore). Mitigation: partition defaults to INDEPENDENT, so worst case a flaky test runs in parallel and fails — the fast-fail catches it, and the next run records the failure in history, increasing its priority score for sequential execution next time. Self-healing via the history system.
  3. Test command fallthrough (LOW): Non-shell test commands (e.g., npm test) must bypass the optimizer cleanly. Mitigation: the decision gate checks if discovered tests are .sh files and if count >= 3 before engaging optimization. Otherwise falls through to raw execution.
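  4. Race condition in parallel result file (LOW): Multiple background jobs write to $tmp_results concurrently. Mitigation: the existing >> append opens the file with O_APPEND, so each write lands at the current end of file. PIPE_BUF (4096 bytes on Linux, 512 on macOS) formally bounds atomic writes only for pipes and FIFOs. However, each result line is short and well under either limit, and such appends to a local file do not interleave in practice.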
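The throttle from risk 1 and the append-only result file from risk 4 can be sketched together as a bounded parallel runner. testopt_run_bounded is a hypothetical name and does not reproduce the existing testopt_run_parallel. It works in three steps:

  1. It runs each test with bash.
  2. It records one "path exit_code" line per test.
  3. It reduces those lines to the 0/1 aggregate the contract describes.

It polls with sleep rather than wait -n, because wait -n needs bash 4.3+ and macOS ships bash 3.2 by default.

```bash
#!/usr/bin/env bash
# Hypothetical bounded parallel runner (not the existing testopt_run_parallel).
# Usage: testopt_run_bounded MAX_WORKERS test1.sh test2.sh ...
# Returns 0 if every test exits 0, otherwise 1.
testopt_run_bounded() {
  local max_workers=$1
  shift
  local results t rc=0
  results=$(mktemp) || return 1

  for t in "$@"; do
    # Throttle (risk 1): wait while max_workers background jobs are running.
    while [[ $(jobs -r | wc -l) -ge "$max_workers" ]]; do
      sleep 0.1
    done
    (
      test_rc=0
      bash "$t" >/dev/null 2>&1 || test_rc=$?
      # Risk 4: one short line per test, appended via >> (O_APPEND).
      printf '%s %d\n' "$t" "$test_rc" >> "$results"
    ) &
  done
  wait  # reap every background job before reading the results

  # Any line whose exit code is not 0 marks the run as failed.
  if grep -qv ' 0$' "$results"; then
    rc=1
  fi
  rm -f "$results"
  return "$rc"
}
```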

Validation Criteria

  • testopt_detect_cores returns integer in [2, 8] on macOS and Linux
  • testopt_partition_shared_state correctly classifies test files with hardcoded /tmp/foo as SHARED and files using mktemp as INDEPENDENT
  • testopt_execute with 10+ mock test files runs parallel bucket before sequential bucket, respects --max-workers, and returns correct aggregate exit code
  • testopt_execute falls back to raw bash -c "$test_cmd" when fewer than 3 shell test files are discovered
  • stage_test() uses optimizer when pipeline config has "optimization": "auto" and test-optimizer.sh is loaded
  • stage_test() bypasses optimizer when config says "optimization": "off"
  • run_test_gate() uses optimizer when SW_TEST_OPTIMIZER=true and test command invokes shell scripts
  • run_test_gate() preserves fast-test/full-test alternation (FAST_TEST_INTERVAL) — optimizer only applies to the selected command
  • Dark factory features (holdout, mutation, causal) still execute after optimized test run in stage_test()
  • SW_TEST_OPTIMIZER=false completely bypasses all optimization in both code paths
  • All 8 pipeline templates include "optimization": "auto" in test stage config
  • shipwright templates list surfaces the optimization field (existing behavior — templates list reads JSON)
  • History JSONL accumulates across runs and influences prioritization order in subsequent executions
  • No fork exhaustion under parallel execution with max_workers=8 on a standard CI machine

Testing Strategy

Test Pyramid Breakdown

  • Unit tests (8): Core detection, shared-state partitioning (6 patterns individually), worker clamping, history integration
  • Integration tests (4): testopt_execute end-to-end with mock test files, pipeline stage_test wiring, loop run_test_gate wiring, fallback-to-raw behavior
  • E2E tests (1): Full optimizer flow with a real git repo, changed files, affected selection, parallel execution, and history recording

Total: 13 new tests in scripts/sw-test-optimizer-integration-test.sh, complementing the existing 8 tests in scripts/sw-test-optimizer-test.sh.

Coverage Targets

  • testopt_detect_cores: 100% branch coverage (macOS path, Linux path, fallback path)
  • testopt_partition_shared_state: All 6 patterns tested individually + combined
  • testopt_execute: Happy path, fallback path, fast-fail path, continue-on-fail path
  • Pipeline/loop wiring: enabled path, disabled path, missing-module path

Critical Paths to Test

Happy path: 10 mock test files → core detection yields 4 workers → partition 7 independent + 3 shared → parallel bucket runs 4-at-a-time → sequential bucket runs with fast-fail → all pass → exit 0

Error case 1: Parallel test fails → fast-fail stops sequential bucket → exit 1 → history records failure

Error case 2: testopt_init fails (no git repo) → falls back to raw bash -c "$test_cmd" → logs warning → exit code from raw command

Edge case 1: Zero test files discovered → immediate fallback to raw command

Edge case 2: All tests classified as SHARED → no parallel execution → purely sequential with fast-fail


Performance

Baseline Metrics

  • Full suite: ~1575s (26 min) across 121+ bash test files (from metrics.json, 2026-03-09)
  • Current execution: strictly sequential, single process
  • No parallelism, no affected-test selection, no prioritization in pipeline/loop

Optimization Targets

  • p50 test stage duration: Reduce from ~1575s to ~600s (2.5x speedup via 4-worker parallelism on typical 4-core CI)
  • Fast-fail savings: When a test fails early in a 121-test suite, skip remaining tests — expected 50% reduction in failure-case duration
  • Affected-test selection: On incremental changes touching 1-3 files, run only ~10-20 affected tests instead of 121+ — expected 80-90% reduction
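Amdahl's law gives a rough check on the 2.5x target. This rests on two assumptions that do not come from the design: the parallel portion scales perfectly across N workers, and scheduling overhead is ignored. Let p be the fraction of the 1575 s baseline spent in the INDEPENDENT bucket:

```latex
S(N) = \frac{1}{(1 - p) + p/N}
\qquad
S(4) = 2.5 \;\Rightarrow\; (1 - p) + \frac{p}{4} = 0.4 \;\Rightarrow\; 1 - \frac{3p}{4} = 0.4 \;\Rightarrow\; p = 0.8
```

So about 80% of the baseline (roughly 1260 s) would need to be parallel-safe. At that rate, 1575 s / 2.5 = 630 s, close to the ~600 s target. Under the 75%-of-cores rule, however, a 4-core machine gets 3 workers. With the same p, that gives S(3) = 1/(0.2 + 0.8/3) ≈ 2.14, or about 735 s.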

Profiling Strategy

  • Instrument testopt_execute with date +%s timestamps at each phase (detect, init, partition, parallel, sequential, report)
  • Write phase timings to evidence JSON for post-mortem analysis
  • Compare test_duration_s in pipeline events before/after optimization is enabled
  • The DORA metrics system (sw-dora.sh) already tracks test_duration_s — optimization gains will be visible in dashboards
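A minimal sketch of the per-phase instrumentation follows, assuming bash 3.1+ arrays. The helper names and the phase_timings_s JSON key are hypothetical. Because date +%s has one-second resolution, short phases record 0.

```bash
#!/usr/bin/env bash
# Hypothetical phase timer: parallel arrays of phase names and elapsed seconds.
TESTOPT_PHASE_NAMES=()
TESTOPT_PHASE_SECS=()
TESTOPT_PHASE_T0=0

testopt_phase_start() { TESTOPT_PHASE_T0=$(date +%s); }

testopt_phase_end() {
  local name=$1 now
  now=$(date +%s)
  TESTOPT_PHASE_NAMES+=("$name")
  TESTOPT_PHASE_SECS+=($(( now - TESTOPT_PHASE_T0 )))
}

# Print {"phase_timings_s":{"detect":0,...}}. Phase names are plain identifiers,
# so no JSON escaping is attempted.
testopt_phase_json() {
  local i sep="" out='{"phase_timings_s":{'
  for (( i = 0; i < ${#TESTOPT_PHASE_NAMES[@]}; i++ )); do
    out+="${sep}\"${TESTOPT_PHASE_NAMES[i]}\":${TESTOPT_PHASE_SECS[i]}"
    sep=","
  done
  echo "${out}}}"
}
```

A caller would wrap each phase (testopt_phase_start, the phase's work, then testopt_phase_end detect) and merge the printed object into the evidence JSON, for instance with jq, which the design already lists as a dependency.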

Benchmark Plan

  • Before: Run sw-test-optimizer-test.sh + sw-pipeline-test.sh with SW_TEST_OPTIMIZER=false — record total duration
  • After: Same suites with SW_TEST_OPTIMIZER=true — record total duration
  • Success criteria: Parallel execution reduces wall-clock time by at least 1.5x on a 4+ core machine; fast-fail reduces failure-case time by at least 30%
  • Data volume: Test with realistic suite (121+ files), not just unit test fixtures
