Skip to content

Pipeline Design 232

Seth Ford edited this page Mar 9, 2026 · 2 revisions

Design: Test Infrastructure Pre-Flight Validation Gate

Context

Shipwright pipelines currently detect a test command during intake (detect_test_cmd() in scripts/lib/pipeline-detection.sh) but perform zero validation of the underlying test infrastructure. When test files are missing, non-executable, lack required harness boilerplate (PASS=0, FAIL=0, set -euo pipefail), or have syntax errors, the pipeline proceeds through plan and design stages (which each invoke Claude API calls costing time and tokens) before the build loop eventually fails. Historical failure data from failures.json documents 6 recurring patterns — mock artifact generation failures, JSON validation issues, signal handling problems, and resource exhaustion — all of which could be caught earlier.

Constraints:

  • All code must be Bash 3.2 compatible (no declare -A, readarray, ${var,,})
  • Must follow the existing scripts/lib/pipeline-*.sh module pattern (module guard, sourced from sw-pipeline.sh)
  • Shellcheck is not installed in all environments
  • Non-bash projects (TypeScript, Go, Python) should not be penalized by bash-specific harness checks
  • The check must be fast — it runs at intake, before any Claude API calls

Decision

Add a new library module scripts/lib/pipeline-test-validation.sh exposing a single public function validate_test_infrastructure(). This function runs 6 layered checks ordered cheapest-first, writes a structured JSON report to .claude/pipeline-artifacts/test-validation.json, and returns non-zero on critical failure.

Data Flow

stage_intake()
  └─ detect_test_cmd()          → sets $TEST_CMD
  └─ validate_test_infrastructure()
       ├─ _tv_check_test_cmd()           → verify $TEST_CMD is non-empty
       ├─ _tv_find_test_files()          → glob for *-test.sh, *.test.*, *_test.*, test_*.*
       ├─ _tv_check_executable()         → verify .sh files have +x (bash only)
       ├─ _tv_check_harness_patterns()   → grep PASS=0, FAIL=0, set -euo pipefail (bash only)
       ├─ _tv_check_shellcheck()         → run shellcheck if available, warn if not
       └─ _tv_check_mock_patterns()      → verify mktemp -d usage in tests creating mocks
       └─ _tv_write_report()             → JSON report via jq

Error Handling

  • Critical failures (checks 1-4): cause validate_test_infrastructure() to return 1. Intake catches this, logs an actionable error with fix instructions, and suggests --skip-test-validation as escape hatch.
  • Warnings (checks 5-6): logged but do not fail validation. Shellcheck absence is a warning. Mock pattern absence is a warning for files that create temp directories.
  • The report always gets written (even on failure) so downstream tooling can parse it.

Skip Mechanism

--skip-test-validation sets SKIP_TEST_VALIDATION=true. When set, validate_test_infrastructure() writes a report with "skipped": true and returns 0 immediately.

Integration Pattern

The call in pipeline-stages-intake.sh uses type guard (consistent with skill_analyze_issue on line 101):

if type validate_test_infrastructure >/dev/null 2>&1; then
    if ! validate_test_infrastructure; then
        error "Test infrastructure validation failed (use --skip-test-validation to bypass)"
        return 1
    fi
fi

This ensures the pipeline degrades gracefully if the module isn't present (e.g., older Shipwright installations mid-upgrade).

Module Structure

Internal functions use _tv_ prefix (test validation) to avoid namespace collisions — consistent with _tv_ being unused in the codebase. Module guard follows _PIPELINE_TEST_VALIDATION_LOADED pattern identical to _PIPELINE_STAGES_INTAKE_LOADED, _PIPELINE_DETECTION_LOADED, etc.

Alternatives Considered

  1. Inline in stage_intake() — Pros: No new files, simple. / Cons: Bloats the 119-line intake function with ~150 lines of validation logic, makes unit testing impossible without running the full intake flow, violates the module separation pattern every other pipeline lib follows.

  2. Standalone CLI command (sw-test-validation.sh) — Pros: Runnable independently via shipwright test-validation, useful outside pipeline context. / Cons: Requires CLI routing in scripts/sw (620 lines already), over-engineers what is fundamentally an internal pre-flight check. The validation depends on pipeline globals ($PROJECT_ROOT, $TEST_CMD, $ARTIFACTS_DIR) which would need parameter passing or env setup.

  3. Run a fast test execution instead of static checks — Pros: Catches runtime issues static analysis can't. / Cons: Slow (test suites take 1066s per metrics.json), defeats the purpose of a fast pre-flight gate, may have side effects (file creation, network calls).

Implementation Plan

  • Files to create:

    • scripts/lib/pipeline-test-validation.sh — ~120 lines, 7 functions (1 public + 6 internal checks + 1 report writer)
    • scripts/sw-lib-pipeline-test-validation-test.sh — ~250 lines, 18 tests
  • Files to modify:

    • scripts/sw-pipeline.sh — Add SKIP_TEST_VALIDATION=false default (after line 178, near other boolean defaults), add source statement (after line 58, with other pipeline lib sources)
    • scripts/lib/pipeline-cli.sh — Add --skip-test-validation) SKIP_TEST_VALIDATION=true; shift ;; in parse_args() (after line 159), add help text in show_help() (after line 61)
    • scripts/lib/pipeline-stages-intake.sh — Add validation call after test detection block (after line 65, between steps 3 and 4)
    • package.json — Register sw-lib-pipeline-test-validation-test.sh in test scripts
  • Dependencies: None new. Uses jq (already required), shellcheck (optional), standard find/grep.

  • Risk areas:

    • False positives on non-Shipwright projects: Harness pattern checks (PASS=0, FAIL=0) are Shipwright-specific. Must gate these behind .sh file type detection — non-bash test files skip harness checks entirely.
    • Performance on large repos: find for test files could be slow in monorepos. Mitigate by limiting search depth (-maxdepth 3) and using fast glob patterns.
    • Shellcheck exit codes: Shellcheck returns different exit codes for different severity levels. Must map correctly (exit 1 = errors vs exit 0 = clean vs not-found).
    • jq availability: Report writing depends on jq. If jq is missing, the check should still work but skip report generation (log a warning). However, jq is a hard prerequisite for Shipwright already (sw-doctor.sh verifies this).

Validation Criteria

  • validate_test_infrastructure() returns 1 when $TEST_CMD is empty
  • validate_test_infrastructure() returns 1 when no test files match expected patterns
  • validate_test_infrastructure() returns 1 when .sh test files lack +x permission
  • validate_test_infrastructure() returns 1 when .sh test files lack PASS=0/FAIL=0 counters
  • Shellcheck integration runs when binary is available, skips with warning when absent
  • Mock pattern check warns (but does not fail) when mktemp -d is absent
  • --skip-test-validation produces {"skipped": true} report and returns 0
  • Validation report is valid JSON at .claude/pipeline-artifacts/test-validation.json
  • Non-bash test files (.test.ts, .test.py) bypass harness pattern checks
  • Intake stage fails with actionable error message referencing --skip-test-validation
  • Test suite has 18+ tests: 5 positive, 7 negative, 4 edge cases, 2 integration
  • All new code passes shellcheck --shell=bash with no errors
  • All new code is Bash 3.2 compatible (no associative arrays, readarray, ${var,,})

Test Pyramid Breakdown

  • Unit tests (12-13): Each of the 6 _tv_* check functions tested with both pass and fail inputs. Test _tv_write_report() output format. Test skip mechanism. ~70% of test count.
  • Integration tests (3-4): Full validate_test_infrastructure() flow with valid project, invalid project, mixed valid/invalid files, non-bash project. ~20% of test count.
  • E2E test (1-2): Verify stage_intake() actually calls validation and fails appropriately. Verify --skip-test-validation flag reaches the function. ~10% of test count.

Total: ~18 tests

Coverage Targets

  • 100% branch coverage on validate_test_infrastructure() main flow (skip path, success path, failure path)
  • All 6 check functions: at least 1 pass and 1 fail scenario each
  • Report JSON schema validated in at least 2 tests (success report, failure report)
  • CLI flag propagation tested end-to-end

Critical Paths to Test

Happy path: Mock project directory with valid .sh test files containing all harness patterns, executable permissions → all 6 checks pass → report shows "valid": true → function returns 0.

Error cases:

  1. $TEST_CMD is empty after detect_test_cmd()test_cmd_detected check fails → returns 1 with "no test command detected"
  2. Test file exists but lacks +x permission → executable_permissions check fails → returns 1 with "chmod +x" suggestion
  3. Test file missing PASS=0 counter → harness_patterns check fails → returns 1 with pattern example

Edge cases:

  1. shellcheck binary not in $PATH → check skipped with warning, overall validation still passes if other checks pass
  2. Project has only .test.ts files, no .sh tests → harness/executable/mock checks skipped entirely, only test_cmd and test_files_exist run
  3. Empty project directory with no test files at all → test_files_exist fails → returns 1
  4. --skip-test-validation flag → writes {"skipped": true} report, returns 0 immediately without running any checks

Clone this wiki locally