Pipeline Design 232
Shipwright pipelines currently detect a test command during intake (detect_test_cmd() in scripts/lib/pipeline-detection.sh) but do not validate the underlying test infrastructure. When test files are missing, non-executable, lack required harness boilerplate (PASS=0, FAIL=0, set -euo pipefail), or have syntax errors, the pipeline still proceeds through the plan and design stages, each of which invokes Claude API calls costing time and tokens, before the build loop eventually fails. Historical failure data in failures.json documents 6 recurring patterns, including mock artifact generation failures, JSON validation issues, signal handling problems, and resource exhaustion, all of which could be caught earlier.
Constraints:
- All code must be Bash 3.2 compatible (no declare -A, readarray, ${var,,})
- Must follow the existing scripts/lib/pipeline-*.sh module pattern (module guard, sourced from sw-pipeline.sh)
- Shellcheck is not installed in all environments
- Non-bash projects (TypeScript, Go, Python) should not be penalized by bash-specific harness checks
- The check must be fast — it runs at intake, before any Claude API calls
Add a new library module scripts/lib/pipeline-test-validation.sh exposing a single public function validate_test_infrastructure(). This function runs 6 layered checks ordered cheapest-first, writes a structured JSON report to .claude/pipeline-artifacts/test-validation.json, and returns non-zero on critical failure.
stage_intake()
  └─ detect_test_cmd() → sets $TEST_CMD
  └─ validate_test_infrastructure()
       ├─ _tv_check_test_cmd() → verify $TEST_CMD is non-empty
       ├─ _tv_find_test_files() → glob for *-test.sh, *.test.*, *_test.*, test_*.*
       ├─ _tv_check_executable() → verify .sh files have +x (bash only)
       ├─ _tv_check_harness_patterns() → grep PASS=0, FAIL=0, set -euo pipefail (bash only)
       ├─ _tv_check_shellcheck() → run shellcheck if available, warn if not
       └─ _tv_check_mock_patterns() → verify mktemp -d usage in tests creating mocks
       └─ _tv_write_report() → JSON report via jq
- Critical failures (checks 1-4): cause validate_test_infrastructure() to return 1. Intake catches this, logs an actionable error with fix instructions, and suggests --skip-test-validation as an escape hatch.
- Warnings (checks 5-6): logged but do not fail validation. Shellcheck absence is a warning. Mock pattern absence is a warning for files that create temp directories.
- The report always gets written (even on failure) so downstream tooling can parse it.
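The split between critical checks and warnings can be sketched roughly as follows. This is a minimal illustration, not the module itself: the check bodies are placeholders, `_tv_write_report` is stubbed to a single `echo`, and the choice to keep running after a critical failure (so the report lists every problem) is an assumption; stopping at the first failure would be an equally plausible reading of the design.

```shell
# Placeholder checks so the sketch runs on its own; real checks replace these.
# Each returns 0 on pass and non-zero on failure.
_tv_check_test_cmd()         { [[ -n "${TEST_CMD:-}" ]]; }
_tv_find_test_files()        { return 0; }
_tv_check_executable()       { return 0; }
_tv_check_harness_patterns() { return 0; }
_tv_check_shellcheck()       { return 0; }
_tv_check_mock_patterns()    { return 0; }
_tv_write_report()           { echo "report written (failed=$1)"; }

validate_test_infrastructure() {
    # Plain strings instead of associative arrays keep this Bash 3.2 compatible.
    local critical_checks="_tv_check_test_cmd _tv_find_test_files _tv_check_executable _tv_check_harness_patterns"
    local warning_checks="_tv_check_shellcheck _tv_check_mock_patterns"
    local failed=0 check

    # Checks 1-4: any failure makes the whole validation fail.
    for check in $critical_checks; do
        if ! "$check"; then
            echo "CRITICAL: $check failed" >&2
            failed=1
        fi
    done

    # Checks 5-6: problems are logged but never change the return code.
    for check in $warning_checks; do
        if ! "$check"; then
            echo "WARNING: $check reported a problem" >&2
        fi
    done

    # The report is written on success and on failure alike.
    _tv_write_report "$failed"
    return "$failed"
}
```

With `TEST_CMD` empty the function returns 1 and still reaches the report step; a failing warning check alone leaves the return code at 0.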
--skip-test-validation sets SKIP_TEST_VALIDATION=true. When set, validate_test_infrastructure() writes a report with "skipped": true and returns 0 immediately.
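As a rough sketch of how the flag could travel from the command line to the skip path: `parse_args()` below is a reduced stand-in that ignores every other option, the `ARTIFACTS_DIR` default is taken from the report path named above, and the `jq` fallback is an assumption for environments without it.

```shell
SKIP_TEST_VALIDATION=false   # default, set near the other boolean flags

# Reduced stand-in for parse_args(); all other options are ignored here.
parse_args() {
    while [[ $# -gt 0 ]]; do
        case "$1" in
            --skip-test-validation) SKIP_TEST_VALIDATION=true; shift ;;
            *) shift ;;
        esac
    done
}

validate_test_infrastructure() {
    local dir="${ARTIFACTS_DIR:-.claude/pipeline-artifacts}"

    if [[ "$SKIP_TEST_VALIDATION" == "true" ]]; then
        mkdir -p "$dir"
        if command -v jq >/dev/null 2>&1; then
            # Write the minimal report and stop before any check runs.
            jq -n '{skipped: true}' > "$dir/test-validation.json"
        else
            echo "WARNING: jq not found, report not written" >&2
        fi
        return 0
    fi

    # The six checks would run here; this sketch only covers the skip path.
    echo "running checks"
}
```

Calling `parse_args --skip-test-validation` followed by `validate_test_infrastructure` returns 0 without printing `running checks`.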
The call in pipeline-stages-intake.sh uses a type guard (consistent with skill_analyze_issue on line 101):
    if type validate_test_infrastructure >/dev/null 2>&1; then
        if ! validate_test_infrastructure; then
            error "Test infrastructure validation failed (use --skip-test-validation to bypass)"
            return 1
        fi
    fi

This ensures the pipeline degrades gracefully if the module isn't present (e.g., older Shipwright installations mid-upgrade).
Internal functions use the _tv_ prefix (test validation) to avoid namespace collisions; the prefix is currently unused elsewhere in the codebase. The module guard follows the _PIPELINE_TEST_VALIDATION_LOADED pattern, identical to _PIPELINE_STAGES_INTAKE_LOADED, _PIPELINE_DETECTION_LOADED, etc.
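A module guard of this kind usually looks something like the sketch below. The exact form used by the existing pipeline libs is not shown in this document, so the `if`/`return` shape is an assumption; only the variable name and the `_tv_` prefix come from the design.

```shell
# Module guard: do nothing if this file has already been sourced.
# (Assumed shape; the existing pipeline-*.sh modules may phrase it differently.)
if [[ -n "${_PIPELINE_TEST_VALIDATION_LOADED:-}" ]]; then
    return 0
fi
_PIPELINE_TEST_VALIDATION_LOADED=1

# Internal helpers carry the _tv_ prefix so they cannot clash with
# functions defined by other sourced modules.
_tv_check_test_cmd() {
    # Check 1: a test command must have been detected during intake.
    [[ -n "${TEST_CMD:-}" ]]
}
```

Sourcing the file a second time hits the `return 0` and leaves the already-defined functions untouched.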
- Inline in stage_intake(). Pros: No new files, simple. / Cons: Bloats the 119-line intake function with ~150 lines of validation logic, makes unit testing impossible without running the full intake flow, violates the module separation pattern every other pipeline lib follows.
- Standalone CLI command (sw-test-validation.sh). Pros: Runnable independently via shipwright test-validation, useful outside pipeline context. / Cons: Requires CLI routing in scripts/sw (620 lines already), over-engineers what is fundamentally an internal pre-flight check. The validation depends on pipeline globals ($PROJECT_ROOT, $TEST_CMD, $ARTIFACTS_DIR) which would need parameter passing or env setup.
- Run a fast test execution instead of static checks. Pros: Catches runtime issues static analysis can't. / Cons: Slow (test suites take 1066s per metrics.json), defeats the purpose of a fast pre-flight gate, may have side effects (file creation, network calls).
- Files to create:
  - scripts/lib/pipeline-test-validation.sh: ~120 lines, 8 functions (1 public + 6 internal checks + 1 report writer)
  - scripts/sw-lib-pipeline-test-validation-test.sh: ~250 lines, 18 tests
- Files to modify:
  - scripts/sw-pipeline.sh: Add SKIP_TEST_VALIDATION=false default (after line 178, near other boolean defaults), add source statement (after line 58, with other pipeline lib sources)
  - scripts/lib/pipeline-cli.sh: Add --skip-test-validation) SKIP_TEST_VALIDATION=true; shift ;; in parse_args() (after line 159), add help text in show_help() (after line 61)
  - scripts/lib/pipeline-stages-intake.sh: Add validation call after test detection block (after line 65, between steps 3 and 4)
  - package.json: Register sw-lib-pipeline-test-validation-test.sh in test scripts
- Dependencies: None new. Uses jq (already required), shellcheck (optional), standard find/grep.
- Risk areas:
  - False positives on non-Shipwright projects: Harness pattern checks (PASS=0, FAIL=0) are Shipwright-specific. Must gate these behind .sh file type detection; non-bash test files skip harness checks entirely.
  - Performance on large repos: find for test files could be slow in monorepos. Mitigate by limiting search depth (-maxdepth 3) and using fast glob patterns.
  - Shellcheck exit codes: Shellcheck returns different exit codes for different outcomes. Must map correctly (exit 1 = errors vs exit 0 = clean vs not-found).
  - jq availability: Report writing depends on jq. If jq is missing, the check should still work but skip report generation (log a warning). However, jq is a hard prerequisite for Shipwright already (sw-doctor.sh verifies this).
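The first two mitigations could look roughly like this. It is a sketch under stated assumptions: the optional root argument and the helper name `_tv_is_bash_test` are hypothetical, while the glob patterns and `-maxdepth 3` come from the design.

```shell
# Check 2: list candidate test files, limited to three directory levels
# so the search stays fast in large repositories.
_tv_find_test_files() {
    local root="${1:-.}"
    find "$root" -maxdepth 3 -type f \
        \( -name '*-test.sh' -o -name '*.test.*' -o -name '*_test.*' -o -name 'test_*.*' \) \
        2>/dev/null
}

# Hypothetical helper: only .sh files are subject to the bash-specific
# checks (executable bit, harness patterns, mock patterns).
_tv_is_bash_test() {
    case "$1" in
        *.sh) return 0 ;;
        *)    return 1 ;;
    esac
}
```

A caller would iterate over the `find` output and run the harness checks only for files where `_tv_is_bash_test` succeeds, so a TypeScript or Python project is never judged on `PASS=0`/`FAIL=0`.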
- validate_test_infrastructure() returns 1 when $TEST_CMD is empty
- validate_test_infrastructure() returns 1 when no test files match expected patterns
- validate_test_infrastructure() returns 1 when .sh test files lack +x permission
- validate_test_infrastructure() returns 1 when .sh test files lack PASS=0/FAIL=0 counters
- Shellcheck integration runs when binary is available, skips with warning when absent
- Mock pattern check warns (but does not fail) when mktemp -d is absent
- --skip-test-validation produces {"skipped": true} report and returns 0
- Validation report is valid JSON at .claude/pipeline-artifacts/test-validation.json
- Non-bash test files (.test.ts, .test.py) bypass harness pattern checks
- Intake stage fails with actionable error message referencing --skip-test-validation
- Test suite has 18+ tests: 5 positive, 7 negative, 4 edge cases, 2 integration
- All new code passes shellcheck --shell=bash with no errors
- All new code is Bash 3.2 compatible (no associative arrays, readarray, ${var,,})
- Unit tests (12-13): Each of the 6 _tv_* check functions tested with both pass and fail inputs. Test _tv_write_report() output format. Test skip mechanism. ~70% of test count.
- Integration tests (3-4): Full validate_test_infrastructure() flow with valid project, invalid project, mixed valid/invalid files, non-bash project. ~20% of test count.
- E2E test (1-2): Verify stage_intake() actually calls validation and fails appropriately. Verify --skip-test-validation flag reaches the function. ~10% of test count.
Total: ~18 tests
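A unit test at the bottom of this pyramid might be shaped like the sketch below. It uses the harness boilerplate the validation itself looks for (`set -euo pipefail`, `PASS=0`, `FAIL=0`); the `assert_status` helper is hypothetical and the function under test is reduced to a one-liner so the example runs on its own.

```shell
set -euo pipefail

PASS=0
FAIL=0

# Function under test, reduced to a one-liner for this sketch.
_tv_check_test_cmd() { [[ -n "${TEST_CMD:-}" ]]; }

# Hypothetical helper: run a command and compare its exit status.
assert_status() {
    local expected="$1" label="$2" actual=0
    shift 2
    "$@" >/dev/null 2>&1 || actual=$?
    if [[ "$actual" -eq "$expected" ]]; then
        PASS=$((PASS + 1))
    else
        FAIL=$((FAIL + 1))
        echo "FAIL: $label (expected $expected, got $actual)" >&2
    fi
}

# One pass input and one fail input for the same check.
TEST_CMD="npm test"
assert_status 0 "passes when TEST_CMD is set" _tv_check_test_cmd
TEST_CMD=""
assert_status 1 "fails when TEST_CMD is empty" _tv_check_test_cmd

echo "passed=$PASS failed=$FAIL"
```

Capturing the status with `|| actual=$?` keeps a deliberately failing check from tripping `set -e` and aborting the test file.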
- 100% branch coverage on validate_test_infrastructure() main flow (skip path, success path, failure path)
- All 6 check functions: at least 1 pass and 1 fail scenario each
- Report JSON schema validated in at least 2 tests (success report, failure report)
- CLI flag propagation tested end-to-end
Happy path: Mock project directory with valid .sh test files containing all harness patterns, executable permissions → all 6 checks pass → report shows "valid": true → function returns 0.
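For orientation, a report writer producing the `"valid"` field could be sketched as follows. Only `valid` and the file path come from this document; the `failed_checks` array, the two positional arguments, and the fallback when `jq` is missing are assumptions made for illustration.

```shell
# Sketch of a report writer. Arguments (both hypothetical):
#   $1  "true" or "false"
#   $2  space-separated names of failed checks, possibly empty
_tv_write_report() {
    local valid="$1" failed_checks="${2:-}"
    local dir="${ARTIFACTS_DIR:-.claude/pipeline-artifacts}"

    if ! command -v jq >/dev/null 2>&1; then
        echo "WARNING: jq not found, skipping report generation" >&2
        return 0
    fi

    mkdir -p "$dir"
    # --argjson passes the boolean as JSON; --arg passes the names as a string
    # that is split into an array, dropping empty entries.
    jq -n --argjson valid "$valid" --arg failed "$failed_checks" \
        '{valid: $valid, failed_checks: ($failed | split(" ") | map(select(length > 0)))}' \
        > "$dir/test-validation.json"
}
```

On the happy path this would be called as `_tv_write_report true ""`; a failing run might pass `false "harness_patterns"`.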
Error cases:
- $TEST_CMD is empty after detect_test_cmd() → test_cmd_detected check fails → returns 1 with "no test command detected"
- Test file exists but lacks +x permission → executable_permissions check fails → returns 1 with "chmod +x" suggestion
- Test file missing PASS=0 counter → harness_patterns check fails → returns 1 with pattern example
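The second and third error cases can be illustrated with a minimal sketch of the two bash-only checks. Taking the file list as positional arguments and the exact wording of the messages are assumptions; the patterns and the `chmod +x` hint come from the design.

```shell
# Check 3: .sh test files must be executable. Prints a fix hint on failure.
_tv_check_executable() {
    local file rc=0
    for file in "$@"; do
        case "$file" in
            *.sh) ;;          # bash test file: check it
            *) continue ;;    # non-bash files are skipped
        esac
        if [[ ! -x "$file" ]]; then
            echo "ERROR: $file is not executable; fix with: chmod +x $file" >&2
            rc=1
        fi
    done
    return "$rc"
}

# Check 4: .sh test files must contain the harness boilerplate.
_tv_check_harness_patterns() {
    local file pattern rc=0
    for file in "$@"; do
        case "$file" in
            *.sh) ;;
            *) continue ;;
        esac
        for pattern in 'PASS=0' 'FAIL=0' 'set -euo pipefail'; do
            # -F matches the pattern literally; -- stops option parsing.
            if ! grep -qF -- "$pattern" "$file"; then
                echo "ERROR: $file is missing harness pattern: $pattern" >&2
                rc=1
            fi
        done
    done
    return "$rc"
}
```

Both functions report every offending file before returning, so a single run lists all fixes needed.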
Edge cases:
- shellcheck binary not in $PATH → check skipped with warning, overall validation still passes if other checks pass
- Project has only .test.ts files, no .sh tests → harness/executable/mock checks skipped entirely, only test_cmd and test_files_exist run
- Empty project directory with no test files at all → test_files_exist fails → returns 1
- --skip-test-validation flag → writes {"skipped": true} report, returns 0 immediately without running any checks
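The first edge case, together with the exit-code mapping it depends on, might be handled along these lines. This is a sketch: the message wording and the positional file list are assumptions, and treating any exit code other than 0 or 1 as "could not process" is a simplification.

```shell
# Check 5: lint .sh test files when shellcheck is installed.
# Returns 0 when clean or when shellcheck is absent, 1 when issues were found.
_tv_check_shellcheck() {
    if ! command -v shellcheck >/dev/null 2>&1; then
        echo "WARNING: shellcheck not found, skipping lint" >&2
        return 0
    fi

    local file status rc=0
    for file in "$@"; do
        case "$file" in
            *.sh) ;;
            *) continue ;;
        esac
        status=0
        shellcheck --shell=bash "$file" >/dev/null 2>&1 || status=$?
        case "$status" in
            0) ;;   # clean
            1) echo "WARNING: shellcheck reported issues in $file" >&2; rc=1 ;;
            *) echo "WARNING: shellcheck could not process $file (exit $status)" >&2; rc=1 ;;
        esac
    done
    return "$rc"
}
```

Because this check is only a warning, the caller would log a non-zero result without failing the overall validation.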