Pipeline Design 175
Now I have enough context. Here's the ADR:
Shipwright pipelines currently accept any GitHub issue regardless of content quality. A poorly written issue ("fix things", empty body, no code references) burns 12+ minutes of pipeline budget before failing at the build stage due to ambiguity. The intake stage (scripts/lib/pipeline-stages-intake.sh) fetches issue metadata (lines 13-48) but performs no quality assessment before proceeding to branch creation and task detection.
Constraints:
- Bash 3.2 compatibility (no associative arrays, no ${var,,})
- Must not break --goal-only pipelines (no ISSUE_BODY to validate)
- Must integrate with existing emit_event() and gh_comment_issue() helpers
- Must follow the scripts/lib/*.sh double-source guard pattern (see helpers.sh:30)
- Exit code 2 for "check condition failed" per helpers.sh convention (line 10)
- Daemon pipelines must handle blocked issues gracefully (skip, not retry)
Standalone validation module at scripts/lib/issue-validation.sh. Pure bash heuristic scoring (no AI calls) invoked from stage_intake() after issue metadata fetch (line 48) and before task type detection (line 51).
Four dimensions, 25 points each, plus a vagueness penalty:
- Description length (0-25): Graduated by character count (<50=0, 50-150=10, 150-500=20, 500+=25)
- Structure (0-25): Headings (+5), bullet/numbered lists (+5), acceptance criteria section (+10), code blocks (+5)
- Specificity (0-25): File path references (+10), function/class names (+5), error messages/stack traces (+5), reproduction steps (+5)
- Code references (0-25): File extensions (+10), line numbers (+5), code fences (+5), directory references (+5)
- Vagueness penalty (0 to -20): -5 per vague phrase detected ("make it better", "improve performance", "fix things", etc.), capped at -20
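A minimal sketch of two of these scorers, assuming half-open length bands (the ranges above overlap at 150 and 500, so which band those boundary values fall in is a guess) and simple line-oriented grep patterns for Markdown structure. The patterns are illustrative assumptions, not the module's actual heuristics.

```bash
#!/usr/bin/env bash
# Sketch only: Bash 3.2 compatible (no associative arrays, no ${var,,}).

# Description length, 0-25 points.
# Assumption: 50-149 chars = 10, 150-499 = 20, 500 and up = 25.
_score_description_length() {
    local body="$1"
    local len=${#body}
    if [ "$len" -lt 50 ]; then
        echo 0
    elif [ "$len" -lt 150 ]; then
        echo 10
    elif [ "$len" -lt 500 ]; then
        echo 20
    else
        echo 25
    fi
}

# Structure, 0-25 points. Each pattern is an assumed approximation.
_score_structure() {
    local body="$1" score=0
    # Markdown heading: one or more '#' followed by a space.
    printf '%s\n' "$body" | grep -Eq '^#+ ' && score=$((score + 5))
    # Bullet or numbered list item.
    printf '%s\n' "$body" | grep -Eq '^[[:space:]]*([-*]|[0-9]+\.) ' && score=$((score + 5))
    # Acceptance criteria section, matched by phrase.
    printf '%s\n' "$body" | grep -iq 'acceptance criteria' && score=$((score + 10))
    # Fenced code block: a line opening with three backticks.
    printf '%s\n' "$body" | grep -Eq '^[[:space:]]*`{3}' && score=$((score + 5))
    echo "$score"
}
```

Both functions print their score on stdout, so a caller can capture it with command substitution and sum the dimensions with plain integer arithmetic.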
stage_intake()
  → fetch issue metadata (existing, lines 13-48)
  → validate_issue_quality(ISSUE_BODY, GOAL, ISSUE_LABELS)
      → _score_description_length()
      → _score_structure()
      → _score_specificity()
      → _score_code_references()
      → _detect_vague_phrases()
      → sets VALIDATION_SCORE, VALIDATION_FEEDBACK
      → returns 0 (pass) or 2 (fail)
  → on failure: gh_comment_issue() with feedback, gh_add_labels() "needs-info", emit_event "intake.validation_failed", return 2
  → on pass: emit_event "intake.validation_passed", continue to task type detection
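The composite step in this flow could look roughly like the sketch below. The five scorers are replaced by stand-ins driven by hypothetical STUB_* variables so the block runs on its own. VALIDATION_THRESHOLD is an assumed stand-in for the configured threshold, and clamping the total at 0 is likewise an assumption. Bypass handling, GitHub feedback, and event emission are left out.

```bash
#!/usr/bin/env bash
# Stand-in scorers (hypothetical); the real ones would inspect "$1".
_score_description_length() { echo "${STUB_LENGTH:-0}"; }
_score_structure()          { echo "${STUB_STRUCTURE:-0}"; }
_score_specificity()        { echo "${STUB_SPECIFICITY:-0}"; }
_score_code_references()    { echo "${STUB_CODE_REFS:-0}"; }
_detect_vague_phrases()     { echo "${STUB_PENALTY:-0}"; }  # positive penalty, 0..20

# Sums the four dimensions, subtracts the vagueness penalty, and compares
# against the threshold. Sets VALIDATION_SCORE and VALIDATION_FEEDBACK.
# Returns 0 on pass, 2 on fail (check-condition-failed convention).
validate_issue_quality() {
    local body="$1"
    local threshold="${VALIDATION_THRESHOLD:-60}"  # assumed variable name
    local len structure specificity refs penalty total
    len=$(_score_description_length "$body")
    structure=$(_score_structure "$body")
    specificity=$(_score_specificity "$body")
    refs=$(_score_code_references "$body")
    penalty=$(_detect_vague_phrases "$body")
    total=$((len + structure + specificity + refs - penalty))
    # Assumption: never report a negative score.
    if [ "$total" -lt 0 ]; then
        total=0
    fi
    VALIDATION_SCORE=$total
    if [ "$total" -ge "$threshold" ]; then
        VALIDATION_FEEDBACK=""
        return 0
    fi
    VALIDATION_FEEDBACK="Score ${total}/100 is below the threshold of ${threshold}."
    return 2
}
```

Because a failing check returns 2 rather than 1, a caller running under set -e would need to capture the status explicitly, for example with validate_issue_quality "$ISSUE_BODY" || rc=$?.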
- Issue labels contain skip-validation, hotfix, or urgent
- SHIPWRIGHT_SKIP_VALIDATION=1 environment variable
- Pipeline flag --skip-gates (existing mechanism)
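A sketch of how the label and environment-variable bypasses might be checked. The function name _should_skip_validation and the comma-separated label format are assumptions. The --skip-gates flag is not modeled, because how the existing mechanism exposes it is not described here.

```bash
#!/usr/bin/env bash
# Returns 0 when validation should be bypassed, 1 otherwise.
# $1: issue labels as a comma-separated string (assumed format).
_should_skip_validation() {
    local labels
    # Drop spaces so "bug, hotfix" and "bug,hotfix" behave the same.
    labels=$(printf '%s' "$1" | tr -d ' ')
    if [ "${SHIPWRIGHT_SKIP_VALIDATION:-0}" = "1" ]; then
        return 0
    fi
    # Wrap in commas so only whole label names match.
    case ",${labels}," in
        *,skip-validation,*|*,hotfix,*|*,urgent,*) return 0 ;;
    esac
    return 1
}
```

Wrapping the list in commas keeps a label such as hotfix-later from matching hotfix, without needing arrays or regex support.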
Configurable via pipeline config key intake.validation_threshold, default 60. Read with _config_get_int (existing helper from config.sh).
- Validation failure returns exit 2 (check-condition-failed convention)
- GitHub API unavailability: validation still runs locally, feedback posting fails silently (existing gh_comment_issue handles $NO_GITHUB)
- Empty ISSUE_BODY: scores 0 on all dimensions, feedback says "Issue has no description"
- Goal-only pipelines: validation block is skipped entirely (if [[ -n "$ISSUE_BODY" ]])
- Inline validation in stage_intake(). Pros: No new files, simpler sourcing / Cons: Bloats a 116-line function to 200+, individual heuristics untestable in isolation, violates the module pattern used by all other scripts/lib/*.sh files
- AI-powered validation (Claude scores the issue). Pros: More nuanced semantic understanding / Cons: Adds ~30s latency and ~$0.05 cost per pipeline start, defeats the goal of preventing wasted budget, creates a dependency on API availability at the earliest pipeline stage
- External GitHub Action/webhook. Pros: Validates before pipeline even starts / Cons: Requires infrastructure setup, doesn't integrate with pipeline event system, can't use bypass labels in pipeline context
Files to create:
- scripts/lib/issue-validation.sh: Core validation module with double-source guard, scoring functions, bypass logic, feedback builder
- scripts/sw-issue-validation-test.sh: Unit test suite for all scoring functions and composite validation
Files to modify:
- scripts/lib/pipeline-stages-intake.sh: Insert validation call after line 48 (issue metadata fetch), before line 51 (task type detection). Source the new module.
- config/event-schema.json: Add intake.validation_passed and intake.validation_failed event types
- scripts/sw-pipeline-test.sh: Add integration test test_intake_validation_blocks_low_quality
Dependencies: None new. Uses existing helpers.sh (emit_event, output helpers), pipeline-github.sh (gh_comment_issue, gh_add_labels), config.sh (_config_get_int)
Risk areas:
- False positives: Well-intentioned short issues (e.g., "Typo in README line 42") could score below threshold. Mitigated by bypass labels and tunable threshold.
- Daemon retry loops: If the daemon retries a blocked issue, it burns slots. Must emit the intake.validation_failed event so daemon dispatch can recognize and skip it.
- Bash string matching edge cases: Vague phrase detection uses grep -i with word boundaries, and the regex must be portable across BSD and GNU grep. The \b escape is not part of POSIX ERE, so use grep -iE with \b-free patterns ([^a-z] boundaries instead, for Bash 3.2/BSD compat).
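One way to get portable word boundaries is sketched below. It lowercases the body with tr (standing in for the unavailable ${var,,}) and then matches with grep -E, so the [^a-z] boundary classes do not depend on how a given grep applies -i inside bracket expressions. The phrase list holds only the three examples named in this document. Counting each phrase once, however often it occurs, is an assumption.

```bash
#!/usr/bin/env bash
# Prints the vagueness penalty as a positive number (0..20);
# the caller subtracts it from the total score.
_detect_vague_phrases() {
    local body_lc phrase penalty=0
    # Lowercase once up front; Bash 3.2 has no ${var,,}.
    body_lc=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    for phrase in "make it better" "improve performance" "fix things"; do
        # (^|[^a-z]) and ([^a-z]|$) act as word boundaries without \b.
        if printf '%s\n' "$body_lc" | grep -Eq "(^|[^a-z])${phrase}([^a-z]|\$)"; then
            penalty=$((penalty + 5))
        fi
    done
    # Cap at 20 so a longer phrase list cannot exceed the stated maximum.
    if [ "$penalty" -gt 20 ]; then
        penalty=20
    fi
    echo "$penalty"
}
```

Because grep works line by line, a phrase split across a line break would not be detected by this sketch.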
- Empty issue body (title only) produces score 0 and returns exit 2
- Issue with < 50 char body scores below threshold and is blocked
- Well-structured issue (headings, acceptance criteria, code refs, 500+ chars) scores >= 80
- Each vague phrase ("make it better", "fix things") reduces score by 5, capped at -20
- Issues with skip-validation/hotfix/urgent label bypass validation (score=100)
- SHIPWRIGHT_SKIP_VALIDATION=1 env var bypasses validation
- Feedback posted to GitHub issue lists specific deficiencies with actionable suggestions
- intake.validation_passed event emitted with score on success
- intake.validation_failed event emitted with score and threshold on failure
- --goal-only pipelines (no --issue) skip validation entirely
- All existing tests pass (npm test) with no regressions
- All scoring functions work under Bash 3.2 (no associative arrays, no ${var,,})
- Unit tests (~12 tests in sw-issue-validation-test.sh): Each _score_* function tested with known inputs (empty, minimal, rich), vague phrase detection, bypass conditions, threshold override, feedback message content, composite validate_issue_quality() with realistic issue bodies
- Integration tests (~2 tests in sw-pipeline-test.sh): Pipeline blocks on low-quality mock issue (exit 2, correct output); pipeline proceeds on high-quality mock issue
- E2E tests (0): Not applicable; validation is deterministic string processing with no external dependencies beyond mocked gh
- Unit: 100% of scoring functions and bypass paths (this is pure logic with no external deps)
- Integration: Happy path (pass) and blocking path (fail) through stage_intake()
- Happy path: Rich issue with acceptance criteria, code references → score >= 80 → pipeline proceeds
- Error case 1: Empty body → score 0 → exit 2 → feedback posted → needs-info label added
- Error case 2: Vague issue ("make it better, improve performance, fix things") → penalty applied → score below threshold → blocked
- Edge case 1: Issue with hotfix label + empty body → bypass → score 100 → pipeline proceeds
- Edge case 2: Borderline issue scoring exactly at threshold (60) → passes (>= comparison)