Pipeline Design 175

Design: Issue pre-flight validation with actionability scoring

Context

Shipwright pipelines currently accept any GitHub issue regardless of content quality. A poorly written issue ("fix things", empty body, no code references) burns 12+ minutes of pipeline budget before failing at the build stage due to ambiguity. The intake stage (scripts/lib/pipeline-stages-intake.sh) fetches issue metadata (lines 13-48) but performs no quality assessment before proceeding to branch creation and task detection.

Constraints:

Bash 3.2 compatibility (no associative arrays, no ${var,,})
Must not break --goal-only pipelines (no ISSUE_BODY to validate)
Must integrate with existing emit_event() and gh_comment_issue() helpers
Must follow the scripts/lib/*.sh double-source guard pattern (see helpers.sh:30)
Exit code 2 for "check condition failed" per helpers.sh convention (line 10)
Daemon pipelines must handle blocked issues gracefully (skip, not retry)

Decision

Standalone validation module at scripts/lib/issue-validation.sh. Pure bash heuristic scoring (no AI calls) invoked from stage_intake() after issue metadata fetch (line 48) and before task type detection (line 51).

Scoring Model (0-100)

Four dimensions, 25 points each, plus a vagueness penalty:

Description length (0-25): Graduated by character count (<50 = 0, 50-149 = 10, 150-499 = 20, 500+ = 25)
Structure (0-25): Headings (+5), bullet/numbered lists (+5), acceptance criteria section (+10), code blocks (+5)
Specificity (0-25): File path references (+10), function/class names (+5), error messages/stack traces (+5), reproduction steps (+5)
Code references (0-25): File extensions (+10), line numbers (+5), code fences (+5), directory references (+5)
Vagueness penalty (0 to -20): -5 per vague phrase detected ("make it better", "improve performance", "fix things", etc.), capped at -20
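
The description-length dimension is the simplest of the four and can be sketched as follows. This is a minimal illustration, not the module's actual code: the function name comes from the data flow in this design, while passing the body as $1, printing the score to stdout, and treating each band's lower bound as inclusive are assumptions.

```bash
# Sketch only: score the description-length dimension (0-25).
# Assumption: the issue body arrives as $1 and the score is printed to stdout.
_score_description_length() {
  local body="$1"
  local len=${#body}   # character count of the body

  if [ "$len" -lt 50 ]; then
    echo 0
  elif [ "$len" -lt 150 ]; then
    echo 10
  elif [ "$len" -lt 500 ]; then
    echo 20
  else
    echo 25
  fi
}
```

The if/elif chain avoids both associative arrays and ${var,,}, so it stays within the Bash 3.2 constraint.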

Data Flow

stage_intake()
  → fetch issue metadata (existing, lines 13-48)
  → validate_issue_quality(ISSUE_BODY, GOAL, ISSUE_LABELS)
      → _score_description_length()
      → _score_structure()
      → _score_specificity()
      → _score_code_references()
      → _detect_vague_phrases()
      → sets VALIDATION_SCORE, VALIDATION_FEEDBACK
      → returns 0 (pass) or 2 (fail)
  → on failure: gh_comment_issue() with feedback, gh_add_labels() "needs-info", emit_event "intake.validation_failed", return 2
  → on pass: emit_event "intake.validation_passed", continue to task type detection
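
The composition step in the flow above might look roughly like this. It is a hedged sketch: the five helpers are stubbed with fixed values so the arithmetic can be run in isolation, VALIDATION_THRESHOLD is a hypothetical variable standing in for the configured threshold, and clamping the total at 0 is an assumption the design does not state.

```bash
# Sketch only. The five helpers are stubbed with fixed values so the
# composition logic can be run in isolation; real versions would inspect $1.
_score_description_length() { echo 20; }
_score_structure()          { echo 15; }
_score_specificity()        { echo 15; }
_score_code_references()    { echo 10; }
_detect_vague_phrases()     { echo 1; }   # assumption: prints the count of vague phrases

validate_issue_quality() {
  local body="$1"   # $2 (GOAL) and $3 (ISSUE_LABELS) are accepted but unused here
  local threshold="${VALIDATION_THRESHOLD:-60}"   # hypothetical variable name
  local total hits penalty

  total=$(_score_description_length "$body")
  total=$(( total + $(_score_structure "$body") ))
  total=$(( total + $(_score_specificity "$body") ))
  total=$(( total + $(_score_code_references "$body") ))

  # -5 per vague phrase, capped at -20.
  hits=$(_detect_vague_phrases "$body")
  penalty=$(( hits * 5 ))
  if [ "$penalty" -gt 20 ]; then penalty=20; fi
  total=$(( total - penalty ))
  if [ "$total" -lt 0 ]; then total=0; fi   # assumption: clamp at zero

  VALIDATION_SCORE=$total
  if [ "$total" -ge "$threshold" ]; then
    VALIDATION_FEEDBACK=""
    return 0
  fi
  VALIDATION_FEEDBACK="Score $total/100 is below the threshold of $threshold."
  return 2
}
```

With the stubbed values the total is 60 - 5 = 55, so the call returns 2 at the default threshold of 60 and returns 0 if the threshold is lowered to 55.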

Bypass Conditions (returns score=100, skip all checks)

Issue labels contain skip-validation, hotfix, or urgent
SHIPWRIGHT_SKIP_VALIDATION=1 environment variable
Pipeline flag --skip-gates (existing mechanism)
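
One way the three bypass conditions could be checked is sketched below. The function name _should_bypass_validation, the comma-separated label format, and the SKIP_GATES variable used to represent the --skip-gates flag are all assumptions made for illustration; the real pipeline may expose these differently.

```bash
# Sketch only. Assumptions: labels arrive as a comma-separated string in $1,
# and the --skip-gates flag is surfaced as SKIP_GATES=true (hypothetical name).
_should_bypass_validation() {
  local labels="$1"

  if [ "${SHIPWRIGHT_SKIP_VALIDATION:-0}" = "1" ]; then return 0; fi
  if [ "${SKIP_GATES:-false}" = "true" ]; then return 0; fi

  # Lowercase with tr because ${var,,} is unavailable in Bash 3.2;
  # strip spaces so "bug, hotfix" and "bug,hotfix" behave the same.
  labels=$(printf '%s' "$labels" | tr '[:upper:]' '[:lower:]' | tr -d ' ')

  # Wrapping in commas makes each label match as a whole item,
  # so "hotfix-later" does not count as "hotfix".
  case ",$labels," in
    *,skip-validation,*|*,hotfix,*|*,urgent,*) return 0 ;;
  esac
  return 1
}
```

A caller would set the score to 100 and skip the scoring functions whenever this returns 0.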

Threshold

Configurable via pipeline config key intake.validation_threshold, default 60. Read with _config_get_int (existing helper from config.sh).

Error Handling

Validation failure returns exit 2 (check-condition-failed convention)
GitHub API unavailability: validation still runs locally, feedback posting fails silently (existing gh_comment_issue handles $NO_GITHUB)
Empty ISSUE_BODY: scores 0 on all dimensions, feedback says "Issue has no description"
Goal-only pipelines: validation block is skipped entirely (if [[ -n "$ISSUE_BODY" ]])
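
The error-handling rules above can be illustrated with a small call-site sketch. Everything here is hedged: _intake_validate is a hypothetical wrapper name for the block that would live inside stage_intake(), the helper stubs only record that they were called, and the argument order given to gh_comment_issue and gh_add_labels is assumed rather than taken from pipeline-github.sh.

```bash
# Stubs standing in for the real helpers; their argument order is assumed.
CALLS=""
gh_comment_issue() { CALLS="$CALLS comment"; }
gh_add_labels()    { CALLS="$CALLS label:$2"; }
emit_event()       { CALLS="$CALLS event:$1"; }

# Stub validator: passes any body of 50+ characters, fails everything else.
validate_issue_quality() {
  if [ "${#1}" -ge 50 ]; then
    VALIDATION_SCORE=100; VALIDATION_FEEDBACK=""
    return 0
  fi
  VALIDATION_SCORE=0; VALIDATION_FEEDBACK="stub feedback"
  return 2
}

_intake_validate() {   # hypothetical wrapper for the block inside stage_intake()
  # Goal-only pipelines carry no issue body, so the block is skipped entirely.
  if [ -z "${ISSUE_BODY:-}" ]; then return 0; fi

  if validate_issue_quality "$ISSUE_BODY" "${GOAL:-}" "${ISSUE_LABELS:-}"; then
    emit_event "intake.validation_passed"
    return 0
  fi

  # Blocking path: post feedback, label the issue, emit the event, exit 2.
  gh_comment_issue "${ISSUE_NUMBER:-}" "$VALIDATION_FEEDBACK"
  gh_add_labels "${ISSUE_NUMBER:-}" "needs-info"
  emit_event "intake.validation_failed"
  return 2
}
```

Returning 2 rather than 1 keeps the failure distinguishable as a check-condition failure, which is what lets daemon dispatch skip the issue instead of retrying it.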

Alternatives Considered

Inline validation in stage_intake() — Pros: No new files, simpler sourcing / Cons: Bloats a 116-line function to 200+, individual heuristics untestable in isolation, violates the module pattern used by all other scripts/lib/*.sh files
AI-powered validation (Claude scores the issue) — Pros: More nuanced semantic understanding / Cons: Adds ~30s latency and ~$0.05 cost per pipeline start, defeats the goal of preventing wasted budget, creates a dependency on API availability at the earliest pipeline stage
External GitHub Action/webhook — Pros: Validates before pipeline even starts / Cons: Requires infrastructure setup, doesn't integrate with pipeline event system, can't use bypass labels in pipeline context

Implementation Plan

Files to create:
- scripts/lib/issue-validation.sh — Core validation module with double-source guard, scoring functions, bypass logic, feedback builder
- scripts/sw-issue-validation-test.sh — Unit test suite for all scoring functions and composite validation
Files to modify:
- scripts/lib/pipeline-stages-intake.sh — Insert validation call after line 48 (issue metadata fetch), before line 51 (task type detection). Source the new module.
- config/event-schema.json — Add intake.validation_passed and intake.validation_failed event types
- scripts/sw-pipeline-test.sh — Add integration test test_intake_validation_blocks_low_quality
Dependencies: None new. Uses existing helpers.sh (emit_event, output helpers), pipeline-github.sh (gh_comment_issue, gh_add_labels), config.sh (_config_get_int)
Risk areas:
- False positives: Well-intentioned short issues (e.g., "Typo in README line 42") could score below threshold. Mitigated by bypass labels and tunable threshold.
- Daemon retry loops: If daemon retries a blocked issue, it burns slots. Must emit intake.validation_failed event so daemon dispatch can recognize and skip.
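- Bash string matching edge cases: Vague phrase detection needs case-insensitive, whole-phrase matching, and the regex must be portable across BSD and GNU grep. Use grep -iE and avoid \b, which POSIX ERE does not define; emulate word boundaries with (^|[^a-z]) before the phrase and ([^a-z]|$) after it, so that matches at the start or end of a line still count.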
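
The portability concern in the last risk item can be made concrete with a minimal sketch of the phrase detector. It assumes the function prints the number of distinct vague phrases found, and the phrase list holds only the three examples named in this design. The POSIX class [[:alpha:]] is used here in place of [a-z] so the boundary check does not depend on how a given grep applies -i inside bracket expressions; that substitution is a choice made for this sketch, not something the design prescribes.

```bash
# Sketch only: count how many known vague phrases appear in the body.
# Assumption: the count is printed to stdout; the caller applies -5 each, capped at -20.
_detect_vague_phrases() {
  local body="$1"
  local count=0
  local phrase

  # A plain word list in a for loop works without arrays (Bash 3.2 safe).
  for phrase in "make it better" "improve performance" "fix things"; do
    # (^|[^[:alpha:]]) and ([^[:alpha:]]|$) emulate word boundaries without \b.
    if printf '%s\n' "$body" | grep -iqE "(^|[^[:alpha:]])${phrase}([^[:alpha:]]|\$)"; then
      count=$((count + 1))
    fi
  done
  echo "$count"
}
```

Because grep matches line by line, a phrase split across a line break is not detected; that seems an acceptable trade-off for a heuristic, but it is worth covering in the unit tests.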

Validation Criteria

Test Pyramid Breakdown

Unit tests (~12 tests in sw-issue-validation-test.sh): Each _score_* function tested with known inputs (empty, minimal, rich), vague phrase detection, bypass conditions, threshold override, feedback message content, composite validate_issue_quality() with realistic issue bodies
Integration tests (~2 tests in sw-pipeline-test.sh): Pipeline blocks on low-quality mock issue (exit 2, correct output); pipeline proceeds on high-quality mock issue
E2E tests (0): Not applicable — validation is deterministic string processing with no external dependencies beyond mocked gh

Coverage Targets

Unit: 100% of scoring functions and bypass paths (this is pure logic with no external deps)
Integration: Happy path (pass) and blocking path (fail) through stage_intake()

Critical Paths to Test

Happy path: Rich issue with acceptance criteria, code references → score >= 80 → pipeline proceeds
Error case 1: Empty body → score 0 → exit 2 → feedback posted → needs-info label added
Error case 2: Vague issue ("make it better, improve performance, fix things") → penalty applied → score below threshold → blocked
Edge case 1: Issue with hotfix label + empty body → bypass → score 100 → pipeline proceeds
Edge case 2: Borderline issue scoring exactly at threshold (60) → passes (>= comparison)
