
Pipeline Design 175

ezigus edited this page Mar 17, 2026 · 1 revision


Design: Issue pre-flight validation with actionability scoring

Context

Shipwright pipelines currently accept any GitHub issue regardless of content quality. A poorly written issue ("fix things", empty body, no code references) burns 12+ minutes of pipeline budget before failing at the build stage due to ambiguity. The intake stage (scripts/lib/pipeline-stages-intake.sh) fetches issue metadata (lines 13-48) but performs no quality assessment before proceeding to branch creation and task detection.

Constraints:

  • Bash 3.2 compatibility (no associative arrays, no ${var,,})
  • Must not break --goal-only pipelines (no ISSUE_BODY to validate)
  • Must integrate with existing emit_event() and gh_comment_issue() helpers
  • Must follow the scripts/lib/*.sh double-source guard pattern (see helpers.sh:30)
  • Exit code 2 for "check condition failed" per helpers.sh convention (line 10)
  • Daemon pipelines must handle blocked issues gracefully (skip, not retry)
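
The two Bash constraints above (the double-source guard and the missing ${var,,} expansion) might look like this in the new module. This is a minimal sketch: the guard variable _ISSUE_VALIDATION_LOADED and the helper _iv_lower are hypothetical names, and the real guard convention is whatever helpers.sh:30 uses.

```shell
# Double-source guard. The variable name is an assumption for this sketch;
# the project's actual convention is the one at helpers.sh:30.
if [[ -n "${_ISSUE_VALIDATION_LOADED:-}" ]]; then
    return 0 2>/dev/null || true
fi
_ISSUE_VALIDATION_LOADED=1

# Bash 3.2 has no ${var,,}, so lowercase through tr instead.
# _iv_lower is a hypothetical helper name.
_iv_lower() {
    printf '%s' "$1" | tr '[:upper:]' '[:lower:]'
}
```

Calling _iv_lower "Fix THINGS" prints fix things, which lets later matching stay case-insensitive without Bash 4 features.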

Decision

Add a standalone validation module at scripts/lib/issue-validation.sh. It uses pure Bash heuristic scoring (no AI calls) and is invoked from stage_intake() after the issue metadata fetch (line 48) and before task type detection (line 51).

Scoring Model (0-100)

Four dimensions, 25 points each, plus a vagueness penalty:

  • Description length (0-25): Graduated by character count (<50=0, 50-149=10, 150-499=20, 500+=25)
  • Structure (0-25): Headings (+5), bullet/numbered lists (+5), acceptance criteria section (+10), code blocks (+5)
  • Specificity (0-25): File path references (+10), function/class names (+5), error messages/stack traces (+5), reproduction steps (+5)
  • Code references (0-25): File extensions (+10), line numbers (+5), code fences (+5), directory references (+5)
  • Vagueness penalty (0 to -20): -5 per vague phrase detected ("make it better", "improve performance", "fix things", etc.), capped at -20
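
As an illustration of the scoring model, the length dimension and the final sum could be sketched as follows. Two points are assumptions of this sketch rather than part of the design: boundary values (150 and 500 characters) fall into the higher band, and the combined score is clamped to 0..100. _combine_scores is a hypothetical helper name.

```shell
# Description length score (0-25), graduated by character count.
_score_description_length() {
    local len=${#1}
    if   [ "$len" -lt 50 ];  then echo 0
    elif [ "$len" -lt 150 ]; then echo 10
    elif [ "$len" -lt 500 ]; then echo 20
    else                          echo 25
    fi
}

# Sum the four dimension scores and the (non-positive) vagueness penalty.
# Clamping to 0..100 is an assumption of this sketch.
_combine_scores() {
    local total=$(( $1 + $2 + $3 + $4 + $5 ))
    [ "$total" -lt 0 ] && total=0
    [ "$total" -gt 100 ] && total=100
    echo "$total"
}
```

For example, _combine_scores 25 25 25 25 -20 gives 80: a fully structured issue that still contains four vague phrases.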

Data Flow

stage_intake()
  → fetch issue metadata (existing, lines 13-48)
  → validate_issue_quality(ISSUE_BODY, GOAL, ISSUE_LABELS)
      → _score_description_length()
      → _score_structure()
      → _score_specificity()
      → _score_code_references()
      → _detect_vague_phrases()
      → sets VALIDATION_SCORE, VALIDATION_FEEDBACK
      → returns 0 (pass) or 2 (fail)
  → on failure: gh_comment_issue() with feedback, gh_add_labels() "needs-info", emit_event "intake.validation_failed", return 2
  → on pass: emit_event "intake.validation_passed", continue to task type detection
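
The branch taken after validation can be sketched in Bash as below. Everything except the event names, the needs-info label, and the 0/2 return codes is a stand-in: the real emit_event, gh_comment_issue, and gh_add_labels signatures are not shown in this design, the validator is a stub with a placeholder score, and _intake_validate and THRESHOLD are hypothetical names.

```shell
THRESHOLD=60   # default value of intake.validation_threshold

# Stand-ins for the real helpers; their actual signatures are assumed.
emit_event()       { echo "event: $*"; }
gh_comment_issue() { echo "comment on #$1: $2"; }
gh_add_labels()    { echo "labels on #$1: $2"; }

# Stub validator: sets the two output globals and returns 0 (pass) or 2 (fail).
# The fixed score of 70 is a placeholder, not the real scoring.
validate_issue_quality() {
    VALIDATION_SCORE=70
    VALIDATION_FEEDBACK=""
    if [ -z "$1" ]; then
        VALIDATION_SCORE=0
        VALIDATION_FEEDBACK="Issue has no description"
    fi
    [ "$VALIDATION_SCORE" -ge "$THRESHOLD" ] && return 0
    return 2
}

# Hypothetical wrapper showing the pass and fail branches of the data flow.
_intake_validate() {
    local issue=$1 body=$2
    if validate_issue_quality "$body"; then
        emit_event "intake.validation_passed" "score=$VALIDATION_SCORE"
        return 0
    fi
    gh_comment_issue "$issue" "$VALIDATION_FEEDBACK"
    gh_add_labels "$issue" "needs-info"
    emit_event "intake.validation_failed" "score=$VALIDATION_SCORE" "threshold=$THRESHOLD"
    return 2
}
```

Returning 2 from the wrapper keeps the check-condition-failed convention visible to the caller, so stage_intake() can stop before branch creation.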

Bypass Conditions (return score=100, skip all checks)

  • Issue labels contain skip-validation, hotfix, or urgent
  • SHIPWRIGHT_SKIP_VALIDATION=1 environment variable
  • Pipeline flag --skip-gates (existing mechanism)
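
A possible shape for the label and environment bypass checks, written without associative arrays. This sketch assumes labels arrive as a comma-separated string with no surrounding spaces; the real ISSUE_LABELS format is not specified here. The --skip-gates flag is left out because this design does not say how it is exposed to library code. _validation_bypassed is a hypothetical name.

```shell
# Returns 0 when validation should be bypassed, 1 otherwise.
# Assumes a comma-separated label string (an assumption of this sketch).
_validation_bypassed() {
    local labels=$1
    [ "${SHIPWRIGHT_SKIP_VALIDATION:-0}" = "1" ] && return 0
    # Wrap in commas so only whole labels match (hotfix, not hotfix-later).
    case ",$labels," in
        *,skip-validation,*|*,hotfix,*|*,urgent,*) return 0 ;;
    esac
    return 1
}
```

Matching on the comma-wrapped string avoids word splitting and globbing on label text, both of which are easy to get wrong in Bash 3.2.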

Threshold

Configurable via pipeline config key intake.validation_threshold, default 60. Read with _config_get_int (existing helper from config.sh).

Error Handling

  • Validation failure returns exit 2 (check-condition-failed convention)
  • GitHub API unavailability: validation still runs locally, feedback posting fails silently (existing gh_comment_issue handles $NO_GITHUB)
  • Empty ISSUE_BODY: scores 0 on all dimensions, feedback says "Issue has no description"
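  • Goal-only pipelines: validation block is skipped entirely. The guard must test whether an issue was supplied rather than [[ -n "$ISSUE_BODY" ]] alone, because an issue with an empty body must still be scored and blocked, as described in the previous bullet.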

Alternatives Considered

  1. Inline validation in stage_intake() — Pros: No new files, simpler sourcing / Cons: Bloats a 116-line function to 200+, individual heuristics untestable in isolation, violates the module pattern used by all other scripts/lib/*.sh files
  2. AI-powered validation (Claude scores the issue) — Pros: More nuanced semantic understanding / Cons: Adds ~30s latency and ~$0.05 cost per pipeline start, defeats the goal of preventing wasted budget, creates a dependency on API availability at the earliest pipeline stage
  3. External GitHub Action/webhook — Pros: Validates before pipeline even starts / Cons: Requires infrastructure setup, doesn't integrate with pipeline event system, can't use bypass labels in pipeline context

Implementation Plan

  • Files to create:

    • scripts/lib/issue-validation.sh — Core validation module with double-source guard, scoring functions, bypass logic, feedback builder
    • scripts/sw-issue-validation-test.sh — Unit test suite for all scoring functions and composite validation
  • Files to modify:

    • scripts/lib/pipeline-stages-intake.sh — Insert validation call after line 48 (issue metadata fetch), before line 51 (task type detection). Source the new module.
    • config/event-schema.json — Add intake.validation_passed and intake.validation_failed event types
    • scripts/sw-pipeline-test.sh — Add integration test test_intake_validation_blocks_low_quality
  • Dependencies: None new. Uses existing helpers.sh (emit_event, output helpers), pipeline-github.sh (gh_comment_issue, gh_add_labels), config.sh (_config_get_int)

  • Risk areas:

    • False positives: Well-intentioned short issues (e.g., "Typo in README line 42") could score below threshold. Mitigated by bypass labels and tunable threshold.
    • Daemon retry loops: If the daemon retries a blocked issue, it burns slots. The pipeline must emit the intake.validation_failed event so daemon dispatch can recognize the issue and skip it.
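    • Bash string matching edge cases: Vague phrase detection must be portable across BSD and GNU grep. \b is a GNU extension that POSIX ERE does not guarantee, so use grep -iE with explicit boundaries: (^|[^a-z]) before the phrase and ([^a-z]|$) after it. A bare [^a-z] is not enough, because it requires a character and so misses a phrase at the start or end of a line.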
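
The portable boundary matching described in the last risk item could look like this. It is a sketch only: the first three phrases come from this design, while "clean it up" and "make it work" are hypothetical additions included so the cap can be reached.

```shell
# Vagueness penalty: -5 per distinct phrase found, capped at -20.
_detect_vague_phrases() {
    local text=$1 phrase hits=0 penalty
    # First three phrases are from the design; the last two are
    # hypothetical additions for this sketch.
    for phrase in "make it better" "improve performance" "fix things" \
                  "clean it up" "make it work"; do
        # (^|[^a-z]) and ([^a-z]|$) stand in for \b, which POSIX ERE
        # does not define. LC_ALL=C keeps the a-z range predictable.
        if printf '%s\n' "$text" |
            LC_ALL=C grep -iqE "(^|[^a-z])${phrase}([^a-z]|\$)"; then
            hits=$((hits + 1))
        fi
    done
    penalty=$((hits * 5))
    [ "$penalty" -gt 20 ] && penalty=20
    echo "$((0 - penalty))"
}
```

With these boundaries, "Please fix things." is penalized but "prefix thingsy" is not, because the phrase must be delimited by a non-letter or a line edge on both sides.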

Validation Criteria

  • Empty issue body (title only) produces score 0 and returns exit 2
  • Issue with < 50 char body scores below threshold and is blocked
  • Well-structured issue (headings, acceptance criteria, code refs, 500+ chars) scores >= 80
  • Each vague phrase ("make it better", "fix things") reduces score by 5, capped at -20
  • Issues with skip-validation / hotfix / urgent label bypass validation (score=100)
  • SHIPWRIGHT_SKIP_VALIDATION=1 env var bypasses validation
  • Feedback posted to GitHub issue lists specific deficiencies with actionable suggestions
  • intake.validation_passed event emitted with score on success
  • intake.validation_failed event emitted with score and threshold on failure
  • --goal-only pipelines (no --issue) skip validation entirely
  • All existing tests pass (npm test) with no regressions
  • All scoring functions work under Bash 3.2 (no associative arrays, no ${var,,})

Test Pyramid Breakdown

  • Unit tests (~12 tests in sw-issue-validation-test.sh): Each _score_* function tested with known inputs (empty, minimal, rich), vague phrase detection, bypass conditions, threshold override, feedback message content, composite validate_issue_quality() with realistic issue bodies
  • Integration tests (~2 tests in sw-pipeline-test.sh): Pipeline blocks on low-quality mock issue (exit 2, correct output); pipeline proceeds on high-quality mock issue
  • E2E tests (0): Not applicable — validation is deterministic string processing with no external dependencies beyond mocked gh

Coverage Targets

  • Unit: 100% of scoring functions and bypass paths (this is pure logic with no external deps)
  • Integration: Happy path (pass) and blocking path (fail) through stage_intake()

Critical Paths to Test

  • Happy path: Rich issue with acceptance criteria, code references → score >= 80 → pipeline proceeds
  • Error case 1: Empty body → score 0 → exit 2 → feedback posted → needs-info label added
  • Error case 2: Vague issue ("make it better, improve performance, fix things") → penalty applied → score below threshold → blocked
  • Edge case 1: Issue with hotfix label + empty body → bypass → score 100 → pipeline proceeds
  • Edge case 2: Borderline issue scoring exactly at threshold (60) → passes (>= comparison)
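
The pass/fail decision on these critical paths can be exercised with a very small harness. The sketch below is not the project's test framework: assert_eq and _decide are hypothetical names, and _decide only models the final threshold comparison, not the scoring that precedes it.

```shell
PASS=0
FAIL=0

# assert_eq <description> <expected> <actual>  (hypothetical helper)
assert_eq() {
    if [ "$2" = "$3" ]; then
        PASS=$((PASS + 1))
    else
        FAIL=$((FAIL + 1))
        echo "FAIL: $1 (expected '$2', got '$3')"
    fi
}

# Stand-in for the final decision: 0 = proceed, 2 = blocked.
# Uses -ge so a score exactly at the threshold passes.
_decide() {
    local score=$1 threshold=$2
    [ "$score" -ge "$threshold" ] && return 0
    return 2
}

_decide 80 60;  assert_eq "rich issue proceeds"      0 $?
_decide 0 60;   assert_eq "empty body is blocked"    2 $?
_decide 60 60;  assert_eq "borderline score passes"  0 $?
_decide 100 60; assert_eq "bypassed issue proceeds"  0 $?

echo "passed=$PASS failed=$FAIL"
```

Capturing $? immediately after each call matters here, since any intervening command would overwrite the exit status being asserted.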
