Pipeline Design 187

Seth Ford edited this page Mar 1, 2026 · 1 revision

Design: Build Loop Error Message Actionability Scorer and Auto-Enhancement

Context

The Shipwright build loop (scripts/sw-loop.sh, 3366 lines) captures test failures into error-summary.json via write_error_summary(), then injects those errors into the next Claude iteration prompt via compose_prompt(). Currently, error lines are passed verbatim — vague errors like "FAIL something went wrong" consume iteration budget without giving the agent enough signal to fix the root cause. High-actionability errors (with file paths, line numbers, expected/got detail) resolve faster, but the agent has no way to distinguish or enrich the two.

Constraints:

  • Bash 3.2 compatible (no associative arrays, readarray, ${var,,})
  • set -euo pipefail required
  • Must not touch compose_prompt() — it's a 300-line function with 20 injection points and is the highest-risk modification surface in the loop
  • jq is a required dependency (already in sw-doctor.sh checks)
  • Atomic file writes via tmp+mv (project convention)
  • Library double-source guard pattern (_SW_*_LOADED)
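Several of these constraints translate directly into a library header. The following is a minimal sketch rather than the project's actual code: the guard variable name _SW_ERROR_ACTIONABILITY_LOADED is an assumption that follows the _SW_*_LOADED pattern, and to_lower and atomic_write are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Hypothetical skeleton for a library such as scripts/lib/error-actionability.sh.
set -euo pipefail

# Double-source guard. The variable name is an assumption modeled on _SW_*_LOADED.
# The `return` is only reached when the file is sourced a second time.
[[ -n "${_SW_ERROR_ACTIONABILITY_LOADED:-}" ]] && return 0
_SW_ERROR_ACTIONABILITY_LOADED=1

# Bash 3.2 has no ${var,,}; lowercase through tr instead.
to_lower() {
    printf '%s' "$1" | tr '[:upper:]' '[:lower:]'
}

# Atomic write: fill a temp file next to the target, then rename it over the target.
# Usage: atomic_write <target-path> <content>
atomic_write() {
    local target="$1" content="$2" tmp
    tmp="$(mktemp "${target}.tmp.XXXXXX" 2>/dev/null)" || return 1
    if printf '%s\n' "$content" > "$tmp" && mv "$tmp" "$target"; then
        return 0
    fi
    rm -f "$tmp"   # clean up the temp file if the write or rename failed
    return 1
}
```

Creating the temp file in the same directory as the target keeps the final mv a rename on one filesystem, which is what makes the replacement atomic.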

Decision

Approach: JSON enrichment between write and read

Insert a new library scripts/lib/error-actionability.sh that operates on error-summary.json after write_error_summary() and before compose_prompt() reads it. This is a pure data-layer enhancement — compose_prompt() continues to read .error_lines[] unchanged, but those lines now contain richer context.

Component Diagram

                        sw-loop.sh main loop
                              │
                    ┌─────────┴──────────┐
                    │  run_test_gate()    │
                    │  write_error_summary│ ── writes error-summary.json
                    │          │          │
                    │  ┌───────▼────────┐ │
                    │  │ enhance_error_ │ │ ◄── NEW (2 lines added)
                    │  │ summary()      │ │
                    │  └───────┬────────┘ │
                    │          │          │
                    │  compose_prompt()   │ ── reads error-summary.json (unchanged)
                    └─────────┬──────────┘
                              │
                    lib/error-actionability.sh
                    ┌─────────────────────────┐
                    │ score_error_line()       │ ── pure function: string → int
                    │ categorize_error()       │ ── pure function: string → enum
                    │ score_error_summary()    │ ── file → aggregate int
                    │ enhance_error_line()     │ ── string → enriched string
                    │ enhance_error_summary()  │ ── orchestrator: score, enhance, emit
                    └─────────────────────────┘

Data flows inward: the library depends only on jq, git (optional), and emit_event (optional). It has no dependency on sw-loop.sh internals.
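The two additions to sw-loop.sh could look roughly like the sketch below. It assumes the library lives at scripts/lib/error-actionability.sh relative to the working directory. ERROR_LIB and maybe_enhance_errors are hypothetical names introduced for illustration; the real integration may inline the guard instead of wrapping it.

```shell
#!/usr/bin/env bash
set -euo pipefail

# ASSUMPTION: library path; override ERROR_LIB to point elsewhere.
ERROR_LIB="${ERROR_LIB:-scripts/lib/error-actionability.sh}"

# Conditional source: if the file is absent, nothing is loaded and the loop proceeds.
if [[ -f "$ERROR_LIB" ]]; then
    # shellcheck source=/dev/null
    source "$ERROR_LIB"
fi

# Hypothetical wrapper for the call site after write_error_summary().
# The type check makes a missing library a silent no-op, and `|| true`
# keeps a failing enhancer from aborting the iteration under set -e.
maybe_enhance_errors() {
    local log_dir="$1"
    if type enhance_error_summary >/dev/null 2>&1; then
        enhance_error_summary "$log_dir" || true
    fi
    return 0
}
```

Because the guard checks for the function rather than the file, the call site behaves the same whether the library was never installed or failed to load.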

Scoring Rubric (0-100, additive)

Signal | Points | Detection
File path present | 25 | Regex: path/to/file.ext patterns
Line number present | 20 | Regex: :123, line N, (N:N)
Specific error type | 20 | Known types: TypeError, SyntaxError, ENOENT, etc.
Actionable detail | 20 | Keywords: expected, got, missing, not defined
Fix suggestion | 15 | Patterns: did you mean, try, consider
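The rubric can be sketched as an additive scorer built on grep. This is an illustration under assumptions, not the shipped implementation: the regexes are guesses at what each signal means, ReferenceError is added to the known types, and the keyword "undefined" is added to the detail list so that the TypeError example from the validation criteria registers as actionable detail. has_match is a hypothetical helper.

```shell
#!/usr/bin/env bash
set -euo pipefail

# has_match <grep-flags> <pattern> <line>: succeeds when the line matches.
# A here-string avoids a pipeline, so pipefail cannot misreport the result.
has_match() {
    grep "$1" -e "$2" >/dev/null 2>&1 <<< "$3"
}

# score_error_line <line>: prints an additive score between 0 and 100.
score_error_line() {
    local line="$1" score=0
    # File path: something like path/to/file.ext (25 points)
    has_match -E '[A-Za-z0-9_/-]+\.[A-Za-z]{1,4}([^A-Za-z]|$)' "$line" && score=$((score + 25))
    # Line number: ":123", "line N", or "(N:N)" (20 points)
    has_match -E ':[0-9]+|line [0-9]+|\([0-9]+:[0-9]+\)' "$line" && score=$((score + 20))
    # Specific error type (20 points); ReferenceError is an assumed addition
    has_match -E '(Type|Syntax|Reference)Error|ENOENT' "$line" && score=$((score + 20))
    # Actionable detail, whole words, case-insensitive (20 points); "undefined" is assumed
    has_match -iwE 'expected|got|missing|not defined|undefined' "$line" && score=$((score + 20))
    # Fix suggestion, whole words, case-insensitive (15 points)
    has_match -iwE 'did you mean|try|consider' "$line" && score=$((score + 15))
    printf '%s\n' "$score"
}

score_error_line "FAIL something went wrong"
score_error_line "TypeError: Cannot read property 'x' of undefined at src/app.ts:42"
```

With these patterns the first call prints 0 and the second prints 85 (file 25 + line 20 + type 20 + detail 20). Whole-word matching (-w) keeps "try" from firing on words such as "retry" or "entry".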

Enhancement threshold: score < 70. Lines scoring >= 70 have enough signal for the agent to act. Lines below 70 get:

  1. Category prefix: [syntax], [runtime], [assertion], [dependency], [build], [unknown]
  2. Recently-changed files context (from git diff --name-only HEAD~1) for lines scoring < 45
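The two thresholds can be illustrated with a small formatter that takes the score and category as inputs. This is a sketch: format_enhanced_line and recently_changed_files are hypothetical names, and the comma-separated layout of the file list is inferred from the enhanced lines shown in the data-flow example.

```shell
#!/usr/bin/env bash
set -euo pipefail

# recently_changed_files: prints "a, b, c" from git diff --name-only HEAD~1.
# Prints nothing when git is missing, the directory is not a repository,
# or HEAD has no parent commit.
recently_changed_files() {
    git diff --name-only HEAD~1 2>/dev/null | tr '\n' ',' | sed 's/,$//; s/,/, /g' || true
}

# format_enhanced_line <line> <score> <category> [changed-files]
#   score >= 70 : line is returned unchanged
#   score <  70 : "[category] " prefix is added
#   score <  45 : "(recently changed: ...)" is appended when a file list is given
format_enhanced_line() {
    local line="$1" score="$2" category="$3" changed="${4:-}"
    if [ "$score" -ge 70 ]; then
        printf '%s\n' "$line"
        return 0
    fi
    local out="[$category] $line"
    if [ "$score" -lt 45 ] && [ -n "$changed" ]; then
        out="$out (recently changed: $changed)"
    fi
    printf '%s\n' "$out"
}

format_enhanced_line "FAIL something broke" 0 unknown "src/app.ts, tests/app.test.ts"
```

Passing the file list as an argument keeps the formatter free of I/O, so it can be tested without a git repository.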

Interface Contracts

// score_error_line(line: string) → score: int (0-100)
// Pure function. No side effects. No file I/O.
// Precondition: line is a non-empty string
// Postcondition: 0 <= score <= 100

// categorize_error(line: string) → category: "syntax" | "runtime" | "assertion" | "dependency" | "build" | "unknown"
// Pure function. No side effects.

// score_error_summary(error_json_path: string) → aggregate_score: int (0-100)
// Reads file. Returns 100 if file missing/empty/zero errors.
// Error contract: never fails — returns 100 on any I/O error

// enhance_error_line(line: string, log_dir: string) → enhanced_line: string
// May call `git diff` if score < 45 and git is available.
// Returns original line unchanged if score >= 70.

// enhance_error_summary(log_dir: string) → void (modifies error-summary.json in place)
// Main entry point. Reads/writes error-summary.json atomically.
// Emits event: error.actionability_scored { score, error_count, enhanced, iteration }
// Error contract: returns 0 on any failure (graceful no-op)
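As an illustration of the categorize_error contract, the sketch below maps a line to one of the six categories with a case statement. The keyword lists are assumptions chosen for the example; the document does not specify which patterns belong to which category. The first matching branch wins, and anything unmatched falls through to "unknown".

```shell
#!/usr/bin/env bash
set -euo pipefail

# categorize_error <line>: prints syntax | runtime | assertion | dependency | build | unknown.
# All keyword patterns below are illustrative assumptions.
categorize_error() {
    local lower
    # Bash 3.2 compatible lowercasing (no ${var,,})
    lower="$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"
    case "$lower" in
        *syntaxerror*|*"unexpected token"*|*"parse error"*)    echo "syntax" ;;
        *"cannot find module"*|*"module not found"*|*enoent*)  echo "dependency" ;;
        *assert*|*expected*)                                   echo "assertion" ;;
        *typeerror*|*referenceerror*|*"is not a function"*)    echo "runtime" ;;
        *"compilation failed"*|*"build failed"*)               echo "build" ;;
        *)                                                     echo "unknown" ;;
    esac
}

categorize_error "SyntaxError: Unexpected token }"
categorize_error "FAIL something broke"
```

A case statement needs no subprocess per pattern, which is one way to reduce the fork count that grep-based matching incurs.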

Data Flow: Error Enhancement

1. Test fails → write_error_summary() creates:
   {
     "iteration": 3,
     "error_count": 2,
     "error_lines": ["FAIL something broke", "Error: test failed"],
     "test_cmd": "npm test"
   }

2. enhance_error_summary() reads it, scores each line:
   - "FAIL something broke" → score 0 (no file, no line, no type, no detail)
   - "Error: test failed"  → score 0

3. Aggregate score: 0 (< 70 threshold) → enhance

4. Enhanced JSON written atomically:
   {
     "iteration": 3,
     "error_count": 2,
     "error_lines": [
       "[unknown] FAIL something broke (recently changed: src/app.ts, tests/app.test.ts)",
       "[unknown] Error: test failed (recently changed: src/app.ts, tests/app.test.ts)"
     ],
     "original_error_lines": ["FAIL something broke", "Error: test failed"],
     "actionability_score": 0,
     "score_breakdown": [
       {"line": "FAIL something broke", "score": 0, "category": "unknown"},
       {"line": "Error: test failed", "score": 0, "category": "unknown"}
     ],
     "test_cmd": "npm test"
   }

5. compose_prompt() reads .error_lines[] (now enriched) — no code change needed
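Step 4 of the flow above can be sketched as a single jq merge followed by an atomic rename. enrich_summary is a hypothetical helper that assumes the enhanced lines, aggregate score, and breakdown have already been computed and are passed in as JSON; how the real library assembles them is not shown here.

```shell
#!/usr/bin/env bash
set -euo pipefail

# enrich_summary <summary-path> <enhanced-lines-json> <score> <breakdown-json>
# Copies .error_lines to .original_error_lines, swaps in the enhanced lines,
# and adds the score fields. Leaves the file untouched on any failure and
# always returns 0.
enrich_summary() {
    local path="$1" enhanced="$2" score="$3" breakdown="$4" tmp
    tmp="$(mktemp "${path}.tmp.XXXXXX" 2>/dev/null)" || return 0
    if jq --argjson enhanced "$enhanced" \
          --argjson score "$score" \
          --argjson breakdown "$breakdown" \
          '. + {
             original_error_lines: .error_lines,
             error_lines: $enhanced,
             actionability_score: $score,
             score_breakdown: $breakdown
           }' "$path" > "$tmp" 2>/dev/null; then
        mv "$tmp" "$path" || rm -f "$tmp"
    else
        rm -f "$tmp"   # bad JSON or missing jq: discard the partial output
    fi
    return 0
}
```

Inside the object literal, .error_lines still refers to the input document, so the original lines are captured before the new error_lines value replaces them.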

Error Boundaries

Component | Error Handling | Propagation
score_error_line | Cannot fail: all regex via grep -q with … | …
categorize_error | Falls through to "unknown" category | Never propagates errors
score_error_summary | Returns 100 on missing file, bad JSON, zero errors | Swallows jq failures via 2>/dev/null || echo "0"
enhance_error_line | git diff failures caught with || true | Returns original line on any failure
enhance_error_summary | Atomic write: tmp file + mv, cleanup on failure. Returns 0 always. | Never causes loop iteration to fail
sw-loop.sh integration | Guarded with type enhance_error_summary >/dev/null 2>&1 | Missing library = no-op, zero impact

Key invariant: The enhancer can never cause a build loop iteration to fail. Every code path returns 0.
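The "never fails" contract for score_error_summary can be sketched as follows. Two things here are assumptions: the aggregate is computed as the integer mean of the per-line scores (the document does not state the formula), and score_error_line is replaced by a stub so the example runs on its own.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Stub scorer for illustration only; the real scorer would apply the rubric.
score_error_line() {
    case "$1" in
        *.ts:*) echo 80 ;;
        *)      echo 0 ;;
    esac
}

# score_error_summary <error-summary.json>
# Prints an aggregate score. Prints 100 when the file is missing, empty,
# unparsable, or lists no errors (also when jq is unavailable). Always returns 0.
# ASSUMPTION: aggregate = integer mean of per-line scores.
score_error_summary() {
    local path="$1" lines line s total=0 count=0
    [ -s "$path" ] || { echo 100; return 0; }
    lines="$(jq -r '.error_lines[]?' "$path" 2>/dev/null)" || { echo 100; return 0; }
    [ -n "$lines" ] || { echo 100; return 0; }
    while IFS= read -r line; do
        s="$(score_error_line "$line" 2>/dev/null)" || s=0
        case "$s" in ''|*[!0-9]*) s=0 ;; esac   # ignore non-numeric scorer output
        total=$((total + s))
        count=$((count + 1))
    done <<< "$lines"
    echo $((total / count))
    return 0
}
```

Every early exit prints a value and returns 0, so a caller running under set -e can use the result in a command substitution without an extra guard. Error lines that themselves contain newlines would be split by this loop; a fuller version would need to iterate by index instead.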

Alternatives Considered

  1. Modify compose_prompt() directly — Pros: Single function handles everything, no new library. / Cons: compose_prompt() is 300 lines with 20 injection sections; adding scoring logic makes it harder to test and maintain. The plan explicitly avoids this, and the existing architecture strongly favors library decomposition.

  2. Post-process in the prompt template (string replacement) — Pros: No JSON modification. / Cons: Would require parsing error lines from a heredoc, regex on prompt text is fragile, and the compose_prompt output isn't a structured format to reliably modify.

  3. LLM-based error enhancement (call Claude to rewrite errors) — Pros: Better quality enhancement than regex. / Cons: Adds API cost per iteration, adds latency (300ms+ per call), creates a dependency on Claude availability inside the loop, and violates the project's convention of keeping loop overhead minimal.

  4. Inline scoring in write_error_summary() — Pros: Single function, no new library needed. / Cons: write_error_summary() would grow to handle scoring, categorization, git context, and JSON enrichment — violating single responsibility. Library approach makes unit testing trivial.

Implementation Plan

  • Files to create:

    • scripts/lib/error-actionability.sh — Core library (scorer, categorizer, enhancer)
    • scripts/sw-lib-error-actionability-test.sh — Test suite (scoring, categorization, enhancement, edge cases)
  • Files to modify:

    • scripts/sw-loop.sh — 2 additions: source the library (near other lib sources, ~line 35), call enhance_error_summary (after write_error_summary, ~line 3164)
    • package.json — 1 line: register test suite
  • Dependencies: None new. Uses jq (already required), git (optional, for recently-changed files).

  • Risk areas:

    1. JSON escaping in enhance_error_summary: Error lines may contain quotes, backslashes, and other special characters. The plan escapes them manually with sed 's/\\/\\\\/g; s/"/\\"/g' when building the line_scores JSON array. This is fragile: a line containing single quotes, newlines, or control characters could still break the JSON. Mitigation: use jq --arg for every string→JSON conversion. This is the primary risk.
    2. git diff HEAD~1 on fresh repos: The first commit has no parent. The 2>/dev/null || true guard handles this, but produces empty context.
    3. Performance on large error sets: Each error line spawns 5-7 grep subprocesses for scoring plus 5 more for categorization, so 10-12 per line. With 10 error lines, that's roughly 100-120 process forks. Acceptable for the loop's iteration cadence (minutes between iterations), but worth noting.
    4. Race condition on error-summary.json: The atomic write (tmp+mv) prevents corruption. There are no concurrent readers during the write window because the loop is single-threaded.
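The jq --arg mitigation for the first risk could look like the sketch below. breakdown_entry and breakdown_array are hypothetical helper names. The point is only that jq performs the string escaping, so quotes, backslashes, and control characters in an error line cannot produce invalid JSON.

```shell
#!/usr/bin/env bash
set -euo pipefail

# breakdown_entry <line> <score> <category>: prints one compact JSON object.
# --arg passes the value as a JSON string (jq escapes it);
# --argjson passes the score as a JSON number.
breakdown_entry() {
    jq -cn --arg line "$1" --argjson score "$2" --arg category "$3" \
        '{line: $line, score: $score, category: $category}'
}

# breakdown_array: reads one JSON object per line on stdin, prints a JSON array.
breakdown_array() {
    jq -cs '.'
}
```

A caller could emit one entry per error line and pipe the stream through breakdown_array, for example `{ breakdown_entry "$a" 0 unknown; breakdown_entry "$b" 85 runtime; } | breakdown_array`, without ever concatenating JSON by hand.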

Validation Criteria

  • score_error_line returns 0 for "FAIL something went wrong" (zero signals)
  • score_error_line returns >= 70 for "TypeError: Cannot read property 'x' of undefined at src/app.ts:42" (file + line + type + detail = 25 + 20 + 20 + 20 = 85)
  • categorize_error correctly classifies all 6 categories (syntax, runtime, assertion, dependency, build, unknown)
  • score_error_summary returns 100 for missing/empty/zero-error files
  • enhance_error_summary adds actionability_score, score_breakdown fields to JSON
  • enhance_error_summary replaces error_lines and preserves original_error_lines when score < 70
  • enhance_error_summary does NOT add original_error_lines when score >= 70 (no unnecessary enhancement)
  • Enhanced error lines have [category] prefix
  • Error lines with score < 45 include (recently changed: ...) context
  • Malformed JSON, missing files, empty log dirs all handled gracefully (return 0)
  • emit_event "error.actionability_scored" fires with correct fields
  • sw-loop-test.sh passes with zero regressions after integration
  • Library sources conditionally ([[ -f ... ]] && source)
  • Enhancement call guarded with type check (no-op if library missing)
  • All bash is 3.2 compatible (no associative arrays, readarray, ${var,,})
  • JSON writes use atomic tmp+mv pattern

Sections Not Applicable

Component Hierarchy / State Management / Accessibility Checklist / Responsive Breakpoints: This feature is a bash shell library operating on JSON files in a CLI tool's build loop. There are no frontend components, UI elements, or browser-rendered output. The "frontend issue" designation in the skill guidance does not apply to this feature.
