Pipeline Design 187
The Shipwright build loop (scripts/sw-loop.sh, 3366 lines) captures test failures into error-summary.json via write_error_summary(), then injects those errors into the next Claude iteration prompt via compose_prompt(). Currently, error lines are passed verbatim — vague errors like "FAIL something went wrong" consume iteration budget without giving the agent enough signal to fix the root cause. High-actionability errors (with file paths, line numbers, expected/got detail) resolve faster, but the agent has no way to distinguish or enrich the two.
Constraints:
- Bash 3.2 compatible (no associative arrays, `readarray`, `${var,,}`)
- `set -euo pipefail` required
- Must not touch `compose_prompt()`: it's a 300-line function with 20 injection points and is the highest-risk modification surface in the loop
- `jq` is a required dependency (already in `sw-doctor.sh` checks)
- Atomic file writes via tmp+mv (project convention)
- Library double-source guard pattern (`_SW_*_LOADED`)
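As a rough illustration of how these constraints might shape the top of the new library, here is a minimal sketch. The guard variable name `_SW_ERROR_ACTIONABILITY_LOADED` and the helpers `to_lower` and `atomic_write` are hypothetical names chosen for this example, not taken from the codebase.

```shell
#!/usr/bin/env bash
# Sketch only. The calling script is assumed to run under `set -euo pipefail`,
# so every helper avoids unset variables and unchecked failures.

# Double-source guard: a second `source` of this file returns immediately.
# The variable name is an assumption following the _SW_*_LOADED convention.
if [[ -n "${_SW_ERROR_ACTIONABILITY_LOADED:-}" ]]; then
  return 0 2>/dev/null || exit 0
fi
_SW_ERROR_ACTIONABILITY_LOADED=1

# Bash 3.2 has no ${var,,}; lowercase with tr instead (hypothetical helper).
to_lower() {
  printf '%s' "$1" | tr '[:upper:]' '[:lower:]'
}

# Atomic write: stage the content in a temp file next to the destination,
# then mv it into place. Cleans up the temp file on failure (hypothetical helper).
atomic_write() {
  local dest="$1" content="$2" tmp
  tmp="$(mktemp "${dest}.tmp.XXXXXX")" || return 1
  if printf '%s\n' "$content" > "$tmp" && mv "$tmp" "$dest"; then
    return 0
  fi
  rm -f "$tmp"
  return 1
}
```

Creating the temp file in the same directory as the destination keeps the final `mv` on one filesystem, which is what makes the rename atomic.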
Insert a new library scripts/lib/error-actionability.sh that operates on error-summary.json after write_error_summary() and before compose_prompt() reads it. This is a pure data-layer enhancement — compose_prompt() continues to read .error_lines[] unchanged, but those lines now contain richer context.
sw-loop.sh main loop
          │
┌─────────┴──────────┐
│ run_test_gate()    │
│ write_error_summary│ ── writes error-summary.json
│         │          │
│ ┌───────▼────────┐ │
│ │ enhance_error_ │ │ ◄── NEW (2 lines added)
│ │ summary()      │ │
│ └───────┬────────┘ │
│         │          │
│ compose_prompt()   │ ── reads error-summary.json (unchanged)
└─────────┬──────────┘
          │
lib/error-actionability.sh
┌─────────────────────────┐
│ score_error_line()      │ ── pure function: string → int
│ categorize_error()      │ ── pure function: string → enum
│ score_error_summary()   │ ── file → aggregate int
│ enhance_error_line()    │ ── string → enriched string
│ enhance_error_summary() │ ── orchestrator: score, enhance, emit
└─────────────────────────┘
Data flows inward: the library depends only on jq, git (optional), and emit_event (optional). It has no dependency on sw-loop.sh internals.
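The two additions to `sw-loop.sh` could look roughly like the sketch below. `SCRIPT_DIR`, `LOG_DIR`, and the wrapper name `maybe_enhance_errors` are assumptions made for illustration; the real loop may use different variable names and may call `enhance_error_summary` inline rather than through a wrapper.

```shell
# Hypothetical location variables; the real sw-loop.sh may name these differently.
SCRIPT_DIR="${SCRIPT_DIR:-$(pwd)}"
LOG_DIR="${LOG_DIR:-$(pwd)/.logs}"

# Addition 1: source the library only if the file exists.
if [[ -f "$SCRIPT_DIR/lib/error-actionability.sh" ]]; then
  # shellcheck source=/dev/null
  source "$SCRIPT_DIR/lib/error-actionability.sh"
fi

# Addition 2: call the enhancer after write_error_summary, guarded by `type`.
# If the library is missing the function is undefined and this is a no-op.
maybe_enhance_errors() {
  if type enhance_error_summary >/dev/null 2>&1; then
    # Never let an enhancer failure fail the loop iteration.
    enhance_error_summary "$LOG_DIR" || true
  fi
  return 0
}
```

Because both additions are guarded, deleting the library file should leave the loop behaving exactly as it did before.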
| Signal | Points | Detection |
|---|---|---|
| File path present | 25 | Regex: `path/to/file.ext` patterns |
| Line number present | 20 | Regex: `:123`, `line N`, `(N:N)` |
| Specific error type | 20 | Known types: `TypeError`, `SyntaxError`, `ENOENT`, etc. |
| Actionable detail | 20 | Keywords: `expected`, `got`, `missing`, `not defined` |
| Fix suggestion | 15 | Patterns: `did you mean`, `try`, `consider` |
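A minimal sketch of how the scoring table might translate into `score_error_line` follows. The point values come from the table; the regular expressions are illustrative guesses and are not the library's actual patterns.

```shell
# score_error_line LINE -> prints an integer score between 0 and 100.
# Pure function: no file I/O, no side effects. Regexes are assumptions.
score_error_line() {
  local line="$1" score=0

  # File path present (25): a token with a short extension, e.g. src/app.ts
  if grep -Eq '[A-Za-z0-9_./-]+\.[A-Za-z]{1,5}' <<<"$line"; then
    score=$((score + 25))
  fi
  # Line number present (20): ":123", "line N", or "(N:N)"
  if grep -Eq ':[0-9]+|line [0-9]+|\([0-9]+:[0-9]+\)' <<<"$line"; then
    score=$((score + 20))
  fi
  # Specific error type (20): a small sample of known types
  if grep -Eq 'TypeError|SyntaxError|ReferenceError|ENOENT' <<<"$line"; then
    score=$((score + 20))
  fi
  # Actionable detail (20): whole-word keyword match, case-insensitive
  if grep -Ewiq 'expected|got|missing|not defined' <<<"$line"; then
    score=$((score + 20))
  fi
  # Fix suggestion (15)
  if grep -Ewiq 'did you mean|try|consider' <<<"$line"; then
    score=$((score + 15))
  fi

  printf '%s\n' "$score"
}
```

With these patterns, `score_error_line "FAIL something went wrong"` prints `0`, since none of the five signals match. Using here-strings instead of `echo | grep` saves one fork per check and avoids `pipefail` surprises when `grep -q` exits early.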
Enhancement threshold: score < 70. Lines scoring >= 70 have enough signal for the agent to act. Lines below 70 get:
- Category prefix: `[syntax]`, `[runtime]`, `[assertion]`, `[dependency]`, `[build]`, `[unknown]`
- Recently-changed files context (from `git diff --name-only HEAD~1`) for lines scoring < 45
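The thresholds above can be sketched as follows. The category patterns are illustrative assumptions, and `recent_files` and `apply_enhancement` are hypothetical helpers: `apply_enhancement` takes a precomputed score so the example stays self-contained, whereas the planned `enhance_error_line` computes the score itself.

```shell
# categorize_error LINE -> prints one of the six category names.
# First matching pattern wins; the patterns are assumptions.
categorize_error() {
  local line="$1"
  if grep -Eiq 'SyntaxError|unexpected token|parse error' <<<"$line"; then
    echo "syntax"
  elif grep -Eiq 'AssertionError|assert|expected' <<<"$line"; then
    echo "assertion"
  elif grep -Eiq 'cannot find module|module not found|ENOENT|command not found' <<<"$line"; then
    echo "dependency"
  elif grep -Eiq 'TypeError|ReferenceError|undefined|segmentation fault' <<<"$line"; then
    echo "runtime"
  elif grep -Eiq 'compil|linker|build failed' <<<"$line"; then
    echo "build"
  else
    echo "unknown"
  fi
}

# recent_files -> "a, b, c" for files changed since HEAD~1, or empty output.
# Never fails: missing git, a non-repo directory, or a first commit yield nothing.
recent_files() {
  command -v git >/dev/null 2>&1 || return 0
  git diff --name-only HEAD~1 2>/dev/null | paste -sd, - | sed 's/,/, /g' || true
}

# apply_enhancement LINE SCORE [FILES] -> prints the (possibly enriched) line.
apply_enhancement() {
  local line="$1" score="$2" files="${3:-}"
  if [ "$score" -ge 70 ]; then
    printf '%s\n' "$line"          # enough signal already: leave unchanged
    return 0
  fi
  local out
  out="[$(categorize_error "$line")] $line"
  if [ "$score" -lt 45 ] && [ -n "$files" ]; then
    out="$out (recently changed: $files)"
  fi
  printf '%s\n' "$out"
}
```

For example, `apply_enhancement "FAIL something broke" 0 "src/app.ts, tests/app.test.ts"` prints `[unknown] FAIL something broke (recently changed: src/app.ts, tests/app.test.ts)`.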
// score_error_line(line: string) → score: int (0-100)
// Pure function. No side effects. No file I/O.
// Precondition: line is a non-empty string
// Postcondition: 0 <= score <= 100
// categorize_error(line: string) → category: "syntax" | "runtime" | "assertion" | "dependency" | "build" | "unknown"
// Pure function. No side effects.
// score_error_summary(error_json_path: string) → aggregate_score: int (0-100)
// Reads file. Returns 100 if file missing/empty/zero errors.
// Error contract: never fails — returns 100 on any I/O error
// enhance_error_line(line: string, log_dir: string) → enhanced_line: string
// May call `git diff` if score < 45 and git is available.
// Returns original line unchanged if score >= 70.
// enhance_error_summary(log_dir: string) → void (modifies error-summary.json in place)
// Main entry point. Reads/writes error-summary.json atomically.
// Emits event: error.actionability_scored { score, error_count, enhanced, iteration }
// Error contract: returns 0 on any failure (graceful no-op)
1. Test fails → write_error_summary() creates:
{
"iteration": 3,
"error_count": 2,
"error_lines": ["FAIL something broke", "Error: test failed"],
"test_cmd": "npm test"
}
2. enhance_error_summary() reads it, scores each line:
- "FAIL something broke" → score 0 (no file, no line, no type, no detail)
- "Error: test failed" → score 0
3. Aggregate score: 0 (< 70 threshold) → enhance
4. Enhanced JSON written atomically:
{
"iteration": 3,
"error_count": 2,
"error_lines": [
"[unknown] FAIL something broke (recently changed: src/app.ts, tests/app.test.ts)",
"[unknown] Error: test failed (recently changed: src/app.ts, tests/app.test.ts)"
],
"original_error_lines": ["FAIL something broke", "Error: test failed"],
"actionability_score": 0,
"score_breakdown": [
{"line": "FAIL something broke", "score": 0, "category": "unknown"},
{"line": "Error: test failed", "score": 0, "category": "unknown"}
],
"test_cmd": "npm test"
}
5. compose_prompt() reads .error_lines[] (now enriched) — no code change needed
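Step 4 of this walkthrough, the atomic rewrite of `error-summary.json`, might be sketched as below. `write_enhanced_summary` is a hypothetical helper name, and it assumes the caller has already built the enhanced lines as a JSON array string; the `score_breakdown` field is omitted to keep the example short.

```shell
# write_enhanced_summary FILE SCORE ENHANCED_LINES_JSON
# Rewrites FILE in place via tmp+mv. Always returns 0 (graceful no-op on failure).
write_enhanced_summary() {
  local file="$1" score="$2" enhanced_json="$3" tmp
  command -v jq >/dev/null 2>&1 || return 0
  [ -f "$file" ] || return 0
  tmp="$(mktemp "${file}.tmp.XXXXXX")" || return 0

  # Keep the untouched lines under original_error_lines, then swap in the
  # enriched ones so compose_prompt() can keep reading .error_lines[] as before.
  if jq --argjson score "$score" --argjson enhanced "$enhanced_json" \
       '.original_error_lines = .error_lines
        | .error_lines = $enhanced
        | .actionability_score = $score' "$file" > "$tmp" 2>/dev/null \
     && mv "$tmp" "$file"; then
    return 0
  fi

  rm -f "$tmp"   # leave the original file as it was
  return 0
}
```

If `jq` rejects the input (for example malformed JSON), the temp file is removed and the original summary stays in place, which matches the "never fails the iteration" contract.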
| Component | Error Handling | Propagation |
|---|---|---|
| `score_error_line` | Cannot fail: all regex via `grep -q` with `\|\|` … | |
| `categorize_error` | Falls through to `"unknown"` category | Never propagates errors |
| `score_error_summary` | Returns 100 on missing file, bad JSON, zero errors | Swallows jq failures via `2>/dev/null \|\| echo "0"` |
| `enhance_error_line` | `git diff` failures caught with `\|\| true` | Returns original line on any failure |
| `enhance_error_summary` | Atomic write: tmp file + mv, cleanup on failure. Returns 0 always. | Never causes loop iteration to fail |
| `sw-loop.sh` integration | Guarded with `type enhance_error_summary >/dev/null 2>&1` | Missing library = no-op, zero impact |
Key invariant: The enhancer can never cause a build loop iteration to fail. Every code path returns 0.
- Modify `compose_prompt()` directly. Pros: Single function handles everything, no new library. / Cons: `compose_prompt()` is 300 lines with 20 injection sections; adding scoring logic makes it harder to test and maintain. The plan explicitly avoids this, and the existing architecture strongly favors library decomposition.
- Post-process in the prompt template (string replacement). Pros: No JSON modification. / Cons: Would require parsing error lines from a heredoc, regex on prompt text is fragile, and the `compose_prompt()` output isn't a structured format that can be reliably modified.
- LLM-based error enhancement (call Claude to rewrite errors). Pros: Better quality enhancement than regex. / Cons: Adds API cost per iteration, adds latency (300ms+ per call), creates a dependency on Claude availability inside the loop, and violates the project's convention of keeping loop overhead minimal.
- Inline scoring in `write_error_summary()`. Pros: Single function, no new library needed. / Cons: `write_error_summary()` would grow to handle scoring, categorization, git context, and JSON enrichment, violating single responsibility. Library approach makes unit testing trivial.
- Files to create:
  - `scripts/lib/error-actionability.sh`: Core library (scorer, categorizer, enhancer)
  - `scripts/sw-lib-error-actionability-test.sh`: Test suite (scoring, categorization, enhancement, edge cases)
- Files to modify:
  - `scripts/sw-loop.sh`: 2 additions: source the library (near other lib sources, ~line 35), call `enhance_error_summary` (after `write_error_summary`, ~line 3164)
  - `package.json`: 1 line: register test suite
- Dependencies: None new. Uses `jq` (already required), `git` (optional, for recently-changed files).
- Risk areas:
  - JSON escaping in `enhance_error_summary`: Error lines may contain quotes, backslashes, special characters. The plan uses `sed 's/\\/\\\\/g; s/"/\\"/g'` for manual escaping when building the `line_scores` JSON array. This is fragile: a line containing single quotes, newlines, or control characters could break the JSON. Mitigation: `jq --arg` should be used for any string→JSON conversion. Manual string escaping in the `line_scores` array construction is the primary risk.
  - `git diff HEAD~1` on fresh repos: First commit has no parent. The `2>/dev/null || true` guard handles this, but produces empty context.
  - Performance on large error sets: Each error line spawns 5-7 `grep` subprocesses for scoring + 5 more for categorization, so 10-12 per line. With 10 error lines, that's roughly 100-120 process forks. Acceptable for the loop's iteration cadence (minutes between iterations), but worth noting.
  - Race condition on `error-summary.json`: Atomic write (tmp+mv) prevents corruption. No concurrent readers during write window since the loop is single-threaded.
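The `jq --arg` mitigation for the escaping risk can be illustrated with a small sketch. `json_score_entry` is a hypothetical helper name; the object shape follows the `score_breakdown` entries shown in the data-flow example.

```shell
# json_score_entry LINE SCORE CATEGORY -> one compact JSON object on stdout.
# jq handles all escaping: quotes, backslashes, newlines, and control
# characters in LINE are encoded correctly without any sed-based escaping.
json_score_entry() {
  jq -cn --arg line "$1" --argjson score "$2" --arg category "$3" \
    '{line: $line, score: $score, category: $category}'
}

# Usage sketch: emit one object per error line, then slurp them into an array.
#   { json_score_entry "FAIL something broke" 0 unknown
#     json_score_entry "Error: test failed" 0 unknown
#   } | jq -cs '.'
```

`--arg` always binds a string, while `--argjson` parses its value as JSON, which is why the numeric score uses the latter.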
- `score_error_line` returns 0 for `"FAIL something went wrong"` (zero signals)
- `score_error_line` returns >70 for `"TypeError: Cannot read property 'x' of undefined at src/app.ts:42"` (file + line + type + detail)
- `categorize_error` correctly classifies all 6 categories (syntax, runtime, assertion, dependency, build, unknown)
- `score_error_summary` returns 100 for missing/empty/zero-error files
- `enhance_error_summary` adds `actionability_score`, `score_breakdown` fields to JSON
- `enhance_error_summary` replaces `error_lines` and preserves `original_error_lines` when score < 70
- `enhance_error_summary` does NOT add `original_error_lines` when score >= 70 (no unnecessary enhancement)
- Enhanced error lines have `[category]` prefix
- Error lines with score < 45 include `(recently changed: ...)` context
- Malformed JSON, missing files, empty log dirs all handled gracefully (return 0)
- `emit_event "error.actionability_scored"` fires with correct fields
- `sw-loop-test.sh` passes with zero regressions after integration
- Library sources conditionally (`[[ -f ... ]] && source`)
- Enhancement call guarded with `type` check (no-op if library missing)
- All bash is 3.2 compatible (no associative arrays, `readarray`, `${var,,}`)
- JSON writes use atomic tmp+mv pattern
Component Hierarchy / State Management / Accessibility Checklist / Responsive Breakpoints: This feature is a bash shell library operating on JSON files in a CLI tool's build loop. There are no frontend components, UI elements, or browser-rendered output. The "frontend issue" designation in the skill guidance does not apply to this feature.