Pipeline Plan 184
Issue: #184
Branch: feat/-failure-root-cause-classifier-with-auto-184
Complexity: Standard
Estimated files: 7 modified, 1 new
Minimum viable change: Wire the existing lib/root-cause.sh classifier (427 lines, already on this branch) into the daemon's failure handling path, add historical pattern learning from events.jsonl, create a dashboard breakdown visualization, and expose a CLI command.
Implicit requirements not stated:
- The classifier library already exists but isn't called from the daemon — the critical integration is missing
- Historical learning needs to feed back into classification confidence (not just regex)
- Dashboard needs both an API endpoint and frontend component
- CLI command needed for standalone `shipwright root-cause` usage
Acceptance criteria (from issue):
- Failure classifier analyzes error-log.jsonl and categorizes root cause — LIBRARY EXISTS
- Decision tree trained on historical failure patterns from events.jsonl — NEEDS IMPLEMENTATION
- Platform bugs trigger automatic GitHub issue creation — LIBRARY EXISTS, NEEDS DAEMON WIRING
- Dashboard shows failure breakdown by category — NEEDS IMPLEMENTATION
- Reduce repeat platform failures by >30% — MEASURABLE VIA LEARNING SYSTEM
Approach A: Enhance existing library + add integration points (CHOSEN)
- Pros: Minimal blast radius (7 files), builds on 427-line library with 25+ tests, test suite already passes
- Cons: Regex-based classification has limits vs ML-based approach
- Blast radius: 7 files modified, 1 new
- Complexity: Low-medium
Approach B: ML-based classifier using Claude API calls
- Pros: More sophisticated, context-aware classification
- Cons: Over-engineered for bash, adds API cost per failure, adds latency to failure handling path, fragile in offline/local mode
- Blast radius: 15+ files
- Complexity: High
Approach C: Build entirely new classification system
- Pros: Clean design from scratch
- Cons: Discards 427 lines of working code + 374 lines of tests, massive waste
- Blast radius: 20+ files
- Complexity: Very high
Decision: Approach A — The library is feature-complete. The gap is purely integration: wiring it into the daemon, adding historical learning, and surfacing data in the dashboard.
| Risk | Impact | Likelihood | Mitigation |
|---|---|---|---|
| Daemon integration breaks failure handling | High | Low | Guard every rootcause call (`type` check, `2>/dev/null`, fallback value) so classifier errors never abort failure handling |
| GitHub issue spam from auto-creation | Medium | Low | Already gated: confidence >70% + dedup via cksum signature |
| events.jsonl too large for analysis | Low | Medium | Read only the last 200 entries (`tail`, not `cat`) |
| Dashboard endpoint performance | Low | Low | Aggregate at read time, cache results |
| Root cause misclassification | Medium | Medium | Historical learning improves over time, unknown defaults to code_bug |
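The auto-creation gate in the second row can be sketched in a few lines. This is an illustration under stated assumptions, not the library's code:

- confidence is an integer from 0 to 100;
- already-filed signatures are kept one per line in a hypothetical `seen_file`;
- `error_signature` and `should_create_issue` are made-up names.

The signature uses `cksum` over the first 100 bytes of the message, the same approach the plan uses elsewhere.

```bash
# Hypothetical sketch of the auto-issue gate: platform bugs only,
# confidence > 70, and at most one issue per error signature.

# Signature: cksum of the first 100 bytes of the error message.
error_signature() {
    printf '%s' "$1" | head -c 100 | cksum | awk '{print $1}'
}

# Returns 0 (create the issue) or 1 (skip). Records the signature when it
# decides to create, so a repeat of the same error is suppressed.
should_create_issue() {
    local category="$1" confidence="$2" error_msg="$3" seen_file="$4"
    [[ "$category" == "platform_bug" ]] || return 1
    (( confidence > 70 )) || return 1
    local sig
    sig=$(error_signature "$error_msg")
    if [[ -f "$seen_file" ]] && grep -qxF -- "$sig" "$seen_file"; then
        return 1   # an issue was already filed for this signature
    fi
    printf '%s\n' "$sig" >> "$seen_file"
    return 0
}
```

Both checks must pass before any GitHub call is made. A low-confidence or non-platform failure therefore never consumes a signature slot.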
| File | Lines | Status |
|---|---|---|
| `scripts/lib/root-cause.sh` | 427 | Complete: 7 categories (classify/analyze/create_issue/suggest_fix/learn/report/main) |
| `scripts/sw-root-cause-test.sh` | 374 | Complete: 25+ tests covering all functions |
| `scripts/sw-pipeline.sh:58` | 1 line | Sources `root-cause.sh` |
| `package.json` | 1 line | Test suite registered |
| Component | File | Description |
|---|---|---|
| CLI entry point | `scripts/sw-root-cause.sh` (NEW) | Standalone `shipwright root-cause` command |
| CLI router | `scripts/sw` | Add `root-cause` dispatch |
| Daemon integration | `scripts/lib/daemon-failure.sh` | Wire `rootcause_main()` into `daemon_on_failure()` |
| Historical learning | `scripts/lib/root-cause.sh` | New function `rootcause_analyze_history()` using events.jsonl |
| Enhanced failure comment | `scripts/lib/daemon-failure.sh` | Include root cause + fix suggestions in GitHub comment |
| Dashboard API | `dashboard/server.ts` | `GET /api/root-cause/breakdown` endpoint |
| Dashboard frontend | `dashboard/src/views/insights.ts` | Failure breakdown visualization |
| Dashboard types | `dashboard/src/types/api.ts` | `RootCauseBreakdown` interface |
| Dashboard API wrapper | `dashboard/src/core/api.ts` | `fetchRootCauseBreakdown()` function |
Files to modify:
- `scripts/lib/root-cause.sh`: Add `rootcause_analyze_history()` for events.jsonl historical learning; enhance `rootcause_classify()` to incorporate historical confidence boosting
- `scripts/lib/daemon-failure.sh`: Wire `rootcause_main()` into `daemon_on_failure()`; enhance failure comments with the classification
- `scripts/sw`: Add `root-cause` command dispatch to the CLI router
- `dashboard/server.ts`: Add the `GET /api/root-cause/breakdown` endpoint
- `dashboard/src/views/insights.ts`: Add the failure breakdown by category visualization
- `dashboard/src/types/api.ts`: Add the `RootCauseBreakdown` TypeScript interface
- `dashboard/src/core/api.ts`: Add the `fetchRootCauseBreakdown()` API wrapper

New files:
- `scripts/sw-root-cause.sh`: CLI entry point for the `shipwright root-cause` command (classify, analyze, report, history subcommands)
File: scripts/lib/root-cause.sh
Add `rootcause_analyze_history()` after `rootcause_analyze_error_log()` (after line 159). This function:
- Reads the last 200 entries from `~/.shipwright/events.jsonl` where `type` matches `daemon.failure_classified` or `memory.failure`
- Groups them by failure class/category
- Computes a frequency distribution with recency weighting
- Returns JSON with historical patterns and confidence adjustments

Enhance `rootcause_classify()` to call the historical analysis when available:
- After regex classification, check whether `~/.shipwright/optimization/root-causes.jsonl` has matching patterns
- If a pattern has been seen 3+ times with the same category, boost confidence by 5 percentage points
- If a pattern was previously classified differently, flag it as "disputed" in the evidence
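The bullets above call for recency weighting and a "disputed" flag. The `rootcause_analyze_history()` code that follows computes neither. One way to sketch both rests on two hypothetical assumptions:

- entries have first been flattened to tab-separated `epoch_seconds<TAB>signature<TAB>category` lines;
- weights halve every 7 days.

Both function names are illustrative, not part of the library.

```bash
# Hypothetical input format: epoch_seconds<TAB>signature<TAB>category per line.

# weighted_category_counts <tsv_file> <now_epoch> [half_life_days]
# Each entry contributes 0.5^(age_days / half_life) to its category's total.
weighted_category_counts() {
    local file="$1" now="$2" half_life="${3:-7}"
    awk -F'\t' -v now="$now" -v hl="$half_life" '
        NF >= 3 {
            age_days = (now - $1) / 86400
            if (age_days < 0) age_days = 0
            w[$3] += 0.5 ^ (age_days / hl)
        }
        END { for (c in w) printf "%s %.2f\n", c, w[c] }
    ' "$file" | sort
}

# disputed_signatures <tsv_file>
# Prints signatures that were recorded under two or more distinct categories.
disputed_signatures() {
    awk -F'\t' '
        NF >= 3 && !seen[$2 SUBSEP $3]++ { n[$2]++ }
        END { for (s in n) if (n[s] > 1) print s }
    ' "$1" | sort
}
```

A half-life keeps old patterns from dominating the distribution while still letting a long-lived pattern accumulate weight. The 7-day default is an arbitrary starting point.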
rootcause_analyze_history() {
    local events_file="${HOME}/.shipwright/events.jsonl"
    local learn_file="${HOME}/.shipwright/optimization/root-causes.jsonl"
    # Analyze learned classifications
    [[ ! -f "$learn_file" ]] && { echo '{"total":0,"categories":{},"trends":{}}'; return 0; }
    # Category distribution from historical data (entries without a category count as "unknown")
    local dist
    dist=$(tail -200 "$learn_file" 2>/dev/null | jq -s '
        map(.category //= "unknown") |
        group_by(.category) |
        map({key: .[0].category, value: length}) |
        from_entries
    ' 2>/dev/null || echo '{}')
    # Recent trend (last 24h vs last 7d). Cutoffs are computed inside jq with now/todate;
    # this assumes recorded_at is ISO-8601 UTC ("YYYY-MM-DDTHH:MM:SSZ"), so string
    # comparison matches chronological order.
    local recent_counts
    recent_counts=$(tail -200 "$learn_file" 2>/dev/null | jq -s '
        (now - 86400 | todate) as $cutoff_1d |
        (now - 7 * 86400 | todate) as $cutoff_7d |
        {
            last_24h: [.[] | select(.recorded_at > $cutoff_1d)] | length,
            last_7d: [.[] | select(.recorded_at > $cutoff_7d)] | length,
            platform_bugs_24h: [.[] | select(.recorded_at > $cutoff_1d and .category == "platform_bug")] | length,
            platform_bugs_7d: [.[] | select(.recorded_at > $cutoff_7d and .category == "platform_bug")] | length
        }
    ' 2>/dev/null || echo '{}')
    local total
    total=$(wc -l < "$learn_file" 2>/dev/null | tr -d ' ' || echo "0")
    jq -n --arg total "$total" --argjson categories "$dist" --argjson trends "$recent_counts" \
        '{total: ($total | tonumber), categories: $categories, trends: $trends}'
}
Also add `rootcause_boost_from_history()`, a helper that checks learned patterns against the current error message to adjust confidence:
rootcause_boost_from_history() {
    local error_msg="${1:-}"
    local current_category="${2:-}"
    local learn_file="${HOME}/.shipwright/optimization/root-causes.jsonl"
    [[ ! -f "$learn_file" ]] && { echo "0"; return 0; }
    # Match on the first 30 characters of the message as a fixed substring.
    # An empty prefix would match every entry, so treat it as "no history".
    local needle="${error_msg:0:30}"
    [[ -z "$needle" ]] && { echo "0"; return 0; }
    # Count recent entries with a similar message that mapped to this same category
    # (jq avoids the grep -c pitfall of printing "0" and then failing).
    local matching
    matching=$(tail -200 "$learn_file" 2>/dev/null | jq -s --arg needle "$needle" --arg cat "$current_category" '
        [.[] | select(.category == $cat and ((.message // "") | contains($needle)))] | length
    ' 2>/dev/null) || matching=0
    matching="${matching:-0}"
    # Boost: +5 if seen 3+ times, +10 if seen 10+ times
    if [[ "$matching" -ge 10 ]]; then
        echo "10"
    elif [[ "$matching" -ge 3 ]]; then
        echo "5"
    else
        echo "0"
    fi
}
File: scripts/lib/daemon-failure.sh
Integration point: After line 198 (record_failure_class "$failure_class") and before line 201 (retry escalation).
Source root-cause.sh at the top of daemon-failure.sh (after module guard):
# Root cause classifier (optional — degrades gracefully)
[[ -f "$SCRIPT_DIR/lib/root-cause.sh" ]] && source "$SCRIPT_DIR/lib/root-cause.sh" 2>/dev/null || true
Add the root cause classification block after `record_failure_class`:
# ── Root cause classification (Issue #184) ──
local root_cause_result=""
local root_cause_category="unknown"
local root_cause_confidence=0
local root_cause_fix=""
if type rootcause_main >/dev/null 2>&1; then
local error_tail=""
local log_path="$LOG_DIR/issue-${issue_num}.log"
[[ -f "$log_path" ]] && error_tail=$(tail -200 "$log_path" 2>/dev/null || true)
if [[ -n "$error_tail" ]]; then
root_cause_result=$(rootcause_main "$error_tail" "$failure_class" "$exit_code" 2>/dev/null || echo "")
if [[ -n "$root_cause_result" ]]; then
root_cause_category=$(echo "$root_cause_result" | jq -r '.classification.category // "unknown"' 2>/dev/null || echo "unknown")
root_cause_confidence=$(echo "$root_cause_result" | jq -r '.classification.confidence // 0' 2>/dev/null || echo "0")
root_cause_fix=$(echo "$root_cause_result" | jq -r '.fix.suggestions // ""' 2>/dev/null || echo "")
daemon_log INFO "Root cause: ${root_cause_category} (${root_cause_confidence}% confidence)"
emit_event "daemon.root_cause_classified" \
"issue=$issue_num" \
"category=$root_cause_category" \
"confidence=$root_cause_confidence" \
"daemon_class=$failure_class"
fi
fi
fi
Enhance the retry comment (around lines 289-301) to include the root cause. Add after the existing retry table:
${root_cause_result:+
**Root Cause:** \`${root_cause_category}\` (${root_cause_confidence}% confidence)
${root_cause_fix:+**Suggested Fix:** ${root_cause_fix}}}
Enhance the final failure comment (around lines 371-391) to include the root cause classification. Add a new row to the table and a new section:
| Root Cause | \`${root_cause_category}\` (${root_cause_confidence}% confidence) |
And after the log details block:
${root_cause_fix:+
### 🔍 Root Cause Analysis
**Category:** \`${root_cause_category}\`
**Confidence:** ${root_cause_confidence}%
**Suggestions:** ${root_cause_fix}
}
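The `${var:+...}` templates above drop the whole section when the variable is empty. An equivalent that is easier to unit-test uses an explicit check plus `printf`. The sketch assumes a hypothetical `render_root_cause_section` helper taking category, confidence, and fix text.

```bash
# Hypothetical helper: prints the "Root Cause Analysis" section only when a
# fix suggestion exists, matching the ${root_cause_fix:+...} template.
render_root_cause_section() {
    local category="$1" confidence="$2" fix="$3"
    [[ -n "$fix" ]] || return 0   # no suggestions: emit nothing
    printf '\n### 🔍 Root Cause Analysis\n'
    printf '**Category:** `%s`\n' "$category"
    printf '**Confidence:** %s%%\n' "$confidence"
    printf '**Suggestions:** %s\n' "$fix"
}
```

Using `printf` with `%s` placeholders also keeps backticks and `%` characters in error text from being interpreted by the shell or as format directives.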
File: scripts/sw-root-cause.sh (NEW)
Standard Shipwright script structure with subcommands:
- `classify <error_message> [--stage <stage>]`: Classify a single error
- `analyze`: Analyze error-log.jsonl for patterns
- `report`: Generate a root cause analytics report
- `history`: Show historical pattern analysis from events.jsonl
- `help`: Usage info
#!/usr/bin/env bash
# ╔═══════════════════════════════════════════════════════════════════════════╗
# ║ shipwright root-cause — Failure Root Cause Classification & Analytics ║
# ╚═══════════════════════════════════════════════════════════════════════════╝
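set -euo pipefail
VERSION="3.2.4"
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
source "$SCRIPT_DIR/lib/helpers.sh" 2>/dev/null || true
source "$SCRIPT_DIR/lib/root-cause.sh"

show_help() {
    cat <<EOF
shipwright root-cause v${VERSION}: failure root cause classification & analytics

Usage: shipwright root-cause <command> [args]

Commands:
  classify <error_message> [--stage <stage>]   Classify a single error
  analyze                                      Analyze error-log.jsonl for patterns
  report                                       Generate root cause analytics report
  history                                      Show historical pattern analysis
  help                                         Show this help
EOF
}

# Top-level code: `local` is only valid inside functions, so plain variables are used.
case "${1:-help}" in
    classify)
        shift
        error_msg="${1:-}"
        shift || true
        stage="unknown"
        if [[ "${1:-}" == "--stage" ]]; then
            stage="${2:-unknown}"
        fi
        if [[ -z "$error_msg" ]]; then
            echo "Usage: shipwright root-cause classify <error_message> [--stage <stage>]" >&2
            exit 1
        fi
        rootcause_main "$error_msg" "$stage" "1"
        ;;
    analyze) rootcause_analyze_error_log ;;
    report) rootcause_report ;;
    history) rootcause_analyze_history ;;
    help|--help|-h) show_help ;;
    *)
        echo "Unknown command: $1" >&2
        show_help >&2
        exit 1
        ;;
esac
File: scripts/sw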
Add to the command dispatch case statement (alphabetically near other r commands):
root-cause) exec "$SCRIPT_DIR/sw-root-cause.sh" "$@" ;;
File: dashboard/server.ts
Add new endpoint after the existing /api/memory/patterns endpoint (around line 3808):
// Root cause failure breakdown
app.get("/api/root-cause/breakdown", async (req) => {
const url = new URL(req.url);
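// Fall back to 30 days when ?days= is missing, non-numeric, or non-positive
const parsedDays = Number.parseInt(url.searchParams.get("days") ?? "", 10);
const days = Number.isFinite(parsedDays) && parsedDays > 0 ? parsedDays : 30;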
// Read from root-causes.jsonl (learning system output)
const rcFile = path.join(os.homedir(), ".shipwright/optimization/root-causes.jsonl");
let classifications: Array<{category: string; confidence: number; message: string; recorded_at: string}> = [];
try {
const content = await Bun.file(rcFile).text();
const cutoff = new Date(Date.now() - days * 86400000).toISOString();
classifications = content.trim().split("\n")
.filter(Boolean)
.map(line => { try { return JSON.parse(line); } catch { return null; } })
.filter((e): e is NonNullable<typeof e> => e !== null && e.recorded_at > cutoff);
} catch { /* no data yet */ }
// Aggregate by category
const byCategory: Record<string, number> = {};
const byDay: Record<string, Record<string, number>> = {};
for (const c of classifications) {
byCategory[c.category] = (byCategory[c.category] || 0) + 1;
const day = c.recorded_at?.substring(0, 10) || "unknown";
if (!byDay[day]) byDay[day] = {};
byDay[day][c.category] = (byDay[day][c.category] || 0) + 1;
}
// Platform bug trend
const now = Date.now();
const platformBugs24h = classifications.filter(c =>
c.category === "platform_bug" &&
new Date(c.recorded_at).getTime() > now - 86400000
).length;
const platformBugs7d = classifications.filter(c =>
c.category === "platform_bug" &&
new Date(c.recorded_at).getTime() > now - 7 * 86400000
).length;
return Response.json({
total: classifications.length,
breakdown: byCategory,
daily: byDay,
trends: {
platform_bugs_24h: platformBugs24h,
platform_bugs_7d: platformBugs7d,
trend: platformBugs7d > 0
? (platformBugs24h * 7 > platformBugs7d ? "increasing" : "stable_or_decreasing")
: "no_data"
},
top_errors: classifications
.slice(-20)
.reverse()
.map(c => ({ category: c.category, confidence: c.confidence, message: c.message?.substring(0, 100) }))
});
});
File: dashboard/src/types/api.ts
Add interface:
export interface RootCauseBreakdown {
total: number;
breakdown: Record<string, number>;
daily: Record<string, Record<string, number>>;
trends: {
platform_bugs_24h: number;
platform_bugs_7d: number;
trend: "increasing" | "stable_or_decreasing" | "no_data";
};
top_errors: Array<{
category: string;
confidence: number;
message: string;
}>;
}
File: dashboard/src/core/api.ts
Add function:
export const fetchRootCauseBreakdown = (days = 30) =>
request<RootCauseBreakdown>(`/api/root-cause/breakdown?days=${days}`);
File: dashboard/src/views/insights.ts
In the Insights tab:
- Add `fetchRootCauseBreakdown()` to the parallel fetch calls
- Add a "Root Cause Breakdown" card to the Insights view
The card renders:
- Bar chart (CSS-only, no external deps) showing category distribution
- Platform bug trend indicator (increasing/stable)
- Top 5 recent errors with category badges
- Color-coded by category (platform_bug=rose, code_bug=amber, infra_issue=cyan, etc.)
File: scripts/sw-root-cause-test.sh
Add tests for the new functions:
- `test_analyze_history_empty`: handles a missing history file
- `test_analyze_history_with_data`: returns the correct distribution
- `test_boost_from_history`: returns the correct confidence boost
- `test_cli_classify`: standalone CLI classify works
- `test_cli_report`: standalone CLI report works
- Task 1: Add `rootcause_analyze_history()` and `rootcause_boost_from_history()` to `scripts/lib/root-cause.sh`
- Task 2: Enhance `rootcause_classify()` to incorporate historical confidence boosting
- Task 3: Wire the root cause classifier into `daemon_on_failure()` in `scripts/lib/daemon-failure.sh`
- Task 4: Enhance daemon failure/retry GitHub comments with root cause classification
- Task 5: Create the `scripts/sw-root-cause.sh` CLI entry point
- Task 6: Add `root-cause` dispatch to the `scripts/sw` CLI router
- Task 7: Add the `GET /api/root-cause/breakdown` endpoint to `dashboard/server.ts`
- Task 8: Add the `RootCauseBreakdown` TypeScript interface to `dashboard/src/types/api.ts`
- Task 9: Add `fetchRootCauseBreakdown()` to `dashboard/src/core/api.ts`
- Task 10: Add the failure breakdown visualization to `dashboard/src/views/insights.ts`
- Task 11: Add tests for the new functions in `scripts/sw-root-cause-test.sh`
- Task 12: Run `sw-root-cause-test.sh` and `sw-lib-daemon-failure-test.sh` to verify
Task 1 → Task 2 (history functions needed before classify enhancement)
Task 1 → Task 3 (library must be complete before daemon wiring)
Task 3 → Task 4 (daemon integration before comment enhancement)
Task 5 depends on Task 1 (CLI wraps library functions)
Task 6 depends on Task 5 (router needs entry point)
Task 8 → Task 9 → Task 10 (types → API → UI)
Task 7 is independent (server-side endpoint)
Task 11 depends on Tasks 1-2 (tests for new functions)
Task 12 depends on all other tasks
- Run the existing `sw-root-cause-test.sh`: all 25+ tests must pass
- Add new tests for `rootcause_analyze_history()` and `rootcause_boost_from_history()`
- Run `sw-lib-daemon-failure-test.sh`: existing tests must still pass
- Test daemon integration by verifying `rootcause_main` is called (mock via function override)
- Verify `shipwright root-cause classify "rate limit 429"` returns correct JSON
- Verify `shipwright root-cause report` produces formatted output
- Verify the dashboard endpoint returns a valid JSON structure
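The "mock via function override" item can be sketched as follows. Everything here is invented for illustration:

- `classify_if_available` is a hypothetical stand-in for the guarded block in `daemon_on_failure()`;
- the mock's JSON payload is made up;
- the `api_error` failure class is made up.

A real test would more likely exercise `daemon_on_failure()` itself.

```bash
# Hypothetical stand-in for the guarded call added to daemon_on_failure():
# invokes rootcause_main only when it is defined and never fails the caller.
classify_if_available() {
    if type rootcause_main >/dev/null 2>&1; then
        rootcause_main "$1" "$2" "$3" 2>/dev/null || echo ""
    fi
}

# Mock: override rootcause_main and record each call in a temp file.
CALL_LOG=$(mktemp)
rootcause_main() {
    printf '%s|%s|%s\n' "$1" "$2" "$3" >> "$CALL_LOG"
    echo '{"classification":{"category":"rate_limit","confidence":90}}'
}

result=$(classify_if_available "HTTP 429 rate limit" "api_error" "1")
```

Calls are recorded in a file rather than a shell variable because `$(...)` runs the function in a subshell. Variable updates made by the mock would be lost.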
# Core classifier tests
./scripts/sw-root-cause-test.sh
# Daemon failure handling tests
./scripts/sw-lib-daemon-failure-test.sh
# Dashboard API tests (if server running)
./scripts/sw-server-api-test.sh

- `scripts/lib/root-cause.sh` has `rootcause_analyze_history()` and `rootcause_boost_from_history()`
- `rootcause_classify()` incorporates historical confidence boosting
- `daemon_on_failure()` calls `rootcause_main()` on every failure
- Daemon retry comments include root cause category + confidence + suggestions
- Daemon final failure comments include a root cause analysis section
- Platform bugs with >70% confidence auto-create GitHub issues (already in library)
- The `shipwright root-cause` CLI command works with classify/analyze/report/history subcommands
- Dashboard API returns failure breakdown by category
- Dashboard Insights tab shows the failure breakdown visualization
- All existing tests pass (`sw-root-cause-test.sh`, `sw-lib-daemon-failure-test.sh`)
- New tests cover historical analysis and confidence boosting
- Events emitted: `daemon.root_cause_classified` with category, confidence, daemon_class
Query parameters:
- `days` (optional, default: 30): number of days of history to include
Response (200 OK):
{
"total": 47,
"breakdown": {
"code_bug": 22,
"platform_bug": 8,
"infra_issue": 7,
"rate_limit": 5,
"context_exhaustion": 3,
"config_error": 1,
"external_dep": 1
},
"daily": {
"2026-03-09": {"code_bug": 3, "platform_bug": 1},
"2026-03-08": {"code_bug": 2, "infra_issue": 1}
},
"trends": {
"platform_bugs_24h": 1,
"platform_bugs_7d": 5,
"trend": "increasing"
},
"top_errors": [
{"category": "code_bug", "confidence": 85, "message": "AssertionError: expected 'foo'..."}
]
}
Error responses:
- 500: `{"error": {"code": "INTERNAL_ERROR", "message": "Failed to read root cause data"}}`
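The `trend` field compares the last 24 hours, scaled to a week, with the 7-day count. Below is a shell restatement of the ternary in the endpoint handler, handy for checking sample payloads by hand. `platform_bug_trend` is a hypothetical name.

```bash
# platform_bug_trend <count_24h> <count_7d>
# Same rule as the endpoint: "no_data" when the 7-day count is 0, otherwise
# "increasing" when the 24h count times 7 exceeds the 7-day count.
platform_bug_trend() {
    local last_24h="$1" last_7d="$2"
    if (( last_7d == 0 )); then
        echo "no_data"
    elif (( last_24h * 7 > last_7d )); then
        echo "increasing"
    else
        echo "stable_or_decreasing"
    fi
}
```

For example, 1 platform bug in the last 24 hours against 5 in the last 7 days gives `increasing`, since 1 × 7 = 7 exceeds 5.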
Not applicable — internal dashboard endpoint, not public API.
No versioning needed — internal API following existing dashboard patterns.
- Most likely: Previous plan stage produced empty plan.md (context exhaustion or timeout) — confirmed by reading the file (empty). The library and tests already exist from the WIP commit, so the plan stage just needs to produce the plan document, not recreate the implementation.
- Possible: Previous attempt tried to re-implement everything from scratch instead of recognizing existing code — mitigated by this plan explicitly building on existing work.
- Unlikely: Fundamental architectural issue — the feature is straightforward integration work.
Evidence gathered:
- `plan.md` was empty (line 1 only)
- `lib/root-cause.sh` (427 lines) and its test suite (374 lines) exist and are complete
- `sw-pipeline.sh:58` already sources the library
- `daemon-failure.sh` has no rootcause references: this is the integration gap
Fix strategy: This plan documents the integration work needed. It does NOT re-implement the library — it builds on the existing 427-line implementation.
Verification plan: Run sw-root-cause-test.sh after each change; run sw-lib-daemon-failure-test.sh after daemon integration.