Pipeline Plan 184

Implementation Plan: Failure Root Cause Classifier with Automated Platform Issue Creation

Issue: #184
Branch: feat/-failure-root-cause-classifier-with-auto-184
Complexity: Standard
Estimated files: 7 modified, 1 new

Brainstorming & Design Decisions

Requirements Clarity

Minimum viable change: Wire the existing lib/root-cause.sh classifier (427 lines, already on this branch) into the daemon's failure handling path, add historical pattern learning from events.jsonl, create a dashboard breakdown visualization, and expose a CLI command.

Implicit requirements not stated:

The classifier library already exists but isn't called from the daemon — the critical integration is missing
Historical learning needs to feed back into classification confidence (not just regex)
Dashboard needs both an API endpoint and frontend component
CLI command needed for standalone shipwright root-cause usage

Acceptance criteria (from issue):

Failure classifier analyzes error-log.jsonl and categorizes root cause — LIBRARY EXISTS
Decision tree trained on historical failure patterns from events.jsonl — NEEDS IMPLEMENTATION
Platform bugs trigger automatic GitHub issue creation — LIBRARY EXISTS, NEEDS DAEMON WIRING
Dashboard shows failure breakdown by category — NEEDS IMPLEMENTATION
Reduce repeat platform failures by >30% — MEASURABLE VIA LEARNING SYSTEM

Alternatives Considered

Approach A: Enhance existing library + add integration points (CHOSEN)

Pros: Minimal blast radius (7 files), builds on 427-line library with 25+ tests, test suite already passes
Cons: Regex-based classification has limits vs ML-based approach
Blast radius: 7 files modified, 1 new
Complexity: Low-medium

Approach B: ML-based classifier using Claude API calls

Pros: More sophisticated, context-aware classification
Cons: Over-engineered for bash, adds API cost per failure, adds latency to failure handling path, fragile in offline/local mode
Blast radius: 15+ files
Complexity: High

Approach C: Build entirely new classification system

Pros: Clean design from scratch
Cons: Discards 427 lines of working code + 374 lines of tests, massive waste
Blast radius: 20+ files
Complexity: Very high

Decision: Approach A — The library is feature-complete. The gap is purely integration: wiring it into the daemon, adding historical learning, and surfacing data in the dashboard.

Risk Analysis

Risk	Impact	Likelihood	Mitigation
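Daemon integration breaks failure handling	High	Low	Wrap all rootcause calls in guards (a `type rootcause_main` check and `|| echo ""` fallbacks, as in Step 2) so the daemon degrades gracefully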
GitHub issue spam from auto-creation	Medium	Low	Already gated: confidence >70% + dedup via cksum signature
events.jsonl too large for analysis	Low	Medium	Read only last 200 entries, use `tail` not `cat`
Dashboard endpoint performance	Low	Low	Aggregate at read time, cache results
Root cause misclassification	Medium	Medium	Historical learning improves over time, unknown defaults to code_bug
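
The issue-spam mitigation in the table above (confidence gate plus cksum dedup) can be sketched as follows. This is a minimal illustration under assumptions, not the code in root-cause.sh: `should_create_issue` and `SEEN_FILE` are hypothetical names, the confidence is assumed to be an integer, and the signature is assumed to cover the first 100 bytes of the message.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: gate auto-created issues on confidence and a cksum signature.
# SEEN_FILE and should_create_issue are illustrative names, not taken from root-cause.sh.
SEEN_FILE="${SEEN_FILE:-${TMPDIR:-/tmp}/rootcause-seen-signatures.txt}"

should_create_issue() {
    local error_msg="$1" confidence="$2"

    # Gate 1: only act above the confidence threshold from the risk table (>70).
    [[ "$confidence" -gt 70 ]] || return 1

    # Gate 2: dedup on a checksum of the first 100 bytes of the message.
    local sig
    sig=$(printf '%s' "$error_msg" | head -c 100 | cksum | awk '{print $1}')
    touch "$SEEN_FILE"
    if grep -qxF -- "$sig" "$SEEN_FILE"; then
        return 1    # an issue was already filed for this signature
    fi
    echo "$sig" >> "$SEEN_FILE"
    return 0
}
```

Returning 0 only on the first sighting means a repeated error never files a second issue, at the cost of also suppressing distinct errors that share the same first 100 bytes.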

Current State Assessment

What EXISTS (from WIP commit `6d70188`):

File	Lines	Status
`scripts/lib/root-cause.sh`	427	Complete — 7 categories, classify/analyze/create_issue/suggest_fix/learn/report/main
`scripts/sw-root-cause-test.sh`	374	Complete — 25+ tests covering all functions
`scripts/sw-pipeline.sh:58`	1 line	Sources root-cause.sh
`package.json`	1 line	Test suite registered

What's MISSING:

Component	File	Description
CLI entry point	`scripts/sw-root-cause.sh` (NEW)	Standalone `shipwright root-cause` command
CLI router	`scripts/sw`	Add `root-cause` dispatch
Daemon integration	`scripts/lib/daemon-failure.sh`	Wire `rootcause_main()` into `daemon_on_failure()`
Historical learning	`scripts/lib/root-cause.sh`	New function: `rootcause_analyze_history()` using events.jsonl
Enhanced failure comment	`scripts/lib/daemon-failure.sh`	Include root cause + fix suggestions in GitHub comment
Dashboard API	`dashboard/server.ts`	`GET /api/root-cause/breakdown` endpoint
Dashboard frontend	`dashboard/src/views/insights.ts`	Failure breakdown visualization
Dashboard types	`dashboard/src/types/api.ts`	`RootCauseBreakdown` interface
Dashboard API wrapper	`dashboard/src/core/api.ts`	`fetchRootCauseBreakdown()` function

Files to Modify

Modified Files (7):

scripts/lib/root-cause.sh — Add rootcause_analyze_history() for events.jsonl historical learning, enhance rootcause_classify() to incorporate historical confidence boosting
scripts/lib/daemon-failure.sh — Wire rootcause_main() into daemon_on_failure(), enhance failure comments with classification
scripts/sw — Add root-cause command dispatch to CLI router
dashboard/server.ts — Add GET /api/root-cause/breakdown endpoint
dashboard/src/views/insights.ts — Add failure breakdown by category visualization
dashboard/src/types/api.ts — Add RootCauseBreakdown TypeScript interface
dashboard/src/core/api.ts — Add fetchRootCauseBreakdown() API wrapper

New Files (1):

scripts/sw-root-cause.sh — CLI entry point for shipwright root-cause command (classify, analyze, report, history subcommands)

Implementation Steps

Step 1: Add Historical Pattern Analysis to root-cause.sh

File: scripts/lib/root-cause.sh

Add rootcause_analyze_history() function after rootcause_analyze_error_log() (after line 159). This function:

Reads last 200 entries from ~/.shipwright/events.jsonl where type matches daemon.failure_classified or memory.failure
Groups by failure class/category
Computes frequency distribution and recency weighting
Returns JSON with historical patterns and confidence adjustments

Enhance rootcause_classify() to call historical analysis when available:

After regex classification, check if ~/.shipwright/optimization/root-causes.jsonl has matching patterns
If a pattern has been seen 3+ times with the same category, boost confidence by 5 points (10 points once it has been seen 10+ times)
If a pattern was previously classified differently, flag as "disputed" in evidence

rootcause_analyze_history() {
    # NOTE: events_file is declared for the events.jsonl pass described above,
    # but this version only reads the learned classifications in learn_file.
    local events_file="${HOME}/.shipwright/events.jsonl"
    local learn_file="${HOME}/.shipwright/optimization/root-causes.jsonl"

    # No learned classifications yet: return an empty result
    [[ ! -f "$learn_file" ]] && { echo '{"total":0,"categories":{},"trends":{}}'; return 0; }

    # Category distribution over the last 200 learned entries
    local dist
    dist=$(tail -200 "$learn_file" 2>/dev/null | jq -s '
        group_by(.category) |
        map({key: .[0].category, value: length}) |
        from_entries
    ' 2>/dev/null || echo '{}')

    # Recent trend (last 24h vs last 7d). The "..." cutoffs are left as written in
    # the plan; they stand for timestamps in the same format as .recorded_at.
    local recent_counts
    recent_counts=$(tail -200 "$learn_file" 2>/dev/null | jq -s --arg cutoff_1d "..." --arg cutoff_7d "..." '
        {
            last_24h: [.[] | select(.recorded_at > $cutoff_1d)] | length,
            last_7d: [.[] | select(.recorded_at > $cutoff_7d)] | length,
            platform_bugs_24h: [.[] | select(.recorded_at > $cutoff_1d and .category == "platform_bug")] | length,
            platform_bugs_7d: [.[] | select(.recorded_at > $cutoff_7d and .category == "platform_bug")] | length
        }
    ' 2>/dev/null || echo '{}')

    # Total lines in the file (not capped at 200, unlike the distribution above)
    local total
    total=$(wc -l < "$learn_file" 2>/dev/null | tr -d ' ' || echo "0")

    jq -n --arg total "$total" --argjson categories "$dist" --argjson trends "$recent_counts" \
        '{total: ($total | tonumber), categories: $categories, trends: $trends}'
}
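
The 24h and 7d cutoffs passed to jq above need to be timestamps that compare correctly against `recorded_at` as strings. One possible way to produce them is sketched below. It assumes `recorded_at` is stored as an ISO-8601 UTC timestamp, which the plan does not state explicitly, and `iso_cutoff` is a hypothetical helper, not an existing function in root-cause.sh.

```shell
#!/usr/bin/env bash
# Sketch: compute ISO-8601 UTC cutoffs for the 24h and 7d windows.
# iso_cutoff is a hypothetical helper; it is not part of root-cause.sh.
iso_cutoff() {
    local seconds_ago="$1"
    local target=$(( $(date -u +%s) - seconds_ago ))
    # GNU date accepts "-d @epoch"; BSD/macOS date accepts "-r epoch".
    date -u -d "@${target}" +%Y-%m-%dT%H:%M:%SZ 2>/dev/null \
        || date -u -r "${target}" +%Y-%m-%dT%H:%M:%SZ
}

cutoff_1d=$(iso_cutoff 86400)     # 24 hours ago
cutoff_7d=$(iso_cutoff 604800)    # 7 days ago
echo "24h cutoff: ${cutoff_1d}"
echo "7d cutoff:  ${cutoff_7d}"
```

The results could then be supplied as `--arg cutoff_1d "$cutoff_1d" --arg cutoff_7d "$cutoff_7d"`. Because ISO-8601 UTC strings sort chronologically, the plain `>` comparison in the jq filter works without parsing dates.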

Also add rootcause_boost_from_history() — a helper that checks learned patterns against the current error message to adjust confidence:

rootcause_boost_from_history() {
    local error_msg="${1:-}"
    # NOTE: accepted for the intended per-category check, but not yet used to filter matches
    local current_category="${2:-}"
    local learn_file="${HOME}/.shipwright/optimization/root-causes.jsonl"

    [[ ! -f "$learn_file" ]] && { echo "0"; return 0; }

    # Build a plain-text needle from the start of the error message
    local needle
    needle=$(printf '%s' "$error_msg" | head -c 50 | sed 's/[^a-zA-Z0-9 ]//g' | head -c 30)

    # An empty needle would match every line, so treat it as no history
    [[ -z "$needle" ]] && { echo "0"; return 0; }

    # Count learned entries containing the needle. grep -c prints 0 and exits 1
    # when nothing matches, so assign a fallback instead of echoing a second "0".
    local matching
    matching=$(grep -cF -- "$needle" "$learn_file" 2>/dev/null) || matching=0

    # Boost: +5 if seen 3+ times, +10 if seen 10+ times
    if [[ "$matching" -ge 10 ]]; then
        echo "10"
    elif [[ "$matching" -ge 3 ]]; then
        echo "5"
    else
        echo "0"
    fi
}
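
How the returned boost feeds back into the classifier's confidence is not spelled out in the plan. A minimal sketch, assuming integer confidence values and a cap at 100 (the cap is an assumption, and `apply_history_boost` is a hypothetical name):

```shell
#!/usr/bin/env bash
# Sketch: add the history boost to a base confidence and clamp the result.
# apply_history_boost is a hypothetical helper; the 100 cap is an assumption.
apply_history_boost() {
    local base_confidence="$1" boost="$2"
    local boosted=$(( base_confidence + boost ))
    if (( boosted > 100 )); then
        boosted=100
    fi
    echo "$boosted"
}

# Example: a 70% regex match seen 3+ times before (boost of 5)
apply_history_boost 70 5
```

Inside rootcause_classify() the second argument would presumably come from `rootcause_boost_from_history "$error_msg" "$category"`.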

Step 2: Wire Root Cause into Daemon Failure Handler

File: scripts/lib/daemon-failure.sh

Integration point: After line 198 (record_failure_class "$failure_class") and before line 201 (retry escalation).

Source root-cause.sh at the top of daemon-failure.sh (after module guard):

# Root cause classifier (optional — degrades gracefully)
[[ -f "$SCRIPT_DIR/lib/root-cause.sh" ]] && source "$SCRIPT_DIR/lib/root-cause.sh" 2>/dev/null || true

Add root cause classification block after record_failure_class:

    # ── Root cause classification (Issue #184) ──
    local root_cause_result=""
    local root_cause_category="unknown"
    local root_cause_confidence=0
    local root_cause_fix=""
    if type rootcause_main >/dev/null 2>&1; then
        local error_tail=""
        local log_path="$LOG_DIR/issue-${issue_num}.log"
        [[ -f "$log_path" ]] && error_tail=$(tail -200 "$log_path" 2>/dev/null || true)

        if [[ -n "$error_tail" ]]; then
            root_cause_result=$(rootcause_main "$error_tail" "$failure_class" "$exit_code" 2>/dev/null || echo "")
            if [[ -n "$root_cause_result" ]]; then
                root_cause_category=$(echo "$root_cause_result" | jq -r '.classification.category // "unknown"' 2>/dev/null || echo "unknown")
                root_cause_confidence=$(echo "$root_cause_result" | jq -r '.classification.confidence // 0' 2>/dev/null || echo "0")
                root_cause_fix=$(echo "$root_cause_result" | jq -r '.fix.suggestions // ""' 2>/dev/null || echo "")
                daemon_log INFO "Root cause: ${root_cause_category} (${root_cause_confidence}% confidence)"
                emit_event "daemon.root_cause_classified" \
                    "issue=$issue_num" \
                    "category=$root_cause_category" \
                    "confidence=$root_cause_confidence" \
                    "daemon_class=$failure_class"
            fi
        fi
    fi

Enhance the retry comment (around lines 289-301) to include the root cause. Add the following after the existing retry table, gated on root_cause_result because root_cause_category defaults to "unknown" and is therefore never empty:

${root_cause_result:+
**Root Cause:** \`${root_cause_category}\` (${root_cause_confidence}% confidence)
${root_cause_fix:+**Suggested Fix:** ${root_cause_fix}}}

Enhance the final failure comment (around line 371-391) to include root cause classification: Add a new row to the table and a section:

| Root Cause | \`${root_cause_category}\` (${root_cause_confidence}% confidence) |

And after the log details block:

${root_cause_fix:+
### 🔍 Root Cause Analysis

**Category:** \`${root_cause_category}\`
**Confidence:** ${root_cause_confidence}%
**Suggestions:** ${root_cause_fix}
}
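
The comment templates above rely on bash's `${var:+text}` expansion, which yields the text only when the variable is set and non-empty. A small sketch of that behaviour follows; `render_root_cause_section` is a hypothetical function for illustration, not part of daemon-failure.sh.

```shell
#!/usr/bin/env bash
# Sketch: conditional section rendering with ${var:+...}.
# render_root_cause_section is a hypothetical name used only for this example.
render_root_cause_section() {
    local category="$1" confidence="$2" fix="$3"
    # ${fix:+...} expands to the enclosed text only when fix is set and non-empty;
    # with an empty fix only a blank line is printed.
    printf '%s\n' "${fix:+### Root Cause Analysis

**Category:** ${category}
**Confidence:** ${confidence}%
**Suggestions:** ${fix}}"
}

render_root_cause_section "platform_bug" 90 "Restart the daemon"
render_root_cause_section "code_bug" 80 ""
```

One consequence worth keeping in mind: a variable initialised to a placeholder such as "unknown" is never empty, so `:+` on it always expands.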

Step 3: Create CLI Entry Point

File: scripts/sw-root-cause.sh (NEW)

Standard Shipwright script structure with subcommands:

classify <error_message> [--stage <stage>] — Classify a single error
analyze — Analyze error-log.jsonl for patterns
report — Generate root cause analytics report
history — Show historical pattern analysis from events.jsonl
help — Usage info

#!/usr/bin/env bash
# ╔═══════════════════════════════════════════════════════════════════════════╗
# ║  shipwright root-cause — Failure Root Cause Classification & Analytics   ║
# ╚═══════════════════════════════════════════════════════════════════════════╝
set -euo pipefail
VERSION="3.2.4"
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
source "$SCRIPT_DIR/lib/helpers.sh" 2>/dev/null || true
source "$SCRIPT_DIR/lib/root-cause.sh"

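case "${1:-help}" in
    classify)
        shift
        # Plain assignments here: `local` is only valid inside a function
        error_msg="${1:-}"
        # Accept the documented `--stage <stage>` flag, or a bare positional stage
        if [[ "${2:-}" == "--stage" ]]; then
            stage="${3:-unknown}"
        else
            stage="${2:-unknown}"
        fi
        rootcause_main "$error_msg" "$stage" "1"
        ;;
    analyze)
        rootcause_analyze_error_log
        ;;
    report)
        rootcause_report
        ;;
    history)
        rootcause_analyze_history
        ;;
    help|--help|-h)
        # show_help
        ;;
    *)
        echo "Unknown subcommand: $1" >&2
        exit 1
        ;;
esac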

Step 4: Add CLI Router Dispatch

File: scripts/sw

Add to the command dispatch case statement (alphabetically near other r commands):

root-cause)     exec "$SCRIPT_DIR/sw-root-cause.sh" "$@" ;;

Step 5: Dashboard API Endpoint

File: dashboard/server.ts

Add new endpoint after the existing /api/memory/patterns endpoint (around line 3808):

// Root cause failure breakdown
app.get("/api/root-cause/breakdown", async (req) => {
    const url = new URL(req.url);
    const days = Number.parseInt(url.searchParams.get("days") ?? "30", 10) || 30; // fall back to 30 on missing or non-numeric input

    // Read from root-causes.jsonl (learning system output)
    const rcFile = path.join(os.homedir(), ".shipwright/optimization/root-causes.jsonl");
    let classifications: Array<{category: string; confidence: number; message: string; recorded_at: string}> = [];

    try {
        const content = await Bun.file(rcFile).text();
        const cutoff = new Date(Date.now() - days * 86400000).toISOString();
        classifications = content.trim().split("\n")
            .filter(Boolean)
            .map(line => { try { return JSON.parse(line); } catch { return null; } })
            .filter((e): e is NonNullable<typeof e> => e !== null && e.recorded_at > cutoff);
    } catch { /* no data yet */ }

    // Aggregate by category
    const byCategory: Record<string, number> = {};
    const byDay: Record<string, Record<string, number>> = {};

    for (const c of classifications) {
        byCategory[c.category] = (byCategory[c.category] || 0) + 1;
        const day = c.recorded_at?.substring(0, 10) || "unknown";
        if (!byDay[day]) byDay[day] = {};
        byDay[day][c.category] = (byDay[day][c.category] || 0) + 1;
    }

    // Platform bug trend
    const now = Date.now();
    const platformBugs24h = classifications.filter(c =>
        c.category === "platform_bug" &&
        new Date(c.recorded_at).getTime() > now - 86400000
    ).length;
    const platformBugs7d = classifications.filter(c =>
        c.category === "platform_bug" &&
        new Date(c.recorded_at).getTime() > now - 7 * 86400000
    ).length;

    return Response.json({
        total: classifications.length,
        breakdown: byCategory,
        daily: byDay,
        trends: {
            platform_bugs_24h: platformBugs24h,
            platform_bugs_7d: platformBugs7d,
            trend: platformBugs7d > 0
                ? (platformBugs24h * 7 > platformBugs7d ? "increasing" : "stable_or_decreasing")
                : "no_data"
        },
        top_errors: classifications
            .slice(-20)
            .reverse()
            .map(c => ({ category: c.category, confidence: c.confidence, message: c.message?.substring(0, 100) }))
    });
});

Step 6: Dashboard Frontend — Types

File: dashboard/src/types/api.ts

Add interface:

export interface RootCauseBreakdown {
  total: number;
  breakdown: Record<string, number>;
  daily: Record<string, Record<string, number>>;
  trends: {
    platform_bugs_24h: number;
    platform_bugs_7d: number;
    trend: "increasing" | "stable_or_decreasing" | "no_data";
  };
  top_errors: Array<{
    category: string;
    confidence: number;
    message: string;
  }>;
}

Step 7: Dashboard Frontend — API Wrapper

File: dashboard/src/core/api.ts

Add function:

export const fetchRootCauseBreakdown = (days = 30) =>
  request<RootCauseBreakdown>(`/api/root-cause/breakdown?days=${days}`);

Step 8: Dashboard Frontend — Insights Visualization

File: dashboard/src/views/insights.ts

Add to the parallel API calls in the Insights tab:

Add fetchRootCauseBreakdown() to the parallel fetch calls
Add a "Root Cause Breakdown" card to the Insights view

The card renders:

Bar chart (CSS-only, no external deps) showing category distribution
Platform bug trend indicator (increasing/stable)
Top 5 recent errors with category badges
Color-coded by category (platform_bug=rose, code_bug=amber, infra_issue=cyan, etc.)

Step 9: Update Test Suite

File: scripts/sw-root-cause-test.sh

Add tests for the new functions:

test_analyze_history_empty — handles missing history file
test_analyze_history_with_data — returns correct distribution
test_boost_from_history — returns correct confidence boost
test_cli_classify — standalone CLI classify works
test_cli_report — standalone CLI report works
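
Because the new functions read from `${HOME}/.shipwright`, the tests listed above probably need an isolated home directory so they never touch real history. A sketch of one way to do that is shown here. `with_temp_home` and `missing_history_guard` are hypothetical names, and the guard is a stand-in that copies only the missing-file check from rootcause_analyze_history() rather than sourcing the library.

```shell
#!/usr/bin/env bash
# Sketch: run a check with HOME pointed at a throwaway directory.
# with_temp_home and missing_history_guard are hypothetical names for this example.
with_temp_home() {
    local tmp rc=0
    tmp=$(mktemp -d)
    # The subshell keeps the HOME override from leaking into the caller.
    ( HOME="$tmp"; "$@" ) || rc=$?
    rm -rf "$tmp"
    return "$rc"
}

# Stand-in that copies only the missing-file guard from rootcause_analyze_history().
missing_history_guard() {
    local learn_file="${HOME}/.shipwright/optimization/root-causes.jsonl"
    [[ ! -f "$learn_file" ]] && { echo '{"total":0,"categories":{},"trends":{}}'; return 0; }
    echo "history present"
}

test_analyze_history_empty() {
    local out
    out=$(with_temp_home missing_history_guard)
    [[ "$out" == '{"total":0,"categories":{},"trends":{}}' ]]
}
```

In the real suite the stand-in would be replaced by the sourced `rootcause_analyze_history`, with the same wrapper around it.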

Task Checklist

Task Dependencies

Task 1 → Task 2 (history functions needed before classify enhancement)
Task 1 → Task 3 (library must be complete before daemon wiring)
Task 3 → Task 4 (daemon integration before comment enhancement)
Task 5 depends on Task 1 (CLI wraps library functions)
Task 6 depends on Task 5 (router needs entry point)
Task 8 → Task 9 → Task 10 (types → API → UI)
Task 7 is independent (server-side endpoint)
Task 11 depends on Tasks 1-2 (tests for new functions)
Task 12 depends on all other tasks

Testing Approach

Unit Tests (Task 11-12)

Run existing sw-root-cause-test.sh — all 25+ tests must pass
Add new tests for rootcause_analyze_history() and rootcause_boost_from_history()
Run sw-lib-daemon-failure-test.sh — existing tests must still pass
Test daemon integration by verifying rootcause_main is called (mock via function override)
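
The function-override mock mentioned in the last item could look roughly like this. It is a sketch under assumptions: `run_guarded_classification` is a hypothetical stand-in for the guarded block in daemon_on_failure(), and `CALL_LOG` is an illustrative name. Calls are recorded in a file rather than a counter variable because the daemon invokes rootcause_main inside `$(...)`, where variable changes are lost with the subshell.

```shell
#!/usr/bin/env bash
# Sketch: verify rootcause_main is invoked by overriding it with a stub.
# CALL_LOG and run_guarded_classification are hypothetical names for this example.
CALL_LOG="$(mktemp)"

# Stub: records its stage and exit-code arguments, then returns canned JSON.
rootcause_main() {
    printf '%s|%s\n' "$2" "$3" >> "$CALL_LOG"
    echo '{"classification":{"category":"platform_bug","confidence":90}}'
}

# Stand-in for the guarded classification block from Step 2.
run_guarded_classification() {
    local error_tail="$1" failure_class="$2" exit_code="$3"
    local result=""
    if type rootcause_main >/dev/null 2>&1 && [[ -n "$error_tail" ]]; then
        result=$(rootcause_main "$error_tail" "$failure_class" "$exit_code" 2>/dev/null || echo "")
    fi
    echo "$result"
}
```

A test would call the block once with a non-empty log tail and once with an empty one, then assert that the call log contains exactly one line.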

Integration Tests

Verify CLI shipwright root-cause classify "rate limit 429" returns correct JSON
Verify shipwright root-cause report produces formatted output
Verify dashboard endpoint returns valid JSON structure

Targeted Test Commands

# Core classifier tests
./scripts/sw-root-cause-test.sh

# Daemon failure handling tests
./scripts/sw-lib-daemon-failure-test.sh

# Dashboard API tests (if server running)
./scripts/sw-server-api-test.sh

Definition of Done

Endpoint Specification (API Skill)

`GET /api/root-cause/breakdown`

Query parameters:

days (optional, default: 30) — Number of days of history to include

Response (200 OK):

{
  "total": 47,
  "breakdown": {
    "code_bug": 22,
    "platform_bug": 8,
    "infra_issue": 7,
    "rate_limit": 5,
    "context_exhaustion": 3,
    "config_error": 1,
    "external_dep": 1
  },
  "daily": {
    "2026-03-09": {"code_bug": 3, "platform_bug": 1},
    "2026-03-08": {"code_bug": 2, "infra_issue": 1}
  },
  "trends": {
    "platform_bugs_24h": 1,
    "platform_bugs_7d": 5,
    "trend": "stable_or_decreasing"
  },
  "top_errors": [
    {"category": "code_bug", "confidence": 85, "message": "AssertionError: expected 'foo'..."}
  ]
}

Error responses:

500: {"error": {"code": "INTERNAL_ERROR", "message": "Failed to read root cause data"}}

Rate Limiting

Not applicable — internal dashboard endpoint, not public API.

Versioning

No versioning needed — internal API following existing dashboard patterns.

Root Cause Hypothesis (Systematic Debugging — previous plan stage failure)

Most likely: Previous plan stage produced empty plan.md (context exhaustion or timeout) — confirmed by reading the file (empty). The library and tests already exist from the WIP commit, so the plan stage just needs to produce the plan document, not recreate the implementation.
Possible: Previous attempt tried to re-implement everything from scratch instead of recognizing existing code — mitigated by this plan explicitly building on existing work.
Unlikely: Fundamental architectural issue — the feature is straightforward integration work.

Evidence gathered:

plan.md was empty (line 1 only)
lib/root-cause.sh (427 lines) and test suite (374 lines) exist and are complete
sw-pipeline.sh:58 already sources the library
daemon-failure.sh has no rootcause references — the integration gap

Fix strategy: This plan documents the integration work needed. It does NOT re-implement the library — it builds on the existing 427-line implementation.

Verification plan: Run sw-root-cause-test.sh after each change; run sw-lib-daemon-failure-test.sh after daemon integration.
