Skip to content

Pipeline Plan 189

ezigus edited this page Mar 18, 2026 · 2 revisions

Plan written to .claude/pipeline-artifacts/plan.md.

The feature is already fully implemented in commit fded001. The plan documents the architecture of the existing implementation:

Architecture: Standalone library (scripts/lib/context-error.sh) with 8 public functions, integrated into 4 existing modules via explicit function calls.

Key components:

  • Error categorization (10 categories)
  • Memory-based similar failure lookup (5s timeout, graceful degradation)
  • Action generation (category + stage + memory-based)
  • 4-section formatted output (What Failed / Why / Similar Past Issues / Suggested Actions)
  • Suggestion feedback loop (record, resolve, emit events)
  • GitHub markdown formatting for issue comments

All 13 tasks complete, 52 test cases, all acceptance criteria met. wn) is needed alongside terminal output

  • Feedback loop: track which suggestions lead to resolution

Acceptance criteria (from issue):

  1. Error messages include: stage name, iteration count, and original issue/goal
  2. Query memory system for similar past failures and their resolutions
  3. Suggest 2-3 concrete next actions based on error type and history
  4. Include relevant log snippets with context (not just raw stack traces)
  5. Format error output with clear sections: What Failed / Why / Similar Past Issues / Suggested Actions
  6. Track whether suggested actions lead to successful resolution (feedback loop)

Design Alternatives

Approach A: Standalone library module (CHOSEN)

  • New scripts/lib/context-error.sh with pure functions
  • Integration via function calls from existing error paths (pipeline-state, daemon-failure, loop-iteration)
  • Tradeoffs: (+) minimal blast radius, (+) testable in isolation, (+) graceful degradation when memory unavailable, (-) requires integration touchpoints in 4 files

Approach B: Monkey-patch existing error() function

  • Override the global error() helper to always enrich output
  • Tradeoffs: (+) zero integration work, (-) massive blast radius (every error call affected), (-) hard to test, (-) would slow down all error output with memory queries

Approach C: Post-processing pipeline

  • Capture all stderr, process after stage completion
  • Tradeoffs: (+) no code changes to existing paths, (-) delayed feedback, (-) complex stderr capture, (-) loses real-time context

Decision: Approach A — standalone library with explicit integration points. Minimizes blast radius while hitting all acceptance criteria.

Risk Assessment

Risk Mitigation
Memory query timeout slows error reporting 5-second timeout guard on memory_ranked_search
Memory system not loaded/available Graceful fallback: query_similar_failures returns [] if memory_ranked_search not defined
Very long error messages blow up report tail -10 + ANSI strip on log snippets; cut -c1-120 on memory fix text
jq not available All jq calls use `2>/dev/null
Suggestion JSONL file corruption Atomic writes via tmp file + mv
Event schema changes New events registered in config/event-schema.json

Dependency Analysis

Depends on:

  • scripts/lib/helpers.shemit_event(), output helpers
  • sw-memory.shmemory_ranked_search(), repo_memory_dir() (optional, graceful degradation)
  • jq — JSON processing (with fallbacks)

Depended on by:

  • scripts/lib/pipeline-state.shmark_stage_failed() calls format_context_error_report + record_suggestion
  • scripts/lib/daemon-failure.shdaemon_on_failure() appends context-aware section to GitHub comments
  • scripts/lib/loop-iteration.shcompose_prompt() injects historical error context
  • scripts/lib/pipeline-stages-delivery.sh — pipeline completion resolves outstanding suggestions

No circular dependency risks — context-error.sh only depends on helpers.sh and memory (both upstream).


Files to Modify

New Files

  1. scripts/lib/context-error.sh — Core library (430 lines): categorization, memory query, action generation, report formatting, suggestion tracking
  2. scripts/sw-lib-context-error-test.sh — Test suite (52 test cases)

Modified Files

  1. scripts/lib/pipeline-state.sh — Integration: mark_stage_failed() generates context error report + records suggestion
  2. scripts/lib/daemon-failure.sh — Integration: daemon_on_failure() appends context-aware error section to GitHub comment
  3. scripts/lib/loop-iteration.sh — Integration: compose_prompt() injects historical error context from context-error functions
  4. scripts/lib/pipeline-stages-delivery.sh — Integration: pipeline completion resolves outstanding suggestions
  5. config/event-schema.json — Register new events: error.context_generated, suggestion.recorded, suggestion.resolved
  6. scripts/lib/helpers.sh — Minor: ensure emit_event is available for context-error module
  7. scripts/lib/test-helpers.sh — Test infrastructure updates for context-error tests

Implementation Steps

  1. Create scripts/lib/context-error.sh with:

    • categorize_error() — Maps error text to 10 categories (FILE_ACCESS, FUNCTION_ERROR, SYNTAX_ERROR, ASSERTION_FAILURE, TYPE_ERROR, TIMEOUT, MEMORY_ERROR, NETWORK_ERROR, RESOURCE_ERROR, UNKNOWN)
    • query_similar_failures() — Wraps memory_ranked_search with 5s timeout, graceful fallback to []
    • generate_suggested_actions() — Combines memory-based fixes + category defaults + stage-specific actions into 2-3 concrete steps
    • format_context_error_report() — 4-section terminal report (What Failed / Why / Similar Past Issues / Suggested Actions)
    • format_github_error_comment() — Markdown-formatted version for GitHub issue comments
    • record_suggestion() — Appends to suggestions.jsonl with ID, category, stage, actions; emits suggestion.recorded event
    • mark_suggestion_resolved() — Atomic update of suggestion resolved status; emits suggestion.resolved event
    • resolve_outstanding_suggestions() — Batch-resolve all unresolved on pipeline success
  2. Integrate into pipeline-state.sh — In mark_stage_failed(), source context-error.sh and call format_context_error_report() to save artifact + record_suggestion() to track

  3. Integrate into daemon-failure.sh — In daemon_on_failure(), if context-error functions are available, append formatted context section to GitHub issue comment

  4. Integrate into loop-iteration.sh — In compose_prompt(), inject historical error context from context-error functions into loop iteration prompts

  5. Integrate into pipeline-stages-delivery.sh — On pipeline completion, call resolve_outstanding_suggestions() to close the feedback loop

  6. Update config/event-schema.json — Register error.context_generated, suggestion.recorded, suggestion.resolved with required/optional fields

  7. Create test suite — 52 test cases covering categorization, formatting, memory query mocking, suggestion recording/resolution, GitHub comment formatting, edge cases


Task Checklist

  • Task 1: Create scripts/lib/context-error.sh with error categorization (10 categories)
  • Task 2: Implement query_similar_failures() with memory system integration and timeout guard
  • Task 3: Implement generate_suggested_actions() with category + stage + memory-based actions
  • Task 4: Implement format_context_error_report() — 4-section terminal output
  • Task 5: Implement format_github_error_comment() — Markdown-formatted for GitHub
  • Task 6: Implement suggestion tracking (record_suggestion, mark_suggestion_resolved, resolve_outstanding_suggestions)
  • Task 7: Register new events in config/event-schema.json
  • Task 8: Integrate into pipeline-state.sh mark_stage_failed()
  • Task 9: Integrate into daemon-failure.sh daemon_on_failure()
  • Task 10: Integrate into loop-iteration.sh compose_prompt()
  • Task 11: Integrate into pipeline completion (resolve suggestions on success)
  • Task 12: Create comprehensive test suite (sw-lib-context-error-test.sh, 52 cases)
  • Task 13: Update all modified test suites to pass with new integration points

Task Decomposition (with dependencies)

  1. Task 1 (context-error.sh core) — no dependencies
  2. Task 2 (memory query) — depends on Task 1
  3. Task 3 (action generation) — depends on Tasks 1, 2
  4. Task 4 (terminal formatting) — depends on Tasks 1-3
  5. Task 5 (GitHub formatting) — depends on Tasks 1-3
  6. Task 6 (suggestion tracking) — depends on Task 1
  7. Task 7 (event schema) — independent
  8. Task 8 (pipeline-state integration) — depends on Tasks 4, 6
  9. Task 9 (daemon-failure integration) — depends on Tasks 5, 6
  10. Task 10 (loop-iteration integration) — depends on Tasks 2, 3
  11. Task 11 (completion integration) — depends on Task 6
  12. Task 12 (test suite) — depends on Tasks 1-6
  13. Task 13 (existing test updates) — depends on Tasks 8-11

Testing Approach

Test file: scripts/sw-lib-context-error-test.sh (52 test cases)

Categories tested:

  • categorize_error() — All 10 error categories with representative error strings
  • query_similar_failures() — Mocked memory_ranked_search returning JSON; missing function fallback; empty memory fallback
  • generate_suggested_actions() — Category-specific actions; stage-specific actions; memory-based fix suggestions; combination testing
  • format_context_error_report() — Section presence; field population (stage, iteration, issue, goal); NO_COLOR support
  • format_github_error_comment() — Markdown structure; details/summary tags; field population
  • record_suggestion() — JSONL append; field validation; event emission
  • mark_suggestion_resolved() — Atomic update; correct ID matching; event emission
  • resolve_outstanding_suggestions() — Batch resolution; already-resolved preservation

Integration testing: Existing test suites for pipeline-state, daemon-failure, and loop-iteration updated to account for new integration points.

Run: npm test (executes all 150+ test suites including the new one)


Definition of Done

  • scripts/lib/context-error.sh exists with all 8 public functions
  • Error messages include stage name, iteration count, and original issue/goal
  • Memory system queried for similar past failures (with graceful degradation)
  • 2-3 concrete next actions generated based on error type + history + stage
  • Log snippets included with context (filtered, ANSI-stripped, last 10 lines)
  • Output formatted with 4 sections: What Failed / Why / Similar Past Issues / Suggested Actions
  • GitHub comment format with markdown, collapsible details, and structured sections
  • Suggestion feedback loop: record suggestions, mark resolved on success, emit events
  • Integration into pipeline-state.sh, daemon-failure.sh, loop-iteration.sh, pipeline-stages-delivery.sh
  • New events registered in config/event-schema.json
  • 52 test cases pass in sw-lib-context-error-test.sh
  • All existing test suites pass (npm test)

Alternatives Considered

Approach Complexity Performance Maintainability Blast Radius
A: Standalone library (chosen) Medium Good (5s timeout) High (isolated, testable) Low (4 integration points)
B: Override error() globally Low Poor (every error triggers query) Low (hard to test, side effects) Very high (all error paths)
C: Post-process stderr High Poor (delayed) Medium (complex capture) Medium (stderr interception)

Risk Analysis

Risk Impact Likelihood Mitigation
Memory query timeout Slow error reporting Low 5-second timeout guard; fallback to []
Memory system unavailable No similar failures shown Medium Graceful type check before calling; [] default
jq unavailable JSON processing fails Low All jq calls wrapped with `2>/dev/null
Suggestion file corruption Lost feedback data Low Atomic writes via tmp + mv
Large error messages Bloated reports Medium tail -10 limit; cut -c1-120 for memory text

Root Cause Hypothesis

Not applicable — No previous plan stage failure to diagnose.

Evidence Gathered

The implementation already exists in commit fded001 with all acceptance criteria met.

Fix Strategy

Not applicable — No failure to fix.

Verification Plan

  1. Run npm test to verify all 150+ test suites pass
  2. Run scripts/sw-lib-context-error-test.sh specifically to verify the 52 context-error test cases
  3. Verify integration points by checking that pipeline-state, daemon-failure, and loop-iteration properly call context-error functions

Endpoint Specification

Not applicable — Shell library module, not an API endpoint.

Error Codes

Not applicable — Shell functions use return codes (0 = success) with graceful degradation.

Rate Limiting

Not applicable — Internal library; memory queries have a 5-second timeout guard.

Versioning

Version: 3.2.4 (synced with all scripts via VERSION variable). No breaking changes — all new functions are additive.

Clone this wiki locally