
Pipeline Design 189

ezigus edited this page Mar 18, 2026 · 2 revisions

ADR written to .claude/pipeline-artifacts/design.md.

Key architectural decisions documented:

  • Standalone library (context-error.sh) with 8 public functions — chosen over global error() override (blast radius too high) and stderr post-processing (can't inject into loop prompts mid-iteration)
  • 4 integration points via type guards — pipeline-state, daemon-failure, loop-iteration, sw-pipeline completion — all degrade gracefully when the library isn't loaded
  • 5-second timeout on memory queries prevents IO-bound lookups from stalling error output
  • Unidirectional dependency graph — callers → context-error → memory + helpers, no cycles
  • Atomic JSONL writes for suggestion tracking with tmp+mv pattern
  • All 15 validation criteria checked off against the existing implementation in commit fded001
  • All jq usage must tolerate jq being absent
  • Atomic file writes required (pipefail + concurrent workers)

Decision

Standalone library module (scripts/lib/context-error.sh) with 8 pure-ish functions, integrated into 4 existing modules via explicit type func_name >/dev/null 2>&1 guard checks. No global side effects — callers opt in by sourcing the library and calling functions directly.
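
A minimal sketch of the guard pattern at a call site, assuming a caller that wants the enriched report when the library is loaded and a plain one-line message otherwise. The wrapper name report_stage_failure_sketch and the fallback message format are hypothetical; only the type ... >/dev/null 2>&1 check and the five-argument call shape come from this page.

```shell
#!/usr/bin/env bash
# Hypothetical call-site wrapper: use the library function only if it exists.
report_stage_failure_sketch() {
  local log_tail="$1" stage="$2" report
  # `type` succeeds only when context-error.sh has been sourced.
  if type format_context_error_report >/dev/null 2>&1 \
     && report="$(format_context_error_report "$log_tail" "$stage" 0 "" "" 2>/dev/null)" \
     && [ -n "$report" ]; then
    printf '%s\n' "$report"
  else
    # Library absent, failed, or produced nothing: keep the plain output.
    printf 'Stage %s failed: %s\n' "$stage" "$log_tail"
  fi
}
```

Capturing the report in a variable before printing means a function that fails halfway does not leak partial output ahead of the fallback line.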

Component Diagram

┌──────────────────────────────────────────────────────────────┐
│                     Callers (Integration Points)             │
│                                                              │
│  pipeline-state.sh    daemon-failure.sh    loop-iteration.sh │
│  mark_stage_failed()  daemon_on_failure()  compose_prompt()  │
│         │                    │                    │           │
│         ▼                    ▼                    ▼           │
│  format_context_     format_github_     query_similar_       │
│  error_report()      error_comment()    failures() +         │
│         │                    │           categorize_error() + │
│         │                    │           generate_actions()   │
│         └────────┬───────────┘                    │           │
│                  ▼                                │           │
│  ┌─────────────────────────────────┐              │           │
│  │     context-error.sh            │◄─────────────┘           │
│  │                                 │                          │
│  │  categorize_error()             │                          │
│  │  query_similar_failures()  ─────┼──► sw-memory.sh          │
│  │  generate_suggested_actions()   │    (memory_ranked_search)│
│  │  format_context_error_report()  │                          │
│  │  format_github_error_comment()  │                          │
│  │  record_suggestion()       ─────┼──► suggestions.jsonl     │
│  │  mark_suggestion_resolved()     │                          │
│  │  resolve_outstanding_suggestions│                          │
│  └─────────────────────────────────┘                          │
│                  │                                            │
│                  ▼                                            │
│  sw-pipeline.sh (completion) ── resolve_outstanding_          │
│                                  suggestions()               │
│                                                              │
│  helpers.sh ── emit_event() ──► events.jsonl                 │
└──────────────────────────────────────────────────────────────┘

Dependencies flow one direction: callers → context-error → memory + helpers. No circular dependencies.

Interface Contracts

// Error categorization — maps error text to one of 10 known categories
categorize_error(error_msg: string): ErrorCategory
// Returns: "FILE_ACCESS" | "FUNCTION_ERROR" | "SYNTAX_ERROR" | "ASSERTION_FAILURE"
//        | "TYPE_ERROR" | "TIMEOUT" | "MEMORY_ERROR" | "NETWORK_ERROR"
//        | "RESOURCE_ERROR" | "UNKNOWN"
// Errors: never fails (falls through to "UNKNOWN")

// Memory query with timeout guard
query_similar_failures(error_msg: string, max_results?: number = 3): JSONArray
// Returns: JSON array of similar past failures, or "[]"
// Errors: returns "[]" if memory_ranked_search unavailable, dir missing, or timeout
// Timeout: 5 seconds hard limit via `timeout` command

// Action generation — combines memory fixes + category defaults + stage context
generate_suggested_actions(
  category: ErrorCategory,
  failure_class: string,  // unused, reserved for future error-actionability bridge
  stage_id: string,
  similar_json: JSONArray
): string  // newline-separated action list (2-4 items)
// Errors: never fails (always produces at least 2 default actions)

// Terminal-formatted 4-section report
format_context_error_report(
  error_msg: string,
  stage_id: string,
  iteration: number,
  goal: string,
  issue_number: string
): string  // multi-line formatted report
// Errors: returns partial report on internal failures (each section independent)
// Respects NO_COLOR env var for plain-text output

// GitHub markdown-formatted report
format_github_error_comment(
  error_msg: string,
  stage_id: string,
  iteration: number,
  goal: string,
  issue_number: string
): string  // markdown with collapsible <details> for error logs
// Errors: same graceful degradation as terminal format

// Suggestion tracking — append to JSONL
record_suggestion(
  suggestion_id: string,
  category: ErrorCategory,
  stage_id: string,
  actions_json: JSONArray
): void
// Side effects: appends to $ARTIFACTS_DIR/suggestions.jsonl (atomic via tmp+mv)
//               emits suggestion.recorded event

// Individual resolution
mark_suggestion_resolved(suggestion_id: string, resolved?: string = "true"): void
// Side effects: atomically rewrites suggestions.jsonl, emits suggestion.resolved event
// Errors: no-op if file missing or ID not found

// Batch resolution on pipeline success
resolve_outstanding_suggestions(): void
// Side effects: marks all unresolved suggestions as resolved (atomic rewrite)
// Errors: no-op if file missing
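
The "atomic via tmp+mv" side effect listed for record_suggestion can be sketched roughly as follows. This is an illustration under assumptions, not the library's code: the helper name atomic_append_line_sketch is hypothetical, and it takes an already-serialized JSON line so the example does not depend on jq.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append one line so readers never see a partial file.
atomic_append_line_sketch() {
  local file="$1" line="$2" dir tmp
  dir="$(dirname "$file")"
  # Tracking data is best-effort: fail silently if the directory is unusable.
  mkdir -p "$dir" 2>/dev/null || return 0
  # The temp file lives next to the target so mv is a same-filesystem rename.
  tmp="$(mktemp "${file}.tmp.XXXXXX" 2>/dev/null)" || return 0
  if [ -f "$file" ]; then
    cat "$file" > "$tmp" || { rm -f "$tmp"; return 0; }
  fi
  printf '%s\n' "$line" >> "$tmp"
  mv "$tmp" "$file"
}
```

Two writers that copy the file at the same moment can still overwrite each other's line, so this gives readers a consistent file rather than serialized writes.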

Data Flow

Error path (pipeline stage fails):

Stage fails
  → pipeline-state.sh:mark_stage_failed()
    → reads last 5 lines from stage log
    → format_context_error_report(log_tail, stage, iteration, goal, issue)
      → categorize_error(log_tail) → category
      → query_similar_failures(log_tail, 3) → [similar matches] (5s timeout)
      → generate_suggested_actions(category, "", stage, similar) → actions
      → assemble 4-section report
    → save_artifact("context-error-{stage}.md", report)
    → record_suggestion(id, category, stage, actions_json)
    → emit_event("error.context_generated")

Daemon failure path (GitHub comment):

Daemon worker fails
  → daemon-failure.sh:daemon_on_failure()
    → format_github_error_comment(log_tail, "pipeline", 0, goal, issue)
      → same internal flow as above, markdown output
    → appended to GitHub issue comment body

Loop iteration path (prompt injection):

Build loop iteration with prior errors
  → loop-iteration.sh:compose_prompt()
    → query_similar_failures(error_lines, 3)
    → categorize_error(error_lines)
    → generate_suggested_actions(category, "", "build", similar)
    → inject "Historical Context" section into Claude prompt

Feedback loop (pipeline success):

Pipeline completes successfully
  → sw-pipeline.sh (completion block)
    → resolve_outstanding_suggestions()
      → rewrites suggestions.jsonl, setting resolved="true" on all null entries
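
As a rough sketch of the batch rewrite in the feedback loop, assuming each JSONL entry carries a resolved field that is null until resolved. The function name resolve_outstanding_sketch and the file argument are hypothetical (the real function takes no arguments); the no-op behaviour when the file or jq is missing follows the constraints stated on this page.

```shell
#!/usr/bin/env bash
# Hypothetical batch resolver: flips resolved=null entries to "true".
resolve_outstanding_sketch() {
  local file="$1" tmp
  [ -f "$file" ] || return 0                   # no-op if file missing
  command -v jq >/dev/null 2>&1 || return 0    # tolerate jq being absent
  tmp="$(mktemp "${file}.tmp.XXXXXX" 2>/dev/null)" || return 0
  # Entries that are already resolved pass through untouched.
  if jq -c 'if .resolved == null then .resolved = "true" else . end' \
       "$file" > "$tmp" 2>/dev/null; then
    mv "$tmp" "$file"
  else
    rm -f "$tmp"    # malformed input: leave the original file as it was
  fi
}
```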

Error Boundaries

| Component | Error Source | Handling |
|---|---|---|
| query_similar_failures | memory_ranked_search not loaded | type check → return [] |
| query_similar_failures | Memory dir doesn't exist | Dir check → return [] |
| query_similar_failures | Query takes too long | timeout 5 → return [] |
| generate_suggested_actions | jq parse failure on similar_json | 2>/dev/null fallback → skip memory-based fix, use category defaults |
| record_suggestion | jq unavailable | 2>/dev/null on jq call; file may not be written |
| record_suggestion | ARTIFACTS_DIR not writable | mkdir -p with fallback; silent failure |
| mark_suggestion_resolved | Concurrent writers | Atomic tmp + mv; last writer wins (acceptable for tracking data) |
| All callers | context-error.sh not sourced | type func_name >/dev/null 2>&1 guard before every call |
| All callers | Any function throws | 2>/dev/null with fallback wrapping at call sites |

Every integration point is guarded with type ... >/dev/null 2>&1 so the system operates identically when context-error.sh is not loaded. No caller can fail due to this module being absent or broken.
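
The three query_similar_failures boundaries (function not loaded, directory missing, query too slow) might be combined along these lines. This is a sketch under assumptions: MEMORY_DIR_SKETCH and QUERY_TIMEOUT_SKETCH are hypothetical variable names, the two-argument call to memory_ranked_search is assumed rather than taken from sw-memory.sh, and returning [] when the timeout command itself is unavailable is a conservative choice made here.

```shell
#!/usr/bin/env bash
# Hypothetical guarded query: every failure mode collapses to "[]".
query_similar_failures_sketch() {
  local error_msg="$1" max_results="${2:-3}" out
  local memory_dir="${MEMORY_DIR_SKETCH:-}"
  type memory_ranked_search >/dev/null 2>&1 || { echo "[]"; return 0; }
  [ -d "$memory_dir" ] || { echo "[]"; return 0; }
  command -v timeout >/dev/null 2>&1 || { echo "[]"; return 0; }
  # timeout can only run commands, not shell functions, so export the
  # function and invoke it inside a child bash.
  export -f memory_ranked_search
  out="$(timeout "${QUERY_TIMEOUT_SKETCH:-5}" \
           bash -c 'memory_ranked_search "$1" "$2"' _ "$error_msg" "$max_results" \
           2>/dev/null)" || out="[]"
  [ -n "$out" ] || out="[]"
  printf '%s\n' "$out"
}
```

The export -f step matters: passing a function name straight to timeout would fail with "command not found", because timeout executes a separate program.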

Alternatives Considered

  1. Override global error() function — Pros: zero integration work, every error path automatically enriched. Cons: massive blast radius (all error calls would trigger memory queries), 5-second timeout penalty on every error, untestable (global state mutation), would break simple error output in non-pipeline contexts. Rejected because the cost model is inverted: most errors don't need memory context, but all would pay the latency price.

  2. Post-process stderr after stage completion — Pros: no source code changes to existing modules. Cons: requires complex stderr capture (tee + temp files), delays error feedback until after the stage completes (defeats the purpose of real-time context), loses the ability to inject context into loop prompts mid-iteration. Rejected because the loop-iteration use case requires error context during execution, not after.

Implementation Plan

Files created

  • scripts/lib/context-error.sh — 431-line core library (8 public functions, include guard, version-stamped)
  • scripts/sw-lib-context-error-test.sh — 52-case test suite covering all functions, edge cases, and event emission

Files modified

  • scripts/lib/pipeline-state.sh: mark_stage_failed() calls format_context_error_report + record_suggestion (lines ~440-464)
  • scripts/lib/daemon-failure.sh: daemon_on_failure() calls format_github_error_comment (lines ~371-376)
  • scripts/lib/loop-iteration.sh: compose_prompt() injects historical error context via query_similar_failures + categorize_error + generate_suggested_actions (lines ~121-132)
  • scripts/sw-pipeline.sh: Pipeline completion block calls resolve_outstanding_suggestions (line ~2746)
  • config/event-schema.json: Registers 3 new events: error.context_generated, suggestion.recorded, suggestion.resolved

Dependencies

  • No new external dependencies
  • Runtime dependency on jq (with fallbacks for absence)
  • Optional dependency on memory_ranked_search from sw-memory.sh (graceful degradation)

Risk areas

  • Memory query latency: Mitigated by 5-second timeout command wrapper. If the memory corpus is very large, queries could approach this limit. Monitor via suggestion.recorded event timestamps.
  • suggestions.jsonl concurrent access: mark_suggestion_resolved and resolve_outstanding_suggestions both do full-file rewrite via tmp+mv. In theory, two concurrent pipelines could race. Acceptable for tracking/analytics data — not a correctness concern.
  • Error category accuracy: Regex-based categorization (categorize_error) is necessarily heuristic. The ordered regex cascade means a message matching both "timeout" and "assertion" patterns would be classified by whichever regex appears first. This is acceptable because the category drives suggestions, not control flow.
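
To illustrate the ordered cascade described in the last risk item, here is a small sketch. The patterns, their order, and the name categorize_error_sketch are assumptions for illustration; only the category names and the first-match-wins behaviour come from this page, and only five of the ten categories are shown.

```shell
#!/usr/bin/env bash
# Hypothetical ordered regex cascade: rows are checked top to bottom.
categorize_error_sketch() {
  local msg category pattern
  # Lowercase once so the patterns can stay simple.
  msg="$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"
  # Each row is CATEGORY|extended-regex; the first matching row wins.
  while IFS='|' read -r category pattern; do
    if grep -Eq -- "$pattern" <<<"$msg"; then
      printf '%s\n' "$category"
      return 0
    fi
  done <<'EOF'
FILE_ACCESS|no such file|permission denied
TIMEOUT|timed out|timeout
ASSERTION_FAILURE|assert
SYNTAX_ERROR|syntax error|unexpected token
NETWORK_ERROR|connection refused|could not resolve host
EOF
  # Nothing matched: fall through instead of failing.
  printf 'UNKNOWN\n'
}
```

With this ordering, categorize_error_sketch 'Assertion failed: request timed out' prints TIMEOUT, because the TIMEOUT row sits above ASSERTION_FAILURE. Swapping the two rows would flip the result, which is the ambiguity the risk item accepts.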

Validation Criteria

  • scripts/lib/context-error.sh loads without error and exports 8 public functions
  • categorize_error correctly classifies all 10 error categories via regex patterns
  • query_similar_failures returns [] when memory system is unavailable (no crash, no hang)
  • query_similar_failures respects 5-second timeout (tested via mock)
  • format_context_error_report produces all 4 sections: What Failed, Why, Similar Past Issues, Suggested Actions
  • format_context_error_report includes stage name, iteration count, issue number, and goal in output
  • format_github_error_comment produces valid markdown with <details> blocks
  • record_suggestion appends valid JSONL and emits suggestion.recorded event
  • mark_suggestion_resolved atomically updates the correct entry and emits suggestion.resolved event
  • resolve_outstanding_suggestions batch-resolves unresolved entries without touching already-resolved ones
  • All 4 integration points use type ... >/dev/null 2>&1 guards (zero impact when library not loaded)
  • All 52 test cases pass in scripts/sw-lib-context-error-test.sh
  • Full test suite (npm test) passes with no regressions
  • NO_COLOR environment variable produces plain-text output (no Unicode box-drawing)
  • New events registered in config/event-schema.json with correct required/optional fields
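
The NO_COLOR criterion could be exercised against a header helper shaped roughly like this. It is a sketch only: section_header_sketch and both output styles are hypothetical, and it assumes the common convention that NO_COLOR counts as set when it is non-empty.

```shell
#!/usr/bin/env bash
# Hypothetical section header: plain ASCII when NO_COLOR is set.
section_header_sketch() {
  local title="$1"
  if [ -n "${NO_COLOR:-}" ]; then
    # Plain text: no ANSI escapes, no Unicode box-drawing characters.
    printf '== %s ==\n' "$title"
  else
    # Bold red with box-drawing rules for interactive terminals.
    printf '\033[1;31m━━ %s ━━\033[0m\n' "$title"
  fi
}
```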
