
Pipeline Design 211

Seth Ford edited this page Mar 9, 2026 · 1 revision


Design: Memory Pattern Effectiveness Tracker with Proactive Failure Prevention Scoring

Context

Shipwright's memory system (sw-memory.sh, 2118 lines) captures failure patterns from pipeline runs into ~/.shipwright/memory/<repo-hash>/failures.json. A library module lib/memory-effectiveness.sh (507 lines) already provides core tracking primitives: injection recording, outcome tracking, scoring, ranking, pruning, and reporting. However:

  1. No pre-pipeline scoring — Memory is injected blindly. There's no mechanism to match an incoming issue against historical patterns before the pipeline starts, so irrelevant patterns waste context tokens.
  2. Hooks are disconnected: memeff_on_injection() and memeff_on_pipeline_complete() exist but are never called from sw-loop.sh or pipeline-commands.sh.
  3. Outcome tracking is shallow — Current tracking records avoided_error: true/false but doesn't distinguish "failure prevented" from "failure unrelated to injected pattern."
  4. No CLI surface — Effectiveness data is inaccessible to operators. No shipwright memory effectiveness or shipwright memory score-issue commands exist.

Constraints:

  • Bash 3.2 compatibility (macOS default) — no associative arrays, no readarray
  • JSONL storage (~/.shipwright/optimization/) — append-only, atomic writes via tmp+mv
  • Must not break existing always-inject behavior (new scoring adds an optional filter layer)
  • Cold start: must return score 0 and inject nothing when no historical data exists
  • Must be backwards-compatible with existing JSONL files (additive fields only)
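
The append-only, tmp+mv constraint can be sketched roughly as follows. The helper name jsonl_append_atomic is hypothetical and is not taken from lib/memory-effectiveness.sh. The sketch assumes the record is already a single-line JSON string and that the temp file lives in the same directory as the target, so that mv is a rename rather than a cross-filesystem copy.

```shell
# Hypothetical helper (illustration only): append one record to a JSONL
# file by writing a full copy to a temp file and renaming it into place.
jsonl_append_atomic() {
    local file="$1" record="$2" dir tmp
    dir=$(dirname "$file")
    mkdir -p "$dir" || return 1
    # Temp file sits next to the target so mv stays on one filesystem.
    tmp=$(mktemp "$dir/.jsonl.XXXXXX") || return 1
    # Carry over existing records, if the file already exists.
    if [ -f "$file" ]; then
        cat "$file" > "$tmp" || { rm -f "$tmp"; return 1; }
    fi
    printf '%s\n' "$record" >> "$tmp" || { rm -f "$tmp"; return 1; }
    # Readers see either the old file or the new one, never a partial write.
    mv "$tmp" "$file"
}
```

This sketch does no locking, so two concurrent writers could still lose a record; whether that matters depends on how often pipelines complete at the same instant.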

Decision

Approach: Enhance-and-Wire (not rebuild)

Extend lib/memory-effectiveness.sh with one new function (memeff_score_issue) and enhanced outcome fields. Wire the existing but disconnected hooks into three integration points. Expose via CLI subcommands on sw-memory.sh.

Component Architecture

┌─────────────────────────────────────────────────────────────────┐
│                        CLI Layer                                 │
│  sw-memory.sh: effectiveness [text|json|strategic]               │
│  sw-memory.sh: score-issue <title> <body> [labels] [files]      │
└────────────────────────┬────────────────────────────────────────┘
                         │ dispatches to
┌────────────────────────▼────────────────────────────────────────┐
│              lib/memory-effectiveness.sh                          │
│                                                                  │
│  NEW: memeff_score_issue()                                       │
│    ├─ keyword_overlap_score (title+body vs pattern+root_cause)   │
│    ├─ label_match_score (issue labels vs pattern categories)     │
│    ├─ file_overlap_score (affected files vs pattern file_paths)  │
│    └─ effectiveness_weight (historical prevention_rate boost)    │
│                                                                  │
│  ENHANCED: memeff_track_outcome()                                │
│    ├─ pattern_injected: bool                                     │
│    ├─ failure_occurred: bool                                     │
│    ├─ failure_type_matched: bool                                 │
│    └─ failure_prevented: bool (derived: injected && !occurred)   │
│                                                                  │
│  ENHANCED: memeff_report()                                       │
│    ├─ injection_success_rate                                     │
│    ├─ patterns_needing_refinement list                           │
│    └─ strategic export format (JSON for agent consumption)       │
│                                                                  │
│  EXISTING (unchanged):                                           │
│    memeff_track_injection, memeff_score_pattern,                  │
│    memeff_rank_patterns, memeff_prune_ineffective,               │
│    memeff_proactive_score, memeff_on_injection,                  │
│    memeff_on_pipeline_complete                                    │
└───────┬──────────────────────────┬──────────────────────────────┘
        │                          │
   Injection Hook              Completion Hook
        │                          │
┌───────▼──────────┐    ┌─────────▼───────────────────────────┐
│   sw-loop.sh      │    │  lib/pipeline-commands.sh            │
│                   │    │  lib/daemon-dispatch.sh              │
│  memory_closed_   │    │                                      │
│  loop_inject()    │    │  pipeline.completed event handler    │
│  memory_inject_   │    │  daemon_reap_completed()             │
│  context()        │    │                                      │
│  + memeff_on_     │    │  + memeff_on_pipeline_complete()     │
│    injection()    │    │    called with outcome + error_class  │
└──────────────────┘    └──────────────────────────────────────┘

Data Flow

Issue arrives
    │
    ▼
memeff_score_issue(title, body, labels, files)
    │
    ├─ Reads: ~/.shipwright/memory/<repo>/failures.json (pattern catalog)
    ├─ Reads: ~/.shipwright/optimization/memory-outcomes.jsonl (historical effectiveness)
    │
    ▼
Score 0-100 returned
    │
    ├─ score < 30 (default threshold) → No injection (cold start / low match)
    ├─ score >= 30 → Inject top-N matching patterns into pipeline prompt
    │
    ▼
memeff_track_injection(memory_id, pipeline_id, stage, context)
    │  Writes: ~/.shipwright/optimization/memory-injections.jsonl
    │
    ▼
Pipeline runs (build → test → review → ...)
    │
    ▼
Pipeline completes (success or failure)
    │
    ▼
memeff_on_pipeline_complete(pipeline_id, outcome, error_context)
    │  Reads: memory-injections.jsonl (find what was injected)
    │  Writes: memory-outcomes.jsonl (with enhanced fields)
    │  Emits: memeff.outcome event
    │
    ▼
memeff_report() / memeff_rank_patterns()
    │  Reads: memory-outcomes.jsonl
    │  Aggregates: prevention_rate, relevance_rate, effectiveness_score
    │  Outputs: text / json / strategic format
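
The injection gate in the flow above might look like the following sketch. The function name should_inject is hypothetical. Treating a non-numeric score as "no injection", and falling back to 30 when MEMEFF_INJECTION_THRESHOLD is not a plain integer, are assumptions made here to keep cold start safe.

```shell
# Hypothetical gate: succeed (exit 0) only when the score meets the threshold.
should_inject() {
    local score="$1"
    local threshold="${MEMEFF_INJECTION_THRESHOLD:-30}"
    # Empty or non-numeric score: treat as no data, never inject.
    case "$score" in
        ''|*[!0-9]*) return 1 ;;
    esac
    # Assumption: an invalid threshold falls back to the documented default.
    case "$threshold" in
        ''|*[!0-9]*) threshold=30 ;;
    esac
    [ "$score" -ge "$threshold" ]
}
```

A caller could then write `if should_inject "$score"; then ...; fi` and leave the always-inject path untouched when the gate is not used.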

Interface Contracts

# NEW — Pre-pipeline issue scoring
# Returns: JSON {"score": 0-100, "matching_patterns": [...], "recommended_injections": [...]}
memeff_score_issue(issue_title, issue_body, issue_labels, affected_files)
# Errors: Returns '{"score":0,"matching_patterns":[],"recommended_injections":[]}' on any failure (cold start safe)

# ENHANCED — Outcome tracking with richer fields
# outcome_record schema (JSONL):
# {
#   "memory_id": string,
#   "pipeline_id": string,
#   "outcome": "success" | "failure" | "inconclusive",
#   "was_relevant": boolean,
#   "avoided_error": boolean,
#   "pattern_injected": boolean,        # NEW
#   "failure_occurred": boolean,         # NEW
#   "failure_type_matched": boolean,     # NEW
#   "failure_prevented": boolean,        # NEW (derived)
#   "error_description": string,
#   "recorded_at": ISO8601
# }

# ENHANCED — Report with strategic export
# format: "text" (human), "json" (machine), "strategic" (agent-consumable)
memeff_report(format)
# strategic format schema:
# {
#   "summary": { "total_patterns": N, "average_score": N, ... },
#   "injection_success_rate": float,
#   "top_patterns": [...],
#   "patterns_needing_refinement": [...],
#   "recommendations": [string]
# }

# CLI subcommands (dispatched by sw-memory.sh case statement):
# shipwright memory effectiveness [text|json|strategic]
# shipwright memory score-issue <title> [--body "..."] [--labels "a,b"] [--files "x,y"]
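
As an illustration of the derived field in the outcome schema, the sketch below computes failure_prevented from pattern_injected and failure_occurred and prints the four new fields as a JSON object. The helper names (normalize_bool, derive_failure_prevented, new_outcome_fields) are hypothetical. Mapping any value other than the literal string true to false is an assumption of this sketch.

```shell
# Hypothetical helpers, not part of the existing module.

# Anything other than the literal string "true" is treated as false.
normalize_bool() {
    if [ "$1" = "true" ]; then echo true; else echo false; fi
}

# failure_prevented = pattern_injected && !failure_occurred
derive_failure_prevented() {
    if [ "$1" = "true" ] && [ "$2" != "true" ]; then
        echo true
    else
        echo false
    fi
}

# Emit only the four new fields; values are normalized booleans, so no
# JSON string escaping is needed here.
new_outcome_fields() {
    local injected occurred matched prevented
    injected=$(normalize_bool "$1")
    occurred=$(normalize_bool "$2")
    matched=$(normalize_bool "$3")
    prevented=$(derive_failure_prevented "$injected" "$occurred")
    printf '{"pattern_injected":%s,"failure_occurred":%s,"failure_type_matched":%s,"failure_prevented":%s}\n' \
        "$injected" "$occurred" "$matched" "$prevented"
}
```

Calling `new_outcome_fields true false false` yields failure_prevented true, while an injected pattern followed by a failure (`true true true`) yields false.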

Error Boundaries

Component | Error Source | Handling
memeff_score_issue | Missing failures.json, empty patterns | Return {"score":0} (cold start safe)
memeff_score_issue | jq parse error on corrupted JSON | Return the zero-score JSON (the interface contract returns it on any failure)
memeff_track_outcome | No matching injection record | Warn + skip (existing behavior)
memeff_on_pipeline_complete | JSONL grep finds no injections | Early return, no outcome written
memeff_report strategic | Empty outcomes file | Return valid JSON with zeroes
Hook wiring in sw-loop.sh | memory-effectiveness.sh not loadable | type memeff_on_injection >/dev/null 2>&1 guard
Hook wiring in pipeline-commands.sh | Module not sourced | Same type guard pattern
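
The type guard used for hook wiring could be wrapped once and reused at each call site. The sketch below is one possible shape. The wrapper name memeff_safe_call and the demo_* functions are stand-ins invented for illustration, not the real Shipwright hooks.

```shell
# Hypothetical wrapper: run a hook only if it is defined, and never let
# its exit status reach the caller (fire-and-forget).
memeff_safe_call() {
    local fn="$1"
    shift
    # `type` is a shell builtin, so this existence check does not fork.
    # Note it also matches external commands, not only shell functions.
    if type "$fn" >/dev/null 2>&1; then
        "$fn" "$@" || true
    fi
    return 0
}

# Stand-in hooks for demonstration only.
demo_hook_ok()   { echo "hook ran: $*"; }
demo_hook_fail() { return 3; }
```

With this in place, `memeff_safe_call demo_hook_fail pipe-42 failure` returns 0 even though the hook itself returned 3, and calling a name that is not defined is a silent no-op.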

Scoring Algorithm (memeff_score_issue)

score = (keyword_overlap * 0.35) + (label_match * 0.20) + (file_overlap * 0.25) + (effectiveness_weight * 0.20)

keyword_overlap:  Tokenize issue title+body, match against pattern + root_cause fields.
                  Score = (matched_keywords / total_pattern_keywords) * 100

label_match:      Map issue labels to pattern categories (e.g., "bug" → "runtime_error").
                  Binary: 100 if any match, 0 otherwise.

file_overlap:     Intersect affected_files with pattern's file_paths (if available).
                  Score = (intersection_count / pattern_file_count) * 100

effectiveness_weight: Historical prevention_rate from memeff_score_pattern().
                     Score = prevention_rate (0-100), defaults to 50 if < 3 data points.

Threshold: inject when aggregate score >= 30 (configurable via MEMEFF_INJECTION_THRESHOLD).
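
A minimal sketch of the keyword_overlap component and the weighted aggregate, using tr, sort, and comm. The function names (tokenize, keyword_overlap, aggregate_score) are hypothetical. Dropping tokens shorter than three characters and using truncating integer arithmetic are assumptions of this sketch, not part of the design above.

```shell
# Lowercase the text and split on non-alphanumerics: one unique token per line.
# Assumption: tokens shorter than 3 characters are dropped as noise.
tokenize() {
    printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' | tr -cs '[:alnum:]' '\n' \
        | awk 'length($0) >= 3' | LC_ALL=C sort -u
}

# keyword_overlap = (matched_keywords / total_pattern_keywords) * 100
keyword_overlap() {
    local issue_text="$1" pattern_text="$2" a b matched total
    a=$(mktemp) || return 1
    b=$(mktemp) || { rm -f "$a"; return 1; }
    tokenize "$issue_text" > "$a"
    tokenize "$pattern_text" > "$b"
    # comm -12 prints only the lines common to both sorted files.
    matched=$(LC_ALL=C comm -12 "$a" "$b" | wc -l | tr -d ' ')
    total=$(wc -l < "$b" | tr -d ' ')
    rm -f "$a" "$b"
    if [ "$total" -eq 0 ]; then
        echo 0
    else
        echo $(( matched * 100 / total ))
    fi
}

# Arguments: keyword_overlap label_match file_overlap effectiveness_weight.
# Weights are the 0.35 / 0.20 / 0.25 / 0.20 factors scaled by 100.
aggregate_score() {
    echo $(( ($1 * 35 + $2 * 20 + $3 * 25 + $4 * 20) / 100 ))
}
```

For the issue text "Auth token expired: login fails after session timeout" against the pattern text "auth token refresh fails session expired", five of the six pattern tokens match, so keyword_overlap prints 83. Combined with a label match of 100, a file overlap of 50, and the default effectiveness weight of 50, `aggregate_score 83 100 50 50` prints 71, which is above the default threshold. A pattern that matches nothing scores only 10 from the default effectiveness weight and is not injected.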

Alternatives Considered

1. Separate SQLite-based tracker

Pros: Proper relational queries, JOIN capability, ACID transactions, indexes for fast lookups.

Cons: Adds a SQLite dependency to the memory-effectiveness module (currently pure jq/JSONL). The project already uses sw-db.sh for other storage, but memory effectiveness is a lightweight append-only workload, so SQLite is over-engineered for ~100-1000 records. It also breaks the pure-bash-library pattern of lib/.

2. AI-powered semantic matching (Claude API call in memeff_score_issue)

Pros: Much better issue-to-pattern matching than keyword overlap. Could understand semantic intent.

Cons: Adds latency (2-5s API call) before every pipeline start. Adds cost ($0.01-0.05 per scoring call). Creates a circular dependency (using Claude to decide what to inject into Claude). Fails when offline or rate-limited, which is exactly when the memory system is needed most.

3. Build from scratch — new sw-memory-effectiveness.sh top-level script

Pros: Clean separation, dedicated CLI entry point, no risk of breaking the existing module.

Cons: Duplicates the 507-line library that already exists and creates two effectiveness-tracking codepaths. The plan explicitly states "modify only, no new files." The existing module was designed for exactly this extension.

Schema Changes

Forward migration — No SQL schema; JSONL records gain additive fields:

// memory-outcomes.jsonl — existing fields preserved, new fields added
{"memory_id":"auth-fix-1","pipeline_id":"pipe-42","outcome":"success","was_relevant":true,"avoided_error":true,"pattern_injected":true,"failure_occurred":false,"failure_type_matched":false,"failure_prevented":true,"error_description":"","recorded_at":"2026-03-09T12:00:00Z"}

Backwards compatibility: New fields default to false/empty via jq's // false fallback. Old readers that don't query these fields are unaffected.

Rollback: Stop writing the new fields in future records. Old records without the new fields already work with the // false defaults. No migration is needed because JSONL is append-only.

Data backfill: Not required. Existing records missing pattern_injected/failure_prevented are treated as having those fields set to false, which is semantically correct (we didn't track it, so we can't claim prevention).
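
Reading a possibly-missing boolean with jq's alternative operator might look like the sketch below. The helper name outcome_bool is hypothetical, and the sketch assumes jq is on PATH, as the design already requires. The // operator falls back when the left side is null or false, so a missing field and an explicit false both read as false.

```shell
# Hypothetical reader: print a boolean field from one JSONL record,
# treating a missing (null) field as false.
outcome_bool() {
    local record="$1" field="$2"
    # --arg passes the field name as a jq string variable $f.
    printf '%s\n' "$record" | jq -r --arg f "$field" '.[$f] // false'
}
```

A legacy record such as `{"memory_id":"m1","avoided_error":true}` therefore reads as false for failure_prevented without any backfill.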

Idempotency Strategy

  • Injection tracking: memeff_track_injection creates a new JSONL record per injection event. Duplicate calls for the same (memory_id, pipeline_id) pair produce duplicate records. This is acceptable: memeff_on_pipeline_complete processes all matching records, and duplicate injections don't change the outcome scoring (outcome is per-pipeline, not per-injection).
  • Outcome tracking: memeff_track_outcome finds the matching injection by (memory_id, pipeline_id) and appends an outcome. Multiple calls for the same pair append duplicate outcomes, but memeff_score_pattern counts all outcomes — so the score slightly skews toward the duplicated outcome. In practice, memeff_on_pipeline_complete is called once at pipeline end, so duplicates don't occur in normal flow.
  • Side-effect safety: All writes use atomic tmp+mv. Event emissions are fire-and-forget. No external API calls.

Rollback Plan

  1. Revert the commits modifying the 5 files
  2. Existing JSONL files in ~/.shipwright/optimization/ remain valid (new fields are simply ignored by the old code)
  3. No external state to clean up (no database, no API registrations)
  4. CLI subcommands in sw-memory.sh simply become unrecognized — fall through to usage help

Implementation Plan

Files to modify (5 total, no new files):

File | Changes
scripts/lib/memory-effectiveness.sh | Add memeff_score_issue(), enhance memeff_track_outcome() with new fields, enhance memeff_report() with strategic format and injection_success_rate
scripts/sw-memory.sh | Add effectiveness and score-issue case branches in CLI dispatch
scripts/sw-loop.sh | Wire memeff_on_injection call after memory_closed_loop_inject / memory_inject_context (~line 2162, 2207)
scripts/lib/pipeline-commands.sh | Wire memeff_on_pipeline_complete call after pipeline.completed emit_event (~line 792, 844)
scripts/lib/daemon-dispatch.sh | Wire memeff_on_pipeline_complete in daemon_reap_completed() (~line 399+)

Dependencies

None new. Uses existing jq, grep, sed, date — all available in Bash 3.2 environments.

Risk Areas

  1. Performance of memeff_score_issue with large failure catalogs — Iterates failures.json entries (typically < 50 per repo). Tokenization uses tr/sort/comm — O(n·m) where n=issue tokens, m=pattern tokens. Should be < 100ms for typical catalogs. Risk: repos with 500+ failure entries could take 1-2s. Mitigation: cap at top-50 patterns by seen_count before scoring.

  2. JSONL file growth: memory-injections.jsonl grows by ~1 record per pipeline run per injected pattern. At 10 pipelines/day with 3 patterns each, that's ~30 records/day, or ~11K/year. At ~200 bytes/record, that's roughly 2 MB/year. memeff_prune_ineffective already archives old records. Low risk.

  3. Hook wiring in sw-loop.sh — The loop runs in tight iteration cycles. Adding a type memeff_on_injection >/dev/null 2>&1 && memeff_on_injection ... guard is cheap (no fork), but the actual JSONL append inside memeff_track_injection forks jq + mktemp + mv. This adds ~50ms per injection. Acceptable since injection happens once at loop start, not per iteration.

  4. Pipeline-commands.sh completion path — Both success and failure paths emit pipeline.completed. The hook must handle both. Risk: if memeff_on_pipeline_complete errors out, it could interfere with the pipeline exit code. Mitigation: wrap in || true to ensure it's fire-and-forget.

Validation Criteria

  • memeff_score_issue "auth token expired" "Login fails after session timeout" "bug" "src/auth.ts" returns a JSON object with score between 0-100 when failures.json contains auth-related patterns
  • memeff_score_issue returns {"score":0,"matching_patterns":[],"recommended_injections":[]} on cold start (no failures.json)
  • Enhanced outcome records contain pattern_injected, failure_occurred, failure_type_matched, failure_prevented fields
  • Old outcome records without new fields still parse correctly (jq '.failure_prevented // false' returns false)
  • shipwright memory effectiveness prints a text report with injection_success_rate
  • shipwright memory effectiveness json returns valid JSON matching the strategic schema
  • shipwright memory score-issue "test title" returns valid JSON with a score
  • memeff_on_injection is called in sw-loop.sh when memory_closed_loop_inject returns a pattern (guarded by type check)
  • memeff_on_pipeline_complete is called in pipeline-commands.sh after both success and failure pipeline.completed events (guarded by || true)
  • Existing 16 tests in sw-memory-effectiveness-test.sh still pass (no regressions)
  • New integration tests cover: score→inject→succeed→outcome and score→inject→fail→outcome flows
  • sw-pipeline-test.sh, sw-memory-test.sh, and sw-daemon-test.sh pass without regressions
  • Scoring threshold (30 default) is configurable via MEMEFF_INJECTION_THRESHOLD environment variable
  • All new functions are Bash 3.2 compatible (no associative arrays, no readarray)
