Pipeline Design 211
Shipwright's memory system (sw-memory.sh, 2118 lines) captures failure patterns from pipeline runs into ~/.shipwright/memory/<repo-hash>/failures.json. A library module lib/memory-effectiveness.sh (507 lines) already provides core tracking primitives: injection recording, outcome tracking, scoring, ranking, pruning, and reporting. However:
- No pre-pipeline scoring: Memory is injected blindly. There is no mechanism to match an incoming issue against historical patterns before the pipeline starts, so irrelevant patterns waste context tokens.
- Hooks are disconnected: `memeff_on_injection()` and `memeff_on_pipeline_complete()` exist but are never called from `sw-loop.sh` or `pipeline-commands.sh`.
- Outcome tracking is shallow: Current tracking records `avoided_error: true/false` but doesn't distinguish "failure prevented" from "failure unrelated to the injected pattern."
- No CLI surface: Effectiveness data is inaccessible to operators. No `shipwright memory effectiveness` or `shipwright memory score-issue` commands exist.
Constraints:
- Bash 3.2 compatibility (macOS default): no associative arrays, no `readarray`.
- JSONL storage (`~/.shipwright/optimization/`): append-only, atomic writes via tmp+mv.
- Must not break existing always-inject behavior (new scoring adds an optional filter layer).
- Cold start: must return score 0 and inject nothing when no historical data exists.
- Must be backwards-compatible with existing JSONL files (additive fields only).
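The append-only, tmp+mv write rule can be illustrated with a minimal Bash 3.2 sketch. It assumes the writer copies the current file into a temp file in the same directory (so the final `mv` is a same-filesystem rename) and then renames it over the original; the helper name `jsonl_append_atomic` is hypothetical and not taken from `lib/memory-effectiveness.sh`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: append one JSON line to a JSONL file atomically.
# Readers see either the old file or the new file, never a half-written line.
# Args: file record
jsonl_append_atomic() {
    local file="$1" record="$2" dir tmp
    dir=$(dirname "$file")
    mkdir -p "$dir" || return 1
    # Temp file in the same directory so the final mv is a rename, not a copy.
    tmp=$(mktemp "${file}.XXXXXX") || return 1
    if [ -f "$file" ]; then
        cat "$file" > "$tmp" || { rm -f "$tmp"; return 1; }
    fi
    printf '%s\n' "$record" >> "$tmp" || { rm -f "$tmp"; return 1; }
    mv -f "$tmp" "$file"
}
```

Note that tmp+mv makes each replacement atomic for readers but does not serialize writers: two concurrent appenders can still lose one record unless a lock is added.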
Extend `lib/memory-effectiveness.sh` with one new function (`memeff_score_issue`) and richer outcome fields. Wire the existing but disconnected hooks into three integration points (`sw-loop.sh`, `lib/pipeline-commands.sh`, and `lib/daemon-dispatch.sh`). Expose the results via CLI subcommands on `sw-memory.sh`.
```
┌─────────────────────────────────────────────────────────────────┐
│ CLI Layer                                                       │
│ sw-memory.sh: effectiveness [text|json|strategic]               │
│ sw-memory.sh: score-issue <title> <body> [labels] [files]       │
└────────────────────────┬────────────────────────────────────────┘
                         │ dispatches to
┌────────────────────────▼────────────────────────────────────────┐
│ lib/memory-effectiveness.sh                                     │
│                                                                 │
│ NEW: memeff_score_issue()                                       │
│   ├─ keyword_overlap_score (title+body vs pattern+root_cause)   │
│   ├─ label_match_score (issue labels vs pattern categories)     │
│   ├─ file_overlap_score (affected files vs pattern file_paths)  │
│   └─ effectiveness_weight (historical prevention_rate boost)    │
│                                                                 │
│ ENHANCED: memeff_track_outcome()                                │
│   ├─ pattern_injected: bool                                     │
│   ├─ failure_occurred: bool                                     │
│   ├─ failure_type_matched: bool                                 │
│   └─ failure_prevented: bool (derived: injected && !occurred)   │
│                                                                 │
│ ENHANCED: memeff_report()                                       │
│   ├─ injection_success_rate                                     │
│   ├─ patterns_needing_refinement list                           │
│   └─ strategic export format (JSON for agent consumption)       │
│                                                                 │
│ EXISTING (unchanged):                                           │
│   memeff_track_injection, memeff_score_pattern,                 │
│   memeff_rank_patterns, memeff_prune_ineffective,               │
│   memeff_proactive_score, memeff_on_injection,                  │
│   memeff_on_pipeline_complete                                   │
└───────┬──────────────────────────┬──────────────────────────────┘
        │                          │
 Injection Hook             Completion Hook
        │                          │
┌───────▼──────────┐     ┌─────────▼───────────────────────────┐
│ sw-loop.sh       │     │ lib/pipeline-commands.sh            │
│                  │     │ lib/daemon-dispatch.sh              │
│ memory_closed_   │     │                                     │
│ loop_inject()    │     │ pipeline.completed event handler    │
│ memory_inject_   │     │ daemon_reap_completed()             │
│ context()        │     │                                     │
│ + memeff_on_     │     │ + memeff_on_pipeline_complete()     │
│   injection()    │     │   called with outcome + error_class │
└──────────────────┘     └─────────────────────────────────────┘
```
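Both hook integration points rely on the same defensive pattern: a `type` check that skips the call when `memory-effectiveness.sh` was not sourced, and `|| true` so a failing hook cannot change the caller's exit status. The sketch below assumes the `memeff_on_pipeline_complete <pipeline_id> <outcome> <error_context>` argument order from the data-flow description; the wrapper `run_completion_hook` is a hypothetical name, not existing Shipwright code.

```bash
#!/usr/bin/env bash
set -eu

# Hypothetical wrapper illustrating the guarded, fire-and-forget hook call.
# Args: pipeline_id outcome [error_context]
run_completion_hook() {
    local pipeline_id="$1" outcome="$2" error_context="${3:-}"
    # `type` is a shell builtin, so this existence check costs no fork.
    if type memeff_on_pipeline_complete >/dev/null 2>&1; then
        # `|| true`: a failing hook must never change the pipeline's exit code,
        # even when the caller runs under `set -e`.
        memeff_on_pipeline_complete "$pipeline_id" "$outcome" "$error_context" || true
    fi
    return 0
}
```

The same shape applies on the injection side with `memeff_on_injection`, whose exact arguments are not specified here.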
```
Issue arrives
│
▼
memeff_score_issue(title, body, labels, files)
│
├─ Reads: ~/.shipwright/memory/<repo>/failures.json (pattern catalog)
├─ Reads: ~/.shipwright/optimization/memory-outcomes.jsonl (historical effectiveness)
│
▼
Score 0-100 returned
│
├─ score < 30 (default threshold) → No injection (cold start / low match)
├─ score >= 30 → Inject top-N matching patterns into pipeline prompt
│
▼
memeff_track_injection(memory_id, pipeline_id, stage, context)
│ Writes: ~/.shipwright/optimization/memory-injections.jsonl
│
▼
Pipeline runs (build → test → review → ...)
│
▼
Pipeline completes (success or failure)
│
▼
memeff_on_pipeline_complete(pipeline_id, outcome, error_context)
│ Reads: memory-injections.jsonl (find what was injected)
│ Writes: memory-outcomes.jsonl (with enhanced fields)
│ Emits: memeff.outcome event
│
▼
memeff_report() / memeff_rank_patterns()
│ Reads: memory-outcomes.jsonl
│ Aggregates: prevention_rate, relevance_rate, effectiveness_score
│ Outputs: text / json / strategic format
```
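The four enhanced outcome fields written by the completion step can be built with `jq -n`, deriving `failure_prevented` as `pattern_injected && !failure_occurred`, exactly as the architecture diagram specifies. This is a sketch assuming the caller already knows the three input booleans; the function name `memeff_enhanced_fields` is hypothetical, and its output would be merged into the existing outcome record (for example with jq's `+` operator).

```bash
#!/usr/bin/env bash
# Hypothetical helper: emit the enhanced outcome fields as compact JSON.
# Args: pattern_injected failure_occurred failure_type_matched ("true"/"false")
# failure_prevented is derived: the pattern was injected and no failure occurred.
memeff_enhanced_fields() {
    jq -nc \
        --argjson injected "$1" \
        --argjson occurred "$2" \
        --argjson matched "$3" \
        '{
            pattern_injected: $injected,
            failure_occurred: $occurred,
            failure_type_matched: $matched,
            failure_prevented: ($injected and ($occurred | not))
        }'
}
```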
```
# NEW: Pre-pipeline issue scoring
# Returns: JSON {"score": 0-100, "matching_patterns": [...], "recommended_injections": [...]}
memeff_score_issue(issue_title, issue_body, issue_labels, affected_files)
# Errors: Returns '{"score":0,"matching_patterns":[],"recommended_injections":[]}' on any failure (cold start safe)

# ENHANCED: Outcome tracking with richer fields
# outcome_record schema (JSONL):
# {
#   "memory_id": string,
#   "pipeline_id": string,
#   "outcome": "success" | "failure" | "inconclusive",
#   "was_relevant": boolean,
#   "avoided_error": boolean,
#   "pattern_injected": boolean,       # NEW
#   "failure_occurred": boolean,       # NEW
#   "failure_type_matched": boolean,   # NEW
#   "failure_prevented": boolean,      # NEW (derived)
#   "error_description": string,
#   "recorded_at": ISO8601
# }

# ENHANCED: Report with strategic export
# format: "text" (human), "json" (machine), "strategic" (agent-consumable)
memeff_report(format)
# strategic format schema:
# {
#   "summary": { "total_patterns": N, "average_score": N, ... },
#   "injection_success_rate": float,
#   "top_patterns": [...],
#   "patterns_needing_refinement": [...],
#   "recommendations": [string]
# }

# CLI subcommands (dispatched by sw-memory.sh case statement):
#   shipwright memory effectiveness [text|json|strategic]
#   shipwright memory score-issue <title> [--body "..."] [--labels "a,b"] [--files "x,y"]
```

| Component | Error Source | Handling |
|---|---|---|
| `memeff_score_issue` | Missing `failures.json`, empty patterns | Return `{"score":0}` (cold start safe) |
| `memeff_score_issue` | `jq` parse error on corrupted JSON | Fall back to the cold-start result `{"score":0,"matching_patterns":[],"recommended_injections":[]}` |
| `memeff_track_outcome` | No matching injection record | Warn + skip (existing behavior) |
| `memeff_on_pipeline_complete` | JSONL grep finds no injections | Early return, no outcome written |
| `memeff_report strategic` | Empty outcomes file | Return valid JSON with zeroes |
| Hook wiring in `sw-loop.sh` | `memory-effectiveness.sh` not loadable | `type memeff_on_injection >/dev/null 2>&1` guard |
| Hook wiring in `pipeline-commands.sh` | Module not sourced | Same `type` guard pattern |
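Every `memeff_score_issue` failure mode in the table above collapses to the same cold-start result. One way to guarantee that contract is a thin wrapper that validates the scorer's output and falls back otherwise. This is a sketch assuming the real scorer is passed in as a command; `score_issue_safe` and `MEMEFF_COLD_START` are hypothetical names.

```bash
#!/usr/bin/env bash
# Cold-start result taken from the memeff_score_issue error contract.
MEMEFF_COLD_START='{"score":0,"matching_patterns":[],"recommended_injections":[]}'

# Hypothetical wrapper: run the scorer command given as arguments and pass its
# output through only if the command succeeded and printed a JSON object with
# a numeric "score". Missing data, corrupted JSON, or a crash all yield the
# cold-start result instead.
score_issue_safe() {
    local out
    if out=$("$@" 2>/dev/null) &&
        printf '%s' "$out" |
            jq -e 'type == "object" and (.score | type) == "number"' >/dev/null 2>&1; then
        printf '%s\n' "$out"
    else
        printf '%s\n' "$MEMEFF_COLD_START"
    fi
}
```

Usage would look like `score_issue_safe memeff_score_issue "$title" "$body" "$labels" "$files"`.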
`score = (keyword_overlap * 0.35) + (label_match * 0.20) + (file_overlap * 0.25) + (effectiveness_weight * 0.20)`

The weights sum to 1.0, so the aggregate stays on the same 0-100 scale as its components.
- `keyword_overlap`: Tokenize the issue title and body, then match the tokens against the pattern's `pattern` and `root_cause` fields. Score = `(matched_keywords / total_pattern_keywords) * 100`.
- `label_match`: Map issue labels to pattern categories (e.g., "bug" → "runtime_error"). Binary: 100 if any label matches, 0 otherwise.
- `file_overlap`: Intersect `affected_files` with the pattern's `file_paths` (if available). Score = `(intersection_count / pattern_file_count) * 100`.
- `effectiveness_weight`: Historical `prevention_rate` from `memeff_score_pattern()`. Score = `prevention_rate` (0-100); defaults to 50 when there are fewer than 3 data points.

Threshold: inject when the aggregate score is >= 30 (configurable via `MEMEFF_INJECTION_THRESHOLD`).
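Because Bash has only integer arithmetic, the weighted sum is easiest to compute in `awk`. This sketch assumes the four component scores have already been computed as 0-100 values; the function names `memeff_aggregate_score` and `memeff_should_inject` are hypothetical, while the weights and the `MEMEFF_INJECTION_THRESHOLD` default of 30 come from the formula above.

```bash
#!/usr/bin/env bash
# Hypothetical: combine the four 0-100 component scores into one 0-100 score.
# Args: keyword_overlap label_match file_overlap effectiveness_weight
memeff_aggregate_score() {
    awk -v k="$1" -v l="$2" -v f="$3" -v e="$4" 'BEGIN {
        s = k * 0.35 + l * 0.20 + f * 0.25 + e * 0.20
        printf "%d\n", s + 0.5    # round to the nearest integer
    }'
}

# Hypothetical: exit 0 (inject) when the score meets the threshold, 1 otherwise.
memeff_should_inject() {
    local score="$1" threshold="${MEMEFF_INJECTION_THRESHOLD:-30}"
    [ "$score" -ge "$threshold" ]
}
```

With no keyword, label, or file matches, the default `effectiveness_weight` of 50 alone contributes 10 points, which stays below the default threshold, so an unmatched issue injects nothing.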
Alternative 1: Store effectiveness data in SQLite.
Pros: Proper relational queries, JOIN capability, ACID transactions, indexes for fast lookups.
Cons: Adds a SQLite dependency to the memory-effectiveness module (currently pure jq/JSONL). The project already uses sw-db.sh for other storage, but memory effectiveness is a lightweight append-only workload, so SQLite is over-engineered for ~100-1000 records. It also breaks the pure-bash-library pattern of lib/.
Alternative 2: Score issues with an LLM (Claude) instead of keyword overlap.
Pros: Much better issue-to-pattern matching than keyword overlap; could understand semantic intent.
Cons: Adds latency (2-5s API call) before every pipeline start. Adds cost ($0.01-0.05 per scoring call). Creates a circular dependency (using Claude to decide what to inject into Claude). Fails when offline or rate-limited, which is exactly when the memory system is needed most.
Alternative 3: A separate, new effectiveness module.
Pros: Clean separation, dedicated CLI entry point, no risk of breaking the existing module.
Cons: Duplicates the 507-line library that already exists and creates two effectiveness-tracking codepaths. The plan explicitly states "modify only, no new files," and the existing module was designed for exactly this extension.
Forward migration: No SQL schema; JSONL records gain additive fields. Example `memory-outcomes.jsonl` record (existing fields preserved, new fields added):
```json
{"memory_id":"auth-fix-1","pipeline_id":"pipe-42","outcome":"success","was_relevant":true,"avoided_error":true,"pattern_injected":true,"failure_occurred":false,"failure_type_matched":false,"failure_prevented":true,"error_description":"","recorded_at":"2026-03-09T12:00:00Z"}
```
Backwards compatibility: New fields default to false/empty via jq's `// false` fallback. Old readers that don't query these fields are unaffected.
Rollback: Stop writing the new fields in future records. Old records without the new fields already work with `// false` defaults. No migration needed, since JSONL is append-only.
Data backfill: Not required. Existing records missing pattern_injected/failure_prevented are treated as having those fields set to false, which is semantically correct (we didn't track it, so we can't claim prevention).
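The `// false` fallback is what lets pre- and post-migration records coexist in one file. A sketch, assuming one compact JSON object per line; `count_prevented` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical: count outcome records that claim a prevented failure.
# Records written before the migration lack failure_prevented; `// false`
# treats them as "not prevented" rather than counting a null.
# -s slurps every JSONL line into one array.
count_prevented() {
    jq -s '[.[] | select((.failure_prevented // false) == true)] | length' "$1"
}
```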
- Injection tracking: `memeff_track_injection` creates a new JSONL record per injection event. Duplicate calls for the same `(memory_id, pipeline_id)` pair produce duplicate records. This is acceptable: `memeff_on_pipeline_complete` processes all matching records, and duplicate injections don't change the outcome scoring (outcome is per-pipeline, not per-injection).
- Outcome tracking: `memeff_track_outcome` finds the matching injection by `(memory_id, pipeline_id)` and appends an outcome. Multiple calls for the same pair append duplicate outcomes, and `memeff_score_pattern` counts all outcomes, so the score skews slightly toward the duplicated outcome. In practice, `memeff_on_pipeline_complete` is called once at pipeline end, so duplicates don't occur in normal flow.
- Side-effect safety: All writes use atomic tmp+mv. Event emissions are fire-and-forget. No external API calls.
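If the completion path reduces the matching injection records to distinct `memory_id` values before writing outcomes, repeated injection calls cannot turn into repeated outcomes. A sketch of that reduction using only `grep`, `sed`, and `sort`, assuming compact JSON with no whitespace around `:` (as in the example record in the migration notes); `injected_memory_ids` is a hypothetical name and illustrates an option, not the module's current behavior.

```bash
#!/usr/bin/env bash
# Hypothetical: list the distinct memory IDs injected into one pipeline.
# Assumes compact JSONL (no whitespace around ':'), as `jq -c` produces.
# Args: injections_file pipeline_id
injected_memory_ids() {
    local file="$1" pipeline_id="$2"
    [ -f "$file" ] || return 0    # no injections file: nothing to report
    # The closing quote in the pattern keeps "pipe-42" from matching "pipe-420".
    grep -F "\"pipeline_id\":\"${pipeline_id}\"" "$file" \
        | sed -n 's/.*"memory_id":"\([^"]*\)".*/\1/p' \
        | sort -u
}
```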
- Revert the commits modifying the 5 files.
- Existing JSONL files in `~/.shipwright/optimization/` remain valid (new fields are simply ignored by the old code).
- No external state to clean up (no database, no API registrations).
- CLI subcommands in `sw-memory.sh` simply become unrecognized and fall through to usage help.
| File | Changes |
|---|---|
| `scripts/lib/memory-effectiveness.sh` | Add `memeff_score_issue()`, enhance `memeff_track_outcome()` with new fields, enhance `memeff_report()` with strategic format and `injection_success_rate` |
| `scripts/sw-memory.sh` | Add `effectiveness` and `score-issue` case branches in CLI dispatch |
| `scripts/sw-loop.sh` | Wire `memeff_on_injection` call after `memory_closed_loop_inject` / `memory_inject_context` (~line 2162, 2207) |
| `scripts/lib/pipeline-commands.sh` | Wire `memeff_on_pipeline_complete` call after `pipeline.completed` `emit_event` (~line 792, 844) |
| `scripts/lib/daemon-dispatch.sh` | Wire `memeff_on_pipeline_complete` in `daemon_reap_completed()` (~line 399+) |
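The two new `sw-memory.sh` case branches can be sketched in Bash 3.2 with a plain `while`/`case` option loop. This follows the flag form of the `score-issue` contract (`<title> [--body ...] [--labels ...] [--files ...]`); `memory_cli` is a hypothetical stand-in for the script's real dispatch, and `memeff_report` / `memeff_score_issue` are expected to come from `lib/memory-effectiveness.sh`.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the two new case branches in sw-memory.sh's dispatch.
memory_cli() {
    local cmd="${1:-help}"
    [ $# -gt 0 ] && shift
    case "$cmd" in
        effectiveness)
            memeff_report "${1:-text}"    # text | json | strategic
            ;;
        score-issue)
            local title="${1:-}" body="" labels="" files=""
            if [ -z "$title" ]; then
                echo "usage: shipwright memory score-issue <title> [--body ...] [--labels a,b] [--files x,y]" >&2
                return 1
            fi
            shift
            while [ $# -gt 0 ]; do
                case "$1" in
                    --body|--labels|--files)
                        # Without this check, `shift 2` fails on a missing value
                        # and the loop never terminates.
                        if [ $# -lt 2 ]; then
                            echo "missing value for $1" >&2
                            return 1
                        fi
                        case "$1" in
                            --body)   body="$2" ;;
                            --labels) labels="$2" ;;
                            --files)  files="$2" ;;
                        esac
                        shift 2
                        ;;
                    *)
                        echo "unknown option: $1" >&2
                        return 1
                        ;;
                esac
            done
            memeff_score_issue "$title" "$body" "$labels" "$files"
            ;;
        *)
            # Unknown subcommands fall through to usage help.
            echo "usage: shipwright memory <effectiveness|score-issue> ..." >&2
            return 1
            ;;
    esac
}
```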
None new. Uses jq, grep, sed, and date, which the existing scripts already rely on; none of the new code requires Bash 4 features.
- Performance of `memeff_score_issue` with large failure catalogs: It iterates `failures.json` entries (typically < 50 per repo). Tokenization uses `tr`/`sort`/`comm`, a sort-merge intersection costing roughly O((n + m) log(n + m)) per pattern, where n = issue tokens and m = pattern tokens. Should be < 100ms for typical catalogs. Risk: repos with 500+ failure entries could take 1-2s. Mitigation: cap at the top 50 patterns by `seen_count` before scoring.
- JSONL file growth: `memory-injections.jsonl` grows by ~1 record per pipeline run per injected pattern. At 10 pipelines/day with 3 patterns each, that's ~30 records/day, ~11K/year. At ~200 bytes/record, that's ~2MB/year. `memeff_prune_ineffective` already archives old records. Low risk.
- Hook wiring in `sw-loop.sh`: The loop runs in tight iteration cycles. Adding a `type memeff_on_injection >/dev/null 2>&1 && memeff_on_injection ...` guard is cheap (`type` is a builtin, so no fork), but the actual JSONL append inside `memeff_track_injection` forks `jq` + `mktemp` + `mv`. This adds ~50ms per injection. Acceptable since injection happens once at loop start, not per iteration.
- `pipeline-commands.sh` completion path: Both success and failure paths emit `pipeline.completed`, so the hook must handle both. Risk: if `memeff_on_pipeline_complete` errors out, it could interfere with the pipeline exit code. Mitigation: wrap it in `|| true` so it stays fire-and-forget.
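The `keyword_overlap` component and the `tr`/`sort`/`comm` tokenization mentioned in the performance risk above can be sketched as follows. The sketch assumes tokens are lowercase alphanumeric runs of three or more characters with no stop-word list (both assumptions, not part of the design); `tokenize` is a hypothetical helper, and although `keyword_overlap_score` matches the component name in the architecture diagram, this body is illustrative. It returns `(matched_keywords / total_pattern_keywords) * 100` truncated to an integer.

```bash
#!/usr/bin/env bash
# Print the distinct lowercase tokens (3+ characters) of $1, one per line,
# sorted in the C locale so that comm(1) can merge them.
tokenize() {
    printf '%s\n' "$1" \
        | tr '[:upper:]' '[:lower:]' \
        | tr -cs '[:alnum:]' '\n' \
        | grep -E '^.{3,}$' \
        | LC_ALL=C sort -u
}

# keyword_overlap = matched pattern tokens / total pattern tokens * 100 (integer).
# Args: issue_text (title + body), pattern_text (pattern + root_cause)
keyword_overlap_score() {
    local issue_tokens pattern_tokens total matched tmp
    issue_tokens=$(tokenize "$1")
    pattern_tokens=$(tokenize "$2")
    # grep -c . counts non-empty lines; `|| true` keeps a zero count from failing.
    total=$(printf '%s\n' "$pattern_tokens" | grep -c . || true)
    if [ "$total" -eq 0 ]; then
        echo 0
        return 0
    fi
    tmp=$(mktemp) || return 1
    printf '%s\n' "$pattern_tokens" > "$tmp"
    # comm -12 keeps only the lines present in both sorted inputs.
    matched=$(printf '%s\n' "$issue_tokens" | LC_ALL=C comm -12 - "$tmp" | grep -c . || true)
    rm -f "$tmp"
    echo $(( matched * 100 / total ))
}
```

For example, the issue text "auth token expired Login fails after session timeout" against the pattern text "auth token refresh fails" shares 3 of 4 pattern tokens (auth, fails, token), giving 75.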
- `memeff_score_issue "auth token expired" "Login fails after session timeout" "bug" "src/auth.ts"` returns a JSON object with `score` between 0-100 when `failures.json` contains auth-related patterns.
- `memeff_score_issue` returns `{"score":0,"matching_patterns":[],"recommended_injections":[]}` on cold start (no `failures.json`).
- Enhanced outcome records contain `pattern_injected`, `failure_occurred`, `failure_type_matched`, and `failure_prevented` fields.
- Old outcome records without the new fields still parse correctly (`jq '.failure_prevented // false'` returns `false`).
- `shipwright memory effectiveness` prints a text report with `injection_success_rate`.
- `shipwright memory effectiveness json` returns valid JSON matching the strategic schema.
- `shipwright memory score-issue "test title"` returns valid JSON with a score.
- `memeff_on_injection` is called in `sw-loop.sh` when `memory_closed_loop_inject` returns a pattern (guarded by a `type` check).
- `memeff_on_pipeline_complete` is called in `pipeline-commands.sh` after both success and failure `pipeline.completed` events (guarded by `|| true`).
- The existing 16 tests in `sw-memory-effectiveness-test.sh` still pass (no regressions).
- New integration tests cover the score→inject→succeed→outcome and score→inject→fail→outcome flows.
- `sw-pipeline-test.sh`, `sw-memory-test.sh`, and `sw-daemon-test.sh` pass without regressions.
- The scoring threshold (default 30) is configurable via the `MEMEFF_INJECTION_THRESHOLD` environment variable.
- All new functions are Bash 3.2 compatible (no associative arrays, no `readarray`).