Pipeline Design 179

Design: Success pattern learning and template recommendation engine

Context

Shipwright pipelines use templates (fast, standard, full, hotfix, autonomous, cost-aware) that determine which stages execute and how gates behave. Template selection is currently complexity-score-driven in daemon triage — a static mapping with no feedback loop. When the same type of issue repeatedly succeeds with a particular template, that signal is discarded. The goal is to learn from successful pipeline completions and recommend templates for new issues based on keyword similarity to past successes.

Constraints:

- Entire codebase is bash (Bash 3.2 compatible); no Python/Node runtime is available at the script layer
- Must not block or break existing pipelines on failure (strict graceful degradation)
- Pattern storage must be bounded (disk growth in ~/.shipwright/memory/)
- jq is the only structured-data tool available; it may not be installed on all systems
- Three integration points: sw-memory.sh (capture), sw-pipeline.sh (display + track), daemon-triage.sh (recommend)

Decision

Single bash module (scripts/sw-success-patterns.sh) with TF-IDF-style keyword matching and domain expansion, stored as JSON in ~/.shipwright/memory/<repo-hash>/success-patterns.json.

Pattern capture

On successful pipeline completion, memory_finalize_pipeline() in sw-memory.sh:681 calls success_capture_pattern(). The function parses the pipeline state file (YAML frontmatter) for goal, template, timestamps, and stage results. Keywords are extracted from the goal text with domain expansion (e.g., "auth" expands to include "authentication", "authorization", "login", "session", "token"). Pattern ID is SHA-256 of goal + template + timestamp, truncated to 16 chars. Storage uses atomic writes (tmp file + mv) capped at 200 patterns via FIFO eviction (jq .patterns | .[-200:]).
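The keyword step can be sketched roughly as follows. This is a minimal illustration, not the module's code: the function names, the minimum word length, and the stopword list are assumptions, and only the "auth" expansion is taken from the text above (the other domains are omitted because their word lists are not given here).

```shell
#!/usr/bin/env bash
# Hypothetical sketch of keyword extraction with domain expansion (Bash 3.2 safe).

# Print the expansion terms for one keyword (empty if no domain matches).
# Only the "auth" domain is shown; its word list is the example from the text.
expand_domain() {
  case "$1" in
    auth) echo "authentication authorization login session token" ;;
    *)    echo "" ;;
  esac
}

# Print one unique, lowercased keyword per line for a goal string.
extract_keywords() {
  local goal="$1" word extra
  # Lowercase, then turn every run of non-alphanumerics into a newline.
  printf '%s\n' "$goal" \
    | tr '[:upper:]' '[:lower:]' \
    | tr -cs '[:alnum:]' '\n' \
    | while IFS= read -r word; do
        # Skip short tokens and a tiny illustrative stopword list (assumption).
        [ "${#word}" -lt 3 ] && continue
        case "$word" in the|and|for|with|from|that|this) continue ;; esac
        printf '%s\n' "$word"
        # Word splitting is intended: emit each expansion term on its own line.
        for extra in $(expand_domain "$word"); do
          printf '%s\n' "$extra"
        done
      done \
    | sort -u
}
```

For example, `extract_keywords "Fix auth bug in the API"` yields the goal's own words plus the five auth expansion terms, deduplicated and sorted.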

Template recommendation

success_recommend_template() scores each stored pattern against the new goal using keyword overlap (Jaccard-like: overlap / max(|goal_keywords|, |pattern_keywords|), where |x| is the number of keywords in x). A pattern counts as a match only if it shares at least 2 keywords with the goal and scores at least 0.2. Patterns are grouped by template; the template with the highest count wins, weighted by average score. A confidence percentage is calculated as (winner_count / total_matched) * avg_score * 100. Recommendations require:

- Minimum 3 stored patterns (cold-start guard)
- At least 1 matching pattern after filtering
- 60% confidence threshold in daemon triage (daemon-triage.sh:416)
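The scoring and confidence rules above can be sketched in awk, which avoids a jq dependency for the illustration. Several details are assumptions rather than the module's behaviour: the tab-separated input format, the tie-break on average score, taking avg_score over the winning template's patterns only, and rounding to the nearest integer.

```shell
#!/usr/bin/env bash
# recommend_template GOAL_KEYWORDS   (hypothetical helper, not the real function)
# stdin:  one stored pattern per line: "template<TAB>space-separated keywords"
# stdout: "template confidence_percent", or nothing when no recommendation applies
recommend_template() {
  awk -F '\t' -v goal="$1" '
    BEGIN { ng = split(goal, g, " "); for (i = 1; i <= ng; i++) want[g[i]] = 1 }
    {
      stored++
      np = split($2, p, " "); overlap = 0
      for (i = 1; i <= np; i++) if (p[i] in want) overlap++
      denom = (ng > np) ? ng : np
      if (denom == 0) next
      score = overlap / denom
      # Filters from the text: at least 2 shared keywords and score >= 0.2.
      if (overlap >= 2 && score >= 0.2) { count[$1]++; sum[$1] += score; matched++ }
    }
    END {
      # Cold-start guard (3 stored patterns) and at least one match.
      if (stored < 3 || matched < 1) exit
      for (t in count) {
        avg = sum[t] / count[t]
        # Highest count wins; ties go to the higher average score (assumption).
        if (best == "" || count[t] > count[best] ||
            (count[t] == count[best] && avg > bestavg)) { best = t; bestavg = avg }
      }
      conf = (count[best] / matched) * bestavg * 100
      printf "%s %d\n", best, int(conf + 0.5)
    }'
}
```

With the goal keywords "api auth login token", two "standard" patterns scoring 0.75 each and one "full" pattern scoring 0.4 give (2 / 3) * 0.75 * 100 = 50, which falls below the 60% threshold, so daemon triage would keep its complexity-based default.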

Integration points

- sw-memory.sh:675-681: sources sw-success-patterns.sh; calls success_capture_pattern() on complete pipelines (guarded with || true)
- sw-pipeline.sh:1802-1805: displays a formatted recommendation box after the intake stage completes (type check + || true)
- sw-pipeline.sh:2891-2899: tracks recommendation acceptance at pipeline end by comparing the recommended and actual templates
- daemon-triage.sh:404-416: queries the recommendation during template selection; applies it only if confidence >= 60%

Error handling

Every public function returns 0 on error. All jq calls have || true fallbacks. Integration calls are guarded by type ... >/dev/null 2>&1 checks. The module never blocks pipeline execution: if sw-success-patterns.sh fails to load or is missing, or if jq is unavailable, all callers silently continue.
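A caller-side guard in this style might look like the sketch below. The SW_PATTERNS_LIB variable and the safe_call helper are hypothetical names introduced for illustration; the actual call sites may inline the type check and || true instead.

```shell
#!/usr/bin/env bash
# Hypothetical guard pattern for optional modules (Bash 3.2 safe).
SW_PATTERNS_LIB="${SW_PATTERNS_LIB:-scripts/sw-success-patterns.sh}"

# Source the module only if it exists; ignore any failure while loading.
if [ -f "$SW_PATTERNS_LIB" ]; then
  . "$SW_PATTERNS_LIB" 2>/dev/null || true
fi

# Run a function only when it is defined, and never propagate its failure.
safe_call() {
  local fn="$1"
  shift
  if type "$fn" >/dev/null 2>&1; then
    "$fn" "$@" || true
  fi
  return 0
}
```

A capture hook could then be written as `safe_call success_capture_pattern "$state_file" "$artifacts_dir"`, which is a no-op when the module is missing.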

Events

Four event types emitted to ~/.shipwright/events.jsonl: success.pattern_captured, success.recommendation, success.recommendation_accepted, success.recommendation_rejected. Schema registered in config/event-schema.json.

Alternatives Considered

- SQLite-backed with full-text search. Pros: better query performance at scale, native FTS ranking. Cons: new dependency (sqlite3 binary), more complex schema management, overkill for a 200-item dataset. Rejected because the pattern volume is small and jq is already used everywhere.
- Embedding-based vector similarity (Python/Node). Pros: best semantic matching, handles synonyms automatically. Cons: requires a Python or Node runtime at the script layer, adds ~100MB+ of dependencies, cold-start latency for model loading. Rejected because it introduces a new runtime dependency and the matching quality improvement doesn't justify the complexity for a store capped at 200 patterns.

Implementation Plan

Files created:
- scripts/sw-success-patterns.sh — Core module (586 lines): storage, keyword extraction with domain expansion, capture, recommend, display, track, stats, CLI dispatcher
- scripts/sw-success-patterns-test.sh — Test suite (463 lines): 26 test cases with self-contained mock environment
Files modified:
- scripts/sw-memory.sh — Source success-patterns module (+4 lines); hook capture into memory_finalize_pipeline() (+7 lines)
- scripts/sw-pipeline.sh — Display recommendation after intake (+4 lines); track acceptance at completion (+9 lines)
- scripts/lib/daemon-triage.sh — Query recommendation in template selection (+21 lines)
- config/event-schema.json — Register 4 success.* event types (+16 lines)
- package.json — Register test suite in npm test (+1 line)
Dependencies: None new. Uses existing jq, bash, git, shasum.
Risk areas:
- Keyword extraction quality: Domain expansion is hardcoded to 8 domains (auth, api, db, ui, test, deploy, error, perf). Goals outside these domains get no expansion, reducing match quality. Mitigated by requiring only 2 keyword overlaps.
- Confidence formula: (winner_count / total_matched) * avg_score * 100 can produce misleadingly high confidence when total_matched is small (e.g., 1 match = 100% * score). Mitigated by the cold-start guard (minimum 3 patterns) and the 0.2 score floor.
- Date parsing: date -j -f (macOS) with date -d (Linux) fallback for duration calculation. If neither works, duration defaults to 0 — acceptable degradation.
- Acceptance tracking: Relies on recommended_template: being present in the state file, which is only written by daemon-triage integration. Manual pipeline runs won't track acceptance.
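The date-parsing fallback described in the risk list can be sketched as below. The function names are hypothetical, and the timestamp format (ISO 8601 in UTC with a trailing Z) is an assumption; the state file may use a different format.

```shell
#!/usr/bin/env bash
# Convert an ISO 8601 UTC timestamp (e.g. 2024-01-01T00:00:00Z) to epoch seconds.
# Tries BSD/macOS date first, then GNU date; prints nothing if both fail.
iso_to_epoch() {
  local ts="$1" out
  out="$(TZ=UTC date -j -f '%Y-%m-%dT%H:%M:%SZ' "$ts" '+%s' 2>/dev/null)" \
    || out="$(date -u -d "$ts" '+%s' 2>/dev/null)" \
    || out=""
  printf '%s' "$out"
}

# Print the elapsed seconds between two timestamps, or 0 if either is unparseable.
duration_secs() {
  local start end
  start="$(iso_to_epoch "$1")"
  end="$(iso_to_epoch "$2")"
  if [ -z "$start" ] || [ -z "$end" ]; then
    echo 0
    return 0
  fi
  echo $(( end - start ))
}
```

Declaring `out` with `local` before assigning keeps the exit status of the command substitution visible to the `||` chain.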

Data Flow Diagram

Pipeline completes (status: complete)
       │
       ▼
memory_finalize_pipeline() ─── sw-memory.sh:675
       │
       ├── [FAILURE POINT: state_file missing/unreadable → return 0]
       │
       ▼
success_capture_pattern(state_file, artifacts_dir) ─── sw-success-patterns.sh:127
       │
       ├── Parse YAML frontmatter (sed)
       ├── Extract keywords + domain expansion
       ├── [FAILURE POINT: jq unavailable → return 0]
       │
       ▼
  Atomic write to success-patterns.json (tmp + mv)
       │
       ├── FIFO eviction if > 200 patterns
       ├── [FAILURE POINT: write fails → rm tmp, return 0]
       │
       ▼
  emit_event "success.pattern_captured"


New pipeline starts (daemon triage)
       │
       ▼
success_recommend_template(goal, labels, complexity) ─── daemon-triage.sh:404
       │
       ├── [FAILURE POINT: < 3 patterns → return empty]
       ├── [FAILURE POINT: no keyword overlap → return empty]
       │
       ▼
  jq scoring: keyword overlap → group by template → confidence calc
       │
       ├── [FAILURE POINT: confidence < 60% → not applied]
       │
       ▼
  Template recommendation applied (or falls back to complexity-based default)

Schema Changes

No database migrations — file-based JSON storage. Forward schema:

{
  "version": 1,
  "patterns": [{ "id": "string", "goal": "string", "labels": ["string"], "template": "string",
                 "complexity": "string", "duration_secs": 0, "cost_usd": 0,
                 "stage_results": {}, "keywords": ["string"],
                 "issue_number": null, "repo": "string", "captured_at": "ISO8601" }],
  "stats": { "total_captured": 0, "recommendations_made": 0,
             "recommendations_accepted": 0, "last_updated": "ISO8601" }
}
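A capped, atomic append against this schema could be sketched as follows. The append_pattern helper and its arguments are hypothetical; the jq filter is one way to express the 200-pattern FIFO cap and is not necessarily the filter the module uses.

```shell
#!/usr/bin/env bash
# append_pattern STORE_FILE PATTERN_JSON [CAP]   (hypothetical helper)
# Appends one pattern object, keeps only the newest CAP entries, and replaces
# the store atomically (tmp file + mv). Always returns 0 so callers never block.
append_pattern() {
  local store="$1" pattern="$2" cap="${3:-200}" tmp
  command -v jq >/dev/null 2>&1 || return 0
  if [ ! -f "$store" ]; then
    # Seed an empty store; give up quietly if the location is not writable.
    { printf '%s\n' '{"version":1,"patterns":[],"stats":{"total_captured":0}}' > "$store"; } 2>/dev/null \
      || return 0
  fi
  tmp="$(mktemp "${store}.tmp.XXXXXX" 2>/dev/null)" || return 0
  if jq --argjson p "$pattern" --argjson cap "$cap" \
        '.patterns += [$p]
         | .patterns |= .[(0 - $cap):]
         | .stats.total_captured += 1' "$store" > "$tmp" 2>/dev/null; then
    mv "$tmp" "$store" 2>/dev/null || rm -f "$tmp"
  else
    rm -f "$tmp"
  fi
  return 0
}
```

Because the new file is written in the same directory and then renamed over the old one, a reader sees either the previous store or the updated one, never a partial write.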

Rollback: Delete scripts/sw-success-patterns.sh and scripts/sw-success-patterns-test.sh. Revert integration lines in sw-memory.sh, sw-pipeline.sh, daemon-triage.sh. Remove event types from config/event-schema.json. Remove test entry from package.json. Pattern data in ~/.shipwright/memory/*/success-patterns.json is inert — safe to leave.

Idempotency Strategy

Pattern IDs are SHA-256 of goal + template + current_timestamp. Each pipeline run produces a unique timestamp, so ID collisions between runs are not expected in practice. Capture is not strictly idempotent: if the same pipeline somehow triggers capture twice, two patterns with different IDs are stored, and the 200-cap FIFO eviction eventually removes the extra entry. No deduplication logic is needed since each run is temporally unique.

Endpoint Specification

Not applicable — internal bash module with shell function interfaces only. No HTTP/REST/RPC surface.

Error Codes

Not applicable — all functions return exit code 0 on any error (graceful degradation by design). Errors are logged to stderr via the error() helper but never propagate to callers.

Rate Limiting

Not applicable — pattern capture is bounded by pipeline completion frequency (typically minutes to hours between invocations). The 200-pattern FIFO cap provides natural storage bounds.

Versioning

Schema version field ("version": 1) in success-patterns.json enables future migration if structure changes. No API versioning needed — internal module with no external consumers.

