Pipeline Design 179
Shipwright pipelines use templates (fast, standard, full, hotfix, autonomous, cost-aware) that determine which stages execute and how gates behave. Template selection is currently complexity-score-driven in daemon triage — a static mapping with no feedback loop. When the same type of issue repeatedly succeeds with a particular template, that signal is discarded. The goal is to learn from successful pipeline completions and recommend templates for new issues based on keyword similarity to past successes.
Constraints:
- Entire codebase is bash (Bash 3.2 compatible) — no Python/Node runtime available at script layer
- Must not block or break existing pipelines on failure — strict graceful degradation
- Pattern storage must be bounded (disk growth in ~/.shipwright/memory/)
- jq is the only structured-data tool available; it may not be installed on all systems
- Three integration points: sw-memory.sh (capture), sw-pipeline.sh (display + track), daemon-triage.sh (recommend)
Single bash module (scripts/sw-success-patterns.sh) with TF-IDF-style keyword matching and domain expansion, stored as JSON in ~/.shipwright/memory/<repo-hash>/success-patterns.json.
On successful pipeline completion, memory_finalize_pipeline() in sw-memory.sh:681 calls success_capture_pattern(). The function parses the pipeline state file (YAML frontmatter) for goal, template, timestamps, and stage results. Keywords are extracted from the goal text with domain expansion (e.g., "auth" expands to include "authentication", "authorization", "login", "session", "token"). Pattern ID is SHA-256 of goal + template + timestamp, truncated to 16 chars. Storage uses atomic writes (tmp file + mv) capped at 200 patterns via FIFO eviction (jq .patterns | .[-200:]).
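The keyword extraction step described above can be sketched as follows. This is a minimal illustration, not the module's actual code: the function names `expand_keyword` and `extract_keywords` are hypothetical, only the "auth" expansion is taken from this document, and the three-character minimum word length is an assumption added to drop noise words.

```bash
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical, not the real module's API.

# Expand one keyword into related domain terms. A case statement keeps this
# Bash 3.2 compatible (no associative arrays).
expand_keyword() {
  case "$1" in
    auth) echo "auth authentication authorization login session token" ;;
    *)    echo "$1" ;;
  esac
}

# Lowercase the goal, split on non-alphanumeric characters, drop words
# shorter than 3 characters (assumption), expand each remaining word, and
# print the unique keywords one per line.
extract_keywords() {
  local goal="$1" word
  printf '%s\n' "$goal" \
    | tr '[:upper:]' '[:lower:]' \
    | tr -cs '[:alnum:]' '\n' \
    | while IFS= read -r word; do
        [ "${#word}" -ge 3 ] || continue
        expand_keyword "$word" | tr ' ' '\n'
      done \
    | sort -u
}
```

With this sketch, `extract_keywords "Fix auth bug"` prints the three goal words plus the five expansion terms for "auth", eight keywords in total.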
success_recommend_template() scores each stored pattern against the new goal using keyword overlap (Jaccard-like: overlap / max(goal_keyword_count, pattern_keyword_count)). A pattern counts as a match only if it shares at least 2 keywords with the goal and scores at least 0.2. Matching patterns are grouped by template; the template with the highest match count wins, weighted by average score. A confidence percentage is calculated as (winner_count / total_matched) * avg_score * 100. Recommendations require:
- Minimum 3 stored patterns (cold-start guard)
- At least 1 matching pattern after filtering
- 60% confidence threshold in daemon triage (daemon-triage.sh:416)
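The scoring rules above could be expressed in a single jq program along these lines. This is a sketch under stated assumptions: `recommend_template` is a hypothetical name, goal keywords are passed as a JSON array, avg_score is taken to be the winning template's average (the text does not say whether it is per-template or overall), ties on count are broken by average score, and the confidence is floored to an integer.

```bash
#!/usr/bin/env bash
# Sketch only: hypothetical function, not the module's real implementation.
# $1 = path to success-patterns.json, $2 = goal keywords as a JSON array.
# Prints "<template> <confidence>" or nothing; always returns 0.
recommend_template() {
  local store="$1" goal_keywords_json="$2"
  command -v jq >/dev/null 2>&1 || return 0
  [ -r "$store" ] || return 0
  jq -r --argjson goal "$goal_keywords_json" '
    if (.patterns | length) < 3 then empty        # cold-start guard
    else
      ($goal | unique) as $g
      | [ .patterns[]
          | ((.keywords // []) | unique) as $k
          | (($g - ($g - $k)) | length) as $overlap  # set intersection size
          | ([($g | length), ($k | length)] | max) as $denom
          | select($denom > 0)
          | { template, overlap: $overlap, score: ($overlap / $denom) }
          | select(.overlap >= 2 and .score >= 0.2)
        ] as $matched
      | if ($matched | length) == 0 then empty
        else
          ( $matched
            | group_by(.template)
            | map({ template: .[0].template,
                    count: length,
                    avg: (map(.score) | add / length) })
            | sort_by(.count, .avg)
            | last ) as $w
          | "\($w.template) \(($w.count / ($matched | length)) * $w.avg * 100 | floor)"
        end
    end
  ' "$store" 2>/dev/null || true
  return 0
}
```

As a worked case: with stored patterns whose keywords are [auth, login, token, fix] and [auth, login, session] (both template fast) and [deploy, docker, release] (template full), the goal keywords [auth, login, bug] match the first two with scores 2/4 = 0.5 and 2/3 ≈ 0.667. The result is `fast 58`, since (2 / 2) * 0.583 * 100 ≈ 58, which falls just under the 60% threshold that daemon triage applies.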
- sw-memory.sh:675-681: Sources sw-success-patterns.sh; calls success_capture_pattern() on complete pipelines (|| true guarded)
- sw-pipeline.sh:1802-1805: Displays formatted recommendation box after intake stage completes (type check + || true)
- sw-pipeline.sh:2891-2899: Tracks recommendation acceptance at pipeline end by comparing recommended vs. actual template
- daemon-triage.sh:404-416: Queries recommendation during template selection; applies only if confidence >= 60%
Every public function returns 0 on error. All jq calls have || true fallbacks. Integration calls are guarded by type ... >/dev/null 2>&1 checks. The module never blocks pipeline execution: if sw-success-patterns.sh is missing or fails to load, or if jq is unavailable, all callers silently continue.
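A guarded integration call of the kind described above might look like the following sketch. The wrapper name `maybe_capture_pattern` and the `SW_PATTERNS_MODULE` variable are hypothetical; only `success_capture_pattern` and its two arguments come from this document.

```bash
#!/usr/bin/env bash
# Sketch only: wrapper and variable names are illustrative assumptions.

# Source the module if present; a missing or broken file must not abort.
SW_PATTERNS_MODULE="${SW_PATTERNS_MODULE:-scripts/sw-success-patterns.sh}"
if [ -f "$SW_PATTERNS_MODULE" ]; then
  # shellcheck disable=SC1090
  . "$SW_PATTERNS_MODULE" 2>/dev/null || true
fi

# Call the capture function only if it was actually defined, and swallow
# any failure so the pipeline's exit status is never affected.
maybe_capture_pattern() {
  local state_file="$1" artifacts_dir="$2"
  if type success_capture_pattern >/dev/null 2>&1; then
    success_capture_pattern "$state_file" "$artifacts_dir" || true
  fi
  return 0
}
```

The `type` check covers the "module missing" case and the `|| true` covers the "module loaded but failed" case, so both degrade to a no-op.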
Four event types emitted to ~/.shipwright/events.jsonl: success.pattern_captured, success.recommendation, success.recommendation_accepted, success.recommendation_rejected. Schema registered in config/event-schema.json.
- SQLite-backed with full-text search. Pros: Better query performance at scale, native FTS ranking. Cons: New dependency (sqlite3 binary), more complex schema management, overkill for a 200-item dataset. Rejected because the pattern volume is small and jq is already used everywhere.
- Embedding-based vector similarity (Python/Node). Pros: Best semantic matching, handles synonyms automatically. Cons: Requires Python or Node runtime at script layer, adds ~100MB+ of dependencies, cold-start latency for model loading. Rejected because it introduces a new runtime dependency and the matching quality improvement doesn't justify the complexity for a store capped at 200 patterns.
- Files created:
  - scripts/sw-success-patterns.sh: Core module (586 lines) with storage, keyword extraction with domain expansion, capture, recommend, display, track, stats, CLI dispatcher
  - scripts/sw-success-patterns-test.sh: Test suite (463 lines) with 26 test cases and a self-contained mock environment
- Files modified:
  - scripts/sw-memory.sh: Source success-patterns module (+4 lines); hook capture into memory_finalize_pipeline() (+7 lines)
  - scripts/sw-pipeline.sh: Display recommendation after intake (+4 lines); track acceptance at completion (+9 lines)
  - scripts/lib/daemon-triage.sh: Query recommendation in template selection (+21 lines)
  - config/event-schema.json: Register 4 success.* event types (+16 lines)
  - package.json: Register test suite in npm test (+1 line)
- Dependencies: None new. Uses existing jq, bash, git, shasum.
- Risk areas:
  - Keyword extraction quality: Domain expansion is hardcoded to 8 domains (auth, api, db, ui, test, deploy, error, perf). Goals outside these domains get no expansion, reducing match quality. Mitigated by requiring only 2 keyword overlaps.
  - Confidence formula: (winner_count / total_matched) * avg_score * 100 can produce misleadingly high confidence when total_matched is small (e.g., 1 match = 100% * score). Mitigated by the cold-start guard (minimum 3 patterns) and the 0.2 score floor.
  - Date parsing: date -j -f (macOS) with date -d (Linux) fallback for duration calculation. If neither works, duration defaults to 0, which is acceptable degradation.
  - Acceptance tracking: Relies on recommended_template: being present in the state file, which is only written by daemon-triage integration. Manual pipeline runs won't track acceptance.
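The date-parsing fallback noted in the risk list can be sketched as below. The helper names `iso_to_epoch` and `duration_secs` are hypothetical, and the timestamp format (UTC, `YYYY-MM-DDTHH:MM:SSZ`) is an assumption; the state file's real format may differ.

```bash
#!/usr/bin/env bash
# Sketch only: helper names and the timestamp format are assumptions.

# Convert an ISO-8601 UTC timestamp to epoch seconds. Tries BSD/macOS
# date first, then GNU date; prints 0 if neither can parse the input.
iso_to_epoch() {
  local ts="$1" epoch
  epoch=$(date -u -j -f "%Y-%m-%dT%H:%M:%SZ" "$ts" +%s 2>/dev/null) \
    || epoch=$(date -u -d "$ts" +%s 2>/dev/null) \
    || epoch=0
  # Guard against empty or non-numeric output.
  case "$epoch" in
    ''|*[!0-9]*) epoch=0 ;;
  esac
  echo "$epoch"
}

# Duration between two timestamps in seconds; 0 when either is unparseable
# or the end precedes the start.
duration_secs() {
  local start end
  start=$(iso_to_epoch "$1")
  end=$(iso_to_epoch "$2")
  if [ "$start" -gt 0 ] && [ "$end" -ge "$start" ]; then
    echo $((end - start))
  else
    echo 0
  fi
}
```

On GNU systems the first `date` call fails on the unknown `-j` option and the second succeeds; on macOS the first succeeds. In both cases an unparseable value degrades to a duration of 0 rather than an error.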
Pipeline completes (status: complete)
│
▼
memory_finalize_pipeline() ─── sw-memory.sh:675
│
├── [FAILURE POINT: state_file missing/unreadable → return 0]
│
▼
success_capture_pattern(state_file, artifacts_dir) ─── sw-success-patterns.sh:127
│
├── Parse YAML frontmatter (sed)
├── Extract keywords + domain expansion
├── [FAILURE POINT: jq unavailable → return 0]
│
▼
Atomic write to success-patterns.json (tmp + mv)
│
├── FIFO eviction if > 200 patterns
├── [FAILURE POINT: write fails → rm tmp, return 0]
│
▼
emit_event "success.pattern_captured"

New pipeline starts (daemon triage)
│
▼
success_recommend_template(goal, labels, complexity) ─── daemon-triage.sh:404
│
├── [FAILURE POINT: < 3 patterns → return empty]
├── [FAILURE POINT: no keyword overlap → return empty]
│
▼
jq scoring: keyword overlap → group by template → confidence calc
│
├── [FAILURE POINT: confidence < 60% → not applied]
│
▼
Template recommendation applied (or falls back to complexity-based default)
No database migrations — file-based JSON storage. Forward schema:
{
"version": 1,
"patterns": [{ "id": "string", "goal": "string", "labels": ["string"], "template": "string",
"complexity": "string", "duration_secs": 0, "cost_usd": 0,
"stage_results": {}, "keywords": ["string"],
"issue_number": null, "repo": "string", "captured_at": "ISO8601" }],
"stats": { "total_captured": 0, "recommendations_made": 0,
"recommendations_accepted": 0, "last_updated": "ISO8601" }
}
Rollback: Delete scripts/sw-success-patterns.sh and scripts/sw-success-patterns-test.sh. Revert integration lines in sw-memory.sh, sw-pipeline.sh, daemon-triage.sh. Remove event types from config/event-schema.json. Remove test entry from package.json. Pattern data in ~/.shipwright/memory/*/success-patterns.json is inert and safe to leave.
Pattern IDs are SHA-256 of goal + template + current_timestamp, so the ID changes on every capture. Capture normally runs once per completed pipeline, which makes duplicate captures unlikely in practice. If the same pipeline does trigger capture twice, two patterns with different IDs are stored, and the 200-cap FIFO eviction eventually removes them. No deduplication logic is needed since each run is temporally unique.
Not applicable — internal bash module with shell function interfaces only. No HTTP/REST/RPC surface.
Not applicable — all functions return exit code 0 on any error (graceful degradation by design). Errors are logged to stderr via the error() helper but never propagate to callers.
Not applicable — pattern capture is bounded by pipeline completion frequency (typically minutes to hours between invocations). The 200-pattern FIFO cap provides natural storage bounds.
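The storage bound can be illustrated with a sketch of an atomic append that enforces the FIFO cap. The function name `append_pattern` and its optional cap argument are hypothetical; the tmp-file-plus-mv approach and the 200-pattern tail slice follow the description earlier in this document.

```bash
#!/usr/bin/env bash
# Sketch only: hypothetical helper, not the module's real implementation.
# $1 = store path, $2 = pattern as a JSON object, $3 = cap (default 200).
append_pattern() {
  local store="$1" pattern_json="$2" cap="${3:-200}" tmp
  command -v jq >/dev/null 2>&1 || return 0
  [ -f "$store" ] || return 0
  # Write to a temp file in the same directory so mv is an atomic rename.
  tmp=$(mktemp "${store}.tmp.XXXXXX") || return 0
  if jq --argjson p "$pattern_json" --argjson cap "$cap" \
        '.patterns = ((.patterns + [$p]) | .[(0 - $cap):])' \
        "$store" > "$tmp" 2>/dev/null; then
    mv "$tmp" "$store" || rm -f "$tmp"
  else
    # jq failed (e.g. corrupt store): discard the temp file, keep the old store.
    rm -f "$tmp"
  fi
  return 0
}
```

Keeping only the last `cap` array elements drops the oldest entries first, and a failed jq run leaves the existing file untouched because the rename never happens.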
Schema version field ("version": 1) in success-patterns.json enables future migration if structure changes. No API versioning needed — internal module with no external consumers.
- Patterns captured on successful pipeline completion only (failed pipelines skipped)
- Storage file initialized with correct schema (version 1, empty patterns array)
- Keyword extraction includes domain expansion (auth → authentication, authorization, etc.)
- Cold-start guard: no recommendations with < 3 patterns
- No recommendation for unrelated goals (no keyword overlap)
- Correct template recommended when similar patterns exist
- 200-pattern FIFO cap enforced
- Acceptance tracking increments counter when recommended == actual template
- Rejection does not increment acceptance counter
- Stats display includes pattern count, acceptance rate, template breakdown
- CLI subcommands (capture, recommend, display, accept, show, stats, help) functional
- Events emitted to events.jsonl for capture and recommendation actions
- All 26 tests pass
- Graceful degradation: no pipeline failures when jq missing or patterns file corrupt