Pipeline Design 313

ezigus edited this page Apr 18, 2026 · 2 revisions

ADR written to .claude/pipeline-artifacts/design.md (294 lines).

Summary of the architecture decision:

  • Approach: Orphan git ref refs/ruflo-memory storing a single memory-export.json — matches the existing shipwright-data pattern (lines 803-852 of the workflow)
  • 4 new functions in ruflo-adapter.sh: pull, push, prune (90-day), merge (jq -s union)
  • 2 new workflow steps: restore (after cache restore at line 406), save (before cache save at line 869, if: always())
  • Fail-open everywhere: all functions return 0, pipeline never breaks on memory errors
  • CI-only: [[ "${CI:-}" == "true" ]] guard prevents local dev side effects
  • Rejected alternatives: enhanced actions/cache (unreliable eviction), artifacts (expire), merging into shipwright-data (coupling)
  • Known limitation: ruflo memory export may not capture HNSW/Q-weights; documented inline, deferred to issue 8b
  • Permissions: the workflow already has contents: write (line 49)
  • ruflo memory export output format is partially unknown — may only capture KV store, not HNSW indexes or Q-learning weights

Decision

Persist ruflo memory to a dedicated orphan git ref refs/ruflo-memory containing a single file memory-export.json. Four new functions in scripts/lib/ruflo-adapter.sh:

  1. ruflo_ci_memory_pull() — On CI job start, fetches refs/ruflo-memory, extracts memory-export.json, feeds it to ruflo memory import. Wired into ruflo_import_memory() (line 500).

  2. ruflo_ci_memory_push() — On CI job end, runs ruflo memory export, prunes entries >90 days, fetches remote state, merges local+remote via jq -s '.[0] * .[1]' (local overwrites = newer wins), pushes via temp git repo. Retries 3x with exponential jitter. Wired into ruflo_export_memory() (line 518).

  3. ruflo_prune_memory_export(file, max_age_days) — Removes entries with timestamps older than max_age_days. Uses dual date -d (GNU) / date -v (BSD) syntax for cross-platform compat. Atomic write via tmp+mv.

  4. ruflo_merge_memory_exports(local, remote, output) — Union merge of two JSON files. jq -s '.[0] * .[1]' — local keys overwrite remote (local is freshly exported = always newer). Fallback: if jq fails, local is used as-is.
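The pruning step described in item 3 could look roughly like the sketch below. It is not the actual implementation: the export format is only partially known, so the sketch ASSUMES a top-level JSON object whose values carry an ISO-8601 UTC string in a hypothetical `.timestamp` field. The helper names `_cutoff_iso` and `prune_sketch` are illustrative only.

```bash
#!/usr/bin/env bash

# Prints an ISO-8601 UTC cutoff N days in the past.
# Tries GNU date syntax first, then BSD (macOS) syntax.
_cutoff_iso() {
  local days="$1"
  date -u -d "${days} days ago" +%Y-%m-%dT%H:%M:%SZ 2>/dev/null \
    || date -u -v-"${days}"d +%Y-%m-%dT%H:%M:%SZ 2>/dev/null
}

# Sketch of pruning under the ASSUMED schema described above.
# Entries without a string timestamp are kept rather than dropped.
prune_sketch() {
  local file="$1" days="${2:-90}" cutoff tmp
  [[ -f "$file" ]] || return 0
  cutoff="$(_cutoff_iso "$days")" || return 0
  [[ -n "$cutoff" ]] || return 0
  # Temp file next to the target so the final mv is a same-filesystem rename.
  tmp="$(mktemp "${file}.XXXXXX")" || return 0
  if jq --arg cutoff "$cutoff" \
       'with_entries(select(((.value.timestamp? | strings) // $cutoff) >= $cutoff))' \
       "$file" > "$tmp" 2>/dev/null; then
    mv "$tmp" "$file"
  else
    rm -f "$tmp"   # invalid JSON or jq failure: leave the file unchanged
  fi
  return 0
}
```

Because timestamps in one fixed ISO-8601 UTC format sort lexicographically, a plain string comparison against the cutoff is enough here; a different timestamp encoding would need a different filter.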

Key design choices:

  • refs/ruflo-memory (not refs/heads/ruflo-memory): Won't appear in git branch output, avoids user confusion.
  • Separate ref from shipwright-data: Different ownership semantics — ruflo memory is machine-generated learning data, shipwright-data is operational state. Separate refs avoid merge conflicts between the two systems.
  • CI guard [[ "${CI:-}" == "true" ]]: All four functions are no-ops on developer machines. No local side effects.
  • All functions return 0: Every git and jq operation is wrapped in || return 0 or || true. The pipeline never fails due to memory persistence.
  • Existing actions/cache steps remain: They serve as a fast-path fallback layer. The orphan ref is the durable layer.
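The CI guard and the fail-open rule from the list above can be combined in a small pattern. The sketch below is illustrative: only the guard expression `[[ "${CI:-}" == "true" ]]` comes from the design, while `_in_ci`, `_fail_open`, and `example_memory_step` are hypothetical names.

```bash
#!/usr/bin/env bash

# Returns 0 only when running under CI.
_in_ci() {
  [[ "${CI:-}" == "true" ]]
}

# Runs a command, logs a warning on failure, and always returns 0.
_fail_open() {
  "$@" || echo "WARN: '$*' failed; continuing" >&2
  return 0
}

# Hypothetical step showing both rules together.
example_memory_step() {
  _in_ci || return 0     # no-op on developer machines
  _fail_open false       # stand-in for a git or jq call that fails
  echo "step ran"
}
```

With `CI=true` the step prints `step ran` even though the wrapped command failed; with `CI` unset or set to anything else it does nothing and still returns 0.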

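Merge semantics: jq -s '.[0] * .[1]' performs a recursive object merge (in jq, + is the shallow variant). Remote is .[0] and local is .[1], so for any shared key the local value wins; when both values are objects, they are merged key by key with the same rule. This is correct because the local export was just created and is authoritative for any key it contains. Keys present only in remote are preserved (union behavior).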
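A minimal sketch of the merge with its fallback is shown below. The function name `merge_sketch` is hypothetical; the jq expression is the one named in the design. Note that in jq, `*` applied to two objects merges them recursively, so nested objects under a shared key are combined rather than replaced wholesale.

```bash
#!/usr/bin/env bash

# Sketch: remote is .[0], local is .[1]; on shared keys local wins.
# If jq fails (missing or invalid input), local is copied through as-is.
merge_sketch() {
  local local_file="$1" remote_file="$2" output="$3"
  if ! jq -s '.[0] * .[1]' "$remote_file" "$local_file" > "$output" 2>/dev/null; then
    cp "$local_file" "$output" 2>/dev/null || true
  fi
  return 0
}
```

For example, with remote `{"a":{"x":1,"y":2},"b":"remote-only"}` and local `{"a":{"x":9},"c":"local-only"}`, the merged object keeps `b` and `c`, takes `a.x` from local (9), and retains `a.y` from remote (2).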

Alternatives Considered

  1. Enhanced actions/cache (current state) — Pros: no git operations, ~1s overhead. Cons: 7-day LRU eviction loses memory unpredictably, cache key collisions across concurrent runs, no merge semantics for concurrent writers. Rejected: unreliable for long-lived learning data.

  2. actions/upload-artifact / download-artifact — Pros: no push conflicts. Cons: artifacts expire (90 days default), no cross-workflow access, requires API queries to find latest artifact, no merge semantics. Rejected: expiration defeats the purpose of persistent memory.

  3. Merge into existing shipwright-data branch — Pros: one fewer ref to manage. Cons: couples ruflo memory lifecycle to shipwright operational state, merge conflicts between heterogeneous data, harder to reason about pruning. Rejected: separation of concerns.

Implementation Plan

  • Files to create: None
  • Files to modify:
    • scripts/lib/ruflo-adapter.sh — 4 new functions after line 1337, wiring at lines 500 and 527
    • .github/workflows/shipwright-pipeline.yml — 2 new workflow steps (restore after line 406, save before line 869)
    • scripts/sw-ruflo-adapter-test.sh — 8 new test cases
  • Dependencies: None new. Uses existing jq, git, date, mktemp.
  • Risk areas:
    • ruflo memory export may not capture HNSW/Q-weights (documented as known limitation, tracked for issue 8b)
    • Push conflicts with concurrent pipelines (mitigated by 3-retry with jitter, matching shipwright-data pattern at line 831)
    • date -d vs date -v platform divergence in pruning (mitigated by dual syntax with fallback)
    • Working-directory pollution in the caller from git operations (mitigated by running all git operations in a ( cd ... ) subshell)
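The push-conflict mitigation relies on a bounded retry loop. The sketch below shows one way such a loop could be written; `retry_with_jitter` and its parameters are hypothetical, and the delay formula (exponential backoff plus random jitter) is an assumption, since the design only says "3-retry with jitter".

```bash
#!/usr/bin/env bash

# Hypothetical helper: run a command up to max_attempts times.
# Between attempts, sleep base * 2^(attempt-1) seconds plus 0..base
# seconds of random jitter. Returns 0 on first success, 1 if all fail.
retry_with_jitter() {
  local max_attempts="$1" base="$2" attempt=1 delay
  shift 2
  while (( attempt <= max_attempts )); do
    "$@" && return 0
    if (( attempt < max_attempts )); then
      delay=$(( base * (1 << (attempt - 1)) + RANDOM % (base + 1) ))
      sleep "$delay"
    fi
    attempt=$(( attempt + 1 ))
  done
  return 1
}
```

A fail-open caller would invoke it as `retry_with_jitter 3 2 some_push_command || true`, where `some_push_command` stands in for the real push step, so that three failed attempts still leave the pipeline green.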

Component Diagram

┌─────────────────────────────────────────────────────────┐
│                shipwright-pipeline.yml                    │
│                                                          │
│  [Restore ruflo memory]──►ruflo_ci_memory_pull()        │
│         │                        │                       │
│         │                   git fetch refs/ruflo-memory  │
│         │                   git show → .json             │
│         │                   ruflo memory import          │
│         ▼                                                │
│  [Pipeline stages: intake→build→test→review→pr→...]     │
│         │                                                │
│         ▼                                                │
│  [Save ruflo memory]──►ruflo_ci_memory_push()           │
│    (if: always())            │                           │
│                         ruflo memory export              │
│                         ruflo_prune_memory_export()      │
│                         ruflo_merge_memory_exports()     │
│                         git push refs/ruflo-memory       │
└──────────────────────────┬──────────────────────────────┘
                           │
                           ▼
              ┌────────────────────────┐
              │  refs/ruflo-memory     │
              │  (orphan git ref)      │
              │                        │
              │  memory-export.json    │
              └────────────────────────┘

Components (4):

  1. Workflow layer (shipwright-pipeline.yml) — Two new steps that call the adapter functions
  2. Adapter functions (ruflo-adapter.sh) — Pull/push orchestration with CI guard
  3. Data functions (ruflo-adapter.sh) — Prune and merge, pure JSON transforms
  4. Storage (refs/ruflo-memory) — Single-file orphan ref, append-only (with pruning)

Dependencies flow inward: Workflow → Adapter → Data → Storage. No reverse dependencies.

Interface Contracts

# ── ruflo_ci_memory_pull() ────────────────────────────────────
# Preconditions:  CI=true, ruflo_available() == true
# Postconditions: memory-export.json written to .claude-flow/data/,
#                 ruflo memory import invoked
# Returns:        0 (always)
# Errors:         All swallowed — git fetch failure, missing ref,
#                 invalid JSON, import failure → warn + return 0
# Side effects:   git fetch, file write, ruflo memory import
# Idempotent:     Yes (re-import is safe)

# ── ruflo_ci_memory_push() ────────────────────────────────────
# Preconditions:  CI=true, ruflo_available() == true
# Postconditions: memory-export.json pushed to refs/ruflo-memory
#                 (or no-op after 3 failed attempts)
# Returns:        0 (always)
# Errors:         All swallowed — export failure, push conflict after
#                 3 retries → warn + emit_event + return 0
# Side effects:   ruflo memory export, git fetch, git push
# Idempotent:     Yes (push is a full snapshot, not delta)

# ── ruflo_prune_memory_export(file, max_age_days) ────────────
# Input:          file: path to JSON file
#                 max_age_days: integer (default 90)
# Preconditions:  file exists, valid JSON with timestamp fields
# Postconditions: entries older than max_age_days removed in-place
# Returns:        0 (always)
# Errors:         Missing file → no-op, return 0
#                 Invalid JSON → no-op, return 0
# Side effects:   file modified in-place (atomic via tmp+mv)

# ── ruflo_merge_memory_exports(local, remote, output) ────────
# Input:          local: path to local JSON, remote: path to remote JSON,
#                 output: path for merged result
# Preconditions:  both files exist, valid JSON
# Postconditions: output contains union of keys; for shared keys,
#                 local value wins (newer)
# Returns:        0 (always)
# Errors:         Missing/invalid remote → copy local to output, return 0
#                 jq failure → copy local to output, return 0
# Side effects:   output file written

Data Flow

CI Job Start:
  git fetch origin refs/ruflo-memory:refs/ruflo-memory
    │ (failure → skip, return 0)
    ▼
  git show refs/ruflo-memory:memory-export.json > /tmp/import.json
    │ (failure → skip, return 0)
    ▼
  ruflo memory import --input /tmp/import.json
    │ (failure → warn, return 0)
    ▼
  [ruflo memory populated with prior learning; HNSW indexes only if the export captures them]

CI Job End:
  ruflo memory export --output /tmp/export.json
    │ (failure → warn, return 0)
    ▼
  ruflo_prune_memory_export(/tmp/export.json, 90)
    │ removes entries > 90 days old
    ▼
  git fetch origin refs/ruflo-memory → extract remote.json
    │ (failure → use local only)
    ▼
  ruflo_merge_memory_exports(local, remote, merged)
    │ jq -s '.[0] * .[1]'  (remote=.[0], local=.[1], local wins)
    ▼
  (in subshell) git init tmpdir → commit merged → push refs/ruflo-memory
    │ retry 3x with exponential jitter on conflict
    │ (all 3 fail → warn, return 0)
    ▼
  [refs/ruflo-memory updated with latest merged snapshot]
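The push and pull halves of this flow can be exercised without a network by using a local bare repository as a stand-in for origin. The sketch below is a demonstration under that assumption, not the adapter code: the function name, temp paths, commit identity, and sample JSON are all hypothetical.

```bash
#!/usr/bin/env bash

# Round-trips a snapshot through refs/ruflo-memory using a local bare repo
# in place of "origin". Prints the file content read back from the ref.
orphan_ref_roundtrip_demo() {
  local work
  work="$(mktemp -d)" || return 1
  git init --quiet --bare "$work/origin.git" || return 1
  echo '{"k":"v"}' > "$work/merged.json"

  # Push side: a throwaway repo inside a subshell, so the caller's
  # working directory is untouched.
  (
    mkdir "$work/push" && cd "$work/push" || exit 1
    git init --quiet . || exit 1
    cp "$work/merged.json" memory-export.json
    git add memory-export.json
    git -c user.name=ci -c user.email=ci@example.invalid \
      commit --quiet -m "ruflo memory snapshot" || exit 1
    git push --quiet "$work/origin.git" HEAD:refs/ruflo-memory
  ) || return 1

  # Pull side: fetch the ref into an empty repo and read the file back.
  (
    mkdir "$work/pull" && cd "$work/pull" || exit 1
    git init --quiet . || exit 1
    git fetch --quiet "$work/origin.git" \
      refs/ruflo-memory:refs/ruflo-memory || exit 1
    git show refs/ruflo-memory:memory-export.json
  )
}
```

Because the ref lives outside refs/heads/, it never shows up in `git branch` output on either side, which is the property the design relies on.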

Error Boundaries

| Error | Where Caught | Handling |
| --- | --- | --- |
| refs/ruflo-memory doesn't exist (first run) | ruflo_ci_memory_pull | git fetch fails → skip import, return 0 |
| ruflo memory import fails | ruflo_ci_memory_pull | warn + emit_event, return 0 |
| ruflo memory export fails | ruflo_ci_memory_push | warn + emit_event, return 0 (skip push) |
| Invalid JSON in export | ruflo_prune_memory_export | jq returns non-zero → file unchanged, return 0 |
| Remote JSON missing/invalid | ruflo_merge_memory_exports | Use local as-is, return 0 |
| Push conflict (concurrent) | ruflo_ci_memory_push | Fetch + merge + retry, 3 attempts with jitter |
| All 3 push retries fail | ruflo_ci_memory_push | warn + emit_event, return 0; memory lost for this run only |
| date -d unavailable (macOS) | ruflo_prune_memory_export | Fallback to date -v-Nd (BSD syntax) |
| git operations change the caller's cwd | All git operations | Wrapped in ( cd tmpdir && ... ) subshell |

No error propagates to the caller. The pipeline never fails due to memory persistence.


Security Analysis

Threat Model (STRIDE)

| Threat | Category | Risk | Mitigation |
| --- | --- | --- | --- |
| Token leaked in git error output | Information Disclosure | Low | 2>/dev/null on all git push/fetch commands; GITHUB_TOKEN is already scoped to repo |
| Malicious data injected into memory JSON | Tampering | Low | Memory is consumed only by ruflo internals (jq transforms, ruflo import); no shell eval of JSON content |
| Memory export contains PII or secrets | Information Disclosure | Low | Ruflo memory stores patterns/routes/Q-weights, not user data; refs/ruflo-memory is in same repo (same access control) |
| Unbounded memory growth causes DoS | Denial of Service | Medium | 90-day pruning runs before every push; ruflo_with_timeout bounds operations |
| Concurrent push causes data loss | Tampering | Medium | Merge semantics (union + newer wins) + 3-retry; worst case = one run's learning lost, not catastrophic |

Auth Flow

Not applicable — uses existing GITHUB_TOKEN with contents: write already granted at line 49 of the workflow. No new authentication or session management introduced.

Input Validation Points

| Entry Point | Input | Validation |
| --- | --- | --- |
| ruflo memory export output | JSON file | Validated by jq during prune/merge; invalid JSON → no-op |
| refs/ruflo-memory content | JSON from git | Validated by jq during merge; invalid → use local only |
| max_age_days parameter | Integer | Used in date arithmetic only; non-integer → date command fails → prune skipped |

Security Checklist

  • No secrets in code — GITHUB_TOKEN injected via workflow env, not hardcoded
  • Git error output suppressed (2>/dev/null) — no token leakage in logs
  • No shell eval of JSON content — all processing via jq
  • Same-repo ref — no cross-repo data exposure
  • Pruning prevents unbounded growth
  • No new permissions required — existing contents: write is sufficient

Data Pipeline Analysis

Schema Changes

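Not applicable: there is no database. The "schema" is the JSON structure of memory-export.json, defined by ruflo memory export (opaque to us). We treat it as an arbitrary JSON object and merge by key: top-level keys are unioned, and where both sides hold an object under the same key, jq's * operator merges those objects recursively with local values winning.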

Data Flow Diagram

ruflo memory export ──► memory-export.json (local)
                              │
                    ┌─────────┼─────────────┐
                    ▼                        ▼
          ruflo_prune (in-place)    git fetch remote.json
                    │                        │
                    └─────────┬──────────────┘
                              ▼
                    ruflo_merge_memory_exports
                    (remote=base, local=override)
                              │
                              ▼
                    merged.json ──► git push refs/ruflo-memory
                                         │
                              ┌──────────┤ (failure?)
                              ▼          ▼
                         [success]  [retry with fetch-merge-push]

Failure points: git fetch (remote unavailable), git push (conflict), jq (invalid JSON). All handled with fallback-to-local or return 0.

Idempotency Strategy

  • Pull is idempotent: Re-importing the same JSON into ruflo is safe (key-based upsert).
  • Push is idempotent: Each push is a full snapshot commit to the orphan ref. When the content is unchanged, git diff --cached --quiet exits 0, so no commit is created and nothing is pushed.
  • Merge is deterministic: Given the same local and remote inputs, jq -s '.[0] * .[1]' always produces the same output.
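The "no change, no commit" rule behind push idempotency can be sketched as follows. The function name and commit identity are hypothetical; the check itself relies on `git diff --cached --quiet` exiting 0 when the index matches HEAD.

```bash
#!/usr/bin/env bash

# Stages the snapshot and commits only when it differs from HEAD.
# Prints "committed" or "unchanged". Assumes the current directory is
# inside a git work tree. Always returns 0 (fail-open).
commit_if_changed() {
  local file="$1"
  git add -- "$file" || return 0
  if git diff --cached --quiet; then
    echo "unchanged"
  else
    git -c user.name=ci -c user.email=ci@example.invalid \
      commit --quiet -m "ruflo memory snapshot" && echo "committed"
  fi
  return 0
}
```

Calling it twice on the same file content yields `committed` and then `unchanged`, so a rerun of the save step produces no new commit and has nothing to push.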

Rollback Plan

  1. Delete the ref: git push origin :refs/ruflo-memory removes the orphan ref entirely
  2. Revert to cache-only: Remove the two workflow steps and two wiring lines in ruflo-adapter.sh; the existing actions/cache steps continue working unchanged
  3. No data migration needed: The orphan ref and the cache are independent storage layers

Validation Criteria

  • ruflo_ci_memory_pull() restores memory from the orphan ref when CI=true and the ref exists
  • ruflo_ci_memory_pull() is a silent no-op when ref doesn't exist (first run)
  • ruflo_ci_memory_push() persists memory to refs/ruflo-memory in CI
  • ruflo_ci_memory_push() handles push conflict via fetch-merge-retry (3 attempts)
  • ruflo_prune_memory_export() removes entries older than 90 days
  • ruflo_prune_memory_export() is a no-op on missing or invalid JSON
  • ruflo_merge_memory_exports() produces union of keys with local winning
  • ruflo_merge_memory_exports() falls back to local-only on jq failure
  • All four functions return 0 on every code path (pipeline never fails on memory errors)
  • All four functions are no-ops when CI != true (no local dev side effects)
  • Existing actions/cache steps remain functional as fallback layer
  • 8 unit tests pass in scripts/sw-ruflo-adapter-test.sh
  • Full test suite (npm test) shows no regressions
  • HNSW/Q-weight export limitation documented inline in ruflo_ci_memory_push()
  • No GITHUB_TOKEN leakage in git operation stderr
