Pipeline Plan 327


Implementation Plan: Store Test Stage Results in Ruflo & Recall Flakiness Patterns

Files to Modify

scripts/lib/pipeline-stages-build.sh — stage_test() function (lines 596–726)
scripts/sw-ruflo-adapter-test.sh — add new test sections at end (before print_test_results)

Implementation Steps

1. Add ruflo recall at the start of `stage_test()` (after line 618, before running tests)

Insert the block below after the info "Running tests: ..." call (line 618) and before bash -c "$test_cmd" (line 620). It follows the pattern used in stage_test_first (lines 21–31):

```bash
# ── Recall historical flakiness patterns from ruflo ──────────────────
local _ruflo_flakiness_ctx=""
if declare -f ruflo_recall >/dev/null 2>&1 && \
   declare -f ruflo_available >/dev/null 2>&1 && \
   ruflo_available; then
    _ruflo_flakiness_ctx=$(ruflo_recall "test flakiness patterns failures" \
        "pipeline-${SHIPWRIGHT_PIPELINE_ID:-unknown}" 2>/dev/null || true)
    _ruflo_flakiness_ctx=$(printf '%.2000s' "${_ruflo_flakiness_ctx:-}")
    if [[ -n "$_ruflo_flakiness_ctx" ]]; then
        info "Ruflo recall: historical test patterns found"
        info "${DIM}${_ruflo_flakiness_ctx}${RESET}"
    fi
fi
```

Key points:

declare -f guards per memory feedback (feedback_ruflo_declare_f_guard.md)
printf '%.2000s' truncation per memory feedback (feedback_ruflo_recall_plain_text.md)
Namespace pipeline-<PIPELINE_ID> per acceptance criteria
Recall output is logged for human visibility and does not gate execution
Fail-open: || true on the recall, with an empty-string fallback
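A standalone sketch of the guard-and-truncate behaviour follows. recall_flakiness_ctx is a hypothetical wrapper around the block above, and ruflo_available/ruflo_recall are replaced by local stubs; the real adapter functions are assumed, not reproduced. It shows that the guard short-circuits when the adapter is not loaded and that printf's precision field caps the recalled text.

```bash
#!/usr/bin/env bash
# Sketch only: recall_flakiness_ctx is a hypothetical wrapper around the
# guarded recall block; the ruflo_* functions below are stubs.

recall_flakiness_ctx() {
    local ctx=""
    # Call ruflo only when both functions exist and ruflo reports it is up.
    if declare -f ruflo_recall >/dev/null 2>&1 && \
       declare -f ruflo_available >/dev/null 2>&1 && \
       ruflo_available; then
        ctx=$(ruflo_recall "test flakiness patterns failures" \
            "pipeline-${SHIPWRIGHT_PIPELINE_ID:-unknown}" 2>/dev/null || true)
        # Truncate with the same %.2000s precision field the plan uses.
        ctx=$(printf '%.2000s' "${ctx:-}")
    fi
    printf '%s' "$ctx"
}

# 1) Adapter not loaded: the declare -f guard fails, nothing is recalled.
before=$(recall_flakiness_ctx)

# 2) Stubbed adapter returning 5000 characters of fake "history".
ruflo_available() { return 0; }
ruflo_recall() { printf 'x%.0s' $(seq 1 5000); }
after=$(recall_flakiness_ctx)

echo "unloaded: ${#before} chars, loaded: ${#after} chars"
```

With these stubs the script prints `unloaded: 0 chars, loaded: 2000 chars`.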

2. Move/restructure ruflo store to fire on BOTH pass and fail (lines 635–723)

Current problem: ruflo_store (line 719) only runs on the pass path, because the failure branch returns 1 at line 665 before reaching it. Fix by:

Adding a store call inside the failure branch (before return 1 at line 665)
Enriching the existing pass-side store (lines 719–722) with the full record: result, test count, command, coverage, and timestamp

Failure path — insert before return 1 at line 665:

```bash
# Store failed test result in ruflo for flakiness tracking
if declare -f ruflo_store >/dev/null 2>&1 && \
   declare -f ruflo_available >/dev/null 2>&1 && \
   ruflo_available; then
    local _fail_test_count
    # grep -c already prints 0 (and exits 1) on no match; apply the fallback
    # outside $(...) so the value can never be "0" twice
    _fail_test_count=$(grep -cE 'PASS|FAIL|✓|✗|ok [0-9]' "$test_log" 2>/dev/null) || _fail_test_count=0
    ruflo_store "stage-test-result" \
        "Tests FAILED (exit $test_exit). Count: ${_fail_test_count}. Cmd: ${test_cmd}. Coverage: 0%. Time: $(date -u +"%Y-%m-%dT%H:%M:%SZ" 2>/dev/null || echo unknown)." \
        "pipeline-${SHIPWRIGHT_PIPELINE_ID:-unknown}" \
        "test,stage_test,failed" 2>/dev/null || true
fi
```

Pass path — replace lines 718–723 with enriched store:

```bash
# Store test results in ruflo for cross-stage context and flakiness tracking
if declare -f ruflo_store >/dev/null 2>&1 && \
   declare -f ruflo_available >/dev/null 2>&1 && \
   ruflo_available; then
    local _pass_test_count
    # grep -c already prints 0 (and exits 1) on no match; apply the fallback
    # outside $(...) so the value can never be "0" twice
    _pass_test_count=$(grep -cE 'PASS|FAIL|✓|✗|ok [0-9]' "$test_log" 2>/dev/null) || _pass_test_count=0
    ruflo_store "stage-test-result" \
        "Tests PASSED. Count: ${_pass_test_count}. Cmd: ${test_cmd}. Coverage: ${_cov_pct:-0}%. Time: $(date -u +"%Y-%m-%dT%H:%M:%SZ" 2>/dev/null || echo unknown)." \
        "pipeline-${SHIPWRIGHT_PIPELINE_ID:-unknown}" \
        "test,stage_test,passed" 2>/dev/null || true
fi
```
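Because the pass and fail blocks differ only in the summary, coverage, and tag, one option is a single helper that both branches call. The sketch below assumes a hypothetical _store_test_result function (not part of the existing code) and stubs ruflo_available/ruflo_store so it runs on its own. It also shows the grep -c pitfall: grep -c prints "0" and exits 1 when nothing matches, so `$(grep -c ... || echo "0")` captures "0" twice; assigning the fallback outside the substitution avoids that and also covers a missing log file. Note that the count is a heuristic: it counts matching log lines, not individual tests.

```bash
#!/usr/bin/env bash
# Sketch only: _store_test_result is a hypothetical shared helper;
# ruflo_available and ruflo_store are stubs for the demo below.

_store_test_result() {
    local tag="$1" summary="$2" test_log="$3" test_cmd="$4" cov_pct="$5"
    # Same fail-open guard as the plan: skip silently if ruflo is missing/down.
    if ! { declare -f ruflo_store >/dev/null 2>&1 && \
           declare -f ruflo_available >/dev/null 2>&1 && \
           ruflo_available; }; then
        return 0
    fi

    local count
    # Fallback applied outside $(...): no match gives "0", missing file gives 0.
    count=$(grep -cE 'PASS|FAIL|✓|✗|ok [0-9]' "$test_log" 2>/dev/null) || count=0

    ruflo_store "stage-test-result" \
        "Tests ${summary}. Count: ${count}. Cmd: ${test_cmd}. Coverage: ${cov_pct}%. Time: $(date -u +"%Y-%m-%dT%H:%M:%SZ" 2>/dev/null || echo unknown)." \
        "pipeline-${SHIPWRIGHT_PIPELINE_ID:-unknown}" \
        "test,stage_test,${tag}" 2>/dev/null || true
}

# Demo: the store stub appends each call (args joined by spaces) to a file.
STORE_LOG=$(mktemp)
ruflo_available() { return 0; }
ruflo_store() { printf '%s\n' "$*" >> "$STORE_LOG"; }

test_log=$(mktemp)
printf 'ok 1 - a\nok 2 - b\nnot ok 3 - c\n' > "$test_log"
SHIPWRIGHT_PIPELINE_ID=demo

_store_test_result failed "FAILED (exit 1)" "$test_log" "npm test" 0
_store_test_result passed "PASSED" /nonexistent/test.log "npm test" 87
cat "$STORE_LOG"
```

The failed record reports `Count: 3` (all three TAP lines match `ok [0-9]`), and the record for the missing log reports `Count: 0` rather than a doubled value.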

3. Add tests in `sw-ruflo-adapter-test.sh`

Insert before print_test_results (line 2531). Add 4 test sections:

stage_test recall: ruflo_recall called at start — mock ruflo_recall, verify it's called with correct namespace and query
stage_test recall: output logged for visibility — verify recall output appears in info log
stage_test store: ruflo_store called on pass — mock test pass, verify store called with passed tag and correct data (coverage, count, cmd, timestamp)
stage_test store: ruflo_store called on fail — mock test fail, verify store called with failed tag

Each test stubs ruflo_available, ruflo_recall, ruflo_store as local functions that log args to temp files, then asserts against those logs. Pattern matches existing stage_test_first tests at lines 2400–2528.
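The stub-and-log pattern for the four unit tests described above might look roughly like the following. stage_test_sketch is a hypothetical stand-in for stage_test() (the real function and the existing test helpers are not reproduced), the declare -f guards are omitted for brevity, and the recalled "history" string is illustrative stub data.

```bash
#!/usr/bin/env bash
# Sketch of the stub-and-log test pattern. stage_test_sketch is a
# hypothetical stand-in for stage_test(), not the real function.

CALLS=$(mktemp)

# Stubs record their arguments so assertions can inspect them afterwards.
ruflo_available() { return 0; }
ruflo_recall()    { echo "recall $*" >> "$CALLS"; echo "flaky: example.spec timed out twice"; }
ruflo_store()     { echo "store $*"  >> "$CALLS"; }
info()            { echo "INFO $*"   >> "$CALLS"; }

stage_test_sketch() {
    local test_cmd="$1" ctx
    # Recall first; output is only logged and never gates the run.
    ctx=$(ruflo_recall "test flakiness patterns failures" \
        "pipeline-${SHIPWRIGHT_PIPELINE_ID:-unknown}" 2>/dev/null || true)
    [[ -n "$ctx" ]] && info "Ruflo recall: $ctx"
    echo "run $test_cmd" >> "$CALLS"
    if bash -c "$test_cmd"; then
        ruflo_store "stage-test-result" "Tests PASSED." \
            "pipeline-${SHIPWRIGHT_PIPELINE_ID:-unknown}" "test,stage_test,passed"
        return 0
    fi
    # Failure path still stores before returning 1.
    ruflo_store "stage-test-result" "Tests FAILED." \
        "pipeline-${SHIPWRIGHT_PIPELINE_ID:-unknown}" "test,stage_test,failed"
    return 1
}

SHIPWRIGHT_PIPELINE_ID=demo
stage_test_sketch "true"
stage_test_sketch "false"; fail_rc=$?
```

Each test then asserts against the lines in the call log: the recall namespace and query, the INFO line, the passed or failed tag, and the fact that recall precedes the test run.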

Task Checklist

Testing Approach

Test Pyramid Breakdown:

Unit tests (4 new): All 4 tests are unit-level, mocking ruflo_recall/ruflo_store/ruflo_available as stub functions that log their arguments to temp files. Assertions check those logs.
Integration tests (0): Not needed — the ruflo adapter integration is already covered by existing tests.
E2E tests (0): Not applicable — pipeline-level E2E would be a separate concern.

Coverage Targets:

100% of the new code paths (recall-at-start, store-on-pass, store-on-fail)
The "ruflo available" branch is covered by the new tests; the "ruflo unavailable" branch is covered only implicitly, by the existing guard tests
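If the unavailable branch should be tested explicitly rather than implicitly, a test along these lines would do it. guarded_store and run_tests are hypothetical helpers that mirror the plan's guard; ruflo_available is stubbed to report ruflo as down. The test checks that the store is skipped and that the test command's exit status passes through unchanged.

```bash
#!/usr/bin/env bash
# Sketch: exercising the "ruflo unavailable" branch. guarded_store and
# run_tests are hypothetical helpers; ruflo_* are stubs.

CALLS=$(mktemp)
ruflo_available() { return 1; }                 # simulate ruflo being down
ruflo_store()     { echo "store $*" >> "$CALLS"; }

guarded_store() {
    # Same guard as the plan; with ruflo down, ruflo_store is never called.
    if declare -f ruflo_store >/dev/null 2>&1 && \
       declare -f ruflo_available >/dev/null 2>&1 && \
       ruflo_available; then
        ruflo_store "$@" 2>/dev/null || true
    fi
}

run_tests() {
    local rc=0
    bash -c "$1" || rc=$?
    guarded_store "stage-test-result" "exit $rc" \
        "pipeline-${SHIPWRIGHT_PIPELINE_ID:-unknown}" "test"
    return "$rc"                                # exit status is preserved
}

run_tests "exit 3"; rc=$?
```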

Critical Paths to Test:

Happy path: Ruflo available, recall returns history, tests pass, store called with pass data
Error case 1: Tests fail — store still called with failure data (the main gap this issue fixes)
Error case 2: Ruflo unavailable, so the guard skips both recall and store; test execution is unaffected
Edge case 1: SHIPWRIGHT_PIPELINE_ID unset — namespace falls back to pipeline-unknown
Edge case 2: Recall returns empty string — no flakiness info logged, execution proceeds normally
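For Edge case 1, note that the `${SHIPWRIGHT_PIPELINE_ID:-unknown}` expansion falls back both when the variable is unset and when it is set but empty, so both cases land in pipeline-unknown. The sketch below wraps the expansion in a hypothetical pipeline_namespace helper to show the three cases.

```bash
#!/usr/bin/env bash
# pipeline_namespace is a hypothetical helper wrapping the plan's expansion.
pipeline_namespace() { printf 'pipeline-%s' "${SHIPWRIGHT_PIPELINE_ID:-unknown}"; }

unset SHIPWRIGHT_PIPELINE_ID
ns_unset=$(pipeline_namespace)
SHIPWRIGHT_PIPELINE_ID=""
ns_empty=$(pipeline_namespace)      # ":-" also falls back for an empty value
SHIPWRIGHT_PIPELINE_ID="run-42"
ns_set=$(pipeline_namespace)

printf '%s\n' "$ns_unset" "$ns_empty" "$ns_set"
```

This prints `pipeline-unknown` twice, then `pipeline-run-42`.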

Definition of Done

ruflo_recall() called at start of stage_test() with pipeline-<PIPELINE_ID> namespace
Recall results logged via info but do not gate test execution
ruflo_store() called after BOTH pass and fail outcomes with: pass/fail, coverage %, test count, test cmd, timestamp
Storage namespace: pipeline-<PIPELINE_ID>
All ruflo calls guarded with declare -f + ruflo_available (fail-open)
4 new tests in sw-ruflo-adapter-test.sh covering recall and store on pass/fail
npm test passes with no regressions

Security/API/Skill Notes

Endpoint Specification: N/A — no new API endpoints; this is shell-to-CLI integration.
Error Codes: N/A — all ruflo calls are fail-open (|| true).
Rate Limiting: N/A — ruflo CLI has built-in circuit-breaker timeout.
Versioning: N/A — no API versioning change.
Threat Model (STRIDE): N/A — no new attack surface; ruflo data is local, no secrets stored.
Auth Flow: N/A — no authentication changes.
Input Validation Points: The test_cmd value stored in ruflo is already validated/auto-detected by the existing pipeline config parsing. The recall query is a hardcoded string.
Security Checklist: No secrets in stored data (only test counts/coverage/cmd), no user input flows into ruflo keys without existing sanitization.
