
Pipeline Plan 327

ezigus edited this page Apr 19, 2026 · 1 revision



Implementation Plan: Store Test Stage Results in Ruflo & Recall Flakiness Patterns

Files to Modify

  1. scripts/lib/pipeline-stages-build.sh: stage_test() function (lines 596–726)
  2. scripts/sw-ruflo-adapter-test.sh — add new test sections at end (before print_test_results)

Implementation Steps

1. Add ruflo recall at the start of stage_test() (after line 618, before running tests)

Insert a block after info "Running tests: ..." (line 618) but before bash -c "$test_cmd" (line 620). Pattern follows stage_test_first lines 21–31:

# ── Recall historical flakiness patterns from ruflo ──────────────────
local _ruflo_flakiness_ctx=""
if declare -f ruflo_recall >/dev/null 2>&1 && \
   declare -f ruflo_available >/dev/null 2>&1 && \
   ruflo_available; then
    _ruflo_flakiness_ctx=$(ruflo_recall "test flakiness patterns failures" \
        "pipeline-${SHIPWRIGHT_PIPELINE_ID:-unknown}" 2>/dev/null || true)
    _ruflo_flakiness_ctx=$(printf '%.2000s' "${_ruflo_flakiness_ctx:-}")
    if [[ -n "$_ruflo_flakiness_ctx" ]]; then
        info "Ruflo recall: historical test patterns found"
        info "${DIM}${_ruflo_flakiness_ctx}${RESET}"
    fi
fi

Key points:

  • declare -f guards per memory feedback (feedback_ruflo_declare_f_guard.md)
  • printf '%.2000s' truncation per memory feedback (feedback_ruflo_recall_plain_text.md)
  • Namespace pipeline-<PIPELINE_ID> per acceptance criteria
  • Logged for human visibility, does not gate execution
  • Fail-open: || true on recall, empty string fallback
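
The fail-open behavior described in these points can be checked in isolation. The following is a minimal sketch, assuming hypothetical stand-ins for ruflo_available and ruflo_recall (the real functions come from the ruflo adapter) and a hypothetical wrapper named recall_flakiness that is not part of the plan:

```bash
# Hypothetical stubs; the real functions are provided by the ruflo adapter.
ruflo_available() { return 0; }
# Simulates a recall that prints output but exits non-zero.
ruflo_recall() { echo "recalled: query=$1 ns=$2"; return 1; }

unset SHIPWRIGHT_PIPELINE_ID   # exercise the pipeline-unknown fallback

# recall_flakiness is an illustrative wrapper around the guarded block.
recall_flakiness() {
    local ctx=""
    if declare -f ruflo_recall >/dev/null 2>&1 && \
       declare -f ruflo_available >/dev/null 2>&1 && \
       ruflo_available; then
        ctx=$(ruflo_recall "test flakiness patterns failures" \
            "pipeline-${SHIPWRIGHT_PIPELINE_ID:-unknown}" 2>/dev/null || true)
        ctx=$(printf '%.2000s' "${ctx:-}")   # cap at 2000 characters
    fi
    printf '%s' "$ctx"
}

with_adapter=$(recall_flakiness)      # stub output is captured despite exit 1

unset -f ruflo_recall                 # simulate the adapter not being loaded
without_adapter=$(recall_flakiness)   # guard short-circuits, result is empty

printf 'with=[%s]\nwithout=[%s]\n' "$with_adapter" "$without_adapter"
```

In both cases the wrapper returns normally, which is the property that keeps recall from gating test execution.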

2. Move/restructure ruflo store to fire on BOTH pass and fail (lines 635–723)

Current problem: the ruflo_store call at line 719 only fires on pass, because the failure path returns 1 at line 665 before reaching it. Fix by:

  • Adding a store call inside the failure branch (before return 1 at line 665)
  • Enriching the existing pass-side store (lines 719–722) with more data

Failure path — insert before return 1 at line 665:

# Store failed test result in ruflo for flakiness tracking
if declare -f ruflo_store >/dev/null 2>&1 && \
   declare -f ruflo_available >/dev/null 2>&1 && \
   ruflo_available; then
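    local _fail_test_count
    # grep -c prints 0 and exits non-zero on no match, so default separately rather than with || echo "0"
    _fail_test_count=$(grep -cE 'PASS|FAIL|✓|✗|ok [0-9]' "$test_log" 2>/dev/null || true)
    _fail_test_count=${_fail_test_count:-0}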
    ruflo_store "stage-test-result" \
        "Tests FAILED (exit $test_exit). Count: ${_fail_test_count}. Cmd: ${test_cmd}. Coverage: 0%. Time: $(date -u +"%Y-%m-%dT%H:%M:%SZ" 2>/dev/null || echo unknown)." \
        "pipeline-${SHIPWRIGHT_PIPELINE_ID:-unknown}" \
        "test,stage_test,failed" 2>/dev/null || true
fi

Pass path — replace lines 718–723 with enriched store:

# Store test results in ruflo for cross-stage context and flakiness tracking
if declare -f ruflo_store >/dev/null 2>&1 && \
   declare -f ruflo_available >/dev/null 2>&1 && \
   ruflo_available; then
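    local _pass_test_count
    # grep -c prints 0 and exits non-zero on no match, so default separately rather than with || echo "0"
    _pass_test_count=$(grep -cE 'PASS|FAIL|✓|✗|ok [0-9]' "$test_log" 2>/dev/null || true)
    _pass_test_count=${_pass_test_count:-0}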
    ruflo_store "stage-test-result" \
        "Tests PASSED. Count: ${_pass_test_count}. Cmd: ${test_cmd}. Coverage: ${_cov_pct:-0}%. Time: $(date -u +"%Y-%m-%dT%H:%M:%SZ" 2>/dev/null || echo unknown)." \
        "pipeline-${SHIPWRIGHT_PIPELINE_ID:-unknown}" \
        "test,stage_test,passed" 2>/dev/null || true
fi
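
One detail worth checking when counting matches with grep -c: on zero matches grep still prints 0 but exits with status 1, so an `|| echo "0"` fallback appends a second 0 to the captured value. The sketch below shows the effect on a throwaway log file (a stand-in for "$test_log") and one possible alternative that defaults only when the output is empty:

```bash
# Throwaway log with no PASS/FAIL markers, standing in for "$test_log".
log=$(mktemp)
printf 'compiling...\ndone\n' > "$log"

# grep -c prints "0" and exits 1 on zero matches, so the fallback fires too.
naive=$(grep -cE 'PASS|FAIL' "$log" 2>/dev/null || echo "0")

# Alternative: swallow the exit status, then default only if output is empty
# (for example when the log file is missing and grep prints nothing).
safe=$(grep -cE 'PASS|FAIL' "$log" 2>/dev/null || true)
safe=${safe:-0}

printf 'naive=[%s]\nsafe=[%s]\n' "$naive" "$safe"
rm -f "$log"
```

With the naive form the captured value contains two lines, which would then be embedded in the stored result string.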

3. Add tests in sw-ruflo-adapter-test.sh

Insert before print_test_results (line 2531). Add 4 test sections:

  1. stage_test recall: ruflo_recall called at start — mock ruflo_recall, verify it's called with correct namespace and query
  2. stage_test recall: output logged for visibility — verify recall output appears in info log
  3. stage_test store: ruflo_store called on pass — mock test pass, verify store called with passed tag and correct data (coverage, count, cmd, timestamp)
  4. stage_test store: ruflo_store called on fail — mock test fail, verify store called with failed tag

Each test stubs ruflo_available, ruflo_recall, ruflo_store as local functions that log args to temp files, then asserts against those logs. Pattern matches existing stage_test_first tests at lines 2400–2528.
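
The stub-and-assert pattern described above might look roughly like this. It is a sketch only: record_result is a hypothetical stand-in for the store block inside stage_test(), the pipeline ID "demo" is made up, and the existing test file's own assertion helpers are not shown here, so plain grep checks are used instead:

```bash
args_log=$(mktemp)

# Stubs: ruflo_store records each argument on its own line.
ruflo_available() { return 0; }
ruflo_store() { printf '%s\n' "$@" >> "$args_log"; }

# Hypothetical stand-in for the guarded store call in stage_test().
record_result() {
    local status="$1"
    if declare -f ruflo_store >/dev/null 2>&1 && \
       declare -f ruflo_available >/dev/null 2>&1 && \
       ruflo_available; then
        ruflo_store "stage-test-result" "Tests ${status}." \
            "pipeline-${SHIPWRIGHT_PIPELINE_ID:-unknown}" \
            "test,stage_test,${status}" 2>/dev/null || true
    fi
}

SHIPWRIGHT_PIPELINE_ID="demo"   # made-up ID for illustration
record_result failed

# Assert against the recorded arguments (whole-line matches).
if grep -qx 'pipeline-demo' "$args_log" && \
   grep -qx 'test,stage_test,failed' "$args_log"; then
    result="ok"
else
    result="mismatch"
fi
echo "$result"
rm -f "$args_log"
```

Logging one argument per line keeps the assertions simple, since each expected value can be matched as a whole line.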

Task Checklist

  • Task 1: Add ruflo recall block at start of stage_test() — query pipeline-<PIPELINE_ID> for historical test/flakiness patterns
  • Task 2: Log recall results via info for human visibility (no gating)
  • Task 3: Add ruflo store in the failure path (before return 1) with pass/fail, coverage, test count, cmd, timestamp
  • Task 4: Enrich the existing pass-path ruflo store with test count, cmd, timestamp, and tags
  • Task 5: Add declare -f guards + ruflo_available checks on all new ruflo calls
  • Task 6: Add test — recall-before-test: verify ruflo_recall invoked with correct namespace
  • Task 7: Add test — recall output logged for human visibility
  • Task 8: Add test — store-after-test (pass path): verify store args contain passed, coverage, cmd
  • Task 9: Add test — store-after-test (fail path): verify store called with failed tag
  • Task 10: Run npm test and confirm all existing tests pass

Testing Approach

Test Pyramid Breakdown:

  • Unit tests (4 new): All 4 tests are unit-level, mocking ruflo_recall/ruflo_store/ruflo_available as stub functions that log their arguments to temp files. Assertions check those logs.
  • Integration tests (0): Not needed — the ruflo adapter integration is already covered by existing tests.
  • E2E tests (0): Not applicable — pipeline-level E2E would be a separate concern.

Coverage Targets:

  • 100% of the new code paths (recall-at-start, store-on-pass, store-on-fail)
  • Both the "ruflo available" and "ruflo unavailable" branches are covered (the unavailable branch is implicitly tested by existing guard tests)

Critical Paths to Test:

  • Happy path: Ruflo available, recall returns history, tests pass, store called with pass data
  • Error case 1: Tests fail — store still called with failure data (the main gap this issue fixes)
  • Error case 2: Ruflo unavailable, so recall and store are both skipped and test execution is unaffected
  • Edge case 1: SHIPWRIGHT_PIPELINE_ID unset — namespace falls back to pipeline-unknown
  • Edge case 2: Recall returns empty string — no flakiness info logged, execution proceeds normally
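
The namespace fallback in edge case 1 relies on the `${VAR:-default}` expansion, which applies the default when the variable is unset and also when it is set but empty. A small sketch, using a hypothetical helper named ruflo_namespace that is not part of the plan:

```bash
# ruflo_namespace is an illustrative helper, not part of the plan's code.
ruflo_namespace() { printf 'pipeline-%s' "${SHIPWRIGHT_PIPELINE_ID:-unknown}"; }

unset SHIPWRIGHT_PIPELINE_ID
ns_unset=$(ruflo_namespace)    # variable not set at all

SHIPWRIGHT_PIPELINE_ID=""
ns_empty=$(ruflo_namespace)    # set but empty: ":-" still applies the default

SHIPWRIGHT_PIPELINE_ID="abc123"
ns_set=$(ruflo_namespace)      # made-up ID for illustration

printf '%s\n%s\n%s\n' "$ns_unset" "$ns_empty" "$ns_set"
```

One consequence is that runs without a pipeline ID all share the pipeline-unknown namespace, so their stored results would be recalled together.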

Definition of Done

  • ruflo_recall() called at start of stage_test() with pipeline-<PIPELINE_ID> namespace
  • Recall results logged via info but do not gate test execution
  • ruflo_store() called after BOTH pass and fail outcomes with: pass/fail, coverage %, test count, test cmd, timestamp
  • Storage namespace: pipeline-<PIPELINE_ID>
  • All ruflo calls guarded with declare -f + ruflo_available (fail-open)
  • 4 new tests in sw-ruflo-adapter-test.sh covering recall and store on pass/fail
  • npm test passes with no regressions

Security/API/Skill Notes

  • Endpoint Specification: N/A — no new API endpoints; this is shell-to-CLI integration.
  • Error Codes: N/A — all ruflo calls are fail-open (|| true).
  • Rate Limiting: N/A — ruflo CLI has built-in circuit-breaker timeout.
  • Versioning: N/A — no API versioning change.
  • Threat Model (STRIDE): N/A — no new attack surface; ruflo data is local, no secrets stored.
  • Auth Flow: N/A — no authentication changes.
  • Input Validation Points: The test_cmd value stored in ruflo is already validated/auto-detected by the existing pipeline config parsing. The recall query is a hardcoded string.
  • Security Checklist: No secrets in stored data (only test counts/coverage/cmd), no user input flows into ruflo keys without existing sanitization.
