
v3.10.29: 3-dataset BEIR (rank 4/11 on mean) + ruvector@0.2.27 tier-0 + #2246 fixes


@ruvnet ruvnet released this 31 May 03:50

What ships (batched per "no constant releases")

Four independent threads:

  1. 3rd BEIR dataset (ArguAna): strengthens the 2-dataset → 3-dataset story
  2. BGE-large NFCorpus ceiling test: answered (no lift on this hardware)
  3. ruvector@0.2.27 Tier-0 wiring: kills the silent-fallback bug at source
  4. 4 user bugs from #2246: 2 fixed, 1 tracked for a separate release, 1 forwarded

3-dataset BEIR results

| Dataset | nDCG@10 | Pipeline | Rank |
| --- | --- | --- | --- |
| NFCorpus | 0.358 | Lucene + RRF + CE rerank | 2/11 |
| SciFact | 0.683 | Lucene + RRF + CE rerank | 3/11 |
| ArguAna | 0.432 | Lucene + RRF (CE rerank hurt) | 5/11 |
| 3-dataset mean | 0.491 | mixed | - |
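
The "RRF" in the pipeline column is reciprocal rank fusion, which merges several ranked lists (here presumably the BM25 and dense lists) into one. The following is a minimal sketch of the standard formula, not ruflo's code: the function name, the document ids, and the constant k=60 (the value commonly used in the RRF literature) are all assumptions.

```python
from collections import defaultdict

def rrf_fuse(rankings, k=60):
    """Fuse ranked doc-id lists with reciprocal rank fusion.

    Each document scores sum(1 / (k + rank)) over the lists that contain it,
    with rank counted from 1. k=60 is an assumed default, not ruflo's setting.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id so output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))

# Made-up document ids, for illustration only.
bm25_hits = ["d3", "d1", "d7"]
dense_hits = ["d1", "d9", "d3"]
fused = rrf_fuse([bm25_hits, dense_hits])
print(fused)
```

Documents that appear near the top of both lists (d1, d3) outrank documents that appear in only one, which is why RRF needs no score calibration between the two retrievers.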

3-dataset mean leaderboard

| System | Params | Mean nDCG@10 |
| --- | --- | --- |
| BGE-large-v1.5 (published) | 335M | 0.579 |
| SPLADE++ (published) | 110M | 0.524 |
| GenQ (published) | 110M | 0.485 (~tied with us) |
| ruflo best per-dataset | 110M | 0.491 |
| GTR-XL (published) | 1.2B | 0.481 |
| BM25 (published Lucene) | - | 0.467 |
| Contriever | 110M | 0.461 |
| TAS-B | 66M | 0.464 |

Rank 4 of 11 on the 3-dataset mean. Beats published BM25 (+0.024); beats GTR-XL with roughly one-tenth of its parameters (110M vs 1.2B); beats Contriever, TAS-B, ColBERT, and SBERT. Loses to SPLADE++ (-0.033) and BGE-large (-0.088, mostly the ArguAna gap).
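
For readers unfamiliar with the metric, here is a minimal sketch of nDCG@10 with linear gain and a log2 rank discount, followed by the arithmetic behind the 0.491 mean. The helper names and the toy qrels are illustrative assumptions; the benchmark script itself may compute the metric differently.

```python
import math

def dcg_at_k(gains, k=10):
    # Discounted cumulative gain: the gain at 0-based position i is divided by log2(i + 2).
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))

def ndcg_at_k(ranked_doc_ids, qrels, k=10):
    """qrels maps doc id -> graded relevance for one query (toy data below)."""
    gains = [qrels.get(doc_id, 0) for doc_id in ranked_doc_ids]
    ideal = sorted(qrels.values(), reverse=True)
    idcg = dcg_at_k(ideal, k)
    return dcg_at_k(gains, k) / idcg if idcg > 0 else 0.0

# Toy query: the most relevant document "a" is retrieved third.
score = ndcg_at_k(["b", "x", "a"], {"a": 2, "b": 1})

# Per-dataset scores from the results table; the mean is a plain average.
per_dataset = {"NFCorpus": 0.358, "SciFact": 0.683, "ArguAna": 0.432}
mean_ndcg = sum(per_dataset.values()) / len(per_dataset)
print(round(score, 3), round(mean_ndcg, 3))
```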

Counter-findings reported honestly

ArguAna kills the cross-encoder rerank. Pulled at the 50-query checkpoint (running nDCG 0.283 vs dense alone 0.431, estimated 6+ hours wall time). ArguAna is counter-argument retrieval: pointwise relevance scoring doesn't help when the task requires understanding opposition. Pipeline auto-adapts: rerank wins NFCorpus and SciFact, loses ArguAna.
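
The checkpoint decision above can be sketched as a simple guard. This is a hypothetical illustration: the release does not say whether the rerank was pulled by hand or by code, and the function name, the 50-query default, and the 0.02 margin are assumptions.

```python
def should_drop_rerank(rerank_ndcg, baseline_ndcg, queries_done,
                       checkpoint=50, margin=0.02):
    """Return True when the rerank stage should be abandoned.

    Before the checkpoint there are too few queries to judge, so keep going.
    After it, drop the rerank if it trails the baseline by more than `margin`.
    """
    if queries_done < checkpoint:
        return False
    return rerank_ndcg < baseline_ndcg - margin

# Numbers from the ArguAna run: running nDCG 0.283 with rerank vs 0.431 dense alone.
drop_on_arguana = should_drop_rerank(0.283, 0.431, queries_done=50)
print(drop_on_arguana)
```

A margin above zero avoids dropping the rerank over differences that are within noise on a 50-query sample.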

BGE-large NFCorpus = no lift. Xenova/bge-large-en-v1.5 (335M, int8 quantized) scores 0.350 vs 0.352 for our BGE-base, and falls below the published BAAI BGE-large baseline (0.380). The likely cause is that Xenova's int8 quantization underperforms BAAI's unquantized fp32 weights.

BGE query prefix is mixed (ADR-090). BAAI's recommended query prefix, "Represent this sentence for searching relevant passages:", gives NFCorpus +0.009 ✓, SciFact -0.007 ✗, ArguAna +0.003 (within noise). Opt-in only via BGE_QUERY_PREFIX=1. Not a default.
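
The opt-in behaviour can be sketched as follows. Only the BGE_QUERY_PREFIX variable and the prefix text come from the release; the `build_query` helper, the trailing space after the colon, and the rule that only the exact value "1" enables the prefix are assumptions.

```python
import os

# Prefix text as quoted in the release; the trailing space is an assumption.
BGE_PREFIX = "Represent this sentence for searching relevant passages: "

def build_query(text, env=None):
    """Prepend the BGE query prefix only when BGE_QUERY_PREFIX=1 is set."""
    env = os.environ if env is None else env
    if env.get("BGE_QUERY_PREFIX") == "1":
        return BGE_PREFIX + text
    return text  # default: query is embedded unchanged

plain = build_query("does vitamin D help?", env={})
prefixed = build_query("does vitamin D help?", env={"BGE_QUERY_PREFIX": "1"})
print(plain)
print(prefixed)
```

Only queries get the prefix; passages are embedded as-is, which matches BAAI's guidance for the v1.5 English models.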

ruvector@0.2.27 Tier-0 wiring (closes ADR-086 at source)

neural-tools embedder cascade:

  • Tier 0 (NEW): ruvector@0.2.27.embed(), bundled, no sharp dep, disk-cache hit
  • Tier 1: agentic-flow/reasoningbank (broken on darwin-arm64 without sharp)
  • Tier 2-3: @claude-flow/embeddings
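
A tiered cascade like the one listed above can be sketched as "try each embedder in order and report which one answered". This is a generic illustration, not the neural-tools implementation: the function, the tier names, and the stub embedders are hypothetical. Returning the tier name alongside the vector is one way to avoid a silent fallback.

```python
def embed_with_cascade(text, tiers):
    """Try each (name, embed_fn) in order; return (vector, name) from the first that works.

    Raises instead of silently returning a placeholder when every tier fails.
    """
    errors = []
    for name, embed_fn in tiers:
        try:
            vector = embed_fn(text)
        except Exception as exc:  # record the failure and fall through to the next tier
            errors.append(f"{name}: {exc}")
            continue
        return vector, name
    raise RuntimeError("all embedder tiers failed: " + "; ".join(errors))

# Hypothetical stand-ins for real embedders.
def broken_tier(_text):
    raise ImportError("sharp not installed")

def working_tier(_text):
    return [0.0] * 384  # placeholder 384-dim vector

vector, served_by = embed_with_cascade(
    "hello", [("tier-1", broken_tier), ("tier-2", working_tier)]
)
print(served_by, len(vector))
```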

Verified active: probe returns embedder: ruvector@0.2.27 (bundled all-MiniLM-L6-v2), _realEmbedding: true, dim 384, disk-cache hit. Measured 6.2× per-doc parallel-embed speedup (claimed 10-14×; our run had CPU contention from the BEIR benchmarks).

Both upstream issues filed yesterday were fixed in under 24 hours.

#2246 user bug fixes

| Finding | Status |
| --- | --- |
| #1 memory_search_unified hardcoded 6 namespaces (missed 95% of an 8789-entry store) | FIXED: new namespaces param + CLAUDE_FLOW_MEMORY_SEARCH_NAMESPACES env + dynamic enumeration default + namespaceSource audit field + 9 regression tests |
| #2 npm install -g overwrites dist patches silently | acknowledged, tracked for separate release |
| #3 agentdb addCausalEdge() silently orphans edges | forwarded → ruvnet/agentdb#7 |
| #4 graph_edges DB unavailable on fresh env | FIXED: getBridgeDb({createIfMissing: true}) lazy-creates empty memory.db + better error message |
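
The fix for finding #1 names three namespace sources plus an audit field. A minimal sketch of how such a resolution order could work is shown below. The precedence (param, then env, then dynamic enumeration), the comma-separated env format, and the audit values "param", "env", and "dynamic" are assumptions; only the environment variable name comes from the release.

```python
import os

ENV_VAR = "CLAUDE_FLOW_MEMORY_SEARCH_NAMESPACES"

def resolve_namespaces(param=None, env=None, enumerate_store=lambda: []):
    """Return (namespaces, source), where source records which input was used.

    Assumed precedence: explicit param > env var > enumerate what the store holds.
    """
    env = os.environ if env is None else env
    if param:
        return list(param), "param"
    raw = env.get(ENV_VAR, "")
    from_env = [ns.strip() for ns in raw.split(",") if ns.strip()]
    if from_env:
        return from_env, "env"
    # Default: search every namespace actually present instead of a hardcoded list.
    return list(enumerate_store()), "dynamic"

namespaces, source = resolve_namespaces(
    env={}, enumerate_store=lambda: ["patterns", "tasks", "notes"]
)
print(namespaces, source)
```

Enumerating the store by default is what prevents a hardcoded list from silently missing entries in namespaces it does not know about.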

Full triage reply on #2246.

Reproduce

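git clone https://github.com/ruvnet/ruflo && cd ruflo
npm install && ( cd v3/@claude-flow/cli && npx tsc )
RUFLO_DIR="$PWD"   # remember the repo root; the loop below changes directory

for ds in nfcorpus scifact arguana; do
  # Download and unpack each BEIR dataset into its own scratch directory
  mkdir -p "/tmp/beir-$ds" && cd "/tmp/beir-$ds"
  curl -sL -o "$ds.zip" "https://public.ukp.informatik.tu-darmstadt.de/thakur/BEIR/datasets/$ds.zip" && unzip -q "$ds.zip"
  # Run the hybrid pipeline (Lucene-style BM25 + rerank) against the unpacked data
  BEIR_DATA_DIR="/tmp/beir-$ds/$ds" USE_LUCENE_BM25=1 RERANK=1 \
    node "$RUFLO_DIR/v3/@claude-flow/cli/scripts/run-beir-hybrid.mjs"
done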

Honest limits

  • 3/18 BEIR datasets (NFCorpus, SciFact, ArguAna). The 0.491 mean is suggestive, not a BEIR average
  • Zero-shot: NFCorpus train (110k pairs) unused
  • CPU-bound: TREC-COVID/HotpotQA/NQ/DBPedia need GPU
  • Our Lucene BM25 matches published ±0.003 (re-implementation, not a Lucene binding)
  • CE rerank doesn't always help: pulled on ArguAna
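
Since the BM25 here is a re-implementation rather than a Lucene binding, it may help to see the per-term formula being reproduced. The sketch below follows the scoring in Lucene's BM25Similarity as of Lucene 8 (no (k1 + 1) factor in the numerator) with Lucene's defaults k1=1.2 and b=0.75. It is not ruflo's code, the parameter values ruflo uses are not stated in this release, and Lucene's lossy document-length encoding is omitted.

```python
import math

def bm25_term_score(tf, doc_len, avg_doc_len, n_docs, doc_freq, k1=1.2, b=0.75):
    """Score one query term against one document, Lucene-style.

    idf  = ln(1 + (N - df + 0.5) / (df + 0.5))
    norm = k1 * (1 - b + b * doc_len / avg_doc_len)
    """
    idf = math.log(1 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))
    norm = k1 * (1 - b + b * doc_len / avg_doc_len)
    return idf * tf / (tf + norm)

# Toy corpus statistics: 10 documents, the term occurs in 1 of them.
example = bm25_term_score(tf=1, doc_len=100, avg_doc_len=100, n_docs=10, doc_freq=1)
print(example)
```

A document's score for a query is the sum of this value over the query terms it contains.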

What's next (blocked on GPU)

  • Tailscale GPU access β€” gates the 5 remaining BEIR datasets and fine-tuning
  • BGE-base fine-tune on NFCorpus train (110k pairs, ~3 GPU-hours)
  • bge-reranker-v2-m3 (568M, 2.27GB) as heavyweight opt-in

Install

npx ruflo@3.10.29    # latest / alpha / v3alpha all aligned

Full ADRs: ADR-089, ADR-090