feat(bet1): productionize reuse-under-drift + validate on a real learned-GNN trajectory (ADR-202 WIN) by shaal · Pull Request #537 · ruvnet/RuVector

shaal · 2026-06-04T22:09:26Z

BET 1 productionize — fixed-topology reuse + periodic rebuild on a real learned-GNN trajectory

Closes ADR-200's named open frontier (next-step #4). Wires the reuse-under-drift policy into the production ruvector-diskann loop behind a feature flag, and validates it on a genuine learned-GNN embedding trajectory (contrastive link-prediction over ogbn-arxiv) — not the synthetic A(t) transforms of ADR-200.

Outcome ADR: ADR-202 — WIN. Gate was pre-registered and frozen before any contender run (docs/plans/bet1-productionize/PRE-REGISTRATION.md), per the prove-not-hype protocol.

Result (WIN at n=20k and n=50k)

| policy (n=20k) | recall@10 | cumulative rebuild cost | evals/q |
|---|---|---|---|
| B always-rebuild | 98.7% | 246s (30 builds) | 982 |
| A reuse-only | 94.0% | 0s | 1034 |
| P k=4 (shippable) | 98.7% | 59s (24%) | 983 |

Reuse transfers in-regime: pure ReweightOnly holds within 2% recall@10 of full rebuild up to a 40% top-10 churn ceiling — identical at n=20k and n=50k, at/beyond ADR-200's synthetic ~36%. At low churn reuse is occasionally above rebuild.
Periodic recovers the high-churn tail completely: Periodic{k:4} gap −0.01% (20k) / above rebuild (50k) at 20–24% of rebuild cost, equal per-query work. ADR-200's hybrid finding reproduced on real drift.
Teeth: the stale control collapses (92%→33% at 20k), proving the benchmark is genuinely drift-sensitive.
Honest caveat (reported, not hidden): pure reuse run past its ceiling on a deliberately overdriven trajectory decays (−4.73% recall@10 vs. full rebuild over the whole trajectory out to 67% churn, at 1.05× evals), which is exactly what the periodic policy is for. The shippable periodic policy carries neither penalty.

What's in the PR

crates/ruvector-diskann/src/reuse.rs (feature reuse-under-drift, default off → shipping build byte-identical): RebuildPolicy{AlwaysRebuild, ReweightOnly, Periodic{k}} + DriftingIndex. The index owns only the rebuild decision; the consumer (GNN) owns the drifting embeddings and passes snapshots to on_metric_update/search. Native reuse hook — greedy_search already takes vectors externally. 5 unit tests.
crates/ruvector-gnn/examples/diskann_real_trajectory.rs: the validation harness (trajectory generator + 4 contenders + gate eval). Embeddings live on the unit sphere so the contrastive metric and the diskann L2 ranking agree.
docs/adr/ADR-202-*.md + docs/plans/bet1-productionize/PRE-REGISTRATION.md.
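
To make the "index owns only the rebuild decision" split concrete, here is a minimal sketch of the cadence logic. It is not the code in reuse.rs: `RebuildCadence` and its fields are hypothetical names, the enum only mirrors the variants listed above, and the real `on_metric_update` presumably also takes the vector snapshot.

```rust
/// Hypothetical mirror of the policy variants named in this PR.
/// The real definitions live in crates/ruvector-diskann/src/reuse.rs and may differ.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum RebuildPolicy {
    AlwaysRebuild,
    ReweightOnly,
    Periodic { k: usize },
}

/// Tracks only *when* to rebuild; the caller keeps ownership of the vectors.
/// (Illustrative type, not part of the crate.)
pub struct RebuildCadence {
    policy: RebuildPolicy,
    updates_since_build: usize,
    pub rebuilds: usize,
}

impl RebuildCadence {
    pub fn new(policy: RebuildPolicy) -> Self {
        Self { policy, updates_since_build: 0, rebuilds: 0 }
    }

    /// Call once per metric update. Returns true when the graph topology
    /// should be rebuilt; false means "reuse the graph, recompute distances only".
    pub fn on_metric_update(&mut self) -> bool {
        self.updates_since_build += 1;
        let rebuild = match self.policy {
            RebuildPolicy::AlwaysRebuild => true,
            RebuildPolicy::ReweightOnly => false,
            // Assumption: rebuild on every k-th update, counted from the last build.
            RebuildPolicy::Periodic { k } => k > 0 && self.updates_since_build >= k,
        };
        if rebuild {
            self.updates_since_build = 0;
            self.rebuilds += 1;
        }
        rebuild
    }
}
```

Under this assumed cadence, `Periodic { k: 4 }` fires on the 4th, 8th, ... update and reuses the existing topology in between.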

Honesty notes

GraphMAE::train_step takes &self and does not update weights — it cannot produce drift. The trajectory is built from the repo's genuine learnable primitives (Optimizer + info_nce_loss + SGD on embeddings), documented up front.
The frozen gate's "within 2% over the early trajectory" clause is operationalized as the holding ceiling (max contiguous churn where reuse stays within 2%) — the regime-resolved statistic the gate named, not the trajectory-wide mean (which deliberately overdrives past the ceiling). Both statistics are reported.
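
The two statistics can be sketched as follows. This is an illustration, not the harness code: the `Step` struct is hypothetical, the steps are assumed to be in trajectory order, and "within 2%" is assumed to mean an absolute recall@10 gap of at most 0.02.

```rust
/// One trajectory step (hypothetical record; field names are illustrative).
#[derive(Clone, Copy)]
pub struct Step {
    pub churn: f64,          // top-10 churn vs. the initial metric, in [0, 1]
    pub recall_reuse: f64,   // recall@10 of the reuse-only contender
    pub recall_rebuild: f64, // recall@10 of the always-rebuild contender
}

/// Holding ceiling: the highest churn reached before reuse first falls more
/// than `tolerance` below rebuild. Steps after the first violation are ignored,
/// which is what makes the run contiguous.
pub fn holding_ceiling(steps: &[Step], tolerance: f64) -> Option<f64> {
    let mut ceiling: Option<f64> = None;
    for s in steps {
        if s.recall_rebuild - s.recall_reuse > tolerance {
            break;
        }
        ceiling = Some(ceiling.map_or(s.churn, |c| c.max(s.churn)));
    }
    ceiling
}

/// Trajectory-wide mean gap (rebuild minus reuse), the other reported statistic.
pub fn mean_gap(steps: &[Step]) -> Option<f64> {
    if steps.is_empty() {
        return None;
    }
    let total: f64 = steps.iter().map(|s| s.recall_rebuild - s.recall_reuse).sum();
    Some(total / steps.len() as f64)
}
```

On an overdriven trajectory the two diverge by design: the ceiling stops at the first violation, while the mean keeps accumulating the post-ceiling decay.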

Self-contained off main; depends only on ruvector-diskann + ruvector-gnn. Independent of #535.

Refs #534

Productionize BET 1 (ADR-200 WIN under synthetic drift) by wiring re-weight + periodic-rebuild into the ruvector-diskann loop behind a feature flag, validated on a REAL contrastive-link-prediction embedding trajectory on ogbn-arxiv (ADR-200 next-step #4). Gate frozen before any contender run (prove-not-hype): WIN = ReweightOnly within 2% recall@10 of AlwaysRebuild + Periodic{k} within 1% at <=50% cumulative rebuild cost; KILL = no transfer from synthetic to real drift. Minimum-drift precondition (>=15% top-10 churn) guards against a vacuous pass. Self-contained off main; independent of PR ruvnet#535. Outcome -> ADR-202. Linked: ruvnet#534
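
The frozen gate can be read as a small decision function. The sketch below is one interpretation under stated assumptions, not the harness's gate eval: `GateInputs`, `Verdict`, and `evaluate_gate` are hypothetical names, gaps are absolute recall@10 differences (rebuild minus contender), and the harness is assumed to have already reduced each contender to these four numbers.

```rust
#[derive(Debug, PartialEq)]
pub enum Verdict {
    /// Precondition failed: too little drift to say anything.
    Void,
    /// Both WIN clauses satisfied.
    Win,
    /// Drift was sufficient but at least one clause failed.
    NotMet,
}

/// Hypothetical summary of one run; all values are fractions in [0, 1].
pub struct GateInputs {
    pub max_churn: f64,           // peak top-10 churn along the trajectory
    pub reuse_gap: f64,           // AlwaysRebuild recall@10 minus ReweightOnly
    pub periodic_gap: f64,        // AlwaysRebuild recall@10 minus Periodic{k}
    pub periodic_cost_ratio: f64, // Periodic cumulative rebuild cost / AlwaysRebuild
}

pub fn evaluate_gate(g: &GateInputs) -> Verdict {
    // Minimum-drift precondition: below 15% churn a pass would be vacuous.
    if g.max_churn < 0.15 {
        return Verdict::Void;
    }
    let reuse_ok = g.reuse_gap <= 0.02;
    let periodic_ok = g.periodic_gap <= 0.01 && g.periodic_cost_ratio <= 0.50;
    if reuse_ok && periodic_ok {
        Verdict::Win
    } else {
        Verdict::NotMet
    }
}
```

Checking the precondition first means a no-drift run can never return `Win`, whatever the gaps look like.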

DriftingIndex wraps a VamanaGraph and owns only the rebuild decision (RebuildPolicy: AlwaysRebuild / ReweightOnly / Periodic{k}); the consumer owns the drifting vectors and passes snapshots to on_metric_update + search. Native reuse hook: greedy_search takes vectors externally, so adapt-to-drift recomputes only distances. Feature-gated (reuse-under-drift, default off) — default build byte-identical. 5 unit tests green (cadence + search). Refs ruvnet#534

examples/diskann_real_trajectory.rs: generates a REAL learned-GNN metric trajectory via contrastive link-prediction (InfoNCE over ogbn-arxiv citations, ruvector-gnn Optimizer + info_nce_loss, embeddings on the unit sphere so cosine==dot and L2 ranking agrees), then drives the diskann reuse policy (DriftingIndex) through all four contenders step-by-step.

Result (n=20k, gradual trajectory to 67% churn):
- WIN. Reuse holds within 2% recall@10 of full rebuild up to 40% top-10 churn (>= ADR-200's synthetic ~36% regime) -- transfer confirmed on real learned drift. Stale control collapses 92%->33% (teeth).
- Periodic recovers the high-churn tail: P k=4 = 98.7% (gap -0.01%) at 24% of rebuild cost, evals 1.00x B. ADR-200 hybrid reproduced on real drift.
- Honest caveat: pure reuse past the ceiling decays (-4.73% over the whole overdriven trajectory, 1.05x evals); the shippable periodic policy does not.

Refs ruvnet#534
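
Why the unit sphere makes the contrastive (cosine/dot) metric and the L2 ranking agree: for unit vectors, ‖a − b‖² = 2 − 2·(a·b), so ascending squared L2 is the same order as descending dot product. The helpers below are a standalone illustration with hypothetical names, not code from the harness.

```rust
/// Scale a vector to unit length (assumes a non-zero input).
pub fn normalize(v: &[f32]) -> Vec<f32> {
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    v.iter().map(|x| x / norm).collect()
}

pub fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

pub fn l2_sq(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Candidate indices sorted by ascending score.
pub fn rank_by<F: Fn(&[f32]) -> f32>(cands: &[Vec<f32>], score: F) -> Vec<usize> {
    let mut idx: Vec<usize> = (0..cands.len()).collect();
    idx.sort_by(|&i, &j| {
        score(cands[i].as_slice())
            .partial_cmp(&score(cands[j].as_slice()))
            .unwrap()
    });
    idx
}
```

Ranking by `l2_sq` and by negated `dot` gives the same order once every vector is normalized; without normalization the two can disagree.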

…ectory Outcome ADR for BET 1 productionization (closes ADR-200 next-step #4). Fixed-topology reuse + periodic rebuild, validated on a real contrastive link-prediction trajectory over ogbn-arxiv (not synthetic A(t)). WIN at n=20k AND n=50k: pure reuse holds within 2% recall@10 of full rebuild up to a 40% top-10 churn ceiling (identical at both scales, >= ADR-200's synthetic ~36%); Periodic{k:4} recovers the high-churn tail to within 0.01% (20k) / above rebuild (50k) at 20-24% of rebuild cost, equal per-query work. Stale control collapses (teeth). Honest caveat: pure reuse past the ceiling decays -- the shippable policy is periodic, not never. Refs ruvnet#534

…plumbing Pre-register (frozen before any run) the ADR-200 next-step #2 bet: does a sampled-recall rebuild trigger beat fixed Periodic{k} under VARIABLE-RATE drift, and beat the Frobenius monitor ADR-200 found wanting? Honest test = the (rebuilds, recall) Pareto frontier; WIN = trigger >=25% fewer rebuilds at matched recall with probe cost counted; KILL = no frontier dominance. Plumbing (allowed pre-freeze): DriftingIndex::force_rebuild + harness. Refs ruvnet#534

…t run was VOID) The first variable-rate run was VOID (0% churn): plain SGD at lr 0.002-0.03 on unit-normalized embeddings doesn't move them. Switched to Adam (real motion in bursts), n=20k for edge density, and ENFORCED the >=15% churn precondition (abort before rendering a verdict) so a no-drift trajectory can't masquerade as a result. Gate criteria unchanged. Result (n=20k, bursty trajectory, per-step Δchurn ~45 burst / ~2 calm, 89% end churn): WIN. Recall{floor=0.95} = 97.2% @ 7 rebuilds beats Periodic{k=2} (96.8% @ 12) on BOTH axes; probe cost ~1s vs ~73s rebuild time saved (trap passed); beats best Frobenius (97.3% @ 9) on rebuilds. Refs ruvnet#534
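
The "beats on BOTH axes" claim is a Pareto-dominance check on (rebuilds, recall) points. A small sketch using the numbers quoted above; `Point`, `dominates`, and `rebuild_reduction` are illustrative names, not harness code.

```rust
/// One policy's outcome on the (rebuilds, recall) plane.
#[derive(Clone, Copy, Debug)]
pub struct Point {
    pub recall: f64,   // mean recall@10 as a fraction
    pub rebuilds: u32, // number of rebuilds over the trajectory
}

/// `a` dominates `b` if it is no worse on both axes and strictly better on one.
pub fn dominates(a: Point, b: Point) -> bool {
    let no_worse = a.recall >= b.recall && a.rebuilds <= b.rebuilds;
    let better = a.recall > b.recall || a.rebuilds < b.rebuilds;
    no_worse && better
}

/// Fraction of rebuilds saved relative to a baseline policy.
pub fn rebuild_reduction(candidate: Point, baseline: Point) -> f64 {
    (baseline.rebuilds as f64 - candidate.rebuilds as f64) / baseline.rebuilds as f64
}
```

With Recall{floor=0.95} at (97.2%, 7) and Periodic{k=2} at (96.8%, 12), the trigger dominates and saves 5/12 ≈ 42% of rebuilds, clearing the >=25% bar. Against the best Frobenius point (97.3%, 9) it wins on rebuilds only, which matches the wording "beats best Frobenius on rebuilds".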

The sampled-recall trigger WON (ADR-200 next-step #2): under bursty drift it uses ~42% fewer rebuilds than fixed Periodic{k} at matched recall, beats the Frobenius monitor ADR-200 found wanting, and passes the probe-cost trap (~1s probe vs ~73s rebuild saved). Productionized as RecallTrigger in ruvector_diskann::reuse (DriftingIndex in ReweightOnly mode + a probe-driven force_rebuild); its knob 'floor' IS the recall SLA, unlike k/tau. 8 reuse tests (incl. holds-under-no-drift + fires-then-recovers). ADR-202 addendum records the result; pre-registration carries the WIN outcome pointer. Refs ruvnet#534
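
A sampled-recall trigger of this kind can be sketched as below. This is an assumption-laden illustration, not the `RecallTrigger` in ruvector_diskann::reuse: `RecallProbe` is a hypothetical name, and the caller is assumed to supply, for a small sample of queries, the reused graph's top-k ids alongside exact (brute-force) top-k ids.

```rust
/// Fraction of the exact top-k ids that the approximate result recovered.
pub fn recall_at_k(approx: &[usize], exact: &[usize]) -> f64 {
    if exact.is_empty() {
        return 1.0; // nothing to miss
    }
    let hits = exact.iter().filter(|id| approx.contains(*id)).count();
    hits as f64 / exact.len() as f64
}

/// Hypothetical probe: `floor` is the recall SLA the index must hold.
pub struct RecallProbe {
    pub floor: f64,
}

impl RecallProbe {
    /// `samples` pairs (approximate top-k, exact top-k) per probe query.
    /// Returns true when mean sampled recall has dropped below the floor,
    /// i.e. when the caller should force a rebuild.
    pub fn should_rebuild(&self, samples: &[(Vec<usize>, Vec<usize>)]) -> bool {
        if samples.is_empty() {
            return false; // no evidence of decay, keep reusing
        }
        let mean = samples
            .iter()
            .map(|(approx, exact)| recall_at_k(approx, exact))
            .sum::<f64>()
            / samples.len() as f64;
        mean < self.floor
    }
}
```

The design point from the commit message carries over: the knob is expressed directly in recall, so setting `floor` is setting the SLA, whereas `k` or a Frobenius threshold only control recall indirectly.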

…ctory Frozen-before-run generality check of ADR-202's 40% holding ceiling: does it generalize beyond contrastive link-prediction to a DIFFERENT learned objective? Adds a node-classification trajectory (real arxiv 40-class labels, CE on a linear head, embeddings as params) selectable via an 'objective=nodeclass' arg to the existing harness — same contenders + 2% gate, only the objective changes. CONFIRM = holding ceiling >=30% churn + periodic recovers; CAVEAT = <20% or materially different (reportable). Refs ruvnet#534

…y caveat Node-classification trajectory (2nd objective) holds reuse within 2% of rebuild up to a 54% churn ceiling (>= link-pred's 40%) -> the ADR-202 holding-ceiling result GENERALIZES across two learned objectives; the objective-dependence caveat is resolved. Honest finding (reported, not buried): past ~60% churn node-class CE collapses embeddings into ~40 class blobs where recall@10 is ill-posed (intra-blob near-ties) and the FULL-REBUILD baseline itself destabilizes (B swings 55-96%). The trajectory-wide 'reuse > rebuild +4.3%' is a benchmark-degeneracy artifact (ADR-200's t=0.25 dip amplified), NOT a genuine superiority claim. Operational conclusion unaffected (reuse+periodic never worse). ADR-202 addendum + next-step #5 (collapse-aware metric). Refs ruvnet#534

shaal added 6 commits June 4, 2026 17:20

style(bet1): rustfmt the reuse module + trajectory harness

f18742c

docs(bet1): record WIN outcome pointer to ADR-202 in pre-registration

2bb2349

shaal mentioned this pull request Jun 4, 2026

SepRAG: CCH-inspired retrieval exploration + customizable re-weighting for self-learning ANN #534

Open

5 tasks

shaal added 5 commits June 4, 2026 18:57

This was referenced Jun 5, 2026

SepRAG BET 1 (finding): incremental reindex WINS the high-recall tier vs reuse + periodic rebuild #539

Open

SepRAG BET 4 (finding): IVF cluster-pruning is structurally redundant with tuned nprobe — NO-GO #540

Open

shaal wants to merge 11 commits into ruvnet:main from shaal:feat/seprag-bet1-reuse-under-drift
