feat(bet1): productionize reuse-under-drift + validate on a real learned-GNN trajectory (ADR-202 WIN) #537
Open
shaal wants to merge 11 commits into
Productionize BET 1 (ADR-200 WIN under synthetic drift) by wiring re-weight + periodic-rebuild into the ruvector-diskann loop behind a feature flag, validated on a REAL contrastive-link-prediction embedding trajectory on ogbn-arxiv (ADR-200 next-step ruvnet#4).
Gate frozen before any contender run (prove-not-hype): WIN = ReweightOnly within 2% recall@10 of AlwaysRebuild + Periodic{k} within 1% at <=50% cumulative rebuild cost; KILL = no transfer from synthetic to real drift. Minimum-drift precondition (>=15% top-10 churn) guards against a vacuous pass.
Self-contained off main; independent of PR ruvnet#535. Outcome -> ADR-202.
Linked: ruvnet#534
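The gate above can be read as a small decision function. The sketch below is illustrative only: `RunSummary`, `Verdict`, and `evaluate_gate` are hypothetical names, not part of the PR, and it assumes "within 2%" / "within 1%" mean absolute recall@10 points. KILL (no transfer from synthetic to real drift) is a separate judgment, so the sketch only distinguishes WIN from not-WIN and from a void run.

```rust
/// Hypothetical verdicts; `Void` means the drift precondition failed.
#[derive(Debug, PartialEq)]
pub enum Verdict {
    Void,
    Win,
    NoWin,
}

/// Hypothetical per-run summary; recalls and ratios are fractions in 0..=1.
pub struct RunSummary {
    pub top10_churn: f64,         // top-10 churn reached by the trajectory
    pub recall_rebuild: f64,      // AlwaysRebuild recall@10
    pub recall_reweight: f64,     // ReweightOnly recall@10
    pub recall_periodic: f64,     // Periodic{k} recall@10
    pub periodic_cost_ratio: f64, // Periodic rebuild cost / AlwaysRebuild rebuild cost
}

/// Applies the frozen thresholds in order: precondition first, then the gate.
pub fn evaluate_gate(r: &RunSummary) -> Verdict {
    // A trajectory that barely moves cannot prove anything.
    if r.top10_churn < 0.15 {
        return Verdict::Void;
    }
    // Assumption: gaps are absolute recall points, not relative percentages.
    let reuse_ok = r.recall_rebuild - r.recall_reweight <= 0.02;
    let periodic_ok = r.recall_rebuild - r.recall_periodic <= 0.01
        && r.periodic_cost_ratio <= 0.50;
    if reuse_ok && periodic_ok {
        Verdict::Win
    } else {
        Verdict::NoWin
    }
}
```

Checking the precondition before the thresholds is what keeps a no-drift trajectory from passing trivially.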
DriftingIndex wraps a VamanaGraph and owns only the rebuild decision
(RebuildPolicy: AlwaysRebuild / ReweightOnly / Periodic{k}); the consumer
owns the drifting vectors and passes snapshots to on_metric_update + search.
Native reuse hook: greedy_search takes vectors externally, so adapt-to-drift
recomputes only distances. Feature-gated (reuse-under-drift, default off) —
default build byte-identical. 5 unit tests green (cadence + search).
Refs ruvnet#534
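To make the "owns only the rebuild decision" split concrete, here is a minimal sketch of the cadence logic. It is not the code in reuse.rs: `RebuildCadence` is a hypothetical stand-in for the decision half of DriftingIndex, and the enum only mirrors the variant names given in this commit message. Vectors never enter the struct, which matches the consumer-owns-the-embeddings design described above.

```rust
/// Mirrors the policy variants named in the commit message (sketch only).
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum RebuildPolicy {
    AlwaysRebuild,
    ReweightOnly,
    Periodic { k: usize },
}

/// Hypothetical helper: tracks when the graph topology should be rebuilt.
pub struct RebuildCadence {
    policy: RebuildPolicy,
    updates_since_rebuild: usize,
    rebuilds: usize,
}

impl RebuildCadence {
    pub fn new(policy: RebuildPolicy) -> Self {
        Self { policy, updates_since_rebuild: 0, rebuilds: 0 }
    }

    /// Call once per metric update. Returns true when the caller should
    /// rebuild topology; false means reuse the graph and only recompute
    /// distances against the new vectors.
    pub fn on_metric_update(&mut self) -> bool {
        self.updates_since_rebuild += 1;
        let rebuild = match self.policy {
            RebuildPolicy::AlwaysRebuild => true,
            RebuildPolicy::ReweightOnly => false,
            // Assumption: k = 0 is treated as "never" rather than an error.
            RebuildPolicy::Periodic { k } => k > 0 && self.updates_since_rebuild >= k,
        };
        if rebuild {
            self.updates_since_rebuild = 0;
            self.rebuilds += 1;
        }
        rebuild
    }

    /// Total rebuilds requested so far.
    pub fn rebuilds(&self) -> usize {
        self.rebuilds
    }
}
```

With `Periodic { k: 4 }` this fires on every fourth update, so eight updates request two rebuilds where AlwaysRebuild would request eight.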
examples/diskann_real_trajectory.rs: generates a REAL learned-GNN metric trajectory via contrastive link-prediction (InfoNCE over ogbn-arxiv citations, ruvector-gnn Optimizer + info_nce_loss, embeddings on the unit sphere so cosine==dot and L2 ranking agrees), then drives the diskann reuse policy (DriftingIndex) through all four contenders step-by-step.
Result (n=20k, gradual trajectory to 67% churn):
- WIN. Reuse holds within 2% recall@10 of full rebuild up to 40% top-10 churn (>= ADR-200's synthetic ~36% regime) -- transfer confirmed on real learned drift. Stale control collapses 92%->33% (teeth).
- Periodic recovers the high-churn tail: P k=4 = 98.7% (gap -0.01%) at 24% of rebuild cost, evals 1.00x B. ADR-200 hybrid reproduced on real drift.
- Honest caveat: pure reuse past the ceiling decays (-4.73% over the whole overdriven trajectory, 1.05x evals); the shippable periodic policy does not.
Refs ruvnet#534
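The unit-sphere remark rests on a standard identity: for unit vectors, ‖a − b‖² = 2 − 2·(a·b), so ranking by ascending L2 distance and by descending dot product (equivalently cosine) gives the same order. The sketch below checks that by brute force and also shows one plausible way to compute recall@k and top-k churn as set overlaps. All function names are hypothetical, and the churn definition (1 minus the overlap of exact top-k sets before and after drift) is an assumption, not taken from the harness.

```rust
/// Squared Euclidean distance (lower is closer).
fn l2_sq(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Negated dot product, so that lower is closer, matching `l2_sq`.
fn neg_dot(a: &[f32], b: &[f32]) -> f32 {
    -a.iter().zip(b).map(|(x, y)| x * y).sum::<f32>()
}

/// Scales a non-zero vector to unit length.
fn normalize(v: &[f32]) -> Vec<f32> {
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    v.iter().map(|x| x / norm).collect()
}

/// Exact (brute-force) indices of the k closest rows under `score`.
fn top_k(
    data: &[Vec<f32>],
    query: &[f32],
    k: usize,
    score: fn(&[f32], &[f32]) -> f32,
) -> Vec<usize> {
    let mut idx: Vec<usize> = (0..data.len()).collect();
    // Assumes no NaN scores, so partial_cmp never returns None.
    idx.sort_by(|&a, &b| {
        score(&data[a], query)
            .partial_cmp(&score(&data[b], query))
            .unwrap()
    });
    idx.truncate(k);
    idx
}

/// Fraction of `a`'s entries that also appear in `b`.
/// recall@k = overlap(exact, found); churn = 1 - overlap(before, after).
fn overlap(a: &[usize], b: &[usize]) -> f32 {
    let shared = a.iter().filter(|i| b.contains(i)).count();
    shared as f32 / a.len() as f32
}
```

Calling `top_k` with `l2_sq` and with `neg_dot` on normalized inputs should return the same indices, barring exact ties, which is why a contrastive (cosine) objective and an L2-ranked index can share one ground truth.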
…ectory
Outcome ADR for BET 1 productionization (closes ADR-200 next-step ruvnet#4). Fixed-topology reuse + periodic rebuild, validated on a real contrastive-link-prediction trajectory over ogbn-arxiv (not synthetic A(t)).
WIN at n=20k AND n=50k: pure reuse holds within 2% recall@10 of full rebuild up to a 40% top-10 churn ceiling (identical at both scales, >= ADR-200's synthetic ~36%); Periodic{k:4} recovers the high-churn tail to within 0.01% (20k) / above rebuild (50k) at 20-24% of rebuild cost, equal per-query work. Stale control collapses (teeth).
Honest caveat: pure reuse past the ceiling decays -- the shippable policy is periodic, not never.
Refs ruvnet#534
…plumbing
Pre-register (frozen before any run) the ADR-200 next-step #2 bet: does a sampled-recall rebuild trigger beat fixed Periodic{k} under VARIABLE-RATE drift, and beat the Frobenius monitor ADR-200 found wanting?
Honest test = the (rebuilds, recall) Pareto frontier; WIN = trigger >=25% fewer rebuilds at matched recall with probe cost counted; KILL = no frontier dominance.
Plumbing (allowed pre-freeze): DriftingIndex::force_rebuild + harness.
Refs ruvnet#534
…t run was VOID)
The first variable-rate run was VOID (0% churn): plain SGD at lr 0.002-0.03
on unit-normalized embeddings doesn't move them. Switched to Adam (real
motion in bursts), n=20k for edge density, and ENFORCED the >=15% churn
precondition (abort before rendering a verdict) so a no-drift trajectory
can't masquerade as a result. Gate criteria unchanged.
Result (n=20k, bursty trajectory, per-step Δchurn ~45 burst / ~2 calm,
89% end churn): WIN. Recall{floor=0.95} = 97.2% @ 7 rebuilds beats
Periodic{k=2} (96.8% @ 12) on BOTH axes; probe cost ~1s vs ~73s rebuild
time saved (trap passed); beats best Frobenius (97.3% @ 9) on rebuilds.
Refs ruvnet#534
The sampled-recall trigger WON (ADR-200 next-step #2): under bursty drift it uses ~42% fewer rebuilds than fixed Periodic{k} at matched recall, beats the Frobenius monitor ADR-200 found wanting, and passes the probe-cost trap (~1s probe vs ~73s rebuild saved).
Productionized as RecallTrigger in ruvector_diskann::reuse (DriftingIndex in ReweightOnly mode + a probe-driven force_rebuild); its knob 'floor' IS the recall SLA, unlike k/tau. 8 reuse tests (incl. holds-under-no-drift + fires-then-recovers).
ADR-202 addendum records the result; pre-registration carries the WIN outcome pointer.
Refs ruvnet#534
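The trigger's control flow can be sketched in a few lines. This is an assumption-laden illustration, not the RecallTrigger shipped in ruvector_diskann::reuse: the type name `RecallFloorTrigger` and its methods are hypothetical, and the probe itself (measuring recall@10 on a small query sample against exact neighbors) is left to the caller and passed in as a number.

```rust
/// Hypothetical probe-driven trigger: rebuild only when sampled recall
/// falls below the floor, which doubles as the recall SLA.
pub struct RecallFloorTrigger {
    floor: f64,
    rebuilds: usize,
}

impl RecallFloorTrigger {
    pub fn new(floor: f64) -> Self {
        Self { floor, rebuilds: 0 }
    }

    /// `probe_recall` is the recall@10 measured on a sampled query set
    /// after the latest metric update. Returns true when the caller
    /// should force a rebuild; otherwise the existing topology is reused.
    pub fn should_rebuild(&mut self, probe_recall: f64) -> bool {
        let fire = probe_recall < self.floor;
        if fire {
            self.rebuilds += 1;
        }
        fire
    }

    /// Number of rebuilds requested so far.
    pub fn rebuilds(&self) -> usize {
        self.rebuilds
    }
}
```

Unlike a fixed period k, this fires only on the steps where drift actually hurts recall, so calm stretches cost a probe rather than a rebuild.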
…ctory
Frozen-before-run generality check of ADR-202's 40% holding ceiling: does it generalize beyond contrastive link-prediction to a DIFFERENT learned objective?
Adds a node-classification trajectory (real arxiv 40-class labels, CE on a linear head, embeddings as params) selectable via an 'objective=nodeclass' arg to the existing harness (same contenders + 2% gate, only the objective changes).
CONFIRM = holding ceiling >=30% churn + periodic recovers; CAVEAT = <20% or materially different (reportable).
Refs ruvnet#534
…y caveat
Node-classification trajectory (2nd objective) holds reuse within 2% of rebuild up to a 54% churn ceiling (>= link-pred's 40%) -> the ADR-202 holding-ceiling result GENERALIZES across two learned objectives; the objective-dependence caveat is resolved.
Honest finding (reported, not buried): past ~60% churn node-class CE collapses embeddings into ~40 class blobs where recall@10 is ill-posed (intra-blob near-ties) and the FULL-REBUILD baseline itself destabilizes (B swings 55-96%). The trajectory-wide 'reuse > rebuild +4.3%' is a benchmark-degeneracy artifact (ADR-200's t=0.25 dip amplified), NOT a genuine superiority claim.
Operational conclusion unaffected (reuse+periodic never worse). ADR-202 addendum + next-step ruvnet#5 (collapse-aware metric).
Refs ruvnet#534
BET 1 productionize: fixed-topology reuse + periodic rebuild on a real learned-GNN trajectory
Closes ADR-200's named open frontier (next-step #4). Wires the reuse-under-drift policy into the production ruvector-diskann loop behind a feature flag, and validates it on a genuine learned-GNN embedding trajectory (contrastive link-prediction over ogbn-arxiv), not the synthetic A(t) transforms of ADR-200.
Outcome ADR: ADR-202, WIN. Gate was pre-registered and frozen before any contender run (docs/plans/bet1-productionize/PRE-REGISTRATION.md), per the prove-not-hype protocol.
Result (WIN at n=20k and n=50k)
- ReweightOnly holds within 2% recall@10 of full rebuild up to a 40% top-10 churn ceiling, identical at n=20k and n=50k and at/beyond ADR-200's synthetic ~36%. At low churn reuse is occasionally above rebuild.
- Periodic{k:4}: gap −0.01% (20k) / above rebuild (50k) at 20–24% of rebuild cost, equal per-query work. ADR-200's hybrid finding reproduced on real drift.
What's in the PR
- crates/ruvector-diskann/src/reuse.rs (feature reuse-under-drift, default off → shipping build byte-identical): RebuildPolicy {AlwaysRebuild, ReweightOnly, Periodic{k}} + DriftingIndex. The index owns only the rebuild decision; the consumer (GNN) owns the drifting embeddings and passes snapshots to on_metric_update / search. Native reuse hook: greedy_search already takes vectors externally. 5 unit tests.
- crates/ruvector-gnn/examples/diskann_real_trajectory.rs: the validation harness (trajectory generator + 4 contenders + gate eval). Embeddings live on the unit sphere so the contrastive metric and the diskann L2 ranking agree.
- docs/adr/ADR-202-*.md + docs/plans/bet1-productionize/PRE-REGISTRATION.md.
Honesty notes
- GraphMAE::train_step takes &self and does not update weights, so it cannot produce drift. The trajectory is built from the repo's genuine learnable primitives (Optimizer + info_nce_loss + SGD on embeddings), documented up front.
Self-contained off main; depends only on ruvector-diskann + ruvector-gnn. Independent of #535.
Refs #534