v2.1.0 — SOTA Gap Implementations
v2.1.0 — State-of-the-Art Gap Implementations
13 new modules across 3 crates addressing gaps between RuVector and 2024-2026 research from Google, Meta, DeepSeek, and Microsoft. 8,577 lines of new code, 859 tests passing, zero regressions.
Highlights
Advanced Search & Retrieval (ruvector-core)
- Hybrid Search (RRF) — Sparse + dense vector fusion with Reciprocal Rank Fusion, SPLADE-compatible scoring. 20-49% retrieval improvement.
- Graph RAG — Knowledge graph + Leiden community detection + local/global/hybrid search. 30-60% improvement on complex multi-hop queries.
- DiskANN / Vamana — SSD-backed billion-scale ANN with alpha-RNG pruning and LRU page cache. <10ms latency.
- ColBERT Multi-Vector — Per-token late interaction retrieval with MaxSim, AvgSim, SumMax scoring.
- Matryoshka Embeddings — Adaptive-dimension search with funnel and cascade modes for speed with minimal recall loss.
- OPQ — Optimized Product Quantization with learned rotation matrix. 10-30% error reduction vs standard PQ.
- LSM Compaction — Log-Structured Merge-tree for write-heavy workloads with bloom filters.
Attention & Inference (ruvector-attention)
- FlashAttention-3 — IO-aware tiled attention reducing memory from O(N²) to O(N). Configurable block sizes, causal masking, dropout.
- Multi-Head Latent Attention (MLA) — DeepSeek-V2/V3 style KV-cache compression (~93% reduction).
- KV-Cache Compression — 3-4 bit asymmetric per-channel quantization (TurboQuant-inspired). H2O, Sliding Window, PyramidKV eviction. 6-8x memory reduction.
- Selective State Space Models (Mamba) — Linear-time sequence processing with selective scan and discretization.
- Speculative Decoding — Draft-verify pipeline with Medusa multi-head and tree attention for 2-3x generation speedup.
Graph Learning (ruvector-gnn)
- GraphMAE — Graph Masked Autoencoder with GAT encoder, SCE loss, degree-centrality masking, re-masking regularization.
Quality
- 859 Rust tests — 423 (core) + 210 (attention) + 226 (gnn), all passing
- Zero regressions from v2.0.6
- No unsafe code in any new module
- Security fixes: NaN-safe sort comparisons, quantization input validation
Published Packages
crates.io:
| Crate | Version |
|---|---|
| ruvector-core | 2.1.0 |
| ruvector-attention | 2.1.0 |
| ruvector-gnn | 2.1.0 |
| ruvector-attention-wasm | 2.1.0 |
| ruvector-gnn-wasm | 2.1.0 |
| ruvllm | 2.1.0 |
npm:
| Package | Version |
|---|---|
| ruvector | 0.2.19 |
| ruvector-wasm | 2.1.0 |
| ruvector-attention-wasm | 2.1.0 |
| ruvector-gnn-wasm | 2.1.0 |
| ruvector-attention-unified-wasm | 0.1.0 |
| @ruvector/ruvllm | 2.5.4 |
CLI Fix
- Fixed `npx ruvector create` and `benchmark` commands — `dimension` → `dimensions` field name mismatch (#307)
Documentation
- Updated root README with all new SOTA modules
- Updated npm README with v2.1 features and TurboQuant section
- Updated @ruvector/ruvllm README with TurboQuant KV-cache compression docs
- ADR-128: SOTA gap analysis and implementation documentation
Full Changelog: v2.0.6...v2.1.0
Training Pipeline (ADR-129)
Added complete GCloud training infrastructure for continuous model improvement:
- Release gate automation — 7 ship/no-ship criteria (G1-G7) with automated checker
- Dataset governance — Schema validation, dedup, contamination checks, quality scoring
- Nightly training — Incremental LoRA from pi.ruv.io brain learnings → validate → push to HF
- TurboQuant sidecar —
.turboquant.jsonper-layer KV-cache config profiles - Cloud Run Jobs — 4 GPU jobs (calibration, SFT, benchmark, nightly) + 2 schedulers
- Ablation matrix — 5-run isolation testing (baseline → imatrix → SFT → DPO → TQ)
Deploy: ./scripts/training/deploy_training.sh