Skip to content

v2.1.0 — SOTA Gap Implementations

Choose a tag to compare

@github-actions github-actions released this 27 Mar 21:19
· 2469 commits to main since this release

v2.1.0 — State-of-the-Art Gap Implementations

13 new modules across 3 crates addressing gaps between RuVector and 2024-2026 research from Google, Meta, DeepSeek, and Microsoft. 8,577 lines of new code, 859 tests passing, zero regressions.

Highlights

Advanced Search & Retrieval (ruvector-core)

  • Hybrid Search (RRF) — Sparse + dense vector fusion with Reciprocal Rank Fusion, SPLADE-compatible scoring. 20-49% retrieval improvement.
  • Graph RAG — Knowledge graph + Leiden community detection + local/global/hybrid search. 30-60% improvement on complex multi-hop queries.
  • DiskANN / Vamana — SSD-backed billion-scale ANN with alpha-RNG pruning and LRU page cache. <10ms latency.
  • ColBERT Multi-Vector — Per-token late interaction retrieval with MaxSim, AvgSim, SumMax scoring.
  • Matryoshka Embeddings — Adaptive-dimension search with funnel and cascade modes for speed with minimal recall loss.
  • OPQ — Optimized Product Quantization with learned rotation matrix. 10-30% error reduction vs standard PQ.
  • LSM Compaction — Log-Structured Merge-tree for write-heavy workloads with bloom filters.

Attention & Inference (ruvector-attention)

  • FlashAttention-3 — IO-aware tiled attention reducing memory from O(N²) to O(N). Configurable block sizes, causal masking, dropout.
  • Multi-Head Latent Attention (MLA) — DeepSeek-V2/V3 style KV-cache compression (~93% reduction).
  • KV-Cache Compression — 3-4 bit asymmetric per-channel quantization (TurboQuant-inspired). H2O, Sliding Window, PyramidKV eviction. 6-8x memory reduction.
  • Selective State Space Models (Mamba) — Linear-time sequence processing with selective scan and discretization.
  • Speculative Decoding — Draft-verify pipeline with Medusa multi-head and tree attention for 2-3x generation speedup.

Graph Learning (ruvector-gnn)

  • GraphMAE — Graph Masked Autoencoder with GAT encoder, SCE loss, degree-centrality masking, re-masking regularization.

Quality

  • 859 Rust tests — 423 (core) + 210 (attention) + 226 (gnn), all passing
  • Zero regressions from v2.0.6
  • No unsafe code in any new module
  • Security fixes: NaN-safe sort comparisons, quantization input validation

Published Packages

crates.io:

Crate Version
ruvector-core 2.1.0
ruvector-attention 2.1.0
ruvector-gnn 2.1.0
ruvector-attention-wasm 2.1.0
ruvector-gnn-wasm 2.1.0
ruvllm 2.1.0

npm:

Package Version
ruvector 0.2.19
ruvector-wasm 2.1.0
ruvector-attention-wasm 2.1.0
ruvector-gnn-wasm 2.1.0
ruvector-attention-unified-wasm 0.1.0
@ruvector/ruvllm 2.5.4

CLI Fix

  • Fixed `npx ruvector create` and `benchmark` commands — `dimension` → `dimensions` field name mismatch (#307)

Documentation

  • Updated root README with all new SOTA modules
  • Updated npm README with v2.1 features and TurboQuant section
  • Updated @ruvector/ruvllm README with TurboQuant KV-cache compression docs
  • ADR-128: SOTA gap analysis and implementation documentation

Full Changelog: v2.0.6...v2.1.0


Training Pipeline (ADR-129)

Added complete GCloud training infrastructure for continuous model improvement:

  • Release gate automation — 7 ship/no-ship criteria (G1-G7) with automated checker
  • Dataset governance — Schema validation, dedup, contamination checks, quality scoring
  • Nightly training — Incremental LoRA from pi.ruv.io brain learnings → validate → push to HF
  • TurboQuant sidecar.turboquant.json per-layer KV-cache config profiles
  • Cloud Run Jobs — 4 GPU jobs (calibration, SFT, benchmark, nightly) + 2 schedulers
  • Ablation matrix — 5-run isolation testing (baseline → imatrix → SFT → DPO → TQ)

Deploy: ./scripts/training/deploy_training.sh