You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Missing ANN regime: 2–4-bit scalar-quantized FastScan search index
Summary
ruvector has strong ANN coverage but is missing the 2–4-bit scalar-quantized search regime that every mainstream production vector DB lives in (FAISS IndexPQFastScan, Qdrant, Milvus IVF-SQ). This issue makes the case for adding it as a small, self-contained crate (ruvector-turbovec, proposed in ADR-194). A working scalar-reference milestone (M1) is implemented and validated — see the companion PR #521.
The gap
Crate
Index family
Quantization
Suitable for "search 10M vectors for nearest 10"?
ruvector-core
HNSW (graph)
none (f32)
yes, but full f32 memory
ruvector-diskann
DiskANN (graph)
none / PQ-ish
yes
ruvector-rairs
IVF (ADR-193)
optional
yes
ruvector-rabitq
Flat + rotation
1-bit only
yes, but 1-bit caps recall
ruvllmturbo_quant.rs
— (value codec)
2.5–4 bit
no — KV-cache compressor, not an index
Two things sit near this regime but neither closes it:
ruvector-rabitq is 1-bit. It's excellent when memory dominates, but on 1536–3072-dim production embeddings (OpenAI text-embedding-3, Cohere, Voyage) you need an exact-rerank pass over the raw f32 originals to hit production R@1 — which re-inflates memory back toward the f32 footprint, partly defeating the point.
ruvllm/src/quantize/turbo_quant.rs is a TurboQuant value codec for transformer KV-cache / embedding compression. It has no inverted lists, no top-k heap, no FastScan LUT kernel, no filtered/allowlist search, no stable IDs/deletion. Wrong abstraction for search.
What's missing, concretely
2–4 bits per dimension (not 1, not 32).
Codebook-free scalar quantization — no k-means training, online ingest.
A FastScan-style nibble-LUT SIMD kernel that scores 16/32 candidates per vector instruction without ever materializing f32.
Competitive recall without a mandatory f32 rerank pass.
Why it matters
Memory/recall trade-off that's actually new. The per-vector length-renormalization scalar (c_x) makes the inner-product estimator unbiased, so 2/4-bit recall holds without the rerank tax that 1-bit RaBitQ pays. The memory win is real, not borrowed-back at query time.
Parity with the field. It puts ruvector in the same regime as FAISS IndexPQFastScan / Milvus IVF-SQ, which is what most teams reach for when they outgrow flat f32 but can't afford 1-bit's recall hit.
Reuse, not sprawl. It implements the existingruvector_rabitq::AnnIndex trait and reuses RandomRotation — so it drops into the same dispatcher/consumers with no new plumbing, and adds exactly one workspace crate.
The scalar M1 scorer is deliberately the determinism oracle the SIMD kernels must match bit-for-bit, so the perf work lands without risking correctness.
Ask
Adopt ADR-194 and land M1 (#521) as the foundation, then iterate on the SIMD milestones. Feedback welcome on the reuse boundary (rabitq trait/rotation), the ADR-193 IVF composition, and whether the Lloyd–Max math should be hoisted into a shared ruvector-math crate rather than living in the codec.
Missing ANN regime: 2–4-bit scalar-quantized FastScan search index
Summary
ruvector has strong ANN coverage but is missing the 2–4-bit scalar-quantized search regime that every mainstream production vector DB lives in (FAISS
IndexPQFastScan, Qdrant, Milvus IVF-SQ). This issue makes the case for adding it as a small, self-contained crate (ruvector-turbovec, proposed in ADR-194). A working scalar-reference milestone (M1) is implemented and validated — see the companion PR #521.The gap
ruvector-coreruvector-diskannruvector-rairsruvector-rabitqruvllmturbo_quant.rsTwo things sit near this regime but neither closes it:
ruvector-rabitqis 1-bit. It's excellent when memory dominates, but on 1536–3072-dim production embeddings (OpenAItext-embedding-3, Cohere, Voyage) you need an exact-rerank pass over the raw f32 originals to hit production R@1 — which re-inflates memory back toward the f32 footprint, partly defeating the point.ruvllm/src/quantize/turbo_quant.rsis a TurboQuant value codec for transformer KV-cache / embedding compression. It has no inverted lists, no top-k heap, no FastScan LUT kernel, no filtered/allowlist search, no stable IDs/deletion. Wrong abstraction for search.What's missing, concretely
Why it matters
c_x) makes the inner-product estimator unbiased, so 2/4-bit recall holds without the rerank tax that 1-bit RaBitQ pays. The memory win is real, not borrowed-back at query time.IndexPQFastScan/ Milvus IVF-SQ, which is what most teams reach for when they outgrow flat f32 but can't afford 1-bit's recall hit.ruvector_rabitq::AnnIndextrait and reusesRandomRotation— so it drops into the same dispatcher/consumers with no new plumbing, and adds exactly one workspace crate.ruvector-rairs(ADR-193). It's also adjacent to the in-flight SymphonyQG research (research(nightly): SymphonyQG — graph-coupled 4-bit FastScan neighbor scoring #443, research(nightly): symphonyqg — co-designed 1-bit graph quantization (SIGMOD 2025) #428: graph-coupled 4-bit FastScan scoring), and complements the merged TurboQuant KV-cache work (feat(ruvllm): TurboQuant KV cache & vector compression #297) by covering the search-index side those didn't.Evidence it works (M1, measured)
n=5,000uniform-random vectors (worst case for ANN),dim=256,k=10, no f32 rerank, vs exact brute-force L2:Recall rises monotonically with bit-width; near-zero bias confirms the unbiased estimator. 16 unit + 1 doc-test pass; clippy clean. Reproduce with
cargo run --release -p ruvector-turbovec.Proposed path
Tracked by ADR-194. Incremental milestones:
IdMapIndex(O(1) delete, filtered search). ✅ implemented & validated..tvpersistence.ruvector-rulakedispatcher registration.The scalar M1 scorer is deliberately the determinism oracle the SIMD kernels must match bit-for-bit, so the perf work lands without risking correctness.
Ask
Adopt ADR-194 and land M1 (#521) as the foundation, then iterate on the SIMD milestones. Feedback welcome on the reuse boundary (rabitq trait/rotation), the ADR-193 IVF composition, and whether the Lloyd–Max math should be hoisted into a shared
ruvector-mathcrate rather than living in the codec.