Skip to content

Missing ANN regime: 2–4-bit scalar-quantized FastScan search index #520

@shaal

Description

@shaal

Missing ANN regime: 2–4-bit scalar-quantized FastScan search index

Summary

ruvector has strong ANN coverage but is missing the 2–4-bit scalar-quantized search regime that every mainstream production vector DB lives in (FAISS IndexPQFastScan, Qdrant, Milvus IVF-SQ). This issue makes the case for adding it as a small, self-contained crate (ruvector-turbovec, proposed in ADR-194). A working scalar-reference milestone (M1) is implemented and validated — see the companion PR #521.

The gap

Crate Index family Quantization Suitable for "search 10M vectors for nearest 10"?
ruvector-core HNSW (graph) none (f32) yes, but full f32 memory
ruvector-diskann DiskANN (graph) none / PQ-ish yes
ruvector-rairs IVF (ADR-193) optional yes
ruvector-rabitq Flat + rotation 1-bit only yes, but 1-bit caps recall
ruvllm turbo_quant.rs — (value codec) 2.5–4 bit no — KV-cache compressor, not an index

Two things sit near this regime but neither closes it:

  1. ruvector-rabitq is 1-bit. It's excellent when memory dominates, but on 1536–3072-dim production embeddings (OpenAI text-embedding-3, Cohere, Voyage) you need an exact-rerank pass over the raw f32 originals to hit production R@1 — which re-inflates memory back toward the f32 footprint, partly defeating the point.
  2. ruvllm/src/quantize/turbo_quant.rs is a TurboQuant value codec for transformer KV-cache / embedding compression. It has no inverted lists, no top-k heap, no FastScan LUT kernel, no filtered/allowlist search, no stable IDs/deletion. Wrong abstraction for search.

What's missing, concretely

  • 2–4 bits per dimension (not 1, not 32).
  • Codebook-free scalar quantization — no k-means training, online ingest.
  • A FastScan-style nibble-LUT SIMD kernel that scores 16/32 candidates per vector instruction without ever materializing f32.
  • Competitive recall without a mandatory f32 rerank pass.

Why it matters

  • Memory/recall trade-off that's actually new. The per-vector length-renormalization scalar (c_x) makes the inner-product estimator unbiased, so 2/4-bit recall holds without the rerank tax that 1-bit RaBitQ pays. The memory win is real, not borrowed-back at query time.
  • Parity with the field. It puts ruvector in the same regime as FAISS IndexPQFastScan / Milvus IVF-SQ, which is what most teams reach for when they outgrow flat f32 but can't afford 1-bit's recall hit.
  • Reuse, not sprawl. It implements the existing ruvector_rabitq::AnnIndex trait and reuses RandomRotation — so it drops into the same dispatcher/consumers with no new plumbing, and adds exactly one workspace crate.
  • Roadmap fit. The same block-SoA codes can later back an IVF posting list (IVF-SQ-FastScan) — a natural composition with ruvector-rairs (ADR-193). It's also adjacent to the in-flight SymphonyQG research (research(nightly): SymphonyQG — graph-coupled 4-bit FastScan neighbor scoring #443, research(nightly): symphonyqg — co-designed 1-bit graph quantization (SIGMOD 2025) #428: graph-coupled 4-bit FastScan scoring), and complements the merged TurboQuant KV-cache work (feat(ruvllm): TurboQuant KV cache & vector compression #297) by covering the search-index side those didn't.

Evidence it works (M1, measured)

n=5,000 uniform-random vectors (worst case for ANN), dim=256, k=10, no f32 rerank, vs exact brute-force L2:

Width recall@10 compression mean cosine bias
1-bit 0.308 25.6× +0.0005
2-bit 0.561 14.2× +0.0001
4-bit 0.879 7.5× −0.0000

Recall rises monotonically with bit-width; near-zero bias confirms the unbiased estimator. 16 unit + 1 doc-test pass; clippy clean. Reproduce with cargo run --release -p ruvector-turbovec.

Proposed path

Tracked by ADR-194. Incremental milestones:

  1. M1 — scalar reference (this PR): rotation reuse + Lloyd–Max 2/4-bit SQ + TQ+ calibration + unbiased scoring + IdMapIndex (O(1) delete, filtered search). ✅ implemented & validated.
  2. M2 — FastScan nibble-LUT SIMD kernel (AVX2 + NEON), fuzzed bit-identical to the scalar oracle.
  3. M3.tv persistence.
  4. M4 — AVX-512BW kernel + ruvector-rulake dispatcher registration.

The scalar M1 scorer is deliberately the determinism oracle the SIMD kernels must match bit-for-bit, so the perf work lands without risking correctness.

Ask

Adopt ADR-194 and land M1 (#521) as the foundation, then iterate on the SIMD milestones. Feedback welcome on the reuse boundary (rabitq trait/rotation), the ADR-193 IVF composition, and whether the Lloyd–Max math should be hoisted into a shared ruvector-math crate rather than living in the codec.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions