Missing ANN regime: 2–4-bit scalar-quantized FastScan search index

# Missing ANN regime: 2–4-bit scalar-quantized FastScan search index

## Summary

ruvector has strong ANN coverage but is **missing the 2–4-bit scalar-quantized search regime** that every mainstream production vector DB lives in (FAISS `IndexPQFastScan`, Qdrant, Milvus IVF-SQ). This issue makes the case for adding it as a small, self-contained crate (`ruvector-turbovec`, proposed in ADR-194). A working scalar-reference milestone (M1) is implemented and validated — see the companion PR **#521**.

## The gap

| Crate | Index family | Quantization | Suitable for "search 10M vectors for nearest 10"? |
|-------|--------------|--------------|------|
| `ruvector-core` | HNSW (graph) | none (f32) | yes, but full f32 memory |
| `ruvector-diskann` | DiskANN (graph) | none / PQ-ish | yes |
| `ruvector-rairs` | IVF (ADR-193) | optional | yes |
| `ruvector-rabitq` | Flat + rotation | **1-bit only** | yes, but 1-bit caps recall |
| `ruvllm` `turbo_quant.rs` | — (value codec) | 2.5–4 bit | **no** — KV-cache compressor, not an index |

Two things sit *near* this regime but neither closes it:

1. **`ruvector-rabitq`** is 1-bit. It's excellent when memory dominates, but on 1536–3072-dim production embeddings (OpenAI `text-embedding-3`, Cohere, Voyage) you need an exact-rerank pass over the raw f32 originals to hit production R@1 — which **re-inflates memory back toward the f32 footprint**, partly defeating the point.
2. **`ruvllm/src/quantize/turbo_quant.rs`** is a TurboQuant *value codec* for transformer KV-cache / embedding compression. It has no inverted lists, no top-k heap, no FastScan LUT kernel, no filtered/allowlist search, no stable IDs/deletion. **Wrong abstraction for search.**

## What's missing, concretely

- **2–4 bits per dimension** (not 1, not 32).
- **Codebook-free** scalar quantization — no k-means training, online ingest.
- A **FastScan-style nibble-LUT SIMD kernel** that scores 16/32 candidates per vector instruction without ever materializing f32.
- Competitive recall **without** a mandatory f32 rerank pass.

## Why it matters

- **Memory/recall trade-off that's actually new.** The per-vector length-renormalization scalar (`c_x`) makes the inner-product estimator *unbiased*, so 2/4-bit recall holds without the rerank tax that 1-bit RaBitQ pays. The memory win is real, not borrowed-back at query time.
- **Parity with the field.** It puts ruvector in the same regime as FAISS `IndexPQFastScan` / Milvus IVF-SQ, which is what most teams reach for when they outgrow flat f32 but can't afford 1-bit's recall hit.
- **Reuse, not sprawl.** It implements the *existing* `ruvector_rabitq::AnnIndex` trait and reuses `RandomRotation` — so it drops into the same dispatcher/consumers with no new plumbing, and adds exactly one workspace crate.
- **Roadmap fit.** The same block-SoA codes can later back an IVF posting list (IVF-SQ-FastScan) — a natural composition with `ruvector-rairs` (ADR-193). It's also adjacent to the in-flight SymphonyQG research (#443, #428: graph-coupled 4-bit FastScan scoring), and complements the merged TurboQuant KV-cache work (#297) by covering the *search-index* side those didn't.

## Evidence it works (M1, measured)

`n=5,000` uniform-random vectors (worst case for ANN), `dim=256`, `k=10`, no f32 rerank, vs exact brute-force L2:

| Width | recall@10 | compression | mean cosine bias |
|-------|-----------|-------------|------------------|
| 1-bit | 0.308 | 25.6× | +0.0005 |
| 2-bit | 0.561 | 14.2× | +0.0001 |
| **4-bit** | **0.879** | **7.5×** | **−0.0000** |

Recall rises monotonically with bit-width; near-zero bias confirms the unbiased estimator. 16 unit + 1 doc-test pass; clippy clean. Reproduce with `cargo run --release -p ruvector-turbovec`.

## Proposed path

Tracked by ADR-194. Incremental milestones:

1. **M1 — scalar reference** (this PR): rotation reuse + Lloyd–Max 2/4-bit SQ + TQ+ calibration + unbiased scoring + `IdMapIndex` (O(1) delete, filtered search). ✅ implemented & validated.
2. **M2** — FastScan nibble-LUT SIMD kernel (AVX2 + NEON), fuzzed bit-identical to the scalar oracle.
3. **M3** — `.tv` persistence.
4. **M4** — AVX-512BW kernel + `ruvector-rulake` dispatcher registration.

The scalar M1 scorer is deliberately the determinism oracle the SIMD kernels must match bit-for-bit, so the perf work lands without risking correctness.

## Ask

Adopt ADR-194 and land M1 (#521) as the foundation, then iterate on the SIMD milestones. Feedback welcome on the reuse boundary (rabitq trait/rotation), the ADR-193 IVF composition, and whether the Lloyd–Max math should be hoisted into a shared `ruvector-math` crate rather than living in the codec.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Missing ANN regime: 2–4-bit scalar-quantized FastScan search index #520

Missing ANN regime: 2–4-bit scalar-quantized FastScan search index

Summary

The gap

What's missing, concretely

Why it matters

Evidence it works (M1, measured)

Proposed path

Ask

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Crate	Index family	Quantization	Suitable for "search 10M vectors for nearest 10"?
`ruvector-core`	HNSW (graph)	none (f32)	yes, but full f32 memory
`ruvector-diskann`	DiskANN (graph)	none / PQ-ish	yes
`ruvector-rairs`	IVF (ADR-193)	optional	yes
`ruvector-rabitq`	Flat + rotation	1-bit only	yes, but 1-bit caps recall
`ruvllm` `turbo_quant.rs`	— (value codec)	2.5–4 bit	no — KV-cache compressor, not an index

Width	recall@10	compression	mean cosine bias
1-bit	0.308	25.6×	+0.0005
2-bit	0.561	14.2×	+0.0001
4-bit	0.879	7.5×	−0.0000

Missing ANN regime: 2–4-bit scalar-quantized FastScan search index #520

Description

Missing ANN regime: 2–4-bit scalar-quantized FastScan search index

Summary

The gap

What's missing, concretely

Why it matters

Evidence it works (M1, measured)

Proposed path

Ask

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions