perf(vector/index): Round-3 roadmap — Qdrant/LanceDB parity #535

Round 3 of the laurus perf push for the vector indexing pipeline. Recent rounds shipped HNSW visited-set bitmap (#406), cached `vector_ids` per field (#405), SQ + two-stage rerank (#481), PQ POC + full (#481 stage 3), QuantizedVectorPool optimisations (#522/#526). The remaining gap to Qdrant / Lucene 9 KnnVectorsFormat / LanceDB is in **the in-memory and on-disk data structures**: `Vec<Vec<Vec<u64>>>` HNSW graph (triple pointer chase), AoS f32 vector storage, hand-rolled serial k-means, full graph rebuild on any deletion, `AHashSet<u64>` deletion bitmap, per-vector inline `field_name` strings.
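To make the "triple pointer chase" concrete: with `Vec<Vec<Vec<u64>>>`, reading one node's neighbours goes through three separate heap buffers (node list, layer list, neighbour list). A CSR layout stores a layer as an offsets array plus one contiguous neighbour array, so the same lookup becomes two index reads and a slice. The sketch below is a minimal, hypothetical `CsrLayer` with u32 internal IDs; it is not laurus's API and only covers the read-only, post-build shape.

```rust
/// One HNSW layer in CSR form (hypothetical type, not laurus's API).
/// The neighbours of node `i` live in
/// `neighbours[offsets[i] as usize..offsets[i + 1] as usize]`.
pub struct CsrLayer {
    offsets: Vec<u32>,    // len = num_nodes + 1, monotonically non-decreasing
    neighbours: Vec<u32>, // all adjacency lists concatenated, u32 internal IDs
}

impl CsrLayer {
    /// Packs per-node adjacency lists (the `Vec<Vec<_>>` shape of one layer)
    /// into one offsets array plus one contiguous neighbour array.
    pub fn from_adjacency(adj: &[Vec<u32>]) -> Self {
        let total: usize = adj.iter().map(Vec::len).sum();
        let mut offsets = Vec::with_capacity(adj.len() + 1);
        let mut neighbours = Vec::with_capacity(total);
        offsets.push(0);
        for list in adj {
            neighbours.extend_from_slice(list);
            offsets.push(neighbours.len() as u32);
        }
        CsrLayer { offsets, neighbours }
    }

    pub fn num_nodes(&self) -> usize {
        self.offsets.len() - 1
    }

    /// Neighbours of `node`: two offset reads and one slice, no nested Vec hops.
    pub fn neighbours(&self, node: u32) -> &[u32] {
        let n = node as usize;
        let start = self.offsets[n] as usize;
        let end = self.offsets[n + 1] as usize;
        &self.neighbours[start..end]
    }
}
```

Packing once at finalize time suits an immutable segment; incremental inserts would still need a mutable build-time structure that is converted to this form afterwards.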

## Scope

- HNSW graph: CSR (offsets + neighbours) layout with u32 internal IDs.
- Vector storage: SoA / columnar / mmap-friendly fixed-stride payload (new LVS2 segment format).
- HNSW build: lock-free arena, parallel level-generation, thread-local build buffers.
- IVF / PQ training: rayon-parallel k-means + sub-quantiser.
- HNSW deletions: tombstone-then-replace (no full rebuild).
- DeletionBitmap: Roaring with `arc_swap` (cross-cutting with U1 lexical).
- f16 / bf16 / int4 dense storage variants.
- DiskANN / Vamana-style out-of-core HNSW build for segments > RAM.
- Shared / per-schema PQ codebook (instead of per-segment training).
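The fixed-stride storage item can be sketched as one flat `Vec<f32>` in which vector `i` occupies `data[i * dim..(i + 1) * dim]`, the same shape a single mmap'ed `.vec`-style blob would have. `FlatVectors` and its methods are hypothetical illustrations, not the LVS2 format; a real segment would add a header, alignment padding and the field/ordinal mapping.

```rust
/// Fixed-stride flat vector payload (hypothetical, not the LVS2 format).
/// Vector `ord` occupies `data[ord * dim..(ord + 1) * dim]`, so the whole
/// segment payload is one contiguous buffer instead of one `Arc<Vec<f32>>` each.
pub struct FlatVectors {
    dim: usize,
    data: Vec<f32>,
}

impl FlatVectors {
    pub fn new(dim: usize) -> Self {
        assert!(dim > 0, "dimension must be non-zero");
        FlatVectors { dim, data: Vec::new() }
    }

    /// Appends one vector and returns its dense u32 ordinal.
    pub fn push(&mut self, v: &[f32]) -> u32 {
        assert_eq!(v.len(), self.dim, "dimension mismatch");
        let ord = self.len() as u32;
        self.data.extend_from_slice(v);
        ord
    }

    pub fn len(&self) -> usize {
        self.data.len() / self.dim
    }

    pub fn get(&self, ord: u32) -> &[f32] {
        let start = ord as usize * self.dim;
        &self.data[start..start + self.dim]
    }

    /// Squared L2 distance from `query` to every stored vector, walking the
    /// buffer sequentially in `dim`-sized chunks (cache- and prefetch-friendly).
    pub fn squared_l2_all(&self, query: &[f32]) -> Vec<f32> {
        assert_eq!(query.len(), self.dim, "dimension mismatch");
        self.data
            .chunks_exact(self.dim)
            .map(|v| v.iter().zip(query).map(|(a, b)| (a - b) * (a - b)).sum::<f32>())
            .collect()
    }
}
```

Because every vector has the same stride, an ordinal is enough to locate a vector, which is what makes an mmap-backed reader possible without per-record length prefixes.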

## Reference: Qdrant / LanceDB / Lucene99 parity

| Area | laurus today | Qdrant / LanceDB / Lucene99 |
|---|---|---|
| HNSW adjacency | `Vec<Vec<Vec<u64>>>` per node × layer × neighbour | CSR `int[]` arena (Lucene99HnswVectorsFormat), `GraphLayers` (Qdrant) |
| Vector storage | `Arc<Vec<f32>>` per vector + per-record `field_name` inline | fixed-stride `.vec` blob (Lucene99), `ChunkedVectorStorage` (Qdrant), Arrow `fixed_size_list` (LanceDB) |
| Concurrent build | `Vec<RwLock<Vec<u64>>>` | fixed-stride per-node arena + spinlock byte (hnswlib), `AtomicRefCell` (Qdrant) |
| IVF / PQ training | serial Lloyd + serial sub-quantiser | rayon-parallel (Qdrant), OpenMP (FAISS) |
| HNSW deletion | drop graph + full rebuild on next finalize | `mark_deleted` + `replace_deleted` (hnswlib) |
| Deletion bitmap | `RwLock<AHashSet<u64>>` | Roaring (Qdrant `IdTracker`, Lucene) |
| Storage variants | int8 + PQ8 only | int8/int4 (Lucene 9.10), f16/bf16 (Qdrant), int4 PQ |
| Out-of-core build | none (capped at RAM) | DiskANN / Vamana two-stage, LanceDB streaming IVF-PQ |
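The deletion-bitmap row is the simplest gap to illustrate. A dense bitset keyed by u32 ordinal, similar in spirit to the optional dense `BitVec` shard proposed for VI-07, answers `is_deleted` with a shift and a mask instead of hashing a u64 under a lock. `DenseDeletions` is a hypothetical, std-only stand-in: it does not implement Roaring compression or `arc_swap` publication, and its `mark_deleted` only borrows hnswlib's method name.

```rust
/// Dense deletion bitmap over u32 internal ordinals (hypothetical sketch).
/// One bit per vector; a production version would likely switch to a Roaring
/// bitmap for sparse deletes (assumption, see VI-07).
pub struct DenseDeletions {
    words: Vec<u64>,
    count: usize,
}

impl DenseDeletions {
    pub fn with_capacity(num_vectors: usize) -> Self {
        DenseDeletions { words: vec![0; num_vectors.div_ceil(64)], count: 0 }
    }

    /// Marks `ord` as deleted; returns true if it was live before the call.
    pub fn mark_deleted(&mut self, ord: u32) -> bool {
        let (w, bit) = ((ord / 64) as usize, ord % 64);
        if w >= self.words.len() {
            self.words.resize(w + 1, 0); // grow for ordinals past the initial capacity
        }
        let mask = 1u64 << bit;
        let was_live = (self.words[w] & mask) == 0;
        self.words[w] |= mask;
        if was_live {
            self.count += 1;
        }
        was_live
    }

    /// O(1) check: word index = ord / 64, bit index = ord % 64.
    pub fn is_deleted(&self, ord: u32) -> bool {
        let w = (ord / 64) as usize;
        self.words
            .get(w)
            .is_some_and(|word| (word & (1u64 << (ord % 64))) != 0)
    }

    pub fn deleted_count(&self) -> usize {
        self.count
    }
}
```

A dense bitset costs N/8 bytes regardless of how many vectors are deleted, which is why a compressed Roaring representation is the better default when deletes are sparse.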

## Exit criteria

- All sub-issues below closed or explicitly deferred.
- Round-3 benchmark suite (`make bench BENCH=vector_indexing`) shows HNSW build throughput within ±25 % of Qdrant on the SIFT 1M dataset.
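One reading of the ±25 % throughput criterion, assumed here rather than stated by the issue, is a symmetric relative band around the Qdrant number, |laurus / qdrant − 1| ≤ 0.25, with both throughputs in the same unit (for example vectors inserted per second on SIFT 1M). The helper is hypothetical and not part of the bench harness.

```rust
/// Hypothetical exit-criterion check: true when `laurus` is within
/// `tolerance` (e.g. 0.25) of `qdrant`, i.e. |laurus / qdrant - 1| <= tolerance.
/// Both values must be throughputs in the same unit.
pub fn within_tolerance(laurus: f64, qdrant: f64, tolerance: f64) -> bool {
    assert!(qdrant > 0.0, "baseline throughput must be positive");
    (laurus / qdrant - 1.0).abs() <= tolerance
}
```

For instance, 80 000 vs 100 000 vectors/s gives a ratio of 0.8 and passes, while 70 000 vs 100 000 gives 0.7 and fails.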

## Sub-issues

| ID | Issue | Size | Title |
|---|---|---|---|
| VI-01 | #619 | XL | perf(vector/index): switch HNSW adjacency to CSR (offsets + neighbours) packed layout |
| VI-02 | #620 | XL | perf(vector/index): move from AoS f32 buffers to columnar / SoA vector storage with mmap-friendly alignment |
| VI-03 | #621 | L | perf(vector/index): replace `Vec<RwLock<Vec<u64>>>` build-time graph with lock-free / sharded build |
| VI-04 | #622 | L | perf(vector/index): parallelise IVF k-means training (Lloyd iterations + k-means++ init) and SIMD-vectorise inner loops |
| VI-05 | #623 | L | perf(vector/index): add support for f16 / bf16 / int4 dense storage variants |
| VI-06 | #624 | M | perf(vector/index): tombstone-then-replace ("replace_deleted") for HNSW deletions instead of full graph rebuild |
| VI-07 | #625 | M | perf(vector/index): replace `DeletionBitmap`'s `AHashSet<u64>` with a Roaring bitmap (with optional dense `BitVec` shard) |
| VI-08 | #626 | M | perf(vector/index): replace `HashMap<(u64, String), u32>` field/doc->position index with field-id + dense ordinal lookup |
| VI-09 | #627 | M | perf(vector/index): stop persisting the HNSW graph as redundant manual u64s; pack as CSR + variable-byte deltas |
| VI-10 | #628 | M | perf(vector/index): transpose PQ codes to per-sub-vector columns for SIMD FastScan / PQ4 |
| VI-11 | #629 | M | perf(vector/index): IVF inverted lists: store as contiguous `Vec<u32>` per cluster with offsets, not `Vec<Vec<(u64, String, Vector)>>` |
| VI-12 | #630 | M | perf(vector/index): out-of-core HNSW build for segments > available RAM (DiskANN / Vamana inspired) |
| VI-13 | #631 | M | perf(vector/index): train-once / reuse PQ codebook across segments instead of per-segment training |
| VI-14 | #632 | M | perf(vector/index): build-time visited-set / candidate-heap reuse with thread-local arenas |
| VI-15 | #633 | M | perf(vector/index): tighten on-disk record by promoting `field_name` to a per-segment dictionary |
| VI-16 | #634 | M | perf(vector/index): commit pipeline: write payload incrementally instead of one giant `create_output` at flush |
| VI-17 | #635 | S | perf(vector/index): use `Vec<u32>` doc-id internal type throughout the index (instead of `u64`) |
| VI-18 | #636 | S | perf(vector/index): avoid `vectors.clone()` on the segment-merge read path |
| VI-19 | #637 | S | perf(vector/index): HNSW writer level-distribution generation is single-threaded |
| VI-20 | #638 | S | perf(vector/index): eagerly build per-segment search-time PQ LUT cache |
| VI-21 | #639 | S | perf(vector/index): drop `serde_json` segments.json — use a binary checksum-protected segment manifest |
| VI-22 | #640 | S | perf(vector/index): active-segment (`SegmentedVectorField`) brute-force search uses O(N) f32 scan with no SIMD batching |
| VI-23 | #641 | S | perf(vector/index): HNSW per-level fixed-stride neighbour packing during build (M0 / M known up front) |
| VI-24 | #642 | S | perf(vector/index): eliminate `field_index: HashMap<String, ...>` lookup hash on every NRT query |
| VI-25 | #643 | S | perf(vector/index): multi-vector / sparse vector layout reservation in segment header |
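VI-09 (CSR + variable-byte deltas) is the most self-contained item to sketch. Sorting a neighbour list makes every gap non-negative, and a LEB128-style varint then stores small gaps in one or two bytes rather than a fixed 8-byte u64. The functions below are hypothetical; they assume the on-disk order of neighbours within a list does not need to be preserved, which should be checked against how the search loop consumes neighbours.

```rust
/// Encodes one neighbour list as a varint length followed by varint deltas
/// of the sorted IDs (hypothetical on-disk layout for illustration).
pub fn encode_neighbours(neighbours: &[u32], out: &mut Vec<u8>) {
    let mut sorted = neighbours.to_vec();
    sorted.sort_unstable(); // non-negative gaps; original order is discarded
    write_varint(sorted.len() as u32, out);
    let mut prev = 0u32;
    for &n in &sorted {
        write_varint(n - prev, out);
        prev = n;
    }
}

/// LEB128-style: 7 payload bits per byte, high bit set on all but the last byte.
fn write_varint(mut v: u32, out: &mut Vec<u8>) {
    while v >= 0x80 {
        out.push(((v as u8) & 0x7f) | 0x80);
        v >>= 7;
    }
    out.push(v as u8);
}

/// Decodes one list written by `encode_neighbours`, advancing `pos` past it.
pub fn decode_neighbours(buf: &[u8], pos: &mut usize) -> Vec<u32> {
    let len = read_varint(buf, pos) as usize;
    let mut result = Vec::with_capacity(len);
    let mut prev = 0u32;
    for _ in 0..len {
        prev += read_varint(buf, pos); // undo the delta encoding
        result.push(prev);
    }
    result
}

/// Reads one varint; assumes well-formed input (at most 5 bytes for a u32).
fn read_varint(buf: &[u8], pos: &mut usize) -> u32 {
    let mut value = 0u32;
    let mut shift = 0u32;
    loop {
        let byte = buf[*pos];
        *pos += 1;
        value |= u32::from(byte & 0x7f) << shift;
        if byte & 0x80 == 0 {
            return value;
        }
        shift += 7;
    }
}
```

Since lists are self-delimiting, a CSR offsets array over the encoded bytes (rather than over neighbour counts) would still allow random access to any node's list.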

Round-3 investigation report: `~/.claude/tasks/laurus/20260523_perf_round3_audit/`.