
perf(vector/index): Round-3 roadmap — Qdrant/LanceDB parity #535

@mosuka


Round 3 of the laurus perf push for the vector indexing pipeline. Recent rounds shipped HNSW visited-set bitmap (#406), cached `vector_ids` per field (#405), SQ + two-stage rerank (#481), PQ POC + full (#481 stage 3), QuantizedVectorPool optimisations (#522/#526). The remaining gap to Qdrant / Lucene 9 KnnVectorsFormat / LanceDB is in the in-memory and on-disk data structures: `Vec<Vec<Vec>>` HNSW graph (triple pointer chase), AoS f32 vector storage, hand-rolled serial k-means, full graph rebuild on any deletion, `AHashSet` deletion bitmap, per-vector inline `field_name` strings.
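
To make the "triple pointer chase" concrete, here is a minimal sketch of a CSR layout for one graph layer. The names `CsrLayer`, `from_nested`, and `neighbours_of` are hypothetical and are not taken from the laurus codebase. The sketch assumes u32 internal IDs and a graph that is frozen before conversion.

```rust
/// One HNSW layer in CSR form (hypothetical type, for illustration only).
/// The neighbours of node `i` live in
/// `neighbours[offsets[i] as usize..offsets[i + 1] as usize]`.
pub struct CsrLayer {
    offsets: Vec<u32>,
    neighbours: Vec<u32>,
}

impl CsrLayer {
    /// Flatten a nested adjacency list (one inner Vec per node) into two flat arrays.
    pub fn from_nested(adj: &[Vec<u32>]) -> Self {
        let total: usize = adj.iter().map(Vec::len).sum();
        let mut offsets = Vec::with_capacity(adj.len() + 1);
        let mut neighbours = Vec::with_capacity(total);
        offsets.push(0);
        for list in adj {
            neighbours.extend_from_slice(list);
            // Assumes the total edge count fits in u32.
            offsets.push(neighbours.len() as u32);
        }
        Self { offsets, neighbours }
    }

    /// Neighbour slice of `node`: two offset reads, then one contiguous slice.
    pub fn neighbours_of(&self, node: u32) -> &[u32] {
        let start = self.offsets[node as usize] as usize;
        let end = self.offsets[node as usize + 1] as usize;
        &self.neighbours[start..end]
    }

    /// Number of nodes stored in this layer.
    pub fn node_count(&self) -> usize {
        self.offsets.len() - 1
    }
}
```

A lookup touches two flat arrays instead of following one heap pointer per node and per layer, which is the main reason this layout tends to be friendlier to the cache and to mmap.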

Scope

  • HNSW graph: CSR (offsets + neighbours) layout with u32 internal IDs.
  • Vector storage: SoA / columnar / mmap-friendly fixed-stride payload (new LVS2 segment format).
  • HNSW build: lock-free arena, parallel level-generation, thread-local build buffers.
  • IVF / PQ training: rayon-parallel k-means + sub-quantiser.
  • HNSW deletions: tombstone-then-replace (no full rebuild).
  • DeletionBitmap: Roaring with `arc_swap` (cross-cutting with U1 lexical).
  • f16 / bf16 / int4 dense storage variants.
  • DiskANN / Vamana-style out-of-core HNSW build for segments > RAM.
  • Shared / per-schema PQ codebook (instead of per-segment training).
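
As an illustration of the fixed-stride storage item above, the following sketch keeps every vector of a field in one flat `f32` buffer and addresses it by dense ordinal. `FixedStrideVectors` and `l2_sq` are hypothetical names. The sketch does not describe the LVS2 format, whose layout is not specified here.

```rust
/// All vectors of one field stored back to back in a single buffer
/// (hypothetical type, for illustration only).
pub struct FixedStrideVectors {
    dim: usize,
    data: Vec<f32>,
}

impl FixedStrideVectors {
    pub fn new(dim: usize) -> Self {
        assert!(dim > 0, "dimension must be non-zero");
        Self { dim, data: Vec::new() }
    }

    /// Append a vector and return its dense ordinal.
    pub fn push(&mut self, v: &[f32]) -> u32 {
        assert_eq!(v.len(), self.dim, "dimension mismatch");
        let ordinal = self.len() as u32;
        self.data.extend_from_slice(v);
        ordinal
    }

    /// Borrow the vector at `ordinal`; the byte offset is `ordinal * dim * 4`.
    pub fn get(&self, ordinal: u32) -> &[f32] {
        let start = ordinal as usize * self.dim;
        &self.data[start..start + self.dim]
    }

    /// Number of stored vectors.
    pub fn len(&self) -> usize {
        self.data.len() / self.dim
    }
}

/// Squared Euclidean distance between two equal-length slices.
pub fn l2_sq(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}
```

Because every record has the same stride, the same addressing works unchanged over a memory-mapped byte range, and no per-vector `Arc` or inline `field_name` is needed to locate a vector.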

Reference: Qdrant / LanceDB / Lucene99 parity

| Area | laurus today | Qdrant / LanceDB / Lucene99 |
| --- | --- | --- |
| HNSW adjacency | `Vec<Vec<Vec>>` per node × layer × neighbour | CSR `int[]` arena (Lucene99HnswVectorsFormat), `GraphLayers` (Qdrant) |
| Vector storage | `Arc<Vec>` per vector + per-record `field_name` inline | fixed-stride `.vec` blob (Lucene99), `ChunkedVectorStorage` (Qdrant), Arrow `fixed_size_list` (LanceDB) |
| Concurrent build | `Vec<RwLock<Vec>>` | fixed-stride per-node arena + spinlock byte (hnswlib), `AtomicRefCell` (Qdrant) |
| IVF / PQ training | serial Lloyd + serial sub-quantiser | rayon-parallel (Qdrant), OpenMP (FAISS) |
| HNSW deletion | drop graph + full rebuild on next finalize | `mark_deleted` + `replace_deleted` (hnswlib) |
| Deletion bitmap | `RwLock<AHashSet>` | Roaring (Qdrant `IdTracker`, Lucene) |
| Storage variants | int8 + PQ8 only | int8/int4 (Lucene 9.10), f16/bf16 (Qdrant), int4 PQ |
| Out-of-core build | none (capped at RAM) | DiskANN / Vamana two-stage, LanceDB streaming IVF-PQ |
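
The tombstone-then-replace row can be sketched at the level of slot bookkeeping. This is a simplified illustration with hypothetical names (`SlotTable`, `insert`, `external_id`). It covers only ID reuse and assumes that the search path skips tombstoned slots and that the graph edges of a reused slot are re-linked elsewhere.

```rust
/// Internal-slot bookkeeping for tombstone-then-replace
/// (hypothetical type, for illustration only).
pub struct SlotTable {
    external: Vec<u64>, // external doc ID per internal slot
    deleted: Vec<bool>, // tombstone flag per internal slot
    free: Vec<u32>,     // tombstoned slots available for reuse
}

impl SlotTable {
    pub fn new() -> Self {
        Self { external: Vec::new(), deleted: Vec::new(), free: Vec::new() }
    }

    /// Insert a document, reusing a tombstoned slot when one exists.
    pub fn insert(&mut self, external_id: u64) -> u32 {
        if let Some(slot) = self.free.pop() {
            let i = slot as usize;
            self.external[i] = external_id;
            self.deleted[i] = false;
            slot
        } else {
            self.external.push(external_id);
            self.deleted.push(false);
            (self.external.len() - 1) as u32
        }
    }

    /// Tombstone a slot. Returns false if it is unknown or already deleted.
    pub fn mark_deleted(&mut self, slot: u32) -> bool {
        let i = slot as usize;
        if i >= self.deleted.len() || self.deleted[i] {
            return false;
        }
        self.deleted[i] = true;
        self.free.push(slot);
        true
    }

    /// External ID of a live slot, or None if tombstoned or out of range.
    pub fn external_id(&self, slot: u32) -> Option<u64> {
        let i = slot as usize;
        if i < self.deleted.len() && !self.deleted[i] {
            Some(self.external[i])
        } else {
            None
        }
    }

    /// Total slots ever allocated, live or tombstoned.
    pub fn slot_count(&self) -> usize {
        self.external.len()
    }
}
```

The slot count stays flat under delete-then-insert churn, so the graph arrays do not need to be rebuilt or re-numbered just because documents were removed.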

Exit criteria

  • All sub-issues below closed or explicitly deferred.
  • Round-3 benchmark suite (`make bench BENCH=vector_indexing`) shows HNSW build throughput within ±25 % of Qdrant on the SIFT 1M dataset.

Sub-issues

| ID | Issue | Size | Title |
| --- | --- | --- | --- |
| VI-01 | #619 | XL | perf(vector/index): switch HNSW adjacency to CSR (offsets + neighbours) packed layout |
| VI-02 | #620 | XL | perf(vector/index): move from AoS f32 buffers to columnar / SoA vector storage with mmap-friendly alignment |
| VI-03 | #621 | L | perf(vector/index): replace `Vec<RwLock<Vec<u64>>>` build-time graph with lock-free / sharded build |
| VI-04 | #622 | L | perf(vector/index): parallelise IVF k-means training (Lloyd iterations + k-means++ init) and SIMD-vectorise inner loops |
| VI-05 | #623 | L | perf(vector/index): add support for f16 / bf16 / int4 dense storage variants |
| VI-06 | #624 | M | perf(vector/index): tombstone-then-replace ("replace_deleted") for HNSW deletions instead of full graph rebuild |
| VI-07 | #625 | M | perf(vector/index): replace DeletionBitmap's `AHashSet<u64>` with a Roaring bitmap (with optional dense BitVec shard) |
| VI-08 | #626 | M | perf(vector/index): replace `HashMap<(u64, String), u32>` field/doc->position index with field-id + dense ordinal lookup |
| VI-09 | #627 | M | perf(vector/index): stop persisting the HNSW graph as redundant manual u64s; pack as CSR + variable-byte deltas |
| VI-10 | #628 | M | perf(vector/index): transpose PQ codes to per-sub-vector columns for SIMD FastScan / PQ4 |
| VI-11 | #629 | M | perf(vector/index): IVF inverted lists: store as contiguous `Vec<u32>` per cluster with offsets, not `Vec<Vec<(u64, String, Vector)>>` |
| VI-12 | #630 | M | perf(vector/index): out-of-core HNSW build for segments > available RAM (DiskANN / Vamana inspired) |
| VI-13 | #631 | M | perf(vector/index): train-once / reuse PQ codebook across segments instead of per-segment training |
| VI-14 | #632 | M | perf(vector/index): build-time visited-set / candidate-heap reuse with thread-local arenas |
| VI-15 | #633 | M | perf(vector/index): tighten on-disk record by promoting `field_name` to a per-segment dictionary |
| VI-16 | #634 | M | perf(vector/index): commit pipeline: write payload incrementally instead of one giant `create_output` at flush |
| VI-17 | #635 | S | perf(vector/index): use `Vec<u32>` doc-id internal type throughout the index (instead of u64) |
| VI-18 | #636 | S | perf(vector/index): avoid `vectors.clone()` on the segment-merge read path |
| VI-19 | #637 | S | perf(vector/index): HNSW writer level-distribution generation is single-threaded |
| VI-20 | #638 | S | perf(vector/index): eagerly build per-segment search-time PQ LUT cache |
| VI-21 | #639 | S | perf(vector/index): drop `serde_json` `segments.json`; use a binary checksum-protected segment manifest |
| VI-22 | #640 | S | perf(vector/index): active-segment (`SegmentedVectorField`) brute-force search uses O(N) f32 scan with no SIMD batching |
| VI-23 | #641 | S | perf(vector/index): HNSW per-level fixed-stride neighbour packing during build (M0 / M known up front) |
| VI-24 | #642 | S | perf(vector/index): eliminate `field_index: HashMap<String, ...>` lookup hash on every NRT query |
| VI-25 | #643 | S | perf(vector/index): multi-vector / sparse vector layout reservation in segment header |
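
For the "CSR + variable-byte deltas" idea in VI-09, a minimal sketch of one possible encoding follows. It assumes each neighbour list is sorted by ascending ID before it is written, which discards any distance ordering, and it uses LEB128-style 7-bit groups. The function names are hypothetical and the actual on-disk format is not specified in this issue.

```rust
/// Append `sorted_ids` to `out` as variable-byte encoded gaps.
/// Assumption: IDs are in ascending order.
pub fn encode_deltas(sorted_ids: &[u32], out: &mut Vec<u8>) {
    let mut prev = 0u32;
    for &id in sorted_ids {
        debug_assert!(id >= prev, "ids must be sorted ascending");
        let mut delta = id - prev;
        // Emit 7 bits per byte, low bits first; the high bit marks continuation.
        while delta >= 0x80 {
            out.push(((delta & 0x7F) as u8) | 0x80);
            delta >>= 7;
        }
        out.push(delta as u8);
        prev = id;
    }
}

/// Decode `count` IDs from `bytes`, reversing `encode_deltas`.
/// Panics on truncated input; a real reader would return an error instead.
pub fn decode_deltas(bytes: &[u8], count: usize) -> Vec<u32> {
    let mut ids = Vec::with_capacity(count);
    let mut pos = 0usize;
    let mut prev = 0u32;
    for _ in 0..count {
        let mut delta = 0u32;
        let mut shift = 0u32;
        loop {
            let b = bytes[pos];
            pos += 1;
            delta |= ((b & 0x7F) as u32) << shift;
            if b & 0x80 == 0 {
                break;
            }
            shift += 7;
        }
        prev += delta;
        ids.push(prev);
    }
    ids
}
```

For the list `[3, 130, 131, 70000]` the gaps are 3, 127, 1, and 69869, which encode to 6 bytes, compared with 16 bytes as raw u32 values or 32 bytes as u64 values.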

Round-3 investigation report: ~/.claude/tasks/laurus/20260523_perf_round3_audit/.

Labels: enhancement
