Cross-cutting data-structure rewrite tracked under #537.
Current state
laurus/src/maintenance/deletion.rs:52-77 defines DeletionBitmap as RwLock<ahash::AHashSet<u64>>. This same DeletionBitmap is consumed by both the lexical store and every vector index (HNSW / Flat / IVF). The on-disk .delmap (v3) writes every deleted doc id as a raw u64.
Pain points:
- For a 10M-doc segment at 10% deletion the on-disk format is ~8 MB; a Roaring bitmap is ~125 KB.
- Every
is_deleted() is a hashed-u64 probe — measurable on HNSW per-neighbour traversals.
- HNSW search hot path takes a read lock per call.
Proposed direction
- New
.delmap v4 with RoaringBitmap::serialize payload (or FixedBitSet byte array, density-dependent).
- In-memory representation as
arc_swap::ArcSwap<RoaringBitmap>; is_deleted becomes one Arc load + one Roaring::contains.
- Single migration covering both lexical (
InvertedIndexWriter::pending_deletions: HashSet<u64>, MergeEngine::deleted_doc_ids) and vector (HnswSearcher filter, DeletionBitmap).
Acceptance
- v3 reader path retained for back-compat.
- HNSW per-neighbour
is_deleted shows measurable improvement in microbench.
- LiveDocs RAM for 10M-doc/10%-deleted segment drops by >50x.
References
- Lucene
FixedBitSet, IndexedDISI (Lucene 10+) for sparse iteration.
- Qdrant
IdTracker (RoaringBitmap-backed).
Round-3 investigation report: ~/.claude/tasks/laurus/20260523_perf_round3_audit/raw/lexical_indexing.md and .../raw/vector_indexing.md.
Cross-cutting data-structure rewrite tracked under #537.
Current state
laurus/src/maintenance/deletion.rs:52-77definesDeletionBitmapasRwLock<ahash::AHashSet<u64>>. This sameDeletionBitmapis consumed by both the lexical store and every vector index (HNSW / Flat / IVF). The on-disk.delmap(v3) writes every deleted doc id as a rawu64.Pain points:
is_deleted()is a hashed-u64 probe — measurable on HNSW per-neighbour traversals.Proposed direction
.delmapv4 withRoaringBitmap::serializepayload (orFixedBitSetbyte array, density-dependent).arc_swap::ArcSwap<RoaringBitmap>;is_deletedbecomes one Arc load + oneRoaring::contains.InvertedIndexWriter::pending_deletions: HashSet<u64>,MergeEngine::deleted_doc_ids) and vector (HnswSearcherfilter,DeletionBitmap).Acceptance
is_deletedshows measurable improvement in microbench.References
FixedBitSet,IndexedDISI(Lucene 10+) for sparse iteration.IdTracker(RoaringBitmap-backed).Round-3 investigation report:
~/.claude/tasks/laurus/20260523_perf_round3_audit/raw/lexical_indexing.mdand.../raw/vector_indexing.md.