Skip to content

perf(vector/index): replace DeletionBitmap's AHashSet<u64> with a Roaring bitmap (with optional dense BitVec shard) #625

@mosuka

Description

@mosuka

Round-3 perf push sub-issue (tracked under umbrella #535).

[M] Replace DeletionBitmap's AHashSet<u64> with a Roaring bitmap (with optional dense BitVec shard)

  • Where: laurus/src/maintenance/deletion.rs:52-77,
    is_deleted :139-141.
  • Current behaviour: RwLock<AHashSet<u64>> of deleted doc IDs.
    Memory is ~16 B/entry for small sets and 8 B per slot in the
    hashmap backing; is_deleted takes a read lock + hashes per call.
  • Why it might be a bottleneck: HNSW search probes the bitmap inside
    the hot search_layer loop (per neighbour); RwLock<AHashSet> adds a
    per-call atomic refcount on the read-guard and a hash. For a segment
    with millions of live + thousands deleted, RoaringBitmap is ~10x
    smaller and supports lock-free reads via Arc<RoaringBitmap> swap.
  • Reference precedent: Lucene's Bits interface uses Roaring or
    FixedBitSet depending on density; Qdrant IdTracker uses
    RoaringBitmap.
  • Proposed direction: keep public API of DeletionBitmap but back
    it with arc_swap::ArcSwap<RoaringBitmap> (and for very dense ranges,
    BitVec via a DenseShard variant chosen automatically when density

    1%). delete_document does a rcu-style swap. is_deleted becomes
    branch-prediction friendly: one Arc load + one Roaring contains().

  • Risk / scope: contained behind an existing public type; need to
    keep on-disk .delmap format readable (the existing v1/v2/v3 path can
    add a v4 with Roaring serialisation, with a fallback to v3).


ID: VI-07 — see ~/.claude/tasks/laurus/20260523_perf_round3_audit/task_list.md for the full Round-3 issue list.

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions