perf: avoid materializing RoaringBitmap::full() in fragment allow-list #6664
jackye1995 merged 2 commits into lance-format:main on May 2, 2026
The fragment-bitmap allow-list filter dominates the cost of merge_insert and vector-search prefilter on tables that have received writes since their index was built. The filter ANDs an AllowList of full fragments with the deletion BlockList. RowAddrMask::bitand expands AllowList & BlockList into AllowList - BlockList, and the per-fragment (Full - Partial) branch in RowAddrTreeMap::sub_assign materializes RoaringBitmap::full() for every fragment with deletions.

Build the equivalent BlockList directly (deletions union rows in fragments outside the index bitmap) using only Full markers and a HashMap clone.

Existing lance-format#6563 stale-index tests cover correctness.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
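The equivalence the description relies on can be sketched with a toy model. This is not Lance's code: `Selection`, `FragMap`, `build_block_list`, and `filters_agree` are hypothetical names, and a `BTreeSet<u32>` stands in for a RoaringBitmap. The model only demonstrates that the two formulations select the same rows; it does not reproduce the cost of materializing a full bitmap.

```rust
use std::collections::{BTreeSet, HashMap, HashSet};

/// Per-fragment row selection (hypothetical stand-in for the real types).
/// `Full` is a marker for "every row in the fragment"; `Partial` lists row offsets.
#[derive(Clone, Debug, PartialEq)]
enum Selection {
    Full,
    Partial(BTreeSet<u32>),
}

/// Toy row-address map: fragment id -> selected rows in that fragment.
type FragMap = HashMap<u32, Selection>;

fn contains(map: &FragMap, frag: u32, row: u32) -> bool {
    match map.get(&frag) {
        Some(Selection::Full) => true,
        Some(Selection::Partial(rows)) => rows.contains(&row),
        None => false,
    }
}

/// Original formulation: a row passes if it is in the allow-list
/// and not in the deletion block-list.
fn passes_allow_and_block(allow: &FragMap, block: &FragMap, frag: u32, row: u32) -> bool {
    contains(allow, frag, row) && !contains(block, frag, row)
}

/// Direct formulation: one block-list made of the deletions plus a `Full`
/// marker for every dataset fragment that the index does not cover.
fn build_block_list(deletions: &FragMap, dataset_frags: &[u32], indexed: &HashSet<u32>) -> FragMap {
    let mut block = deletions.clone(); // a map clone, no per-row work
    for &frag in dataset_frags {
        if !indexed.contains(&frag) {
            // Full union Partial is Full, so overwriting is the correct union.
            block.insert(frag, Selection::Full);
        }
    }
    block
}

/// Checks row by row that both formulations accept exactly the same rows.
fn filters_agree(
    dataset_frags: &[u32],
    indexed: &HashSet<u32>,
    deletions: &FragMap,
    rows_per_frag: u32,
) -> bool {
    let allow: FragMap = indexed.iter().map(|&f| (f, Selection::Full)).collect();
    let block = build_block_list(deletions, dataset_frags, indexed);
    dataset_frags.iter().all(|&frag| {
        (0..rows_per_frag).all(|row| {
            passes_allow_and_block(&allow, deletions, frag, row) == !contains(&block, frag, row)
        })
    })
}

fn main() {
    // Fragments 0 and 1 are indexed; fragment 2 was written after the index was built.
    let dataset_frags = [0u32, 1, 2];
    let indexed = HashSet::from([0u32, 1]);
    let mut deletions = FragMap::new();
    deletions.insert(1, Selection::Partial(BTreeSet::from([3, 7])));

    let agree = filters_agree(&dataset_frags, &indexed, &deletions, 10);
    assert!(agree);
    println!("formulations agree: {}", agree);
}
```

In this model the direct block-list touches only the map entries, which mirrors the "Full markers and a HashMap clone" wording above.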
I pushed a benchmark that demonstrates the improvement in merge_insert. The code path can also be hit by queries. Here is the result on my machine, showing about a 20x improvement on merge_insert when the dataset has both deletions and new fragments not covered by the index. The new result matches the performance prior to 462faf7.
The fragment-bitmap allow-list filter has an expensive slow path that fires only when a dataset has a fragment outside the index bitmap AND a fragment inside the bitmap with a deletion file. Neither the python merge_insert benchmark fixtures nor the existing cargo benches cover that combination. Add a criterion bench with four fixtures: clean, with_new_rows_only, with_deletions_only, and with_new_rows_and_deletions. Only the last exercises the slow path; the others serve as controls. Run with: cargo bench --bench merge_insert
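The condition that separates the four fixtures can be written as a small predicate. This is a sketch under assumptions: `Fragment`, `indexed_ids`, `fixture`, and `hits_slow_path` are hypothetical names, and the fixture shapes (two indexed fragments, at most one extra fragment, at most one deletion file) are invented for illustration. The real bench builds Lance datasets rather than these structs.

```rust
use std::collections::HashSet;

/// Hypothetical summary of a dataset fragment.
struct Fragment {
    id: u32,
    has_deletion_file: bool,
}

/// Fragment ids covered by the index in every toy fixture.
fn indexed_ids() -> HashSet<u32> {
    HashSet::from([0u32, 1])
}

/// Builds a toy fixture: fragments 0 and 1 are indexed; `new_rows` appends an
/// unindexed fragment 2; `deletions` gives fragment 1 a deletion file.
fn fixture(new_rows: bool, deletions: bool) -> Vec<Fragment> {
    let mut frags = vec![
        Fragment { id: 0, has_deletion_file: false },
        Fragment { id: 1, has_deletion_file: deletions },
    ];
    if new_rows {
        frags.push(Fragment { id: 2, has_deletion_file: false });
    }
    frags
}

/// True when at least one fragment lies outside the index bitmap AND at least
/// one fragment inside the bitmap has a deletion file.
fn hits_slow_path(fragments: &[Fragment], indexed: &HashSet<u32>) -> bool {
    let has_unindexed = fragments.iter().any(|f| !indexed.contains(&f.id));
    let has_indexed_deletions = fragments
        .iter()
        .any(|f| indexed.contains(&f.id) && f.has_deletion_file);
    has_unindexed && has_indexed_deletions
}

fn main() {
    let indexed = indexed_ids();
    let cases = [
        ("clean", false, false),
        ("with_new_rows_only", true, false),
        ("with_deletions_only", false, true),
        ("with_new_rows_and_deletions", true, true),
    ];
    for &(name, new_rows, deletions) in cases.iter() {
        let frags = fixture(new_rows, deletions);
        println!("{}: slow path = {}", name, hits_slow_path(&frags, &indexed));
    }
}
```

Under these assumptions only the fixture with both new rows and deletions satisfies the predicate, which is why the other three act as controls.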
Force-pushed from ecb383b to 923f0c7.
westonpace pushed a commit that referenced this pull request on May 4, 2026