
perf: avoid materializing RoaringBitmap::full() in fragment allow-list #6664

Merged
jackye1995 merged 2 commits into lance-format:main from wkalt:task/prefilter-allow-list-perf on May 2, 2026
Conversation

Contributor

@wkalt wkalt commented May 2, 2026

The fragment-bitmap allow-list filter dominates the cost of merge_insert and vector-search prefilter on tables that have received writes since their index was built. The filter ANDs an AllowList of full fragments with the deletion BlockList. RowAddrMask::bitand expands AllowList & BlockList into AllowList - BlockList, and the per-fragment (Full - Partial) branch in RowAddrTreeMap::sub_assign materializes RoaringBitmap::full() for every fragment with deletions.

Build the equivalent BlockList directly (the deletions unioned with the rows of fragments outside the index bitmap), using only Full markers and a HashMap clone.

Existing #6563 stale-index tests cover correctness.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
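
To make the before/after concrete, here is a minimal sketch of the two shapes. Everything below (FragmentRows, full_minus_deletions, build_block_list, and their signatures) is invented for illustration and greatly simplified from the real RowAddrTreeMap / RowAddrMask code; only RoaringBitmap::full() and the overall idea come from the description above.

```rust
// Minimal sketch only: these types and helpers are NOT the lance-core API.
use std::collections::HashMap;

use roaring::RoaringBitmap;

// Per-fragment row set, simplified to the two cases that matter here.
enum FragmentRows {
    Full,                   // every row address in the fragment
    Partial(RoaringBitmap), // an explicit subset, e.g. deleted row offsets
}

// Old shape of the cost: AllowList & BlockList is expanded into
// AllowList - BlockList, and each (Full - Partial) subtraction first
// materializes a bitmap of every possible row offset.
fn full_minus_deletions(deleted: &RoaringBitmap) -> FragmentRows {
    let mut all = RoaringBitmap::full(); // the expensive materialization
    all -= deleted;
    FragmentRows::Partial(all)
}

// New shape: build the block list directly. Deleted rows are cloned from
// the deletion map, and fragments outside the index bitmap are blocked
// wholesale with a Full marker, so no complement bitmap is ever built.
fn build_block_list(
    deletions: &HashMap<u32, RoaringBitmap>,
    all_fragments: &[u32],
    indexed_fragments: &[u32],
) -> HashMap<u32, FragmentRows> {
    let mut block: HashMap<u32, FragmentRows> = deletions
        .iter()
        .map(|(frag, dels)| (*frag, FragmentRows::Partial(dels.clone())))
        .collect();
    for frag in all_fragments {
        if !indexed_fragments.contains(frag) {
            block.insert(*frag, FragmentRows::Full);
        }
    }
    block
}

fn main() {
    let mut dels = RoaringBitmap::new();
    dels.insert(7);
    let deletions = HashMap::from([(0u32, dels)]);

    // Fragment 1 exists in the dataset but not in the index bitmap.
    let block = build_block_list(&deletions, &[0, 1], &[0]);
    assert!(matches!(block.get(&1), Some(FragmentRows::Full)));

    // The old path would have done this once per fragment with deletions.
    let _slow = full_minus_deletions(&deletions[&0]);
}
```

The second helper only clones the deletion bitmaps it already has and adds cheap Full markers for unindexed fragments, so its cost tracks the number of fragments and deleted rows rather than the size of a materialized complement bitmap.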

@claude claude Bot left a comment


Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.


codecov Bot commented May 2, 2026

Codecov Report

❌ Patch coverage is 91.66667% with 1 line in your changes missing coverage. Please review.

Files with missing lines            Patch %   Lines
rust/lance/src/index/prefilter.rs   91.66%    1 Missing ⚠️


Contributor Author

wkalt commented May 2, 2026

I pushed a benchmark that demonstrates the improvement in merge_insert. The code path can also be hit by queries.

Here is the result on my machine, showing about a 20x improvement on merge insert when the dataset has both deletions and new fragments not covered by the index. The new result matches the performance prior to 462faf7.

$ cargo bench --bench merge_insert
...
<snip>

merge_insert/clean      time:   [29.698 ms 34.462 ms 37.181 ms]
                        change: [-21.717% +1.1080% +28.510%] (p = 0.93 > 0.10)
                        No change in performance detected.
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) low mild

merge_insert/with_new_rows_only
                        time:   [30.537 ms 35.626 ms 38.579 ms]
                        change: [-23.007% -2.8760% +23.941%] (p = 0.82 > 0.10)
                        No change in performance detected.
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) low mild

merge_insert/with_deletions_only
                        time:   [39.151 ms 43.630 ms 46.715 ms]
                        change: [-21.053% -0.0166% +27.813%] (p = 0.99 > 0.10)
                        No change in performance detected.
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) low mild

merge_insert/with_new_rows_and_deletions
                        time:   [28.207 ms 34.937 ms 39.069 ms]
                        change: [-97.081% -96.326% -95.465%] (p = 0.00 < 0.10)
                        Performance has improved.

The fragment-bitmap allow-list filter has an expensive slow path that
fires only when a dataset has a fragment outside the index bitmap AND
a fragment inside the bitmap with a deletion file. Neither the Python
merge_insert benchmark fixtures nor the existing cargo benches cover
that combination.

Add a criterion bench with four fixtures: clean, with_new_rows_only,
with_deletions_only, and with_new_rows_and_deletions. Only the last
exercises the slow path; the others serve as controls.

Run with: cargo bench --bench merge_insert
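
As a rough outline only, a criterion group with those four fixture names has the shape sketched below; prepare_fixture and run_merge_insert are hypothetical placeholders for the dataset setup and merge_insert execution in the actual bench, which this sketch does not reproduce.

```rust
use criterion::{criterion_group, criterion_main, Criterion};

// Placeholder for per-fixture state; the real bench builds a Lance dataset,
// optionally with extra unindexed fragments and/or deletion files.
struct Fixture;

// Hypothetical stand-in for building the dataset fixture named `name`.
fn prepare_fixture(_name: &str) -> Fixture {
    Fixture
}

// Hypothetical stand-in for executing one merge_insert against the fixture.
fn run_merge_insert(_fixture: &Fixture) {}

fn bench_merge_insert(c: &mut Criterion) {
    let mut group = c.benchmark_group("merge_insert");
    group.sample_size(10); // matches the 10 measurements in the output above
    for name in [
        "clean",
        "with_new_rows_only",
        "with_deletions_only",
        "with_new_rows_and_deletions", // the only fixture hitting the slow path
    ] {
        let fixture = prepare_fixture(name);
        group.bench_function(name, |b| b.iter(|| run_merge_insert(&fixture)));
    }
    group.finish();
}

criterion_group!(benches, bench_merge_insert);
criterion_main!(benches);
```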
@wkalt wkalt force-pushed the task/prefilter-allow-list-perf branch from ecb383b to 923f0c7 on May 2, 2026 at 17:13
Contributor

@jackye1995 jackye1995 left a comment


thanks for the fix!

@jackye1995 jackye1995 merged commit 23003a7 into lance-format:main May 2, 2026
35 of 36 checks passed
westonpace pushed a commit that referenced this pull request May 4, 2026
perf: avoid materializing RoaringBitmap::full() in fragment allow-list (#6664)

