
perf: fallback to brute force FTS if filters matching fewer rows#4551

Merged
BubbleCal merged 2 commits intolance-format:mainfrom
BubbleCal:fts-bf
Aug 24, 2025

Conversation

@BubbleCal
Contributor

@BubbleCal BubbleCal commented Aug 23, 2025

If the filters match only a few rows, WAND may fail to prune documents effectively, so most of its work is wasted on rows the filter would reject anyway.
In that case we can score only the matched rows, which is much faster than running WAND first. This means decompressing at most num_rows_matched * num_tokens blocks.
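
To illustrate the idea, here is a minimal sketch in Python rather than the actual Rust change in `wand.rs`. The toy `postings` layout, the function names, the plain term-frequency score (standing in for BM25), and the `ratio` threshold are all assumptions made for the example, not what this PR implements.

```python
from typing import Dict, List, Set, Tuple

# Toy inverted index: token -> {row_id: term frequency}.
# Hypothetical layout, not Lance's compressed posting-list format.
Postings = Dict[str, Dict[int, int]]


def brute_force_search(
    postings: Postings, tokens: List[str], matched_rows: Set[int], limit: int
) -> Tuple[List[Tuple[int, float]], int]:
    """Score only the rows that passed the filter.

    Returns (top hits, number of posting lookups). The lookup count is
    exactly len(matched_rows) * len(tokens), which mirrors the
    num_rows_matched * num_tokens bound described above.
    """
    lookups = 0
    scored = []
    for row_id in sorted(matched_rows):
        score = 0.0
        for token in tokens:
            lookups += 1
            # Plain term frequency stands in for the real relevance score.
            score += postings.get(token, {}).get(row_id, 0)
        if score > 0:
            scored.append((row_id, score))
    # Highest score first, ties broken by row id.
    scored.sort(key=lambda hit: (-hit[1], hit[0]))
    return scored[:limit], lookups


def should_brute_force(num_matched: int, num_docs: int, ratio: float = 0.01) -> bool:
    # Hypothetical heuristic: fall back when the filter keeps a small
    # fraction of the documents. The real threshold may differ.
    return num_matched <= num_docs * ratio


postings = {"search": {1: 2, 5: 1, 900: 3}, "text": {1: 1, 7: 4}}
hits, lookups = brute_force_search(postings, ["search", "text"], {1, 7, 42}, limit=10)
print(hits, lookups)
```

Rows 5 and 900 are never touched because the filter excluded them, so the work scales with the filter result rather than with the posting list lengths.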

Signed-off-by: BubbleCal <bubble-cal@outlook.com>
@codecov-commenter

codecov-commenter commented Aug 23, 2025

Codecov Report

❌ Patch coverage is 14.70588% with 87 lines in your changes missing coverage. Please review.
✅ Project coverage is 81.05%. Comparing base (74eefec) to head (28f4bc3).
⚠️ Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
rust/lance-index/src/scalar/inverted/wand.rs 5.63% 67 Missing ⚠️
rust/lance-index/src/scalar/inverted/index.rs 35.48% 20 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4551      +/-   ##
==========================================
- Coverage   81.13%   81.05%   -0.08%     
==========================================
  Files         308      308              
  Lines      113944   114382     +438     
  Branches   113944   114382     +438     
==========================================
+ Hits        92448    92717     +269     
- Misses      18238    18396     +158     
- Partials     3258     3269      +11     
Flag Coverage Δ
unittests 81.05% <14.70%> (-0.08%) ⬇️


@wkalt
Contributor

wkalt commented Aug 23, 2025

@BubbleCal can you post some evidence or charts of the improvement?

@BubbleCal
Contributor Author

Tested with this query:

plan = ds.scanner(
    columns=["id"],
    full_text_query="search text match query",
    filter="id <= 1000",
    prefilter=True,
).analyze_plan()

on a dataset of 1M random docs.

with this improvement:

elapsed: 3.26ms, plan:
AnalyzeExec verbose=true, metrics=[]
  TracedExec, metrics=[]
    ProjectionExec: expr=[id@2 as id, _score@1 as _score], metrics=[output_rows=854, elapsed_compute=1.667µs]
      Take: columns="_rowid, _score, (id)", metrics=[output_rows=854, elapsed_compute=104.459µs, batches_processed=1, bytes_read=8008, iops=1, requests=1]
        CoalesceBatchesExec: target_batch_size=8192, metrics=[output_rows=854, elapsed_compute=11.335µs]
          MatchQuery: query=search text match query, metrics=[index_comparisons=1001, indices_loaded=0, parts_loaded=0]
            ScalarIndexQuery: query=[id <= 1000]@id_idx, metrics=[output_rows=2, elapsed_compute=690.042µs, index_comparisons=4096, indices_loaded=0, output_batches=1, parts_loaded=0, search_time=651.833µs, ser_time=15.708µs]

without this improvement:

elapsed: 72.07ms, plan:
AnalyzeExec verbose=true, metrics=[]
  TracedExec, metrics=[]
    ProjectionExec: expr=[id@2 as id, _score@1 as _score], metrics=[output_rows=854, elapsed_compute=583ns]
      Take: columns="_rowid, _score, (id)", metrics=[output_rows=854, elapsed_compute=151.126µs, batches_processed=1, bytes_read=8008, iops=1, requests=1]
        CoalesceBatchesExec: target_batch_size=8192, metrics=[output_rows=854, elapsed_compute=21.875µs]
          MatchQuery: query=search text match query, metrics=[index_comparisons=850097, indices_loaded=0, parts_loaded=0]
            ScalarIndexQuery: query=[id <= 1000]@id_idx, metrics=[output_rows=2, elapsed_compute=234.833µs, index_comparisons=4096, indices_loaded=0, output_batches=1, parts_loaded=0, search_time=203.583µs, ser_time=27.958µs]
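
For a quick read of the two plans, the ratios can be worked out from the reported metrics. The snippet below only does arithmetic on the numbers printed above; the variable names are made up for the example.

```python
# Figures copied from the two analyze_plan outputs above.
elapsed_with_ms = 3.26
elapsed_without_ms = 72.07
comparisons_with = 1001
comparisons_without = 850097

speedup = elapsed_without_ms / elapsed_with_ms
comparison_ratio = comparisons_without / comparisons_with

print(f"speedup: {speedup:.1f}x")
print(f"index_comparisons reduced by: {comparison_ratio:.0f}x")
```

That works out to roughly a 22x lower elapsed time and about 849x fewer index comparisons for this single query. Both plans return the same 854 output rows.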

… length

Signed-off-by: BubbleCal <bubble-cal@outlook.com>
@BubbleCal BubbleCal merged commit 7480c9c into lance-format:main Aug 24, 2025
35 of 37 checks passed