perf: fallback to brute force FTS if filters matching fewer rows by BubbleCal · Pull Request #4551 · lance-format/lance

BubbleCal · 2025-08-23T05:01:55Z

If the filters match only a few rows, WAND may fail to prune documents effectively.
In that case we can score only the matched rows directly, which is much faster than running WAND first. This means decompressing at most num_rows_matched * num_tokens blocks.
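
The idea can be illustrated with a minimal Python sketch. This is not the Rust code from this PR: the function names, the dictionary-based posting layout, and the exact threshold are assumptions made for illustration. The threshold is loosely based on the follow-up commit message, which mentions comparing the matched rows against the average posting length.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical in-memory posting layout: token -> {row_id: score contribution}.
Postings = Dict[str, Dict[int, float]]


def should_use_flat_search(num_matched_rows: int, posting_lengths: List[int]) -> bool:
    """Assumed heuristic: prefer flat search when the prefilter matches
    fewer rows than the average posting list length of the query tokens."""
    if not posting_lengths:
        return False
    avg_posting_length = sum(posting_lengths) / len(posting_lengths)
    return num_matched_rows < avg_posting_length


def flat_search(
    postings: Postings,
    query_tokens: Iterable[str],
    matched_rows: Set[int],
    k: int,
) -> List[Tuple[int, float]]:
    """Score only the rows that passed the filter, one lookup per (row, token)."""
    tokens = list(query_tokens)
    scores: Dict[int, float] = {}
    for row in matched_rows:
        score = sum(postings.get(tok, {}).get(row, 0.0) for tok in tokens)
        if score > 0:
            scores[row] = score
    # Highest score first; ties broken by row id for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]
```

With this shape, the work is bounded by the number of matched rows times the number of query tokens, regardless of how long the posting lists are.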

Signed-off-by: BubbleCal <bubble-cal@outlook.com>

codecov-commenter · 2025-08-23T05:36:42Z

Codecov Report

❌ Patch coverage is 14.70588% with 87 lines in your changes missing coverage. Please review.
✅ Project coverage is 81.05%. Comparing base (74eefec) to head (28f4bc3).
⚠️ Report is 1 commits behind head on main.

Files with missing lines	Patch %	Lines
rust/lance-index/src/scalar/inverted/wand.rs	5.63%	67 Missing ⚠️
rust/lance-index/src/scalar/inverted/index.rs	35.48%	20 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #4551      +/-   ##
==========================================
- Coverage   81.13%   81.05%   -0.08%     
==========================================
  Files         308      308              
  Lines      113944   114382     +438     
  Branches   113944   114382     +438     
==========================================
+ Hits        92448    92717     +269     
- Misses      18238    18396     +158     
- Partials     3258     3269      +11

Flag	Coverage Δ
unittests	`81.05% <14.70%> (-0.08%)`	⬇️

wkalt · 2025-08-23T06:05:38Z

@BubbleCal can you post some evidence or charts of the improvement?

BubbleCal · 2025-08-23T06:44:33Z

Tested with this query:

plan = ds.scanner(
    columns=["id"],
    full_text_query="search text match query",
    filter="id <= 1000",
    prefilter=True,
).analyze_plan()

on 1M random docs

with this improvement:

elapsed: 3.26ms, plan:
AnalyzeExec verbose=true, metrics=[]
  TracedExec, metrics=[]
    ProjectionExec: expr=[id@2 as id, _score@1 as _score], metrics=[output_rows=854, elapsed_compute=1.667µs]
      Take: columns="_rowid, _score, (id)", metrics=[output_rows=854, elapsed_compute=104.459µs, batches_processed=1, bytes_read=8008, iops=1, requests=1]
        CoalesceBatchesExec: target_batch_size=8192, metrics=[output_rows=854, elapsed_compute=11.335µs]
          MatchQuery: query=search text match query, metrics=[index_comparisons=1001, indices_loaded=0, parts_loaded=0]
            ScalarIndexQuery: query=[id <= 1000]@id_idx, metrics=[output_rows=2, elapsed_compute=690.042µs, index_comparisons=4096, indices_loaded=0, output_batches=1, parts_loaded=0, search_time=651.833µs, ser_time=15.708µs]

without this improvement:

elapsed: 72.07ms, plan:
AnalyzeExec verbose=true, metrics=[]
  TracedExec, metrics=[]
    ProjectionExec: expr=[id@2 as id, _score@1 as _score], metrics=[output_rows=854, elapsed_compute=583ns]
      Take: columns="_rowid, _score, (id)", metrics=[output_rows=854, elapsed_compute=151.126µs, batches_processed=1, bytes_read=8008, iops=1, requests=1]
        CoalesceBatchesExec: target_batch_size=8192, metrics=[output_rows=854, elapsed_compute=21.875µs]
          MatchQuery: query=search text match query, metrics=[index_comparisons=850097, indices_loaded=0, parts_loaded=0]
            ScalarIndexQuery: query=[id <= 1000]@id_idx, metrics=[output_rows=2, elapsed_compute=234.833µs, index_comparisons=4096, indices_loaded=0, output_batches=1, parts_loaded=0, search_time=203.583µs, ser_time=27.958µs]
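
To make the difference between the two runs explicit, the following small script extracts `index_comparisons` from the two `MatchQuery` lines above and computes the ratios. The helper name `metric` is an illustrative assumption; the numbers are copied from the plans in this comment.

```python
import re


def metric(plan_line: str, name: str) -> int:
    """Extract an integer metric such as index_comparisons=1001 from a plan line."""
    match = re.search(rf"{re.escape(name)}=(\d+)", plan_line)
    if match is None:
        raise ValueError(f"metric {name!r} not found")
    return int(match.group(1))


with_fix = (
    "MatchQuery: query=search text match query, "
    "metrics=[index_comparisons=1001, indices_loaded=0, parts_loaded=0]"
)
without_fix = (
    "MatchQuery: query=search text match query, "
    "metrics=[index_comparisons=850097, indices_loaded=0, parts_loaded=0]"
)

# Elapsed times reported above, in milliseconds.
elapsed_with_fix_ms = 3.26
elapsed_without_fix_ms = 72.07

comparison_ratio = metric(without_fix, "index_comparisons") / metric(with_fix, "index_comparisons")
speedup = elapsed_without_fix_ms / elapsed_with_fix_ms

print(f"index comparisons reduced by a factor of {comparison_ratio:.0f}")
print(f"end-to-end speedup: {speedup:.1f}x")
```

On these figures the query runs roughly 22x faster end to end, while the FTS node performs about 849x fewer index comparisons.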

perf: fallback to brute force FTS if filters matching fewer rows

c0afc40

Signed-off-by: BubbleCal <bubble-cal@outlook.com>

BubbleCal requested review from LuQQiu, eddyxu and westonpace August 23, 2025 05:02

github-actions Bot added python performance labels Aug 23, 2025

Xuanwo approved these changes Aug 24, 2025

LuQQiu approved these changes Aug 24, 2025

fallback to flat search if the matched rows are less than avg posting…

28f4bc3

… length Signed-off-by: BubbleCal <bubble-cal@outlook.com>

BubbleCal merged commit 7480c9c into lance-format:main Aug 24, 2025
35 of 37 checks passed

BubbleCal merged 2 commits into lance-format:main from BubbleCal:fts-bf
