feat(scalar-index): push scan limit into index search #7065

Open
gstamatakis95 wants to merge 2 commits into lance-format:main from gstamatakis95:feat/push-scan-limit-into-scalar-index

@gstamatakis95 gstamatakis95 commented Jun 2, 2026

Closes #6949

Problem

A scalar index query that matches many rows can be slow, for example a large range query against a B-tree index that hits most or all of the data. When the scan has a LIMIT, the index still builds the full set of matching rows, and the limit is only applied later in the plan.

This change passes the limit down into the index search. A B-tree can then stop early once it has found enough matches.

What changed

Index layer (lance-index)

  • Added a new trait method ScalarIndex::search_limited(query, metrics, limit). It has a default that ignores the limit and calls search. Because of the default, existing index types and call sites do not need any changes.
  • BTreeIndex implements the new method. It searches matching pages in order and stops as soon as it has gathered at least limit real matches. The pages it has not reached yet are never read. Null page handling is skipped when a limit is set.
  • ScalarIndexExpr::evaluate_limited passes the limit only to a single index lookup. For AND, OR, and NOT it does not pass the limit, because those need the full result of each side.
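The index-layer changes above can be sketched roughly as follows. This is a simplified illustration, not the code in this PR: `ScalarIndexSketch`, `PagedIndex`, `Page`, and `scan` are hypothetical names, the query is reduced to an inclusive integer range, and the sketch is synchronous and visits every page in order rather than only the pages that overlap the range. It shows two ideas: a default `search_limited` that ignores the limit, and a paged search that stops before reading another page once it has enough matches.

```rust
/// Hypothetical, simplified index page holding (value, row_id) pairs.
struct Page {
    rows: Vec<(i64, u64)>,
}

/// Simplified stand-in for the scalar index trait. The real signatures
/// (query type, metrics, async results) are not reproduced here.
trait ScalarIndexSketch {
    /// Returns all matching row ids and the number of pages read.
    fn search(&self, lo: i64, hi: i64) -> (Vec<u64>, usize);

    /// Default: ignore the limit and run the full search, so index types
    /// that do not override this method keep working unchanged.
    fn search_limited(&self, lo: i64, hi: i64, _limit: Option<usize>) -> (Vec<u64>, usize) {
        self.search(lo, hi)
    }
}

/// Hypothetical page-based index used to illustrate early termination.
struct PagedIndex {
    pages: Vec<Page>,
}

impl PagedIndex {
    fn scan(&self, lo: i64, hi: i64, limit: Option<usize>) -> (Vec<u64>, usize) {
        let mut matches = Vec::new();
        let mut pages_read = 0;
        for page in &self.pages {
            // Stop before reading another page once enough matches are gathered.
            if limit.is_some_and(|n| matches.len() >= n) {
                break;
            }
            pages_read += 1;
            matches.extend(
                page.rows
                    .iter()
                    .filter(|(value, _)| (lo..=hi).contains(value))
                    .map(|(_, row_id)| *row_id),
            );
        }
        (matches, pages_read)
    }
}

impl ScalarIndexSketch for PagedIndex {
    fn search(&self, lo: i64, hi: i64) -> (Vec<u64>, usize) {
        self.scan(lo, hi, None)
    }

    /// Override: may return more than `limit` rows, never fewer
    /// (unless fewer rows match in total).
    fn search_limited(&self, lo: i64, hi: i64, limit: Option<usize>) -> (Vec<u64>, usize) {
        self.scan(lo, hi, limit)
    }
}
```

Because the limit is only checked between pages, the result can overshoot the limit by up to one page of matches, which is consistent with the "at least limit" contract described above.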

Exec and planner (lance)

  • MaterializeIndexExec (legacy path) and ScalarIndexExec (default path) now accept a limit through with_limit(...).
  • Scanner::index_search_limit() computes the value to push, which is limit + offset. It is wired into both scalar_indexed_scan and new_filtered_read.
  • LogicalScalarIndex passes the limit to each of its segments.
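As a rough illustration of the value the scanner pushes down, the sketch below computes limit + offset. The free function and its parameters are hypothetical (the PR describes `Scanner::index_search_limit()` as a method), and the overflow handling is an assumption made here for safety, not something stated in the PR.

```rust
/// Hypothetical helper mirroring the idea behind `Scanner::index_search_limit()`.
/// Returns the number of rows the index must produce at minimum, or `None`
/// when nothing should be pushed down.
fn index_search_limit(limit: Option<u64>, offset: Option<u64>) -> Option<u64> {
    // No LIMIT on the scan means there is nothing to push.
    let limit = limit?;
    // Rows skipped by OFFSET still have to be produced by the index.
    // Assumption: on overflow, fall back to no pushdown instead of panicking.
    limit.checked_add(offset.unwrap_or(0))
}
```

For a scan with LIMIT 10 OFFSET 5, this yields `Some(15)`: the index must supply at least 15 rows so that 10 remain after the first 5 are skipped.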

Correctness

Pushing the limit is only a speed optimization. A GlobalLimitExec still applies the exact limit and offset at the top of the plan, so the index only needs to return at least limit + offset rows.
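The argument above can be illustrated with a small sketch. `apply_global_limit` is a hypothetical stand-in for what the limit operator at the top of the plan does. For simplicity it assumes the truncated index result is a prefix of the full result; without an ORDER BY, any set of the right size would be equally valid.

```rust
/// Hypothetical stand-in for the exact limit applied at the top of the plan:
/// skip `offset` rows, then keep at most `limit` rows.
fn apply_global_limit(rows: &[u64], offset: usize, limit: usize) -> Vec<u64> {
    rows.iter().copied().skip(offset).take(limit).collect()
}

/// Returns true when the index handed back enough rows for the final
/// limit to be exact, or when it returned everything that matched.
fn index_result_is_sufficient(
    returned: usize,
    total_matches: usize,
    offset: usize,
    limit: usize,
) -> bool {
    returned >= offset.saturating_add(limit) || returned == total_matches
}
```

As long as the index returns at least limit + offset rows, the final output has the same size whether the index was truncated or not, so over-fetching by the index is harmless.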

The pushdown is turned off in any case where returning the first N matches is not the same as returning any N matches:

  • An ORDER BY, vector search, or full text search reorders rows before the limit.
  • An aggregate is present, so the limit applies after aggregation.
  • A refine filter or an inexact (recheck) result could drop matched rows later in the plan.
  • Any relevant fragment has deletions. Deleted rows are removed after the index search, so stopping early could leave fewer than limit live rows.
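The conditions in the list above amount to a single eligibility check. The sketch below is only an illustration: `ScanShape`, its field names, and `can_push_limit` are hypothetical and do not correspond to types in the PR.

```rust
/// Hypothetical summary of the scan properties that matter for pushdown.
#[derive(Debug, Default, Clone, Copy)]
struct ScanShape {
    /// ORDER BY, vector search, or full text search reorders rows before the limit.
    reorders_rows: bool,
    /// An aggregate is present, so the limit applies after aggregation.
    has_aggregate: bool,
    /// A refine filter may drop matched rows later in the plan.
    has_refine_filter: bool,
    /// The index result is inexact and needs a recheck.
    index_result_inexact: bool,
    /// Some relevant fragment has deletions.
    fragments_have_deletions: bool,
}

/// The limit may be pushed only when the first N matches are as good as any N
/// matches and no later step can remove rows the index returned.
fn can_push_limit(shape: &ScanShape) -> bool {
    !(shape.reorders_rows
        || shape.has_aggregate
        || shape.has_refine_filter
        || shape.index_result_inexact
        || shape.fragments_have_deletions)
}
```

Treating each condition as an independent veto keeps the check conservative: a new case that can drop or reorder rows only needs one more flag.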

Tests

  • test_search_limited_short_circuits (B-tree unit test). Confirms the search returns at least limit rows but reads fewer than all pages.
  • test_limit_pushed_into_scalar_index (scanner, end to end). Checks the exact row count and that every returned row matches the filter. A second phase adds deletions and confirms results stay correct.
  • Re-ran the B-tree, scalar index exec, inexact plan, secondary index (legacy and modern storage), all test_limit tests, and scalar_logical tests. All pass. cargo clippy --tests -- -D warnings and cargo fmt are clean.

@github-actions github-actions Bot added the A-index and enhancement labels Jun 2, 2026
@gstamatakis95 gstamatakis95 force-pushed the feat/push-scan-limit-into-scalar-index branch from bb7aa09 to aa05c9d June 2, 2026 20:28
@gstamatakis95 gstamatakis95 marked this pull request as ready for review June 2, 2026 20:32
@claude claude Bot left a comment

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

@gstamatakis95 gstamatakis95 force-pushed the feat/push-scan-limit-into-scalar-index branch from aa05c9d to 1d1b5c1 June 2, 2026 20:42
@gstamatakis95 gstamatakis95 force-pushed the feat/push-scan-limit-into-scalar-index branch from 1d1b5c1 to 91cbb82 June 2, 2026 21:44
@codecov codecov Bot commented Jun 6, 2026

Development

Successfully merging this pull request may close these issues.

Push limit into scalar index query
