feat: expose FTS exec internals to enable distributed planning #6648

Merged
wkalt merged 2 commits into lance-format:main from vivek-bharathan:feat/exposeftsinternals
May 1, 2026

@vivek-bharathan vivek-bharathan commented Apr 30, 2026

The FTS execution plan types (MatchQueryExec, PhraseQueryExec, BoostQueryExec,
BooleanQueryExec, FlatMatchQueryExec, FlatMatchFilterExec) and their supporting
helpers (load_segments, load_segment_details, build_global_bm25_scorer) are
currently private or pub(crate). Their fields are hidden behind constructors that
always assume all committed segments live on one node and are scored with locally
computed statistics.

This doesn't work for systems that distribute FTS queries across hosts.
A coordinator that wants to, for example, route segments 1–5 to host A and
segments 6–10 to host B while still producing globally correct BM25 scores can't do so
today: each per-host exec computes IDFs against its local segment subset, producing
scores that are locally correct but globally wrong.

This PR exposes the surface needed for that pattern, additively, without changing any
existing behavior.
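
To illustrate why per-host IDFs diverge from the global value, here is a minimal sketch. It assumes the common Lucene-style BM25 IDF formula, ln(1 + (N - n + 0.5) / (n + 0.5)); whether Lance uses exactly this variant is an assumption, and `HostStats`, `local_idf`, and `global_idf` are hypothetical names that are not part of the Lance API.

```rust
/// BM25 inverse document frequency (Lucene-style variant, assumed here):
/// idf = ln(1 + (N - n + 0.5) / (n + 0.5))
/// `num_docs` is N, the number of documents in the corpus being scored;
/// `doc_freq` is n, the number of those documents that contain the term.
fn bm25_idf(num_docs: u64, doc_freq: u64) -> f64 {
    let n_total = num_docs as f64;
    let n_term = doc_freq as f64;
    (1.0 + (n_total - n_term + 0.5) / (n_term + 0.5)).ln()
}

/// Hypothetical per-host statistics for a single query term.
struct HostStats {
    num_docs: u64,
    doc_freq: u64,
}

/// IDF a host would compute if it only saw its own segments.
fn local_idf(host: &HostStats) -> f64 {
    bm25_idf(host.num_docs, host.doc_freq)
}

/// IDF computed from statistics summed across every host.
fn global_idf(hosts: &[HostStats]) -> f64 {
    let num_docs: u64 = hosts.iter().map(|h| h.num_docs).sum();
    let doc_freq: u64 = hosts.iter().map(|h| h.doc_freq).sum();
    bm25_idf(num_docs, doc_freq)
}
```

With host A holding 1000 documents of which 10 contain the term, and host B holding 1000 documents of which 500 contain it, the local IDFs are roughly 4.56 and 0.69 while the global IDF is roughly 1.37. The same term would therefore be weighted very differently depending on which host scored the document.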

@claude claude Bot left a comment

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

@github-actions github-actions Bot added the enhancement New feature or request label Apr 30, 2026

codecov Bot commented Apr 30, 2026

Codecov Report

❌ Patch coverage is 67.76612% with 215 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| rust/lance/src/io/exec/fts.rs | 65.35% | 187 Missing and 15 partials ⚠️ |
| rust/lance-index/src/scalar/inverted.rs | 86.00% | 2 Missing and 5 partials ⚠️ |
| rust/lance/src/dataset/scanner.rs | 81.25% | 0 Missing and 6 partials ⚠️ |

@westonpace
Member

@claude review

@claude claude Bot left a comment

⚠️ Code review skipped — your organization has reached its monthly code review spending cap.

An organization admin can view or raise the cap at claude.ai/admin-settings/claude-code. The cap resets at the start of the next billing period.

Once the cap resets or is raised, comment @claude review on this pull request to trigger a review.

Comment thread rust/lance-index/src/scalar/inverted.rs
Member

@westonpace westonpace left a comment

Just a few doc questions. It might also be nice (in this PR or in a separate issue) to document why we are making this change.

My blocking request is that we fill out the PR description with some kind of justification for why this change is desired. It looks like we are trying to make the FTS exec nodes a more complete part of the public API? This is a fine rationale, but we should at least describe it.

/// single corpus-wide scorer, so that BM25 IDF scoring uses *global*
/// statistics rather than per-segment statistics. Computes the union of
/// fuzzy-expanded terms when `params.fuzziness` is set.
pub fn build_global_bm25_scorer(
Member

Why does this method need to be made public? Is it to supply a MemBM25Scorer to some of the exec nodes?

Contributor Author

Actually, this is mostly for API completeness, since it is paired with with_base_scorer. It keeps a single source of truth for BM25 IDF arithmetic across segments. I could move it back if you prefer.
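
For illustration only, the "single source of truth" idea can be sketched as one aggregation step that sums per-segment statistics into a corpus-wide scorer, which a coordinator could then hand to every host. `SegmentStats` and `GlobalScorer` are hypothetical types invented for this sketch; they are not the real signatures of build_global_bm25_scorer, with_base_scorer, or MemBM25Scorer, and the Lucene-style IDF formula is assumed.

```rust
use std::collections::HashMap;

/// Hypothetical per-segment statistics; not Lance's actual types.
#[derive(Debug, Clone, Default)]
struct SegmentStats {
    num_docs: u64,
    total_tokens: u64,
    /// Number of documents in this segment containing each term.
    doc_freq: HashMap<String, u64>,
}

/// Hypothetical corpus-wide scorer, built once and shared by every host.
#[derive(Debug, Clone, Default)]
struct GlobalScorer {
    num_docs: u64,
    total_tokens: u64,
    doc_freq: HashMap<String, u64>,
}

impl GlobalScorer {
    /// Sum document counts, token counts, and per-term document
    /// frequencies over all segments, wherever they are hosted.
    fn from_segments(segments: &[SegmentStats]) -> Self {
        let mut scorer = GlobalScorer::default();
        for seg in segments {
            scorer.num_docs += seg.num_docs;
            scorer.total_tokens += seg.total_tokens;
            for (term, df) in &seg.doc_freq {
                *scorer.doc_freq.entry(term.clone()).or_insert(0) += *df;
            }
        }
        scorer
    }

    /// Average document length over the whole corpus (0.0 if empty).
    fn avg_doc_len(&self) -> f64 {
        if self.num_docs == 0 {
            0.0
        } else {
            self.total_tokens as f64 / self.num_docs as f64
        }
    }

    /// Lucene-style BM25 IDF (assumed variant) from the global counts.
    fn idf(&self, term: &str) -> f64 {
        let n = self.num_docs as f64;
        let df = self.doc_freq.get(term).copied().unwrap_or(0) as f64;
        (1.0 + (n - df + 0.5) / (df + 0.5)).ln()
    }
}
```

Because every host would score against the same `GlobalScorer`, the IDF and average document length no longer depend on which segments a host happens to hold.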

Comment thread rust/lance/src/io/exec/fts.rs
@vivek-bharathan vivek-bharathan force-pushed the feat/exposeftsinternals branch from 0794f90 to 2510209 April 30, 2026 17:59
Adds public getters on every FTS exec type
Promote segment loaders and aggregation arithmetic to pub
Add serde for FtsSearchParams
Add Segment-bound construction for FTS execs
Add scorer injection for FTS execs
@vivek-bharathan vivek-bharathan force-pushed the feat/exposeftsinternals branch from 2510209 to 9e06cc1 April 30, 2026 18:36
Member

@westonpace westonpace left a comment

Switching to approve as there is now a PR description. Thanks 😄

@wkalt wkalt merged commit 0b5b95c into lance-format:main May 1, 2026
28 checks passed
@vivek-bharathan vivek-bharathan deleted the feat/exposeftsinternals branch May 1, 2026 17:11