feat: expose FTS exec internals to enable distributed planning#6648
feat: expose FTS exec internals to enable distributed planning#6648wkalt merged 2 commits intolance-format:mainfrom
Conversation
Codecov Report❌ Patch coverage is 📢 Thoughts on this report? Let us know! |
|
@claude review |
There was a problem hiding this comment.
An organization admin can view or raise the cap at claude.ai/admin-settings/claude-code. The cap resets at the start of the next billing period.
Once the cap resets or is raised, comment @claude review on this pull request to trigger a review.
westonpace
left a comment
There was a problem hiding this comment.
Just a few doc questions. It might also be nice (in this PR or in a separate issue) to document why we are making this change?
My blocking request is that we fill out the PR description with some kind of justification for why this change is desired? It looks like we are trying to make fts exec nodes a more complete part of the public API? This is a fine rationalization but we should at least describe it.
| /// single corpus-wide scorer, so that BM25 IDF scoring uses *global* | ||
| /// statistics rather than per-segment statistics. Computes the union of | ||
| /// fuzzy-expanded terms when `params.fuzziness` is set. | ||
| pub fn build_global_bm25_scorer( |
There was a problem hiding this comment.
Why does this method need to be made public? Is it to supply a MemBM25Scorer to some of the exec nodes?
There was a problem hiding this comment.
Actually this is mostly for api completeness since it is paired with with_base_scorer. Keeps a single source of truth for BM25 IDF arithmetic across segments. I could move it back if you prefer
0794f90 to
2510209
Compare
Adds public getters on every FTS exec type Promote segment loaders and aggregation arithmetic to pub Add serde for FtsSearchParams Add Segment-bound construction for FTS execs Add scorer injection for FTS execs
2510209 to
9e06cc1
Compare
westonpace
left a comment
There was a problem hiding this comment.
Switching to approve as there is now a PR description. Thanks 😄
The FTS execution plan types (
MatchQueryExec,PhraseQueryExec,BoostQueryExec,BooleanQueryExec,FlatMatchQueryExec,FlatMatchFilterExec) and their supportinghelpers (
load_segments,load_segment_details,build_global_bm25_scorer) arecurrently private or
pub(crate), with fields hidden behind constructors that alwaysassume that all committed segments exist on one node and are scored with statistics computed
locally.
This doesn't work for systems that distribute FTS queries across hosts
— a coordinator that wants to (for example) route segments 1–5 to host A,
segments 6–10 to host B, and still produce globally-correct BM25 scores can't do so
today: per-host execs each compute IDFs against their local segment subset, producing
locally-correct but globally-wrong scores.
This PR exposes the surface needed for that pattern, additively, without changing any
existing behavior