feat(index): merge incremental inverted index segments #6737
Merged
BubbleCal merged 4 commits on May 13, 2026
Xuanwo
approved these changes
May 13, 2026
Fixes #6292
Feature
What is the new feature?
Inverted / FTS index optimization now honors `num_indices_to_merge` for incremental index segments. `num_indices_to_merge=0` keeps append-only delta behavior, while `num_indices_to_merge=N` folds newly indexed fragments together with the latest N existing inverted segments into one replacement segment.
Why do we need this feature?
Repeated incremental `optimize_indices()` calls could accumulate many FTS index segments even though the API already exposed `num_indices_to_merge`. That made the public optimize contract inconsistent for inverted indexes and could leave unnecessary segments for query-time search.
How does it work?
The scalar optimize path now has an inverted-index-specific branch. It selects the requested existing FTS segments, loads them copy-on-write, filters stale / deleted old rows, merges them with newly tokenized data, writes one new inverted index root, and removes exactly the selected old metadata segments. The implementation preserves tokenizer parameters, positions settings, token-set format, FTS format version, posting-tail codec, `deleted_fragments`, and stable-row-id filtering semantics.
Validation
cargo fmt --all
cargo clippy -p lance-index --all-targets -- -D warnings
cargo clippy -p lance --all-targets -- -D warnings
cargo test -p lance test_optimize_fts_respects_num_indices_to_merge
cargo test -p lance test_optimize_fts
cargo test -p lance test_fts_index_incremental_reindex_after_in_place_update
cargo test -p lance cleanup_lineage
cargo test -p lance auto_clean_referenced_branches
uv run pytest python/tests/test_scalar_index.py::test_fts_optimize_num_indices_to_merge
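
For reviewers who want a quick mental model of the merge semantics described under "How does it work?", here is a toy sketch in plain Python. It is not the Lance implementation: `Segment` and `optimize_fts_model` are hypothetical names invented for illustration, postings are reduced to `token -> set of row ids`, and "latest N" is assumed to mean the last N entries of an ordered segment list. Tokenizer settings, format versions, and codecs are ignored entirely.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical stand-in for one inverted index segment."""
    name: str
    postings: dict[str, set[int]]  # token -> row ids containing that token


def optimize_fts_model(segments, new_postings, num_indices_to_merge,
                       deleted_rows=frozenset(), new_name="merged"):
    """Toy model: fold new postings and the latest N segments into one segment.

    With N == 0 nothing is selected, so the new segment is simply appended
    (append-only delta behavior).
    """
    if num_indices_to_merge < 0:
        raise ValueError("num_indices_to_merge must be non-negative")

    # Select the latest N existing segments (assumed to be the list tail).
    n = min(num_indices_to_merge, len(segments))
    split = len(segments) - n
    kept, selected = segments[:split], segments[split:]

    merged: dict[str, set[int]] = {}

    # Copy old postings, dropping rows that are stale or deleted.
    for seg in selected:
        for token, rows in seg.postings.items():
            live = rows - deleted_rows
            if live:
                merged.setdefault(token, set()).update(live)

    # Merge in the newly tokenized data.
    for token, rows in new_postings.items():
        merged.setdefault(token, set()).update(rows)

    # Exactly the selected segments are replaced by one new segment.
    return kept + [Segment(new_name, merged)]


existing = [
    Segment("s0", {"a": {1}}),
    Segment("s1", {"a": {2}, "b": {3}}),
    Segment("s2", {"b": {4}}),
]
after = optimize_fts_model(existing, {"a": {5}}, 2, deleted_rows={3})
print([s.name for s in after])  # ['s0', 'merged']
```

In this model, calling the function with `num_indices_to_merge=0` leaves all three existing segments in place and appends a fourth, which mirrors the append-only delta case in the description.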