feat(index): merge incremental inverted index segments #6737

Merged
BubbleCal merged 4 commits into main from yang/oss-865-add-support-for-segment-merging-to-inverted-indexes on May 13, 2026

BubbleCal commented May 12, 2026

Fixes #6292

Feature

What is the new feature?

Inverted / FTS index optimization now honors num_indices_to_merge for incremental index segments. Setting num_indices_to_merge=0 keeps the append-only delta behavior, while num_indices_to_merge=N folds the newly indexed fragments together with the latest N existing inverted segments into one replacement segment.
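
The effect of the two settings on the segment list can be sketched with a small pure-Python model. This is an illustration only: `plan_optimize` and the segment names are hypothetical and not part of Lance's API. It assumes segments are ordered oldest to newest, and that requesting more segments than exist simply merges all of them.

```python
from typing import List, Tuple


def plan_optimize(existing: List[str], new_delta: str,
                  num_indices_to_merge: int) -> Tuple[List[str], List[str]]:
    """Return (segments_after_optimize, segments_replaced).

    Hypothetical model, not Lance code. `existing` is ordered oldest -> newest.
    """
    if num_indices_to_merge < 0:
        raise ValueError("num_indices_to_merge must be >= 0")
    if num_indices_to_merge == 0:
        # Append-only delta behavior: add the new segment, replace nothing.
        return existing + [new_delta], []
    # Assumption: clamp N to the number of segments that actually exist.
    n = min(num_indices_to_merge, len(existing))
    split = len(existing) - n
    kept, replaced = existing[:split], existing[split:]
    # The latest N segments and the new data become one replacement segment.
    merged = "merged(" + "+".join(replaced + [new_delta]) + ")"
    return kept + [merged], replaced


segments = ["s0", "s1", "s2"]
after_append, _ = plan_optimize(segments, "d", 0)   # four segments remain
after_merge, dropped = plan_optimize(segments, "d", 2)  # two segments remain
```

In this model, repeated calls with N=0 grow the list by one segment each time, whereas any N>=1 keeps it from growing.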

Why do we need this feature?

Repeated incremental optimize_indices() calls could accumulate many FTS index segments even though the API already exposed num_indices_to_merge. That made the public optimize contract inconsistent for inverted indexes and could leave unnecessary segments for query-time search.

How does it work?

The scalar optimize path now has an inverted-index-specific branch. It selects the requested existing FTS segments, loads them copy-on-write, filters stale / deleted old rows, merges them with newly tokenized data, writes one new inverted index root, and removes exactly the selected old metadata segments. The implementation preserves tokenizer parameters, positions settings, token-set format, FTS format version, posting-tail codec, deleted_fragments, and stable-row-id filtering semantics.
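
The filter-then-merge step described above can be sketched as follows. This is a simplified model under stated assumptions, not the actual Rust implementation: `merge_segments`, the `Postings` alias, and the representation of a segment as a token-to-row-id mapping are all hypothetical, and details such as positions, tokenizer parameters, and the posting-tail codec are left out.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical segment representation: token -> sorted list of row ids.
Postings = Dict[str, List[int]]


def merge_segments(selected: Iterable[Postings], new_postings: Postings,
                   stale_rows: Set[int]) -> Postings:
    """Merge selected old segments with newly tokenized data.

    Rows in `stale_rows` (deleted or rewritten since the old segments were
    built) are dropped from the old segments only. Inputs are not mutated.
    """
    merged: Dict[str, Set[int]] = {}
    for segment in selected:
        for token, rows in segment.items():
            live = [r for r in rows if r not in stale_rows]
            if live:
                merged.setdefault(token, set()).update(live)
    # Newly tokenized rows are taken as-is; they reflect current data.
    for token, rows in new_postings.items():
        merged.setdefault(token, set()).update(rows)
    return {token: sorted(rows) for token, rows in sorted(merged.items())}


old_a = {"lance": [1, 2], "index": [2]}
old_b = {"lance": [3], "merge": [3, 4]}
# Row 2 was updated in place and re-tokenized; row 4 was deleted.
new = {"lance": [2, 5], "merge": [5]}
result = merge_segments([old_a, old_b], new, stale_rows={2, 4})
# result == {"lance": [1, 2, 3, 5], "merge": [3, 5]}
```

Row 2 loses its old "index" posting because the old entry is stale, but it keeps "lance" because the re-tokenized data still contains that token.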

Validation

  • cargo fmt --all
  • cargo clippy -p lance-index --all-targets -- -D warnings
  • cargo clippy -p lance --all-targets -- -D warnings
  • cargo test -p lance test_optimize_fts_respects_num_indices_to_merge
  • cargo test -p lance test_optimize_fts
  • cargo test -p lance test_fts_index_incremental_reindex_after_in_place_update
  • cargo test -p lance cleanup_lineage
  • cargo test -p lance auto_clean_referenced_branches
  • uv run pytest python/tests/test_scalar_index.py::test_fts_optimize_num_indices_to_merge

github-actions Bot added the bug (Something isn't working) and A-python (Python bindings) labels May 12, 2026

codecov Bot commented May 12, 2026

Codecov Report

❌ Patch coverage is 84.48000% with 97 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| rust/lance/src/index/append.rs | 64.84% | 36 Missing and 9 partials ⚠️ |
| rust/lance-index/src/scalar/inverted/index.rs | 82.62% | 23 Missing and 14 partials ⚠️ |
| rust/lance-index/src/scalar/inverted/builder.rs | 87.50% | 1 Missing and 14 partials ⚠️ |

BubbleCal changed the title from "fix(index): merge incremental inverted index segments" to "feat(index): merge incremental inverted index segments" May 12, 2026
github-actions Bot added the enhancement (New feature or request) label May 12, 2026
BubbleCal marked this pull request as ready for review May 13, 2026 07:12

claude Bot left a comment

Claude Code Review

This repository is configured for manual code reviews. Comment @claude review to trigger a review and subscribe this PR to future pushes, or @claude review once for a one-time review.

Tip: disable this comment in your organization's Code Review settings.

BubbleCal merged commit c51f5a1 into main May 13, 2026
28 checks passed
BubbleCal deleted the yang/oss-865-add-support-for-segment-merging-to-inverted-indexes branch May 13, 2026 10:04

Labels

A-python (Python bindings), bug (Something isn't working), enhancement (New feature or request)

Development

Successfully merging this pull request may close these issues.

Add support for segment merging to inverted indexes

2 participants