
perf: parallelize FTS prewarming #6144

Merged
BubbleCal merged 1 commit into main from yang/fts-prewarm on Mar 10, 2026

Conversation

@BubbleCal
Contributor

@BubbleCal BubbleCal commented Mar 10, 2026

20%+ faster for a 2 GB index; the gain could be larger for bigger indexes

@github-actions
Contributor

ACTION NEEDED
Lance follows the Conventional Commits specification for release automation.

The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification.

For details on the error please inspect the "PR Title Check" action.

@github-actions
Contributor

PR Review

P1: Memory retention in the modern prewarm path

In the modern (compressed) prewarm branch, each CompressedPostingList is constructed via:

postings.value(idx).as_binary::<i64>().clone()

This is a shallow clone — all cached entries retain Arc references to the full batch data buffer. Unlike the legacy path which calls shrink_to_fit() to compact each slice into independent memory, the modern path skips this. The comment says this is because "DeepSizeOf uses logical byte size," but the concern isn't accounting — it's actual memory behavior.

When cache eviction later removes individual posting lists, no memory is freed until every entry from that batch is evicted. If this index has many tokens, eviction becomes effectively a no-op for memory reclamation. Consider either:

  • Calling shrink_to_fit on the sliced binary array (or equivalent deep copy), or
  • Documenting that this is intentional because compressed posting lists are expected to always be fully cached
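The retention pattern described above can be reproduced with plain `Arc`s, independent of Arrow. A minimal sketch (the `SharedSlice` type here is hypothetical, standing in for a sliced `CompressedPostingList`; it is not Lance's actual struct):

```rust
use std::sync::Arc;

/// Simplified model of the retention issue: each cached entry holds an Arc
/// to the whole decoded batch buffer plus an offset/length, so no bytes are
/// freed until every entry sourced from that batch has been evicted.
struct SharedSlice {
    buffer: Arc<Vec<u8>>, // shared with every other slice from the batch
    offset: usize,
    len: usize,
}

impl SharedSlice {
    /// Deep-copy just this slice's bytes so it no longer pins the parent
    /// buffer (the analogue of `shrink_to_fit` on the sliced array).
    fn compacted(&self) -> Vec<u8> {
        self.buffer[self.offset..self.offset + self.len].to_vec()
    }
}

fn main() {
    let batch = Arc::new(vec![0u8; 1024]); // one decoded batch buffer
    let slices: Vec<SharedSlice> = (0..4)
        .map(|i| SharedSlice {
            buffer: Arc::clone(&batch),
            offset: i * 256,
            len: 256,
        })
        .collect();

    // 5 strong refs: `batch` itself plus the 4 cached slices. Evicting one
    // slice frees nothing; the full 1 KiB lives until all refs drop.
    println!("strong refs while cached: {}", Arc::strong_count(&batch));

    // Compacting breaks the dependency: each entry owns only its 256 bytes,
    // so eviction of any single entry actually releases memory.
    let owned: Vec<Vec<u8>> = slices.iter().map(|s| s.compacted()).collect();
    drop(slices);
    println!("strong refs after compaction: {}", Arc::strong_count(&batch));
    println!("owned entry size: {} bytes", owned[0].len());
}
```

With Arrow arrays the mechanism is the same, except the shared unit is the array's data `Buffer` rather than a `Vec<u8>`.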

P1: No tests

The PR description says "Not run (not requested)," but repository standards require tests for all changes. The prewarm parallelization and the new modern layout code path both introduce new behavior that should be covered. At minimum, a test verifying that the modern prewarm path correctly populates the cache would catch regressions.

Minor observations

  • The list_i32_used_bytes helper assumes the child array element size is always size_of::<i32>(). This is correct for current usage (positions), but the function name doesn't communicate this constraint. Consider renaming or adding a comment.
  • PR description mentions "worker error handling" and "monitoring loop for slow posting lists" but the diff only contains prewarm and DeepSizeOf changes. Worth updating the description to match actual changes.
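On the first observation, the element-size constraint could be surfaced in the helper's name and documentation. A hypothetical sketch (the real helper in the diff likely takes an array reference rather than a raw count; this only illustrates the naming suggestion):

```rust
use std::mem::size_of;

/// Bytes used by the value child of a list array whose child type is i32
/// (e.g. term positions). The `_i32` suffix deliberately encodes this
/// constraint; the function is NOT valid for other child element widths.
fn list_i32_used_bytes(value_count: usize) -> usize {
    value_count * size_of::<i32>()
}
```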

@BubbleCal changed the title from "Fix inverted index worker error handling and prewarm streaming" to "perf: parallelize FTS prewarming and cache size accounting" on Mar 10, 2026
@codecov

codecov bot commented Mar 10, 2026

Codecov Report

❌ Patch coverage is 83.33333% with 22 lines in your changes missing coverage. Please review.

File with missing lines: rust/lance-index/src/scalar/inverted/index.rs — Patch 83.33%, 16 Missing and 6 partials ⚠️


@BubbleCal changed the title from "perf: parallelize FTS prewarming and cache size accounting" to "perf: parallelize FTS prewarming" on Mar 10, 2026
@BubbleCal BubbleCal merged commit 0681a08 into main Mar 10, 2026
29 checks passed
@BubbleCal BubbleCal deleted the yang/fts-prewarm branch March 10, 2026 14:11
