
perf: parallelize FTS prewarming #6144

Merged
BubbleCal merged 1 commit into main from yang/fts-prewarm on Mar 10, 2026

Conversation

@BubbleCal
Contributor

@BubbleCal BubbleCal commented Mar 10, 2026

20%+ faster for a 2 GB index; the gain could be larger for bigger indexes

@github-actions
Contributor

ACTION NEEDED
Lance follows the Conventional Commits specification for release automation.

The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification.

For details on the error please inspect the "PR Title Check" action.

@github-actions
Contributor

PR Review

P1: Memory retention in the modern prewarm path

In the modern (compressed) prewarm branch, each CompressedPostingList is constructed via:

postings.value(idx).as_binary::<i64>().clone()

This is a shallow clone — all cached entries retain Arc references to the full batch data buffer. Unlike the legacy path which calls shrink_to_fit() to compact each slice into independent memory, the modern path skips this. The comment says this is because "DeepSizeOf uses logical byte size," but the concern isn't accounting — it's actual memory behavior.

When cache eviction later removes individual posting lists, no memory is freed until every entry from that batch is evicted. If this index has many tokens, eviction becomes effectively a no-op for memory reclamation. Consider either:

  • Calling shrink_to_fit on the sliced binary array (or equivalent deep copy), or
  • Documenting that this is intentional because compressed posting lists are expected to always be fully cached
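The retention pattern described above can be reproduced with plain `Arc`s, independent of Arrow. A minimal sketch (the `SharedSlice` type here is hypothetical, standing in for a sliced `CompressedPostingList`; it is not Lance's actual struct):

```rust
use std::sync::Arc;

/// Simplified model of the retention issue: each cached entry holds an Arc
/// to the whole decoded batch buffer plus an offset/length, so no bytes are
/// freed until every entry sourced from that batch has been evicted.
struct SharedSlice {
    buffer: Arc<Vec<u8>>, // shared with every other slice from the batch
    offset: usize,
    len: usize,
}

impl SharedSlice {
    /// Deep-copy just this slice's bytes so it no longer pins the parent
    /// buffer (the analogue of `shrink_to_fit` on the sliced array).
    fn compacted(&self) -> Vec<u8> {
        self.buffer[self.offset..self.offset + self.len].to_vec()
    }
}

fn main() {
    let batch = Arc::new(vec![0u8; 1024]); // one decoded batch buffer
    let slices: Vec<SharedSlice> = (0..4)
        .map(|i| SharedSlice {
            buffer: Arc::clone(&batch),
            offset: i * 256,
            len: 256,
        })
        .collect();

    // 5 strong refs: `batch` itself plus the 4 cached slices. Evicting one
    // slice frees nothing; the full 1 KiB lives until all refs drop.
    println!("strong refs while cached: {}", Arc::strong_count(&batch));

    // Compacting breaks the dependency: each entry owns only its 256 bytes,
    // so eviction of any single entry actually releases memory.
    let owned: Vec<Vec<u8>> = slices.iter().map(|s| s.compacted()).collect();
    drop(slices);
    println!("strong refs after compaction: {}", Arc::strong_count(&batch));
    println!("owned entry size: {} bytes", owned[0].len());
}
```

With Arrow arrays the mechanism is the same, except the shared unit is the array's data `Buffer` rather than a `Vec<u8>`.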

P1: No tests

The PR description says "Not run (not requested)," but repository standards require tests for all changes. The prewarm parallelization and the new modern layout code path both introduce new behavior that should be covered. At minimum, a test verifying that the modern prewarm path correctly populates the cache would catch regressions.

Minor observations

  • The list_i32_used_bytes helper assumes the child array element size is always size_of::<i32>(). This is correct for current usage (positions), but the function name doesn't communicate this constraint. Consider renaming or adding a comment.
  • PR description mentions "worker error handling" and "monitoring loop for slow posting lists" but the diff only contains prewarm and DeepSizeOf changes. Worth updating the description to match actual changes.
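On the first observation, the element-size constraint could be surfaced in the helper's name and documentation. A hypothetical sketch (the real helper in the diff likely takes an array reference rather than a raw count; this only illustrates the naming suggestion):

```rust
use std::mem::size_of;

/// Bytes used by the value child of a list array whose child type is i32
/// (e.g. term positions). The `_i32` suffix deliberately encodes this
/// constraint; the function is NOT valid for other child element widths.
fn list_i32_used_bytes(value_count: usize) -> usize {
    value_count * size_of::<i32>()
}
```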

@BubbleCal changed the title from "Fix inverted index worker error handling and prewarm streaming" to "perf: parallelize FTS prewarming and cache size accounting" on Mar 10, 2026
@codecov

codecov bot commented Mar 10, 2026

Codecov Report

❌ Patch coverage is 83.33333% with 22 lines in your changes missing coverage. Please review.

File with missing lines: rust/lance-index/src/scalar/inverted/index.rs — Patch 83.33%, 16 Missing and 6 partials ⚠️


@BubbleCal changed the title from "perf: parallelize FTS prewarming and cache size accounting" to "perf: parallelize FTS prewarming" on Mar 10, 2026
@BubbleCal BubbleCal merged commit 0681a08 into main Mar 10, 2026
29 checks passed
@BubbleCal BubbleCal deleted the yang/fts-prewarm branch March 10, 2026 14:11
