feat: support IVF partitions multi-split by BubbleCal · Pull Request #6423 · lance-format/lance

BubbleCal · 2026-04-07T10:39:02Z

Feature

What is the new feature?

This PR allows a single optimize_indices call on the v3 IVF incremental optimize path to split multiple oversized IVF partitions in one pass.

Why do we need this feature?

Previously, optimize could split at most one oversized partition per run. After large appends, several partitions can exceed the split threshold at the same time, which forced repeated optimize cycles to bring the index back into a healthy partition layout.

How does it work?

check_partition_adjustment now collects all split candidates from the current snapshot and keeps the existing single-partition join fallback.
The multi-split path preserves existing partition ids and appends one new partition per split partition.
When any split happens, optimize continues to merge all existing delta indices in the same round, preserving the existing merge semantics.
Candidate rows from overlapping reassign partitions are resolved globally so the same row is moved at most once, choosing the best destination by distance.
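The id layout and the global move resolution described above can be sketched as follows. This is a minimal illustration, not the code from this PR: `CandidateMove`, `new_partition_ids`, and `resolve_moves` are hypothetical names, and the sketch assumes that a smaller distance means a better destination.

```rust
use std::collections::HashMap;

/// A proposed move of one row to a destination partition (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct CandidateMove {
    pub row_id: u64,
    pub dest_partition: usize,
    pub distance: f32,
}

/// Existing partition ids stay untouched; each split appends one new id
/// at `num_partitions + split_order`.
pub fn new_partition_ids(num_partitions: usize, num_splits: usize) -> Vec<usize> {
    (0..num_splits)
        .map(|split_order| num_partitions + split_order)
        .collect()
}

/// Resolve candidates from all overlapping split requests globally:
/// keep only the closest destination per row id, so each row moves at most once.
pub fn resolve_moves(
    candidates: impl IntoIterator<Item = CandidateMove>,
) -> HashMap<u64, CandidateMove> {
    let mut best: HashMap<u64, CandidateMove> = HashMap::new();
    for cand in candidates {
        best.entry(cand.row_id)
            .and_modify(|cur| {
                // Overwrite the stored move only when the new one is strictly closer.
                if cand.distance < cur.distance {
                    *cur = cand;
                }
            })
            .or_insert(cand);
    }
    best
}
```

Keying the map by row id means the per-request candidate lists never need to be merged or deduplicated afterwards.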

Performance Improvement

What is the performance issue or bottleneck?

The initial multi-split implementation removed the functional limitation, but overlapping reassign partitions still had avoidable overhead:

split planning was done sequentially
split plans retained full raw vector payloads longer than necessary
a reused candidate partition recomputed its baseline distance to the original centroid for every overlapping split request

How does this PR improve performance?

split plans are now built with bounded parallelism across compute CPUs
split plans no longer retain raw partition vectors after producing the original-partition assign ops
each reused candidate partition is loaded once and computes its baseline distance once, then reuses that result across overlapping split requests
best candidate moves are updated in-place per row id instead of materializing intermediate move vectors per request
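The "load once, compute the baseline once" idea can be illustrated with a small memoizing cache. This is only a sketch under assumptions: `BaselineCache` and its fields are hypothetical, partitions are modelled as in-memory `Vec<Vec<f32>>`, and squared L2 stands in for whatever distance the index actually uses.

```rust
use std::collections::HashMap;

/// Squared L2 distance between two vectors of equal length.
fn l2_sq(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Memoizes per-partition baseline distances (each row to its original centroid).
pub struct BaselineCache<'a> {
    partitions: &'a [Vec<Vec<f32>>], // partitions[p] holds the vectors of partition p
    centroids: &'a [Vec<f32>],       // centroids[p] is the original centroid of p
    cache: HashMap<usize, Vec<f32>>,
    pub loads: usize,                // counts how often a partition was actually loaded
}

impl<'a> BaselineCache<'a> {
    pub fn new(partitions: &'a [Vec<Vec<f32>>], centroids: &'a [Vec<f32>]) -> Self {
        Self { partitions, centroids, cache: HashMap::new(), loads: 0 }
    }

    /// Returns the baseline distances for `part_idx`, computing them only on first use.
    pub fn baseline(&mut self, part_idx: usize) -> &[f32] {
        if !self.cache.contains_key(&part_idx) {
            self.loads += 1;
            let centroid = &self.centroids[part_idx];
            let dists: Vec<f32> = self.partitions[part_idx]
                .iter()
                .map(|v| l2_sq(v, centroid))
                .collect();
            self.cache.insert(part_idx, dists);
        }
        &self.cache[&part_idx]
    }
}
```

Every overlapping split request that touches the same candidate partition then calls `baseline` and reuses the stored distances instead of recomputing them.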

Testing

cargo test -p lance compute_reassign_candidate_moves_vectors_to_new_centroids
cargo test -p lance test_partition_split_on_append_multivec
cargo test -p lance test_split_multiple_partitions_in_one_optimize
cargo test -p lance test_join_partition_on_delete_multivec

github-actions · 2026-04-07T10:39:17Z

ACTION NEEDED
Lance follows the Conventional Commits specification for release automation.

The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification.

For details on the error please inspect the "PR Title Check" action.

codecov · 2026-04-07T11:21:06Z

Codecov Report

❌ Patch coverage is 90.73634% with 39 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
rust/lance/src/index/vector/builder.rs	87.62%	22 Missing and 15 partials ⚠️
rust/lance/src/index/vector/ivf/v2.rs	98.36%	2 Missing ⚠️

📢 Thoughts on this report? Let us know!

Xuanwo · 2026-04-08T09:12:28Z

+        T::Native: Dot + L2 + Normalize,
+        PrimitiveArray<T>: From<Vec<T::Native>>,
+    {
+        let Some((row_ids, vectors)) = self.load_partition_raw_vectors(part_idx).await? else {


load_partition_raw_vectors flattens multivectors, so one row id can map to several vectors. Could this cause multivector data to go missing from our index?
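To make the concern concrete: once multivectors are flattened, the same row id appears once per vector, so any step that keeps a single entry per row id would silently drop the rest. The sketch below regroups flattened vectors by row id. It is purely illustrative, `group_by_row_id` is a hypothetical helper, and it is not a description of how the later fix commit addresses this.

```rust
use std::collections::BTreeMap;

/// Regroup flattened (row_id, vector) pairs so every vector of a multivector row
/// stays attached to its row id instead of being collapsed to a single entry.
pub fn group_by_row_id(
    row_ids: &[u64],
    vectors: &[Vec<f32>],
) -> BTreeMap<u64, Vec<Vec<f32>>> {
    assert_eq!(row_ids.len(), vectors.len(), "one row id per flattened vector");
    let mut grouped = BTreeMap::new();
    for (row_id, vector) in row_ids.iter().zip(vectors) {
        grouped
            .entry(*row_id)
            .or_insert_with(Vec::new)
            .push(vector.clone());
    }
    grouped
}
```

With row ids `[1, 1, 2]`, row 1 keeps both of its vectors, whereas a plain map from row id to a single vector would retain only one of them.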

Xuanwo · 2026-04-08T09:17:50Z

        new_centroids.extend(centroids.iter().map(|vec| vec.unwrap()));
+        let split_plans = stream::iter(split_partitions.iter().copied().enumerate())
+            .map(|(split_order, part_idx)| async move {
+                let centroid2_part_idx = ivf.num_partitions() + split_order;


The centroid2_part_idx is calculated using ivf.num_partitions(). Could it be that some partitions have been filtered, causing the centroid2_part_idx in the plan to exceed the actual number? If so, the subsequent assign_ops[*target_idx] operation might panic.
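One defensive option for the indexing concern, sketched under assumptions: validate the computed index against the actual length of the assign-ops table and use `get_mut` rather than direct indexing, so an out-of-range target becomes an error instead of a panic. `new_partition_index` and `push_assign_op` are hypothetical helpers, and modelling `assign_ops` as `Vec<Vec<u64>>` is an assumption about its shape.

```rust
/// Compute the id of the partition appended by the `split_order`-th split and
/// check that it fits inside a table with `assign_ops_len` slots.
pub fn new_partition_index(
    num_partitions: usize,
    split_order: usize,
    assign_ops_len: usize,
) -> Option<usize> {
    let idx = num_partitions.checked_add(split_order)?;
    (idx < assign_ops_len).then_some(idx)
}

/// Record a row for `target_idx`, returning an error rather than panicking
/// when the index is out of range.
pub fn push_assign_op(
    assign_ops: &mut [Vec<u64>],
    target_idx: usize,
    row_id: u64,
) -> Result<(), String> {
    let len = assign_ops.len();
    match assign_ops.get_mut(target_idx) {
        Some(ops) => {
            ops.push(row_id);
            Ok(())
        }
        None => Err(format!(
            "target partition {target_idx} out of range (have {len} partitions)"
        )),
    }
}
```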

Xuanwo

Thank you for working on this!

perf(index): reduce overlap work in ivf multi-split

90764bf

BubbleCal changed the title ~~feat reduce overlap work in ivf multi-split~~ feat: support IVF partitions multi-split Apr 7, 2026

github-actions Bot added the enhancement (New feature or request) label Apr 7, 2026

fix(test): remove redundant row count cast

a957dfa

Xuanwo reviewed Apr 8, 2026

BubbleCal added 2 commits April 8, 2026 17:24

Merge remote-tracking branch 'origin/main' into yang/split-multiple-partitions

e0b0762

fix(index): preserve multivectors in multi-split

5574dd4

Xuanwo approved these changes Apr 8, 2026

BubbleCal merged commit 5310f36 into main Apr 8, 2026
29 checks passed

BubbleCal deleted the yang/split-multiple-partitions branch April 8, 2026 11:51

BubbleCal merged 4 commits into main from yang/split-multiple-partitions
