Skip to content

fix: do no work when optimize_indices is called but no new data has arrived#6711

Merged
westonpace merged 4 commits into
lance-format:mainfrom
westonpace:fix-noop-optimize
May 12, 2026
Merged

fix: do no work when optimize_indices is called but no new data has arrived#6711
westonpace merged 4 commits into
lance-format:mainfrom
westonpace:fix-noop-optimize

Conversation

@westonpace
Copy link
Copy Markdown
Member

Summary

  • Make optimize_indices() a no-op when no fragment is unindexed and the caller hasn't asked for retrain or num_indices_to_merge >= 1. Without this, a steady-state dataset (everything already indexed) had every call rewrite identical scalar-index files and trigger a vector single-segment auto-rebalance, producing a new manifest, transaction file, and per-index files on every call.
  • Adds a Python regression test that builds one of every index type (BTREE, BITMAP, LABEL_LIST, INVERTED, NGRAM, ZONEMAP, BLOOMFILTER, IVF_PQ), appends a fragment, runs optimize_indices() twice, and asserts the second call writes zero new files under the dataset directory.

Why

Two independent paths were both writing files when there was nothing to do:

  1. Scalar indicesmerge_indices in rust/lance/src/index/append.rs always called index.update(...) even when unindexed.is_empty(), producing a new UUID directory with identical contents.
  2. Vector indices — the single-segment auto-rebalance from feat: optimize one segmented vector segment per run #6402 fires when num_segments > 1 && num_indices_to_merge.is_none() && !retrain && unindexed.is_empty(). After a prior optimize_indices() had folded a delta into its own segment, every subsequent call re-rebalanced one segment.

The user-visible symptom was that calling optimize_indices() in a loop kept bumping the dataset version and creating index files indefinitely, even when the dataset hadn't changed.

Approach

The fix lives in Dataset::optimize_indices in rust/lance/src/index.rs. Before calling merge_indices for each grouped delta set, we now skip the group when:

  • the caller did not pass retrain = true, and
  • num_indices_to_merge is None or Some(0) (no explicit delta consolidation requested), and
  • every fragment in the dataset is already covered by the union of the group's existing fragment_bitmaps.

The bitmap union is computed from the metadata we already loaded (no extra IO). Pre-0.8 indices without a fragment_bitmap are treated conservatively — they still go through the merge path so they pick up the new format on first optimize.

If any index group has work to do, the rest of the function is unchanged; the new check is purely a continue. When every group is skipped, the existing if new_indices.is_empty() { return Ok(()); } short-circuit prevents any commit, so no manifest or transaction file is written.

Test plan

  • New Python test test_optimize_indices_second_call_is_noop passes
  • python/tests/test_optimize.py, python/tests/test_scalar_index.py, python/tests/test_vector_index.py — 225 passed, 10 skipped (pre-existing), 1 deselected (unrelated pre-existing failure in test_create_index_progress_callback_error_before_completion_propagates)
  • cargo test -p lance --lib index:: — 389 passed, 1 ignored (the auto-rebalance test noted above)
  • cargo clippy --lib --tests -- -D warnings — clean
  • CI green

Notes for reviewers

  • The fragment-bitmap-only check intentionally skips unioning bitmaps from system indices (__lance_frag_reuse, __lance_mem_wal); those are filtered out before this loop by the existing is_system_index check.

westonpace and others added 2 commits May 7, 2026 21:30
Repeated optimize_indices() calls on a steady-state dataset were rewriting
identical files on every invocation. For each scalar index, the merge path
unconditionally called update() and produced a new index file even with an
empty unindexed input; for vector indices, the default-options auto-rebalance
introduced in lance-format#6402 fired whenever multiple segments existed, which is the
common state immediately after a previous optimize.

Skip an index group at the top of the per-index loop when:
  - no fragments are unindexed for that group, AND
  - the caller did not request retrain, AND
  - num_indices_to_merge is None or 0 (no explicit delta merge).

Pre-0.8 indices without a fragment_bitmap are conservatively treated as still
needing optimize. The auto-rebalance path is unreachable through default
options after this change; its regression test is marked #[ignore] until the
feature is reintroduced behind an explicit opt-in.

Adds a Python regression test that builds one index of each type, appends a
fragment, runs optimize_indices() twice, and asserts the second call adds no
files to the dataset directory.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… check

The previous commit's top-level skip in optimize_indices made the
single-segment auto-rebalance from lance-format#6402 unreachable, since rebalance
fires precisely when there are no unindexed fragments. That path is the
real reason the regression test in lance-format#6402 exists, so the test was being
ignored rather than maintained.

Restructure so the top-level skip applies only to scalar indices (which
have no rebalance concept) and add a rebalance-aware no-op inside the
vector branch of merge_indices_with_unindexed_frags: when there is no
new data and no retrain/merge requested, call
select_segment_for_single_rebalance and bail when it returns no
candidate. This preserves both behaviors:

- A truly steady-state dataset is a no-op (the user's bug).
- A skewed multi-segment dataset still self-heals one segment per call
  on the default path.

Removes the #[ignore] from test_segmented_optimize_rebalances_only_one_segment
and adjusts the python regression test to use num_partitions=1 so its
small dataset isn't perpetually below MIN_PARTITION_SIZE_PERCENT and
caught by the join-rebalance heuristic.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

@claude claude Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

@github-actions github-actions Bot added bug Something isn't working python labels May 7, 2026
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@codecov
Copy link
Copy Markdown

codecov Bot commented May 7, 2026

Codecov Report

❌ Patch coverage is 95.65217% with 4 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
rust/lance/src/index.rs 85.00% 3 Missing ⚠️
rust/lance/src/index/append.rs 98.61% 0 Missing and 1 partial ⚠️

📢 Thoughts on this report? Let us know!

@westonpace
Copy link
Copy Markdown
Member Author

@claude review once

Comment on lines +195 to +201
if unindexed.is_empty()
&& !options.retrain
&& options.num_indices_to_merge.is_none()
&& select_segment_for_single_rebalance(&ivf_view)?.is_none()
{
return Ok(None);
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 The new vector no-op check at append.rs:197 uses options.num_indices_to_merge.is_none(), but the matching scalar no-op check in index.rs uses is_none_or(|n| n == 0). OptimizeOptions::append() is defined as num_indices_to_merge: Some(0) and is used by real callers (e.g. cleanup.rs and tests via optimize_indices(&OptimizeOptions::append())), so on a steady-state vector index a second append() call still falls through to optimize_vector_indices, writes a new UUID directory + manifest, and reproduces exactly the regression this PR is trying to fix. Suggest using is_none_or(|n| n == 0) on append.rs:197 (and the matching use_single_segment_rebalance gate two lines below) to align the two checks with the PR description ("None or Some(0)").

Extended reasoning...

What the bug is

The PR description states the no-op should fire when num_indices_to_merge is None or Some(0). The two checks the PR introduces don't agree on what "no merge requested" means:

  • Scalar path (rust/lance/src/index.rs:1260): options.num_indices_to_merge.is_none_or(|n| n == 0) — correctly covers both None and Some(0).
  • Vector path (rust/lance/src/index/append.rs:197): options.num_indices_to_merge.is_none() — only covers None, not Some(0).

The pre-existing use_single_segment_rebalance gate two lines below (append.rs:204) is also is_none(), so for Some(0) neither gate fires.

Why Some(0) is a real, reachable input

OptimizeOptions::append() in rust/lance-index/src/optimize.rs:78-84 is defined as:

pub fn append() -> Self {
    Self {
        num_indices_to_merge: Some(0),
        ...
    }
}

This is a public entry point used by real Rust callers — notably rust/lance/src/dataset/cleanup.rs (auto-cleanup) calls optimize_indices(&OptimizeOptions::append()), and the test suite (rust/lance/src/index/append.rs:569) does the same. The Python regression test added in this PR uses the default kwargs, which build OptimizeOptions::default() (None), so the test happens to take the path that does work and never exercises Some(0).

Step-by-step proof on a steady-state vector index

  1. Create a dataset with an IVF_PQ index covering all fragments, no unindexed fragments.
  2. Call dataset.optimize_indices(&OptimizeOptions::append())options.num_indices_to_merge == Some(0).
  3. In Dataset::optimize_indices (index.rs), the new scalar short-circuit at line 1260 has index_group_is_scalar == false for the vector group, so the continue is not taken. Control reaches merge_indices(..).
  4. merge_indices calls merge_indices_with_unindexed_frags with unindexed == [].
  5. At append.rs:191-200, the new vector no-op check evaluates unindexed.is_empty() && !options.retrain && options.num_indices_to_merge.is_none() && select_segment_for_single_rebalance(&ivf_view)?.is_none(). The third conjunct (is_none()) is false for Some(0), so the no-op short-circuit does not return None.
  6. At append.rs:202-206, use_single_segment_rebalance is gated on options.num_indices_to_merge.is_none() — also false for Some(0). So we fall into the else branch (line 271+).
  7. new_data_stream is None (unindexed.is_empty()). optimize_vector_indices is called.
  8. In rust/lance/src/index/vector/ivf.rs around line 415, let writer = object_store.create(&index_file).await? unconditionally creates a new UUID-named index file.
  9. Back in Dataset::optimize_indices, this populates new_indices with a non-empty IndexMergeResults, so the if new_indices.is_empty() { return Ok(()) } short-circuit does not fire, and the function commits a new manifest + transaction.
  10. Result: every call to optimize_indices(&OptimizeOptions::append()) on a steady-state vector index writes a new UUID directory and bumps the dataset version — the exact symptom the PR aims to fix.

Impact

OptimizeOptions::append() is documented as an entry point and is wired into auto-cleanup, so this isn't a hypothetical input. Any Rust caller (or anyone building OptimizeOptions { num_indices_to_merge: Some(0), .. } directly) on a vector index keeps re-emitting manifests and index files indefinitely. The PR's own description ("None or Some(0)") states the intended invariant — only half of the implementation enforces it.

Suggested fix

Change line 197 and line 204 of rust/lance/src/index/append.rs from:

&& options.num_indices_to_merge.is_none()

to:

&& options.num_indices_to_merge.is_none_or(|n| n == 0)

to match the scalar path and the documented intent. Strengthening the Python regression test to also call optimize_indices(num_indices_to_merge=0) would lock this in.

Copy link
Copy Markdown
Contributor

@wjones127 wjones127 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the claude comment is valid, but not an important edge case.

Thanks for implementing this!

The new vector no-op check used num_indices_to_merge.is_none() while the
scalar gate at the top of optimize_indices uses is_none_or(|n| n == 0).
OptimizeOptions::append() is Some(0) and is used by real callers
(cleanup.rs, internal tests), so on a steady-state vector index a second
append() call still fell through to optimize_vector_indices and wrote a
new UUID directory + manifest — reproducing the exact regression this
PR is meant to fix.

Switch both the no-op check and the use_single_segment_rebalance gate
two lines below to is_none_or(|n| n == 0) so default() and append()
behave the same on a steady state: bail when no segment needs
rebalancing; otherwise enter the single-segment rebalance branch.

Add a Rust regression test in index::append::tests that exercises
OptimizeOptions::append() exactly: build a 1-partition IVF_PQ index,
append a fragment, call optimize_indices(append()) once to fold it in,
then call it again and assert the manifest version is unchanged and no
new index directory was created.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@westonpace westonpace merged commit 374d58f into lance-format:main May 12, 2026
28 checks passed
@andrea-reale
Copy link
Copy Markdown

andrea-reale commented May 12, 2026

Thanks for this fix!

One thing I noted: this "fast path" is gated on num_indices_to_merge being 0 (or None). This means that if as a caller passes Some(arbitarily_large_n) than this goes back to copying all indices even if the scalar index is already optimal with only 1 delta.

Am I reading this wrong? Is this intended behavior?

@westonpace
Copy link
Copy Markdown
Member Author

Am I reading this wrong? Is this intended behavior?

I think you're right, we could do better here. I guess it shouldn't matter at all what the value of num_indices_to_merge is.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working python

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants