fix(index): reject user-specified index_uuid for segmented builds#7053

Draft
zhangyue19921010 wants to merge 2 commits into
lance-format:mainfrom
zhangyue19921010:user-specific-index-uuid
@zhangyue19921010
Contributor

closes: #7032

What & why

The segmented-index architecture generates segment UUIDs internally — each segment is written to its own indices/<uuid>/. Accepting a user-specified index_uuid there is unsafe: it is used verbatim as the directory, so concurrent distributed workers (or a collision with an already-committed index) can silently overwrite each other's segment files. Once an index type is built on this architecture, a user-specified UUID has no valid use.
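To make the overwrite hazard concrete, here is a minimal sketch that models the object store as an in-memory map. `FakeStore`, `segment_path`, and the file name used with them are hypothetical stand-ins, not Lance APIs; only the `indices/<uuid>/` layout comes from the description above.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory stand-in for an object store: path -> file contents.
#[derive(Default)]
pub struct FakeStore {
    files: HashMap<String, Vec<u8>>,
}

impl FakeStore {
    /// Writes a file and returns the previous contents if the path already existed.
    /// A `Some(..)` result means an earlier write was silently replaced.
    pub fn put(&mut self, path: String, data: Vec<u8>) -> Option<Vec<u8>> {
        self.files.insert(path, data)
    }
}

/// Builds a path under `indices/<uuid>/`, using the UUID verbatim as the directory.
/// The file name is a parameter because the real file layout is not shown in this PR.
pub fn segment_path(index_uuid: &str, file_name: &str) -> String {
    format!("indices/{}/{}", index_uuid, file_name)
}
```

In this model, two workers that share one UUID resolve to the same path, so the second `put` returns the first worker's bytes (they were replaced). Workers with distinct, internally generated UUIDs never touch each other's directories.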

This adds a single guard in the Rust core (CreateIndexBuilder::execute_uncommitted, the one chokepoint all language bindings funnel through) that rejects index_uuid for any index built via the segmented-index path. It is keyed on the existing uses_segment_commit_path predicate, so it automatically covers each index type as it migrates to the architecture — no per-type wiring.
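The shape of such a guard can be sketched as follows. This is an illustration under stated assumptions, not the code in this PR: `BuildRequest` and `check_index_uuid` are hypothetical names, and the body of `uses_segment_commit_path` is a placeholder for the real predicate, whose implementation is not shown here.

```rust
/// Hypothetical, simplified view of what the builder knows at the chokepoint.
pub struct BuildRequest<'a> {
    /// Index type label, e.g. "vector" or "BTREE".
    pub index_type: &'a str,
    /// User-specified UUID, if the caller passed one.
    pub index_uuid: Option<&'a str>,
}

/// Placeholder for the real predicate. Assumption: only vector indexes are
/// currently on the segmented-index architecture, as the scope section states.
pub fn uses_segment_commit_path(index_type: &str) -> bool {
    index_type.eq_ignore_ascii_case("vector")
}

/// Central guard: rejects a user-specified UUID only for segmented builds.
/// Index types still on the legacy flow pass through unchanged.
pub fn check_index_uuid(req: &BuildRequest<'_>) -> Result<(), String> {
    if req.index_uuid.is_some() && uses_segment_commit_path(req.index_type) {
        return Err(format!(
            "index_uuid is no longer accepted for {} indexes; segment UUIDs are \
             generated by Lance and returned in the index metadata.",
            req.index_type
        ));
    }
    Ok(())
}
```

Because the check depends only on the predicate, migrating another index type changes the predicate's answer and nothing else; the guard itself needs no per-type branch.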

Behavior changes

  1. Indexes built on the segmented-index architecture no longer accept index_uuid.
    They reject it with:

    index_uuid is no longer accepted for <type> indexes; segment UUIDs are generated by Lance and returned in the index metadata.

    This applies to both full and fragment-scoped/distributed builds. Callers omit it and read the generated UUID from the returned index metadata.

  2. (Bug fix, side effect) The segment-commit path now actually runs for these indexes.
    It was previously dead code — the predicate compared the user-facing index name against the internal __lance_vector_index kind id and was always false — so builds fell through to the legacy single-index commit. Output is equivalent and all existing tests pass.
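The dead-code bug in item 2 can be illustrated with a small sketch. `IndexDescriptor` and both function names are hypothetical; only the `__lance_vector_index` kind id comes from the description, and the "fixed" variant is one plausible correction, not necessarily the change made in this PR.

```rust
/// Internal kind id quoted in the description.
pub const VECTOR_INDEX_KIND: &str = "__lance_vector_index";

/// Hypothetical, simplified descriptor: a user-chosen name plus an internal kind id.
pub struct IndexDescriptor<'a> {
    pub name: &'a str,
    pub kind: &'a str,
}

/// Shape of the bug: compares the user-facing name against the kind id.
/// For any ordinary user-chosen name this is false, so the segment-commit
/// branch guarded by it is never taken.
pub fn uses_segment_commit_path_buggy(d: &IndexDescriptor<'_>) -> bool {
    d.name == VECTOR_INDEX_KIND
}

/// Assumed correction: compare the internal kind id instead of the name.
pub fn uses_segment_commit_path_fixed(d: &IndexDescriptor<'_>) -> bool {
    d.kind == VECTOR_INDEX_KIND
}
```

With a descriptor such as `name: "my_embeddings_idx"` and `kind: VECTOR_INDEX_KIND`, the buggy variant returns false and the fixed variant returns true, which matches the description of builds falling through to the legacy single-index commit.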

Scope / impact

  • Affected now: index types currently on the segmented-index architecture (vector). Passing index_uuid via Python (create_index / create_index_uncommitted), Java (IndexOptions.withIndexUUID), or Rust (CreateIndexBuilder::index_uuid) now errors.
  • Not yet affected: index types still on the legacy distributed merge_index_metadata flow (scalar: BTREE / BITMAP / INVERTED / …) keep accepting a shared index_uuid. They are picked up by the same guard automatically once migrated — no API change needed then.
  • No public API signature removed: index_uuid parameters stay; the rule is enforced centrally in the core.
  • Benchmark: distributed_vector_build was rebuilt around the segmented merge API (merge_existing_index_segments); it previously drove the now-removed shared-UUID merge_index_metadata path.


@claude claude Bot left a comment


Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

@github-actions github-actions Bot added A-python Python bindings bug Something isn't working labels Jun 2, 2026
@codecov

codecov Bot commented Jun 2, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.


@zhangyue19921010 zhangyue19921010 marked this pull request as draft June 2, 2026 10:04

Development

Successfully merging this pull request may close these issues.

Manual specification of UUID by users is no longer supported under the Segmented index distributed index building framework.
