fix(index): reject user-specified index_uuid for segmented builds #7053
Draft
zhangyue19921010 wants to merge 2 commits into
Codecov Report: ✅ All modified and coverable lines are covered by tests.
closes: #7032
What & why
The segmented-index architecture generates segment UUIDs internally: each segment is written to its own `indices/<uuid>/` directory. Accepting a user-specified `index_uuid` there is unsafe. It is used verbatim as the directory name, so concurrent distributed workers (or a collision with an already-committed index) can silently overwrite each other's segment files. Once an index type is built on this architecture, a user-specified UUID has no valid use.

This adds a single guard in the Rust core (`CreateIndexBuilder::execute_uncommitted`, the one chokepoint all language bindings funnel through) that rejects `index_uuid` for any index built via the segmented-index path. It is keyed on the existing `uses_segment_commit_path` predicate, so it automatically covers each index type as it migrates to the architecture, with no per-type wiring.

Behavior changes
Indexes built on the segmented-index architecture no longer accept `index_uuid`. They reject it with an error. This applies to both full and fragment-scoped/distributed builds. Callers should omit it and read the generated UUID from the returned index metadata.
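The guard described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `IndexKind`, `BuildRequest`, `BuildError`, and `check_index_uuid` are hypothetical stand-ins, and the assumption that vector indexes are segmented while BTREE is still on the legacy flow is taken from the scope notes in this description. Only the predicate name `uses_segment_commit_path` comes from the PR text.

```rust
// Hypothetical stand-ins for illustration; these are not Lance's real types.
#[derive(Debug, Clone, Copy, PartialEq)]
enum IndexKind {
    Vector, // assumed to be built on the segmented-index path
    BTree,  // assumed to still use the legacy shared-UUID flow
}

#[derive(Debug, PartialEq)]
enum BuildError {
    // The real error message is not reproduced here.
    UserUuidNotAllowed,
}

struct BuildRequest {
    kind: IndexKind,
    // User-specified UUID, if the caller passed one.
    index_uuid: Option<String>,
}

/// Stand-in for the `uses_segment_commit_path` predicate named in the description.
fn uses_segment_commit_path(kind: IndexKind) -> bool {
    matches!(kind, IndexKind::Vector)
}

/// Central guard: reject a user-specified UUID whenever the build is segmented.
/// Because it is keyed on the predicate, a newly migrated index type is covered
/// as soon as the predicate returns true for it.
fn check_index_uuid(req: &BuildRequest) -> Result<(), BuildError> {
    if uses_segment_commit_path(req.kind) && req.index_uuid.is_some() {
        return Err(BuildError::UserUuidNotAllowed);
    }
    Ok(())
}
```

In this sketch, a segmented build with `index_uuid: None` passes, the same build with a UUID is rejected, and a legacy-flow build keeps accepting a shared UUID.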
(Bug fix, side effect) The segment-commit path now actually runs for these indexes. It was previously dead code: the predicate compared the user-facing index name against the internal `__lance_vector_index` kind id and was always false, so builds fell through to the legacy single-index commit. Output is equivalent and all existing tests pass.

Scope / impact
- For indexes on the segmented-index architecture, passing `index_uuid` via Python (`create_index` / `create_index_uncommitted`), Java (`IndexOptions.withIndexUUID`), or Rust (`CreateIndexBuilder::index_uuid`) now errors.
- Index types still on the `merge_index_metadata` flow (scalar: BTREE / BITMAP / INVERTED / …) keep accepting a shared `index_uuid`. They are picked up by the same guard automatically once migrated, so no API change is needed then.
- The `index_uuid` parameters in the bindings stay; the rule is enforced centrally in the core.
- `distributed_vector_build` was rebuilt around the segmented merge API (`merge_existing_index_segments`); it previously drove the now-removed shared-UUID `merge_index_metadata` path.
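The dead-code bug mentioned under "Behavior changes" has roughly the following shape. This is a hedged sketch, not the actual diff: the constant name and both function names are hypothetical, the "fixed" variant is an assumption about the intent, and only the `__lance_vector_index` string comes from the description.

```rust
// Hypothetical constant name; only the string value appears in the description.
const VECTOR_INDEX_KIND_ID: &str = "__lance_vector_index";

/// Buggy shape: compares the user-facing index *name* with the internal kind id.
/// An ordinary user-chosen name (for example "my_vector_idx") never equals the
/// kind id, so the segment-commit branch behind this check never runs.
fn uses_segment_commit_path_buggy(index_name: &str) -> bool {
    index_name == VECTOR_INDEX_KIND_ID
}

/// Fixed shape (an assumption about the intent): compare the index's kind id,
/// not its name, against the internal constant.
fn uses_segment_commit_path_fixed(index_kind_id: &str) -> bool {
    index_kind_id == VECTOR_INDEX_KIND_ID
}
```

With the buggy comparison every build falls through to the legacy single-index commit, which matches the observation that the segmented path was previously unreachable.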