fix: branch_identfier unstable for legacy branches #6390

Merged: jackye1995 merged 4 commits into main from codex/stable-synthetic-branch-identifier on May 20, 2026

Conversation

@majin1102 (Contributor) commented Apr 2, 2026

Problem

Legacy branches, i.e. branches whose BranchContents were written without a persisted branch_identifier, currently deserialize through BranchIdentifier::none(). That fallback generates a fresh random UUID on each read, so the same unchanged branch can surface a different branch_identifier across repeated loads.

This makes branch identity unstable in both Python and Java for legacy datasets. On the Python side, branches.list() / branches_ordered() expose branch_identifier directly, so callers that diff, cache, or snapshot branch metadata can observe false changes even when the branch itself has not changed. On the Java side, the same legacy branch can also appear with a different identifier across refreshes, which makes equality-style comparisons unstable as well.

Summary

  • stabilize fallback branch identifiers for legacy branch metadata by replacing the missing-identifier sentinel with a deterministic synthetic UUID during branch metadata reads
  • keep the fallback logic localized to Rust branch metadata loading so Python and Java continue returning stable branch_identifier values without API shape changes
  • add a lightweight Rust regression test that exercises BranchContents::from_path on in-memory branch metadata and verifies stable repeated reads plus distinct identifiers for different branch names
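
To make the fallback idea concrete, here is a minimal sketch of deriving a repeatable identifier from the branch name when none was persisted. Everything in it is an assumption for illustration: the PR text does not say how the synthetic UUID is computed, the function names are hypothetical, and the FNV-1a style hash is used only because it needs no external crates. The output is UUID-shaped but does not set UUID version or variant bits.

```rust
/// FNV-1a style 128-bit hash over raw bytes (illustrative choice, not
/// necessarily what the PR uses).
fn fnv1a_128(bytes: &[u8]) -> u128 {
    const OFFSET_BASIS: u128 = 0x6c62272e07bb014262b821756295c58d;
    const PRIME: u128 = 0x0000000001000000000000000000013b;
    let mut hash = OFFSET_BASIS;
    for &b in bytes {
        hash ^= b as u128;
        hash = hash.wrapping_mul(PRIME);
    }
    hash
}

/// Formats 128 bits in the 8-4-4-4-12 hex layout used by UUID strings.
fn format_uuid_like(value: u128) -> String {
    let hex = format!("{:032x}", value);
    format!(
        "{}-{}-{}-{}-{}",
        &hex[0..8],
        &hex[8..12],
        &hex[12..16],
        &hex[16..20],
        &hex[20..32]
    )
}

/// Hypothetical helper: returns the persisted identifier when present,
/// otherwise a synthetic one that depends only on the branch name, so
/// repeated reads of the same legacy branch agree.
pub fn resolve_branch_identifier(persisted: Option<&str>, branch_name: &str) -> String {
    match persisted {
        Some(id) => id.to_string(),
        None => format_uuid_like(fnv1a_128(branch_name.as_bytes())),
    }
}
```

Because the fallback is a pure function of the branch name, two loads of the same legacy branch yield the same string, while different branch names yield different ones, which is the property the regression test described above checks.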

@github-actions (bot) added the bug and A-python labels Apr 2, 2026
@majin1102 changed the title from "fix: stabilize synthetic branch identifier fallback" to "fix: branch_identfier unstable for legacy branches" Apr 2, 2026
@codecov (bot) commented Apr 2, 2026

Codecov Report

❌ Patch coverage is 96.73913% with 3 lines in your changes missing coverage. Please review.

Files with missing lines | Patch % | Lines
rust/lance/src/dataset/refs.rs | 96.73% | 1 Missing and 2 partials ⚠️

@majin1102 marked this pull request as ready for review April 2, 2026 16:59
@majin1102 added the A-java (Java bindings + JNI) label Apr 3, 2026
@majin1102 (Contributor, Author) commented

@codex review

@chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4095bcaff0

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread rust/lance/src/dataset/refs.rs
@jackye1995 self-requested a review May 15, 2026 05:20
@majin1102 force-pushed the codex/stable-synthetic-branch-identifier branch from 0444261 to 581f549 May 15, 2026 05:52
@majin1102 (Contributor, Author) commented

@codex review

@chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 581f54939f

Comment on lines +837 to +841
pub async fn from_path(
    path: &Path,
    object_store: &ObjectStore,
    branch_name: &str,
) -> Result<Self> {

[P2] Preserve the existing BranchContents::from_path API

Because dataset::refs is public, changing this method from BranchContents::from_path(path, store) to require branch_name breaks downstream Rust callers of the existing API. The bugfix could instead keep the old signature as a wrapper and put the branch-aware fallback in a separate internal helper. As written, the change violates the repository rule to deprecate rather than break public API signatures, and it makes a compatibility bugfix semver-breaking for Rust users.
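
The wrapper-plus-helper shape suggested in this review could look roughly like the sketch below. It is a simplified, synchronous illustration and not the code that was merged: FakeStore, from_path_with_branch, load, and the "synthetic-" placeholder derivation are all hypothetical names introduced here, and the real method is async and takes &Path and &ObjectStore.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory stand-in for the object store: maps a path to
/// the identifier persisted there, if any.
pub struct FakeStore {
    pub files: HashMap<String, Option<String>>,
}

#[derive(Debug, PartialEq)]
pub struct BranchContents {
    /// `None` means no identifier could be determined in this sketch.
    pub identifier: Option<String>,
}

impl BranchContents {
    /// Old two-argument signature, kept as a thin wrapper so existing
    /// callers still compile; marked deprecated instead of removed.
    #[deprecated(note = "use `from_path_with_branch` so legacy branches get a stable identifier")]
    pub fn from_path(path: &str, store: &FakeStore) -> Option<Self> {
        Self::load(path, store, None)
    }

    /// New branch-aware entry point (hypothetical name).
    pub fn from_path_with_branch(
        path: &str,
        store: &FakeStore,
        branch_name: &str,
    ) -> Option<Self> {
        Self::load(path, store, Some(branch_name))
    }

    /// Internal helper holding the shared loading logic and the fallback.
    fn load(path: &str, store: &FakeStore, branch_name: Option<&str>) -> Option<Self> {
        // Returns None when the path does not exist in the store.
        let persisted = store.files.get(path)?;
        let identifier = match (persisted, branch_name) {
            // A persisted identifier always wins.
            (Some(id), _) => Some(id.clone()),
            // Placeholder derivation; a real fix would produce a UUID.
            (None, Some(name)) => Some(format!("synthetic-{name}")),
            // Legacy call path without a branch name: nothing to derive from.
            (None, None) => None,
        };
        Some(Self { identifier })
    }
}
```

With this layout, downstream code calling the two-argument form keeps compiling and only sees a deprecation warning, while internal callers that know the branch name go through the new entry point.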

@Deniwn22 left a comment

Logic looks correct. One suggestion: the early return on line prevents the cleanup function from running — worth adding a finally block.

@majin1102 force-pushed the codex/stable-synthetic-branch-identifier branch from 581f549 to 7b3e91c May 16, 2026 04:40
@majin1102 (Contributor, Author) commented

Logic looks correct. One suggestion: the early return on line prevents the cleanup function from running — worth adding a finally block.

Hi, @Deniwn22, thanks for your attention.

This PR has been open for quite a while, so I may be missing some context. Would you mind explaining your thinking in more detail?

@majin1102 (Contributor, Author) commented

@codex review

@chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d0457adcdd

Comment thread rust/lance/src/dataset/refs.rs
@jackye1995 (Contributor) left a comment

looks good to me!

@jackye1995 merged commit 742e6a3 into main May 20, 2026
28 checks passed
@jackye1995 deleted the codex/stable-synthetic-branch-identifier branch May 20, 2026 05:23
Labels

A-java (Java bindings + JNI), A-python (Python bindings), bug (Something isn't working)

3 participants