fix: branch_identfier unstable for legacy branches#6390
Conversation
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 4095bcaff0
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
0444261 to
581f549
Compare
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 581f54939f
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| pub async fn from_path( | ||
| path: &Path, | ||
| object_store: &ObjectStore, | ||
| branch_name: &str, | ||
| ) -> Result<Self> { |
There was a problem hiding this comment.
Preserve the existing BranchContents::from_path API
Because dataset::refs is public, changing this method from BranchContents::from_path(path, store) to require branch_name breaks downstream Rust callers of the existing API even though this bugfix can keep the old signature as a wrapper and put the branch-aware fallback in a separate internal helper. This violates the repository rule to deprecate rather than break public API signatures and makes a compatibility bugfix semver-breaking for Rust users.
Useful? React with 👍 / 👎.
Deniwn22
left a comment
There was a problem hiding this comment.
Logic looks correct. One suggestion: the early return on line prevents the cleanup function from running — worth adding a finally block.
581f549 to
7b3e91c
Compare
Hi, @Deniwn22, thanks for your attention. This PR has been open for quite a while. There might be some context I don't get. Would you mind explain more on your thought? |
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d0457adcdd
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Problem
Legacy branches, i.e. branches whose
BranchContentswere written without a persistedbranch_identifier, currently deserialize throughBranchIdentifier::none(). That fallback generates a fresh random UUID on each read, so the same unchanged branch can surface a differentbranch_identifieracross repeated loads.This makes branch identity unstable in both Python and Java for legacy datasets. On the Python side,
branches.list()/branches_ordered()exposebranch_identifierdirectly, so callers that diff, cache, or snapshot branch metadata can observe false changes even when the branch itself has not changed. On the Java side, the same legacy branch can also appear with a different identifier across refreshes, which makes equality-style comparisons unstable as well.Summary
branch_identifiervalues without API shape changesBranchContents::from_pathon in-memory branch metadata and verifies stable repeated reads plus distinct identifiers for different branch names