[Just for sharing] Abort commit when writer's declared base diverges from current#611
Draft
dushyantk1509 wants to merge 1 commit into
Draft
Conversation
Catalog commits arriving via the snapshots endpoint smuggle the writer's intended state in opaque properties (snapshotsJsonToBePut, commitKey). When two concurrent commits race, Iceberg's BaseTransaction.applyUpdates silently refresh-and-reapplies the property set against the post-race base, after which OpenHouse's subtractive merge cannot distinguish a writer-intended snapshot removal from a race-acquired snapshot, and the loser's commit is dropped on the floor (BaseTransaction:466 "Failed to load metadata for a committed snapshot, skipping clean-up" WARN). doCommit now compares the writer's declared base (COMMIT_KEY) against the actual base loaded into the table operations. If they diverge, throw CommitFailedException instead of merging; the writer can refresh and retry against the current state. INITIAL_VERSION (table-create) and post-create paths where COMMIT_KEY is absent are unaffected. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
| * through Iceberg's commit-retry path to the application. | ||
| */ | ||
| @Test | ||
| void testDoCommitAbortsOnStaleClaimedBase() throws IOException { |
Collaborator
There was a problem hiding this comment.
This fails without the change?
mkuchenbecker
added a commit
to mkuchenbecker/openhouse
that referenced
this pull request
May 28, 2026
PR linkedin#611's existing testDoCommitAbortsOnStaleClaimedBase exploits a null base.metadataFileLocation() to make the comparison trip, which bypasses the URI-normalization branch of isSameMetadataPath. In production, both the writer's COMMIT_KEY and base.metadataFileLocation() are non-null concrete paths; the URI.create(...).getPath() comparison is what actually matters. Adds two tests next to my existing repro: - abortFiresWhenBothPathsAreNonNullAndDifferent: writes a real metadata.json via TableMetadataParser, reads it back to get a non-null metadataFileLocation, then asserts CommitFailedException when COMMIT_KEY points at a different real path. Verified: pre-fix: FAILS (silent snapshot drop — bug fires) post-fix: PASSES (CommitFailedException thrown — fix fires) This is the production-realistic regression guard. - abortDoesNotFireWhenPathsDifferOnlyInUriScheme: same setup, but writer's COMMIT_KEY differs from base.metadataFileLocation() only in URI scheme (file:// prefix vs scheme-less). Asserts that doCommit does NOT throw. Non-regression check — the isSameMetadataPath URI normalization must not falsely abort legitimate commits with scheme variation. The original snapshotALandsThenIsSilentlyDroppedByStaleBaseSecondCommit test is intentionally kept passing UNCHANGED with the fix applied — it documents that the fix is narrow (gated on COMMIT_KEY presence). Any future code path that calls doCommit without setting COMMIT_KEY will re-introduce this bug class. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
mkuchenbecker
added a commit
to mkuchenbecker/openhouse
that referenced
this pull request
May 28, 2026
PR linkedin#614's SparkConcurrentInsertFunctionalTest passes 4/4 with this change. Root cause identified via end-to-end instrumentation: - Two racing INSERTs reach houseTableRepository.save() with the same cached `tableVersion` (HTS-level CAS input). - Both saves succeed at the JPA layer because HouseTable had no @Version field, so Spring Data JPA's em.merge() did not emit a versioned UPDATE. - Result: last writer overwrites HTS pointer, loser's metadata.json is orphaned, loser's data file is unreachable. 10 inserts report success but only 9 rows visible. Adding @Version makes em.merge produce UPDATE house_table SET ..., version = ? WHERE id = ? AND version = ? which the H2 backend's MVCC enforces. The second writer's UPDATE matches 0 rows -> ObjectOptimisticLockingFailureException -> wrapped as 409 -> Iceberg client retries with fresh base -> data lands. Validated: Before: test fails (rows=9, expected=10) - reliable repro After: test passes 4 consecutive runs This complements PR linkedin#611's catalog-level abortIfWriterBaseDivergedFromCatalog check (the BaseTransaction.applyUpdates silent-rebase variant of the bug class). Together they close two distinct races in the same bug class: - Silent-rebase variant: covered by the doCommit-level CAS - HTS-save variant: covered by this @Version field Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
mkuchenbecker
added a commit
to mkuchenbecker/openhouse
that referenced
this pull request
May 28, 2026
…remove dead @Version code PR linkedin#614's test reproduces a race against the H2 test fixture (HouseTablesH2Repository, @primary in tests). The fixture short-circuits the production HTS service: no UserTableVersionMapper, no UserTableRow @Version. The race it catches doesn't exist on the prod MySQL+HTS path. Removing. Restore testDoCommitAbortsOnStaleClaimedBase from PR linkedin#611 (production-realistic shape, writes a real metadata.json on disk so the URI-normalized path comparison runs). Remove the commented-out @Version block on HouseTable; that path was the wrong layer.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Catalog commits arriving via the snapshots endpoint smuggle the writer's intended state in opaque properties (snapshotsJsonToBePut, commitKey). When two concurrent commits race, Iceberg's BaseTransaction.applyUpdates silently refresh-and-reapplies the property set against the post-race base, after which OpenHouse's subtractive merge cannot distinguish a writer-intended snapshot removal from a race-acquired snapshot, and the loser's commit is dropped on the floor (BaseTransaction:466 "Failed to load metadata for a committed snapshot, skipping clean-up" WARN).
doCommit now compares the writer's declared base (COMMIT_KEY) against the actual base loaded into the table operations. If they diverge, throw CommitFailedException instead of merging; the writer can refresh and retry against the current state. INITIAL_VERSION (table-create) and post-create paths where COMMIT_KEY is absent are unaffected.
Summary
Issue] Briefly discuss the summary of the changes made in this
pull request in 2-3 lines.
Changes
For all the boxes checked, please include additional details of the changes made in this pull request.
Testing Done
For all the boxes checked, include a detailed description of the testing done for the changes made in this pull request.
Additional Information
For all the boxes checked, include additional details of the changes made in this pull request.