Skip to content

fix(catalog): exclude replica tables from stale-base abort#622

Open
mkuchenbecker wants to merge 19 commits into
linkedin:mainfrom
mkuchenbecker:mkuchenb/incident-12185-exclude-replicas
Open

fix(catalog): exclude replica tables from stale-base abort#622
mkuchenbecker wants to merge 19 commits into
linkedin:mainfrom
mkuchenbecker:mkuchenb/incident-12185-exclude-replicas

Conversation

@mkuchenbecker
Copy link
Copy Markdown
Collaborator

Summary

Exempts replica tables from the doCommit stale-base CAS
(abortIfWriterBaseDivergedFromCatalog) added for incident-12185.

The CAS compares the writer's declared base (COMMIT_KEY) against the
catalog's current persisted base and aborts on divergence. That contract
does not hold for replica tables: the replication job replays the
primary's authoritative snapshot list wholesale rather than committing
incrementally against a loaded base, so its COMMIT_KEY does not encode
a "base I read" that the catalog head can be compared against. Enforcing
the CAS there would spuriously abort legitimate replication commits.

Changes

  • Early return in abortIfWriterBaseDivergedFromCatalog when the table
    is a replica (openhouse.isTableReplicated=true).
  • Extract a broad isReplicatedTable(properties) helper (flag only, no
    version check) and refactor isReplicatedTableCreate to reuse it. The
    exemption uses the broad check intentionally so it covers replica
    updates, not just creates.

Testing

testDoCommitDoesNotAbortStaleBaseForReplicaTable reuses the abort
test's COMMIT_KEY/base divergence setup but sets the replicated flag
and a non-initial table version (the case isReplicatedTableCreate
would miss); asserts doCommit does not throw and
houseTableRepository.save is invoked.

./gradlew :iceberg:openhouse:internalcatalog:test --tests "*OpenHouseInternalTableOperationsTest"

Both the new test and the original
testDoCommitMustAbortStaleBaseRebaseToPreventSnapshotLoss pass.

Stacking

Stacked on top of #620 (the stale-base abort + regression tests).
Because that branch is not yet on main, this PR's diff includes #620's
commits until it merges — review only the replica-exemption commit here.

🤖 Generated with Claude Code

mkuchenbecker and others added 15 commits May 29, 2026 10:39
…cident-12185)

Single-JVM, no-threads, no-docker reproduction of the BaseTransaction.applyUpdates
silent-rebase lost-update. A stale writer stages its L1 snapshot view + COMMIT_KEY=L1
in a held transaction; a racing writer advances the catalog to L2; committing the
stale transaction rebases the stale payload onto L2 so doCommit would subtract the
racing snapshot. Asserts the racing snapshot survives.

Passes on main (with the linkedin#612 stale-base abort CAS); reproduces the silent drop on
the pre-fix tree.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Factor the staged-conflict concurrency into a shared core and add a second test
where the table already holds committed data history (two snapshots) before the
race begins. Both assert the racing snapshot survives.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Both contending writers now append data: the stale writer commits its own data
snapshot at base L1 (payload omitting the racing snapshot), modeling a genuine
concurrent-insert race rather than a property-only update. Also assert the stale
writer's conflicting insert is rejected wholesale, not merged onto the racing commit.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Covers the race on a freshly created table with no committed data snapshots: the
stale writer's first insert races a concurrent first insert. Confirms the racing
snapshot survives even when the base has no prior history.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Models the production maintenance shape: an expire-snapshots commit (keep-subset
payload computed at base L1) racing a concurrent data insert. The stale expire's
subset omits the racing snapshot, so the subtractive merge would expire the fresh
data commit. Asserts the data commit survives and the stale expire is rejected
wholesale.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
count=0 (plain create) already reproduces the insert race; the single-snapshot
case added no coverage. Keep: zero-prior-data insert, two-snapshot prior-data
insert, and the expire-vs-insert race (which requires base snapshots).
Co-authored-by: Mike Kuchenbecker <mike@michaelkuchenbecker.com>
Validated by running pre-fix (d4fc9fe): both tests FAIL (racing snapshot silently
dropped); post-fix (linkedin#612): both PASS.

Three layered guards must all be bypassed for the lost update:
  1. HTS optimistic-version CAS  -> bypassed by applyUpdates' silent rebase (held txn)
  2. failIfRetryUpdate per-JVM cache -> bypassed cross-replica (cleared in the test)
  3. Iceberg snapshot sequence-number validation -> only bypassed by a SUBTRACTIVE
     stale commit (adds no new snapshot). A stale writer adding its own snapshot on a
     multi-snapshot base is rejected by guard #3, so it cannot lose data -- which is
     why incident-12185 was an expire/optimizer drop, not an insert race.

Tests (both subtractive): expire racing a data insert (prior history); two concurrent
first-inserts on a fresh table.
…scribe behavior only

Third case: two writers insert into a table that already holds data; the catalog
rejects the stale commit and the concurrent insert remains. Comments rewritten to
describe observable behavior without external references.
Inline commitThroughRepository, snapshotJson, idOf, and refs at their call sites.
Assert the table holds exactly the prior snapshots plus the concurrently committed
snapshot, regardless of whether the stale commit throws.
Prepare each commit's snapshot list explicitly at the call site so the base + new
snapshot is visible in the flow.
Use assertThrows(CommitFailedException) for the rejection, then assert the table
holds exactly the prior snapshots plus the concurrent commit.
…ate case

Assert the stale commit is rejected (any exception) rather than a specific type, so
the contract is rejection + final table state. Add a case where the stale commit
changes metadata without adding a snapshot.
Set an actual table property (plus the base snapshot view, no new snapshot) so the
case is a genuine property-changing commit rather than internal bookkeeping only.
The stale-base CAS added in `abortIfWriterBaseDivergedFromCatalog`
(linkedin#612, incident-12185) compares the writer's declared base (COMMIT_KEY)
against the catalog's current persisted base and aborts on divergence.
That contract does not hold for replica tables: the replication job
replays the primary's authoritative snapshot list wholesale rather than
committing incrementally against a loaded base, so its COMMIT_KEY does
not encode a "base I read" that the catalog head can be compared
against. Enforcing the CAS there would spuriously abort legitimate
replication commits.

Add an early return in `abortIfWriterBaseDivergedFromCatalog` when the
table is a replica (`openhouse.isTableReplicated=true`). Extract a broad
`isReplicatedTable(properties)` helper (flag only, no version check) and
refactor `isReplicatedTableCreate` to reuse it. The exemption uses the
broad check intentionally so it covers replica updates, not just
creates.

Test: `testDoCommitDoesNotAbortStaleBaseForReplicaTable` reuses the
abort test's COMMIT_KEY/base divergence setup but sets the replicated
flag and a non-initial version (the case isReplicatedTableCreate would
miss); asserts doCommit does not throw and houseTableRepository.save is
invoked.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Per review on linkedin#622:
- Inline the replica exemption directly in
  abortIfWriterBaseDivergedFromCatalog instead of adding an
  isReplicatedTable helper; revert isReplicatedTableCreate to its
  original form (no refactor).
- Drop the replica paragraph from the method javadoc (out of place); the
  early-return's inline comment already explains the exemption.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@mkuchenbecker mkuchenbecker requested review from cbb330 and teamurko May 29, 2026 22:54
@mkuchenbecker mkuchenbecker marked this pull request as ready for review May 29, 2026 22:54
mkuchenbecker and others added 2 commits May 29, 2026 16:24
Describe behaviour, not the ticket: rename the temp-dir prefix from an
incident identifier to one that names what the test exercises
(replica / stale base).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…replicas' into mkuchenb/incident-12185-exclude-replicas
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant