Fixes #28718: prevent lost-update race in PolicyConditionUpdater#28720
Fixes #28718: prevent lost-update race in PolicyConditionUpdater#28720harshach wants to merge 2 commits into
Conversation
Concurrent tag/glossary renames touching overlapping policy rows could clobber each other's condition rewrites: each changed policy was written with a blind full-row update and no conflict check (classic lost-update). Replace the blind getDao().update(policy) with an optimistic content-based compare-and-swap (new EntityDAO.updateIfMatches) plus a bounded re-read / re-apply retry, so a concurrent writer's change is never lost. Content-CAS is used instead of version-CAS so the automated rewrite keeps its silent semantics (no version bump, no version history). Also paginate the policy scan to drop the silent 10000-policy cap. Validated the CAS SQL on real MySQL 8.3 and Postgres 15; added a concurrent-rename regression test in PolicyResourceIT. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
🟡 Playwright Results — all passed (9 flaky)✅ 4271 passed · ❌ 0 failed · 🟡 9 flaky · ⏭️ 88 skipped
🟡 9 flaky test(s) (passed on retry)
How to debug locally# Download playwright-test-results-<shard> artifact and unzip
npx playwright show-trace path/to/trace.zip # view trace |
The content-CAS retry re-read used a non-locking SELECT. Inside the enclosing rename/delete transaction, MySQL's default REPEATABLE READ serves the transaction's stale snapshot for non-locking reads, so after a CAS conflict the retry kept re-reading the pre-conflict row and the CAS never converged -- the second concurrent rename was dropped (the exact bug this PR fixes). Postgres READ COMMITTED happened to work. Re-read with EntityDAO.findByIdForUpdate (SELECT ... FOR UPDATE) so the retry sees the latest committed row and conflicting writers serialize. Verified on real MySQL 8.3 in REPEATABLE READ transactions: two concurrent renames both survive with the locking read; the second is lost with a plain read (retry exhausts re-reading the stale snapshot). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Code Review ✅ Approved 1 resolved / 1 findingsPrevents lost-update race conditions in PolicyConditionUpdater by implementing a content-based compare-and-swap with bounded retries. The concurrent update logic now correctly handles stale re-reads, ensuring all policy condition rewrites persist. ✅ 1 resolved✅ Bug: CAS retry re-read is stale under MySQL REPEATABLE READ, fix defeated
OptionsDisplay: compact → Showing less information. Comment with these commands to change:
Was this helpful? React with 👍 / 👎 | Gitar |
|



Describe your changes:
Fixes #28718
I reworked
PolicyConditionUpdater.updateAllPolicyConditionsbecause concurrent tag/glossary renames touching overlapping policy rows could clobber each other's condition rewrites — each changed policy was written with a blind full-row update and no conflict check (classic lost-update). The blindgetDao().update(policy)is replaced with an optimistic content-based compare-and-swap (newEntityDAO.updateIfMatches, which writes only if the stored JSON still equals the snapshot last read) plus a bounded retry that re-reads with aFOR UPDATElock and re-applies the rewriter. I chose content-CAS over the existing version-CAS so the automated rewrite keeps its silent semantics (no version bump, no version-history entry). I also paginated the policy scan to drop the silent 10000-policy cap.Type of change:
High-level design:
EntityDAO.updateIfMatches(...)addsAND json = CAST(:expectedJson AS JSON)(MySQL) /AND json = (:expectedJson :: jsonb)(Postgres) to the row update, returning rows-updated (0 = a concurrent writer won → retry).rewriteSinglePolicyre-reads the row viafindByIdForUpdate(SELECT json ... FOR UPDATE), re-applies the rewriter, and CAS-writes; on conflict it re-reads and retries up to 5×. The locking read is essential: this runs inside the enclosing rename/delete@Transaction, so under MySQL's default REPEATABLE READ a non-locking re-read returns the transaction's stale snapshot and the retry could never converge (the second rename would be silently dropped) —FOR UPDATEreturns the latest committed row and serializes conflicting writers. Version-CAS (updateWithVersion) was rejected because it only detects conflicts when each write bumps the version, which would break the deliberate silent-rewrite behavior and desync version history.Tests:
Use cases covered
Unit tests
PolicyConditionUpdaterTest(41 tests) still green; the pure rewrite helpers are unchanged.Backend integration tests
PolicyResourceIT.test_concurrentTagRenamesBothUpdatePolicyCondition(barrier-aligned concurrent renames, repeated, asserts both rewrites persist). Runs against MySQL in CI.Ingestion integration tests
Playwright (UI) tests
Manual testing performed
Reproduced the race on real MySQL 8.3 with two
pymysqlconnections in REPEATABLE READ transactions (the same isolation as production), each holding a snapshot of the policy row:matchAnyTag(A2, B).FOR UPDATEre-read, B reads the latest committed row, re-applies, and succeeds in 1 attempt — both survive:matchAnyTag(A2, B2).Same harness, only the re-read strategy differs, confirming both the bug and the fix.
UI screen recording / screenshots:
Not applicable.
Checklist:
Fixes <issue-number>: <short explanation>Fixes #<issue-number>above.test_concurrentTagRenamesBothUpdatePolicyCondition, see Lost-update race in PolicyConditionUpdater on concurrent tag/glossary renames #28718).