Fix flaky domain & data product rename by proceeding on search version conflicts #28580
…n conflicts

Domains.spec.ts intermittently fails when renaming a domain or data product. On rename, the FQN change is propagated to every related search document via updateByQuery. With concurrent writes touching the same assets, Elasticsearch/OpenSearch raise version_conflict_engine_exception, which aborts the whole updateByQuery and leaves the rename half-applied in the index (stale FQNs), so the assertions on the renamed entity flake.

Make the rename-propagation updateByQuery calls resilient by setting conflicts=proceed, mirroring the handling already on main (#25751):

Domain rename:
- updateDomainFqnByPrefix
- updateAssetDomainFqnByPrefix (also scope the query to matching domain documents instead of match_all)

Data product rename:
- updateDataProductReferences
- updateAssetDomainsForDataProduct
- updateAssetDomainsByIds

Applied to both ElasticSearchEntityManager and OpenSearchEntityManager.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
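To see why `conflicts=proceed` changes the outcome, here is a toy in-memory model of update-by-query in plain Java. It is a sketch under stated assumptions, not the Elasticsearch or OpenSearch client API: `ConflictsDemo`, `Doc`, `Result` and `updateByQuery` are hypothetical names, and a concurrent writer is simulated by bumping a document's version between the snapshot read and the write. As on a real cluster, abort stops the batch without rolling back documents already written, while proceed counts the conflict and skips only that document.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Toy, in-memory model of update-by-query conflict handling. Hypothetical names throughout;
 * this is NOT the Elasticsearch/OpenSearch client API, only an illustration of abort vs proceed.
 */
public class ConflictsDemo {

    /** Stand-in for the request's conflicts setting (abort is the server default). */
    enum Conflicts { ABORT, PROCEED }

    /** A search document: its current version and the domain FQN it carries. */
    static final class Doc {
        long version;
        String domainFqn;

        Doc(long version, String domainFqn) {
            this.version = version;
            this.domainFqn = domainFqn;
        }
    }

    /** Loosely mirrors the counters in an update-by-query response. */
    record Result(int updated, int versionConflicts, boolean aborted) {}

    /**
     * Rewrites oldFqn to newFqn on every matching doc. Matching docs are snapshotted first;
     * ids in concurrentlyModified then get their version bumped, simulating another writer
     * landing between the snapshot and the write (the cause of version_conflict_engine_exception).
     */
    static Result updateByQuery(Map<String, Doc> index, String oldFqn, String newFqn,
                                Set<String> concurrentlyModified, Conflicts conflicts) {
        Map<String, Long> snapshot = new LinkedHashMap<>();
        index.forEach((id, doc) -> {
            if (doc.domainFqn.equals(oldFqn)) {
                snapshot.put(id, doc.version);
            }
        });
        for (String id : concurrentlyModified) {
            Doc doc = index.get(id);
            if (doc != null) {
                doc.version++;
            }
        }
        int updated = 0;
        int versionConflicts = 0;
        for (Map.Entry<String, Long> entry : snapshot.entrySet()) {
            Doc doc = index.get(entry.getKey());
            if (doc.version != entry.getValue()) {
                versionConflicts++;
                if (conflicts == Conflicts.ABORT) {
                    // Stop here; docs already written stay written (no rollback).
                    return new Result(updated, versionConflicts, true);
                }
                continue; // PROCEED: count the conflict and skip only this doc
            }
            doc.domainFqn = newFqn;
            doc.version++;
            updated++;
        }
        return new Result(updated, versionConflicts, false);
    }

    /** Four assets in domain "Sales", all at version 1, in a fixed iteration order. */
    static Map<String, Doc> newIndex() {
        Map<String, Doc> index = new LinkedHashMap<>();
        for (String id : List.of("a1", "a2", "a3", "a4")) {
            index.put(id, new Doc(1, "Sales"));
        }
        return index;
    }

    public static void main(String[] args) {
        for (Conflicts mode : Conflicts.values()) {
            Map<String, Doc> index = newIndex();
            Result result = updateByQuery(index, "Sales", "Revenue", Set.of("a2"), mode);
            System.out.println(mode + " -> " + result);
        }
    }
}
```

With `a2` modified concurrently, ABORT writes only `a1` and leaves `a2` to `a4` on the old FQN (the half-applied rename), while PROCEED updates `a1`, `a3` and `a4` and reports one version conflict. Note that under proceed the conflicting document itself is still skipped, so it keeps whatever the concurrent writer left there.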
Hi there 👋 Thanks for your contribution! The OpenMetadata team will review the PR shortly! Once it has been labeled as Let us know if you need any help!
The Python checkstyle failed. Please run You can install the pre-commit hooks with
🔴 Playwright Results: 1 failure(s), 26 flaky
✅ 3463 passed · ❌ 1 failed · 🟡 26 flaky · ⏭️ 97 skipped
Genuine Failures (failed on all attempts) ❌
Code Review ✅ Approved · 1 resolved / 1 finding

Increases domain and data product rename stability by letting search index updates proceed past version conflicts. Note that the refined prefix query may inadvertently match sibling domains with shared FQN prefixes.

✅ 1 resolved
✅ Performance: Prefix query may over-match sibling domains with shared prefix
Reviewed by Gitar
Describe your changes:
Domains.spec.ts flakes in the nightly AUT runs (e.g. PostgreSQL 1.9.1 ➡ 1.12.10) when a domain or data product is renamed.

Root cause. On rename, the new FQN is propagated to every related search document with updateByQuery. When concurrent writes touch the same assets, Elasticsearch/OpenSearch raise version_conflict_engine_exception, which aborts the entire updateByQuery and leaves the rename half-applied in the index (stale FQNs). The UI then reads stale data, and the spec assertions on the renamed entity, its subdomains, and its assets fail intermittently.

Fix. Set conflicts=proceed on the rename-propagation updateByQuery calls, so a per-document version conflict is counted (in the response's version_conflicts) and that document is skipped instead of the whole batch being aborted. This mirrors the handling already present on main (#25751); this PR backports just that conflict handling to the 1.12 line.

Methods updated in both ElasticSearchEntityManager and OpenSearchEntityManager:
- Domain rename: updateDomainFqnByPrefix, updateAssetDomainFqnByPrefix
- Data product rename: updateDataProductReferences, updateAssetDomainsForDataProduct, updateAssetDomainsByIds

updateAssetDomainFqnByPrefix additionally scopes its query to documents matching the domain FQN prefix (buildDomainFqnPrefixQuery) instead of match_all, so it only rewrites affected assets: fewer documents touched, fewer conflicts.

Surgical: search-layer only, +34/−6 across the two manager classes, no behavior change beyond conflict resilience.
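The sibling over-match risk raised in review can be seen with plain string matching. This is a minimal sketch assuming `.` separates FQN segments; `FqnPrefixDemo`, `naivePrefixMatch`, `domainOrDescendant` and `renamePrefix` are hypothetical helpers, not the PR's `buildDomainFqnPrefixQuery`, which expresses its scoping as a search query rather than Java string code.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helpers illustrating prefix scoping for a domain rename. Not OpenMetadata code;
 * assumes '.' separates FQN segments.
 */
public class FqnPrefixDemo {

    /** Naive check: a raw string prefix, which also matches siblings such as "SalesOps". */
    static boolean naivePrefixMatch(String fqn, String domainFqn) {
        return fqn.startsWith(domainFqn);
    }

    /** Boundary-aware check: the domain itself, or anything nested under "domainFqn.". */
    static boolean domainOrDescendant(String fqn, String domainFqn) {
        return fqn.equals(domainFqn) || fqn.startsWith(domainFqn + ".");
    }

    /** Swaps the renamed prefix for the new one, keeping any nested suffix intact. */
    static String renamePrefix(String fqn, String oldFqn, String newFqn) {
        return domainOrDescendant(fqn, oldFqn) ? newFqn + fqn.substring(oldFqn.length()) : fqn;
    }

    /** Applies the rename to a list of asset domain FQNs, returning a new list. */
    static List<String> renameAll(List<String> fqns, String oldFqn, String newFqn) {
        List<String> out = new ArrayList<>();
        for (String fqn : fqns) {
            out.add(renamePrefix(fqn, oldFqn, newFqn));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> assetDomains = List.of("Sales", "Sales.EMEA", "SalesOps", "Finance");
        for (String fqn : assetDomains) {
            System.out.printf("%-10s naive=%-5b scoped=%b%n", fqn,
                    naivePrefixMatch(fqn, "Sales"), domainOrDescendant(fqn, "Sales"));
        }
        // Prints [Revenue, Revenue.EMEA, SalesOps, Finance]
        System.out.println(renameAll(assetDomains, "Sales", "Revenue"));
    }
}
```

A raw prefix on `Sales` also selects `SalesOps`, so a rename would both touch an unrelated sibling (more documents, more chances of conflict) and corrupt its FQN; anchoring the match at a segment boundary avoids both.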
Type of change:
Checklist:
mvn spotless:check on openmetadata-service (BUILD SUCCESS) and verified the Conflicts enum / conflicts(...) builder resolve against the bundled ES (9.2.4) / OS (3.5.0) clients.