fix migration: update legacy relatedTerms in glossaryTerm version history after the glossary term relation changes #27770
Pull request overview
Extends the existing 1.13.0 post-data-migration to also remove legacy relatedTerms data from glossary term version snapshots stored in entity_extension, preventing Jackson deserialization failures (500s) when requesting historical term versions.
Changes:
- Add a Postgres migration step to strip `relatedTerms` from `entity_extension` rows with `extension` matching `glossaryTerm.version.%`.
- Add a MySQL migration step to strip `relatedTerms` from the same `entity_extension` version snapshot rows.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| bootstrap/sql/migrations/native/1.13.0/postgres/postDataMigrationSQLScript.sql | Removes relatedTerms from glossary term version snapshot JSON in entity_extension for Postgres. |
| bootstrap/sql/migrations/native/1.13.0/mysql/postDataMigrationSQLScript.sql | Removes relatedTerms from glossary term version snapshot JSON in entity_extension for MySQL. |
Extends PR #26586. That fix cleaned glossary_term_entity but not the version snapshots in entity_extension, so GET /versions/{v} still 500s on any pre-1.13 term whose relatedTerms had legacy shape:

UnrecognizedPropertyException: Unrecognized field "id" (class TermRelation, has only "term" and "relationType")

Predicate matches only legacy snapshots: the first item has a bare `id` (EntityReference) instead of `term` (TermRelation). Skips correctly-shaped snapshots written on 1.13+.

Stripping is safe: relatedTerms is loaded from entity_relationship at read time post-#25886.
🟡 Playwright Results: all passed (11 flaky)
✅ 3963 passed · ❌ 0 failed · 🟡 11 flaky · ⏭️ 86 skipped
🟡 11 flaky test(s) (passed on retry)
How to debug locally
# Download playwright-test-results-<shard> artifact and unzip
npx playwright show-trace path/to/trace.zip # view trace
…stripping

Replace the SQL UPDATE that stripped relatedTerms from entity_extension version snapshots with a Java migration that wraps each legacy EntityReference[] item as TermRelation[] (term + relationType="relatedTo"). Version reads deserialize entity_extension JSON directly without rehydrating from entity_relationship, so a strip would lose history per version. The transform preserves it.

Designed for tables with millions of rows: keyset paginated by PK (id, extension), batched updates, idempotent on re-run.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ation

The previous edit added the comment pointer above the legacy UPDATE entity_extension SET json = JSON_REMOVE(... '$.relatedTerms') block without removing it. On MySQL that SQL would have stripped relatedTerms from version snapshots BEFORE the Java transform runs, defeating the migration and losing related-term history. Postgres was already correct.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Code Review ✅ Approved · 1 resolved / 1 findings
Updates the migration to transform legacy relatedTerms in glossaryTerm version snapshots, resolving the MySQL SQL stripping issue that previously defeated the Java transformation.
✅ 1 resolved
✅ Bug: MySQL SQL still strips relatedTerms, defeating the Java transform
…tory after the glossary term relation changes (#27770)

* fix: strip stale relatedTerms from glossary term version snapshots

  Extends PR #26586. That fix cleaned glossary_term_entity but not the version snapshots in entity_extension, so GET /versions/{v} still 500s on any pre-1.13 term whose relatedTerms had legacy shape:

  UnrecognizedPropertyException: Unrecognized field "id" (class TermRelation, has only "term" and "relationType")

  Predicate matches only legacy snapshots: the first item has a bare `id` (EntityReference) instead of `term` (TermRelation). Skips correctly-shaped snapshots written on 1.13+.

  Stripping is safe: relatedTerms is loaded from entity_relationship at read time post-#25886.

* v1130: transform legacy relatedTerms in version snapshots instead of stripping

  Replace the SQL UPDATE that stripped relatedTerms from entity_extension version snapshots with a Java migration that wraps each legacy EntityReference[] item as TermRelation[] (term + relationType="relatedTo"). Version reads deserialize entity_extension JSON directly without rehydrating from entity_relationship, so a strip would lose history per version. The transform preserves it.

  Designed for tables with millions of rows: keyset paginated by PK (id, extension), batched updates, idempotent on re-run.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(mysql): remove leftover entity_extension strip in v1130 post-migration

  The previous edit added the comment pointer above the legacy UPDATE entity_extension SET json = JSON_REMOVE(... '$.relatedTerms') block without removing it. On MySQL that SQL would have stripped relatedTerms from version snapshots BEFORE the Java transform runs, defeating the migration and losing related-term history. Postgres was already correct.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
(cherry picked from commit 5254855)
Cherrypicked to 1.13 |



Summary
Fixes #27802. Extends PR #26586. That fix cleaned `glossary_term_entity` of legacy `relatedTerms` data but missed the version history in `entity_extension`, so `GET /glossaryTerms/{id}/versions/{v}` still 500s on any pre-1.13 term with non-empty `relatedTerms`, failing with `UnrecognizedPropertyException: Unrecognized field "id"` (class `TermRelation` has only `term` and `relationType`).
What this PR does
Adds a Java migration `migrateGlossaryTermVersionRelatedTermsToTermRelation` (v1130) that transforms each legacy `EntityReference[]` item into the new `TermRelation[]` shape, `{ "term": <ref>, "relationType": "relatedTo" }`, at:
- the `relatedTerms` field of each `entity_extension` version-history snapshot, and
- `changeDescription.fieldsAdded[*].newValue` and `fieldsDeleted[*].oldValue`, so version-history diffs continue to render strikethrough/green for related-term changes.

The earlier draft of this PR did a SQL strip; that has been replaced with the Java migration. The SQL files now contain a comment pointer to the Java migration.
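To make the shape change concrete, here is a minimal Python sketch of the per-snapshot transform. It is not the PR's Java migration: the function names are hypothetical, and it assumes that `changeDescription` values for `relatedTerms` may be stored either as a JSON-encoded string or as a plain list. Items that already carry `term` are left untouched, which is what makes a second pass a no-op.

```python
import json

DEFAULT_RELATION_TYPE = "relatedTo"


def is_legacy_item(item):
    """A legacy EntityReference item has a bare `id` and no `term` wrapper."""
    return isinstance(item, dict) and "id" in item and "term" not in item


def wrap_related_terms(items):
    """Wrap legacy items as {"term": ref, "relationType": "relatedTo"}."""
    if not isinstance(items, list):
        return items
    return [
        {"term": item, "relationType": DEFAULT_RELATION_TYPE}
        if is_legacy_item(item)
        else item
        for item in items
    ]


def wrap_change_value(value):
    """Transform a fieldsAdded/fieldsDeleted value.

    Assumption: the value is either a JSON-encoded string or a list.
    """
    if isinstance(value, str):
        try:
            parsed = json.loads(value)
        except ValueError:
            return value  # not JSON, leave as-is
        if not isinstance(parsed, list):
            return value
        return json.dumps(wrap_related_terms(parsed))
    return wrap_related_terms(value)


def transform_snapshot(snapshot):
    """Return a transformed copy of one version snapshot (input is not mutated)."""
    out = dict(snapshot)
    if "relatedTerms" in out:
        out["relatedTerms"] = wrap_related_terms(out["relatedTerms"])

    change = out.get("changeDescription")
    if isinstance(change, dict):
        change = dict(change)
        for list_key, value_key in (("fieldsAdded", "newValue"),
                                    ("fieldsDeleted", "oldValue")):
            fields = change.get(list_key)
            if not isinstance(fields, list):
                continue
            change[list_key] = [
                {**f, value_key: wrap_change_value(f[value_key])}
                if isinstance(f, dict)
                and f.get("name") == "relatedTerms"
                and value_key in f
                else f
                for f in fields
            ]
        out["changeDescription"] = change
    return out
```

Calling `transform_snapshot` on its own output returns an equal document, so a re-run over already-migrated rows changes nothing.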
Why transform instead of strip
Version reads (`EntityRepository.getVersion`) deserialize the snapshot JSON directly: no `entity_relationship` rehydration, no `setFields` repopulation. A strip would silently lose every related term that ever existed at any historical version. The transform preserves that history exactly.

The live-row strip in `glossary_term_entity` (from #26586) stays as-is; that read path does rehydrate from `entity_relationship`.
Scale
Designed for tables with 10M+ rows:
- `extension_index` + JSON-side legacy-shape predicate (`relatedTerms[0].id` exists in MySQL / `jsonb_exists` in Postgres). Continuous-migration cost on a fully transformed install: a single short-circuiting index scan, no JSON evaluation, no Java parsing.
- Keyset pagination by PK `(id, extension)`: no OFFSET cost.
- Batched `PreparedBatch` updates (1 round-trip per 500 rows).
Test plan
- … `EntityReference[]` shape (script: `scripts/seed_legacy_related_terms_v1130.py`).
- … `pages=3 transformed=1285 skipped=0` with zero failures.
- … `term` + `relationType: "relatedTo"`; `term` sub-objects preserve `id`, `name`, `fullyQualifiedName`, `type`, `description`, `displayName`, `deleted`.
- `GET /glossaryTerms/{id}/versions/{v}` returns 200 with `TermRelation[]` shape across root, child, and grandchild terms.
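The keyset pagination and batched updates described under Scale can be sketched roughly as follows. This is a Python illustration against an in-memory-style SQLite table, not the PR's Java code: the `migrate` function and its helpers are hypothetical, and the legacy-shape check runs in Python here for portability, whereas the PR describes a SQL-side predicate (so in the real migration already-transformed rows would not be fetched at all).

```python
import json
import sqlite3

PAGE_SIZE = 500  # mirrors the "500 rows per round-trip" figure above


def is_legacy(doc):
    """True when the first relatedTerms item has a bare `id` and no `term`."""
    items = doc.get("relatedTerms")
    return (
        isinstance(items, list)
        and bool(items)
        and isinstance(items[0], dict)
        and "id" in items[0]
        and "term" not in items[0]
    )


def wrap(doc):
    """Wrap each legacy item; items that already have `term` are kept."""
    doc["relatedTerms"] = [
        {"term": ref, "relationType": "relatedTo"}
        if isinstance(ref, dict) and "term" not in ref
        else ref
        for ref in doc["relatedTerms"]
    ]
    return doc


def migrate(conn, page_size=PAGE_SIZE):
    """Walk glossaryTerm version rows in primary-key order, one page at a time."""
    last_id, last_ext = "", ""  # keyset cursor: last (id, extension) seen
    pages = transformed = skipped = 0
    while True:
        rows = conn.execute(
            "SELECT id, extension, json FROM entity_extension "
            "WHERE extension LIKE 'glossaryTerm.version.%' "
            "AND (id > ? OR (id = ? AND extension > ?)) "
            "ORDER BY id, extension LIMIT ?",
            (last_id, last_id, last_ext, page_size),
        ).fetchall()
        if not rows:
            break
        pages += 1
        updates = []
        for row_id, ext, raw in rows:
            doc = json.loads(raw)
            if is_legacy(doc):
                updates.append((json.dumps(wrap(doc)), row_id, ext))
            else:
                skipped += 1
        # One batched statement per page instead of one UPDATE per row.
        conn.executemany(
            "UPDATE entity_extension SET json = ? WHERE id = ? AND extension = ?",
            updates,
        )
        conn.commit()
        transformed += len(updates)
        last_id, last_ext = rows[-1][0], rows[-1][1]
    return pages, transformed, skipped
```

Because the cursor resumes strictly after the last `(id, extension)` pair, each page costs the same regardless of how deep into the table it is, and a second run finds no legacy rows to rewrite.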