fix migration: update legacy relatedTerms in glossaryTerm version history after the glossary term relation changes #27770

Merged
sonika-shah merged 3 commits into main from fix/strip-related-terms-entity-extension on Apr 29, 2026

@sonika-shah sonika-shah commented Apr 27, 2026

Summary

Fixes #27802. Extends PR #26586. That fix cleaned legacy relatedTerms data out of glossary_term_entity but missed the version history in entity_extension, so GET /glossaryTerms/{id}/versions/{v} still returns a 500 for any pre-1.13 term with non-empty relatedTerms:

UnrecognizedPropertyException: Unrecognized field "id" (class TermRelation)

What this PR does

Adds a Java migration migrateGlossaryTermVersionRelatedTermsToTermRelation (v1130) that transforms each legacy EntityReference[] item into the new TermRelation[] shape — { "term": <ref>, "relationType": "relatedTo" } — at:

  1. the top-level relatedTerms field of each version snapshot stored in entity_extension, and
  2. the JSON value strings inside changeDescription.fieldsAdded[*].newValue and fieldsDeleted[*].oldValue, so version-history diffs continue to render strikethrough/green for related-term changes.
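
The wrapping step for the top-level relatedTerms array can be illustrated with a minimal sketch. It is not the code from this PR: it models each JSON object as a plain `Map` rather than using the project's JSON utilities, and the class name `RelatedTermsTransformSketch` and the helpers `isTermRelation` and `toTermRelations` are hypothetical names chosen for illustration. The shape check (an item that has `term` and no bare `id`) mirrors the legacy-shape predicate described in this PR.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class RelatedTermsTransformSketch {

  // Hypothetical helper: true when the item already has the TermRelation shape.
  static boolean isTermRelation(Map<String, Object> item) {
    return item.containsKey("term") && !item.containsKey("id");
  }

  // Wraps each legacy EntityReference as { "term": <ref>, "relationType": "relatedTo" }.
  // Items that already look like a TermRelation pass through unchanged,
  // so applying the method twice gives the same result as applying it once.
  static List<Map<String, Object>> toTermRelations(List<Map<String, Object>> relatedTerms) {
    List<Map<String, Object>> out = new ArrayList<>(relatedTerms.size());
    for (Map<String, Object> item : relatedTerms) {
      if (isTermRelation(item)) {
        out.add(item);
        continue;
      }
      Map<String, Object> wrapped = new LinkedHashMap<>();
      wrapped.put("term", new LinkedHashMap<>(item)); // keep every field of the original reference
      wrapped.put("relationType", "relatedTo");
      out.add(wrapped);
    }
    return out;
  }

  public static void main(String[] args) {
    // Illustrative values only; not taken from a real install.
    Map<String, Object> legacy = new LinkedHashMap<>();
    legacy.put("id", "hypothetical-uuid");
    legacy.put("type", "glossaryTerm");
    legacy.put("fullyQualifiedName", "Glossary.Parent.Child");

    List<Map<String, Object>> once = toTermRelations(List.of(legacy));
    List<Map<String, Object>> twice = toTermRelations(once);
    System.out.println(once);
    System.out.println(once.equals(twice)); // prints true: the second pass is a no-op
  }
}
```

The same wrapping would have to be applied to the arrays serialized inside changeDescription newValue/oldValue strings; that requires parsing the embedded JSON string first and is left out of this sketch.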

The earlier draft of this PR did a SQL strip; that has been replaced with the Java migration. The SQL files now contain a comment pointer to the Java migration.

Why transform instead of strip

Version reads (EntityRepository.getVersion) deserialize the snapshot JSON directly — no entity_relationship rehydration, no setFields repopulation. A strip would silently lose every related-term that ever existed at any historical version. The transform preserves that history exactly.

The live-row strip in glossary_term_entity (from #26586) stays as-is; that read path does rehydrate from entity_relationship.

Scale

Designed for tables with 10M+ rows:

  • Pre-filters in SQL using extension_index plus a JSON-side legacy-shape predicate (relatedTerms[0].id exists on MySQL / jsonb_exists on Postgres). Continuous-migration cost on a fully transformed install: a single short-circuiting index scan, no JSON evaluation, no Java parsing.
  • Keyset pagination on PK (id, extension) — no OFFSET cost.
  • PreparedBatch updates (1 round-trip per 500 rows).
  • Per-row try/catch isolates malformed snapshots; cursor advances even on caught errors so a poison-pill row can't loop forever.
  • Idempotent: re-runs on already-transformed rows do shape-detection-then-skip with zero DB writes.
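
The pagination and error-isolation points above can be sketched in memory. This is an assumption-laden illustration, not the migration itself: a `TreeMap` stands in for the entity_extension table, `tailMap` stands in for a `WHERE (id, extension) > (:id, :ext) ORDER BY id, extension LIMIT :n` query, and `putAll` stands in for one batched update per page. The names `KeysetPaginationSketch`, `Key`, `transform`, and `migrate` are hypothetical, and the SQL pre-filter is not modeled.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class KeysetPaginationSketch {

  // Composite primary key (id, extension), ordered the way a PK index would be.
  record Key(String id, String extension) implements Comparable<Key> {
    private static final Comparator<Key> ORDER =
        Comparator.comparing(Key::id).thenComparing(Key::extension);

    @Override
    public int compareTo(Key other) {
      return ORDER.compare(this, other);
    }
  }

  // Hypothetical stand-in for the per-row transform; throws on a malformed snapshot.
  static String transform(String json) {
    if (json.startsWith("!")) {
      throw new IllegalArgumentException("malformed snapshot");
    }
    return json.toUpperCase();
  }

  // Walks the table in pages, resuming strictly after the last key seen.
  // Returns {pages, transformed, failed}.
  static int[] migrate(TreeMap<Key, String> table, int pageSize) {
    Key cursor = null;
    int pages = 0, transformed = 0, failed = 0;
    while (true) {
      Map<Key, String> tail = cursor == null ? table : table.tailMap(cursor, false);
      // Copy the page out so the table can be updated after reading it.
      List<Map.Entry<Key, String>> page = tail.entrySet().stream()
          .limit(pageSize)
          .map(e -> Map.entry(e.getKey(), e.getValue()))
          .toList();
      if (page.isEmpty()) {
        break;
      }
      pages++;
      Map<Key, String> batch = new TreeMap<>();
      for (Map.Entry<Key, String> row : page) {
        cursor = row.getKey(); // advance first, so a failing row is never fetched again
        try {
          batch.put(row.getKey(), transform(row.getValue()));
          transformed++;
        } catch (RuntimeException e) {
          failed++; // isolate the bad row and keep going
        }
      }
      table.putAll(batch); // one write per page, standing in for a batched UPDATE
    }
    return new int[] {pages, transformed, failed};
  }

  public static void main(String[] args) {
    TreeMap<Key, String> table = new TreeMap<>();
    table.put(new Key("a", "glossaryTerm.version.0.1"), "x");
    table.put(new Key("a", "glossaryTerm.version.0.2"), "!bad");
    table.put(new Key("b", "glossaryTerm.version.0.1"), "y");
    int[] result = migrate(table, 2);
    System.out.printf("pages=%d transformed=%d failed=%d%n", result[0], result[1], result[2]);
  }
}
```

Because the cursor moves before the transform is attempted, a row that always throws is counted once and skipped, which is the behaviour the poison-pill bullet describes.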

Test plan

  • Seeded a 1.12.7 install with 500 glossary terms across glossary → root → child → grandchild levels, 3 update passes each, giving ~1,500 version snapshots in the legacy EntityReference[] shape (script: scripts/seed_legacy_related_terms_v1130.py).
  • Upgraded to 1.13.0; migration ran cleanly: pages=3 transformed=1285 skipped=0 with zero failures.
  • Verified zero data loss in DB: items per snapshot match pre-migration counts; every wrapped item has term + relationType: "relatedTo"; term sub-objects preserve id, name, fullyQualifiedName, type, description, displayName, deleted.
  • GET /glossaryTerms/{id}/versions/{v} returns 200 with TermRelation[] shape across root, child, and grandchild terms.
  • Version-history UI renders correctly with strikethrough/green diff annotations for related-term changes on covered snapshots.

Copilot AI review requested due to automatic review settings April 27, 2026 13:26
@github-actions github-actions Bot added the backend and safe to test labels Apr 27, 2026
@sonika-shah sonika-shah force-pushed the fix/strip-related-terms-entity-extension branch from 1c79286 to 7c22773 Compare April 27, 2026 13:29

Copilot AI left a comment

Pull request overview

Extends the existing 1.13.0 post-data-migration to also remove legacy relatedTerms data from glossary term version snapshots stored in entity_extension, preventing Jackson deserialization failures (500s) when requesting historical term versions.

Changes:

  • Add a Postgres migration step to strip relatedTerms from entity_extension rows with extension matching glossaryTerm.version.%.
  • Add a MySQL migration step to strip relatedTerms from the same entity_extension version snapshot rows.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| bootstrap/sql/migrations/native/1.13.0/postgres/postDataMigrationSQLScript.sql | Removes relatedTerms from glossary term version snapshot JSON in entity_extension for Postgres. |
| bootstrap/sql/migrations/native/1.13.0/mysql/postDataMigrationSQLScript.sql | Removes relatedTerms from glossary term version snapshot JSON in entity_extension for MySQL. |

Comment thread bootstrap/sql/migrations/native/1.13.0/postgres/postDataMigrationSQLScript.sql Outdated
Comment thread bootstrap/sql/migrations/native/1.13.0/mysql/postDataMigrationSQLScript.sql Outdated
@sonika-shah sonika-shah changed the title from "fix: strip stale relatedTerms from glossary term version snapshots" to "fix migration: strip stale relatedTerms from glossary term version snapshots" Apr 27, 2026
@sonika-shah sonika-shah changed the title from "fix migration: strip stale relatedTerms from glossary term version snapshots" to "fix migration: strip stale relatedTerms from glossary term version history" Apr 27, 2026
Extends PR #26586. That fix cleaned glossary_term_entity but not the
version snapshots in entity_extension, so GET /versions/{v} still
500s on any pre-1.13 term whose relatedTerms had legacy shape:

  UnrecognizedPropertyException: Unrecognized field "id"
  (class TermRelation, has only "term" and "relationType")

Predicate matches only legacy snapshots — first item has bare `id`
(EntityReference) instead of `term` (TermRelation). Skips correctly-
shaped snapshots written on 1.13+.

Stripping is safe: relatedTerms is loaded from entity_relationship at
read time post-#25886.

github-actions Bot commented Apr 27, 2026

🟡 Playwright Results — all passed (11 flaky)

✅ 3963 passed · ❌ 0 failed · 🟡 11 flaky · ⏭️ 86 skipped

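| Shard | Passed | Failed | Flaky | Skipped |
| --- | --- | --- | --- | --- |
| 🟡 Shard 1 | 298 | 0 | 1 | 4 |
| 🟡 Shard 2 | 739 | 0 | 5 | 8 |
| 🟡 Shard 3 | 747 | 0 | 1 | 7 |
| ✅ Shard 4 | 759 | 0 | 0 | 18 |
| ✅ Shard 5 | 687 | 0 | 0 | 41 |
| 🟡 Shard 6 | 733 | 0 | 4 | 8 |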
🟡 11 flaky test(s) (passed on retry)
  • Pages/UserCreationWithPersona.spec.ts › Create user with persona and verify on profile (shard 1, 1 retry)
  • Features/ActivityAPI.spec.ts › Activity event is created when description is updated (shard 2, 1 retry)
  • Features/ActivityAPI.spec.ts › Activity event shows the actor who made the change (shard 2, 1 retry)
  • Features/DataQuality/ColumnLevelTests.spec.ts › Column Value Median To Be Between (shard 2, 1 retry)
  • Features/DomainFilterQueryFilter.spec.ts › Domain filter should persist across page navigation (shard 2, 1 retry)
  • Features/DomainFilterQueryFilter.spec.ts › Domain filter should use exact match and prefix with dot to prevent false positives (shard 2, 1 retry)
  • Features/RTL.spec.ts › Verify Following widget functionality (shard 3, 1 retry)
  • Pages/Lineage/DataAssetLineage.spec.ts › Column lineage for mlModel -> container (shard 6, 1 retry)
  • Pages/Lineage/LineageFilters.spec.ts › Verify lineage schema filter selection (shard 6, 1 retry)
  • Pages/ServiceEntity.spec.ts › Tier Add, Update and Remove (shard 6, 1 retry)
  • Pages/UserDetails.spec.ts › Create team with domain and verify visibility of inherited domain in user profile after team removal (shard 6, 1 retry)

📦 Download artifacts

How to debug locally
# Download playwright-test-results-<shard> artifact and unzip
npx playwright show-trace path/to/trace.zip    # view trace

…stripping

Replace the SQL UPDATE that stripped relatedTerms from entity_extension
version snapshots with a Java migration that wraps each legacy
EntityReference[] item as TermRelation[] (term + relationType="relatedTo").

Version reads deserialize entity_extension JSON directly without
rehydrating from entity_relationship, so a strip would lose history per
version. The transform preserves it.

Designed for tables with millions of rows: keyset paginated by
PK (id, extension), batched updates, idempotent on re-run.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 28, 2026 21:41
Comment thread bootstrap/sql/migrations/native/1.13.0/mysql/postDataMigrationSQLScript.sql Outdated

Copilot AI left a comment

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

Comment thread bootstrap/sql/migrations/native/1.13.0/mysql/postDataMigrationSQLScript.sql Outdated
…ation

The previous edit added the comment pointer above the legacy
UPDATE entity_extension SET json = JSON_REMOVE(... '$.relatedTerms') block
without removing it. On MySQL that SQL would have stripped relatedTerms
from version snapshots BEFORE the Java transform runs, defeating the
migration and losing related-term history. Postgres was already correct.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@sonika-shah sonika-shah changed the title from "fix migration: strip stale relatedTerms from glossary term version history" to "fix migration: transform legacy relatedTerms in glossaryTerm version snapshots" Apr 28, 2026

gitar-bot Bot commented Apr 28, 2026

Code Review ✅ Approved 1 resolved / 1 findings

Updates the migration to transform legacy relatedTerms in glossaryTerm version snapshots, resolving the MySQL SQL stripping issue that previously defeated the Java transformation.

✅ 1 resolved
Bug: MySQL SQL still strips relatedTerms, defeating the Java transform

📄 bootstrap/sql/migrations/native/1.13.0/mysql/postDataMigrationSQLScript.sql:85-90
The MySQL postDataMigrationSQLScript.sql still contains the old SQL UPDATE entity_extension SET json = JSON_REMOVE(json, '$.relatedTerms') at lines 87-90. This SQL runs before the Java migration (runDataMigration is called after SQL scripts), so it will strip all relatedTerms from version snapshots before the Java code ever gets a chance to transform them.

The Postgres file correctly removed the equivalent SQL (visible in the diff as deleted lines), but the MySQL file only added the comment without removing the old SQL. This means:

  • On MySQL: related-term history is silently lost (the exact outcome the PR aims to prevent).
  • On Postgres: works as intended (transform preserves history).

Remove lines 85-90 from the MySQL SQL file, matching what was done for Postgres.

@sonika-shah sonika-shah changed the title from "fix migration: transform legacy relatedTerms in glossaryTerm version snapshots" to "fix migration: update legacy relatedTerms in glossaryTerm version history after the glossary term relation changes" Apr 28, 2026
@sonika-shah sonika-shah enabled auto-merge (squash) April 28, 2026 22:32
@sonika-shah sonika-shah merged commit 5254855 into main Apr 29, 2026
76 of 82 checks passed
@sonika-shah sonika-shah deleted the fix/strip-related-terms-entity-extension branch April 29, 2026 03:35

sonika-shah added a commit that referenced this pull request Apr 29, 2026
…tory after the glossary term relation changes (#27770)

* fix: strip stale relatedTerms from glossary term version snapshots

Extends PR #26586. That fix cleaned glossary_term_entity but not the
version snapshots in entity_extension, so GET /versions/{v} still
500s on any pre-1.13 term whose relatedTerms had legacy shape:

  UnrecognizedPropertyException: Unrecognized field "id"
  (class TermRelation, has only "term" and "relationType")

Predicate matches only legacy snapshots — first item has bare `id`
(EntityReference) instead of `term` (TermRelation). Skips correctly-
shaped snapshots written on 1.13+.

Stripping is safe: relatedTerms is loaded from entity_relationship at
read time post-#25886.

* v1130: transform legacy relatedTerms in version snapshots instead of stripping

Replace the SQL UPDATE that stripped relatedTerms from entity_extension
version snapshots with a Java migration that wraps each legacy
EntityReference[] item as TermRelation[] (term + relationType="relatedTo").

Version reads deserialize entity_extension JSON directly without
rehydrating from entity_relationship, so a strip would lose history per
version. The transform preserves it.

Designed for tables with millions of rows: keyset paginated by
PK (id, extension), batched updates, idempotent on re-run.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(mysql): remove leftover entity_extension strip in v1130 post-migration

The previous edit added the comment pointer above the legacy
UPDATE entity_extension SET json = JSON_REMOVE(... '$.relatedTerms') block
without removing it. On MySQL that SQL would have stripped relatedTerms
from version snapshots BEFORE the Java transform runs, defeating the
migration and losing related-term history. Postgres was already correct.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
(cherry picked from commit 5254855)
@sonika-shah

Cherry-picked to 1.13

jatinmasaram pushed a commit to jatinmasaram/OpenMetadata that referenced this pull request May 2, 2026

Labels

backend, safe to test

Development

Successfully merging this pull request may close these issues.

fix(migration): legacy relatedTerms in v1130 glossaryTerm version snapshots break /versions/{v}

3 participants