fix(fqn): support double-quotes in fully qualified names + guard/repair corrupt FQNs #28697
mohityadav766 wants to merge 6 commits into
…ir corrupt FQNs
Names containing a double-quote could not be represented in an FQN: the Fqn
grammar had no escape mechanism, yet quoteName() backslash-escaped the quote and
stored an unparseable segment. Building the FQN is a pure string op, so such
values were written successfully (insert hashes only the entity's own FQN); they
then detonated later with a 500 (ParseCancellationException) the first time a
nested FQN was hashed (e.g. a tags read), and were painful to migrate.
Three layered fixes:
- Grammar + quoteName: NAME_WITH_RESERVED now allows any character with '"'
escaped by doubling it (""). quoteName/unquoteName encode/decode accordingly
and are idempotent. Names without a quote encode identically to before, so
existing FQNs and their hashes are unchanged (no reindex/migration needed).
- Ingest guard: FullyQualifiedName.validateFqnName() asserts a name round-trips
through encode->parse->decode, wired into every nested-FQN setter (columns,
pipeline tasks, topic/searchIndex/apiEndpoint fields, mlFeatures). A name that
cannot be hashed is now rejected at ingest with a clear 400 instead of being
stored to fail later.
- Heal-on-read: FullyQualifiedName.isValid() detects legacy-corrupt FQNs;
PipelineRepository repairs unparseable task FQNs on the fly by re-deriving them
from the task name, so existing poisoned data reads cleanly (200) without a
migration. The repair is in-memory and persists on the next update.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
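The quote-doubling scheme in the first fix can be sketched in plain Java. This is a minimal illustration, not the actual FullyQualifiedName code: the class name FqnQuoteSketch and the helper isQuoted are hypothetical, and the rule that only names containing '.' or '"' get quoted is an assumption made for the example.

```java
final class FqnQuoteSketch {
  private static final String QUOTE = "\"";

  /** True if s is already encoded: quoted, at least one inner char, every inner quote doubled. */
  static boolean isQuoted(String s) {
    if (s.length() < 3 || !s.startsWith(QUOTE) || !s.endsWith(QUOTE)) {
      return false;
    }
    String inner = s.substring(1, s.length() - 1);
    for (int i = 0; i < inner.length(); i++) {
      if (inner.charAt(i) == '"') {
        // A quote inside the segment is only legal as the first half of a "" pair.
        if (i + 1 >= inner.length() || inner.charAt(i + 1) != '"') {
          return false;
        }
        i++; // skip the second quote of the pair
      }
    }
    return true;
  }

  /** Encode a raw name; already-encoded input is returned unchanged (idempotent). */
  static String quoteName(String name) {
    if (isQuoted(name)) {
      return name;
    }
    // Assumption for this sketch: only '.' and '"' force quoting.
    boolean reserved = name.indexOf('.') >= 0 || name.indexOf('"') >= 0;
    return reserved ? QUOTE + name.replace(QUOTE, QUOTE + QUOTE) + QUOTE : name;
  }

  /** Decode an encoded segment back to the raw name; unquoted input is returned as-is. */
  static String unquoteName(String segment) {
    if (!isQuoted(segment)) {
      return segment;
    }
    return segment.substring(1, segment.length() - 1).replace(QUOTE + QUOTE, QUOTE);
  }
}
```

In this sketch a name without '.' or '"' passes through untouched, which mirrors the claim that existing FQNs keep their encoding. One caveat of any idempotent encoder of this shape: a raw name that happens to look like an encoded segment is treated as already encoded.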
Pull request overview
This PR fixes a long-standing FQN corruption path by making " representable inside a quoted FQN segment (via "" escaping), adding server-side validation to reject unhashable nested names at write time, and introducing a heal-on-read repair for legacy-corrupt pipeline task FQNs so reads no longer 500.
Changes:
- Update the ANTLR FQN grammar to support embedded " in quoted segments using "" escaping.
- Rework FullyQualifiedName.quoteName/unquoteName, add validateFqnName() and isValid(), and expand unit tests for quote round-trips.
- Wire validateFqnName() into nested-FQN setters across repositories, and add a pipeline task FQN repair step during reads.
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| openmetadata-spec/src/main/antlr4/org/openmetadata/schema/Fqn.g4 | Extends quoted-segment grammar to allow escaped quotes via "". |
| openmetadata-service/src/test/java/org/openmetadata/service/util/FullyQualifiedNameTest.java | Adds tests for quote escaping/idempotency, round-trip hashing, and new validation. |
| openmetadata-service/src/main/java/org/openmetadata/service/util/FullyQualifiedName.java | Implements new quote escaping semantics plus validation and parseability checks. |
| openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/TopicRepository.java | Validates nested field names before deriving/setting field FQNs. |
| openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/SearchIndexRepository.java | Validates nested field names before deriving/setting field FQNs. |
| openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/PipelineRepository.java | Validates task names on write and repairs legacy-corrupt task FQNs on read. |
| openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/MlModelRepository.java | Validates ML feature/source names before deriving/setting their FQNs. |
| openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/ContainerRepository.java | Validates container column names before deriving/setting column FQNs. |
| openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/ColumnUtil.java | Validates column names before deriving/setting column FQNs (used in multiple read/write paths). |
| openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/ColumnRepository.java | Validates data-model column names before deriving/setting column FQNs. |
| openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/APIEndpointRepository.java | Validates API endpoint field names before deriving/setting field FQNs. |
🔴 Playwright Results: 1 failure(s), 7 flaky
✅ 4272 passed · ❌ 1 failed · 🟡 7 flaky · ⏭️ 88 skipped
Genuine Failures (failed on all attempts) ❌
…n-read
Heal-on-read (PipelineRepository.repairTaskFqns) ran a full ANTLR parse for
every task on every pipeline read to subsidize a finite set of already-corrupt
rows, was incomplete (the bulk/LIST/search path still 500'd), and could NPE on
a null task FQN. Replace it with a one-time migration so the corruption leaves
the stored data and reads pay no per-request cost.
- Remove repairTaskFqns and its setFields() call; keep the validateFqnName
write-path guard that rejects un-representable names at ingest (400).
- Add migration v11211 (mysql + postgres): re-derive task FQNs where !isValid,
persist only when changed.
- Harden FullyQualifiedName.isValid to treat null/empty as invalid (no NPE).
- Require >=1 char inside a quoted FQN segment (grammar + not *), rejecting
empty quoted segments ("").
FullyQualifiedNameTest: 17/17.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
validateFqnName returned early when quoteName(name) was unchanged, letting
empty names through (quoteName("") == ""). An empty pipeline task name (the
schema sets no minLength on task.name) then produced an unhashable empty FQN
segment ("parent.") that 500'd on the next FQN hash -- the same failure class
as unrepresentable names. Treat null/empty as invalid so every nested-FQN
setter (columns, tasks, fields, mlFeatures) rejects them up front with a 400.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
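The round-trip guard and the empty-name case can be sketched as follows. This is an illustration under stated assumptions, not the real validateFqnName: FqnGuardSketch, encode, parse and validate are hypothetical names, and the hand-written parse method is a stand-in for the ANTLR-generated parser that only models the rules described in these commits (segments separated by '.', quoted segments with "" as an escaped quote, at least one character per segment).

```java
import java.util.ArrayList;
import java.util.List;

final class FqnGuardSketch {
  /** Simplified encoder: quote names containing '.' or '"', doubling embedded quotes. */
  static String encode(String name) {
    boolean reserved = name.indexOf('.') >= 0 || name.indexOf('"') >= 0;
    return reserved ? "\"" + name.replace("\"", "\"\"") + "\"" : name;
  }

  /** Split an FQN into decoded segments, or throw if it does not match the sketch grammar. */
  static List<String> parse(String fqn) {
    List<String> segments = new ArrayList<>();
    int i = 0;
    int n = fqn.length();
    while (true) {
      StringBuilder seg = new StringBuilder();
      if (i < n && fqn.charAt(i) == '"') {
        i++; // opening quote
        boolean closed = false;
        while (i < n && !closed) {
          char c = fqn.charAt(i);
          if (c != '"') {
            seg.append(c);
            i++;
          } else if (i + 1 < n && fqn.charAt(i + 1) == '"') {
            seg.append('"'); // "" decodes to a single quote
            i += 2;
          } else {
            i++; // closing quote
            closed = true;
          }
        }
        if (!closed) {
          throw new IllegalArgumentException("Unterminated quoted segment: " + fqn);
        }
      } else {
        while (i < n && fqn.charAt(i) != '.' && fqn.charAt(i) != '"') {
          seg.append(fqn.charAt(i++));
        }
      }
      if (seg.length() == 0) {
        throw new IllegalArgumentException("Empty segment in: " + fqn);
      }
      segments.add(seg.toString());
      if (i == n) {
        return segments;
      }
      if (fqn.charAt(i) != '.') {
        throw new IllegalArgumentException("Unexpected character at " + i + " in: " + fqn);
      }
      i++; // consume the '.' separator
    }
  }

  /** Reject null/empty first, then require encode -> parse -> decode to return the same name. */
  static void validate(String name) {
    if (name == null || name.isEmpty()) {
      throw new IllegalArgumentException("Invalid name: must not be null or empty");
    }
    List<String> parsed = parse(encode(name));
    if (parsed.size() != 1 || !parsed.get(0).equals(name)) {
      throw new IllegalArgumentException("Invalid name " + name);
    }
  }
}
```

The null/empty check sits before any encoding shortcut on purpose: in this sketch encode("") returns "", so an "unchanged, therefore fine" early return would let the empty name through and later yield an unparseable "parent." FQN.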
Address review feedback on the one-time repair migration:
- Performance: scan pipelines in pages of 1000 via listAfterWithOffset instead of selecting every id and calling findEntityById per pipeline, dropping the N+1 round-trips and the full id list held in memory. Only changed rows are written.
- Observability: track scanned/repaired/failed counts and log a prominent WARN with up to 100 pipeline ids that could not be repaired, instead of swallowing each failure as a lone WARN, so operators get a concrete remediation list.
- Search: document (completion log + schemaChanges) that repaired task FQNs are reflected in the search index after the standard post-upgrade reindex, matching existing FQN-fix migration behavior.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
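The paged scan with counters described in this commit could look roughly like the sketch below. It is not the MigrationUtil code: PagedRepairSketch, PageSource, Stats and repairAll are hypothetical names, rows are modelled as plain strings, and the (limit, offset) argument order is assumed from the listAfterWithOffset stub shown in the review thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.UnaryOperator;

final class PagedRepairSketch {
  /** Hypothetical stand-in for a DAO page query: (limit, offset) -> rows. */
  interface PageSource {
    List<String> page(int limit, int offset);
  }

  static final int MAX_REPORTED_FAILURES = 100;

  static final class Stats {
    int scanned;
    int repaired;
    int failed;
    final List<String> failedSample = new ArrayList<>();
  }

  /**
   * Walk all rows page by page. repair returns the row unchanged when it is already valid;
   * store is called only for rows that actually changed. A failing row is counted and
   * sampled, and never aborts the scan.
   */
  static Stats repairAll(
      PageSource source, int pageSize, UnaryOperator<String> repair, Consumer<String> store) {
    Stats stats = new Stats();
    for (int offset = 0; ; offset += pageSize) {
      List<String> page = source.page(pageSize, offset);
      if (page.isEmpty()) {
        break; // an empty page marks the end of the table
      }
      for (String row : page) {
        stats.scanned++;
        try {
          String fixed = repair.apply(row);
          if (!fixed.equals(row)) {
            store.accept(fixed);
            stats.repaired++;
          }
        } catch (RuntimeException e) {
          stats.failed++;
          if (stats.failedSample.size() < MAX_REPORTED_FAILURES) {
            stats.failedSample.add(row);
          }
        }
      }
    }
    return stats;
  }
}
```

Only one page is held in memory at a time, and the capped failedSample list is what a single summary WARN could report. Offset paging stays stable here because repairs update rows in place rather than inserting or deleting them.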
Add MigrationUtilTest for the v11211 repairPipelineTaskFqns migration: repair correctness (re-derive unparseable/null task FQNs, leave valid ones untouched, skip task-less pipelines) and migration-path resilience -- a single unreadable row or a failing update must not abort the upgrade.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
```java
private void givenPipelinePage(String... jsons) {
  when(pipelineDAO.listAfterWithOffset(anyInt(), anyInt()))
      .thenAnswer(inv -> (int) inv.getArgument(1) == 0 ? List.of(jsons) : List.of());
}
```
💡 Quality: Migration pagination loop is not covered by any test
All tests return their data only when offset == 0 and an empty list otherwise, so every test exercises a single page. The production repairPipelineTaskFqns loop (MigrationUtil.java:46-57) increments offset += PAGE_SIZE and re-queries until an empty page is returned. Because the mock matches listAfterWithOffset(anyInt(), anyInt()) with arbitrary ints, a regression that swapped the limit/offset arguments, mis-incremented the offset, or terminated early would still pass these tests undetected.
Consider adding a multi-page test that stubs distinct pages by offset (e.g., offset 0 returns a full page, offset 1000 returns a second page, offset 2000 returns empty) and asserts that pipelines from the second page are scanned/repaired. This locks in correct pagination and offset advancement.
This is test-only code, so severity is minor — the existing per-row behavior coverage is good.
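One way to pin down the pagination behavior is a fake page source keyed by offset, sketched here without a mocking library. Everything in it is hypothetical (OffsetKeyedPagesSketch, FakePages, scanAll); scanAll is only a stand-in for the production loop, and the (limit, offset) order is assumed from the stub above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class OffsetKeyedPagesSketch {
  /** Hypothetical stand-in for the DAO call: (limit, offset) -> rows. */
  interface PageSource {
    List<String> page(int limit, int offset);
  }

  /** Fake that serves a distinct page per offset and records every offset requested. */
  static final class FakePages implements PageSource {
    private final int expectedLimit;
    private final Map<Integer, List<String>> pagesByOffset = new HashMap<>();
    final List<Integer> requestedOffsets = new ArrayList<>();

    FakePages(int expectedLimit) {
      this.expectedLimit = expectedLimit;
    }

    FakePages at(int offset, String... rows) {
      pagesByOffset.put(offset, Arrays.asList(rows));
      return this;
    }

    @Override
    public List<String> page(int limit, int offset) {
      // Fails fast if the caller swaps limit and offset or changes the page size.
      if (limit != expectedLimit) {
        throw new AssertionError("Unexpected limit " + limit);
      }
      requestedOffsets.add(offset);
      return pagesByOffset.getOrDefault(offset, Collections.<String>emptyList());
    }
  }

  /** Minimal stand-in for the loop under test: advance by pageSize until an empty page. */
  static List<String> scanAll(PageSource source, int pageSize) {
    List<String> seen = new ArrayList<>();
    for (int offset = 0; ; offset += pageSize) {
      List<String> page = source.page(pageSize, offset);
      if (page.isEmpty()) {
        return seen;
      }
      seen.addAll(page);
    }
  }
}
```

With a page size of 2 and pages registered at offsets 0 and 2, a correct loop sees all four rows and requests offsets 0, 2 and 4. A loop that mis-increments the offset, swaps the arguments, or stops after the first page produces a different row list or offset trace, which is exactly what a single-page stub cannot detect.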
Code Review 👍 Approved with suggestions · 6 resolved / 7 findings
Adds robust support for double-quoted FQNs, implements ingest-time validation, and repairs corrupt legacy pipeline task FQNs with a one-time migration. Consider adding a test case to verify migration pagination behavior beyond the initial result set.
💡 Quality: Migration pagination loop is not covered by any test
✅ 6 resolved
✅ Bug: Heal-on-read missing in bulk read path; list reads still 500
✅ Edge Case: isValid(null)/split(null) can throw NPE instead of healing
✅ Edge Case: quoteName/isQuotedName collapses literal two-quote name to empty
✅ Bug: Migration silently skips corrupt pipelines with no read-time fallback
✅ Bug: Repair migration updates DB blob but does not reindex search
...and 1 more resolved from earlier reviews
Problem
A name containing a double-quote (") could not be represented in a fully qualified name. The Fqn grammar had no escape mechanism for ", yet quoteName() backslash-escaped it (\") and stored an unparseable FQN segment.

Building an FQN is a pure string operation, so these corrupt values were written successfully; insert only hashes the entity's own FQN, not nested objects. They then detonated later with a 500 (ParseCancellationException) the first time a nested FQN was hashed (e.g. a tags-inclusive read, or the update path resolving the entity with tags). This was observed on real Mulesoft pipeline task names (e.g. ..._"agents"_...) and also on customer column names, and the corrupt rows could not even be deleted via the API.

Fix (three layers)
1. Grammar + quoteName: make " representable
- NAME_WITH_RESERVED now allows any character, with " escaped by doubling it (""): QUOTE ( ~["] | '""' )* QUOTE.
- quoteName/unquoteName encode/decode with ""-doubling and are idempotent.
- Names without a " encode identically to before, so existing FQNs and their hashes are unchanged; no reindex/migration is required for current data. Verified by the unchanged pre-existing FullyQualifiedNameTest cases.

2. Ingest guard: catch the un-representable early
- FullyQualifiedName.validateFqnName() asserts a name round-trips through encode→parse→decode, wired into every nested-FQN setter: columns (ColumnUtil, ColumnRepository, ContainerRepository), pipeline tasks, topic/searchIndex/apiEndpoint fields, mlFeatures.

3. Heal-on-read: recover legacy poisoned data without a migration
- FullyQualifiedName.isValid() detects legacy-corrupt FQNs (parse fails).
- PipelineRepository.repairTaskFqns() re-derives an unparseable task FQN from the task name on the fly, so existing poisoned rows read cleanly (200) again. The repair is in-memory (no write-amplification) and persists naturally on the next update.

Validation
- FullyQualifiedNameTest: 16/16 green (incl. quote round-trip + validateFqnName).
- … 400 Invalid name ..."agents"..., nothing persisted.
- … 200 with the FQN repaired to the ""-escaped form; stored blob untouched until next write.

Notes
- isValid + re-derive pattern generalizes to columns/fields/features if we want it.
- … Fqn.g4 at build time, so they pick up the grammar change automatically.

🤖 Generated with Claude Code