Skip to content

feat(search): consumer-field contract test + index mapping health check#28751

Draft
pmbrull wants to merge 3 commits into
mainfrom
pmbrull/search-indexing-audit
Draft

feat(search): consumer-field contract test + index mapping health check#28751
pmbrull wants to merge 3 commits into
mainfrom
pmbrull/search-indexing-audit

Conversation

@pmbrull
Copy link
Copy Markdown
Member

@pmbrull pmbrull commented Jun 5, 2026

Describe your changes:

No tracking issue yet — opened from a search-indexing reliability audit. Please link an issue before this leaves draft.

Search denormalizes a set of fields (tags.tagFQN, tier, certification, owners, domains, fqnParts, testCaseResult.testCaseStatus, …) into every entity document, and RBAC, Data Quality, Incident Manager, Lineage, and Data Insights query them directly — so a mapping change that renames, retypes, or drops one breaks those features silently, with no compile- or boot-time failure. This PR adds two guards against that, plus the bug the guards immediately surfaced.

Type of change:

  • Bug fix
  • Improvement

High-level design:

  • SearchConsumerFields — one source of truth for the consumer contract (denormalized leaf fields + required type, and the core data-asset canary set), referenced by both guards so the contract lives in one place.
  • SearchConsumerFieldContractTest (CI) — over every entity mapping × 4 languages: (1) the denormalized leaf fields keep their keyword type wherever present, and (2) core data-asset indexes expose the full denormalized set. Failures name the affected consumer.
  • "Index Mapping Consistency" health check — a non-blocking StepValidation added to SystemRepository.validateSystem() (the existing /system/validate Search Instance check). It reads the live deployed mapping (new read-only getIndexFieldNames on SearchClient → Elasticsearch/OpenSearch index managers) and warns "reindex required" when a deployed core index is missing these fields (e.g. left stale after an upgrade). Chosen over a boot-time fail-fast — which would block the very reindex needed to fix it — and over the hash version-table, which is CLI-only-populated and would false-positive on UI reindexes.
  • Fix: the jp/topic index mapping was missing top-level domains (the new test caught it), so JP-locale topic search had lost domain-based RBAC and DQ/Incident filtering. Added to match en/ru/zh.

Tests:

Use cases covered

  • A mapping change that retypes/renames/drops a shared consumer field fails CI, naming the affected consumer.
  • /system/validate reports a non-blocking warning when a deployed core data-asset index is missing denormalized fields.

Unit tests

  • Added. SearchConsumerFieldContractTest (2) and SystemRepositoryMappingConsistencyTest (4); existing IndexMappingNestedFieldConsistencyTest still green. All 9 pass; mvn -pl openmetadata-service spotless:check clean.

Backend integration tests

  • Not applicable — no new API endpoint; the validateSystem change is covered by the unit test.

Ingestion integration tests

  • Not applicable (no ingestion changes).

Playwright (UI) tests

  • Not applicable (no UI changes).

Manual testing performed

  1. mvn -pl openmetadata-service test -Dtest='SearchConsumerFieldContractTest,SystemRepositoryMappingConsistencyTest,IndexMappingNestedFieldConsistencyTest' → 9/9 pass.
  2. mvn -pl openmetadata-service spotless:check → clean.

UI screen recording / screenshots:

Not applicable.

Checklist:

  • I have read the CONTRIBUTING document.
  • My PR title is Fixes <issue-number>: <short explanation>
  • My PR is linked to a GitHub issue via Fixes #<issue-number> above.
  • I have commented on my code, particularly in hard-to-understand areas.
  • For JSON Schema changes: not needed — the change is an additive Elasticsearch index-mapping field; existing indexes pick it up on the next reindex, which the new health check now surfaces.
  • For UI changes: not applicable.
  • I have added tests (unit) and listed them above.

Guard the denormalized search-index fields that RBAC, Data Quality, Incident
Manager, Lineage, and Data Insights query (tags.tagFQN, tier, certification,
owners, domains, fqnParts, testCaseStatus, ...). Renaming, retyping, or dropping
one silently breaks those consumers with no compile- or boot-time failure.

- SearchConsumerFields: single source of truth for the consumer contract.
- SearchConsumerFieldContractTest: fails CI on a type change or a dropped field
  in any mapping (across all 4 languages), naming the affected consumers.
- "Index Mapping Consistency" StepValidation on /system/validate: non-blocking
  health check that reads the live deployed mapping and warns "reindex required"
  when a core data-asset index is missing these fields (e.g. stale after upgrade).
  Adds a read-only getIndexFieldNames to the SearchClient (Elasticsearch + OpenSearch).
- Fix jp/topic mapping missing top-level `domains` (found by the new test) —
  JP-locale topic search had lost domain-based RBAC/DQ filtering.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@github-actions github-actions Bot added Ingestion safe to test Add this label to run secure Github workflows on PRs labels Jun 5, 2026
Comment on lines +871 to +885
private StepValidation getMappingConsistencyValidation() {
StepValidation step =
new StepValidation().withDescription(ValidationStepDescription.SEARCH_MAPPING.key);
SearchRepository searchRepository = Entity.getSearchRepository();
StepValidation result;
if (searchRepository.getSearchClient().isClientAvailable()) {
List<String> inconsistent = findIndexesWithMissingConsumerFields(searchRepository);
result =
step.withPassed(inconsistent.isEmpty())
.withMessage(buildMappingConsistencyMessage(inconsistent));
} else {
result =
step.withPassed(Boolean.TRUE).withMessage("Skipped: search instance is not reachable.");
}
return result;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Performance: validateSystem issues up to 10 uncached getMapping round-trips

getMappingConsistencyValidation is invoked on every /system/validate call and performs one synchronous getMapping network round-trip per entry in CANARY_DATA_ASSET_ENTITIES (10 indexes), with no caching. This is acceptable for a manually triggered admin endpoint, but if /system/validate is ever polled by monitoring/health tooling the added latency and ES/OS load become noticeable. Consider a short-lived cache of the result (similar to the existing HEALTH_CHECK_CACHE_MS pattern in the search clients) or documenting that this endpoint is admin-only/low-frequency.

Was this helpful? React with 👍 / 👎

pmbrull and others added 2 commits June 5, 2026 15:13
…ract

Index a document into the real en/jp/ru/zh index mapping for topic and
test_case, then run the actual search/filter/aggregation each feature uses
(tag/tier/certification/domain term filters, owners nested RBAC query,
testCaseResult.testCaseStatus aggregation, fqnParts term) and assert it returns
the document. A failure reads as a broken feature in a specific language, not
"index is missing a mapping" — and would have caught the jp/topic domains drop.

Runs against a real OpenSearch testcontainer. Text analyzers are stripped before
index creation (the consumer fields are keyword/nested and analyzer-independent)
so each language index is creatable without analysis plugins while the real
field structure is still validated per language.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…zers

30 of 55 jp index mappings referenced an analyzer name the file does not define:
an incomplete om_analyzer -> om_analyzer_jp rename left field-level analyzer /
search_analyzer references pointing at the other name. OpenSearch/Elasticsearch
reject index creation with "analyzer [...] has not been configured", so those jp
indexes could not be built at all — independent of the analysis-kuromoji plugin
requirement (a separate deployment concern for jp).

Repoint each dangling reference to the analyzer that file actually defines:
behavior-preserving, only analyzer reference strings change (no field, type, or
analyzer definition is touched). Files that define both analyzers are left as-is.
Surfaced by SearchConsumerFieldBehaviorIT.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Comment on lines 93 to +95
"name": {
"type": "text",
"analyzer": "om_analyzer_jp",
"analyzer": "om_analyzer",
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Edge Case: jp/api_collection repointed to English analyzer, not kuromoji

This commit repoints analyzer references the right way for almost every jp index: those files define om_analyzer_jp (kuromoji) in settings.analysis.analyzer but had dangling field references to om_analyzer, so they were correctly changed to om_analyzer_jp. jp/api_collection_index_mapping.json is the lone exception — it went the opposite direction (om_analyzer_jp -> om_analyzer) on name, displayName, and description.

Root cause: unlike every other jp mapping, api_collection's settings block was never localized — it still defines om_analyzer with the English kstem stemmer (lines 27-56) rather than om_analyzer_jp with the kuromoji tokenizer/filters that topic, container, glossary, etc. define. Because the only analyzer actually declared in this file is om_analyzer, the fix made the field references consistent with the (un-localized) settings rather than fixing the settings.

Impact: the dangling-reference fix is correct (referencing an undefined om_analyzer_jp would have failed index creation), but the result is that JP full-text search over API Collections (name/displayName/description) silently uses English tokenization instead of Japanese — inconsistent with the rest of the jp locale. The denormalized consumer fields (keyword/nested) are unaffected, so RBAC/DQ still work; this is a search-quality regression for Japanese text on this one index.

Suggested fix: localize api_collection's settings to define om_analyzer_jp (kuromoji) matching the other jp indexes, and keep the three field references at om_analyzer_jp rather than downgrading them to om_analyzer.

Was this helpful? React with 👍 / 👎

@gitar-bot
Copy link
Copy Markdown

gitar-bot Bot commented Jun 5, 2026

Code Review ⚠️ Changes requested 1 resolved / 3 findings

Implements a search consumer contract test and index mapping health check to prevent breaking changes in denormalized fields. The implementation performs excessive synchronous network round-trips during system validation and contains a misconfiguration where jp/api_collection uses the English instead of the kuromoji analyzer.

⚠️ Edge Case: jp/api_collection repointed to English analyzer, not kuromoji

📄 openmetadata-spec/src/main/resources/elasticsearch/jp/api_collection_index_mapping.json:27-35 📄 openmetadata-spec/src/main/resources/elasticsearch/jp/api_collection_index_mapping.json:93-95 📄 openmetadata-spec/src/main/resources/elasticsearch/jp/api_collection_index_mapping.json:119-121 📄 openmetadata-spec/src/main/resources/elasticsearch/jp/api_collection_index_mapping.json:142-144

This commit repoints analyzer references the right way for almost every jp index: those files define om_analyzer_jp (kuromoji) in settings.analysis.analyzer but had dangling field references to om_analyzer, so they were correctly changed to om_analyzer_jp. jp/api_collection_index_mapping.json is the lone exception — it went the opposite direction (om_analyzer_jp -> om_analyzer) on name, displayName, and description.

Root cause: unlike every other jp mapping, api_collection's settings block was never localized — it still defines om_analyzer with the English kstem stemmer (lines 27-56) rather than om_analyzer_jp with the kuromoji tokenizer/filters that topic, container, glossary, etc. define. Because the only analyzer actually declared in this file is om_analyzer, the fix made the field references consistent with the (un-localized) settings rather than fixing the settings.

Impact: the dangling-reference fix is correct (referencing an undefined om_analyzer_jp would have failed index creation), but the result is that JP full-text search over API Collections (name/displayName/description) silently uses English tokenization instead of Japanese — inconsistent with the rest of the jp locale. The denormalized consumer fields (keyword/nested) are unaffected, so RBAC/DQ still work; this is a search-quality regression for Japanese text on this one index.

Suggested fix: localize api_collection's settings to define om_analyzer_jp (kuromoji) matching the other jp indexes, and keep the three field references at om_analyzer_jp rather than downgrading them to om_analyzer.

💡 Performance: validateSystem issues up to 10 uncached getMapping round-trips

📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/SystemRepository.java:871-885

getMappingConsistencyValidation is invoked on every /system/validate call and performs one synchronous getMapping network round-trip per entry in CANARY_DATA_ASSET_ENTITIES (10 indexes), with no caching. This is acceptable for a manually triggered admin endpoint, but if /system/validate is ever polled by monitoring/health tooling the added latency and ES/OS load become noticeable. Consider a short-lived cache of the result (similar to the existing HEALTH_CHECK_CACHE_MS pattern in the search clients) or documenting that this endpoint is admin-only/low-frequency.

✅ 1 resolved
Edge Case: Health check treats unreadable/empty live mappings as healthy

📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/SystemRepository.java:920-934 📄 openmetadata-service/src/main/java/org/openmetadata/service/search/elasticsearch/ElasticSearchIndexManager.java:67-81
getIndexFieldNames swallows any exception and returns an empty set, and collectMissingConsumerFields skips an index entirely when liveFields.isEmpty(). This intentionally avoids double-reporting a not-yet-created index, but it also means a transient failure reading a deployed index's mapping (cluster timeout, permission error, partial outage) is indistinguishable from "index absent" — both result in the index being skipped and the overall step reporting passed=true ("All core data asset indexes expose the fields..."). A genuinely stale/broken index can therefore be masked behind a green check whenever the mapping read fails. Since this is a non-blocking advisory check the impact is limited, but consider distinguishing "index missing" from "mapping read failed" (e.g. surface a separate "could not verify N indexes" note) so transient errors don't produce a false all-clear.

🤖 Prompt for agents
Code Review: Implements a search consumer contract test and index mapping health check to prevent breaking changes in denormalized fields. The implementation performs excessive synchronous network round-trips during system validation and contains a misconfiguration where jp/api_collection uses the English instead of the kuromoji analyzer.

1. 💡 Performance: validateSystem issues up to 10 uncached getMapping round-trips
   Files: openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/SystemRepository.java:871-885

   `getMappingConsistencyValidation` is invoked on every `/system/validate` call and performs one synchronous `getMapping` network round-trip per entry in `CANARY_DATA_ASSET_ENTITIES` (10 indexes), with no caching. This is acceptable for a manually triggered admin endpoint, but if `/system/validate` is ever polled by monitoring/health tooling the added latency and ES/OS load become noticeable. Consider a short-lived cache of the result (similar to the existing `HEALTH_CHECK_CACHE_MS` pattern in the search clients) or documenting that this endpoint is admin-only/low-frequency.

2. ⚠️ Edge Case: jp/api_collection repointed to English analyzer, not kuromoji
   Files: openmetadata-spec/src/main/resources/elasticsearch/jp/api_collection_index_mapping.json:27-35, openmetadata-spec/src/main/resources/elasticsearch/jp/api_collection_index_mapping.json:93-95, openmetadata-spec/src/main/resources/elasticsearch/jp/api_collection_index_mapping.json:119-121, openmetadata-spec/src/main/resources/elasticsearch/jp/api_collection_index_mapping.json:142-144

   This commit repoints analyzer references the right way for almost every jp index: those files define `om_analyzer_jp` (kuromoji) in `settings.analysis.analyzer` but had dangling field references to `om_analyzer`, so they were correctly changed to `om_analyzer_jp`. `jp/api_collection_index_mapping.json` is the lone exception — it went the opposite direction (`om_analyzer_jp` -> `om_analyzer`) on `name`, `displayName`, and `description`.
   
   Root cause: unlike every other jp mapping, `api_collection`'s settings block was never localized — it still defines `om_analyzer` with the English `kstem` stemmer (lines 27-56) rather than `om_analyzer_jp` with the kuromoji tokenizer/filters that `topic`, `container`, `glossary`, etc. define. Because the only analyzer actually declared in this file is `om_analyzer`, the fix made the field references consistent with the (un-localized) settings rather than fixing the settings.
   
   Impact: the dangling-reference fix is correct (referencing an undefined `om_analyzer_jp` would have failed index creation), but the result is that JP full-text search over API Collections (`name`/`displayName`/`description`) silently uses English tokenization instead of Japanese — inconsistent with the rest of the jp locale. The denormalized consumer fields (keyword/nested) are unaffected, so RBAC/DQ still work; this is a search-quality regression for Japanese text on this one index.
   
   Suggested fix: localize `api_collection`'s settings to define `om_analyzer_jp` (kuromoji) matching the other jp indexes, and keep the three field references at `om_analyzer_jp` rather than downgrading them to `om_analyzer`.

Options

Display: compact → Showing less information.

Comment with these commands to change:

Compact
gitar display:verbose         

Was this helpful? React with 👍 / 👎 | Gitar

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Ingestion safe to test Add this label to run secure Github workflows on PRs

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant