Fix #20052: Add tier field mapping with lowercase_normalizer to DI index and combine filters by manerow · Pull Request #26177 · open-metadata/OpenMetadata

manerow · 2026-03-02T10:50:00Z

Fixes #20052

Summary

Tier field mapping for DI index (main fix)

The Data Insights index (di-data-assets-*) was missing an explicit tier field mapping in its component template
The tier field was dynamically mapped by Elasticsearch as plain text + keyword (case-sensitive), while the Advanced Search QueryBuilder returns lowercase values (e.g., tier.tier1) because entity search indexes use lowercase_normalizer
This casing mismatch (tier.tier1 ≠ Tier.Tier1) caused tier filters from the Custom Dashboard chart builder's Advanced Search to always return 0 results
Added explicit tier mapping with lowercase_normalizer on the keyword subfield to both Elasticsearch and OpenSearch DI index templates
Existing deployments will need a DI data reindex for the new mapping to take effect

Combine user filter with existing metric filter (additional fix)

DataInsightSystemChartRepository.getPreviewData() and listChartData() replaced the metric's pre-existing filter (e.g., exclude_tags_filter) with the user-provided filter instead of combining them
Several system charts (all summary cards and line charts like percentage_of_data_asset_with_description) have a built-in exclude_tags_filter that excludes tag, glossaryTerm, and dataProduct entity types — this exclusion was silently dropped when a user applied a tier/team filter
Added a combineFilters() method that wraps both the existing metric filter and the user-provided filter in a bool.must clause, ensuring both are applied simultaneously

Note on testing and deployment

The DI component template is only created when the data stream doesn't exist. Since the DI data streams already exist from a previous deployment, the template (with the new tier mapping) is never updated — createOrUpdateDataAssetsDataStream() skips template updates when data streams already exist.

To test the fix on an existing deployment, you need to re-run the Data Insights app with "Recreate Data Assets Index" enabled from the UI (Settings > Applications > Data Insights > Configure > enable Recreate). This will delete the old data streams and re-create them with the updated template.

...a-service/src/main/java/org/openmetadata/service/jdbi3/DataInsightSystemChartRepository.java

openmetadata-service/src/main/resources/dataInsights/elasticsearch/indexMappingsTemplate.json

...a-service/src/main/java/org/openmetadata/service/jdbi3/DataInsightSystemChartRepository.java

…nsights charts

…dex templates

gitar-bot · 2026-03-05T08:15:06Z

🔍 CI failure analysis for b7d20a7: All CI failures are pre-existing infrastructure/flakiness issues unrelated to this PR's changes (tier field mapping and combineFilters logic in Data Insights).

Issue

All CI failures are unrelated to this PR's changes (Data Insights tier field mapping and combineFilters() in DataInsightSystemChartRepository.java).

Root Cause

`playwright-ci-postgresql (4, 6)` — Resource exhaustion after long parallel test run

After 691+ tests pass over ~1.4 hours, failures occur from browser context crashes and search API timeouts:

Entity.spec.ts:2137:7 - page.waitForResponse: Target page, context or browser has been closed
NotificationAlerts.spec.ts:440:1 - page.goto: Timeout 60000ms exceeded
Customproperties-part2.spec.ts:187:9 - Test timeout of 180000ms waiting for /api/v1/search/query response

This is consistent with gradual resource exhaustion during a long parallel test run, not a code regression. The PersonaFlow test data mismatch (PW Persona 16a290d0 vs PW Persona 3671cf98) further indicates race conditions in test setup.

`integration-tests-postgres-opensearch` / `integration-tests-mysql-elasticsearch` — DJL/vector service infrastructure failure

org.openmetadata.service.search.vector.client.DjlEmbeddingClient$EmbeddingGenerationException:
  Unexpected error during DJL batch embedding generation
Caused by: java.lang.IllegalStateException: NDManager has been closed already.

and search indexing latency timeouts:

org.awaitility.core.ConditionTimeoutException: Condition with alias 'Wait for entity to appear in search index'
  didn't complete within 1 minutes 30 seconds

This is a pre-existing CI infrastructure issue with the DJL NDManager lifecycle and connection pool shutdown during concurrent test execution.

Other pre-existing/unrelated failures

maven-collate-ci — External downstream build failure in openmetadata-collate repository.
maven-sonarcloud-ci — SonarCloud token/configuration issue (Project not found).
py-run-tests (3.10) — Pre-existing Oracle unit test failures (test_oracle.py); this PR doesn't touch the ingestion layer.
Integration Tests (shard-1, 3.10) — Transient network timeout downloading IBM iAccess ODBC driver during Docker build.
Test Report — Downstream consequence of integration test failures above.

Details

This PR only modifies:

DataInsightSystemChartRepository.java — combineFilters() method
dataInsights/elasticsearch/indexMappingsTemplate.json — tier field mapping
dataInsights/opensearch/indexMappingsTemplate.json — tier field mapping

None of the failing jobs involve Data Insights charts, tier field mappings, or the combineFilters logic.

Code Review ⚠️ Changes requested 3 resolved / 4 findings

Adds tier field mapping with lowercase_normalizer to the DI index and combines filters, resolving the filter accumulation issue. However, chart.setChartDetails() still mutates the cached entity object, which needs to be addressed before merge.

⚠️

Bug: chart.setChartDetails() still mutates the cached entity object

📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/DataInsightSystemChartRepository.java:736 📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/DataInsightSystemChartRepository.java:780

The code now correctly creates shallow copies of chartDetails and metrics maps (good fix for the inner mutation), but still calls chart.setChartDetails(chartDetailsCopy) on the entity object retrieved from the cache via Entity.getEntityByName().

Since chart is the cached reference, calling setChartDetails() on it replaces the cached entity's chart details. On the next request:

chart.getChartDetails() returns the previously modified copy (with already-combined filters)
metrics.get("filter") retrieves the already-combined filter string
combineFilters wraps it again in another bool.must

This causes nested bool.must accumulation across requests, eventually producing deeply nested or bloated Elasticsearch queries.

Fix: Clone the chart object itself before modifying it, so the cached entity is never touched. Alternatively, work with the copies without calling setChartDetails — build a local chart-details map and pass it through a different code path.

Suggested fix

        // Clone the chart to avoid mutating the cached entity
        DataInsightCustomChart chartCopy = chart; // replace with proper deep copy
        HashMap<String, Object> chartDetailsCopy = new LinkedHashMap<>(chartDetails);
        chartDetailsCopy.put("metrics", metricsCopies);
        chartCopy = JsonUtils.readValue(
            JsonUtils.pojoToJson(chart), DataInsightCustomChart.class);
        chartCopy.setChartDetails(chartDetailsCopy);

✅ 3 resolved

✅ Bug: In-place mutation of cached entity causes filter accumulation

📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/DataInsightSystemChartRepository.java:726 📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/DataInsightSystemChartRepository.java:762
Both getPreviewData and listChartData mutate the chartDetails map of the entity object in place via metrics.put("filter", combineFilters(...)). In listChartData, the entity is fetched via Entity.getEntityByName(...) which defaults to fromCache=true, meaning it returns a cached instance.

Because the cached entity is mutated, each call permanently modifies the filter in the cache. On the next call:

existingFilter is read from the now-mutated cached entity (which already contains the previously combined filter)

combineFilters wraps it again with the new user filter, creating nested bool.must clauses

This grows unboundedly with each request

For example, after 3 calls with filter F, the cached metric filter becomes:
must[must[must[original, F], F], F]

This also causes cross-request leakage — one user's filter could appear in another user's query results if they hit the same cached entity.

Fix: Deep-copy the chart (or at least chartDetails) before mutating, so the cached entity remains unchanged. Alternatively, pass the combined filter downstream without modifying the entity's map.

✅ Edge Case: combineFilters discards non-query top-level fields from both inputs

📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/DataInsightSystemChartRepository.java:681
When both filters are non-empty and valid, combineFilters extracts only the query field from each filter JSON (lines 692-693) and builds a new result containing only {"query": ...}. Any other top-level fields present in either input (e.g., hypothetical size, _source, sort, aggs) would be silently dropped.

Currently, the known filter format (e.g., exclude_tags_filter in MigrationUtil.java) only uses a top-level query field, so this isn't a problem today. However, if the filter schema evolves to include additional fields, this method would silently discard them. Consider either:

Documenting that filters must only contain a query field, or

Copying non-query fields from the existing filter into the result to be more resilient to future changes.

This is a low-risk observation since the current contract appears consistent.

✅ Bug: Missing lowercase_normalizer definition will break index creation

📄 openmetadata-service/src/main/resources/dataInsights/elasticsearch/indexMappingsTemplate.json:26 📄 openmetadata-service/src/main/resources/dataInsights/opensearch/indexMappingsTemplate.json:26
The tier.keyword field references "normalizer": "lowercase_normalizer", but this normalizer is never defined in the index settings. Neither indexMappingsTemplate.json (which only has max_ngram_diff: 17) nor indexSettingsTemplate.json (which only has lifecycle policy) include the required settings.analysis.normalizer section.

Elasticsearch/OpenSearch requires custom normalizers to be declared in the index settings before they can be referenced in field mappings. Without this, applying the template will fail with: "Unknown normalizer [lowercase_normalizer]".

Other index mappings in this codebase (e.g., openmetadata-spec/src/main/resources/elasticsearch/jp/topic_index_mapping.json) correctly define it:
"settings": {
  "analysis": {
    "normalizer": {
      "lowercase_normalizer": {
        "type": "custom",
        "char_filter": [],
        "filter": ["lowercase"]
      }
    }
  }
}
This definition must be added to the settings section of both the Elasticsearch and OpenSearch indexMappingsTemplate.json files (or the corresponding settings templates).

🤖 Prompt for agents

Code Review: Adds tier field mapping with lowercase_normalizer to the DI index and combines filters, resolving the filter accumulation issue. However, chart.setChartDetails() still mutates the cached entity object, which needs to be addressed before merge.

1. ⚠️ Bug: chart.setChartDetails() still mutates the cached entity object
   Files: openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/DataInsightSystemChartRepository.java:736, openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/DataInsightSystemChartRepository.java:780

   The code now correctly creates shallow copies of `chartDetails` and `metrics` maps (good fix for the inner mutation), but still calls `chart.setChartDetails(chartDetailsCopy)` on the entity object retrieved from the cache via `Entity.getEntityByName()`.
   
   Since `chart` is the cached reference, calling `setChartDetails()` on it replaces the cached entity's chart details. On the next request:
   1. `chart.getChartDetails()` returns the previously modified copy (with already-combined filters)
   2. `metrics.get("filter")` retrieves the already-combined filter string
   3. `combineFilters` wraps it again in another `bool.must`
   
   This causes nested `bool.must` accumulation across requests, eventually producing deeply nested or bloated Elasticsearch queries.
   
   **Fix**: Clone the `chart` object itself before modifying it, so the cached entity is never touched. Alternatively, work with the copies without calling `setChartDetails` — build a local chart-details map and pass it through a different code path.

   Suggested fix:
   // Clone the chart to avoid mutating the cached entity
   DataInsightCustomChart chartCopy = chart; // replace with proper deep copy
   HashMap<String, Object> chartDetailsCopy = new LinkedHashMap<>(chartDetails);
   chartDetailsCopy.put("metrics", metricsCopies);
   chartCopy = JsonUtils.readValue(
       JsonUtils.pojoToJson(chart), DataInsightCustomChart.class);
   chartCopy.setChartDetails(chartDetailsCopy);

Tip

Comment Gitar fix CI or enable auto-apply: gitar auto-apply:on

Options

Auto-apply is off → Gitar will not commit updates to this branch.
Display: compact → Showing less information.

Comment with these commands to change:

`Auto-apply`	`Compact`
`gitar auto-apply:on`	`gitar display:verbose`

_{Was this helpful? React with 👍 / 👎 | Gitar}

sonarqubecloud · 2026-03-05T11:29:31Z

Quality Gate passed for 'open-metadata-ingestion'

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

github-actions · 2026-03-06T07:19:03Z

Changes have been cherry-picked to the 1.12.2 branch.

manerow had a problem deploying to test March 2, 2026 10:50 — with GitHub Actions Error

manerow self-assigned this Mar 2, 2026

github-actions bot added backend safe to test Add this label to run secure Github workflows on PRs labels Mar 2, 2026

manerow added the To release Will cherry-pick this PR into the release branch label Mar 2, 2026

manerow temporarily deployed to test March 2, 2026 10:51 — with GitHub Actions Inactive

gitar-bot bot reviewed Mar 2, 2026

View reviewed changes

...a-service/src/main/java/org/openmetadata/service/jdbi3/DataInsightSystemChartRepository.java Show resolved Hide resolved

manerow changed the title ~~Fix #20052: Combine user filter with existing metric filter in Data Insights charts~~ Fix: Combine user filter with existing metric filter in Data Insights system charts Mar 2, 2026

manerow temporarily deployed to test March 2, 2026 14:02 — with GitHub Actions Inactive

manerow changed the title ~~Fix: Combine user filter with existing metric filter in Data Insights system charts~~ Fix #20052: Add tier field mapping with lowercase_normalizer to DI index and combine filters Mar 2, 2026

gitar-bot bot reviewed Mar 2, 2026

View reviewed changes

openmetadata-service/src/main/resources/dataInsights/elasticsearch/indexMappingsTemplate.json Show resolved Hide resolved

manerow had a problem deploying to test March 3, 2026 08:41 — with GitHub Actions Error

manerow temporarily deployed to test March 4, 2026 09:44 — with GitHub Actions Inactive

manerow force-pushed the fix/20052-tier-filter-data-insights branch from 5629bd1 to ead2a0b Compare March 4, 2026 17:42

manerow temporarily deployed to test March 4, 2026 17:42 — with GitHub Actions Inactive

manerow had a problem deploying to test March 4, 2026 17:42 — with GitHub Actions Failure

manerow temporarily deployed to test March 4, 2026 17:42 — with GitHub Actions Inactive

gitar-bot bot reviewed Mar 4, 2026

View reviewed changes

...a-service/src/main/java/org/openmetadata/service/jdbi3/DataInsightSystemChartRepository.java Show resolved Hide resolved

manerow added 2 commits March 5, 2026 09:10

Fix #20052: Combine user filter with existing metric filter in Data I…

7b4e2b6

…nsights charts

Fix #20052: Add tier field mapping with lowercase_normalizer to DI in…

b7d20a7

…dex templates

manerow force-pushed the fix/20052-tier-filter-data-insights branch from ead2a0b to b7d20a7 Compare March 5, 2026 08:10

manerow temporarily deployed to test March 5, 2026 08:10 — with GitHub Actions Inactive

manerow had a problem deploying to test March 5, 2026 08:10 — with GitHub Actions Failure

manerow temporarily deployed to test March 5, 2026 08:10 — with GitHub Actions Inactive

manerow had a problem deploying to test March 5, 2026 10:28 — with GitHub Actions Failure

manerow temporarily deployed to test March 5, 2026 15:42 — with GitHub Actions Inactive

harshach approved these changes Mar 5, 2026

View reviewed changes

manerow merged commit 3760859 into main Mar 6, 2026
49 of 57 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix #20052: Add tier field mapping with lowercase_normalizer to DI index and combine filters#26177

Fix #20052: Add tier field mapping with lowercase_normalizer to DI index and combine filters#26177
manerow merged 2 commits intomainfrom
fix/20052-tier-filter-data-insights

manerow commented Mar 2, 2026 •

edited

Loading

Uh oh!

Uh oh!

Uh oh!

Uh oh!

gitar-bot bot commented Mar 5, 2026 •

edited

Loading

Issue

Root Cause

`playwright-ci-postgresql (4, 6)` — Resource exhaustion after long parallel test run

`integration-tests-postgres-opensearch` / `integration-tests-mysql-elasticsearch` — DJL/vector service infrastructure failure

Other pre-existing/unrelated failures

Details

Uh oh!

sonarqubecloud bot commented Mar 5, 2026

Uh oh!

Uh oh!

github-actions bot commented Mar 6, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

manerow commented Mar 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Tier field mapping for DI index (main fix)

Combine user filter with existing metric filter (additional fix)

Note on testing and deployment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

gitar-bot bot commented Mar 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Issue

Root Cause

playwright-ci-postgresql (4, 6) — Resource exhaustion after long parallel test run

integration-tests-postgres-opensearch / integration-tests-mysql-elasticsearch — DJL/vector service infrastructure failure

Other pre-existing/unrelated failures

Details

Uh oh!

sonarqubecloud bot commented Mar 5, 2026

Quality Gate passed for 'open-metadata-ingestion'

Uh oh!

Uh oh!

github-actions bot commented Mar 6, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

manerow commented Mar 2, 2026 •

edited

Loading

gitar-bot bot commented Mar 5, 2026 •

edited

Loading

`playwright-ci-postgresql (4, 6)` — Resource exhaustion after long parallel test run

`integration-tests-postgres-opensearch` / `integration-tests-mysql-elasticsearch` — DJL/vector service infrastructure failure