You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
The Data Insights index (di-data-assets-*) was missing an explicit tier field mapping in its component template
The tier field was dynamically mapped by Elasticsearch as plain text + keyword (case-sensitive), while the Advanced Search QueryBuilder returns lowercase values (e.g., tier.tier1) because entity search indexes use lowercase_normalizer
This casing mismatch (tier.tier1 ≠ Tier.Tier1) caused tier filters from the Custom Dashboard chart builder's Advanced Search to always return 0 results
Added explicit tier mapping with lowercase_normalizer on the keyword subfield to both Elasticsearch and OpenSearch DI index templates
Existing deployments will need a DI data reindex for the new mapping to take effect
Combine user filter with existing metric filter (additional fix)
DataInsightSystemChartRepository.getPreviewData() and listChartData() replaced the metric's pre-existing filter (e.g., exclude_tags_filter) with the user-provided filter instead of combining them
Several system charts (all summary cards and line charts like percentage_of_data_asset_with_description) have a built-in exclude_tags_filter that excludes tag, glossaryTerm, and dataProduct entity types — this exclusion was silently dropped when a user applied a tier/team filter
Added a combineFilters() method that wraps both the existing metric filter and the user-provided filter in a bool.must clause, ensuring both are applied simultaneously
Note on testing and deployment
The DI component template is only created when the data stream doesn't exist. Since the DI data streams already exist from a previous deployment, the template (with the new tier mapping) is never updated — createOrUpdateDataAssetsDataStream() skips template updates when data streams already exist.
To test the fix on an existing deployment, you need to re-run the Data Insights app with "Recreate Data Assets Index" enabled from the UI (Settings > Applications > Data Insights > Configure > enable Recreate). This will delete the old data streams and re-create them with the updated template.
manerow
changed the title
Fix #20052: Combine user filter with existing metric filter in Data Insights charts
Fix: Combine user filter with existing metric filter in Data Insights system charts
Mar 2, 2026
manerow
changed the title
Fix: Combine user filter with existing metric filter in Data Insights system charts
Fix #20052: Add tier field mapping with lowercase_normalizer to DI index and combine filters
Mar 2, 2026
🔍 CI failure analysis for b7d20a7: All CI failures are pre-existing infrastructure/flakiness issues unrelated to this PR's changes (tier field mapping and combineFilters logic in Data Insights).
Issue
All CI failures are unrelated to this PR's changes (Data Insights tier field mapping and combineFilters() in DataInsightSystemChartRepository.java).
Root Cause
playwright-ci-postgresql (4, 6) — Resource exhaustion after long parallel test run
After 691+ tests pass over ~1.4 hours, failures occur from browser context crashes and search API timeouts:
Entity.spec.ts:2137:7 - page.waitForResponse: Target page, context or browser has been closed
NotificationAlerts.spec.ts:440:1 - page.goto: Timeout 60000ms exceeded
Customproperties-part2.spec.ts:187:9 - Test timeout of 180000ms waiting for /api/v1/search/query response
This is consistent with gradual resource exhaustion during a long parallel test run, not a code regression. The PersonaFlow test data mismatch (PW Persona 16a290d0 vs PW Persona 3671cf98) further indicates race conditions in test setup.
integration-tests-postgres-opensearch / integration-tests-mysql-elasticsearch — DJL/vector service infrastructure failure
org.openmetadata.service.search.vector.client.DjlEmbeddingClient$EmbeddingGenerationException:
Unexpected error during DJL batch embedding generation
Caused by: java.lang.IllegalStateException: NDManager has been closed already.
and search indexing latency timeouts:
org.awaitility.core.ConditionTimeoutException: Condition with alias 'Wait for entity to appear in search index'
didn't complete within 1 minutes 30 seconds
This is a pre-existing CI infrastructure issue with the DJL NDManager lifecycle and connection pool shutdown during concurrent test execution.
Other pre-existing/unrelated failures
maven-collate-ci — External downstream build failure in openmetadata-collate repository.
maven-sonarcloud-ci — SonarCloud token/configuration issue (Project not found).
py-run-tests (3.10) — Pre-existing Oracle unit test failures (test_oracle.py); this PR doesn't touch the ingestion layer.
Integration Tests (shard-1, 3.10) — Transient network timeout downloading IBM iAccess ODBC driver during Docker build.
Test Report — Downstream consequence of integration test failures above.
Adds tier field mapping with lowercase_normalizer to the DI index and combines filters, resolving the filter accumulation issue. However, chart.setChartDetails() still mutates the cached entity object, which needs to be addressed before merge.
⚠️Bug: chart.setChartDetails() still mutates the cached entity object
The code now correctly creates shallow copies of chartDetails and metrics maps (good fix for the inner mutation), but still calls chart.setChartDetails(chartDetailsCopy) on the entity object retrieved from the cache via Entity.getEntityByName().
Since chart is the cached reference, calling setChartDetails() on it replaces the cached entity's chart details. On the next request:
chart.getChartDetails() returns the previously modified copy (with already-combined filters)
metrics.get("filter") retrieves the already-combined filter string
combineFilters wraps it again in another bool.must
This causes nested bool.must accumulation across requests, eventually producing deeply nested or bloated Elasticsearch queries.
Fix: Clone the chart object itself before modifying it, so the cached entity is never touched. Alternatively, work with the copies without calling setChartDetails — build a local chart-details map and pass it through a different code path.
Suggested fix
// Clone the chart to avoid mutating the cached entity
DataInsightCustomChart chartCopy = chart; // replace with proper deep copy
HashMap<String, Object> chartDetailsCopy = new LinkedHashMap<>(chartDetails);
chartDetailsCopy.put("metrics", metricsCopies);
chartCopy = JsonUtils.readValue(
JsonUtils.pojoToJson(chart), DataInsightCustomChart.class);
chartCopy.setChartDetails(chartDetailsCopy);
📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/DataInsightSystemChartRepository.java:726📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/DataInsightSystemChartRepository.java:762
Both getPreviewData and listChartData mutate the chartDetails map of the entity object in place via metrics.put("filter", combineFilters(...)). In listChartData, the entity is fetched via Entity.getEntityByName(...) which defaults to fromCache=true, meaning it returns a cached instance.
Because the cached entity is mutated, each call permanently modifies the filter in the cache. On the next call:
existingFilter is read from the now-mutated cached entity (which already contains the previously combined filter)
combineFilters wraps it again with the new user filter, creating nested bool.must clauses
This grows unboundedly with each request
For example, after 3 calls with filter F, the cached metric filter becomes: must[must[must[original, F], F], F]
This also causes cross-request leakage — one user's filter could appear in another user's query results if they hit the same cached entity.
Fix: Deep-copy the chart (or at least chartDetails) before mutating, so the cached entity remains unchanged. Alternatively, pass the combined filter downstream without modifying the entity's map.
✅ Edge Case: combineFilters discards non-query top-level fields from both inputs
📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/DataInsightSystemChartRepository.java:681
When both filters are non-empty and valid, combineFilters extracts only the query field from each filter JSON (lines 692-693) and builds a new result containing only {"query": ...}. Any other top-level fields present in either input (e.g., hypothetical size, _source, sort, aggs) would be silently dropped.
Currently, the known filter format (e.g., exclude_tags_filter in MigrationUtil.java) only uses a top-level query field, so this isn't a problem today. However, if the filter schema evolves to include additional fields, this method would silently discard them. Consider either:
Documenting that filters must only contain a query field, or
Copying non-query fields from the existing filter into the result to be more resilient to future changes.
This is a low-risk observation since the current contract appears consistent.
✅ Bug: Missing lowercase_normalizer definition will break index creation
📄 openmetadata-service/src/main/resources/dataInsights/elasticsearch/indexMappingsTemplate.json:26📄 openmetadata-service/src/main/resources/dataInsights/opensearch/indexMappingsTemplate.json:26
The tier.keyword field references "normalizer": "lowercase_normalizer", but this normalizer is never defined in the index settings. Neither indexMappingsTemplate.json (which only has max_ngram_diff: 17) nor indexSettingsTemplate.json (which only has lifecycle policy) include the required settings.analysis.normalizer section.
Elasticsearch/OpenSearch requires custom normalizers to be declared in the index settings before they can be referenced in field mappings. Without this, applying the template will fail with: "Unknown normalizer [lowercase_normalizer]".
Other index mappings in this codebase (e.g., openmetadata-spec/src/main/resources/elasticsearch/jp/topic_index_mapping.json) correctly define it:
This definition must be added to the settings section of both the Elasticsearch and OpenSearch indexMappingsTemplate.json files (or the corresponding settings templates).
🤖 Prompt for agents
Code Review: Adds tier field mapping with lowercase_normalizer to the DI index and combines filters, resolving the filter accumulation issue. However, chart.setChartDetails() still mutates the cached entity object, which needs to be addressed before merge.
1. ⚠️ Bug: chart.setChartDetails() still mutates the cached entity object
Files: openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/DataInsightSystemChartRepository.java:736, openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/DataInsightSystemChartRepository.java:780
The code now correctly creates shallow copies of `chartDetails` and `metrics` maps (good fix for the inner mutation), but still calls `chart.setChartDetails(chartDetailsCopy)` on the entity object retrieved from the cache via `Entity.getEntityByName()`.
Since `chart` is the cached reference, calling `setChartDetails()` on it replaces the cached entity's chart details. On the next request:
1. `chart.getChartDetails()` returns the previously modified copy (with already-combined filters)
2. `metrics.get("filter")` retrieves the already-combined filter string
3. `combineFilters` wraps it again in another `bool.must`
This causes nested `bool.must` accumulation across requests, eventually producing deeply nested or bloated Elasticsearch queries.
**Fix**: Clone the `chart` object itself before modifying it, so the cached entity is never touched. Alternatively, work with the copies without calling `setChartDetails` — build a local chart-details map and pass it through a different code path.
Suggested fix:
// Clone the chart to avoid mutating the cached entity
DataInsightCustomChart chartCopy = chart; // replace with proper deep copy
HashMap<String, Object> chartDetailsCopy = new LinkedHashMap<>(chartDetails);
chartDetailsCopy.put("metrics", metricsCopies);
chartCopy = JsonUtils.readValue(
JsonUtils.pojoToJson(chart), DataInsightCustomChart.class);
chartCopy.setChartDetails(chartDetailsCopy);
Tip
Comment Gitar fix CI or enable auto-apply: gitar auto-apply:on
Options
Auto-apply is off → Gitar will not commit updates to this branch. Display: compact → Showing less information.
Changes have been cherry-picked to the 1.12.2 branch.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
backendsafe to testAdd this label to run secure Github workflows on PRsTo releaseWill cherry-pick this PR into the release branch
2 participants
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Fixes #20052
Summary
Tier field mapping for DI index (main fix)
di-data-assets-*) was missing an explicittierfield mapping in its component templatetierfield was dynamically mapped by Elasticsearch as plaintext+keyword(case-sensitive), while the Advanced Search QueryBuilder returns lowercase values (e.g.,tier.tier1) because entity search indexes uselowercase_normalizertier.tier1≠Tier.Tier1) caused tier filters from the Custom Dashboard chart builder's Advanced Search to always return 0 resultstiermapping withlowercase_normalizeron thekeywordsubfield to both Elasticsearch and OpenSearch DI index templatesCombine user filter with existing metric filter (additional fix)
DataInsightSystemChartRepository.getPreviewData()andlistChartData()replaced the metric's pre-existing filter (e.g.,exclude_tags_filter) with the user-provided filter instead of combining thempercentage_of_data_asset_with_description) have a built-inexclude_tags_filterthat excludestag,glossaryTerm, anddataProductentity types — this exclusion was silently dropped when a user applied a tier/team filtercombineFilters()method that wraps both the existing metric filter and the user-provided filter in abool.mustclause, ensuring both are applied simultaneouslyNote on testing and deployment
The DI component template is only created when the data stream doesn't exist. Since the DI data streams already exist from a previous deployment, the template (with the new
tiermapping) is never updated —createOrUpdateDataAssetsDataStream()skips template updates when data streams already exist.To test the fix on an existing deployment, you need to re-run the Data Insights app with "Recreate Data Assets Index" enabled from the UI (Settings > Applications > Data Insights > Configure > enable Recreate). This will delete the old data streams and re-create them with the updated template.