Skip to content

Fix #20052: Add tier field mapping with lowercase_normalizer to DI index and combine filters#26177

Merged
manerow merged 2 commits intomainfrom
fix/20052-tier-filter-data-insights
Mar 6, 2026
Merged

Fix #20052: Add tier field mapping with lowercase_normalizer to DI index and combine filters#26177
manerow merged 2 commits intomainfrom
fix/20052-tier-filter-data-insights

Conversation

@manerow
Copy link
Contributor

@manerow manerow commented Mar 2, 2026

Fixes #20052

Summary

Tier field mapping for DI index (main fix)

  • The Data Insights index (di-data-assets-*) was missing an explicit tier field mapping in its component template
  • The tier field was dynamically mapped by Elasticsearch as plain text + keyword (case-sensitive), while the Advanced Search QueryBuilder returns lowercase values (e.g., tier.tier1) because entity search indexes use lowercase_normalizer
  • This casing mismatch (tier.tier1Tier.Tier1) caused tier filters from the Custom Dashboard chart builder's Advanced Search to always return 0 results
  • Added explicit tier mapping with lowercase_normalizer on the keyword subfield to both Elasticsearch and OpenSearch DI index templates
  • Existing deployments will need a DI data reindex for the new mapping to take effect
image

Combine user filter with existing metric filter (additional fix)

  • DataInsightSystemChartRepository.getPreviewData() and listChartData() replaced the metric's pre-existing filter (e.g., exclude_tags_filter) with the user-provided filter instead of combining them
  • Several system charts (all summary cards and line charts like percentage_of_data_asset_with_description) have a built-in exclude_tags_filter that excludes tag, glossaryTerm, and dataProduct entity types — this exclusion was silently dropped when a user applied a tier/team filter
  • Added a combineFilters() method that wraps both the existing metric filter and the user-provided filter in a bool.must clause, ensuring both are applied simultaneously

Note on testing and deployment

The DI component template is only created when the data stream doesn't exist. Since the DI data streams already exist from a previous deployment, the template (with the new tier mapping) is never updated — createOrUpdateDataAssetsDataStream() skips template updates when data streams already exist.

To test the fix on an existing deployment, you need to re-run the Data Insights app with "Recreate Data Assets Index" enabled from the UI (Settings > Applications > Data Insights > Configure > enable Recreate). This will delete the old data streams and re-create them with the updated template.

@manerow manerow changed the title Fix #20052: Combine user filter with existing metric filter in Data Insights charts Fix: Combine user filter with existing metric filter in Data Insights system charts Mar 2, 2026
@manerow manerow changed the title Fix: Combine user filter with existing metric filter in Data Insights system charts Fix #20052: Add tier field mapping with lowercase_normalizer to DI index and combine filters Mar 2, 2026
@gitar-bot
Copy link

gitar-bot bot commented Mar 5, 2026

🔍 CI failure analysis for b7d20a7: All CI failures are pre-existing infrastructure/flakiness issues unrelated to this PR's changes (tier field mapping and combineFilters logic in Data Insights).

Issue

All CI failures are unrelated to this PR's changes (Data Insights tier field mapping and combineFilters() in DataInsightSystemChartRepository.java).

Root Cause

playwright-ci-postgresql (4, 6) — Resource exhaustion after long parallel test run

After 691+ tests pass over ~1.4 hours, failures occur from browser context crashes and search API timeouts:

Entity.spec.ts:2137:7 - page.waitForResponse: Target page, context or browser has been closed
NotificationAlerts.spec.ts:440:1 - page.goto: Timeout 60000ms exceeded
Customproperties-part2.spec.ts:187:9 - Test timeout of 180000ms waiting for /api/v1/search/query response

This is consistent with gradual resource exhaustion during a long parallel test run, not a code regression. The PersonaFlow test data mismatch (PW Persona 16a290d0 vs PW Persona 3671cf98) further indicates race conditions in test setup.

integration-tests-postgres-opensearch / integration-tests-mysql-elasticsearch — DJL/vector service infrastructure failure

org.openmetadata.service.search.vector.client.DjlEmbeddingClient$EmbeddingGenerationException:
  Unexpected error during DJL batch embedding generation
Caused by: java.lang.IllegalStateException: NDManager has been closed already.

and search indexing latency timeouts:

org.awaitility.core.ConditionTimeoutException: Condition with alias 'Wait for entity to appear in search index'
  didn't complete within 1 minutes 30 seconds

This is a pre-existing CI infrastructure issue with the DJL NDManager lifecycle and connection pool shutdown during concurrent test execution.

Other pre-existing/unrelated failures

  • maven-collate-ci — External downstream build failure in openmetadata-collate repository.
  • maven-sonarcloud-ci — SonarCloud token/configuration issue (Project not found).
  • py-run-tests (3.10) — Pre-existing Oracle unit test failures (test_oracle.py); this PR doesn't touch the ingestion layer.
  • Integration Tests (shard-1, 3.10) — Transient network timeout downloading IBM iAccess ODBC driver during Docker build.
  • Test Report — Downstream consequence of integration test failures above.

Details

This PR only modifies:

  • DataInsightSystemChartRepository.javacombineFilters() method
  • dataInsights/elasticsearch/indexMappingsTemplate.jsontier field mapping
  • dataInsights/opensearch/indexMappingsTemplate.jsontier field mapping

None of the failing jobs involve Data Insights charts, tier field mappings, or the combineFilters logic.

Code Review ⚠️ Changes requested 3 resolved / 4 findings

Adds tier field mapping with lowercase_normalizer to the DI index and combines filters, resolving the filter accumulation issue. However, chart.setChartDetails() still mutates the cached entity object, which needs to be addressed before merge.

⚠️ Bug: chart.setChartDetails() still mutates the cached entity object

📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/DataInsightSystemChartRepository.java:736 📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/DataInsightSystemChartRepository.java:780

The code now correctly creates shallow copies of chartDetails and metrics maps (good fix for the inner mutation), but still calls chart.setChartDetails(chartDetailsCopy) on the entity object retrieved from the cache via Entity.getEntityByName().

Since chart is the cached reference, calling setChartDetails() on it replaces the cached entity's chart details. On the next request:

  1. chart.getChartDetails() returns the previously modified copy (with already-combined filters)
  2. metrics.get("filter") retrieves the already-combined filter string
  3. combineFilters wraps it again in another bool.must

This causes nested bool.must accumulation across requests, eventually producing deeply nested or bloated Elasticsearch queries.

Fix: Clone the chart object itself before modifying it, so the cached entity is never touched. Alternatively, work with the copies without calling setChartDetails — build a local chart-details map and pass it through a different code path.

Suggested fix
        // Clone the chart to avoid mutating the cached entity
        DataInsightCustomChart chartCopy = chart; // replace with proper deep copy
        HashMap<String, Object> chartDetailsCopy = new LinkedHashMap<>(chartDetails);
        chartDetailsCopy.put("metrics", metricsCopies);
        chartCopy = JsonUtils.readValue(
            JsonUtils.pojoToJson(chart), DataInsightCustomChart.class);
        chartCopy.setChartDetails(chartDetailsCopy);
✅ 3 resolved
Bug: In-place mutation of cached entity causes filter accumulation

📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/DataInsightSystemChartRepository.java:726 📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/DataInsightSystemChartRepository.java:762
Both getPreviewData and listChartData mutate the chartDetails map of the entity object in place via metrics.put("filter", combineFilters(...)). In listChartData, the entity is fetched via Entity.getEntityByName(...) which defaults to fromCache=true, meaning it returns a cached instance.

Because the cached entity is mutated, each call permanently modifies the filter in the cache. On the next call:

  1. existingFilter is read from the now-mutated cached entity (which already contains the previously combined filter)
  2. combineFilters wraps it again with the new user filter, creating nested bool.must clauses
  3. This grows unboundedly with each request

For example, after 3 calls with filter F, the cached metric filter becomes:
must[must[must[original, F], F], F]

This also causes cross-request leakage — one user's filter could appear in another user's query results if they hit the same cached entity.

Fix: Deep-copy the chart (or at least chartDetails) before mutating, so the cached entity remains unchanged. Alternatively, pass the combined filter downstream without modifying the entity's map.

Edge Case: combineFilters discards non-query top-level fields from both inputs

📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/DataInsightSystemChartRepository.java:681
When both filters are non-empty and valid, combineFilters extracts only the query field from each filter JSON (lines 692-693) and builds a new result containing only {"query": ...}. Any other top-level fields present in either input (e.g., hypothetical size, _source, sort, aggs) would be silently dropped.

Currently, the known filter format (e.g., exclude_tags_filter in MigrationUtil.java) only uses a top-level query field, so this isn't a problem today. However, if the filter schema evolves to include additional fields, this method would silently discard them. Consider either:

  1. Documenting that filters must only contain a query field, or
  2. Copying non-query fields from the existing filter into the result to be more resilient to future changes.

This is a low-risk observation since the current contract appears consistent.

Bug: Missing lowercase_normalizer definition will break index creation

📄 openmetadata-service/src/main/resources/dataInsights/elasticsearch/indexMappingsTemplate.json:26 📄 openmetadata-service/src/main/resources/dataInsights/opensearch/indexMappingsTemplate.json:26
The tier.keyword field references "normalizer": "lowercase_normalizer", but this normalizer is never defined in the index settings. Neither indexMappingsTemplate.json (which only has max_ngram_diff: 17) nor indexSettingsTemplate.json (which only has lifecycle policy) include the required settings.analysis.normalizer section.

Elasticsearch/OpenSearch requires custom normalizers to be declared in the index settings before they can be referenced in field mappings. Without this, applying the template will fail with: "Unknown normalizer [lowercase_normalizer]".

Other index mappings in this codebase (e.g., openmetadata-spec/src/main/resources/elasticsearch/jp/topic_index_mapping.json) correctly define it:

"settings": {
  "analysis": {
    "normalizer": {
      "lowercase_normalizer": {
        "type": "custom",
        "char_filter": [],
        "filter": ["lowercase"]
      }
    }
  }
}

This definition must be added to the settings section of both the Elasticsearch and OpenSearch indexMappingsTemplate.json files (or the corresponding settings templates).

🤖 Prompt for agents
Code Review: Adds tier field mapping with lowercase_normalizer to the DI index and combines filters, resolving the filter accumulation issue. However, chart.setChartDetails() still mutates the cached entity object, which needs to be addressed before merge.

1. ⚠️ Bug: chart.setChartDetails() still mutates the cached entity object
   Files: openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/DataInsightSystemChartRepository.java:736, openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/DataInsightSystemChartRepository.java:780

   The code now correctly creates shallow copies of `chartDetails` and `metrics` maps (good fix for the inner mutation), but still calls `chart.setChartDetails(chartDetailsCopy)` on the entity object retrieved from the cache via `Entity.getEntityByName()`.
   
   Since `chart` is the cached reference, calling `setChartDetails()` on it replaces the cached entity's chart details. On the next request:
   1. `chart.getChartDetails()` returns the previously modified copy (with already-combined filters)
   2. `metrics.get("filter")` retrieves the already-combined filter string
   3. `combineFilters` wraps it again in another `bool.must`
   
   This causes nested `bool.must` accumulation across requests, eventually producing deeply nested or bloated Elasticsearch queries.
   
   **Fix**: Clone the `chart` object itself before modifying it, so the cached entity is never touched. Alternatively, work with the copies without calling `setChartDetails` — build a local chart-details map and pass it through a different code path.

   Suggested fix:
   // Clone the chart to avoid mutating the cached entity
   DataInsightCustomChart chartCopy = chart; // replace with proper deep copy
   HashMap<String, Object> chartDetailsCopy = new LinkedHashMap<>(chartDetails);
   chartDetailsCopy.put("metrics", metricsCopies);
   chartCopy = JsonUtils.readValue(
       JsonUtils.pojoToJson(chart), DataInsightCustomChart.class);
   chartCopy.setChartDetails(chartDetailsCopy);

Tip

Comment Gitar fix CI or enable auto-apply: gitar auto-apply:on

Options

Auto-apply is off → Gitar will not commit updates to this branch.
Display: compact → Showing less information.

Comment with these commands to change:

Auto-apply Compact
gitar auto-apply:on         
gitar display:verbose         

Was this helpful? React with 👍 / 👎 | Gitar

@sonarqubecloud
Copy link

sonarqubecloud bot commented Mar 5, 2026

@manerow manerow merged commit 3760859 into main Mar 6, 2026
49 of 57 checks passed
@github-actions
Copy link
Contributor

github-actions bot commented Mar 6, 2026

Changes have been cherry-picked to the 1.12.2 branch.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backend safe to test Add this label to run secure Github workflows on PRs To release Will cherry-pick this PR into the release branch

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Tier Filter not working in Insights Page Advance Search

2 participants