
fix(search): resolve entity-specific aliases to canonical index names #27813

Merged
pmbrull merged 1 commit into main from fix-search-alias-bleed
Apr 29, 2026



@mohityadav766
Member

Closes #27761.

Problem

When the UI migrated from index=table_search_index to index=table, every search for tables started returning column docs as well. The cause is how aliases are wired through the parent/child graph in indexMapping.json: column_search_index is created with table as one of its aliases (because tableColumn lists "table" in its parentAliases), and ES resolves an alias to every index that carries it. A search against alias table therefore expands to both table_search_index AND column_search_index.

The same bleed affects every alias whose entity also acts as a parent of some other entity: testCase, testSuite, database, …
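A toy model of the mechanism, in plain Java with no search client. The assumptions here are hypothetical and for illustration only: each index carries its own entity alias plus every alias in its entity's parentAliases, and the index and alias names are illustrative rather than copied from indexMapping.json. The one behavior borrowed from ES is that an alias resolves to every index it is attached to.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy model of alias bleed; index names and alias sets are illustrative assumptions. */
final class AliasBleedModel {
  // index name -> aliases attached to that index at creation time
  private final Map<String, Set<String>> aliasesByIndex = new LinkedHashMap<>();

  /** Attaches the entity's own alias plus every alias listed in its parentAliases. */
  void createIndex(String indexName, String entityAlias, List<String> parentAliases) {
    Set<String> aliases = new TreeSet<>(parentAliases);
    aliases.add(entityAlias);
    aliasesByIndex.put(indexName, aliases);
  }

  /** Mirrors ES alias resolution: an alias targets every index that carries it. */
  Set<String> expand(String alias) {
    Set<String> targets = new TreeSet<>();
    aliasesByIndex.forEach(
        (index, aliases) -> {
          if (aliases.contains(alias)) {
            targets.add(index);
          }
        });
    return targets;
  }
}
```

If table_search_index is registered with alias table, and column_search_index with a hypothetical own alias tableColumn plus the parent alias table, then expand("table") returns both indexes while expand("tableColumn") returns only column_search_index. That one-way leak from a parent alias into a child index is the bleed described above.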

Fix

The fix is confined to one method: SearchRepository.getIndexOrAliasName(String) now resolves entity-specific aliases to their canonical *_search_index names, so we hand ES literal index names and bypass alias expansion entirely.

| Input | Output | Notes |
|---|---|---|
| `"table"` (entity alias) | `"<cluster>_table_search_index"` | The fix: the entity-specific alias resolves to its canonical index, so no child docs leak. |
| `"dataAsset"` / `"all"` (compound) | `"<cluster>_dataAsset"` / `"<cluster>_all"` | Passes through and ES expands it natively, so UI widgets that span the umbrella keep working. |
| `"table_search_index"` (canonical) | `"<cluster>_table_search_index"` | Legacy callers unchanged. |
| `"<cluster>_table_search_index"` | unchanged | Idempotent: internal code that hands a resolved value back is not double-prefixed. |
| `"table,"` / `","` | empty tokens dropped | An all-empty input is preserved, so ES surfaces "unknown index" instead of an empty-target failure. |

That's the entire change. No new query params, no schema changes, no method signatures touched. Every existing caller of getIndexOrAliasName picks up the new behavior:

  • /v1/search/query, /v1/search/export, /v1/search/preview, /v1/search/nlq/query: the resource layer already pre-resolves the index via this method.
  • /v1/search/aggregate (GET + POST), /v1/search/fieldQuery, /v1/search/entityTypeCounts: the ES/OS managers already resolve it via this method.
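The resolution rules from the input/output table above can be modeled in a few lines of plain Java. This is a hedged sketch, not the real SearchRepository code: the class name, the alias-to-index map (standing in for entityIndexMap), the `_` separator, and returning the raw input when every token is empty are all assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical, simplified model of the alias-resolution rules.
 * The alias map, the "_" separator and the all-empty behavior are assumptions.
 */
final class IndexNameResolverSketch {
  private final Map<String, String> entityAliasToIndex; // e.g. "table" -> "table_search_index"
  private final String clusterAlias; // null or "" means no cluster prefix

  IndexNameResolverSketch(Map<String, String> entityAliasToIndex, String clusterAlias) {
    this.entityAliasToIndex = entityAliasToIndex;
    this.clusterAlias = clusterAlias;
  }

  /** Resolves a comma-separated index/alias string; input is assumed non-null. */
  String resolve(String input) {
    List<String> resolved = new ArrayList<>();
    for (String token : input.split(",", -1)) {
      if (!token.isEmpty()) { // drop empty tokens from inputs like "table," or ","
        resolved.add(resolveToken(token));
      }
    }
    // All-empty input: hand it back unchanged so the search engine reports an unknown index.
    return resolved.isEmpty() ? input : String.join(",", resolved);
  }

  private String resolveToken(String token) {
    boolean usePrefix = clusterAlias != null && !clusterAlias.isEmpty();
    if (usePrefix && token.startsWith(clusterAlias + "_")) {
      return token; // already resolved: idempotent
    }
    // Entity alias -> canonical index; compound aliases and canonical names pass through.
    String name = entityAliasToIndex.getOrDefault(token, token);
    return usePrefix ? clusterAlias + "_" + name : name;
  }
}
```

With the map {"table" → "table_search_index"} and cluster alias "cluster", resolve("table,dataAsset") yields "cluster_table_search_index,cluster_dataAsset", and passing "cluster_table_search_index" back in returns it unchanged. With a null cluster alias, resolve("table") yields "table_search_index", which is the non-clustered path the review below asks to cover.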

Why no flag mechanism

A previous draft of this fix (PR #27762, fix-cluster-aliasing branch) added fetchParentsAliases / fetchChildAliases flags so callers could opt into selective expansion. An audit of every caller showed that nobody needs that option. UI sites with SearchIndex.TABLE / TOPIC / etc. want only that type; UI sites that need mixed entity types use SearchIndex.ALL / SearchIndex.DATA_ASSET, which are compound aliases unchanged by this fix; and internal Java callers (RBAC, propagation, DataInsightSystemChartRepository) pass entity-specific aliases for entity-specific operations. Adding a flag for a use case nobody has would only enlarge the API surface.

Test plan

  • Unit tests in SearchRepositoryBehaviorTest pin: entity-alias → canonical resolution; compound-alias passthrough; idempotent already-prefixed token; comma-separated independent resolution; empty-token / all-empty handling; existing indexNameHelpersRespectClusterAlias test (canonical-name input) unchanged.
  • Integration smoke against a running stack:
    curl '…/v1/search/query?q=*&index=table&size=20' -H "Authorization: Bearer $TOKEN" \
      | jq '[.hits.hits[]._source.entityType] | unique'
    # Expect: ["table"]   (no "column")
    curl '…/v1/search/query?q=*&index=dataAsset&size=20' -H "Authorization: Bearer $TOKEN" \
      | jq '[.hits.hits[]._source.entityType] | unique'
    # Expect: includes table, topic, dashboard, … (compound alias still works)
  • Multi-tenant deployment with clusterAlias configured: ?index=table resolves to <tenant>_table_search_index exactly once (no double-prefix).
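The multi-tenant check above relies on the prefix step being idempotent, since internal code may hand an already-resolved name back to getIndexOrAliasName. A minimal sketch contrasting a naive prefixer with an idempotent one; the class and method names are hypothetical, and the `_` separator is inferred from the `<tenant>_table_search_index` naming rather than taken from the real code.

```java
/** Hypothetical helpers contrasting naive and idempotent cluster prefixing. */
final class ClusterPrefixSketch {
  private ClusterPrefixSketch() {}

  /** Naive: always prepends the prefix, so a second pass double-prefixes. */
  static String naivePrefix(String clusterAlias, String indexName) {
    if (clusterAlias == null || clusterAlias.isEmpty()) {
      return indexName; // non-clustered deployment: no prefix at all
    }
    return clusterAlias + "_" + indexName;
  }

  /** Idempotent: leaves names that already carry the prefix untouched. */
  static String idempotentPrefix(String clusterAlias, String indexName) {
    if (clusterAlias == null || clusterAlias.isEmpty()) {
      return indexName; // non-clustered deployment: no prefix at all
    }
    String prefix = clusterAlias + "_";
    return indexName.startsWith(prefix) ? indexName : prefix + indexName;
  }
}
```

Feeding "tenant_table_search_index" back through naivePrefix("tenant", …) produces "tenant_tenant_table_search_index", while idempotentPrefix returns it unchanged. The startsWith check is a heuristic: an unprefixed name that happens to begin with `<cluster>_` would also be left alone, which is only safe if no alias or canonical index name collides with the cluster prefix.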

🤖 Generated with Claude Code

Closes #27761.

Background. When the UI started passing the alias `index=table` (instead
of the legacy `index=table_search_index`), every search for tables also
returned column docs. Root cause: column_search_index is created with
`table` as one of its aliases (because tableColumn lists "table" in
parentAliases in indexMapping.json), and ES resolves an alias to every
index that carries it, so alias `table` expands to both
`table_search_index` and `column_search_index`. The same shape exists
for every alias whose entities also act as a parent for some other
entity (testCase, testSuite, database, …).

Fix. Resolve entity-specific aliases at the API boundary into their
canonical `*_search_index` names so we send literal index names to ES,
bypassing alias expansion entirely. Compound aliases (`all`,
`dataAsset`) have no canonical index, so they pass through unchanged and
ES expands them natively, preserving the "everything under the
data-asset umbrella" use case that the UI's MyData / CuratedAssets /
search-bar widgets rely on.

The change is one method, `SearchRepository.getIndexOrAliasName(String)`:

  * entity-specific alias (`"table"`)         → `"<cluster>_table_search_index"`
  * compound alias (`"dataAsset"`, `"all"`)   → `"<cluster>_dataAsset"` / `"<cluster>_all"` (passes through)
  * canonical name (`"table_search_index"`)   → `"<cluster>_table_search_index"` (legacy callers)
  * already cluster-prefixed                  → returned unchanged (idempotent)
  * empty token from `"table,"` / `","`       → dropped, with all-empty input preserved

All four search/export/preview/NLQ resource paths already call this
method, and `searchByField`, `aggregate`, and `getEntityTypeCounts`
already call it inside the ES/OS managers. The fix therefore takes
effect across every endpoint that accepts an `index` parameter without
changing the public API surface: no new query params, no schema
changes, no signature churn.

No caller passes an entity-specific alias and expects child entity types
back. UI sites with `SearchIndex.TABLE`/`TOPIC`/etc. all want only that
type (asset-type filter chips, advanced-search builders, lineage
selection, alert rule scoping). UI sites that DO want mixed entity
types use `SearchIndex.ALL` or `SearchIndex.DATA_ASSET`, which are
compound aliases that this change leaves unchanged. Internal Java
callers (RBAC, propagation, DataInsightSystemChartRepository) pass
entity-specific aliases for entity-specific operations, so no leakage
is expected there either.

Tests pin: entity-alias → canonical resolution; compound-alias passes
through; idempotent prefix; comma-separated input; empty-token
handling; existing canonical-name behavior unchanged.
Copilot AI review requested due to automatic review settings April 29, 2026 10:11
github-actions Bot added the backend and safe to test labels Apr 29, 2026
Comment on lines +284 to +298
  @Test
  void getIndexOrAliasNameResolvesEntitySpecificAliasToCanonicalIndex() {
    assertEquals("cluster_table_search_index", repository.getIndexOrAliasName("table"));
    assertEquals("cluster_domain_search_index", repository.getIndexOrAliasName("domain"));
  }

  /**
   * Compound aliases like {@code "all"} and {@code "dataAsset"} have no entry in
   * {@code entityIndexMap} (they're meta-aliases registered against many entities at index
   * creation time). The resolver passes them through with the cluster prefix so ES expands them
   * natively — searching {@code dataAsset} should still surface every data-asset entity.
   */
  @Test
  void getIndexOrAliasNamePassesCompoundAliasesThroughForNativeESExpansion() {
    assertEquals("cluster_dataAsset", repository.getIndexOrAliasName("dataAsset"));

💡 Edge Case: No test coverage for non-clustered (null/empty clusterAlias) path

All new tests in SearchRepositoryBehaviorTest use clusterAlias = "cluster". The fix changes behavior for non-clustered deployments too: previously getIndexOrAliasName("table") with no cluster alias returned "table" (passthrough); now it returns "table_search_index" (the resolved canonical name). This is the core of the fix and arguably the more common deployment mode, but it has no dedicated test coverage. If IndexMapping.getIndexName(null) or the prefix == null path in resolveSingleAliasToken ever regresses, no test will catch it.

Suggested fix:

@Test
void getIndexOrAliasNameResolvesAliasWithoutClusterAlias() {
  SearchRepository noPrefixRepo =
      newRepository(
          Map.of(Entity.TABLE, TABLE_MAPPING, Entity.DOMAIN, DOMAIN_MAPPING),
          null);
  assertEquals("table_search_index", noPrefixRepo.getIndexOrAliasName("table"));
  assertEquals("dataAsset", noPrefixRepo.getIndexOrAliasName("dataAsset"));
  assertEquals("table_search_index", noPrefixRepo.getIndexOrAliasName("table_search_index"));
}


@gitar-bot

gitar-bot Bot commented Apr 29, 2026

Code Review 👍 Approved with suggestions 0 resolved / 1 findings

Resolves entity-specific aliases to canonical index names within the search repository. Add test coverage for the non-clustered (null or empty clusterAlias) execution path to ensure robust behavior.

💡 Edge Case: No test coverage for non-clustered (null/empty clusterAlias) path

📄 openmetadata-service/src/test/java/org/openmetadata/service/search/SearchRepositoryBehaviorTest.java:284-298

Contributor

Copilot AI left a comment


Pull request overview

This PR fixes search-result “bleed” caused by Elasticsearch/OpenSearch alias expansion by resolving entity-specific index aliases (e.g., table) to their canonical *_search_index names before issuing queries.

Changes:

  • Updated SearchRepository.getIndexOrAliasName to resolve entity-specific aliases via entityIndexMap to canonical index names (while leaving compound aliases like all/dataAsset to expand natively).
  • Added unit tests in SearchRepositoryBehaviorTest to cover entity-alias resolution, compound passthrough, idempotence, comma-separated handling, and empty-token behavior.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

| File | Description |
|---|---|
| openmetadata-service/src/main/java/org/openmetadata/service/search/SearchRepository.java | Implements entity-alias → canonical-index resolution and safer token parsing/prefixing logic. |
| openmetadata-service/src/test/java/org/openmetadata/service/search/SearchRepositoryBehaviorTest.java | Adds regression and behavior tests validating the new index/alias resolution rules. |


@github-actions
Contributor

🟡 Playwright Results — all passed (16 flaky)

✅ 3966 passed · ❌ 0 failed · 🟡 16 flaky · ⏭️ 86 skipped

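| Shard | Passed | Failed | Flaky | Skipped |
|---|---|---|---|---|
| 🟡 Shard 1 | 298 | 0 | 1 | 4 |
| 🟡 Shard 2 | 742 | 0 | 3 | 8 |
| 🟡 Shard 3 | 752 | 0 | 3 | 7 |
| 🟡 Shard 4 | 757 | 0 | 2 | 18 |
| 🟡 Shard 5 | 685 | 0 | 2 | 41 |
| 🟡 Shard 6 | 732 | 0 | 5 | 8 |

🟡 16 flaky test(s) (passed on retry)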
🟡 16 flaky test(s) (passed on retry)
  • Features/DataAssetRulesDisabled.spec.ts › Database Schema (shard 1, 1 retry)
  • Features/ActivityAPI.spec.ts › Activity event is created when description is updated (shard 2, 1 retry)
  • Features/ActivityAPI.spec.ts › Activity event shows the actor who made the change (shard 2, 1 retry)
  • Features/DataProductRenameConsolidation.spec.ts › Rename then change owner - assets should be preserved (shard 2, 1 retry)
  • Features/RTL.spec.ts › Verify Following widget functionality (shard 3, 1 retry)
  • Flow/AddRoleAndAssignToUser.spec.ts › Verify assigned role to new user (shard 3, 1 retry)
  • Flow/PersonaFlow.spec.ts › Set default persona for team should work properly (shard 3, 1 retry)
  • Pages/DataContracts.spec.ts › Create Data Contract and validate for Chart (shard 4, 1 retry)
  • Pages/DataContracts.spec.ts › Add and update Security and SLA tabs (shard 4, 1 retry)
  • Pages/EntityDataConsumer.spec.ts › Tier Add, Update and Remove (shard 5, 1 retry)
  • Pages/EntityDataConsumer.spec.ts › Tier Add, Update and Remove (shard 5, 1 retry)
  • Features/AutoPilot.spec.ts › Agents created by AutoPilot should be deleted (shard 6, 1 retry)
  • Pages/Lineage/DataAssetLineage.spec.ts › Column lineage for searchIndex -> searchIndex (shard 6, 1 retry)
  • Pages/Lineage/LineageFilters.spec.ts › Verify lineage schema filter selection (shard 6, 1 retry)
  • Pages/Lineage/LineageRightPanel.spec.ts › Verify custom properties tab IS visible for supported type: searchIndex (shard 6, 1 retry)
  • Pages/Users.spec.ts › Check permissions for Data Steward (shard 6, 1 retry)


How to debug locally
# Download playwright-test-results-<shard> artifact and unzip
npx playwright show-trace path/to/trace.zip    # view trace

@pmbrull pmbrull merged commit 8b0f4a7 into main Apr 29, 2026
60 of 64 checks passed
@pmbrull pmbrull deleted the fix-search-alias-bleed branch April 29, 2026 15:04
mohityadav766 added a commit that referenced this pull request Apr 29, 2026
…#27813)


(cherry picked from commit 8b0f4a7)

Labels

backend safe to test Add this label to run secure Github workflows on PRs

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Issue with Aliases in Search

3 participants