fix(search): resolve entity-specific aliases to canonical index names #27813
Closes #27761.

Background. When the UI started passing the alias `index=table` (instead of the legacy `index=table_search_index`), every search for tables also returned column docs. Root cause: ES alias expansion is bidirectional through the parent/child graph in indexMapping.json. `column_search_index` is created with `table` as one of its aliases (because tableColumn lists "table" in parentAliases), so when ES sees alias `table` it expands it to both `table_search_index` and `column_search_index`. The same shape exists for every alias whose entities also act as a parent for some other entity (testCase, testSuite, database, …).

Fix. Resolve entity-specific aliases at the API boundary into their canonical `*_search_index` names so we send literal index names to ES, bypassing alias expansion entirely. Compound aliases (`all`, `dataAsset`) have no canonical index, so they pass through unchanged and ES expands them natively, preserving the "everything under the data-asset umbrella" use case that the UI's MyData / CuratedAssets / search-bar widgets rely on.

The change is one method, `SearchRepository.getIndexOrAliasName(String)`:

* entity-specific alias (`"table"`) → `"<cluster>_table_search_index"`
* compound alias (`"dataAsset"`, `"all"`) → `"<cluster>_dataAsset"` (passes through)
* canonical name (`"table_search_index"`) → `"<cluster>_table_search_index"` (legacy callers)
* already cluster-prefixed → returned unchanged (idempotent)
* empty token from `"table,"` / `","` → dropped, with all-empty input preserved

All four search/export/preview/NLQ resource paths already call this method; `searchByField`, `aggregate`, and `getEntityTypeCounts` already call it inside the ES/OS managers. So the fix takes effect across every endpoint that accepts an `index` parameter without changing the public API surface: no new query params, no schema changes, no signature churn.

No caller passes an entity-specific alias and expects child entity types back: UI sites with `SearchIndex.TABLE`/`TOPIC`/etc. all want only that type (asset-type filter chips, advanced-search builders, lineage selection, alert rule scoping). UI sites that DO want mixed entity types use `SearchIndex.ALL` or `SearchIndex.DATA_ASSET`, which are compound aliases that this change leaves unchanged. Internal Java callers (RBAC, propagation, DataInsightSystemChartRepository) pass entity-specific aliases for entity-specific operations; no leakage is expected there either.

Tests pin: entity-alias → canonical resolution; compound-alias passthrough; idempotent prefix; comma-separated input; empty-token handling; existing canonical-name behavior unchanged.
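The resolution rules listed in the commit message can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the class `AliasResolverSketch`, its static `resolve` method, and the plain `Map<String, String>` standing in for `entityIndexMap` are all hypothetical, and returning the raw input when every token is empty is only one reading of "all-empty input preserved".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; the real logic lives in SearchRepository.getIndexOrAliasName.
class AliasResolverSketch {

  /**
   * canonicalByAlias is an assumed stand-in for entityIndexMap
   * (entity alias -> canonical index name). clusterAlias may be null or empty.
   */
  static String resolve(Map<String, String> canonicalByAlias, String clusterAlias, String input) {
    if (input == null) {
      return null;
    }
    String prefix = (clusterAlias == null || clusterAlias.isEmpty()) ? "" : clusterAlias + "_";
    List<String> resolved = new ArrayList<>();
    for (String raw : input.split(",")) {
      String token = raw.trim();
      if (token.isEmpty()) {
        continue; // drop empty tokens such as the one after "table,"
      }
      // Entity aliases map to their canonical index; compound aliases and
      // canonical names are not in the map and pass through as-is.
      String name = canonicalByAlias.getOrDefault(token, token);
      // Add the cluster prefix only once, so already-prefixed names stay unchanged.
      boolean needsPrefix = !prefix.isEmpty() && !name.startsWith(prefix);
      resolved.add(needsPrefix ? prefix + name : name);
    }
    // Assumption: when nothing survives (e.g. ","), hand back the input untouched.
    return resolved.isEmpty() ? input : String.join(",", resolved);
  }

  public static void main(String[] args) {
    Map<String, String> map =
        Map.of("table", "table_search_index", "domain", "domain_search_index");
    System.out.println(resolve(map, "cluster", "table"));     // cluster_table_search_index
    System.out.println(resolve(map, "cluster", "dataAsset")); // cluster_dataAsset
    System.out.println(resolve(map, null, "table"));          // table_search_index
  }
}
```

Because the map lookup happens before prefixing, the same code path covers both clustered and non-clustered deployments; only the prefix differs.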
    @Test
    void getIndexOrAliasNameResolvesEntitySpecificAliasToCanonicalIndex() {
      assertEquals("cluster_table_search_index", repository.getIndexOrAliasName("table"));
      assertEquals("cluster_domain_search_index", repository.getIndexOrAliasName("domain"));
    }

    /**
     * Compound aliases like {@code "all"} and {@code "dataAsset"} have no entry in
     * {@code entityIndexMap} (they're meta-aliases registered against many entities at index
     * creation time). The resolver passes them through with the cluster prefix so ES expands them
     * natively — searching {@code dataAsset} should still surface every data-asset entity.
     */
    @Test
    void getIndexOrAliasNamePassesCompoundAliasesThroughForNativeESExpansion() {
      assertEquals("cluster_dataAsset", repository.getIndexOrAliasName("dataAsset"));
💡 Edge Case: No test coverage for non-clustered (null/empty clusterAlias) path
All new tests in SearchRepositoryBehaviorTest use clusterAlias = "cluster". The fix changes behavior for non-clustered deployments too: previously getIndexOrAliasName("table") with no cluster alias returned "table" (passthrough), now it returns "table_search_index" (resolved canonical name). This is the core of the fix and arguably the more common deployment mode, but it has zero dedicated test coverage. If IndexMapping.getIndexName(null) or the prefix == null path in resolveSingleAliasToken ever regresses, no test will catch it.
Suggested fix:
    @Test
    void getIndexOrAliasNameResolvesAliasWithoutClusterAlias() {
      SearchRepository noPrefixRepo =
          newRepository(
              Map.of(Entity.TABLE, TABLE_MAPPING, Entity.DOMAIN, DOMAIN_MAPPING),
              null);
      assertEquals("table_search_index", noPrefixRepo.getIndexOrAliasName("table"));
      assertEquals("dataAsset", noPrefixRepo.getIndexOrAliasName("dataAsset"));
      assertEquals("table_search_index", noPrefixRepo.getIndexOrAliasName("table_search_index"));
    }
Code Review 👍 Approved with suggestions · 0 resolved / 1 finding
Resolves entity-specific aliases to canonical index names within the search repository. Add test coverage for the non-clustered (null or empty clusterAlias) execution path to ensure robust behavior.
Pull request overview
This PR fixes search-result “bleed” caused by Elasticsearch/OpenSearch alias expansion by resolving entity-specific index aliases (e.g., `table`) to their canonical `*_search_index` names before issuing queries.
Changes:
- Updated `SearchRepository.getIndexOrAliasName` to resolve entity-specific aliases via `entityIndexMap` to canonical index names (while leaving compound aliases like `all`/`dataAsset` to expand natively).
- Added unit tests in `SearchRepositoryBehaviorTest` to cover entity-alias resolution, compound passthrough, idempotence, comma-separated handling, and empty-token behavior.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| openmetadata-service/src/main/java/org/openmetadata/service/search/SearchRepository.java | Implements entity-alias → canonical-index resolution and safer token parsing/prefixing logic. |
| openmetadata-service/src/test/java/org/openmetadata/service/search/SearchRepositoryBehaviorTest.java | Adds regression and behavior tests validating the new index/alias resolution rules. |

🟡 Playwright Results: all passed (16 flaky)
✅ 3966 passed · ❌ 0 failed · 🟡 16 flaky · ⏭️ 86 skipped
🟡 16 flaky test(s) (passed on retry)
How to debug locally:

    # Download the playwright-test-results-<shard> artifact and unzip it
    npx playwright show-trace path/to/trace.zip    # view trace
…#27813) (cherry picked from commit 8b0f4a7)



Closes #27761.

Problem

When the UI migrated from `index=table_search_index` to `index=table`, every search for tables started returning column docs too. ES alias expansion is bidirectional through the parent/child graph in `indexMapping.json`: `column_search_index` is created with `table` as one of its aliases (because `tableColumn` lists `"table"` in its `parentAliases`), so when ES resolves alias `table` it expands to both `table_search_index` AND `column_search_index`.

The same bleed exists for every alias whose entities also act as a parent for some other entity: `testCase`, `testSuite`, `database`, …

Fix

One method. `SearchRepository.getIndexOrAliasName(String)` now resolves entity-specific aliases to their canonical `*_search_index` names so we hand ES literal index names and bypass alias expansion entirely.

| Input | Resolves to |
|---|---|
| `"table"` (entity alias) | `"<cluster>_table_search_index"` |
| `"dataAsset"` / `"all"` (compound) | `"<cluster>_dataAsset"` |
| `"table_search_index"` (canonical) | `"<cluster>_table_search_index"` |
| `"<cluster>_table_search_index"` | returned unchanged (idempotent) |
| `"table,"` / `","` | empty tokens dropped; all-empty input preserved |

That's the entire change. No new query params, no schema changes, no method signatures touched. Every existing caller of `getIndexOrAliasName` picks up the new behavior:

- `/v1/search/query`, `/v1/search/export`, `/v1/search/preview`, `/v1/search/nlq/query`: the resource layer already pre-resolves via this method.
- `/v1/search/aggregate` (GET + POST), `/v1/search/fieldQuery`, `/v1/search/entityTypeCounts`: the ES/OS managers already resolve via this method.

Why no flag mechanism

A previous draft of this fix (PR #27762, `fix-cluster-aliasing` branch) added `fetchParentsAliases` / `fetchChildAliases` flags so callers could opt into selective expansion. An audit of every caller showed nobody actually needs that: UI sites with `SearchIndex.TABLE` / `TOPIC` / etc. want only that type; UI sites that need mixed entity types use `SearchIndex.ALL` / `SearchIndex.DATA_ASSET` (compound aliases, unchanged by this fix); internal Java callers (RBAC, propagation, `DataInsightSystemChartRepository`) pass entity-specific aliases for entity-specific operations. Adding a flag for a use case nobody has just enlarges the API surface.

Test plan

- `SearchRepositoryBehaviorTest` pin: entity-alias → canonical resolution; compound-alias passthrough; idempotent already-prefixed token; comma-separated independent resolution; empty-token / all-empty handling; existing `indexNameHelpersRespectClusterAlias` test (canonical-name input) unchanged.
- `clusterAlias` configured: `?index=table` resolves to `<tenant>_table_search_index` exactly once (no double-prefix).

🤖 Generated with Claude Code
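
The bleed described under Problem can be modeled with a small in-memory lookup. This is only a toy sketch of how a name matches every index that carries it as an alias; it does not call Elasticsearch. The class `AliasExpansionSketch`, the `expand` helper, and the alias sets are illustrative assumptions, except that `column_search_index` carrying the `table` alias comes from the description above.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model of alias expansion; names and alias sets are illustrative only.
class AliasExpansionSketch {

  /** Returns every index that either is named `name` or has `name` as an alias. */
  static Set<String> expand(Map<String, Set<String>> aliasesByIndex, String name) {
    Set<String> targets = new TreeSet<>();
    for (Map.Entry<String, Set<String>> entry : aliasesByIndex.entrySet()) {
      if (entry.getKey().equals(name) || entry.getValue().contains(name)) {
        targets.add(entry.getKey());
      }
    }
    return targets;
  }

  public static void main(String[] args) {
    Map<String, Set<String>> aliasesByIndex =
        Map.of(
            "table_search_index", Set.of("table", "all"),
            // The child index also carries its parent's alias, which causes the bleed.
            "column_search_index", Set.of("tableColumn", "table", "all"));

    // Querying by alias hits both indices.
    System.out.println(expand(aliasesByIndex, "table"));
    // Querying by the literal canonical name hits only the table index.
    System.out.println(expand(aliasesByIndex, "table_search_index"));
  }
}
```

In this model, sending the literal index name sidesteps the shared alias entirely, which is the effect the fix relies on.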