Fix: Resolve text fields to .keyword for ES/OS sorting and aggregation #27103
Fix: Resolve text fields to .keyword for ES/OS sorting and aggregation #27103mohityadav766 merged 7 commits intomainfrom
Conversation
Text fields like `name` and `displayName` are mapped as `text` type in ES/OS indexes (for full-text search) with `.keyword` sub-fields for sorting and aggregation. When these bare field names were passed to sort or terms aggregation builders, ES/OS rejected them with `illegal_argument_exception` because text fields cannot be used for per-document field data operations. Add centralized `resolveFieldForSortOrAggregation()` in SearchSourceBuilderFactory that converts known text fields to their `.keyword` sub-fields, and apply it across both search managers and aggregation builders (terms, top_hits) for both ES and OpenSearch. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
...tadata-service/src/main/java/org/openmetadata/service/search/SearchSourceBuilderFactory.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Pull request overview
This PR addresses search failures caused by sorting/aggregating on text fields (notably name) by resolving sort/aggregation field names to appropriate keyword sub-fields and applying existing owner-field remaps consistently across Elasticsearch and OpenSearch implementations.
Changes:
- Add
SearchSourceBuilderFactory.resolveFieldForSortOrAggregation(...)and unit tests to enforce field resolution rules. - Update Elasticsearch/OpenSearch aggregation builders to use the new resolver (instead of only the owner remap).
- Update Elasticsearch/OpenSearch search managers to resolve sort fields (e.g.,
name→name.keyword) before applying sorting.
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| openmetadata-service/src/test/java/org/openmetadata/service/search/SearchFieldResolutionTest.java | Adds unit coverage for the new sort/aggregation field resolution logic. |
| openmetadata-service/src/main/java/org/openmetadata/service/search/SearchSourceBuilderFactory.java | Introduces resolveFieldForSortOrAggregation and a small allowlist for root text fields requiring .keyword. |
| openmetadata-service/src/main/java/org/openmetadata/service/search/opensearch/OpenSearchSourceBuilderFactory.java | Uses the resolver for configured terms aggregations. |
| openmetadata-service/src/main/java/org/openmetadata/service/search/opensearch/OpenSearchSearchManager.java | Resolves sort field before sorting and adjusts unmapped type handling. |
| openmetadata-service/src/main/java/org/openmetadata/service/search/opensearch/OpenSearchAggregationManager.java | Resolves aggregation field before execution. |
| openmetadata-service/src/main/java/org/openmetadata/service/search/opensearch/aggregations/OpenTopHitsAggregations.java | Resolves sort_field param for top-hits aggregation. |
| openmetadata-service/src/main/java/org/openmetadata/service/search/opensearch/aggregations/OpenTermsAggregations.java | Resolves field param for terms aggregation. |
| openmetadata-service/src/main/java/org/openmetadata/service/search/elasticsearch/ElasticSearchSourceBuilderFactory.java | Uses the resolver for configured terms aggregations. |
| openmetadata-service/src/main/java/org/openmetadata/service/search/elasticsearch/ElasticSearchSearchManager.java | Resolves sort field before sorting and adjusts unmapped type handling. |
| openmetadata-service/src/main/java/org/openmetadata/service/search/elasticsearch/ElasticSearchAggregationManager.java | Resolves aggregation field before execution. |
| openmetadata-service/src/main/java/org/openmetadata/service/search/elasticsearch/aggregations/ElasticTopHitsAggregations.java | Resolves sort_field param for top-hits aggregation. |
| openmetadata-service/src/main/java/org/openmetadata/service/search/elasticsearch/aggregations/ElasticTermsAggregations.java | Resolves field param for terms aggregation. |
.../src/main/java/org/openmetadata/service/search/elasticsearch/ElasticSearchSearchManager.java
Outdated
Show resolved
Hide resolved
...ervice/src/main/java/org/openmetadata/service/search/opensearch/OpenSearchSearchManager.java
Outdated
Show resolved
Hide resolved
...tadata-service/src/main/java/org/openmetadata/service/search/SearchSourceBuilderFactory.java
Show resolved
Hide resolved
- Support nested text fields (e.g. columns.name → columns.name.keyword) by extracting the leaf segment for whitelist lookup instead of skipping all dotted paths - Fix unmappedType="integer" for remapped owner fields (ownerDisplayName, ownerName) by introducing KEYWORD_SORT_FIELDS set used alongside the .keyword suffix check in both search managers - Remove dead remapAggregationField() method (no callers remained) - Add 2 new test cases: nested text field resolution and flat keyword sort field passthrough Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
...tadata-service/src/main/java/org/openmetadata/service/search/SearchSourceBuilderFactory.java
Outdated
Show resolved
Hide resolved
| String remapped = AGGREGATION_FIELD_REMAPS.getOrDefault(field, field); | ||
| if (!remapped.equals(field)) { | ||
| return remapped; | ||
| } | ||
| if (field.startsWith("_")) { | ||
| return field; | ||
| } | ||
| if (field.endsWith(".keyword")) { | ||
| return field; | ||
| } | ||
| String leaf = field.contains(".") ? field.substring(field.lastIndexOf('.') + 1) : field; | ||
| if (TEXT_FIELDS_WITH_KEYWORD.contains(leaf)) { | ||
| return field + ".keyword"; | ||
| } |
There was a problem hiding this comment.
resolveFieldForSortOrAggregation appends .keyword based solely on the leaf segment (name/displayName). That will also rewrite fields that are not text (e.g., nested reference fields like owners.name / service.name are mapped as keyword in some indices and may not have a .keyword multi-field), causing sort/aggregation to reference a non-existent field and reintroduce illegal_argument_exception. Consider switching from a leaf-based rule to an explicit allowlist of known text paths (e.g., name, displayName, columns.name, parent.displayName, etc.) and/or performing remapping/keyword-qualification in a way that cannot produce a field that doesn’t exist across supported index mappings.
| @Test | ||
| void preservesFlatKeywordSortFields() { | ||
| assertEquals( | ||
| "ownerDisplayName", | ||
| SearchSourceBuilderFactory.resolveFieldForSortOrAggregation("ownerDisplayName")); | ||
| assertEquals( | ||
| "ownerName", SearchSourceBuilderFactory.resolveFieldForSortOrAggregation("ownerName")); | ||
| } | ||
|
|
||
| @Test | ||
| void remapsOwnerFields() { | ||
| assertEquals( | ||
| "ownerDisplayName", | ||
| SearchSourceBuilderFactory.resolveFieldForSortOrAggregation("owners.displayName.keyword")); | ||
| assertEquals( | ||
| "ownerName", | ||
| SearchSourceBuilderFactory.resolveFieldForSortOrAggregation("owners.name.keyword")); | ||
| } |
There was a problem hiding this comment.
The new resolution logic can change dotted paths whose leaf is name/displayName (e.g., owners.name, service.name). The current tests only cover the .keyword-qualified owner paths and a couple of known text fields, but don’t assert what should happen for common keyword reference fields without .keyword. Adding explicit test cases for inputs like owners.name / owners.displayName (and potentially service.name) would help prevent regressions where the resolver generates a non-existent *.keyword field for indices that don’t define that multi-field.
Code Review ✅ Approved 2 resolved / 2 findingsResolves text fields to .keyword for Elasticsearch/OpenSearch sorting and aggregation, addressing nested text field bypass issues and owner field remap gaps. No open issues remain. ✅ 2 resolved✅ Edge Case: Nested text fields bypass .keyword resolution due to dot check
✅ Edge Case: Owner field remap missed when input lacks .keyword suffix
OptionsDisplay: compact → Showing less information. Comment with these commands to change:
Was this helpful? React with 👍 / 👎 | Gitar |
|
🟡 Playwright Results — all passed (21 flaky)✅ 3597 passed · ❌ 0 failed · 🟡 21 flaky · ⏭️ 207 skipped
🟡 21 flaky test(s) (passed on retry)
How to debug locally# Download playwright-test-results-<shard> artifact and unzip
npx playwright show-trace path/to/trace.zip # view trace |
|
Failed to cherry-pick changes to the 1.12.5 branch. |
…and aggregation (#27103) * Fix: Resolve text fields to .keyword for ES/OS sorting and aggregation Text fields like `name` and `displayName` are mapped as `text` type in ES/OS indexes (for full-text search) with `.keyword` sub-fields for sorting and aggregation. When these bare field names were passed to sort or terms aggregation builders, ES/OS rejected them with `illegal_argument_exception` because text fields cannot be used for per-document field data operations. Add centralized `resolveFieldForSortOrAggregation()` in SearchSourceBuilderFactory that converts known text fields to their `.keyword` sub-fields, and apply it across both search managers and aggregation builders (terms, top_hits) for both ES and OpenSearch. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Fix: Address code review comments on resolveFieldForSortOrAggregation - Support nested text fields (e.g. columns.name → columns.name.keyword) by extracting the leaf segment for whitelist lookup instead of skipping all dotted paths - Fix unmappedType="integer" for remapped owner fields (ownerDisplayName, ownerName) by introducing KEYWORD_SORT_FIELDS set used alongside the .keyword suffix check in both search managers - Remove dead remapAggregationField() method (no callers remained) - Add 2 new test cases: nested text field resolution and flat keyword sort field passthrough Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Fix: Address round 2 review comments on resolveFieldForSortOrAggregation - Expand AGGREGATION_FIELD_REMAPS to cover bare owner paths (owners.name, owners.displayName) so they remap to ownerName/ownerDisplayName instead of incorrectly gaining a .keyword suffix - Replace leaf-based extraction with exact-path matching so only root-level name/displayName fields get .keyword appended; dotted paths like service.name or columns.name now pass through unchanged - Remove convertsNestedTextFieldsToKeyword test (no longer valid behavior) - Add bare owner remap tests and doesNotAppendKeywordToNestedTextPaths test
Reverted : Cherry Pick : Fix: Resolve text fields to .keyword for ES/OS sorting and aggregation (#27103)
… case search indexing Commit 2839bc2 cherry-picked PR #27153 ("Improve memory usage for reindex") to 1.12.5, bundled with the revert of PR #27103. PR #27153 is still open/unmerged on main and introduced 6 extra file changes that were never meant for 1.12.5. The bulk sink change from `toJsonData(json_string)` to `toJsonData(map_object)` bypassed Jackson's WRITE_DATES_AS_TIMESTAMPS=false setting. During reindex, java.util.Date fields (like tags.appliedAt) were sent as raw epoch Longs instead of ISO strings. OpenSearch dynamically mapped tags.appliedAt as "long". Later, real-time indexing (test case creation via API) sent appliedAt as ISO string — OpenSearch rejected it with mapper_parsing_exception. Test cases with tags were created in DB but silently never indexed in search. This restores all 6 files to their pre-2839bc259f state, matching main. Files restored: - ElasticSearchBulkSink.java — back to pojoToJson → string → toJsonData(string) - OpenSearchBulkSink.java — same - EsUtils.java — removed toJsonData(Object) overload - OsUtils.java — same - SearchIndexExecutor.java — removed contextDataCache, Thread.MIN_PRIORITY - ReindexingMetrics.java — removed counter caching Requires reindex after deployment.
…e test case search indexing Commit 2839bc2 cherry-picked PR #27153 ("Improve memory usage for reindex") to 1.12.5, bundled with the revert of PR #27103. PR #27153 is still open/unmerged on main and introduced extra file changes that were never meant for 1.12.5. The bulk sink change from toJsonData(json_string) to toJsonData(map_object) bypassed Jackson's WRITE_DATES_AS_TIMESTAMPS=false setting. During reindex, java.util.Date fields (like tags.appliedAt) were sent as raw epoch Longs instead of ISO strings. OpenSearch dynamically mapped tags.appliedAt as "long". Later, real-time indexing (test case creation via API) sent appliedAt as ISO string — OpenSearch rejected it with mapper_parsing_exception. Test cases with tags were created in DB but silently never indexed in search. Changes: - ElasticSearchBulkSink/OpenSearchBulkSink: reverted serialization back to pojoToJson → string → toJsonData(string), kept Thread.MIN_PRIORITY - EsUtils/OsUtils: removed toJsonData(Object) overload - SearchIndexExecutor: removed contextDataCache, kept Thread.MIN_PRIORITY - ReindexingMetrics: removed counter caching Requires reindex after deployment.
…e test case search indexing (#27202) Commit 2839bc2 cherry-picked PR #27153 ("Improve memory usage for reindex") to 1.12.5, bundled with the revert of PR #27103. PR #27153 is still open/unmerged on main and introduced extra file changes that were never meant for 1.12.5. The bulk sink change from toJsonData(json_string) to toJsonData(map_object) bypassed Jackson's WRITE_DATES_AS_TIMESTAMPS=false setting. During reindex, java.util.Date fields (like tags.appliedAt) were sent as raw epoch Longs instead of ISO strings. OpenSearch dynamically mapped tags.appliedAt as "long". Later, real-time indexing (test case creation via API) sent appliedAt as ISO string — OpenSearch rejected it with mapper_parsing_exception. Test cases with tags were created in DB but silently never indexed in search. Changes: - ElasticSearchBulkSink/OpenSearchBulkSink: reverted serialization back to pojoToJson → string → toJsonData(string), kept Thread.MIN_PRIORITY - EsUtils/OsUtils: removed toJsonData(Object) overload - SearchIndexExecutor: removed contextDataCache, kept Thread.MIN_PRIORITY - ReindexingMetrics: removed counter caching Requires reindex after deployment.
open-metadata#27103) * Fix: Resolve text fields to .keyword for ES/OS sorting and aggregation Text fields like `name` and `displayName` are mapped as `text` type in ES/OS indexes (for full-text search) with `.keyword` sub-fields for sorting and aggregation. When these bare field names were passed to sort or terms aggregation builders, ES/OS rejected them with `illegal_argument_exception` because text fields cannot be used for per-document field data operations. Add centralized `resolveFieldForSortOrAggregation()` in SearchSourceBuilderFactory that converts known text fields to their `.keyword` sub-fields, and apply it across both search managers and aggregation builders (terms, top_hits) for both ES and OpenSearch. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Fix: Address code review comments on resolveFieldForSortOrAggregation - Support nested text fields (e.g. columns.name → columns.name.keyword) by extracting the leaf segment for whitelist lookup instead of skipping all dotted paths - Fix unmappedType="integer" for remapped owner fields (ownerDisplayName, ownerName) by introducing KEYWORD_SORT_FIELDS set used alongside the .keyword suffix check in both search managers - Remove dead remapAggregationField() method (no callers remained) - Add 2 new test cases: nested text field resolution and flat keyword sort field passthrough Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Fix: Address round 2 review comments on resolveFieldForSortOrAggregation - Expand AGGREGATION_FIELD_REMAPS to cover bare owner paths (owners.name, owners.displayName) so they remap to ownerName/ownerDisplayName instead of incorrectly gaining a .keyword suffix - Replace leaf-based extraction with exact-path matching so only root-level name/displayName fields get .keyword appended; dotted paths like service.name or columns.name now pass through unchanged - Remove convertsNestedTextFieldsToKeyword test (no longer valid behavior) - Add bare owner remap tests and doesNotAppendKeywordToNestedTextPaths test Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
open-metadata#27103) * Fix: Resolve text fields to .keyword for ES/OS sorting and aggregation Text fields like `name` and `displayName` are mapped as `text` type in ES/OS indexes (for full-text search) with `.keyword` sub-fields for sorting and aggregation. When these bare field names were passed to sort or terms aggregation builders, ES/OS rejected them with `illegal_argument_exception` because text fields cannot be used for per-document field data operations. Add centralized `resolveFieldForSortOrAggregation()` in SearchSourceBuilderFactory that converts known text fields to their `.keyword` sub-fields, and apply it across both search managers and aggregation builders (terms, top_hits) for both ES and OpenSearch. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Fix: Address code review comments on resolveFieldForSortOrAggregation - Support nested text fields (e.g. columns.name → columns.name.keyword) by extracting the leaf segment for whitelist lookup instead of skipping all dotted paths - Fix unmappedType="integer" for remapped owner fields (ownerDisplayName, ownerName) by introducing KEYWORD_SORT_FIELDS set used alongside the .keyword suffix check in both search managers - Remove dead remapAggregationField() method (no callers remained) - Add 2 new test cases: nested text field resolution and flat keyword sort field passthrough Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Fix: Address round 2 review comments on resolveFieldForSortOrAggregation - Expand AGGREGATION_FIELD_REMAPS to cover bare owner paths (owners.name, owners.displayName) so they remap to ownerName/ownerDisplayName instead of incorrectly gaining a .keyword suffix - Replace leaf-based extraction with exact-path matching so only root-level name/displayName fields get .keyword appended; dotted paths like service.name or columns.name now pass through unchanged - Remove convertsNestedTextFieldsToKeyword test (no longer valid behavior) - Add bare owner remap tests and doesNotAppendKeywordToNestedTextPaths test Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>



Summary
Fixes
SearchException: illegal_argument_exceptionwhen sorting or aggregating on text fields likenamein Elasticsearch/OpenSearch. The root cause was that bare text field names (e.g.,name) were passed directly to ES/OS sort and aggregation builders instead of using their.keywordsub-fields (e.g.,name.keyword).Changes
resolveFieldForSortOrAggregation()inSearchSourceBuilderFactory— a centralized utility that converts known text fields (name,displayName) to their.keywordsub-fields, while preserving ES internal fields (_score,_key), dotted paths, and numeric/date fieldsOpenSearchSearchManagerandElasticSearchSearchManager, also fixing theunmappedTypehint to use"keyword"instead of"integer"for keyword fieldsOpenTermsAggregations,ElasticTermsAggregations,OpenTopHitsAggregations,ElasticTopHitsAggregationsremapAggregationField()to the newresolveFieldForSortOrAggregation()in both source builder factories and aggregation managersSearchFieldResolutionTest) with 7 test cases covering text field conversion, keyword passthrough, ES special fields, dotted paths, owner field remapping, numeric fields, and null/empty handlingWhy
ES/OS index mappings define
nameastype: "text"(with akeywordsub-field) to support full-text search. Text fields cannot be used for sorting or aggregation withoutfielddata=true. The fix ensures all sort and aggregation code paths use the.keywordsub-field automatically.Test Plan
SearchFieldResolutionTest)mvn spotless:applycleannameno longer throwsSearchException🤖 Generated with Claude Code