Fix: Resolve text fields to .keyword for ES/OS sorting and aggregation #27103

Merged
mohityadav766 merged 7 commits into main from vk/5de9-issue-in-search
Apr 7, 2026

Conversation

@mohityadav766 mohityadav766 commented Apr 6, 2026

Summary

Fixes SearchException: illegal_argument_exception when sorting or aggregating on text fields like name in Elasticsearch/OpenSearch. The root cause was that bare text field names (e.g., name) were passed directly to ES/OS sort and aggregation builders instead of using their .keyword sub-fields (e.g., name.keyword).

Changes

  • Added resolveFieldForSortOrAggregation() in SearchSourceBuilderFactory — a centralized utility that converts known text fields (name, displayName) to their .keyword sub-fields, while preserving ES internal fields (_score, _key), dotted paths, and numeric/date fields
  • Applied field resolution to the sort path in both OpenSearchSearchManager and ElasticSearchSearchManager, also fixing the unmappedType hint to use "keyword" instead of "integer" for keyword fields
  • Applied field resolution to the aggregation builders: OpenTermsAggregations, ElasticTermsAggregations, OpenTopHitsAggregations, ElasticTopHitsAggregations
  • Migrated existing callers of remapAggregationField() to the new resolveFieldForSortOrAggregation() in both source builder factories and aggregation managers
  • Added unit tests (SearchFieldResolutionTest) with 7 test cases covering text field conversion, keyword passthrough, ES special fields, dotted paths, owner field remapping, numeric fields, and null/empty handling
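
As a rough illustration of the resolution rule described in the list above, the following is a minimal sketch, not the merged code. The class name `InitialFieldResolverSketch` and the method name `resolve` are hypothetical, the map and set contents are inferred from this PR's description and review comments, and returning null or empty input unchanged is an assumption.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the first version of the resolver; the real method
// lives in SearchSourceBuilderFactory and may differ in detail.
final class InitialFieldResolverSketch {
  // Assumed contents: nested owner paths remapped to flat, denormalized fields.
  private static final Map<String, String> AGGREGATION_FIELD_REMAPS =
      Map.of(
          "owners.displayName.keyword", "ownerDisplayName",
          "owners.name.keyword", "ownerName");

  // Root-level text fields that carry a .keyword sub-field.
  private static final Set<String> TEXT_FIELDS_WITH_KEYWORD = Set.of("name", "displayName");

  private InitialFieldResolverSketch() {}

  static String resolve(String field) {
    // Assumption: null and empty inputs are returned unchanged.
    if (field == null || field.isEmpty()) {
      return field;
    }
    String remapped = AGGREGATION_FIELD_REMAPS.get(field);
    if (remapped != null) {
      return remapped;
    }
    // Internal fields (_score, _key) and dotted paths are preserved as-is.
    if (field.startsWith("_") || field.contains(".")) {
      return field;
    }
    return TEXT_FIELDS_WITH_KEYWORD.contains(field) ? field + ".keyword" : field;
  }
}
```

Under these assumptions, `resolve("name")` yields `name.keyword`, while `_score` and an already qualified `name.keyword` come back unchanged.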

Why

ES/OS index mappings define name as type: "text" (with a keyword sub-field) to support full-text search. Text fields cannot be used for sorting or aggregation without fielddata=true. The fix ensures all sort and aggregation code paths use the .keyword sub-field automatically.

Test Plan

  • All 7 unit tests pass (SearchFieldResolutionTest)
  • mvn spotless:apply clean
  • Integration tests with ES/OS to verify sorting by name no longer throws SearchException

🤖 Generated with Claude Code

Text fields like `name` and `displayName` are mapped as `text` type in
ES/OS indexes (for full-text search) with `.keyword` sub-fields for
sorting and aggregation. When these bare field names were passed to sort
or terms aggregation builders, ES/OS rejected them with
`illegal_argument_exception` because text fields cannot be used for
per-document field data operations.

Add centralized `resolveFieldForSortOrAggregation()` in
SearchSourceBuilderFactory that converts known text fields to their
`.keyword` sub-fields, and apply it across both search managers and
aggregation builders (terms, top_hits) for both ES and OpenSearch.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 6, 2026 15:58
@github-actions github-actions bot added the backend and safe to test labels Apr 6, 2026
@mohityadav766 mohityadav766 changed the title Issue in Search for Text Field (vibe-kanban) Fix: Resolve text fields to .keyword for ES/OS sorting and aggregation (Vibe Kanban) Apr 6, 2026
@mohityadav766 mohityadav766 changed the title Fix: Resolve text fields to .keyword for ES/OS sorting and aggregation (Vibe Kanban) Fix: Resolve text fields to .keyword for ES/OS sorting and aggregation Apr 6, 2026
@mohityadav766 mohityadav766 self-assigned this Apr 6, 2026
Copilot AI left a comment

Pull request overview

This PR addresses search failures caused by sorting/aggregating on text fields (notably name) by resolving sort/aggregation field names to appropriate keyword sub-fields and applying existing owner-field remaps consistently across Elasticsearch and OpenSearch implementations.

Changes:

  • Add SearchSourceBuilderFactory.resolveFieldForSortOrAggregation(...) and unit tests to enforce field resolution rules.
  • Update Elasticsearch/OpenSearch aggregation builders to use the new resolver (instead of only the owner remap).
  • Update Elasticsearch/OpenSearch search managers to resolve sort fields (e.g., name → name.keyword) before applying sorting.

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 3 comments.

File | Description
--- | ---
openmetadata-service/src/test/java/org/openmetadata/service/search/SearchFieldResolutionTest.java | Adds unit coverage for the new sort/aggregation field resolution logic.
openmetadata-service/src/main/java/org/openmetadata/service/search/SearchSourceBuilderFactory.java | Introduces resolveFieldForSortOrAggregation and a small allowlist for root text fields requiring .keyword.
openmetadata-service/src/main/java/org/openmetadata/service/search/opensearch/OpenSearchSourceBuilderFactory.java | Uses the resolver for configured terms aggregations.
openmetadata-service/src/main/java/org/openmetadata/service/search/opensearch/OpenSearchSearchManager.java | Resolves sort field before sorting and adjusts unmapped type handling.
openmetadata-service/src/main/java/org/openmetadata/service/search/opensearch/OpenSearchAggregationManager.java | Resolves aggregation field before execution.
openmetadata-service/src/main/java/org/openmetadata/service/search/opensearch/aggregations/OpenTopHitsAggregations.java | Resolves sort_field param for top-hits aggregation.
openmetadata-service/src/main/java/org/openmetadata/service/search/opensearch/aggregations/OpenTermsAggregations.java | Resolves field param for terms aggregation.
openmetadata-service/src/main/java/org/openmetadata/service/search/elasticsearch/ElasticSearchSourceBuilderFactory.java | Uses the resolver for configured terms aggregations.
openmetadata-service/src/main/java/org/openmetadata/service/search/elasticsearch/ElasticSearchSearchManager.java | Resolves sort field before sorting and adjusts unmapped type handling.
openmetadata-service/src/main/java/org/openmetadata/service/search/elasticsearch/ElasticSearchAggregationManager.java | Resolves aggregation field before execution.
openmetadata-service/src/main/java/org/openmetadata/service/search/elasticsearch/aggregations/ElasticTopHitsAggregations.java | Resolves sort_field param for top-hits aggregation.
openmetadata-service/src/main/java/org/openmetadata/service/search/elasticsearch/aggregations/ElasticTermsAggregations.java | Resolves field param for terms aggregation.

- Support nested text fields (e.g. columns.name → columns.name.keyword)
  by extracting the leaf segment for whitelist lookup instead of skipping
  all dotted paths
- Fix unmappedType="integer" for remapped owner fields (ownerDisplayName,
  ownerName) by introducing KEYWORD_SORT_FIELDS set used alongside the
  .keyword suffix check in both search managers
- Remove dead remapAggregationField() method (no callers remained)
- Add 2 new test cases: nested text field resolution and flat keyword
  sort field passthrough

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
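
The unmappedType fix in the commit message above can be sketched roughly as follows. This is a minimal illustration, not the code in the search managers: the class name `UnmappedTypeSketch` and the method name `unmappedTypeFor` are hypothetical, and the contents of `KEYWORD_SORT_FIELDS` are assumed from the two owner fields the commit message names.

```java
import java.util.Set;

// Hypothetical helper showing how the unmappedType hint could be chosen for a
// sort field that has already been resolved.
final class UnmappedTypeSketch {
  // Assumed contents: flat keyword fields that do not end in ".keyword".
  private static final Set<String> KEYWORD_SORT_FIELDS = Set.of("ownerDisplayName", "ownerName");

  private UnmappedTypeSketch() {}

  static String unmappedTypeFor(String resolvedSortField) {
    boolean keywordLike =
        resolvedSortField.endsWith(".keyword") || KEYWORD_SORT_FIELDS.contains(resolvedSortField);
    // "integer" is the hint that was previously applied to every sort field.
    return keywordLike ? "keyword" : "integer";
  }
}
```

With this rule, both `name.keyword` and `ownerName` get the `keyword` hint, and other fields keep `integer`.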
Copilot AI review requested due to automatic review settings April 6, 2026 16:40
Copilot AI left a comment

Pull request overview

Copilot reviewed 12 out of 12 changed files in this pull request and generated 2 comments.

Comment on lines +267 to +280
String remapped = AGGREGATION_FIELD_REMAPS.getOrDefault(field, field);
if (!remapped.equals(field)) {
  return remapped;
}
if (field.startsWith("_")) {
  return field;
}
if (field.endsWith(".keyword")) {
  return field;
}
String leaf = field.contains(".") ? field.substring(field.lastIndexOf('.') + 1) : field;
if (TEXT_FIELDS_WITH_KEYWORD.contains(leaf)) {
  return field + ".keyword";
}
Copilot AI Apr 6, 2026

resolveFieldForSortOrAggregation appends .keyword based solely on the leaf segment (name/displayName). That will also rewrite fields that are not text (e.g., nested reference fields like owners.name / service.name are mapped as keyword in some indices and may not have a .keyword multi-field), causing sort/aggregation to reference a non-existent field and reintroduce illegal_argument_exception. Consider switching from a leaf-based rule to an explicit allowlist of known text paths (e.g., name, displayName, columns.name, parent.displayName, etc.) and/or performing remapping/keyword-qualification in a way that cannot produce a field that doesn’t exist across supported index mappings.
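
To make the concern in this comment concrete, here is a small sketch contrasting the leaf-based rule with an explicit allowlist of full paths. Both methods and the class name `LeafVersusExactSketch` are hypothetical, and the `TEXT_PATHS` entries are illustrative picks from the examples in the comment, not a verified list of text-mapped paths.

```java
import java.util.Set;

// Hypothetical comparison of two ways to decide when ".keyword" is appended.
final class LeafVersusExactSketch {
  private static final Set<String> TEXT_LEAVES = Set.of("name", "displayName");

  // Illustrative allowlist of complete paths assumed to be text fields with a
  // .keyword sub-field.
  private static final Set<String> TEXT_PATHS = Set.of("name", "displayName", "columns.name");

  private LeafVersusExactSketch() {}

  // Leaf-based rule: only the last path segment is checked.
  static String leafBased(String field) {
    if (field.endsWith(".keyword")) {
      return field;
    }
    String leaf = field.contains(".") ? field.substring(field.lastIndexOf('.') + 1) : field;
    return TEXT_LEAVES.contains(leaf) ? field + ".keyword" : field;
  }

  // Exact-path rule: the whole path must appear in the allowlist.
  static String exactPath(String field) {
    return TEXT_PATHS.contains(field) ? field + ".keyword" : field;
  }
}
```

For an input such as `service.name`, the leaf-based rule produces `service.name.keyword`, which may not exist when `service.name` is itself mapped as keyword, whereas the exact-path rule leaves it untouched.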

Comment on lines +71 to +88
@Test
void preservesFlatKeywordSortFields() {
  assertEquals(
      "ownerDisplayName",
      SearchSourceBuilderFactory.resolveFieldForSortOrAggregation("ownerDisplayName"));
  assertEquals(
      "ownerName", SearchSourceBuilderFactory.resolveFieldForSortOrAggregation("ownerName"));
}

@Test
void remapsOwnerFields() {
  assertEquals(
      "ownerDisplayName",
      SearchSourceBuilderFactory.resolveFieldForSortOrAggregation("owners.displayName.keyword"));
  assertEquals(
      "ownerName",
      SearchSourceBuilderFactory.resolveFieldForSortOrAggregation("owners.name.keyword"));
}
Copilot AI Apr 6, 2026

The new resolution logic can change dotted paths whose leaf is name/displayName (e.g., owners.name, service.name). The current tests only cover the .keyword-qualified owner paths and a couple of known text fields, but don’t assert what should happen for common keyword reference fields without .keyword. Adding explicit test cases for inputs like owners.name / owners.displayName (and potentially service.name) would help prevent regressions where the resolver generates a non-existent *.keyword field for indices that don’t define that multi-field.

@mohityadav766 mohityadav766 added the To release label Apr 6, 2026
Copilot AI review requested due to automatic review settings April 6, 2026 20:34
gitar-bot bot commented Apr 6, 2026

Code Review ✅ Approved 2 resolved / 2 findings

Resolves text fields to .keyword for Elasticsearch/OpenSearch sorting and aggregation, addressing nested text field bypass issues and owner field remap gaps. No open issues remain.

✅ 2 resolved
Edge Case: Nested text fields bypass .keyword resolution due to dot check

📄 openmetadata-service/src/main/java/org/openmetadata/service/search/SearchSourceBuilderFactory.java:267-269 📄 openmetadata-service/src/test/java/org/openmetadata/service/search/SearchFieldResolutionTest.java:49-59
In resolveFieldForSortOrAggregation, the early return at line 267 (if (field.startsWith("_") || field.contains("."))) prevents .keyword resolution for any dotted path. If a caller ever passes a nested text field like columns.name or parent.displayName for sorting/aggregation, it would fail with the same fielddata error the PR aims to fix.

This is currently safe because the known problematic fields (name, displayName) are root-level, but the method name suggests general-purpose usage and the Javadoc doesn't document this limitation.

Edge Case: Owner field remap missed when input lacks .keyword suffix

📄 openmetadata-service/src/main/java/org/openmetadata/service/search/SearchSourceBuilderFactory.java:232-235 📄 openmetadata-service/src/main/java/org/openmetadata/service/search/SearchSourceBuilderFactory.java:263-277
When owners.displayName or owners.name is passed (without the .keyword suffix), the remap check on line 267-269 does not match because AGGREGATION_FIELD_REMAPS only contains keys with .keyword (owners.displayName.keyword, owners.name.keyword). The function then falls through to the leaf-name check and appends .keyword, producing owners.displayName.keyword — a nested field path — instead of the intended flat denormalized field ownerDisplayName.

This means aggregations using the nested path will either fail (if a nested context isn't set up) or return different results than intended.

Callers in tests already pass the bare form (e.g. owners.displayName in SearchSourceBuilderFactoryTest:291, owners.name in DefaultInheritedFieldEntitySearchTest:311,331,339).

Copilot AI left a comment

Pull request overview

Copilot reviewed 12 out of 12 changed files in this pull request and generated no new comments.

sonarqubecloud bot commented Apr 6, 2026

github-actions bot commented Apr 6, 2026

🟡 Playwright Results — all passed (21 flaky)

✅ 3597 passed · ❌ 0 failed · 🟡 21 flaky · ⏭️ 207 skipped

Shard | Passed | Failed | Flaky | Skipped
--- | --- | --- | --- | ---
🟡 Shard 1 | 454 | 0 | 3 | 2
🟡 Shard 2 | 640 | 0 | 2 | 32
🟡 Shard 3 | 646 | 0 | 5 | 26
🟡 Shard 4 | 617 | 0 | 5 | 47
🟡 Shard 5 | 606 | 0 | 1 | 67
🟡 Shard 6 | 634 | 0 | 5 | 33
🟡 21 flaky test(s) (passed on retry)
  • Features/DataAssetRulesDisabled.spec.ts › Verify the Messaging Service entity item action after rules disabled (shard 1, 1 retry)
  • Features/CustomizeDetailPage.spec.ts › Search Index - customization should work (shard 1, 1 retry)
  • Pages/UserCreationWithPersona.spec.ts › Create user with persona and verify on profile (shard 1, 1 retry)
  • Features/BulkEditEntity.spec.ts › Glossary (shard 2, 1 retry)
  • Features/DataQuality/BundleSuiteBulkOperations.spec.ts › Bulk selection operations (shard 2, 1 retry)
  • Features/Permissions/DataProductPermissions.spec.ts › Data Product allow operations (shard 3, 1 retry)
  • Features/Permissions/GlossaryPermissions.spec.ts › Team-based permissions work correctly (shard 3, 1 retry)
  • Features/Table.spec.ts › Tags term should be consistent for search (shard 3, 1 retry)
  • Flow/ExploreDiscovery.spec.ts › Should display deleted assets when showDeleted is checked and deleted is not present in queryFilter (shard 3, 1 retry)
  • Flow/PersonaDeletionUserProfile.spec.ts › User profile loads correctly before and after persona deletion (shard 3, 1 retry)
  • Pages/Customproperties-part2.spec.ts › entityReferenceList shows item count, scrollable list, no expand toggle (shard 4, 1 retry)
  • Pages/DataContracts.spec.ts › Create Data Contract and validate for Directory (shard 4, 1 retry)
  • Pages/DataContracts.spec.ts › Create Data Contract and validate for Database Schema (shard 4, 1 retry)
  • Pages/DataContractsSemanticRules.spec.ts › Validate Owner Rule Is_Set (shard 4, 1 retry)
  • Pages/Domains.spec.ts › Rename domain with tags and glossary terms preserves associations (shard 4, 1 retry)
  • Pages/ExploreTree.spec.ts › Verify Database and Database Schema available in explore tree (shard 5, 1 retry)
  • Pages/Lineage/LineageFilters.spec.ts › Verify lineage schema filter selection (shard 6, 1 retry)
  • Pages/ProfilerConfigurationPage.spec.ts › Non admin user (shard 6, 1 retry)
  • Pages/ServiceEntity.spec.ts › Announcement create, edit & delete (shard 6, 1 retry)
  • Pages/UserDetails.spec.ts › Admin user can edit teams from the user profile (shard 6, 1 retry)
  • Pages/Users.spec.ts › Permissions for table details page for Data Consumer (shard 6, 1 retry)

📦 Download artifacts

How to debug locally
# Download playwright-test-results-<shard> artifact and unzip
npx playwright show-trace path/to/trace.zip    # view trace

@mohityadav766 mohityadav766 merged commit 35fbefa into main Apr 7, 2026
53 of 54 checks passed
@mohityadav766 mohityadav766 deleted the vk/5de9-issue-in-search branch April 7, 2026 02:41
github-actions bot commented Apr 7, 2026

Failed to cherry-pick changes to the 1.12.5 branch.
Please cherry-pick the changes manually.
You can find more details here.

mohityadav766 added a commit that referenced this pull request Apr 7, 2026
…and aggregation  (#27103)

* Fix: Resolve text fields to .keyword for ES/OS sorting and aggregation

Text fields like `name` and `displayName` are mapped as `text` type in
ES/OS indexes (for full-text search) with `.keyword` sub-fields for
sorting and aggregation. When these bare field names were passed to sort
or terms aggregation builders, ES/OS rejected them with
`illegal_argument_exception` because text fields cannot be used for
per-document field data operations.

Add centralized `resolveFieldForSortOrAggregation()` in
SearchSourceBuilderFactory that converts known text fields to their
`.keyword` sub-fields, and apply it across both search managers and
aggregation builders (terms, top_hits) for both ES and OpenSearch.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Fix: Address code review comments on resolveFieldForSortOrAggregation

- Support nested text fields (e.g. columns.name → columns.name.keyword)
  by extracting the leaf segment for whitelist lookup instead of skipping
  all dotted paths
- Fix unmappedType="integer" for remapped owner fields (ownerDisplayName,
  ownerName) by introducing KEYWORD_SORT_FIELDS set used alongside the
  .keyword suffix check in both search managers
- Remove dead remapAggregationField() method (no callers remained)
- Add 2 new test cases: nested text field resolution and flat keyword
  sort field passthrough

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Fix: Address round 2 review comments on resolveFieldForSortOrAggregation

- Expand AGGREGATION_FIELD_REMAPS to cover bare owner paths (owners.name,
  owners.displayName) so they remap to ownerName/ownerDisplayName instead
  of incorrectly gaining a .keyword suffix
- Replace leaf-based extraction with exact-path matching so only root-level
  name/displayName fields get .keyword appended; dotted paths like
  service.name or columns.name now pass through unchanged
- Remove convertsNestedTextFieldsToKeyword test (no longer valid behavior)
- Add bare owner remap tests and doesNotAppendKeywordToNestedTextPaths test
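
The behavior after the round 2 changes above can be sketched as follows. This is an approximation under stated assumptions, not the merged method: the class name `FinalFieldResolverSketch` and method name `resolve` are hypothetical, the remap entries are inferred from the commit message, and null or empty input is assumed to pass through unchanged.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the resolver after the round 2 review changes.
final class FinalFieldResolverSketch {
  // Assumed contents: both the .keyword-qualified and the bare owner paths
  // remap to the flat, denormalized owner fields.
  private static final Map<String, String> AGGREGATION_FIELD_REMAPS =
      Map.of(
          "owners.displayName.keyword", "ownerDisplayName",
          "owners.name.keyword", "ownerName",
          "owners.displayName", "ownerDisplayName",
          "owners.name", "ownerName");

  private static final Set<String> TEXT_FIELDS_WITH_KEYWORD = Set.of("name", "displayName");

  private FinalFieldResolverSketch() {}

  static String resolve(String field) {
    // Assumption: null and empty inputs are returned unchanged.
    if (field == null || field.isEmpty()) {
      return field;
    }
    String remapped = AGGREGATION_FIELD_REMAPS.get(field);
    if (remapped != null) {
      return remapped;
    }
    // Exact-path match: only root-level name/displayName gain the suffix, so
    // dotted paths, internal fields, and other fields pass through unchanged.
    return TEXT_FIELDS_WITH_KEYWORD.contains(field) ? field + ".keyword" : field;
  }
}
```

In this sketch, `owners.name` resolves to `ownerName`, `name` resolves to `name.keyword`, and `service.name` or `columns.name` are returned as given.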
mohityadav766 added a commit that referenced this pull request Apr 8, 2026
Reverted : Cherry Pick : Fix: Resolve text fields to .keyword for ES/OS sorting and aggregation  (#27103)
sonika-shah added a commit that referenced this pull request Apr 9, 2026
… case search indexing

Commit 2839bc2 cherry-picked PR #27153 ("Improve memory usage for
reindex") to 1.12.5, bundled with the revert of PR #27103. PR #27153
is still open/unmerged on main and introduced 6 extra file changes
that were never meant for 1.12.5.

The bulk sink change from `toJsonData(json_string)` to
`toJsonData(map_object)` bypassed Jackson's WRITE_DATES_AS_TIMESTAMPS=false
setting. During reindex, java.util.Date fields (like tags.appliedAt)
were sent as raw epoch Longs instead of ISO strings. OpenSearch
dynamically mapped tags.appliedAt as "long". Later, real-time indexing
(test case creation via API) sent appliedAt as ISO string — OpenSearch
rejected it with mapper_parsing_exception. Test cases with tags were
created in DB but silently never indexed in search.

This restores all 6 files to their pre-2839bc259f state, matching main.

Files restored:
- ElasticSearchBulkSink.java — back to pojoToJson → string → toJsonData(string)
- OpenSearchBulkSink.java — same
- EsUtils.java — removed toJsonData(Object) overload
- OsUtils.java — same
- SearchIndexExecutor.java — removed contextDataCache, Thread.MIN_PRIORITY
- ReindexingMetrics.java — removed counter caching

Requires reindex after deployment.
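
The type conflict described in this commit message comes down to one `java.util.Date` value reaching the index in two different shapes. The sketch below uses only the JDK to show both shapes; it does not reproduce the Jackson configuration or the bulk sink code, the class and method names are hypothetical, and the exact ISO string Jackson emits may differ from `Instant.toString()`.

```java
import java.time.Instant;
import java.util.Date;

// Hypothetical illustration of the two value shapes for a field such as
// tags.appliedAt.
final class DateSerializationSketch {
  private DateSerializationSketch() {}

  // Shape sent during reindex in the faulty path: a raw epoch value (Long),
  // which dynamic mapping would type as "long".
  static Object asEpochMillis(Date appliedAt) {
    return appliedAt.getTime();
  }

  // Shape sent by real-time indexing: an ISO-8601 string, which a field
  // already mapped as "long" cannot parse.
  static Object asIsoString(Date appliedAt) {
    Instant instant = appliedAt.toInstant();
    return instant.toString();
  }
}
```

Whichever shape is indexed first fixes the dynamic mapping for the field, which is consistent with the commit message noting that a reindex is required after deployment.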
sonika-shah added a commit that referenced this pull request Apr 9, 2026
…e test case search indexing

Commit 2839bc2 cherry-picked PR #27153 ("Improve memory usage for
reindex") to 1.12.5, bundled with the revert of PR #27103. PR #27153
is still open/unmerged on main and introduced extra file changes
that were never meant for 1.12.5.

The bulk sink change from toJsonData(json_string) to
toJsonData(map_object) bypassed Jackson's WRITE_DATES_AS_TIMESTAMPS=false
setting. During reindex, java.util.Date fields (like tags.appliedAt)
were sent as raw epoch Longs instead of ISO strings. OpenSearch
dynamically mapped tags.appliedAt as "long". Later, real-time indexing
(test case creation via API) sent appliedAt as ISO string — OpenSearch
rejected it with mapper_parsing_exception. Test cases with tags were
created in DB but silently never indexed in search.

Changes:
- ElasticSearchBulkSink/OpenSearchBulkSink: reverted serialization
  back to pojoToJson → string → toJsonData(string), kept Thread.MIN_PRIORITY
- EsUtils/OsUtils: removed toJsonData(Object) overload
- SearchIndexExecutor: removed contextDataCache, kept Thread.MIN_PRIORITY
- ReindexingMetrics: removed counter caching

Requires reindex after deployment.
mohityadav766 pushed a commit that referenced this pull request Apr 9, 2026
…e test case search indexing (#27202)

SaaiAravindhRaja pushed a commit to SaaiAravindhRaja/OpenMetadata that referenced this pull request Apr 12, 2026
open-metadata#27103)

* Fix: Resolve text fields to .keyword for ES/OS sorting and aggregation

Text fields like `name` and `displayName` are mapped as `text` type in
ES/OS indexes (for full-text search) with `.keyword` sub-fields for
sorting and aggregation. When these bare field names were passed to sort
or terms aggregation builders, ES/OS rejected them with
`illegal_argument_exception` because text fields cannot be used for
per-document field data operations.

Add centralized `resolveFieldForSortOrAggregation()` in
SearchSourceBuilderFactory that converts known text fields to their
`.keyword` sub-fields, and apply it across both search managers and
aggregation builders (terms, top_hits) for both ES and OpenSearch.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Fix: Address code review comments on resolveFieldForSortOrAggregation

- Support nested text fields (e.g. columns.name → columns.name.keyword)
  by extracting the leaf segment for whitelist lookup instead of skipping
  all dotted paths
- Fix unmappedType="integer" for remapped owner fields (ownerDisplayName,
  ownerName) by introducing KEYWORD_SORT_FIELDS set used alongside the
  .keyword suffix check in both search managers
- Remove dead remapAggregationField() method (no callers remained)
- Add 2 new test cases: nested text field resolution and flat keyword
  sort field passthrough

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Fix: Address round 2 review comments on resolveFieldForSortOrAggregation

- Expand AGGREGATION_FIELD_REMAPS to cover bare owner paths (owners.name,
  owners.displayName) so they remap to ownerName/ownerDisplayName instead
  of incorrectly gaining a .keyword suffix
- Replace leaf-based extraction with exact-path matching so only root-level
  name/displayName fields get .keyword appended; dotted paths like
  service.name or columns.name now pass through unchanged
- Remove convertsNestedTextFieldsToKeyword test (no longer valid behavior)
- Add bare owner remap tests and doesNotAppendKeywordToNestedTextPaths test

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>