feat: native Elasticsearch vector search support by joaopamaral · Pull Request #27111 · open-metadata/OpenMetadata

joaopamaral · 2026-04-07T00:05:47Z

Summary

Adds native Elasticsearch 8.x/9.x vector search support, mirroring the existing OpenSearch implementation. OpenMetadata deployments backed by Elasticsearch can now use the same semantic/vector search features as OpenSearch deployments.

Changes

ElasticSearchVectorService (new): ES implementation of VectorIndexService, using Rest5Client for generic HTTP requests. Mirrors OpenSearchVectorService structure.
vector_search_index_es_native.json (new, en/jp/ru/zh): ES-native index mappings using dense_vector / dims / cosine similarity (ES 8.x/9.x format, as opposed to OpenSearch's knn_vector / dimension / HNSW).
VectorSearchQueryBuilder.buildNativeESQuery(): emits the ES 8.x/9.x top-level knn query format (distinct from OpenSearch's nested query.knn). Reference: https://www.elastic.co/docs/solutions/search/vector/knn
SemanticSearchQueryBuilder for Elasticsearch package: mirrors the OpenSearch equivalent.
ElasticSearchIndexManager.extractMappingsJson(): extracts the mappings sub-object before calling putMapping — ES rejects full index JSON (with settings/aliases) at the mappings API.
reformatVectorIndexWithDimension(): handles both "dims" (ES native) and "dimension" (OpenSearch) keys so embedding dimension injection works for both backends.
SearchRepository / ElasticSearchBulkSink: wired to initialize and use ElasticSearchVectorService when ES backend is configured.
Tests: VectorSearchQueryBuilderTest, ElasticSearchIndexManagerTest, and new ElasticSearchVectorServiceTest.

Compatibility

OpenSearch path is unchanged — OpenSearchBulkSink / OpenSearchVectorService untouched.
No schema changes required for existing deployments.

Test plan

mvn test -pl openmetadata-service -Dtest=VectorSearchQueryBuilderTest,ElasticSearchIndexManagerTest,ElasticSearchVectorServiceTest
Integration: configure embeddingProvider in elasticSearchConfiguration, run Search Index app against an ES 8.x/9.x cluster, verify vector index is created and knn search returns results
Confirm OpenSearch deployments unaffected

References

Elasticsearch k-nearest neighbor (kNN) search

🤖 Generated with Claude Code

- Add ElasticSearchVectorService mirroring OpenSearchVectorService using Rest5Client - Add vector_search_index_es_native.json with dense_vector/dims/cosine mappings for en/jp/ru/zh locales - Add VectorSearchQueryBuilder.buildNativeESQuery() for ES 8.x/9.x top-level knn query format - Add SemanticSearchQueryBuilder for Elasticsearch (mirrors OpenSearch equivalent) - Fix ElasticSearchIndexManager.extractMappingsJson() to extract mappings sub-object for putMapping - Fix reformatVectorIndexWithDimension() to handle both "dims" (ES) and "dimension" (OpenSearch) keys - Wire ElasticSearchVectorService into SearchRepository and ElasticSearchBulkSink - Extend VectorSearchQueryBuilderTest and ElasticSearchIndexManagerTest with new coverage Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

github-actions · 2026-04-07T00:06:15Z

Hi there 👋 Thanks for your contribution!

The OpenMetadata team will review the PR shortly! Once it has been labeled as safe to test, the CI workflows
will start executing and we'll be able to make sure everything is working as expected.

Let us know if you need any help!

joaopamaral · 2026-04-07T00:07:24Z

Initial results look good, but I've run a test only with ES 9.x and version 1.12.4 (not the one from main). I also need to double-check if OpenSearch is affected by this change. Also need to review some AI-resolved conflicts from version 1.12.4 with main.

harshach · 2026-04-07T00:22:01Z

Thanks @joaopamaral this is great!!. Can you make it ready for review? and also address comments here #27111 (comment)

joaopamaral · 2026-04-07T11:31:12Z

Sure @harshach! I'll work on the bot review first before making it ready for review! 👍

github-actions · 2026-04-08T19:27:51Z

Hi there 👋 Thanks for your contribution!

The OpenMetadata team will review the PR shortly! Once it has been labeled as safe to test, the CI workflows
will start executing and we'll be able to make sure everything is working as expected.

Let us know if you need any help!

…l 6 required args Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

github-actions · 2026-04-08T19:29:54Z

Hi there 👋 Thanks for your contribution!

The OpenMetadata team will review the PR shortly! Once it has been labeled as safe to test, the CI workflows
will start executing and we'll be able to make sure everything is working as expected.

Let us know if you need any help!

@OverRide

…ce and fix compilation - Expand VectorIndexService interface to declare getExistingFingerprint, getExistingFingerprintsBatch, executeGenericRequest, and VECTOR_INDEX_KEY constant so callers (VectorSearchResource, ElasticSearchBulkSink) can invoke these through the interface type - Add default getIndexAlias() to interface, removing duplicate private getSearchAlias() in OpenSearchVectorService - Fix ElasticSearchVectorService: add generateEmbeddingFields/ updateEntityEmbedding implementations, correct search() signature to include 'from' param, remove spurious @OverRide annotations - Add 'from' parameter to buildNativeESQuery (valid for ES KNN pagination) - DRY appendFilterMustClauses with nestedTags boolean: ES-native index maps tags as nested type, OpenSearch entity indices use flat object - Add semanticSearch field to searchRequest.json (required by SemanticSearchQueryBuilder) - Fix test: update all search() and buildNativeESQuery() call sites to pass the new 'from' parameter Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

github-actions · 2026-04-10T18:16:09Z

Hi there 👋 Thanks for your contribution!

The OpenMetadata team will review the PR shortly! Once it has been labeled as safe to test, the CI workflows
will start executing and we'll be able to make sure everything is working as expected.

Let us know if you need any help!

…pper Use Jackson ObjectMapper to patch the 'dims' field in the vector index mapping template instead of exact string matching, which was fragile against whitespace variations. Extract into package-private patchDimension() and add 3 unit tests covering dimension replacement, preservation of other fields, and the no-space JSON variant. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

github-actions · 2026-04-10T18:30:43Z

Hi there 👋 Thanks for your contribution!

The OpenMetadata team will review the PR shortly! Once it has been labeled as safe to test, the CI workflows
will start executing and we'll be able to make sure everything is working as expected.

Let us know if you need any help!

…stance Assign to the volatile 'instance' field only after registerVectorEmbeddingHandler() completes, so a concurrent caller via getInstance() cannot observe a partially-initialized service. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

github-actions · 2026-04-10T18:50:19Z

Hi there 👋 Thanks for your contribution!

The OpenMetadata team will review the PR shortly! Once it has been labeled as safe to test, the CI workflows
will start executing and we'll be able to make sure everything is working as expected.

Let us know if you need any help!

….search() ES search() was passing 'from' directly to the KNN query, which skips raw chunks rather than parent entities. Mirror the OpenSearch approach: - Loop with rawOffset to collect from + size + 1 distinct parents - Skip 'from' parents in application code after collection - Return 4-arg VectorSearchResponse with totalHits and hasMore populated Extract collectSearchHits() and extractTotalHits() private helpers (same pattern as OpenSearchVectorService). Update tests to use parentId (camelCase, matching VectorDocBuilder), use thenAnswer for fresh mock streams on each loop iteration, and add sequence mock for multi-page termination tests. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

github-actions · 2026-04-10T19:15:17Z

Hi there 👋 Thanks for your contribution!

The OpenMetadata team will review the PR shortly! Once it has been labeled as safe to test, the CI workflows
will start executing and we'll be able to make sure everything is working as expected.

Let us know if you need any help!

…DocBuilder bulkIndex() was reading 'parent_id' and 'chunk_index' (snake_case) to build document IDs, but VectorDocBuilder stores both as camelCase ('parentId', 'chunkIndex'). This caused doc IDs to always be null-N or parentId-N (using loop index instead of actual chunk index). Add test that captures the BulkRequest and verifies the doc ID is parentId-chunkIndex. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

github-actions · 2026-04-10T19:25:12Z

Hi there 👋 Thanks for your contribution!

The OpenMetadata team will review the PR shortly! Once it has been labeled as safe to test, the CI workflows
will start executing and we'll be able to make sure everything is working as expected.

Let us know if you need any help!

ElasticSearchBulkSink was casting VectorIndexService to ElasticSearchVectorService to call copyExistingVectorDocuments(), breaking the interface abstraction with a potential ClassCastException. Add the method to the interface with a default no-op (returns false), and remove the cast and import. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

github-actions · 2026-04-10T19:28:31Z

Hi there 👋 Thanks for your contribution!

The OpenMetadata team will review the PR shortly! Once it has been labeled as safe to test, the CI workflows
will start executing and we'll be able to make sure everything is working as expected.

Let us know if you need any help!

…yExistingVectorDocuments Remove copyExistingVectorDocuments from VectorIndexService (it is ES-specific and has no meaningful default for other implementations). Use Java 21 pattern matching instanceof in ElasticSearchBulkSink so the call is explicit and safe without introducing a no-op default into the interface. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

github-actions · 2026-04-10T19:36:02Z

Hi there 👋 Thanks for your contribution!

The OpenMetadata team will review the PR shortly! Once it has been labeled as safe to test, the CI workflows
will start executing and we'll be able to make sure everything is working as expected.

Let us know if you need any help!

gitar-bot · 2026-04-10T19:42:15Z

Code Review 👍 Approved with suggestions 5 resolved / 6 findings

Adds native Elasticsearch vector search support with comprehensive test coverage and fixes to pagination, initialization ordering, and interface safety. Consider adding a type guard for the extractRestClient cast to Rest5ClientTransport to prevent runtime errors.

💡 Edge Case: extractRestClient cast to Rest5ClientTransport has no guard

📄 openmetadata-service/src/main/java/org/openmetadata/service/search/vector/ElasticSearchVectorService.java:60-63

At line 61, extractRestClient() performs an unchecked cast (Rest5ClientTransport) client._transport(). While the current codebase always creates the ES client with Rest5ClientTransport, a future refactor or different client construction path would produce a ClassCastException with no helpful message. A defensive check or better error message would improve debuggability.

Suggested fix

private static Rest5Client extractRestClient(ElasticsearchClient client) {
  if (!(client._transport() instanceof Rest5ClientTransport rest5)) {
    throw new IllegalArgumentException(
        "ElasticSearchVectorService requires Rest5ClientTransport, got: "
            + client._transport().getClass().getName());
  }
  return rest5.restClient();
}

✅ 5 resolved

✅ Bug: Test calls build() with 4 args but method requires 6 — won't compile

📄 openmetadata-service/src/test/java/org/openmetadata/service/search/vector/VectorSearchQueryBuilderTest.java:864 📄 openmetadata-service/src/main/java/org/openmetadata/service/search/vector/VectorSearchQueryBuilder.java:19-25
The test testNativeESQueryAndOpenSearchQueryProduceSameFilters at line 864 calls VectorSearchQueryBuilder.build(vector, 10, 100, filters) with 4 arguments, but the build() method signature requires 6 parameters: (float[] vector, int size, int from, int k, Map<String, List<String>> filters, double threshold). This will fail to compile. The intent appears to be comparing OpenSearch and ES native queries with the same filters.

✅ Edge Case: loadIndexMapping dimension replacement is brittle — exact string match

📄 openmetadata-service/src/main/java/org/openmetadata/service/search/vector/ElasticSearchVectorService.java:523-527 📄 openmetadata-service/src/main/java/org/openmetadata/service/search/SearchRepository.java:3060-3068
In ElasticSearchVectorService.loadIndexMapping(), the dimension placeholder is replaced via exact string matching: template.replace(""dims": 512", ""dims": " + dimension). This requires the JSON template to have exactly one space after the colon. If the template is reformatted (e.g., minified, or extra spaces), the replacement silently fails. The code does have a post-check that throws if no replacement happened, but the same brittleness exists in SearchRepository.reformatVectorIndexWithDimension() fallback (lines 3062-3068) which does multiple .replace() calls for both with-space and without-space variants — showing awareness of the problem but an incomplete fix.

✅ Edge Case: init() assigns instance before registerVectorEmbeddingHandler completes

📄 openmetadata-service/src/main/java/org/openmetadata/service/search/vector/ElasticSearchVectorService.java:65-79
In ElasticSearchVectorService.init() (line 70), instance is assigned the new service object, and then instance.registerVectorEmbeddingHandler() is called on line 71. Since getInstance() is not synchronized, a concurrent caller could observe the instance in a partially-initialized state (before the handler is registered). The volatile keyword ensures the reference is visible but doesn't guarantee that registerVectorEmbeddingHandler() has completed.

This mirrors the existing OpenSearch pattern, but the ES version is new code where it can be fixed.

✅ Bug: ES search pagination is broken vs OpenSearch implementation

📄 openmetadata-service/src/main/java/org/openmetadata/service/search/vector/ElasticSearchVectorService.java:95-109 📄 openmetadata-service/src/main/java/org/openmetadata/service/search/vector/OpenSearchVectorService.java:196-210
The ElasticSearchVectorService.search() passes from directly to the ES kNN query, which skips raw hits (chunks). However, the API contract expects from to skip parent entities (grouped results). The OpenSearch implementation correctly handles this: it fetches from + size + 1 parents via a loop, then skips from parents in application code.

With the current ES implementation, requesting from=5, size=10 skips 5 raw chunks (not 5 parents), leading to incorrect pagination results. Additionally, the ES version uses the 2-arg VectorSearchResponse constructor, so totalHits and hasMore are always null — callers lose pagination metadata.

✅ Quality: Unsafe downcast defeats purpose of VectorIndexService interface

📄 openmetadata-service/src/main/java/org/openmetadata/service/apps/bundles/searchIndex/ElasticSearchBulkSink.java:778
ElasticSearchBulkSink casts vectorService to (ElasticSearchVectorService) to call copyExistingVectorDocuments(). This couples the sink back to the concrete class, undermining the new interface abstraction. If vectorService is ever a different implementation, this throws ClassCastException at runtime.

🤖 Prompt for agents

Code Review: Adds native Elasticsearch vector search support with comprehensive test coverage and fixes to pagination, initialization ordering, and interface safety. Consider adding a type guard for the extractRestClient cast to Rest5ClientTransport to prevent runtime errors.

1. 💡 Edge Case: extractRestClient cast to Rest5ClientTransport has no guard
   Files: openmetadata-service/src/main/java/org/openmetadata/service/search/vector/ElasticSearchVectorService.java:60-63

   At line 61, `extractRestClient()` performs an unchecked cast `(Rest5ClientTransport) client._transport()`. While the current codebase always creates the ES client with `Rest5ClientTransport`, a future refactor or different client construction path would produce a `ClassCastException` with no helpful message. A defensive check or better error message would improve debuggability.

   Suggested fix:
   private static Rest5Client extractRestClient(ElasticsearchClient client) {
     if (!(client._transport() instanceof Rest5ClientTransport rest5)) {
       throw new IllegalArgumentException(
           "ElasticSearchVectorService requires Rest5ClientTransport, got: "
               + client._transport().getClass().getName());
     }
     return rest5.restClient();
   }

Options

Display: compact → Showing less information.

Comment with these commands to change:

`Compact`
`gitar display:verbose`

_{Was this helpful? React with 👍 / 👎 | Gitar}

joaopamaral · 2026-04-10T19:46:32Z

Hi @harshach, ’ve addressed the bot review, but I still need to re-review the code after rebasing/merging with main and rerun the tests against a real server. So far, I’ve tested this PR with version 1.12.4 and ES 9.3.1. I still need to validate that everything continues to work correctly with OpenSearch and ES 8.x.

I won’t be able to run tests for the next couple of days, but feel free to proceed with any testing on your side in the meantime.

Copilot

Pull request overview

This PR adds native Elasticsearch (8.x/9.x) vector search support to OpenMetadata, aiming to provide semantic/vector search capabilities on Elasticsearch deployments comparable to the existing OpenSearch implementation.

Changes:

Added a new ElasticSearchVectorService plus wiring in SearchRepository / ElasticSearchBulkSink to initialize and use it when Elasticsearch is the configured backend.
Introduced ES-native vector index mapping templates (vector_search_index_es_native.json) and extended query-building to emit Elasticsearch’s top-level knn query format.
Added/updated tests around the ES-native query format and Elasticsearch vector service behavior.

Reviewed changes

Copilot reviewed 20 out of 20 changed files in this pull request and generated 10 comments.

Show a summary per file

File	Description
openmetadata-spec/src/main/resources/json/schema/search/searchRequest.json	Adds `semanticSearch` flag to the search request schema.
openmetadata-spec/src/main/resources/elasticsearch/en/vector_search_index_es_native.json	New ES-native vector index template (en).
openmetadata-spec/src/main/resources/elasticsearch/jp/vector_search_index_es_native.json	New ES-native vector index template (jp).
openmetadata-spec/src/main/resources/elasticsearch/ru/vector_search_index_es_native.json	New ES-native vector index template (ru).
openmetadata-spec/src/main/resources/elasticsearch/zh/vector_search_index_es_native.json	New ES-native vector index template (zh).
openmetadata-service/src/main/java/org/openmetadata/service/search/vector/VectorSearchQueryBuilder.java	Adds `buildNativeESQuery` and refactors filter emission for vector search queries.
openmetadata-service/src/test/java/org/openmetadata/service/search/vector/VectorSearchQueryBuilderTest.java	Adds coverage for ES-native top-level `knn` query structure and filter behavior.
openmetadata-service/src/main/java/org/openmetadata/service/search/vector/VectorIndexService.java	Extends vector service interface and adds an alias helper.
openmetadata-service/src/main/java/org/openmetadata/service/search/vector/OpenSearchVectorService.java	Adjusts to use the new interface default alias method and annotates overrides.
openmetadata-service/src/main/java/org/openmetadata/service/search/vector/ElasticSearchVectorService.java	New Elasticsearch vector service implementation using Rest5Client for generic requests.
openmetadata-service/src/test/java/org/openmetadata/service/search/vector/ElasticSearchVectorServiceTest.java	New tests for ES vector service result parsing, grouping, and dimension patching.
openmetadata-service/src/main/java/org/openmetadata/service/search/SearchRepository.java	Initializes ES vector service when Elasticsearch backend is configured; mapping selection tweaks for ES-native template.
openmetadata-service/src/main/java/org/openmetadata/service/search/RecreateWithEmbeddings.java	Attempts to include a vector “entity” key in recreate flow when vector search is enabled.
openmetadata-service/src/main/java/org/openmetadata/service/search/elasticsearch/SemanticSearchQueryBuilder.java	New builder for semantic/hybrid query composition on Elasticsearch.
openmetadata-service/src/main/java/org/openmetadata/service/search/elasticsearch/ElasticSearchIndexManager.java	Extracts `mappings` sub-object before calling `putMapping`.
openmetadata-service/src/test/java/org/openmetadata/service/search/elasticsearch/ElasticSearchIndexManagerTest.java	Adds a test asserting updateIndex handles full index JSON by extracting mappings.
openmetadata-service/src/main/java/org/openmetadata/service/resources/search/VectorSearchResource.java	Switches to repository-provided `VectorIndexService` and adds a fingerprint endpoint.
openmetadata-service/src/main/java/org/openmetadata/service/apps/bundles/searchIndex/ElasticSearchBulkSink.java	Adds async vector-embedding task execution + migration path for ES indexing jobs.
openmetadata-service/src/test/java/org/openmetadata/service/apps/bundles/searchIndex/ElasticSearchBulkSinkSimpleTest.java	Adds minimal coverage for vector-embedding helpers on the ES sink.
openmetadata-mcp/src/main/java/org/openmetadata/mcp/tools/SemanticSearchTool.java	Uses repository `VectorIndexService` rather than OpenSearch-only implementation.

Copilot · 2026-04-10T19:49:22Z

+    sb.append(",\"filter\":{\"bool\":{\"must\":[");
+    appendFilterMustClauses(sb, filters, true);
+    sb.append("]}}"); // close must array and bool


buildNativeESQuery passes nestedTags=true, which makes the tags filter emit a nested query. However, in the existing entity index mappings (e.g., elasticsearch/en/table_index_mapping.json), tags is an object (no "type": "nested"), so a nested query on path tags will fail or return no results. Consider using the same flat tags.tagFQN terms filter as the OpenSearch path (or only emitting nested when the target index actually maps tags as nested).

Copilot · 2026-04-10T19:49:23Z

+      List<Map<String, Object>> docs = VectorDocBuilder.fromEntity(entity, embeddingClient);
+      deleteByParentId(targetIndex, parentId);
+      bulkIndex(docs, targetIndex);
+    } catch (Exception e) {


updateEntityEmbedding delegates to updateVectorEmbeddings, which deletes-by-query and then bulk-indexes new documents into targetIndex. But VectorEmbeddingHandler passes the entity index name (e.g., table_search_index), so this will create additional documents (IDs like parentId-chunkIndex) instead of updating the existing entity document (ID = entity UUID), corrupting the entity index contents. Elasticsearch implementation should mirror OpenSearchVectorService.partialUpdateEntity(...) and update the existing doc by _id with the embedding fields.

Copilot · 2026-04-10T19:49:23Z

+      String query =
+          "{\"size\":1,\"_source\":[\"fingerprint\"],"
+              + "\"query\":{\"term\":{\"parent_id\":\""
+              + VectorSearchQueryBuilder.escape(parentId)
+              + "\"}}}";
+      String response = executeGenericRequest("POST", "/" + indexName + "/_search", query);


Fingerprint lookup queries parent_id, but OpenMetadata search documents are keyed by _id (entity UUID) and the OpenSearch implementation queries by _id / ids query. As written, Elasticsearch will almost always return null fingerprints, causing unnecessary recomputation and making delete/update-by-query paths ineffective. Recommend switching to the same _id-based query strategy as OpenSearchVectorService.getExistingFingerprint(...).

Copilot · 2026-04-10T19:49:23Z

+      String query =
+          "{\"size\":"
+              + parentIds.size()
+              + ",\"_source\":[\"parent_id\",\"fingerprint\"]"
+              + ",\"query\":{\"terms\":{\"parent_id\":"
+              + termsArray
+              + "}}"
+              + ",\"collapse\":{\"field\":\"parent_id\"}}";
+


Batch fingerprint lookup also uses parent_id terms + collapse on parent_id. If the intent is to reuse existing entity documents, this should use an ids query on _id (same as OpenSearch) to avoid relying on a separate parent_id field and to work across the existing entity indices behind the dataAssetEmbeddings alias.

Copilot · 2026-04-10T19:49:23Z

+      try (InputStream is = response.getEntity().getContent()) {
+        return new String(is.readAllBytes(), StandardCharsets.UTF_8);
+      }


executeGenericRequest reads and returns the response body but never checks HTTP status codes. For Elasticsearch errors (4xx/5xx), this will return an error payload as if it were a success, and downstream code will fail later while parsing/processing. Consider checking response.getStatusLine() (or equivalent) and throwing an exception that includes the error body when status >= 400, similar to OpenSearchVectorService.executeGenericRequest.

Suggested change

try (InputStream is = response.getEntity().getContent()) {

return new String(is.readAllBytes(), StandardCharsets.UTF_8);

}

String responseBody = "";

if (response.getEntity() != null) {

try (InputStream is = response.getEntity().getContent()) {

responseBody = new String(is.readAllBytes(), StandardCharsets.UTF_8);

}

}

int statusCode = response.getStatusLine().getStatusCode();

if (statusCode >= 400) {

throw new RuntimeException(

String.format(

"Elasticsearch generic request failed: %s %s returned status %d with body: %s",

method, endpoint, statusCode, responseBody));

}

return responseBody;

Copilot · 2026-04-10T19:49:24Z

+
+  @Override
+  public int getPendingVectorTaskCount() {
+    return Math.max(0, phaser.getUnarrivedParties() - 1);


getPendingVectorTaskCount() computes pending tasks via phaser.getUnarrivedParties() - 1. After awaitVectorCompletion() calls phaser.arrive(), the main party is no longer "unarrived", so this undercounts pending tasks (can report 0 while tasks are still running) and makes timeout reporting inaccurate. Prefer tracking pending via phaser.getRegisteredParties() - 1 (or an explicit counter) so the count is correct regardless of whether the main party has arrived.

Suggested change

return Math.max(0, phaser.getUnarrivedParties() - 1);

return Math.max(0, phaser.getRegisteredParties() - 1);

Copilot · 2026-04-10T19:49:24Z

+      "parent_id": {
+        "type": "keyword"
+      },
+      "chunk_index": {
+        "type": "integer"
+      },
+      "chunk_count": {


The ES-native vector index mapping defines identifier fields in snake_case (parent_id, chunk_index, chunk_count). The embedding code and existing search mappings use camelCase (parentId, chunkIndex, chunkCount), and the vector search/filtering logic also expects parentId. This mismatch will prevent correct grouping/filtering and makes the mapping inconsistent with the rest of the search schema. Align the vector index mapping field names with the existing document schema (camelCase) or update the vector doc builder + query code consistently.

Suggested change

"parent_id": {

"type": "keyword"

},

"chunk_index": {

"type": "integer"

},

"chunk_count": {

"parentId": {

"type": "keyword"

},

"chunkIndex": {

"type": "integer"

},

"chunkCount": {

Copilot · 2026-04-10T19:49:24Z

+      "parent_id": {
+        "type": "keyword"
+      },
+      "chunk_index": {
+        "type": "integer"
+      },
+      "chunk_count": {


The ES-native vector index mapping defines identifier fields in snake_case (parent_id, chunk_index, chunk_count), but the embedding/search code uses camelCase (parentId, chunkIndex, chunkCount). This mismatch will break grouping/filtering and makes the mapping inconsistent with the rest of the search schema. Align the mapping field names with the existing document schema (camelCase) or update the indexing/query code consistently.

Suggested change

"parent_id": {

"type": "keyword"

},

"chunk_index": {

"type": "integer"

},

"chunk_count": {

"parentId": {

"type": "keyword"

},

"chunkIndex": {

"type": "integer"

},

"chunkCount": {

Copilot · 2026-04-10T19:49:24Z

+      "parent_id": {
+        "type": "keyword"
+      },
+      "chunk_index": {
+        "type": "integer"
+      },
+      "chunk_count": {


The ES-native vector index mapping defines identifier fields in snake_case (parent_id, chunk_index, chunk_count), but the embedding/search code uses camelCase (parentId, chunkIndex, chunkCount). This mismatch will break grouping/filtering and makes the mapping inconsistent with the rest of the search schema. Align the mapping field names with the existing document schema (camelCase) or update the indexing/query code consistently.

Suggested change

"parent_id": {

"type": "keyword"

},

"chunk_index": {

"type": "integer"

},

"chunk_count": {

"parentId": {

"type": "keyword"

},

"chunkIndex": {

"type": "integer"

},

"chunkCount": {

Copilot · 2026-04-10T19:49:25Z

+      "parent_id": {
+        "type": "keyword"
+      },
+      "chunk_index": {
+        "type": "integer"
+      },
+      "chunk_count": {


The ES-native vector index mapping defines identifier fields in snake_case (parent_id, chunk_index, chunk_count), but the embedding/search code uses camelCase (parentId, chunkIndex, chunkCount). This mismatch will break grouping/filtering and makes the mapping inconsistent with the rest of the search schema. Align the mapping field names with the existing document schema (camelCase) or update the indexing/query code consistently.

Suggested change

"parent_id": {

"type": "keyword"

},

"chunk_index": {

"type": "integer"

},

"chunk_count": {

"parentId": {

"type": "keyword"

},

"chunkIndex": {

"type": "integer"

},

"chunkCount": {

joaopamaral · 2026-04-10T20:01:58Z

Also need to review all after this refactor #26000 😢

gitar-bot bot reviewed Apr 7, 2026

View reviewed changes

Comment thread ...rvice/src/test/java/org/openmetadata/service/search/vector/VectorSearchQueryBuilderTest.java Outdated

gitar-bot bot reviewed Apr 7, 2026

View reviewed changes

Comment thread ...service/src/main/java/org/openmetadata/service/search/vector/ElasticSearchVectorService.java Outdated

gitar-bot bot reviewed Apr 7, 2026

View reviewed changes

Comment thread ...service/src/main/java/org/openmetadata/service/search/vector/ElasticSearchVectorService.java

gitar-bot bot reviewed Apr 7, 2026

View reviewed changes

Comment thread ...service/src/main/java/org/openmetadata/service/search/vector/ElasticSearchVectorService.java

Merge branch 'main' into feat/es-elasticsearch-vector-search-upstream

e4226c2

fix: correct VectorSearchQueryBuilder.build() call in test to pass al…

4c34b8c

…l 6 required args Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

gitar-bot bot reviewed Apr 10, 2026

View reviewed changes

Comment thread ...service/src/main/java/org/openmetadata/service/search/vector/ElasticSearchVectorService.java

gitar-bot bot reviewed Apr 10, 2026

View reviewed changes

Comment thread ...e/src/main/java/org/openmetadata/service/apps/bundles/searchIndex/ElasticSearchBulkSink.java Outdated

joaopamaral marked this pull request as ready for review April 10, 2026 19:41

Copilot AI review requested due to automatic review settings April 10, 2026 19:41

Copilot started reviewing on behalf of joaopamaral April 10, 2026 19:42 View session

Copilot AI reviewed Apr 10, 2026

View reviewed changes

-      try (InputStream is = response.getEntity().getContent()) {
-        return new String(is.readAllBytes(), StandardCharsets.UTF_8);
-      }
+      String responseBody = "";
+      if (response.getEntity() != null) {
+        try (InputStream is = response.getEntity().getContent()) {
+          responseBody = new String(is.readAllBytes(), StandardCharsets.UTF_8);
+        }
+      }
+      int statusCode = response.getStatusLine().getStatusCode();
+      if (statusCode >= 400) {
+        throw new RuntimeException(
+            String.format(
+                "Elasticsearch generic request failed: %s %s returned status %d with body: %s",
+                method, endpoint, statusCode, responseBody));
+      }
+      return responseBody;

	return Math.max(0, phaser.getUnarrivedParties() - 1);
	return Math.max(0, phaser.getRegisteredParties() - 1);

Conversation

joaopamaral commented Apr 7, 2026

Summary

Changes

Compatibility

Test plan

References

Uh oh!

github-actions bot commented Apr 7, 2026

Uh oh!

joaopamaral commented Apr 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

harshach commented Apr 7, 2026

Uh oh!

joaopamaral commented Apr 7, 2026

Uh oh!

github-actions bot commented Apr 8, 2026

Uh oh!

github-actions bot commented Apr 8, 2026

Uh oh!

github-actions bot commented Apr 10, 2026

Uh oh!

Uh oh!

Uh oh!

github-actions bot commented Apr 10, 2026

Uh oh!

github-actions bot commented Apr 10, 2026

Uh oh!

github-actions bot commented Apr 10, 2026

Uh oh!

github-actions bot commented Apr 10, 2026

Uh oh!

github-actions bot commented Apr 10, 2026

Uh oh!

github-actions bot commented Apr 10, 2026

Uh oh!

gitar-bot bot commented Apr 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

joaopamaral commented Apr 10, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Apr 10, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 10, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 10, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 10, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 10, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 10, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 10, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 10, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 10, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 10, 2026

Choose a reason for hiding this comment

joaopamaral commented Apr 7, 2026 •

edited

Loading

gitar-bot bot commented Apr 10, 2026 •

edited

Loading