Add hybrid-rrf pipeline check and improve embedding error reporting#26936
Add hybrid-rrf pipeline check and improve embedding error reporting#26936
Conversation
…n status validation - Add isHybridSearchPipelineAvailable() to OpenSearchVectorService and SearchRepository to verify the hybrid-rrf search pipeline exists in OpenSearch - Update getEmbeddingsValidation in SystemRepository to also validate the pipeline when embeddings are enabled, failing with actionable message if missing - Retry vector service initialization during validation to surface the actual failure reason (embedding client vs OpenSearch vector service) instead of the generic "not configured properly" message Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
This PR enhances semantic search health validation for OpenSearch-backed embeddings by checking for the required hybrid-rrf search pipeline and by improving initialization error reporting to better distinguish embedding client vs. vector service failures.
Changes:
- Add a
hybrid-rrfpipeline existence check (via GET/_search/pipeline/hybrid-rrf) and surface an actionable “reindex to create it” message when missing. - Update embeddings system validation to retry vector service initialization during health checks and produce more specific failure messages.
- Add unit tests for
OpenSearchVectorService.isHybridSearchPipelineAvailable()covering success, 404, and exception cases.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/SystemRepository.java | Extends embeddings health validation to retry init and validate hybrid pipeline availability with improved messaging. |
| openmetadata-service/src/main/java/org/openmetadata/service/search/SearchRepository.java | Exposes isHybridSearchPipelineAvailable() by delegating to OpenSearchVectorService when applicable. |
| openmetadata-service/src/main/java/org/openmetadata/service/search/vector/OpenSearchVectorService.java | Adds isHybridSearchPipelineAvailable() that checks the hybrid-rrf search pipeline via the OpenSearch generic client. |
| openmetadata-service/src/test/java/org/openmetadata/service/search/vector/OpenSearchVectorServiceTest.java | Adds unit tests for the new pipeline-availability method. |
| if (searchRepository.getVectorIndexService() == null) { | ||
| return retryInitAndReportError( | ||
| searchRepository, embeddingsValidation, description, configMessage); | ||
| } | ||
|
|
||
| try { | ||
| StepValidation embeddingResult = | ||
| validateEmbeddingGeneration( | ||
| searchRepository.getEmbeddingClient(), | ||
| embeddingsValidation, | ||
| description, | ||
| configMessage); | ||
|
|
||
| if (Boolean.FALSE.equals(embeddingResult.getPassed())) { | ||
| return embeddingResult; | ||
| } | ||
|
|
||
| return validateHybridSearchPipeline( | ||
| searchRepository, embeddingResult, description, configMessage); | ||
| } catch (Exception e) { |
There was a problem hiding this comment.
The embeddings health-check logic now includes additional branches (retrying initializeVectorSearchService(), differentiating embedding-client vs vector-service initialization, and validating hybrid-rrf pipeline presence). There don’t appear to be unit tests covering these new paths; adding tests for (1) pipeline exists, (2) pipeline missing (404), and (3) pipeline check failure (exception/401/5xx) would help prevent regressions in health-check reporting.
- Fix bug: retry path now validates embedding generation before checking the hybrid pipeline (gitar) - Return Optional<String> from checkHybridSearchPipeline instead of boolean, differentiating 404 (missing) from 5xx/connectivity errors with specific messages (copilot) - Include initialization exception message in StepValidation so operators don't need to hunt logs (copilot) - Return Optional.empty() for non-OpenSearch implementations since hybrid pipeline is OpenSearch-specific (gitar) - Add test for 5xx response case in pipeline check Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
initializeVectorSearchService() catches exceptions internally and never rethrows, so the try-catch in retryInitAndReportError never captured the error. Store the exception message in vectorServiceInitError field and read it in the validation to include in the health check message. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
| @Getter private EmbeddingClient embeddingClient; | ||
| @Getter private VectorIndexService vectorIndexService; | ||
| @Getter private VectorEmbeddingHandler vectorEmbeddingHandler; | ||
| @Getter private String vectorServiceInitError; |
There was a problem hiding this comment.
vectorServiceInitError is written inside a synchronized method (initializeVectorSearchService) but read via an unsynchronized getter. Without volatile (or synchronized access), other threads may not observe the updated value, which can lead to stale/missing error details in validations/logs. Consider making this field volatile (and optionally clearing it on successful init to avoid stale errors).
| @Getter private String vectorServiceInitError; | |
| @Getter private volatile String vectorServiceInitError; |
| StepValidation embeddingsValidation, | ||
| String description, | ||
| String configMessage) { | ||
| searchRepository.initializeVectorSearchService(); |
There was a problem hiding this comment.
retryInitAndReportError() triggers initializeVectorSearchService() as part of the system validation endpoint. When initialization keeps failing (e.g., misconfigured embedding provider), repeated calls to /system/validate will keep retrying expensive init logic and may also perform side effects (e.g., pipeline creation/handler registration attempts), potentially spamming logs and impacting the embedding provider/OpenSearch. Consider throttling retries (e.g., retry once per process lifetime or with a backoff timestamp) and reporting the last captured init error instead of re-initializing on every validation call.
| searchRepository.initializeVectorSearchService(); | |
| // Avoid repeatedly triggering expensive vector search initialization from the | |
| // validation endpoint when we already know initialization has failed. | |
| String previousInitError = searchRepository.getVectorServiceInitError(); | |
| if (previousInitError == null) { | |
| // Either initialization has not been attempted yet or it has not recorded an error, | |
| // so we allow a single retry attempt. | |
| searchRepository.initializeVectorSearchService(); | |
| } else { | |
| LOG.debug( | |
| "Skipping vector search service initialization retry during system validation " | |
| + "because a previous attempt failed with error: {}", | |
| previousInitError); | |
| } |
| public Optional<String> checkHybridSearchPipeline() { | ||
| if (vectorIndexService instanceof OpenSearchVectorService openSearchVectorService) { | ||
| return openSearchVectorService.checkHybridSearchPipeline(); | ||
| } | ||
| return Optional.empty(); | ||
| } |
There was a problem hiding this comment.
PR description mentions adding isHybridSearchPipelineAvailable() on OpenSearchVectorService/SearchRepository, but the implementation adds checkHybridSearchPipeline() returning Optional<String> instead. To avoid confusion for reviewers and future maintainers, please either update the PR description (and any referenced docs/issues) or rename/adjust the method to match the intended API contract.
Cover the new validation branches: pipeline available, pipeline missing (404), pipeline 5xx, embedding client init failure with error message, vector service init failure with embedding client OK, retry with no recovery, and Elasticsearch not supported. Make getEmbeddingsValidation package-private for testability. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
🟡 Playwright Results — all passed (19 flaky)✅ 3448 passed · ❌ 0 failed · 🟡 19 flaky · ⏭️ 223 skipped
🟡 19 flaky test(s) (passed on retry)
How to debug locally# Download playwright-test-results-<shard> artifact and unzip
npx playwright show-trace path/to/trace.zip # view trace |
- Annotate getEmbeddingsValidation with @VisibleForTesting to document the package-private visibility is intentional for testing - Use e.toString() instead of e.getMessage() for vectorServiceInitError and pipeline check errors to avoid null messages - Mock Bedrock config in tests instead of null to eliminate noisy ERROR logs during test runs Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Remove guard that prevented init retries after a previous error, allowing recovery from transient failures (e.g., OpenSearch restart). initializeVectorSearchService() is already idempotent. - Include truncated response body in non-404 pipeline check errors for actionable diagnostics (e.g., 403 permission details) - Rename testEmbeddingsPassButPipelineMissing to testEmbeddingsFailWhenPipelineMissing for clarity - Use imported Optional instead of fully qualified in tests Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Code Review ✅ Approved 4 resolved / 4 findingsAdds hybrid-RRF pipeline validation and improves embedding error reporting in status validation, addressing retry path validation gaps, hybrid pipeline availability checks, and error state management. No issues remain. ✅ 4 resolved✅ Bug: Retry path skips embedding generation validation
✅ Quality: isHybridSearchPipelineAvailable returns false for non-OpenSearch
✅ Quality: Success message mentions hybrid pipeline for non-OpenSearch backends
✅ Bug: vectorServiceInitError not cleared on successful retry
OptionsDisplay: compact → Showing less information. Comment with these commands to change:
Was this helpful? React with 👍 / 👎 | Gitar |
|
…pen-metadata#26936) * Add hybrid-rrf pipeline check and improve embedding error reporting in status validation - Add isHybridSearchPipelineAvailable() to OpenSearchVectorService and SearchRepository to verify the hybrid-rrf search pipeline exists in OpenSearch - Update getEmbeddingsValidation in SystemRepository to also validate the pipeline when embeddings are enabled, failing with actionable message if missing - Retry vector service initialization during validation to surface the actual failure reason (embedding client vs OpenSearch vector service) instead of the generic "not configured properly" message Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Address PR review feedback from Copilot and Gitar - Fix bug: retry path now validates embedding generation before checking the hybrid pipeline (gitar) - Return Optional<String> from checkHybridSearchPipeline instead of boolean, differentiating 404 (missing) from 5xx/connectivity errors with specific messages (copilot) - Include initialization exception message in StepValidation so operators don't need to hunt logs (copilot) - Return Optional.empty() for non-OpenSearch implementations since hybrid pipeline is OpenSearch-specific (gitar) - Add test for 5xx response case in pipeline check Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Store vector service init error for status reporting initializeVectorSearchService() catches exceptions internally and never rethrows, so the try-catch in retryInitAndReportError never captured the error. Store the exception message in vectorServiceInitError field and read it in the validation to include in the health check message. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Add unit tests for embeddings validation paths in SystemRepository Cover the new validation branches: pipeline available, pipeline missing (404), pipeline 5xx, embedding client init failure with error message, vector service init failure with embedding client OK, retry with no recovery, and Elasticsearch not supported. Make getEmbeddingsValidation package-private for testability. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Clear vectorServiceInitError on successful initialization Prevents stale error state after a successful retry. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Make vectorServiceInitError volatile and skip redundant init retries - Add volatile to vectorServiceInitError for thread-safe reads from the unsynchronized getter - Skip calling initializeVectorSearchService() during validation when a previous init error is already recorded, avoiding expensive retries and log spam on every /system/validate call Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Address review: @VisibleForTesting, e.toString(), mock Bedrock config - Annotate getEmbeddingsValidation with @VisibleForTesting to document the package-private visibility is intentional for testing - Use e.toString() instead of e.getMessage() for vectorServiceInitError and pipeline check errors to avoid null messages - Mock Bedrock config in tests instead of null to eliminate noisy ERROR logs during test runs Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Allow transient failure recovery, include response body, fix test style - Remove guard that prevented init retries after a previous error, allowing recovery from transient failures (e.g., OpenSearch restart). initializeVectorSearchService() is already idempotent. - Include truncated response body in non-404 pipeline check errors for actionable diagnostics (e.g., 403 permission details) - Rename testEmbeddingsPassButPipelineMissing to testEmbeddingsFailWhenPipelineMissing for clarity - Use imported Optional instead of fully qualified in tests Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
…pen-metadata#26936) * Add hybrid-rrf pipeline check and improve embedding error reporting in status validation - Add isHybridSearchPipelineAvailable() to OpenSearchVectorService and SearchRepository to verify the hybrid-rrf search pipeline exists in OpenSearch - Update getEmbeddingsValidation in SystemRepository to also validate the pipeline when embeddings are enabled, failing with actionable message if missing - Retry vector service initialization during validation to surface the actual failure reason (embedding client vs OpenSearch vector service) instead of the generic "not configured properly" message Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Address PR review feedback from Copilot and Gitar - Fix bug: retry path now validates embedding generation before checking the hybrid pipeline (gitar) - Return Optional<String> from checkHybridSearchPipeline instead of boolean, differentiating 404 (missing) from 5xx/connectivity errors with specific messages (copilot) - Include initialization exception message in StepValidation so operators don't need to hunt logs (copilot) - Return Optional.empty() for non-OpenSearch implementations since hybrid pipeline is OpenSearch-specific (gitar) - Add test for 5xx response case in pipeline check Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Store vector service init error for status reporting initializeVectorSearchService() catches exceptions internally and never rethrows, so the try-catch in retryInitAndReportError never captured the error. Store the exception message in vectorServiceInitError field and read it in the validation to include in the health check message. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Add unit tests for embeddings validation paths in SystemRepository Cover the new validation branches: pipeline available, pipeline missing (404), pipeline 5xx, embedding client init failure with error message, vector service init failure with embedding client OK, retry with no recovery, and Elasticsearch not supported. Make getEmbeddingsValidation package-private for testability. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Clear vectorServiceInitError on successful initialization Prevents stale error state after a successful retry. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Make vectorServiceInitError volatile and skip redundant init retries - Add volatile to vectorServiceInitError for thread-safe reads from the unsynchronized getter - Skip calling initializeVectorSearchService() during validation when a previous init error is already recorded, avoiding expensive retries and log spam on every /system/validate call Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Address review: @VisibleForTesting, e.toString(), mock Bedrock config - Annotate getEmbeddingsValidation with @VisibleForTesting to document the package-private visibility is intentional for testing - Use e.toString() instead of e.getMessage() for vectorServiceInitError and pipeline check errors to avoid null messages - Mock Bedrock config in tests instead of null to eliminate noisy ERROR logs during test runs Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Allow transient failure recovery, include response body, fix test style - Remove guard that prevented init retries after a previous error, allowing recovery from transient failures (e.g., OpenSearch restart). initializeVectorSearchService() is already idempotent. - Include truncated response body in non-404 pipeline check errors for actionable diagnostics (e.g., 403 permission details) - Rename testEmbeddingsPassButPipelineMissing to testEmbeddingsFailWhenPipelineMissing for clarity - Use imported Optional instead of fully qualified in tests Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>



Summary
hybrid-rrfsearch pipeline in the system health check when embeddings are enabled. If the pipeline is missing, the status reports an actionable error ("Run a reindex to create it"), differentiating 404 from 5xx/connectivity errors.checkHybridSearchPipeline()toOpenSearchVectorServiceandSearchRepository, returningOptional<String>(empty = OK, present = error detail) to check pipeline existence via a GET request.initializeVectorSearchService()when a previous init error is already recorded.Closes https://github.com/open-metadata/ai-platform/issues/473
Test plan
checkHybridSearchPipeline(200 OK, 404, 5xx, and exception cases)SystemRepositoryembeddings validation paths (7 tests covering all branches)mvn spotless:apply- confirmed formatting is clean