Add external run Tests by mohityadav766 · Pull Request #28637 · open-metadata/OpenMetadata

mohityadav766 · 2026-06-02T12:58:30Z

Search-indexing regression tests — issue coverage

This branch adds behaviour/regression tests for reported search-indexing issues. Tests target
both indexing paths where relevant — the live path (entity create/update →
SearchRepository) and the reindex path (SearchResource → SearchIndexingApplication bulk
sink).

Legend: ✅ new test added (this branch) · 🟡 covered by existing/folded into existing test ·
⛔ not added (reason given) · Fixed? = whether the underlying product issue is resolved.

A. Query-time mapping / field-type 500s

#	Symptom	Fixed?	Coverage
A1	`extension.*` highlight → `search 500` (no analyzer)	✅	✅ `ExtensionHighlightSearchIT` (verified on embedded OpenSearch — highlight on a flattened `extension.*` field no longer fails the shard)
A2	Data-Insights Group By on `extension.*` / non-aggregatable field → 500	✅	✅ `CustomPropertyAggregationIT`
A3	`value_count` agg on string custom property → shard failure	✅	✅ `CustomPropertyAggregationIT`
A4	`search_after` breaks on comma-containing names	Uncertain	⛔ not added (uncertain fix status)
A5	`owners` mapped `object` not `nested` (13 mapping files, incl. locales)	✅ #26982	✅ `OwnersNestedMappingIT` (sweeps every index)
A6	`owners` not `nested` (e2e upgrade)	✅	✅ `OwnersNestedMappingIT`
A7	`too_many_nested_clauses` on long compound names	✅ #21505	✅ `LongCompoundNameSearchUIIT`
A8	Stale `extension`→`fileExtension` aggregation 500 after upgrade	✅ #28555/#28565	⛔ migration-only → AUT nightly
A9	1.12.0 reindex-recreate all-shards-failed on `dataAsset`	✅	🟡 reindex ITs (`ReindexAliasSwapIT`, `NoDuplicatesDuringReindexIT`)

B. Indexing-time parse failures (docs dropped at sink)

#	Symptom	Fixed?	Coverage
B10	Dynamic-mapping parse failure on custom properties	✅ #28671	🟡 `CustomPropertyAggregationIT` + immense-term IT
B11	Custom-property string > 32766 bytes (immense term) skipped	✅	🟡 existing `SearchIndexImmenseTermIT`
B12	Custom-property string > 32766 bytes (general class)	✅	🟡 existing `SearchIndexImmenseTermIT`
B13	Deeply nested schema fields exceed mapping depth limit	✅	🟡 existing `SearchIndexImmenseTermIT` / `SearchIndexFieldLimitIT`
B14	"failed to parse" during indexing (multi-front incident)	Partial/infra	🟡 partial

C. Reindex job reliability (stuck / stale / lock / restart)

#	Symptom	Fixed?	Coverage
C15	Abandoned job + lock disagreement after server crash mid-reindex	✅ #28699	⛔ in-JVM embedded server + in-memory latch — needs real process restart → AUT real-pod nightly
C16	`redeploy_pipelines` 409 (stale hybrid runner)	Env	⛔ needs hybrid runner
C17	Stop/uninstall doesn't stop indexing (multi-pod)	✅ #26669	🟡 `ReindexStopUnderLoadIT` (single-pod); multi-pod ⛔
C18	Search engine unavailable during reindex + recovery	✅	✅ `ReindexOutageRecoveryIT` (reindex path) + `LiveIndexRetryIT` (live path)
C19	Stale-job blocks remaining-entity reindex	✅ 1.12.5	🟡 `ReindexStopUnderLoadIT`
C20	New test cases don't appear after upgrade	✅	🟡 `TestCaseResultReindexUIIT`
C21	Server restart on search indexing (P0)	—	⛔ not unit-testable (in-JVM) → AUT real-pod nightly

D. Reindex resource exhaustion (OOM / disk / payload)

#	Symptom	Fixed?	Coverage
D22	OOM loop from unsafe bulk-sink config	✅	✅ `UnsafeReindexConfigIT` (server-survives guard)
D23	OOM with Auto-Tune	Ops	🟡 `UnsafeReindexConfigIT` (partial)
D24	OpenSearch GC unresponsive at high asset count	Infra	⛔ can't deterministically induce a GC pause
D25	Disk flood-stage watermark blocks index	✅	✅ `ReindexOutageRecoveryIT` (reindex-path fault recovery)
D26	Disk usage exceeded during reindex → test cases missing	✅	✅ `ReindexOutageRecoveryIT`
D27	413 Request Too Long during bulk reindex	✅	🟡 sink strip + `SearchIndexImmenseTermIT`
D28	Vector index bulk processor doesn't flush on payload size	✅	⛔ needs vector index enabled
D29	Connection leak in lineage fetch during reindex	✅	⛔ leak not deterministically assertable

E. Stale / missing data after reindex (correctness)

#	Symptom	Fixed?	Coverage
E30	Test-case status removed after reindex (`testCaseResult` omitted)	✅ #27723	✅ `TestCaseResultReindexUIIT` (reindex path)
E31	Bundle-suite test cases disappear after PUT	—	⛔ needs logical/bundle-suite plumbing
E32	Tier stripped from tags on full rebuild	✅	✅ `TestCaseTierRebuildUIIT` (reindex path)
E33	Glossary-term rename corrupts doc → 0 assets (double find-replace)	✅ #28725	✅ `GlossaryRenameSearchUIIT` + `ClassificationRenameSearchUIIT` (live + reindex paths)
E34	Glossary rename not reflected (platform instability)	Infra	🟡 rename tests cover correctness
E35	Ingestion-pipeline owners missing from index → `isOwner` denies	✅ #28264	✅ `PipelineOwnerIndexUIIT`
E36	Test cases land in old index deleted on alias promotion	✅	🟡 `NoDuplicatesDuringReindexIT` / `ReindexAliasSwapIT`
E37	Search index stale (latest data missing)	✅	🟡 `SimpleReindexTriggerUIIT`
E38	Owner-enrichment crash dropped whole entities from DI	✅ #28264	⛔ Data-Insights-specific

F. Performance / slowness

#	Symptom	Fixed?	Coverage
F39	Reindex throughput (170k took 6h)	Perf	🟡 measured by `Scale100kEntitiesIT` — records `reindex_ms` over 100k/500k cohorts; no hard time threshold (env-dependent → flaky)
F40	Indexing slows badly past 250k records	✅ #27865	🟡 measured by `Scale100kEntitiesIT` (throughput/latency metrics; #27865 added fast-write settings)
F41	recreate + Auto-Tune: >50% of tables silently not indexed	✅	✅ `Scale100kEntitiesIT` asserts `db_count == es_count` after recreate — catches silent under-indexing

G. Pre-release / CI / internal

#	Symptom	Fixed?	Coverage
G42	Data-Insights snapshot index storage blowup	—	⛔ capacity, not behaviour
G43	Reindex alias propagation race on `table_search_index`	✅ #28700	🟡 existing `ReindexAliasSwapIT`
G44	Migration init container fails on ES 9.x (missing KNN settings)	—	⛔ version/env-specific
G45	Fleet-wide health snapshot	—	⛔ ops, not behaviour
G46	Indexing log noise	—	⛔ ops, not behaviour

New tests in this branch (11): GlossaryRenameSearchUIIT, ClassificationRenameSearchUIIT,
LongCompoundNameSearchUIIT, PipelineOwnerIndexUIIT, TestCaseResultReindexUIIT,
TestCaseTierRebuildUIIT (UIIT); OwnersNestedMappingIT, CustomPropertyAggregationIT,
ExtensionHighlightSearchIT, UnsafeReindexConfigIT, ReindexOutageRecoveryIT (IT). The 5 ITs
are validated green on the embedded stack; the 6 UIITs are compile-verified.

Describe your changes:

Fixes #

I worked on ... because ...

Type of change:

High-level design:

N/A — small change.

Tests:

Use cases covered

Unit tests

Backend integration tests

Ingestion integration tests

Playwright (UI) tests

Manual testing performed

UI screen recording / screenshots:

Not applicable.

Checklist:

I have read the CONTRIBUTING document.
My PR title is Fixes <issue-number>: <short explanation>
My PR is linked to a GitHub issue via Fixes #<issue-number> above.
I have commented on my code, particularly in hard-to-understand areas.
For JSON Schema changes: I updated the migration scripts or explained why it is not needed.
For UI changes: I attached a screen recording and/or screenshots above.
I have added tests (unit / integration / Playwright as applicable) and listed them above.

Summary by Gitar

Test infrastructure and support:
- Added TestSupportSearchResource as a restricted admin-only passthrough for remote search engine introspection.
- Implemented ExternalTokenRefresher to maintain valid session tokens during long-running integration suites against remote clusters.
Search client flexibility:
- Enabled SearchClient to route requests through the API passthrough when running in external mode.
- Added rawSearchRequest support to ElasticSearchClient and OpenSearchClient to facilitate low-level index inspection.
CI and resource optimization:
- Configurable EntityLoader concurrency via jpw.loader.maxWorkers to prevent 504 timeouts on shared external infrastructure.
- Introduced NamespaceCleanup for recursive hard-deletion of root entities to maintain state hygiene.
New Integration Tests:
- Added ExtensionHighlightSearchIT to guard against index-shard failures during highlighting on flattened fields.
- Added ReindexOutageRecoveryIT, UnsafeReindexConfigIT, CustomPropertyAggregationIT, and OwnersNestedMappingIT to expand test coverage.
Test refinements:
- Updated SimpleReindexTriggerUIIT to use exact index count assertions.

_{This will update automatically on new commits.}

Copilot

Pull request overview

Adds infrastructure to run the Java integration / Playwright / scale test suites against an external, already-running OpenMetadata cluster (where the search engine is not directly reachable on :9200). It introduces an admin-only test-support REST endpoint that proxies a read-only subset of search engine requests, switches the in-test SearchClient to route through that endpoint when external config is present, and adds a new GitHub Actions workflow that obtains a JWT and drives the suites.

Changes:

New TestSupportSearchResource plus SearchClient.rawSearchRequest implementations on the ES and OS clients for an admin-gated, read-only search passthrough.
Test harness (OssTestServer, ServerHandle, ExternalServer, SearchClient, IndexAliasInspector, Scale100kEntitiesIT) updated to detect external mode and route search introspection through the passthrough; affected tests are tagged (search-direct) or skipped (LiveIndexRetryIT, ReindexStatsIT#orphanedSchemaDoesNotFailReindex).
New java-playwright-external.yml workflow runs ui-it, search-it, and matrixed scale-it profiles against the external cluster.

Reviewed changes

Copilot reviewed 16 out of 16 changed files in this pull request and generated 2 comments.

Show a summary per file

File	Description
openmetadata-service/.../search/SearchClient.java	Adds `RawSearchResponse` record and default `rawSearchRequest` API
openmetadata-service/.../opensearch/OpenSearchClient.java	Implements `rawSearchRequest` via OS generic client
openmetadata-service/.../elasticsearch/ElasticSearchClient.java	Implements `rawSearchRequest` via ES low-level client, parses query string
openmetadata-service/.../resources/testsupport/TestSupportSearchResource.java	New admin-only read-only passthrough endpoints (`/passthrough`, `/cluster-alias`, `/exists`)
openmetadata-integration-tests/.../scenarios/search/reindex/*UIIT.java	Tag direct-search UI ITs with `search-direct` for exclusion in external runs
openmetadata-integration-tests/.../it/util/OssTestServer.java	Delegate to `UiTestServer` when `OM_URL`+`OM_ADMIN_TOKEN` are set
openmetadata-integration-tests/.../it/tests/search/Scale100kEntitiesIT.java	Parameterize seed/timeout, resolve table alias dynamically
openmetadata-integration-tests/.../it/tests/search/ReindexStatsIT.java	Skip orphan-schema test in external mode
openmetadata-integration-tests/.../it/tests/search/LiveIndexRetryIT.java	Skip whole class in external mode
openmetadata-integration-tests/.../it/server/ServerHandle.java	Add `external` flag + `isExternal()`
openmetadata-integration-tests/.../it/server/ExternalServer.java	Mark handles produced from env as external
openmetadata-integration-tests/.../it/search/SearchClient.java	Route through `/v1/test-support/search` when external
openmetadata-integration-tests/.../it/search/IndexAliasInspector.java	Fetch cluster alias from server in external mode
.github/workflows/java-playwright-external.yml	New workflow to drive UI/search/scale suites against an external cluster

+  private static void validateReadOnly(String path) {
+    String value = normalize(path);
+    boolean allowed = ALLOWED_TOKENS.stream().anyMatch(value::contains);
+    if (!allowed) {
+      throw new BadRequestException(
+          "Only read-only search introspection paths are permitted: " + ALLOWED_TOKENS);
+    }
+  }


+    int queryStart = endpoint.indexOf('?');
+    String path = queryStart >= 0 ? endpoint.substring(0, queryStart) : endpoint;
+    es.co.elastic.clients.transport.rest5_client.low_level.Request request =
+        new es.co.elastic.clients.transport.rest5_client.low_level.Request(method, path);
+    if (queryStart >= 0) {
+      for (String pair : endpoint.substring(queryStart + 1).split("&")) {
+        int eq = pair.indexOf('=');
+        if (eq > 0) {
+          request.addParameter(pair.substring(0, eq), pair.substring(eq + 1));
+        }
+      }
+    }


gitar-bot · 2026-06-02T13:01:38Z

+  private static void validateReadOnly(String path) {
+    String value = normalize(path);
+    boolean allowed = ALLOWED_TOKENS.stream().anyMatch(value::contains);
+    if (!allowed) {
+      throw new BadRequestException(
+          "Only read-only search introspection paths are permitted: " + ALLOWED_TOKENS);
+    }


⚠️ Security: Allowlist bypass via substring match in validateReadOnly

The validateReadOnly method at line 142 uses value::contains to check if the path contains any allowed token. This is a substring match, meaning a path like /_bulk?ignore=_count or /_delete_by_query?_search=1 would pass validation because it contains _count or _search as a substring, while actually targeting a mutating endpoint.

While admin-only auth is a mitigating factor, this weakens the defense-in-depth intent of the allowlist. The check should verify the path segment rather than performing a simple substring search.

Check allowed tokens only in the path portion (before '?') and require them to appear as path segments (preceded by '/'):

private static void validateReadOnly(String path) { String value = normalize(path); // Split on '?' to isolate the path portion from query params String pathOnly = value.contains("?") ? value.substring(0, value.indexOf('?')) : value; boolean allowed = ALLOWED_TOKENS.stream() .anyMatch(token -> pathOnly.contains("/" + token) || pathOnly.endsWith(token)); if (!allowed) { throw new BadRequestException( "Only read-only search introspection paths are permitted: " + ALLOWED_TOKENS); } }

Apply fix

_{Check the box to apply the fix or reply for a change | Was this helpful? React with 👍 / 👎}

gitar-bot · 2026-06-02T13:01:39Z

+@Path("/v1/test-support/search")
+@Collection(name = "testSupportSearch")


⚠️ Security: Test-support endpoint ships in production with no disable gate

The TestSupportSearchResource is annotated with @Collection, which means it is auto-registered in every deployment including production. Although it requires admin auth, shipping a raw search-engine passthrough in production increases the attack surface — a compromised admin token could exfiltrate arbitrary search engine data or (combined with the allowlist bypass) perform mutations.

Consider gating registration behind a configuration flag (e.g., testSupportEnabled: true in the YAML config) or a build profile so it's excluded from production images.

Gate the resource behind an opt-in environment variable so it's not available in production by default:

// Option: Check a config flag in the constructor and short-circuit all methods, // or conditionally register the resource in the Collection scanner. // Minimal approach: add an env-var / config guard in the constructor: public TestSupportSearchResource(Authorizer authorizer) { this.authorizer = authorizer; if (!Boolean.parseBoolean(System.getenv("OM_TEST_SUPPORT_ENABLED"))) { throw new IllegalStateException( "TestSupportSearchResource is disabled; set OM_TEST_SUPPORT_ENABLED=true to enable"); } }

Apply fix

_{Check the box to apply the fix or reply for a change | Was this helpful? React with 👍 / 👎}

github-actions · 2026-06-02T15:46:03Z

🟡 Playwright Results — all passed (13 flaky)

✅ 4259 passed · ❌ 0 failed · 🟡 13 flaky · ⏭️ 88 skipped

Shard	Passed	Flaky	Skipped
🟡 Shard 1	295	4	4
🟡 Shard 2	805	2	9
🟡 Shard 3	805	2	8
🟡 Shard 4	845	1	12
🟡 Shard 5	718	2	47
🟡 Shard 6	791	2	8

🟡 13 flaky test(s) (passed on retry)

Features/DataAssetRulesEnabled.spec.ts › Verify the Pipeline Service Entity Action items after rules is Enabled (shard 1, 1 retry)
Features/EntityRenameConsolidation.spec.ts › Classification - multiple rename + update cycles should preserve tags (shard 1, 1 retry)
Pages/AuditLogs.spec.ts › should apply both User and EntityType filters simultaneously (shard 1, 1 retry)
Pages/SearchSettings.spec.ts › Restore default search settings (shard 1, 1 retry)
Features/Glossary/GlossaryWorkflow.spec.ts › should display correct status badge color and icon (shard 2, 2 retries)
Features/Glossary/MUIGlossaryMutualExclusivity.spec.ts › MUI-ME-S01: Selecting ME child should auto-deselect siblings (shard 2, 1 retry)
Features/RTL.spec.ts › Verify Following widget functionality (shard 3, 1 retry)
Features/Table.spec.ts › Table pagination with sorting should works (shard 3, 1 retry)
Pages/CustomProperties.spec.ts › Time (shard 4, 1 retry)
Pages/EntityDataConsumer.spec.ts › Tier Add, Update and Remove (shard 5, 1 retry)
Pages/ExplorePageRightPanel_KnowledgeCenter.spec.ts › Should remove user owner for knowledgeCenter (shard 5, 1 retry)
Pages/Lineage/DataAssetLineage.spec.ts › Column lineage for table -> apiEndpoint (shard 6, 1 retry)
Pages/Lineage/LineageFilters.spec.ts › Verify lineage schema filter selection (shard 6, 1 retry)

📦 Download artifacts

How to debug locally

# Download playwright-test-results-<shard> artifact and unzip
npx playwright show-trace path/to/trace.zip    # view trace

Copilot

Pull request overview

Copilot reviewed 16 out of 16 changed files in this pull request and generated 6 comments.

+  public Response exists(@Context SecurityContext securityContext, @QueryParam("path") String path)
+      throws IOException {
+    authorizer.authorizeAdmin(securityContext);
+    validateReadOnly(path);
+    RawSearchResponse response =
+        Entity.getSearchRepository()
+            .getSearchClient()
+            .rawSearchRequest("GET", normalize(path), null);
+    boolean exists = response.statusCode() >= 200 && response.statusCode() < 300;
+    return Response.ok(Map.of("exists", exists)).build();
+  }


+    String body;
+    try (var is = response.getEntity().getContent()) {
+      body = new String(is.readAllBytes(), java.nio.charset.StandardCharsets.UTF_8);
+    }


+    Assumptions.assumeTrue(
+        System.getenv("OM_URL") == null,
+        "LiveIndexRetryIT pauses the ES container (EsOutageInjector) and reads the retry queue via "
+            + "the in-JVM DAO — both require the embedded stack, so it is skipped in external mode");


+    Assumptions.assumeTrue(
+        System.getenv("OM_URL") == null,
+        "Creating an orphan (schema row hard-deleted without cascading to its table) needs the "
+            + "in-JVM DAO; the REST API would reject or cascade, so this case is embedded-only");


+  private static final List<String> ALLOWED_TOKENS =
+      List.of("_count", "_search", "_alias", "_cat/indices");


+  private static void validateReadOnly(String path) {
+    String value = normalize(path);
+    boolean allowed = ALLOWED_TOKENS.stream().anyMatch(value::contains);
+    if (!allowed) {
+      throw new BadRequestException(
+          "Only read-only search introspection paths are permitted: " + ALLOWED_TOKENS);
+    }


Copilot

Pull request overview

Copilot reviewed 37 out of 37 changed files in this pull request and generated 5 comments.

+  private static void validateReadOnly(String path) {
+    String value = normalize(path);
+    boolean allowed = ALLOWED_TOKENS.stream().anyMatch(value::contains);
+    if (!allowed) {
+      throw new BadRequestException(
+          "Only read-only search introspection paths are permitted: " + ALLOWED_TOKENS);
+    }


+    if (queryStart >= 0) {
+      for (String pair : endpoint.substring(queryStart + 1).split("&")) {
+        int eq = pair.indexOf('=');
+        if (eq > 0) {
+          request.addParameter(pair.substring(0, eq), pair.substring(eq + 1));
+        }
+      }
+    }


+    String body;
+    try (var is = response.getEntity().getContent()) {
+      body = new String(is.readAllBytes(), java.nio.charset.StandardCharsets.UTF_8);
+    }
+    return new RawSearchResponse(response.getStatusCode(), body);
+  }


+  private String login() {
+    final String body = "{\"email\":\"" + email + "\",\"password\":\"" + passwordB64 + "\"}";
+    try {


+@Path("/v1/test-support/search")
+@Collection(name = "testSupportSearch")
+@Tag(name = "TestSupport", description = "Admin-only read-only search passthrough for tests")
+@Produces(MediaType.APPLICATION_JSON)
+@Slf4j
+public class TestSupportSearchResource {


…st docs Keep only public PR/issue numbers and generic symptoms in the new search-indexing regression tests. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Regression guards for reported search-indexing issues, covering both the live indexing path (entity create/update -> SearchRepository) and the reindex path (SearchResource -> SearchIndexingApplication bulk sink): - GlossaryRenameSearchUIIT / ClassificationRenameSearchUIIT: prefix-rename must not double-apply to the search doc (live + reindex) - LongCompoundNameSearchUIIT: long compound names stay searchable - PipelineOwnerIndexUIIT: owned entity carries its owner in the index - TestCaseResultReindexUIIT / TestCaseTierRebuildUIIT: status/tier survive a recreate reindex - OwnersNestedMappingIT: every index maps owners as nested (RBAC) - CustomPropertyAggregationIT: custom-property aggregations don't fail shards - UnsafeReindexConfigIT: server survives an aggressive bulk-sink config - ReindexOutageRecoveryIT: reindex path reconciles after an engine outage Docs reference only public PR/issue numbers and generic symptoms. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…nal-run

Copilot

Pull request overview

Copilot reviewed 47 out of 47 changed files in this pull request and generated 3 comments.

+  private static void validateReadOnly(String path) {
+    String value = normalize(path);
+    boolean allowed = ALLOWED_TOKENS.stream().anyMatch(value::contains);
+    if (!allowed) {
+      throw new BadRequestException(
+          "Only read-only search introspection paths are permitted: " + ALLOWED_TOKENS);
+    }
+  }


+      SdkClients.overrideAdminToken(token);
+      AuthSession.update(
+          new TokenSet(token, null, null, Instant.now().plus(Duration.ofDays(3650))));
+      LOG.info("Refreshed external admin token (re-login)");


+    String body;
+    try (var is = response.getEntity().getContent()) {
+      body = new String(is.readAllBytes(), java.nio.charset.StandardCharsets.UTF_8);
+    }
+    return new RawSearchResponse(response.getStatusCode(), body);
+  }


Regression guards for reported search-indexing issues, covering both the live indexing path (entity create/update -> SearchRepository) and the reindex path (SearchResource -> SearchIndexingApplication bulk sink): - GlossaryRenameSearchUIIT / ClassificationRenameSearchUIIT: prefix-rename must not double-apply to the search doc (live + reindex) - LongCompoundNameSearchUIIT: long compound names stay searchable - PipelineOwnerIndexUIIT: owned entity carries its owner in the index - TestCaseResultReindexUIIT / TestCaseTierRebuildUIIT: status/tier survive a recreate reindex - OwnersNestedMappingIT: every index maps owners as nested (RBAC) - CustomPropertyAggregationIT: custom-property aggregations don't fail shards - ExtensionHighlightSearchIT: highlight on a flattened extension.* field does not fail the shard (no 'has no associated analyzer' 500) - UnsafeReindexConfigIT: server survives an aggressive bulk-sink config - ReindexOutageRecoveryIT: reindex path reconciles after an engine outage Docs reference only public PR/issue numbers and generic symptoms. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…nal-run

Regression guards for reported search-indexing issues, covering both the live indexing path (entity create/update -> SearchRepository) and the reindex path (SearchResource -> SearchIndexingApplication bulk sink): - GlossaryRenameSearchUIIT / ClassificationRenameSearchUIIT: prefix-rename must not double-apply to the search doc (live + reindex) - LongCompoundNameSearchUIIT: long compound names stay searchable - PipelineOwnerIndexUIIT: owned entity carries its owner in the index - TestCaseResultReindexUIIT / TestCaseTierRebuildUIIT: status/tier survive a recreate reindex - OwnersNestedMappingIT: every index maps owners as nested (RBAC) - CustomPropertyAggregationIT: custom-property aggregations don't fail shards - ExtensionHighlightSearchIT: highlight on a flattened extension.* field does not fail the shard (no 'has no associated analyzer' 500) - UnsafeReindexConfigIT: server survives an aggressive bulk-sink config - ReindexOutageRecoveryIT: reindex path reconciles after an engine outage Docs reference only public PR/issue numbers and generic symptoms. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…nal-run

gitar-bot · 2026-06-06T05:30:02Z

Code Review ⚠️ Changes requested 0 resolved / 2 findings

Introduces an external Playwright integration test harness but exposes a production security risk by lacking a disable gate for TestSupportSearchResource and implementing a flawed substring-match-based allowlist in validateReadOnly.

⚠️

Security: Allowlist bypass via substring match in validateReadOnly

📄 openmetadata-service/src/main/java/org/openmetadata/service/resources/testsupport/TestSupportSearchResource.java:140-146 📄 openmetadata-service/src/main/java/org/openmetadata/service/resources/testsupport/TestSupportSearchResource.java:140-147

The validateReadOnly method at line 142 uses value::contains to check if the path contains any allowed token. This is a substring match, meaning a path like /_bulk?ignore=_count or /_delete_by_query?_search=1 would pass validation because it contains _count or _search as a substring, while actually targeting a mutating endpoint.

While admin-only auth is a mitigating factor, this weakens the defense-in-depth intent of the allowlist. The check should verify the path segment rather than performing a simple substring search.

Check allowed tokens only in the path portion (before '?') and require them to appear as path segments (preceded by '/')

private static void validateReadOnly(String path) {
  String value = normalize(path);
  // Split on '?' to isolate the path portion from query params
  String pathOnly = value.contains("?") ? value.substring(0, value.indexOf('?')) : value;
  boolean allowed =
      ALLOWED_TOKENS.stream()
          .anyMatch(token -> pathOnly.contains("/" + token) || pathOnly.endsWith(token));
  if (!allowed) {
    throw new BadRequestException(
        "Only read-only search introspection paths are permitted: " + ALLOWED_TOKENS);
  }
}

⚠️

Security: Test-support endpoint ships in production with no disable gate

📄 openmetadata-service/src/main/java/org/openmetadata/service/resources/testsupport/TestSupportSearchResource.java:48-49

The TestSupportSearchResource is annotated with @Collection, which means it is auto-registered in every deployment including production. Although it requires admin auth, shipping a raw search-engine passthrough in production increases the attack surface — a compromised admin token could exfiltrate arbitrary search engine data or (combined with the allowlist bypass) perform mutations.

Consider gating registration behind a configuration flag (e.g., testSupportEnabled: true in the YAML config) or a build profile so it's excluded from production images.

Gate the resource behind an opt-in environment variable so it's not available in production by default

// Option: Check a config flag in the constructor and short-circuit all methods,
// or conditionally register the resource in the Collection scanner.
// Minimal approach: add an env-var / config guard in the constructor:
public TestSupportSearchResource(Authorizer authorizer) {
  this.authorizer = authorizer;
  if (!Boolean.parseBoolean(System.getenv("OM_TEST_SUPPORT_ENABLED"))) {
    throw new IllegalStateException(
        "TestSupportSearchResource is disabled; set OM_TEST_SUPPORT_ENABLED=true to enable");
  }
}

🤖 Prompt for agents

Code Review: Introduces an external Playwright integration test harness but exposes a production security risk by lacking a disable gate for `TestSupportSearchResource` and implementing a flawed substring-match-based allowlist in `validateReadOnly`.

1. ⚠️ Security: Allowlist bypass via substring match in validateReadOnly
   Files: openmetadata-service/src/main/java/org/openmetadata/service/resources/testsupport/TestSupportSearchResource.java:140-146, openmetadata-service/src/main/java/org/openmetadata/service/resources/testsupport/TestSupportSearchResource.java:140-147

   The `validateReadOnly` method at line 142 uses `value::contains` to check if the path contains any allowed token. This is a substring match, meaning a path like `/_bulk?ignore=_count` or `/_delete_by_query?_search=1` would pass validation because it *contains* `_count` or `_search` as a substring, while actually targeting a mutating endpoint.
   
   While admin-only auth is a mitigating factor, this weakens the defense-in-depth intent of the allowlist. The check should verify the path *segment* rather than performing a simple substring search.

   Fix (Check allowed tokens only in the path portion (before '?') and require them to appear as path segments (preceded by '/')):
   private static void validateReadOnly(String path) {
     String value = normalize(path);
     // Split on '?' to isolate the path portion from query params
     String pathOnly = value.contains("?") ? value.substring(0, value.indexOf('?')) : value;
     boolean allowed =
         ALLOWED_TOKENS.stream()
             .anyMatch(token -> pathOnly.contains("/" + token) || pathOnly.endsWith(token));
     if (!allowed) {
       throw new BadRequestException(
           "Only read-only search introspection paths are permitted: " + ALLOWED_TOKENS);
     }
   }

2. ⚠️ Security: Test-support endpoint ships in production with no disable gate
   Files: openmetadata-service/src/main/java/org/openmetadata/service/resources/testsupport/TestSupportSearchResource.java:48-49

   The `TestSupportSearchResource` is annotated with `@Collection`, which means it is auto-registered in every deployment including production. Although it requires admin auth, shipping a raw search-engine passthrough in production increases the attack surface — a compromised admin token could exfiltrate arbitrary search engine data or (combined with the allowlist bypass) perform mutations.
   
   Consider gating registration behind a configuration flag (e.g., `testSupportEnabled: true` in the YAML config) or a build profile so it's excluded from production images.

   Fix (Gate the resource behind an opt-in environment variable so it's not available in production by default):
   // Option: Check a config flag in the constructor and short-circuit all methods,
   // or conditionally register the resource in the Collection scanner.
   // Minimal approach: add an env-var / config guard in the constructor:
   public TestSupportSearchResource(Authorizer authorizer) {
     this.authorizer = authorizer;
     if (!Boolean.parseBoolean(System.getenv("OM_TEST_SUPPORT_ENABLED"))) {
       throw new IllegalStateException(
           "TestSupportSearchResource is disabled; set OM_TEST_SUPPORT_ENABLED=true to enable");
     }
   }

Options

Display: compact → Showing less information.

Comment with these commands to change:

`Compact`
`gitar display:verbose`

_{Was this helpful? React with 👍 / 👎 | Gitar}

Copilot

Pull request overview

Copilot reviewed 52 out of 52 changed files in this pull request and generated 3 comments.

+  /** Status code + raw JSON body of a low-level passthrough request to the search engine. */
+  record RawSearchResponse(int statusCode, String body) {}


+      } catch (final ExecutionException | java.util.concurrent.TimeoutException e) {
+        failed++;
+        if (firstFailure == null) {
+          firstFailure = (e instanceof ExecutionException) ? e.getCause() : e;
+        }
      }


+  private static void validateReadOnly(String path) {
+    String value = normalize(path);
+    boolean allowed = ALLOWED_TOKENS.stream().anyMatch(value::contains);
+    if (!allowed) {
+      throw new BadRequestException(
+          "Only read-only search introspection paths are permitted: " + ALLOWED_TOKENS);
+    }


sonarqubecloud · 2026-06-06T06:39:38Z

Quality Gate passed for 'open-metadata-ingestion'

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

mohityadav766 added 2 commits June 2, 2026 16:30

Add External Java Playwright Tests

188a934

Add Handling

c4f8594

mohityadav766 self-assigned this Jun 2, 2026

Copilot AI review requested due to automatic review settings June 2, 2026 12:58

mohityadav766 requested review from akash-jain-10, harshach and tutte as code owners June 2, 2026 12:58

Copilot started reviewing on behalf of mohityadav766 June 2, 2026 12:58 View session

github-actions Bot added backend safe to test Add this label to run secure Github workflows on PRs labels Jun 2, 2026

Copilot AI reviewed Jun 2, 2026

View reviewed changes

gitar-bot Bot reviewed Jun 2, 2026

View reviewed changes

Merge branch 'main' into add-external-run

487e699

mohityadav766 temporarily deployed to test June 2, 2026 13:19 — with GitHub Actions Inactive

Do not run parallel

bac18f3

Copilot AI review requested due to automatic review settings June 3, 2026 04:22

Copilot started reviewing on behalf of mohityadav766 June 3, 2026 04:22 View session

Copilot AI reviewed Jun 3, 2026

View reviewed changes

mohityadav766 temporarily deployed to test June 3, 2026 04:34 — with GitHub Actions Inactive

mohityadav766 had a problem deploying to test June 5, 2026 05:34 — with GitHub Actions Error

Fix token refreshing

439cfe0

mohityadav766 had a problem deploying to test June 5, 2026 05:55 — with GitHub Actions Error

mohityadav766 temporarily deployed to test June 5, 2026 05:55 — with GitHub Actions Inactive

mohityadav766 had a problem deploying to test June 5, 2026 05:55 — with GitHub Actions Error

Refresh aware admin client

ac9c4d4

Copilot AI review requested due to automatic review settings June 5, 2026 07:26

Copilot started reviewing on behalf of mohityadav766 June 5, 2026 07:26 View session

Copilot AI reviewed Jun 5, 2026

View reviewed changes

mohityadav766 temporarily deployed to test June 5, 2026 07:37 — with GitHub Actions Inactive

mohityadav766 and others added 4 commits June 5, 2026 16:03

Add More Secenarios

32496f4

test(search): strip customer/internal references from search-issue te…

81fc7d3

…st docs Keep only public PR/issue numbers and generic symptoms in the new search-indexing regression tests. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Merge remote-tracking branch 'origin/add-external-run' into add-exter…

af4e384

…nal-run

Copilot AI reviewed Jun 5, 2026

View reviewed changes

mohityadav766 and others added 5 commits June 5, 2026 16:54

Merge remote-tracking branch 'origin/add-external-run' into add-exter…

2c9eb54

…nal-run

Merge remote-tracking branch 'origin/add-external-run' into add-exter…

40eb4b7

…nal-run

Increase wait time for search index

3bd59de

Copilot AI reviewed Jun 6, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add external run Tests#28637

Add external run Tests#28637
mohityadav766 wants to merge 22 commits into
mainfrom
add-external-run

mohityadav766 commented Jun 2, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

gitar-bot Bot Jun 2, 2026

Uh oh!

gitar-bot Bot Jun 2, 2026

Uh oh!

github-actions Bot commented Jun 2, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI left a comment

Uh oh!

gitar-bot Bot commented Jun 6, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

sonarqubecloud Bot commented Jun 6, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

		@Path("/v1/test-support/search")
		@Collection(name = "testSupportSearch")

		private static final List<String> ALLOWED_TOKENS =
		List.of("_count", "_search", "_alias", "_cat/indices");

		/** Status code + raw JSON body of a low-level passthrough request to the search engine. */
		record RawSearchResponse(int statusCode, String body) {}

Conversation

mohityadav766 commented Jun 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Search-indexing regression tests — issue coverage

A. Query-time mapping / field-type 500s

B. Indexing-time parse failures (docs dropped at sink)

C. Reindex job reliability (stuck / stale / lock / restart)

D. Reindex resource exhaustion (OOM / disk / payload)

E. Stale / missing data after reindex (correctness)

F. Performance / slowness

G. Pre-release / CI / internal

Describe your changes:

Type of change:

High-level design:

Tests:

Use cases covered

Unit tests

Backend integration tests

Ingestion integration tests

Playwright (UI) tests

Manual testing performed

UI screen recording / screenshots:

Checklist:

Summary by Gitar

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

gitar-bot Bot Jun 2, 2026

Choose a reason for hiding this comment

Uh oh!

gitar-bot Bot Jun 2, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot commented Jun 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🟡 Playwright Results — all passed (13 flaky)

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

gitar-bot Bot commented Jun 6, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

sonarqubecloud Bot commented Jun 6, 2026

Quality Gate passed for 'open-metadata-ingestion'

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

mohityadav766 commented Jun 2, 2026 •

edited

Loading

github-actions Bot commented Jun 2, 2026 •

edited

Loading