feat(search): cluster fitness diagnostic — REST endpoint + ops CLI#28447

Open
mohityadav766 wants to merge 4 commits into main from sizing-rationale-empty-indices
Conversation

@mohityadav766
Member

Describe your changes:

Adds a search cluster fitness diagnostic so devops can quickly judge whether
the configured Elasticsearch/OpenSearch cluster is sized for the data
OpenMetadata is storing. Exposed two ways: `GET /v1/system/search/fitness`
(admin-only) and `openmetadata-ops search-fitness` (`--json` for raw output).

I built this because after an AWS-managed OpenSearch migration, sizing
issues (oversized shards, disk watermarks, undersized small instances being
bombarded by search/index traffic, fried reindex jobs) are hard to diagnose
in one shot — this consolidates the signals into a structured report with
recommendations.

Type of change:

  • New feature

High-level design:

A single analyzer (`SearchClusterFitnessAnalyzer`) probes the cluster via raw
REST GETs through a small `SearchRestProbe` helper that works against both
ES `Rest5Client` and OS `OpenSearchGenericClient`. All analysis walks Jackson
`JsonNode`, so engine-specific typed responses aren't needed and missing
fields on managed clusters (AWS-restricted endpoints) degrade gracefully —
captured in `inaccessibleMetrics` rather than failing the run.

Signals produced (each carries severity + observed + threshold + rationale + recommendation):
cluster status, pending tasks, unassigned shards (primary vs replica
breakdown with `/_cluster/allocation/explain` reason inlined), shard budget,
shards-per-heap-GB density, dedicated-master recommendation at scale,
per-OpenMetadata-index data footprint (size + avg doc bytes — depth, not just
count), oversized/over-sharded indices, disk low/high/flood watermarks per
node, JVM heap / CPU pressure, write/search thread-pool queue depth and
rejections, circuit-breaker trips. Sizing guidance applies the AWS
OpenSearch storage formula `source × (1 + replicas) × 1.45` with the +1
data-node buffer and the ≤25 shards/GB-heap rule baked into
`SearchClusterFitnessRules` constants (each documented with the AWS source).
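
To make the sizing arithmetic concrete, here is a minimal sketch of the storage formula and the shards-per-heap-GB check. The class, method, and constant names are hypothetical and are not taken from `SearchClusterFitnessRules`; the sketch also assumes 1 GB of heap means 2^30 bytes.

```java
/** Illustrative only: names are hypothetical, not the PR's actual constants or methods. */
final class SizingSketch {
  // AWS storage overhead multiplier quoted in the description.
  static final double STORAGE_OVERHEAD = 1.45;
  // Upper bound on shards per GB of JVM heap quoted in the description.
  static final int MAX_SHARDS_PER_HEAP_GB = 25;

  /** Minimum storage = source x (1 + replicas) x 1.45, rounded to whole bytes. */
  static long requiredStorageBytes(long sourceBytes, int replicas) {
    return Math.round(sourceBytes * (1 + replicas) * STORAGE_OVERHEAD);
  }

  /** Shard count divided by heap size in GB (assumed here to be 2^30 bytes). */
  static double shardsPerHeapGb(int shards, long heapBytes) {
    double heapGb = heapBytes / (double) (1L << 30);
    return shards / heapGb;
  }

  /** True when the density is strictly above the 25 shards/GB rule. */
  static boolean exceedsShardDensity(int shards, long heapBytes) {
    return shardsPerHeapGb(shards, heapBytes) > MAX_SHARDS_PER_HEAP_GB;
  }
}
```

Plugging in the figures reported under manual testing, 203 MB of primary data with one replica needs roughly 589 MB of storage, and 172 shards on a 1 GB heap is 172 shards per GB, far above the limit of 25.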

OpenMetadata-managed indices are identified by canonical name, prefix
(catches versioned indices like `*rebuild`), and alias intersection
from `/_cat/aliases` — so the report works against reindex-alias-swap state.
When zero OM indices match, the report self-diagnoses: lists the top-20
indices actually present on the cluster, emits an `openmetadata.indices_missing`
signal pointing to clusterAlias config, and reports
sizing as `INSUFFICIENT_DATA` instead of a fake recommendation. Verdict
rolls up as READY / STRAINED / OVERLOADED / UNKNOWN.
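
The three matching rules can be sketched as a single predicate. This is an assumption-laden illustration, not the analyzer's code: the class and method names are hypothetical, and treating a "prefix" match as `canonicalName + "_"` is a guess at how versioned indices are named.

```java
import java.util.Set;

/** Illustrative only: a hypothetical predicate combining the three matching rules. */
final class IndexMatchSketch {
  static boolean isOpenMetadataIndex(
      String index, Set<String> canonicalNames, Set<String> indicesWithOmAlias) {
    // Rule 1: exact canonical name.
    if (canonicalNames.contains(index)) {
      return true;
    }
    // Rule 2: prefix match for versioned indices (the "_" separator is an assumption).
    for (String canonical : canonicalNames) {
      if (index.startsWith(canonical + "_")) {
        return true;
      }
    }
    // Rule 3: the index carries an OpenMetadata alias (e.g. derived from /_cat/aliases).
    return indicesWithOmAlias.contains(index);
  }
}
```

An index that matches none of the three rules would fall into the "other indices on the cluster" bucket.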

Also bundles a one-line `conf/openmetadata.yaml` change to default
`clusterAlias` to `"openmetadata"` (matches the prefix actually used by the
docker-compose stack in this repo, and the prefix the fitness tool then
matches against by default).

Tests:

Use cases covered

  • Admin GETs `/v1/system/search/fitness` against a live test container and
    receives a structured report with signals, indices, sizing guidance.
  • Non-admin is rejected.
  • Empty-OM-indices case populates `otherIndicesOnCluster` and emits the
    `openmetadata.indices_missing` signal (verified manually against a fresh
    single-node OS 3.4 cluster).

Backend integration tests

  • Added `SearchClusterFitnessResourceIT` in `openmetadata-integration-tests`.

Manual testing performed

  1. Ran `openmetadata-ops search-fitness` against a single-node OS 3.4 dev
    cluster with 64 OpenMetadata indices, 245k docs, 203 MB primary data.
  2. Confirmed report flagged the real issues: 172 shards on 1 GB heap
    (~6.9× the AWS 25 shards/GB limit), 82.9% disk usage (~2% from low watermark),
    and correctly classified 164 unassigned shards as expected single-node
    replica state (INFO, not FAIL).
  3. Verified `--json` flag emits the full `SearchClusterFitnessReport` POJO.

UI screen recording / screenshots:

Not applicable.

Checklist:

  • I have read the CONTRIBUTING document.
  • I have commented on my code, particularly in hard-to-understand areas.
  • I have added tests (integration) and listed them above.

🤖 Generated with Claude Code

Adds GET /v1/system/search/fitness and an `openmetadata-ops search-fitness`
subcommand that produce a structured report on whether the configured
Elasticsearch/OpenSearch cluster is sized for the data OpenMetadata is
storing. Checks per-index data footprint (size + avg doc bytes), disk
watermarks, heap/CPU, thread-pool rejections, circuit breakers, shard
layout, and shard-density vs heap, with AWS OpenSearch sizing guidance
(1.45 storage overhead, ≤25 shards/GB heap, +1 data-node buffer,
dedicated-master at ≥10 data nodes) baked into the recommendations.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 26, 2026 19:36
@github-actions (bot) added the `backend` and `safe to test` labels on May 26, 2026
Comment thread conf/openmetadata.yaml Outdated
…agger

The class-level @Hidden already covers /v1/system/*, but adding @Hidden
explicitly on /search/fitness plus hidden=true on @Operation guarantees
the diagnostic stays out of any generated client SDK or Swagger UI even
if the class-level annotation is ever removed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Contributor

Copilot AI left a comment

Pull request overview

Adds a search cluster “fitness” diagnostic to help operators evaluate whether the configured Elasticsearch/OpenSearch cluster is appropriately sized for the current OpenMetadata search footprint, exposed via an admin-only REST endpoint and an openmetadata-ops CLI command.

Changes:

  • Introduces SearchClusterFitnessAnalyzer that probes ES/OS via raw REST GETs and emits a structured SearchClusterFitnessReport (signals, node/index footprints, sizing guidance).
  • Adds GET /v1/system/search/fitness (admin-only) and openmetadata-ops search-fitness (--json optional) for retrieving the report.
  • Updates default elasticsearch.clusterAlias in conf/openmetadata.yaml and adds an integration test for the endpoint.

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 6 comments.

Show a summary per file

| File | Description |
| --- | --- |
| openmetadata-service/src/main/java/org/openmetadata/service/util/OpenMetadataOperations.java | Adds search-fitness ops subcommand and report printing/JSON output. |
| openmetadata-service/src/main/java/org/openmetadata/service/search/fitness/SizingGuidance.java | New POJO for capacity guidance output. |
| openmetadata-service/src/main/java/org/openmetadata/service/search/fitness/SearchRestProbe.java | New helper to issue raw GETs to ES/OS and parse responses to JsonNode. |
| openmetadata-service/src/main/java/org/openmetadata/service/search/fitness/SearchClusterFitnessRules.java | Adds sizing/threshold constants (heap, shards, disk watermarks, etc.). |
| openmetadata-service/src/main/java/org/openmetadata/service/search/fitness/SearchClusterFitnessReport.java | New top-level report model for the diagnostic output. |
| openmetadata-service/src/main/java/org/openmetadata/service/search/fitness/SearchClusterFitnessAnalyzer.java | Core analyzer implementation that probes the cluster and generates signals + sizing guidance. |
| openmetadata-service/src/main/java/org/openmetadata/service/search/fitness/NodeFootprint.java | New per-node footprint model (heap/disk/CPU/threadpool/breakers). |
| openmetadata-service/src/main/java/org/openmetadata/service/search/fitness/IndexFootprint.java | New per-index footprint model (size/docs/shards/health). |
| openmetadata-service/src/main/java/org/openmetadata/service/search/fitness/FitnessVerdict.java | New enum for overall verdict (READY/STRAINED/OVERLOADED/UNKNOWN). |
| openmetadata-service/src/main/java/org/openmetadata/service/search/fitness/FitnessSignal.java | New model for individual fitness signals (severity/threshold/recommendation). |
| openmetadata-service/src/main/java/org/openmetadata/service/search/fitness/FitnessSeverity.java | New enum for signal severity. |
| openmetadata-service/src/main/java/org/openmetadata/service/resources/system/SystemResource.java | Adds admin-only REST endpoint /v1/system/search/fitness. |
| openmetadata-integration-tests/src/test/java/org/openmetadata/it/tests/SearchClusterFitnessResourceIT.java | Adds integration test for the new endpoint (admin happy path). |
| conf/openmetadata.yaml | Changes default elasticsearch.clusterAlias to "openmetadata". |

Comment on lines +406 to +411
private void checkClusterStatus(
List<FitnessSignal> signals, JsonNode health, List<String> inaccessible) {
if (health == null) {
inaccessible.add("/_cluster/health");
return;
}
Comment on lines +556 to +567
String reason = text(explain, "unassigned_info");
if (reason == null) {
JsonNode info = explain.get("unassigned_info");
if (info != null) {
String r = text(info, "reason");
String details = text(info, "details");
if (r != null) {
result = details == null ? r : r + " (" + details + ")";
}
}
} else {
result = reason;
Comment on lines +638 to +642
nodes.stream()
.map(NodeFootprint::getHeapMaxBytes)
.filter(java.util.Objects::nonNull)
.mapToLong(Long::longValue)
.sum();
Comment on lines +1162 to +1165
int dataNodes =
nodes.isEmpty()
? Math.max(1, report.getDataNodes() == null ? 1 : report.getDataNodes())
: nodes.size();
Comment on lines +73 to +76
assertThat(sizing.has("recommendedDataNodes")).isTrue();
assertThat(sizing.has("recommendedHeapPerNodeBytes")).isTrue();
assertThat(sizing.has("rationale")).isTrue();
assertThat(sizing.get("rationale").asText()).isNotBlank();
Comment on lines +39 to +51
@Test
void admin_can_fetch_fitness_report_with_signals_and_sizing() throws Exception {
final OpenMetadataClient client = SdkClients.adminClient();

final String body =
client
.getHttpClient()
.executeForString(
HttpMethod.GET,
"/v1/system/search/fitness",
null,
RequestOptions.builder().build());

- conf/openmetadata.yaml: revert clusterAlias default to "" — the change
  to "openmetadata" was a breaking change for any deployment upgrading
  without setting ELASTICSEARCH_CLUSTER_ALIAS (indices would silently
  appear missing). The fitness analyzer reads the configured alias
  via SearchRepository.getClusterAlias() and works against any value.

- analyze(): split into collectClusterSnapshot() / initReport() /
  runChecks() / finalizeReport() with a private ClusterSnapshot holder.
  Each method now fits the 15-line guideline.

- Lazy-fetch /_cat/shards and /_cluster/allocation/explain — only
  hit when unassigned_shards > 0. Avoids the guaranteed 400 from
  allocation/explain on healthy clusters and the multi-MB shard list
  on large clusters.

- extractAllocationExplainReason: drop the buggy asText("unassigned_info")
  call (asText() on an object node returns ""). Walk into the object
  directly via .isObject() check.

- Treat Jackson NullNode/MissingNode as inaccessible in checkClusterStatus
  and checkDedicatedMaster via new isUsable() helper. Probe returns
  NullNode on failure; previous null-only guard let degraded responses
  through.

- checkShardsPerHeapGb and buildSizingGuidance now filter to data-role
  nodes (data, data_*) before summing heap or counting nodes. Including
  master/coordinator heap understated shard density and over-estimated
  available heap per data node.

- SearchRestProbe.elasticGet: guard the Rest5Client cast with instanceof
  so a future ES client swap surfaces a clear LOG.debug instead of
  ClassCastException-swallowed-by-catch.

- SearchClusterFitnessResourceIT:
  - Loosen sizing assertions so INSUFFICIENT_DATA verdict (fresh cluster,
    no OM indices) does not fail the test on missing recommendedDataNodes
    / recommendedHeapPerNodeBytes — those are intentionally null in that
    case.
  - Add non_admin_cannot_fetch_fitness_report covering 401/403 rejection
    for a DataConsumer token.
  - Add unauthenticated_request_is_rejected covering the 401 path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
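
The effect of the data-role filter on shard density can be sketched as follows. This is a simplified stand-in, not the analyzer's code: the `Node` record and method names are hypothetical, and 1 GB of heap is assumed to be 2^30 bytes.

```java
import java.util.List;
import java.util.Objects;

/** Illustrative only: hypothetical types showing why non-data heap must be excluded. */
final class DataNodeHeapSketch {
  /** Minimal stand-in for a per-node footprint: roles plus max heap (nullable). */
  record Node(List<String> roles, Long heapMaxBytes) {}

  /** A node counts as a data node if it has the "data" role or any "data_*" tier role. */
  static boolean isDataNode(Node n) {
    return n.roles() != null
        && n.roles().stream().anyMatch(r -> r.equals("data") || r.startsWith("data_"));
  }

  /** Sums max heap over data nodes only, skipping nodes whose heap is unknown. */
  static long dataHeapBytes(List<Node> nodes) {
    return nodes.stream()
        .filter(DataNodeHeapSketch::isDataNode)
        .map(Node::heapMaxBytes)
        .filter(Objects::nonNull)
        .mapToLong(Long::longValue)
        .sum();
  }

  /** Shards per GB of data-node heap; NaN when no data-node heap is known. */
  static double shardsPerDataHeapGb(int shards, List<Node> nodes) {
    long heap = dataHeapBytes(nodes);
    if (heap <= 0) {
      return Double.NaN;
    }
    return shards / (heap / (double) (1L << 30));
  }
}
```

With one 1 GB data node and one 1 GB dedicated master holding 50 shards, this yields 50 shards per GB; summing both heaps would have reported 25 and hidden the pressure on the data node.
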
Copilot AI review requested due to automatic review settings May 26, 2026 19:49

private void checkPendingTasks(
List<FitnessSignal> signals, JsonNode health, List<String> inaccessible) {
if (health == null || !health.has("number_of_pending_tasks")) {

💡 Quality: Inconsistent null-guard: isUsable() vs == null across check methods

The new isUsable() helper correctly handles the NullNode returned by SearchRestProbe.get() on failure, and it's applied in checkClusterStatus, checkUnassignedShards, and checkDedicatedMaster. However, checkPendingTasks (line 492) still uses the old health == null guard. While this doesn't cause a runtime bug (NullNode.has() returns false, so the method exits early anyway), it means a failed /_cluster/health probe won't contribute to inaccessibleMetrics from checkPendingTasks — unlike the consistent handling in checkClusterStatus. Since both methods share the same health node from the snapshot, this is cosmetic, but using isUsable() uniformly would be clearer.

Replace health == null with isUsable(health) for consistency with other check methods:

if (!isUsable(health) || !health.has("number_of_pending_tasks")) {
  return;
}

Comment on lines +149 to +163
private int countDataNodes(List<NodeFootprint> nodes, SearchClusterFitnessReport report) {
long withDataRole =
nodes.stream()
.filter(n -> n.getRoles() != null)
.filter(
n -> n.getRoles().stream().anyMatch(r -> r.equals("data") || r.startsWith("data_")))
.count();
int result;
if (withDataRole > 0) {
result = (int) withDataRole;
} else if (report.getDataNodes() != null && report.getDataNodes() > 0) {
result = report.getDataNodes();
} else {
result = Math.max(1, nodes.size());
}

💡 Quality: Duplicate data-node filtering logic in countDataNodes and filterDataNodes

The lambda n.getRoles().stream().anyMatch(r -> r.equals("data") || r.startsWith("data_")) is duplicated between countDataNodes (line 154) and filterDataNodes (line 1405). Extracting a private static boolean isDataNode(NodeFootprint n) predicate would reduce duplication and make it easier to update the role-matching logic in one place.

Extract shared predicate to eliminate duplicated role-matching logic:

private static boolean isDataNode(NodeFootprint n) {
  return n.getRoles() != null
      && n.getRoles().stream().anyMatch(r -> r.equals("data") || r.startsWith("data_"));
}

private List<NodeFootprint> filterDataNodes(List<NodeFootprint> nodes) {
  return nodes.stream().filter(SearchClusterFitnessAnalyzer::isDataNode).toList();
}

private int countDataNodes(List<NodeFootprint> nodes, SearchClusterFitnessReport report) {
  long withDataRole = nodes.stream().filter(SearchClusterFitnessAnalyzer::isDataNode).count();
  // ... rest unchanged
}

@gitar-bot

gitar-bot Bot commented May 26, 2026

Code Review 👍 Approved with suggestions 5 resolved / 7 findings

Introduces a search cluster fitness diagnostic, exposed through both the REST API and the CLI, that identifies sizing and performance bottlenecks. The review found minor inconsistencies in null-guarding and duplicated data-node filtering logic that should be unified for maintainability.

💡 Quality: Inconsistent null-guard: isUsable() vs == null across check methods

📄 openmetadata-service/src/main/java/org/openmetadata/service/search/fitness/SearchClusterFitnessAnalyzer.java:492

The new isUsable() helper correctly handles the NullNode returned by SearchRestProbe.get() on failure, and it's applied in checkClusterStatus, checkUnassignedShards, and checkDedicatedMaster. However, checkPendingTasks (line 492) still uses the old health == null guard. While this doesn't cause a runtime bug (NullNode.has() returns false, so the method exits early anyway), it means a failed /_cluster/health probe won't contribute to inaccessibleMetrics from checkPendingTasks — unlike the consistent handling in checkClusterStatus. Since both methods share the same health node from the snapshot, this is cosmetic, but using isUsable() uniformly would be clearer.

Replace `health == null` with `isUsable(health)` for consistency with other check methods.
if (!isUsable(health) || !health.has("number_of_pending_tasks")) {
  return;
}
💡 Quality: Duplicate data-node filtering logic in countDataNodes and filterDataNodes

📄 openmetadata-service/src/main/java/org/openmetadata/service/search/fitness/SearchClusterFitnessAnalyzer.java:149-163 📄 openmetadata-service/src/main/java/org/openmetadata/service/search/fitness/SearchClusterFitnessAnalyzer.java:1402-1406

The lambda n.getRoles().stream().anyMatch(r -> r.equals("data") || r.startsWith("data_")) is duplicated between countDataNodes (line 154) and filterDataNodes (line 1405). Extracting a private static boolean isDataNode(NodeFootprint n) predicate would reduce duplication and make it easier to update the role-matching logic in one place.

Extract shared predicate to eliminate duplicated role-matching logic.
private static boolean isDataNode(NodeFootprint n) {
  return n.getRoles() != null
      && n.getRoles().stream().anyMatch(r -> r.equals("data") || r.startsWith("data_"));
}

private List<NodeFootprint> filterDataNodes(List<NodeFootprint> nodes) {
  return nodes.stream().filter(SearchClusterFitnessAnalyzer::isDataNode).toList();
}

private int countDataNodes(List<NodeFootprint> nodes, SearchClusterFitnessReport report) {
  long withDataRole = nodes.stream().filter(SearchClusterFitnessAnalyzer::isDataNode).count();
  // ... rest unchanged
}
✅ 5 resolved
Bug: Default clusterAlias change breaks existing deployments

📄 conf/openmetadata.yaml:486
Changing clusterAlias default from "" to "openmetadata" in conf/openmetadata.yaml is a breaking change. IndexMapping.getIndexName() prepends clusterAlias + "_" when the alias is non-empty. Existing deployments that never set ELASTICSEARCH_CLUSTER_ALIAS have indices named like table_search_index. After this change, OpenMetadata will look for openmetadata_table_search_index — indices won't be found, searches will fail, and reindex will create duplicate indices under the new prefix.

This affects any deployment upgrading that doesn't explicitly set the env var. The PR description says this matches the docker-compose stack, but it does NOT match the previous default behavior for all other deployment methods.
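
The naming rule behind this finding can be sketched in a few lines. The class and method names are hypothetical; only the "prepend `clusterAlias + "_"` when the alias is non-empty" behavior comes from the finding itself.

```java
/** Illustrative only: a hypothetical stand-in for the alias-prefixing rule described above. */
final class IndexNameSketch {
  /** Prepends clusterAlias + "_" when the alias is non-empty; otherwise returns the bare name. */
  static String resolveIndexName(String clusterAlias, String indexName) {
    if (clusterAlias == null || clusterAlias.isEmpty()) {
      return indexName;
    }
    return clusterAlias + "_" + indexName;
  }
}
```

With the old default of `""` the resolved name stays `table_search_index`, while the new default would resolve to `openmetadata_table_search_index`, which does not exist on an upgraded deployment.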

Bug: /_cluster/allocation/explain called unconditionally on every run

📄 openmetadata-service/src/main/java/org/openmetadata/service/search/fitness/SearchClusterFitnessAnalyzer.java:63 📄 openmetadata-service/src/main/java/org/openmetadata/service/search/fitness/SearchClusterFitnessAnalyzer.java:96-97
At line 63, probe.get("/_cluster/allocation/explain") is called eagerly during every fitness check. When there are no unassigned shards (the healthy case), Elasticsearch/OpenSearch returns HTTP 400 with {"error": "unable to find any unassigned shards to explain"}. While SearchRestProbe.get() catches the exception and returns NullNode, this means:

  1. Every healthy-cluster fitness check logs a DEBUG-level exception (noisy in production)
  2. The 400 response may be counted by monitoring/APM tools as an error
  3. It wastes a round-trip

The allocation explain should only be called when unassigned_shards > 0.
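
One way to express this guard is a memoizing supplier that is only dereferenced when unassigned shards exist. This is a sketch under stated assumptions: the names are hypothetical, and the supplier stands in for the real probe call, which the sketch does not reproduce.

```java
import java.util.function.Supplier;

/** Illustrative only: hypothetical helpers showing a guarded, at-most-once probe. */
final class LazyProbeSketch {
  /** Wraps a supplier so the delegate runs at most once, on the first get(). */
  static <T> Supplier<T> lazy(Supplier<T> delegate) {
    return new Supplier<T>() {
      private boolean loaded;
      private T value;

      @Override
      public synchronized T get() {
        if (!loaded) {
          value = delegate.get();
          loaded = true;
        }
        return value;
      }
    };
  }

  /** Calls the (possibly expensive) explain probe only when there is something to explain. */
  static String explainIfNeeded(int unassignedShards, Supplier<String> allocationExplain) {
    return unassignedShards > 0 ? allocationExplain.get() : null;
  }
}
```

On a healthy cluster the supplier is never invoked, so no request is sent and no 400 response is produced.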

Quality: SearchClusterFitnessAnalyzer.analyze() exceeds method length guidelines

📄 openmetadata-service/src/main/java/org/openmetadata/service/search/fitness/SearchClusterFitnessAnalyzer.java:52-66
The analyze() method (lines 52-115) is ~63 lines and orchestrates many concerns. Per project coding standards, methods should be max 15 lines. While complex orchestrators sometimes need more room, this could be broken into smaller steps (e.g., collectClusterData(), runChecks(), buildReport()) for readability and testability.

Performance: All cluster endpoints fetched eagerly even when most data is unused

📄 openmetadata-service/src/main/java/org/openmetadata/service/search/fitness/SearchClusterFitnessAnalyzer.java:56-65
Lines 56-65 issue 9 HTTP requests to the search cluster sequentially before any analysis begins. Several may not be needed (e.g., _cat/shards is only used when unassigned > 0, _cluster/settings only for watermark/shard-budget checks). For large clusters, _cat/shards?format=json can return megabytes of data. Consider lazy-fetching endpoints only when the preceding check indicates they're needed.

Edge Case: Elasticsearch low-level client cast may fail with newer ES versions

📄 openmetadata-service/src/main/java/org/openmetadata/service/search/fitness/SearchRestProbe.java:67-69
In SearchRestProbe.elasticGet() (line 68-69), the code casts searchClient.getLowLevelClient() to Rest5Client. If the client implementation changes or if getLowLevelClient() returns a different type, this will throw a ClassCastException at runtime. The catch in get() will handle it, but the endpoint will silently be marked as inaccessible with no clear error message indicating a code incompatibility rather than a network issue.

Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 13 out of 13 changed files in this pull request and generated 5 comments.

Comment on lines +985 to +993
} else if (used
>= lowWatermark * (1 - SearchClusterFitnessRules.WATERMARK_PROXIMITY_FRACTION)) {
severity = FitnessSeverity.WARN;
recommendation =
"Approaching low watermark. Provision more disk during the next maintenance window.";
} else if (used >= SearchClusterFitnessRules.DISK_USAGE_FAIL_PERCENT) {
severity = FitnessSeverity.FAIL;
} else if (used >= SearchClusterFitnessRules.DISK_USAGE_WARN_PERCENT) {
severity = FitnessSeverity.WARN;
}
Set<String> canonicalNames = openMetadataCanonicalNames(clusterAlias);
Set<String> openMetadataAliases = openMetadataAliases(clusterAlias);
Set<String> indicesWithOmAlias = indicesCarryingOmAlias(catAliases, openMetadataAliases);
List<FitnessSignal> signals,
List<NodeFootprint> nodes,
JsonNode clusterSettings,
List<String> inaccessible) {
s.rootInfo = probe.get("/");
s.clusterHealth = probe.get("/_cluster/health");
s.clusterStats = probe.get("/_cluster/stats");
s.nodesStats = probe.get("/_nodes/stats");
Comment on lines +249 to +255
@Command(
name = "search-fitness",
description =
"Diagnose whether the configured Elasticsearch/OpenSearch cluster is sized for the "
+ "current OpenMetadata data footprint. Reports per-index size + avg doc bytes, "
+ "disk watermarks, heap/CPU, thread-pool rejections, circuit breakers, shard "
+ "layout, and capacity recommendations.")
@github-actions
Contributor

🟡 Playwright Results — all passed (11 flaky)

✅ 4251 passed · ❌ 0 failed · 🟡 11 flaky · ⏭️ 88 skipped

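| Shard | Passed | Failed | Flaky | Skipped |
| --- | --- | --- | --- | --- |
| 🟡 Shard 1 | 298 | 0 | 1 | 4 |
| ✅ Shard 2 | 803 | 0 | 0 | 9 |
| 🟡 Shard 3 | 801 | 0 | 2 | 8 |
| 🟡 Shard 4 | 841 | 0 | 4 | 12 |
| 🟡 Shard 5 | 718 | 0 | 1 | 47 |
| 🟡 Shard 6 | 790 | 0 | 3 | 8 |
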
🟡 11 flaky test(s) (passed on retry)
  • Features/CustomizeDetailPage.spec.ts › Glossary Term - customization should work (shard 1, 1 retry)
  • Features/KnowledgeCenter.spec.ts › Article mentions in description should working for Knowledge Center (shard 3, 1 retry)
  • Features/RTL.spec.ts › Verify Following widget functionality (shard 3, 1 retry)
  • Pages/CustomProperties.spec.ts › Entity Reference (shard 4, 1 retry)
  • Pages/CustomProperties.spec.ts › Should display custom properties for apiCollection in right panel (shard 4, 1 retry)
  • Pages/CustomProperties.spec.ts › Date (shard 4, 1 retry)
  • Pages/DataContracts.spec.ts › Create Data Contract and validate for Dashboard (shard 4, 1 retry)
  • Pages/ExplorePageRightPanel_KnowledgeCenter.spec.ts › Should remove user owner for knowledgeCenter (shard 5, 1 retry)
  • Pages/Lineage/DataAssetLineage.spec.ts › Column lineage for mlModel -> topic (shard 6, 1 retry)
  • Pages/Lineage/LineageFilters.spec.ts › Verify lineage schema filter selection (shard 6, 1 retry)
  • Pages/ServiceEntity.spec.ts › User as Owner Add, Update and Remove (shard 6, 1 retry)

📦 Download artifacts

How to debug locally
# Download playwright-test-results-<shard> artifact and unzip
npx playwright show-trace path/to/trace.zip    # view trace
