Commit
Fix disk indicator symptoms, impacts and diagnosis (#90262)
This PR fixes the following:

**Non-data, non-master calculation**

_Bug_: A non-data, non-master node was defined as a node that has at least one role that is not master or data.

_Fix_: A non-data, non-master node is a node that does not contain data and is not a master.

**Dedicated master calculation**

_Bug_: A dedicated master node, in the context of the disk health indicator, was defined as a node that has at least the master role.

_Fix_: A dedicated master node, in the context of the disk health indicator, is a node that has the master role and cannot contain data.

**Impact implementation**

_Bug_: We list at most 3 impacts: one based on blocked indices or indices on unhealthy data nodes, one for master nodes, and one for the rest. The previous implementation did not cover this in all cases. For example, if there was a data node and no dedicated master node, it would not display the master node impact.

_Fix_: Base the calculation of the master and other-role impacts on the roles of the affected nodes.

**Diagnosis implementation**

_Bug_: The code was correct, but the tests needed adjusting: they expected 1 extra diagnosis because a master node was identified as both a dedicated master node and a dedicated non-master, non-data node.

_Fix_: Fixed the tests to respect the hierarchy: data node > dedicated master node > other.

**Symptom implementation**

_Bug_: Two messages were produced as symptoms: one for the case where we have blocked indices but all the other nodes are healthy, and, for all other cases, one that lists the roles that are running out of space.

_Fix_: We change them to a single message that may have two parts:

- Always mention whether there are blocked indices and explain why, which is one of the following two cases: (i) the cluster is recovering **(this includes the case of all nodes appearing healthy because the cluster is moving shards away from the out-of-space nodes)**, or (ii) there are unhealthy data nodes that cannot recover without user intervention.
- Always mention all the affected roles as a symptom.

_Examples_

> 3 indices are not allowed to be updated because the cluster was running out of disk space. The cluster is recovering and you should be able to update them within a few minutes. Furthermore 3 more nodes with roles: [master, ingest] are out of disk or running low on disk space.

> 3 indices are not allowed to be updated because 2 nodes are out of disk or running low on disk space. Furthermore 3 nodes with roles: [master, ingest] are out of disk or running low on disk space.

> 5 nodes with roles: [data, master, ingest] are out of disk or running low on disk space.

**Bonus**

- We enrich the impact about "other" nodes with the roles.
- We enrich the affected indices by listing up to 10 indicative indices, as we do with the shards availability indicator.
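The corrected definitions and the hierarchy "data node > dedicated master node > other" can be illustrated with a minimal sketch. This is not the code from the PR: the `Role` and `Category` enums and the `classify` method are hypothetical names introduced here, and all data tiers are collapsed into a single `DATA` role for brevity.

```java
import java.util.EnumSet;
import java.util.Set;

// Illustrative sketch only; names are hypothetical and not taken from Elasticsearch.
final class DiskHealthRoleSketch {

    // Assumption: a reduced role set, with one DATA role standing in for every data tier.
    enum Role { DATA, MASTER, INGEST, ML }

    enum Category { DATA_NODE, DEDICATED_MASTER, OTHER }

    // Each node falls into exactly one category, checked in hierarchy order.
    static Category classify(Set<Role> roles) {
        if (roles.contains(Role.DATA)) {
            return Category.DATA_NODE;          // can contain data, whatever else it does
        }
        if (roles.contains(Role.MASTER)) {
            return Category.DEDICATED_MASTER;   // master role and cannot contain data
        }
        return Category.OTHER;                  // neither data nor master
    }

    public static void main(String[] args) {
        System.out.println(classify(EnumSet.of(Role.DATA, Role.MASTER)));   // DATA_NODE
        System.out.println(classify(EnumSet.of(Role.MASTER, Role.INGEST))); // DEDICATED_MASTER
        System.out.println(classify(EnumSet.of(Role.INGEST, Role.ML)));     // OTHER
    }
}
```

Because the checks run in order and return early, a node with both the data and master roles is counted once, as a data node. This is the property the adjusted tests rely on.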
Showing 5 changed files with 315 additions and 251 deletions.
@@ -0,0 +1,5 @@
pr: 90262
summary: Fix disk indicator impacts and diagnosis
area: Health
type: bug
issues: []
55 changes: 55 additions & 0 deletions
server/src/main/java/org/elasticsearch/health/node/HealthIndicatorDisplayValues.java
@@ -0,0 +1,55 @@
/*
 * Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
 * or more contributor license agreements. Licensed under the Elastic License
 * 2.0 and the Server Side Public License, v 1; you may not use this file except
 * in compliance with, at your election, the Elastic License 2.0 or the Server
 * Side Public License, v 1.
 */

package org.elasticsearch.health.node;

import org.elasticsearch.cluster.metadata.IndexMetadata;
import org.elasticsearch.cluster.metadata.Metadata;
import org.elasticsearch.cluster.node.DiscoveryNode;

import java.util.Comparator;
import java.util.Locale;
import java.util.Set;

import static java.util.stream.Collectors.joining;

public class HealthIndicatorDisplayValues {

    public static String getNodeName(DiscoveryNode node) {
        if (node.getName() != null) {
            return String.format(Locale.ROOT, "[%s][%s]", node.getId(), node.getName());
        }
        return String.format(Locale.ROOT, "[%s]", node.getId());
    }

    public static String getTruncatedIndices(Set<String> indices, Metadata clusterMetadata) {
        final int maxIndices = 10;
        String truncatedIndicesString = indices.stream()
            .sorted(indicesComparatorByPriorityAndName(clusterMetadata))
            .limit(maxIndices)
            .collect(joining(", "));
        if (maxIndices < indices.size()) {
            truncatedIndicesString = truncatedIndicesString + ", ...";
        }
        return truncatedIndicesString;
    }

    /**
     * Sorts index names by their priority first, then alphabetically by name. If the priority cannot be determined for an index then
     * a priority of -1 is used to sort it behind other index names.
     * @param clusterMetadata Used to look up index priority.
     * @return Comparator instance
     */
    public static Comparator<String> indicesComparatorByPriorityAndName(Metadata clusterMetadata) {
        // We want to show indices with a numerically higher index.priority first (since lower priority ones might get truncated):
        return Comparator.comparingInt((String indexName) -> {
            IndexMetadata indexMetadata = clusterMetadata.index(indexName);
            return indexMetadata == null ? -1 : indexMetadata.priority();
        }).reversed().thenComparing(Comparator.naturalOrder());
    }
}
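To see how the ordering and truncation in `getTruncatedIndices` behave, here is a standalone sketch that can run without an Elasticsearch cluster. It is an approximation under stated assumptions: a plain `Map<String, Integer>` stands in for the `Metadata` priority lookup, the limit is a parameter rather than the fixed 10, and the class and method names (`TruncatedIndicesSketch`, `byPriorityThenName`, `truncate`) are hypothetical.

```java
import java.util.Comparator;
import java.util.Map;
import java.util.Set;

import static java.util.stream.Collectors.joining;

// Illustrative sketch only; a Map replaces the Metadata lookup used in the real class.
final class TruncatedIndicesSketch {

    // Higher priority first; unknown indices get -1 so they sort last; ties break by name.
    static Comparator<String> byPriorityThenName(Map<String, Integer> priorities) {
        return Comparator.comparingInt((String name) -> priorities.getOrDefault(name, -1))
            .reversed()
            .thenComparing(Comparator.naturalOrder());
    }

    // Joins at most maxIndices names and appends ", ..." when some were left out.
    static String truncate(Set<String> indices, Map<String, Integer> priorities, int maxIndices) {
        String joined = indices.stream()
            .sorted(byPriorityThenName(priorities))
            .limit(maxIndices)
            .collect(joining(", "));
        return indices.size() > maxIndices ? joined + ", ..." : joined;
    }

    public static void main(String[] args) {
        Set<String> indices = Set.of("logs-b", "logs-a", "metrics", "unknown");
        Map<String, Integer> priorities = Map.of("logs-a", 1, "logs-b", 1, "metrics", 5);
        System.out.println(truncate(indices, priorities, 3));  // metrics, logs-a, logs-b, ...
        System.out.println(truncate(indices, priorities, 10)); // metrics, logs-a, logs-b, unknown
    }
}
```

Sorting before applying the limit is what guarantees that the indices dropped by truncation are the lowest-priority ones.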