Add new Shards Capacity Health Indicator #94552

Merged: 29 commits merged into elastic:main on Mar 24, 2023

@HiDAl HiDAl commented Mar 20, 2023

Introduces a new Health Indicator to check the cluster's health from the shards' capacity perspective.

It calculates the remaining room for new shards in the data and frozen node groups and reports a status according to the following rules:

if the data or frozen nodes have room for fewer than 5 more shards -> RED
if the data or frozen nodes have room for fewer than 10 more shards -> YELLOW
otherwise -> GREEN
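
For illustration, the rules above can be sketched in a few lines of Python. This is not the Java code from this PR; it assumes that "room" means the configured maximum minus the shards currently in use, and that the overall status is the worse of the two groups. The names `Status`, `group_status` and `shards_capacity_status` are hypothetical.

```python
from enum import IntEnum


class Status(IntEnum):
    """Health statuses ordered so that a larger value is worse."""
    GREEN = 0
    YELLOW = 1
    RED = 2


# Thresholds taken from the rules in the PR description.
RED_THRESHOLD = 5
YELLOW_THRESHOLD = 10


def group_status(max_shards: int, used_shards: int) -> Status:
    """Classify one node group (data or frozen) by its remaining shard room."""
    # Assumption: room is the configured limit minus the shards in use.
    room = max_shards - used_shards
    if room < RED_THRESHOLD:
        return Status.RED
    if room < YELLOW_THRESHOLD:
        return Status.YELLOW
    return Status.GREEN


def shards_capacity_status(data: tuple[int, int], frozen: tuple[int, int]) -> Status:
    """Assumption: the overall status is the worse of the two group statuses."""
    return max(group_status(*data), group_status(*frozen))
```

With the numbers from the sample output in this description, `shards_capacity_status((14, 10), (10, 0))` evaluates to `Status.RED`: the data group has room for 4 more shards, the frozen group for 10.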

This is the output in case the cluster is unhealthy:

GET _health_report/shards_capacity
{
  "cluster_name": "runTask",
  "indicators": {
    "shards_capacity": {
      "status": "red",
      "symptom": "Cluster is close to reaching the configured maximum number of shards for data nodes.",
      "details": {
        "data": {
          "max_shards_in_cluster": 14,
          "current_used_shards": 10
        },
        "frozen": {
          "max_shards_in_cluster": 10,
          "current_used_shards": 0
        }
      },
      "impacts": [
        {
          "id": "elasticsearch:health:shards_capacity:impact:upgrade_blocked",
          "severity": 1,
          "description": "The cluster has too many used shards to be able to upgrade.",
          "impact_areas": [
            "deployment_management"
          ]
        },
        {
          "id": "elasticsearch:health:shards_capacity:impact:creation_of_new_indices_blocked",
          "severity": 1,
          "description": "The cluster is running low on room to add new shards. Adding data to new indices is at risk",
          "impact_areas": [
            "ingest"
          ]
        }
      ],
      "diagnosis": [
        {
          "id": "elasticsearch:health:shards_capacity:diagnosis:increase_max_shards_per_node",
          "cause": "Elasticsearch is about to reach the maximum number of shards it can host, based on your current settings.",
          "action": "Increase the value of [cluster.max_shards_per_node] cluster setting or remove data indices to clear up resources.",
          "help_url": "https://ela.st/fix-shards-capacity"
        }
      ]
    }
  }
}
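
A client consuming this response might reduce it to the status, the remaining room per group, and the suggested actions. The sketch below works on a trimmed copy of the sample above and makes no HTTP call; `summarize_shards_capacity` is a hypothetical helper, and computing room as `max_shards_in_cluster - current_used_shards` is an assumption based on the field names.

```python
import json

# Trimmed copy of the sample response shown above.
SAMPLE = """
{
  "cluster_name": "runTask",
  "indicators": {
    "shards_capacity": {
      "status": "red",
      "details": {
        "data": {"max_shards_in_cluster": 14, "current_used_shards": 10},
        "frozen": {"max_shards_in_cluster": 10, "current_used_shards": 0}
      },
      "diagnosis": [
        {
          "action": "Increase the value of [cluster.max_shards_per_node] cluster setting or remove data indices to clear up resources.",
          "help_url": "https://ela.st/fix-shards-capacity"
        }
      ]
    }
  }
}
"""


def summarize_shards_capacity(report: dict) -> dict:
    """Extract status, remaining room per group, and suggested actions."""
    indicator = report["indicators"]["shards_capacity"]
    # Assumption: remaining room is the limit minus the shards in use.
    room = {
        group: info["max_shards_in_cluster"] - info["current_used_shards"]
        for group, info in indicator.get("details", {}).items()
    }
    actions = [d["action"] for d in indicator.get("diagnosis", [])]
    return {"status": indicator["status"], "room": room, "actions": actions}


summary = summarize_shards_capacity(json.loads(SAMPLE))
print(summary["status"], summary["room"])  # prints: red {'data': 4, 'frozen': 10}
```

The `.get(..., {})` and `.get(..., [])` defaults are there because this sketch does not assume that `details` and `diagnosis` are present in every response.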

Relates to #94079 and #91119

@elasticsearchmachine added the needs:triage and v8.8.0 labels Mar 20, 2023
@HiDAl added the Team:Data Management, :Data Management/Health and >feature labels and removed the needs:triage label Mar 20, 2023
@elasticsearchmachine
Collaborator

Pinging @elastic/es-data-management (Team:Data Management)

@elasticsearchmachine
Collaborator

Hi @HiDAl, I've created a changelog YAML for you.

@HiDAl HiDAl marked this pull request as draft March 20, 2023 14:13
@HiDAl HiDAl marked this pull request as ready for review March 21, 2023 17:43
@HiDAl HiDAl requested a review from andreidan March 21, 2023 17:43
@HiDAl
Contributor Author

HiDAl commented Mar 21, 2023

@elasticsearchmachine run elasticsearch-ci/part-3

Contributor

@andreidan andreidan left a comment

Thanks for working on this, Pablo.

This generally looks great; I left a few rather minor comments.

@tylerperk can you please go through the copy (as most of the output of the API is presented in the UI)?
@shubhaat would you like to go through the copy?

@andreidan
Contributor

add tests to public methods
remove method which could lead to confusions
this makes the method generic enough, so can easily test the internal logic
@HiDAl HiDAl changed the title Add new ShardLimits Health Indicator Service Add new Shards Capacity Health Indicator Mar 23, 2023
@HiDAl
Contributor Author

HiDAl commented Mar 23, 2023

@andreidan I renamed the indicator to ShardsCapacity.

  1. I didn't rename the class ShardLimitsValidator because more than 30 files use it, so this PR would easily become a mess. I'll rename it in a separate PR.
  2. I didn't rename the record ShardLimitsMetadata because it actually contains the configured limits.

@HiDAl
Contributor Author

HiDAl commented Mar 23, 2023

@elasticsearchmachine run elasticsearch-ci/part-3

@andreidan
Contributor

@HiDAl the ShardLimitsValidator and ShardLimitsMetadata can stay named as they are IMO (they're not user-facing and are extensively documented).

Can you please update the PR description to reflect the latest state?

Contributor

@andreidan andreidan left a comment

Thanks for iterating on this Pablo. This LGTM 🚀 - left a few very minor suggestions

@HiDAl
Contributor Author

HiDAl commented Mar 24, 2023

@andreidan I've applied all the recommended changes :)

@HiDAl
Contributor Author

HiDAl commented Mar 24, 2023

@elasticmachine update branch

@HiDAl
Contributor Author

HiDAl commented Mar 24, 2023

@elasticsearchmachine run elasticsearch-ci/part-1

@HiDAl HiDAl merged commit 5c353b0 into elastic:main Mar 24, 2023
@HiDAl HiDAl deleted the new-SL-indicator branch March 24, 2023 14:05
@HiDAl HiDAl added the cloud-deploy Publish cloud docker image for Cloud-First-Testing label Mar 24, 2023
saarikabhasi pushed a commit to saarikabhasi/elasticsearch that referenced this pull request Apr 10, 2023
Introduces a new Health Indicator to check the cluster's health from the shards' capacity perspective.

It calculates the amount of available room for data and frozen groups, according to the following rules:

```
if data or frozen nodes have less than 5 shards -> RED
if data or frozen nodes have less than 10 shards -> YELLOW
otherwise -> GREEN
```
HiDAl added a commit to HiDAl/elasticsearch that referenced this pull request Apr 12, 2023
elastic#94552 introduced a new Health Service to check the shards
capacity of the cluster, which will replace this Deprecation Check.
HiDAl added a commit to HiDAl/elasticsearch that referenced this pull request Apr 12, 2023
elastic#94552 introduced a new Health Service which checks the shards
capacity of the cluster. This method replaces the old
`ClusterDeprecationChecks#checkShard` used to validate the feasibility
of upgrading a cluster.
HiDAl added a commit that referenced this pull request Jun 27, 2023
#94552 introduced a new Health Service which checks the shards
capacity of the cluster. This method replaces the old
`ClusterDeprecationChecks#checkShard` used to validate the feasibility
of upgrading a cluster.
Labels
cloud-deploy, :Data Management/Health, >feature, Team:Data Management, v8.8.0
5 participants