Auto-expand-replicas causes problems in a small cluster rolling restart #95104

Open
DaveCTurner opened this issue Apr 8, 2023 · 3 comments
Labels
>bug · :Distributed/Allocation (All issues relating to the decision making around placing a shard, both master logic & on the nodes) · Team:Distributed (Meta label for distributed team)

Comments

@DaveCTurner
Contributor

Many indices (particularly system indices) use auto_expand_replicas: 0-1 to avoid unassigned shards in a one-node cluster. However, when restarting a node in a cluster with just two nodes in a data tier this setting causes undesirable behaviour:

  • the cluster may remain in green health even though, in effect, it has unassigned shards and is running with less resilience than intended

  • replicas on the restarting node are effectively destroyed (at least, their retention leases are dropped) which forces a full file-based recovery
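The first point follows from how the replica count is derived. The sketch below is a simplified model, not Elasticsearch's actual implementation (which also takes allocation rules into account): it assumes the target replica count is the number of eligible data nodes minus one, clamped to the configured range. The function name `auto_expanded_replicas` is hypothetical.

```python
def auto_expanded_replicas(min_replicas: int, max_replicas: int, eligible_nodes: int) -> int:
    """Simplified model: one shard copy per eligible node, clamped to [min, max]."""
    wanted = eligible_nodes - 1  # one copy is the primary, the rest are replicas
    return max(min_replicas, min(max_replicas, wanted))


# auto_expand_replicas: 0-1 in a tier with two data nodes
both_up = auto_expanded_replicas(0, 1, eligible_nodes=2)
one_restarting = auto_expanded_replicas(0, 1, eligible_nodes=1)
print(both_up, one_restarting)
```

Under this model the index asks for zero replicas while one node is down, so no shard counts as unassigned and health can stay green even though only one copy of the data remains.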

There's room for improvement here. Could we use the node-shutdown or desired-nodes features to report accurate health, and avoid file-based recovery, for auto-expand replicas indices while a node is restarting?


Workaround

If you have a small cluster but never intend to shrink its hot/warm/content tiers to a single node, set number_of_replicas: 1 instead of using auto_expand_replicas.
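As a minimal sketch of the workaround, the snippet below builds the body for an update-index-settings request that turns auto-expansion off and pins the replica count. The index name `my-index` and the helper `fixed_replica_settings` are placeholders for illustration; whether a particular system index allows these settings to be changed is not covered here.

```python
import json


def fixed_replica_settings(replicas: int = 1) -> dict:
    """Settings body that disables auto-expansion and fixes the replica count."""
    return {
        "index": {
            "auto_expand_replicas": False,  # serialised as JSON false
            "number_of_replicas": replicas,
        }
    }


body = fixed_replica_settings()
# Placeholder index name; send with any HTTP client or the Dev Tools console.
print("PUT /my-index/_settings")
print(json.dumps(body, indent=2))
```

With a fixed replica count, the replica is reported as unassigned while its node restarts, so health reflects the reduced resilience.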

@DaveCTurner DaveCTurner added >bug :Distributed/Allocation All issues relating to the decision making around placing a shard (both master logic & on the nodes) labels Apr 8, 2023
@elasticsearchmachine elasticsearchmachine added the Team:Distributed Meta label for distributed team label Apr 8, 2023
@elasticsearchmachine
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

