Document total_shards_per_node as a recipe for hotspots #61306
Comments
Pinging @elastic/es-docs (>docs)
Pinging @elastic/es-distributed (:Distributed/Allocation)
Another proposal (a temporary workaround until a permanent fix becomes available) to avoid hotspots when adding a new node is to:
From a Slack thread on this topic, let me throw this out as part of the recipe. File this under “Things that should be simple, but are not”. I think it ends up being:
So for example, if you have 14 primary shards, 1 replica set, 12 nodes, and want to allow 2 node failures:
which would round up to 3.
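The formula from this comment did not survive in the text above, so the following is only a sketch of one reading that happens to reproduce the stated result: divide the total number of shard copies by the number of nodes left after the tolerated failures, then round up. The function name and parameter names are hypothetical, and the formula itself is an assumption rather than the commenter's confirmed recipe.

```python
import math


def suggested_total_shards_per_node(primaries, replicas, nodes, tolerated_node_failures):
    """Assumed rule: ceil(total shard copies / nodes remaining after failures)."""
    # Every primary has `replicas` additional copies.
    total_shard_copies = primaries * (1 + replicas)
    # Nodes that must still be able to hold all shard copies.
    surviving_nodes = nodes - tolerated_node_failures
    if surviving_nodes <= 0:
        raise ValueError("tolerated_node_failures must be smaller than nodes")
    return math.ceil(total_shard_copies / surviving_nodes)


# The example from the comment: 14 primaries, 1 replica, 12 nodes, 2 node failures.
# 28 shard copies over 10 surviving nodes is 2.8, which rounds up to 3.
print(suggested_total_shards_per_node(14, 1, 12, 2))
```

Under this reading, a lower value would leave some shard copies with nowhere to go once two nodes are lost, which is why the result is rounded up rather than down.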
I tend to think of it a bit more simply, like "set this number to replica # + 1", as a getting-started rule.
`index.routing.allocation.total_shards_per_node` has become a very common recommendation for dealing with hotspots in a cluster that has suboptimal shard allocation (for example, shards of the most actively written indices accumulating on one or a few nodes in the hot tier, or a new warm node receiving more than its fair share of shards from high-shard-count indices). Many users have hit this (I just had another user hit this yesterday). Unfortunately, they tend to find out when their clusters are already performing poorly, with specific nodes getting hammered with requests. Until the two high hanging fruit issues are addressed, it would be a nice addition to our documentation to cover the common use case of `total_shards_per_node` for preventing hotspots in the cluster (while also indicating the tradeoff).

2 potential places to provide this guidance:
If we decide to make this doc improvement, we may want to do it around the time we enhance ILM to allow updating `total_shards_per_node` in ILM phases.
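For readers who want to see what applying the setting discussed in this issue might look like, here is a minimal sketch that builds (but does not send) an update-index-settings request. The cluster URL `http://localhost:9200`, the index name `my-index`, the limit of 3, and the helper name `build_settings_request` are all placeholders chosen for illustration, not values from this issue.

```python
import json
import urllib.request


def build_settings_request(base_url, index, limit):
    """Build a PUT /<index>/_settings request that caps shards per node for one index."""
    body = json.dumps(
        {"index.routing.allocation.total_shards_per_node": limit}
    ).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{index}/_settings",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Placeholder values; nothing is sent until the request is passed to urlopen().
req = build_settings_request("http://localhost:9200", "my-index", 3)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Passing `req` to `urllib.request.urlopen` would send it to a real cluster. The limit is a hard cap, so a value that is too low for the number of available nodes can leave shards unassigned, which is presumably the tradeoff any documentation would need to call out.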