Document total_shards_per_node as a recipe for hotspots #61306

Closed
ppf2 opened this issue Aug 19, 2020 · 5 comments · Fixed by #61942
Labels: :Distributed/Allocation, >docs, >enhancement, Team:Distributed, Team:Docs

Comments

@ppf2
Member

ppf2 commented Aug 19, 2020

index.routing.allocation.total_shards_per_node has become a very common recommendation for dealing with hotspots in a cluster that has suboptimal shard allocation. For example, shards of the most actively written indices accumulating on one or a few nodes in the hot tier, a new warm node receiving more than its fair share of shards from high-shard-count indices, etc.

Many users have hit this (I just had another user hit it yesterday). Unfortunately, they tend to find out when their clusters are already performing poorly, with specific nodes getting hammered with requests. Until the two high-hanging-fruit issues are addressed in the future, it would be a nice addition to our documentation to describe the common use of total_shards_per_node to prevent hotspots in the cluster (certainly also indicating the tradeoff).
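For reference, total_shards_per_node is a dynamic index setting, so it can be applied to an existing index. A minimal sketch (the index name my-index and the localhost:9200 cluster address are hypothetical; shown here via the Python requests library against the index settings API):

```python
# Minimal sketch: cap how many shards of one index a single node may hold.
# Index name and cluster address are hypothetical.
import requests

resp = requests.put(
    "http://localhost:9200/my-index/_settings",
    json={"index.routing.allocation.total_shards_per_node": 2},
)
print(resp.json())
```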

2 potential places to provide this guidance:

If we decide to make this doc improvement, we may want to do this around the time we enhance ILM to allow updating of total_shards_per_node in ILM phases.

@ppf2 ppf2 added >enhancement >docs General docs changes :Distributed/Allocation All issues relating to the decision making around placing a shard (both master logic & on the nodes) needs:triage Requires assignment of a team area label labels Aug 19, 2020
@elasticmachine
Collaborator

Pinging @elastic/es-docs (>docs)

@elasticmachine
Collaborator

Pinging @elastic/es-distributed (:Distributed/Allocation)

@elasticmachine elasticmachine added Team:Distributed Meta label for distributed team Team:Docs Meta label for docs team labels Aug 19, 2020
@Leaf-Lin
Contributor

Another proposal (a temporary workaround until a permanent fix becomes available) to avoid hotspots when adding a new node is to (a rough sketch of steps 1 and 5 follows the list):

  1. Use index shard allocation filtering in the templates, preventing new indices from going to the new node (via _exclude).
  2. Add the new node.
  3. Wait until the number of shards is balanced across the hosts.
  4. Remove the exclude restriction from the templates.
  5. Remove the index allocation exclude restriction from the already created indices. We have to be careful here not to remove any other allocation exclude rules the indices may have had prior to this operation. It is usually as simple as setting the value to null, but if there were other allocation exclude rules, that is not valid and we have to keep them.
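A rough sketch of steps 1 and 5 (the template name logs-template, index pattern logs-*, node name node-new, and cluster address are all hypothetical):

```python
# Rough sketch of steps 1 and 5 above; all names and addresses are hypothetical.
import requests

BASE = "http://localhost:9200"

# Step 1: exclude the new node in the index template so newly created indices avoid it.
requests.put(
    f"{BASE}/_index_template/logs-template",
    json={
        "index_patterns": ["logs-*"],
        "template": {
            "settings": {"index.routing.allocation.exclude._name": "node-new"}
        },
    },
)

# Step 5: once shards are balanced, clear the exclusion on the existing indices.
# Setting the value to null removes it entirely, so only do this if no other
# exclude rules need to be preserved on these indices.
requests.put(
    f"{BASE}/logs-*/_settings",
    json={"index.routing.allocation.exclude._name": None},
)
```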

@VimCommando
Contributor

VimCommando commented Sep 11, 2020

From a Slack thread on this topic, let me throw this out as a part of the recipe:

File this under “Things that should be simple, but are not”. I think it ends up being:
total_shards_per_node = (primary + (replicas * primary)) / (nodes - allowed_failures)
Where:

  • primary is number of primary shards
  • replicas is number of replica sets
  • nodes is number of participating data nodes (factor in hot/warm data tier)
  • allowed_failures is how many nodes can be down at a time

So for example, if you have 14 primary shards, 1 replica set, 12 nodes, and want to allow 2 node failures:

total_shards_per_node 
  = (primary + (replicas * primary)) / (nodes - allowed_failures)
  = (14 + (14 * 1)) / (12 - 2) 
  = 28 / 10
  = 2.8

which would round up to 3
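The same arithmetic as a quick Python sketch, using the example values above:

```python
# Sketch of the formula above, with the example values from this comment.
import math

def total_shards_per_node(primaries, replicas, nodes, allowed_failures):
    total_shards = primaries + replicas * primaries  # primaries plus all replica copies
    return math.ceil(total_shards / (nodes - allowed_failures))

print(total_shards_per_node(primaries=14, replicas=1, nodes=12, allowed_failures=2))  # -> 3
```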

@kunisen
Contributor

kunisen commented Sep 11, 2020

I tend to think of it a bit more simply, like "set this number to the replica count + 1" as a starting point.
It may also be good to consider combining it with allocation awareness (e.g. rack/zone) for advanced tuning if needed.

@jrodewig jrodewig removed the needs:triage Requires assignment of a team area label label Sep 14, 2020