Default allocation of new shards create hotspots #45851
Pinging @elastic/es-distributed
As I said, this is a different issue: when shard allocation was disabled on the unbalanced node because of the low watermark, primary shards of the same index still landed on the same node, even though all the available nodes were balanced. That discussion emphasizes an upper bound on resource use, while I'm suggesting a better heuristic for choosing where to allocate each new shard. The solution proposed there would at most guarantee a spread of new shards across all nodes, but it won't prevent a large index with multiple shards from landing on a single node.

Let me give a more detailed situation: there are 30 shards to spread between 5 nodes, but 14 shards hold 95% of the traffic, and 75% of that comes from a single index. A hard limit on resources doesn't address this. If instead we add a simple rule that always spreads shards of the same index evenly across nodes, the heavy index can no longer concentrate on one node.

Of course, not everyone has the same setup as me. Someone with as many nodes and new shards per day as me, but whose indices have only 1 primary and 1 replica, won't be helped by my rule, but it won't hurt them either. More allocation rules could still be used around this one without conflict. I think this is the easiest approach to solve my issue and help many others, and I still think it should be a default configuration.
This very much looks like a duplicate of #17213.

You seem to be saying that your nodes cannot handle more than a certain number of shards of this index. The `index.routing.allocation.total_shards_per_node` setting applies per index, so you can set it only on the index you want to spread out more. It will not directly cause shards of older indices to move, although you may see a small amount of extra shard movement to fix any resulting imbalance.

I think you might be confused about the difference between a primary and a replica in terms of their resource requirements. They both require the same resources. It does not matter whether a node has 5 primaries or 5 replicas of an index; it will see the same load. (For completeness, I should add that this is sometimes not the case, but it is true when indexing log data.)

You also talk about "a reallocation storm that I want to prevent at all cost". If shard movements are damaging to your cluster stability then you may have misconfigured your cluster (e.g. set …).

I think it would be best to continue this discussion on the forums, so I'm closing this issue. If the discussion in the forums identifies something that isn't already covered by #17213 then we can always reopen this issue.
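For reference, a sketch of applying that per-index setting via the update index settings API (the index name here is a placeholder, and the value 3 is just the example discussed in this thread):

```
PUT /my-heavy-index/_settings
{
  "index.routing.allocation.total_shards_per_node": 3
}
```

Because the setting is dynamic and per-index, it can be applied only to the heavy index without touching the allocation of the other indices.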
My proposed solution is quite different.
I could, but that would cause an undesired tradeoff that I think is easily avoidable.
I do understand the resource requirements of both primaries and replicas. While replicas are (most of the time?) evenly spread across the cluster, primaries aren't, and 50% of the total traffic being indexed using only 20% of the resources creates a hotspot.
In this case I admit I didn't properly understand how ES handles reallocations. The only reallocation problem we have is caused by the smaller node constantly moving away old shards when it hits the low watermark, but we solved this by blocking the bigger indices from writing to it.

I think a soft rule spreading shards of the same index evenly across nodes should be the default configuration, as I don't see how it could be unwanted behaviour, except in exceptional cases which already need custom allocation rules. I think this issue is closer to #43350, but with a simpler proposal. It's also simpler than #17213.

A new allocation algorithm could use the current one to allocate the first shard of an index to a node, but then, after applying the configured hard limits, choose the node with the smallest number of shards of this index, falling back to the current algorithm to break ties when multiple nodes have the same lowest number.
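The proposed heuristic can be sketched in a few lines. This is an illustration only: the function and data-structure names are made up, and total shard count stands in for the existing balancing heuristic used as the tie-breaker.

```python
from collections import Counter

def pick_node(nodes, allocation, index):
    """Pick a node for a new shard of `index` (hypothetical sketch).

    allocation maps each node to a Counter of index name -> number of
    shards of that index already on the node. We prefer the node holding
    the fewest shards of `index`, and break ties by the node's total
    shard count (a stand-in for the current balancing heuristic).
    """
    return min(
        nodes,
        key=lambda n: (allocation[n][index], sum(allocation[n].values())),
    )

# Example: 3 nodes; "logs" already has shards on node1 and node2.
nodes = ["node1", "node2", "node3"]
allocation = {
    "node1": Counter({"logs": 2, "metrics": 1}),
    "node2": Counter({"logs": 1}),
    "node3": Counter({"metrics": 3}),
}
print(pick_node(nodes, allocation, "logs"))  # → node3 (holds no "logs" shard yet)
```

Note that node3 wins even though it holds more shards in total than node2: the per-index count is the primary criterion, which is exactly what keeps a heavy index from piling up on one node.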
We have a big hotspot issue on our cluster.

5 data nodes, 670 indices, 1,720 shards and 40 billion documents of log data, and growing fast. We're running Elasticsearch 7.3.0.

Multiple daily indices are created each day, together indexing 4,000 documents per second or more.

One of them is heavier than all the others, indexing 3,000 docs/sec on its own. Because of this, we split it into 5 primary shards (with 1 replica each), each holding about 10-20 GB.

But right now, it's hard for the cluster to keep up the pace: instead of placing 2 shards per node, all 5 primary shards are sent to the same node, and the replicas are then split between the 4 remaining nodes, sometimes even skipping one.

The number of shards per node is balanced for 4 of the nodes (400 ± 1), but the last one has fewer (120) because it has a lot less disk space than the other nodes and keeps hitting the low watermark (85%).

The problem can't only be caused by the cluster trying to rebalance the shards evenly: yesterday morning, because the 5th node had hit the disk space watermark, not a single new shard was assigned to it, yet all 5 primary shards of the heavy index still started on the 2nd node.

As a default configuration, Elasticsearch should always try to spread new shards of the same index evenly across all the nodes. More than anything, this would ensure a minimum sharing of network traffic, CPU and I/O, because the heaviest index would always be split up, and such a setting wouldn't prevent shards from different indices from spreading evenly too.
I could still use `index.routing.allocation.total_shards_per_node` and set it to something like 3, but doing so would also force older shards to move, potentially causing a reallocation storm that I want to prevent at all costs. The balancing heuristic has already been modified, with `cluster.routing.allocation.balance.shard` at 0.3 and `cluster.routing.allocation.balance.index` at 0.7. I fear that setting the index balance factor higher would result in the same reallocation storm.
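For context, the two balance factors mentioned here are dynamic cluster settings; a sketch of how they would be applied, using the values described in this report (the `persistent` scope is an assumption):

```
PUT /_cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.balance.shard": 0.3,
    "cluster.routing.allocation.balance.index": 0.7
  }
}
```

Raising `balance.index` makes the allocator weight per-index spread more heavily relative to total shard count, which is why changing it can trigger the rebalancing the reporter wants to avoid.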