
Cluster concurrent rebalance ignored on node allocation exclusion #14057

Closed
lb425 opened this issue Oct 10, 2015 · 2 comments

@lb425

commented Oct 10, 2015

The setting cluster.routing.allocation.cluster_concurrent_rebalance appears to be ignored when moving shards off a node that has been excluded from allocation with the setting cluster.routing.allocation.exclude._ip.

ES Version: 1.7.2

Repeatedly experienced with the following steps:
  • set cluster.routing.allocation.cluster_concurrent_rebalance to 1
  • exclude a node from allocation with cluster.routing.allocation.exclude._ip

This results in 80 shards rebalancing at a time until all shards are removed from the excluded node.
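For reference, a minimal sketch of the reproduction against the cluster settings API (the endpoint localhost:9200 and the IP 10.0.0.5 are placeholders, and transient settings are only one way to apply this):

```sh
# Limit the cluster to one concurrent rebalance (the setting that gets ignored)
curl -XPUT 'localhost:9200/_cluster/settings' -d '{
  "transient": {
    "cluster.routing.allocation.cluster_concurrent_rebalance": 1
  }
}'

# Exclude a node by IP; all of its shards then start moving at once
curl -XPUT 'localhost:9200/_cluster/settings' -d '{
  "transient": {
    "cluster.routing.allocation.exclude._ip": "10.0.0.5"
  }
}'
```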

@clintongormley

Member

commented Oct 16, 2015

Investigation required

@ywelsch ywelsch self-assigned this Oct 22, 2015

@ywelsch ywelsch removed the help wanted label Oct 22, 2015

@ywelsch

Contributor

commented Oct 22, 2015

I can confirm the issue. The root cause is that rebalancing constraints are not taken into consideration when shards that can no longer remain on a node need to be moved. AFAICS, it affects the following options:

  • cluster.routing.allocation.cluster_concurrent_rebalance
  • cluster.routing.allocation.allow_rebalance
  • cluster.routing.rebalance.enable
  • index.routing.rebalance.enable
  • rebalance_only_when_active
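A quick way to observe the effect (a sketch, not part of the original report): with cluster_concurrent_rebalance set to 1, the cluster health API should never report more than one relocating shard, yet after changing exclude._ip it does:

```sh
# relocating_shards should stay <= cluster_concurrent_rebalance,
# but after excluding a node it jumps to many shards at once
curl -s 'localhost:9200/_cluster/health?pretty' | grep relocating_shards

# Per-shard view of what is currently moving
curl -s 'localhost:9200/_cat/shards' | grep RELOCATING
```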

ywelsch added a commit to ywelsch/elasticsearch that referenced this issue Oct 26, 2015

ywelsch added a commit that referenced this issue Oct 26, 2015

ywelsch added a commit that referenced this issue Oct 26, 2015

bleskes added a commit to bleskes/elasticsearch that referenced this issue Apr 13, 2016

Rebalancing policy shouldn't prevent hard allocation decisions
elastic#14259 added a check to honor rebalancing policies (i.e., rebalance only on green state) when moving shards due to changes in allocation filtering rules. The rebalancing policy is there to make sure that we don't try to even out the number of shards per node while we are still missing shards. However, it should not interfere with explicit user commands (allocation filtering) or with things like the disk threshold decider wanting to move shards because a node has hit the high water mark.

elastic#14259 was done to address elastic#14057, where people reported that using allocation filtering caused many shards to be moved at once. This is, however, a non-issue: in 1.7 (where the issue was reported) and 2.x, we protect recovery source nodes by limiting the number of concurrent data streams they can open (i.e., we can have many recoveries, but they will be throttled). In 5.0 we came up with a simpler and more understandable approach, where we have a hard limit on the number of outgoing recoveries per node (on top of the limit on incoming recoveries we already had).
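For context, a sketch of the 5.x-style per-node recovery caps referred to above (setting names as documented for 5.x cluster-level shard allocation; the values shown are illustrative):

```sh
# 5.x replaces the rebalance-policy check with hard per-node recovery limits
curl -XPUT 'localhost:9200/_cluster/settings' -d '{
  "transient": {
    "cluster.routing.allocation.node_concurrent_incoming_recoveries": 2,
    "cluster.routing.allocation.node_concurrent_outgoing_recoveries": 2
  }
}'
```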

bleskes added a commit that referenced this issue Apr 13, 2016

Rebalancing policy shouldn't prevent hard allocation decisions (#17698)

bleskes added a commit that referenced this issue Apr 13, 2016

Rebalancing policy shouldn't prevent hard allocation decisions (#17698)