auto_expand_replicas causing very large amount of cluster state changes when a node joins or leaves the cluster - causing the master to become unresponsive #3399

Closed
moberg opened this Issue Jul 26, 2013 · 2 comments

moberg commented Jul 26, 2013

In our cluster we have 8 nodes and about 100 indices. Each index has one shard and is replicated to every node using the setting "auto_expand_replicas='0-all'".

We observed that when a node leaves the cluster, the master node becomes unresponsive for some time. The more indices we added, the longer it stayed unresponsive. During this time, restarted nodes were sometimes unable to rejoin the cluster, causing a split-brain scenario, or simply hung at startup.

Looking at the source code for how node leave and join events are handled, I think I have identified the bug. The ClusterChangedEvent is propagated to MetaDataUpdateSettingsService#clusterChanged, which loops through every index and, if the number of nodes has changed, fires updateSettings for that index. When a node joins or leaves and the auto_expand_replicas setting is in use, every index is affected, so for 100 indices it fires off 100 updateSettings calls.

The problem is that each call to updateSettings results in a new cluster state, which in turn propagates back to MetaDataUpdateSettingsService#clusterChanged, resulting in an exponential number of cluster state changes. This fills the log with messages like this:

[2013-07-26 20:55:45,726][INFO ][cluster.metadata ] [master] [index1] auto expanded replicas to [5]
[2013-07-26 20:55:45,726][INFO ][cluster.metadata ] [master] [index2] auto expanded replicas to [5]
[2013-07-26 20:55:45,726][INFO ][cluster.metadata ] [master] [index3] auto expanded replicas to [5]

My proposed fix is to group the updates by fNumberOfReplicas and trigger only one update per distinct fNumberOfReplicas value. In our case, where “auto_expand_replicas” is set to “0-all”, this results in one cluster state change instead of a flood of changes.
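To make the grouping idea concrete, here is a minimal standalone sketch. It is not the code from the actual commit: `IndexInfo`, `targetReplicas`, and `groupByTargetReplicas` are hypothetical names, and the "0-all" range is modeled as a minimum of 0 and a maximum of `Integer.MAX_VALUE`. It only shows how indices that need the same new replica count can be collected so that one settings update covers all of them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class AutoExpandBatching {

    /** Hypothetical stand-in for per-index metadata; not an Elasticsearch class. */
    static final class IndexInfo {
        final String name;
        final int currentReplicas;
        final int minReplicas; // lower bound of the auto-expand range
        final int maxReplicas; // upper bound; Integer.MAX_VALUE models "all"

        IndexInfo(String name, int currentReplicas, int minReplicas, int maxReplicas) {
            this.name = name;
            this.currentReplicas = currentReplicas;
            this.minReplicas = minReplicas;
            this.maxReplicas = maxReplicas;
        }
    }

    /** Replica count the index should have for the given node count, clamped to its range. */
    static int targetReplicas(IndexInfo index, int dataNodes) {
        int wanted = dataNodes - 1; // one copy of each shard is the primary
        return Math.max(index.minReplicas, Math.min(index.maxReplicas, wanted));
    }

    /** Groups the indices that need a change by their new replica count. */
    static Map<Integer, List<String>> groupByTargetReplicas(List<IndexInfo> indices, int dataNodes) {
        Map<Integer, List<String>> groups = new TreeMap<>();
        for (IndexInfo index : indices) {
            int target = targetReplicas(index, dataNodes);
            if (target != index.currentReplicas) {
                groups.computeIfAbsent(target, k -> new ArrayList<>()).add(index.name);
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        // Assumed scenario: 100 indices sized for 8 nodes, then one node leaves.
        List<IndexInfo> indices = new ArrayList<>();
        for (int i = 1; i <= 100; i++) {
            indices.add(new IndexInfo("index" + i, 7, 0, Integer.MAX_VALUE));
        }
        Map<Integer, List<String>> groups = groupByTargetReplicas(indices, 7);
        // Each map entry corresponds to a single batched settings update.
        for (Map.Entry<Integer, List<String>> e : groups.entrySet()) {
            System.out.println("replicas=" + e.getKey()
                    + ": one update covering " + e.getValue().size() + " indices");
        }
    }
}
```

In this assumed scenario all 100 indices land in a single group, so the sketch issues one update rather than 100. Indices with different auto-expand ranges would fall into separate groups, one update per distinct replica count.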

The fix passes all the tests and solves the problem we have been observing in production and have been able to reproduce in our development environment.

I will update the ticket as soon as possible with a link to the commit for the fix.

moberg pushed a commit to moberg/elasticsearch that referenced this issue Jul 26, 2013

Fix for elastic#3399 - auto_expand_replicas causing very large amount…
… of cluster state changes when a node joins or leaves the cluster
Member

kimchy commented Jul 26, 2013

Batching the cluster events will definitely help. Btw, in the 0.90 branch (upcoming 0.90.3) we fixed the part about the cluster not being responsive due to a large amount of cluster change events. The cluster state publishing and the ping requests were using the same HIGH transport channel; now we have a dedicated transport channel for pings, which means they will not "get behind" a potentially large amount of cluster state events.
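As a rough illustration of why a dedicated ping channel helps, the toy model below treats each channel as a plain FIFO queue. It does not use the Elasticsearch transport layer: `PingChannelSketch`, `sharedChannel`, `messagesBeforePing`, and the backlog size are all hypothetical. It simply counts how many messages have to be handled before a ping gets its turn.

```java
import java.util.ArrayDeque;
import java.util.Queue;

public class PingChannelSketch {

    /** Builds a FIFO channel holding a backlog of cluster state messages followed by one ping. */
    static Queue<String> sharedChannel(int backlog) {
        Queue<String> channel = new ArrayDeque<>();
        for (int i = 0; i < backlog; i++) {
            channel.add("cluster-state");
        }
        channel.add("ping");
        return channel;
    }

    /** Counts the messages queued ahead of the first ping, or returns -1 if no ping is queued. */
    static int messagesBeforePing(Queue<String> channel) {
        int ahead = 0;
        for (String message : channel) { // ArrayDeque iterates from head to tail
            if (message.equals("ping")) {
                return ahead;
            }
            ahead++;
        }
        return -1;
    }

    public static void main(String[] args) {
        int backlog = 5000; // hypothetical number of pending cluster state messages

        // One shared channel: the ping waits behind the whole backlog.
        Queue<String> shared = sharedChannel(backlog);

        // Dedicated channel: the backlog lives elsewhere, so the ping is first in line.
        Queue<String> pingOnly = sharedChannel(0);

        System.out.println("shared channel, messages ahead of ping: " + messagesBeforePing(shared));
        System.out.println("dedicated channel, messages ahead of ping: " + messagesBeforePing(pingOnly));
    }
}
```

In this model the ping on the shared channel sits behind the entire backlog, while the ping on its own channel has nothing ahead of it. That is the "get behind" effect described above.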


moberg commented Jul 26, 2013

Link to fix:
moberg@2327c56

@imotov imotov closed this in 95855c5 Jul 30, 2013

imotov added a commit that referenced this issue Jul 30, 2013

auto_expand_replicas causing very large amount of cluster state chang…
…es when a node joins or leaves the cluster

Closes #3399

mute pushed a commit to mute/elasticsearch that referenced this issue Jul 29, 2015
