Cluster health should await events plus other things #44348

DaveCTurner · 2019-07-15T14:18:51Z

Today a cluster health request can wait on a selection of conditions, but it
does not guarantee that all of these conditions have ever held simultaneously
when it returns. More specifically, if a request sets `waitForEvents()` along
with some other conditions then Elasticsearch will respond when the master has
processed all the expected pending tasks _and then_ the cluster satisfied the
other conditions, but it may be that at the time the cluster satisfied the
other conditions there were undesired pending tasks again.

This commit adjusts the behaviour of `waitForEvents()` to wait for all the
required events to be processed and then, if the resulting cluster state does
not satisfy the other conditions, it will wait until there is a cluster state
that does and then retry the wait-for-events too.
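
The difference between the two behaviours can be illustrated with a toy model. This is only a sketch and not the code in this PR: `HealthWaitSketch`, `Snapshot`, `respondBefore` and `respondAfter` are hypothetical names, the cluster is reduced to a list of snapshots, and "events processed" is simplified to "no pending tasks in the snapshot".

```java
import java.util.List;

/** Toy model only: none of these types exist in Elasticsearch. */
final class HealthWaitSketch {

    /** One observed snapshot: queued master tasks, and whether the other conditions hold. */
    static final class Snapshot {
        final int pendingTasks;
        final boolean otherConditionsMet;

        Snapshot(int pendingTasks, boolean otherConditionsMet) {
            this.pendingTasks = pendingTasks;
            this.otherConditionsMet = otherConditionsMet;
        }
    }

    /** Index of the first snapshot at or after {@code from} with no pending tasks, or -1. */
    static int awaitEvents(List<Snapshot> timeline, int from) {
        for (int i = from; i < timeline.size(); i++) {
            if (timeline.get(i).pendingTasks == 0) {
                return i;
            }
        }
        return -1;
    }

    /** Index of the first snapshot at or after {@code from} meeting the other conditions, or -1. */
    static int awaitOtherConditions(List<Snapshot> timeline, int from) {
        for (int i = from; i < timeline.size(); i++) {
            if (timeline.get(i).otherConditionsMet) {
                return i;
            }
        }
        return -1;
    }

    /** Old behaviour: events first, then the other conditions, with no re-check of events. */
    static int respondBefore(List<Snapshot> timeline) {
        int drained = awaitEvents(timeline, 0);
        return drained < 0 ? -1 : awaitOtherConditions(timeline, drained);
    }

    /** New behaviour: retry the wait-for-events until both hold in the same snapshot. */
    static int respondAfter(List<Snapshot> timeline) {
        int from = 0;
        while (true) {
            int drained = awaitEvents(timeline, from);
            if (drained < 0) {
                return -1; // ran out of snapshots, the analogue of a timeout
            }
            if (timeline.get(drained).otherConditionsMet) {
                return drained; // both conditions hold simultaneously
            }
            // Wait for a state that satisfies the other conditions...
            from = awaitOtherConditions(timeline, drained);
            if (from < 0) {
                return -1;
            }
            // ...then loop to retry the wait-for-events from that state.
        }
    }
}
```

For a timeline of (0 pending, conditions not met), (2 pending, conditions met), (0 pending, conditions met), `respondBefore` returns index 1, a snapshot that still has pending tasks, whereas `respondAfter` returns index 2, where both hold at once.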

elasticmachine · 2019-07-15T14:18:53Z

Pinging @elastic/es-distributed

original-brownbear

Ok this took me a while to understand, this is one confusing class :) (and we should probably dry it up a little at some point imo)
But I got it now -> LGTM :)

DaveCTurner · 2019-07-15T21:24:13Z

Thanks @original-brownbear, yes, there's definitely room for improvement here. I did initially stage a more thorough refactoring for this PR, but the tests to support such a change were a little too thin for my taste.

In elastic#44348 we changed the cluster health action so that it sometimes uses the cluster state directly from the master service rather than from the cluster applier. If the state is not recovered then this is inappropriate, because prior to state recovery the state available to the cluster applier contains no indices. This commit moves us back to using the state from the applier. Fixes elastic#44416.

DaveCTurner added >bug, :Distributed/Allocation, v8.0.0, v7.4.0 labels Jul 15, 2019

DaveCTurner requested review from ywelsch and original-brownbear July 15, 2019 14:18

original-brownbear approved these changes Jul 15, 2019

DaveCTurner merged commit 41ef1e6 into elastic:master Jul 16, 2019

DaveCTurner deleted the 2019-07-15-cluster-health-wait-for-events-and-other-conditions branch July 16, 2019 05:31

DaveCTurner mentioned this pull request Jul 16, 2019

GatewayIndexStateIT#testJustMasterNode fails #44416

Closed

DaveCTurner mentioned this pull request Jul 16, 2019

Use applied cluster state in cluster health #44426

Merged

jakelandis added v8.0.0-alpha1 and removed v8.0.0 labels Jul 26, 2021
