Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cluster health should await events plus other things #44348

Conversation

DaveCTurner
Copy link
Contributor

Today a cluster health request can wait on a selection of conditions, but it
does not guarantee that all of these conditions have ever held simultaneously
when it returns. More specifically, if a request sets waitForEvents() along
with some other conditions then Elasticsearch will respond when the master has
processed all the expected pending tasks and then the cluster satisfied the
other conditions, but it may be that at the time the cluster satisfied the
other conditions there were undesired pending tasks again.

This commit adjusts the behaviour of waitForEvents() to wait for all the
required events to be processed and then, if the resulting cluster state does
not satisfy the other conditions, it will wait until there is a cluster state
that does and then retry the wait-for-events too.

Today a cluster health request can wait on a selection of conditions, but it
does not guarantee that all of these conditions have ever held simultaneously
when it returns. More specifically, if a request sets `waitForEvents()` along
with some other conditions then Elasticsearch will respond when the master has
processed all the expected pending tasks _and then_ the cluster satisfied the
other conditions, but it may be that at the time the cluster satisfied the
other conditions there were undesired pending tasks again.

This commit adjusts the behaviour of `waitForEvents()` to wait for all the
required events to be processed and then, if the resulting cluster state does
not satisfy the other conditions, it will wait until there is a cluster state
that does and then retry the wait-for-events too.
@DaveCTurner DaveCTurner added >bug :Distributed/Allocation All issues relating to the decision making around placing a shard (both master logic & on the nodes) v8.0.0 v7.4.0 labels Jul 15, 2019
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed

Copy link
Member

@original-brownbear original-brownbear left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok this took me a while to understand, this is one confusing class :) (and we should probably dry it up a little at some point imo)
But I got it now -> LGTM :)

@DaveCTurner
Copy link
Contributor Author

Thanks @original-brownbear, yes, there's definitely room for improvement here. I did initially stage a more thorough refactoring for this PR, but the tests to support such a change were a little too thin for my taste.

@DaveCTurner DaveCTurner merged commit 41ef1e6 into elastic:master Jul 16, 2019
@DaveCTurner DaveCTurner deleted the 2019-07-15-cluster-health-wait-for-events-and-other-conditions branch July 16, 2019 05:31
DaveCTurner added a commit that referenced this pull request Jul 16, 2019
Today a cluster health request can wait on a selection of conditions, but it
does not guarantee that all of these conditions have ever held simultaneously
when it returns. More specifically, if a request sets `waitForEvents()` along
with some other conditions then Elasticsearch will respond when the master has
processed all the expected pending tasks _and then_ the cluster satisfied the
other conditions, but it may be that at the time the cluster satisfied the
other conditions there were undesired pending tasks again.

This commit adjusts the behaviour of `waitForEvents()` to wait for all the
required events to be processed and then, if the resulting cluster state does
not satisfy the other conditions, it will wait until there is a cluster state
that does and then retry the wait-for-events too.
DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this pull request Jul 16, 2019
In elastic#44348 we changed the cluster health action so that it sometimes uses the
cluster state directly from the master service rather than from the cluster
applier. If the state is not recovered then this is inappropriate, because
prior to state recovery the state available to the cluster applier contains no
indices. This commit moves us back to using the state from the applier.

Fixes elastic#44416.
DaveCTurner added a commit that referenced this pull request Jul 17, 2019
In #44348 we changed the cluster health action so that it sometimes uses the
cluster state directly from the master service rather than from the cluster
applier. If the state is not recovered then this is inappropriate, because
prior to state recovery the state available to the cluster applier contains no
indices. This commit moves us back to using the state from the applier.

Fixes #44416.
DaveCTurner added a commit that referenced this pull request Jul 17, 2019
In #44348 we changed the cluster health action so that it sometimes uses the
cluster state directly from the master service rather than from the cluster
applier. If the state is not recovered then this is inappropriate, because
prior to state recovery the state available to the cluster applier contains no
indices. This commit moves us back to using the state from the applier.

Fixes #44416.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>bug :Distributed/Allocation All issues relating to the decision making around placing a shard (both master logic & on the nodes) v7.4.0 v8.0.0-alpha1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants