
Only start re-assigning persistent tasks if they are not already being reassigned #76258

Conversation

benwtrent
Member

In cluster recovery scenarios, it is possible there has been a flurry of cluster state updates. These updates may be routing updates in an attempt to get indices searchable again on nodes.

Each of these updates may trigger a new persistent task re-assignment update, which in turn queues more cluster update requests.

This can cause unnecessary work, and consequently slow down the cluster's recovery.

This commit guards the cluster update action for persistent task re-assignment so that only one such update is queued at a time.

This MAY cause certain persistent tasks to be re-assigned more slowly, but since we periodically recheck for re-assignment, this is acceptable.

@benwtrent added the :Distributed/Task Management label (Issues for anything around the Tasks API - both persistent and node level) Aug 9, 2021
@elasticmachine added the Team:Distributed label (Meta label for distributed team) Aug 9, 2021
@elasticmachine
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@benwtrent added the >bug, v7.15.0, and v8.0.0 labels and removed the Team:Distributed label (Meta label for distributed team) Aug 9, 2021
@mark-vieira
Contributor

@elasticmachine retest this please

@droberts195
Contributor

The changes so far are fine, but I think there's an even more important change that needs to be made as well.

The seemingly innocuous change of https://github.com/elastic/elasticsearch/pull/72260/files#diff-73e70d1002fe8bcafa19e34b892feb628153094db2ce30d77ca2f56ae5752523R316-R317, which went into 7.14, causes a major problem for any type of persistent task whose reason for failure to assign includes detailed per-node information.

It means that if a particular type of persistent task cannot be assigned then it will lead to a vicious circle of: fail to assign -> set assignment failure reason -> update cluster state -> trigger cluster state listener -> try to reassign unassigned persistent tasks -> fail to assign -> set assignment failure reason to something different to what it was before -> update cluster state -> trigger cluster state listener -> etc.

For example, the failure reasons might go:

  1. Could not assign ML job [node B doesn't work because Y][node A doesn't work because X][node C doesn't work because Z]
  2. Could not assign ML job [node A doesn't work because X][node C doesn't work because Z][node B doesn't work because Y]
  3. Could not assign ML job [node A doesn't work because X][node B doesn't work because Y][node C doesn't work because Z]

So even though the reasons are effectively the same, the strings differ.

The "set assignment failure reason to something different to what it was before" step is what changed in 7.14. In 7.13 and earlier, the second cycle would choose the same assignment failure reason as the first because the nodes were checked in the same order, so there wouldn't be a second cluster state update caused by the second assignment attempt - the cluster state would be identical.

I think this is a serious bug that will affect other users if it's not fixed soon, so it needs to be fixed for 7.14.1.

Two possible fixes I can see are:

  1. Delete https://github.com/elastic/elasticsearch/pull/72260/files#diff-73e70d1002fe8bcafa19e34b892feb628153094db2ce30d77ca2f56ae5752523R316-R317 and replace it with a comment saying it's really important that consecutive calls to getAssignment see the nodes in the same order (if the cluster hasn't changed in between).
  2. Sort the failure reasons for every type of persistent task that includes per-node detail in its failure reasons. Obviously ML falls into this category but other types of persistent task might also need their per-node detail sorting too.

It would also be good to add a test that two consecutive assignment failures with the same cluster state generate the same failure reason.

@droberts195
Contributor

Sort the failure reasons for every type of persistent task that includes per-node detail in its failure reasons. Obviously ML falls into this category but other types of persistent task might also need their per-node detail sorting too.

I had a look and transforms already does this - it puts the per-node detailed reasons in a TreeMap keyed on node ID. The other types of persistent tasks just use very simple high-level reasons, so they won't be affected. I think it's just ML that will be affected, although all types of ML persistent tasks will be.

@henningandersen (Contributor) left a comment

Thanks Ben, this looks good. Can we add a test to verify the behavior too, so that we do not inadvertently disable this somehow in the future?

@henningandersen (Contributor) left a comment

Test looks good, one minor comment on it. I did not review the other added part yet.

@benwtrent added the auto-backport label (Automatically create backport pull requests when merged) Aug 10, 2021
@droberts195 (Contributor) left a comment

LGTM if you could just make one more tweak

@henningandersen (Contributor) left a comment

LGTM.

assertThat(
    result.getExplanation(),
    equalTo(
        "Not opening job [incompatible_type_job] on node [{_node_name1}{version=8.0.0}], "
Contributor

I think we should substitute Version.CURRENT.toString for 8.0.0, otherwise this test will break every time we release?

Same issue two lines down.

Member Author

I can do that

@benwtrent added the auto-merge label (Automatically merge pull request when CI checks pass; NB doesn't wait for reviews!) Aug 10, 2021
@elasticsearchmachine merged commit e8a3b05 into elastic:master Aug 10, 2021
@elasticsearchmachine
Collaborator

💔 Backport failed

Branch  Result
7.14    Commit could not be cherry-picked due to conflicts
7.x     Commit could not be cherry-picked due to conflicts

To backport manually run:
backport --pr 76258

@benwtrent deleted the feature/only-reassign-p-tasks-if-not-currently-reassigning branch August 10, 2021 18:21
benwtrent added a commit to benwtrent/elasticsearch that referenced this pull request Aug 10, 2021
…g reassigned (elastic#76258)

* Only start re-assigning persistent tasks if they are not already being reassigned

* adding tests addressing PR comments

* addressing Pr COmments

* addressing PR comments + style"

* improving test rigor
benwtrent added a commit to benwtrent/elasticsearch that referenced this pull request Aug 10, 2021
…g reassigned (elastic#76258)

* Only start re-assigning persistent tasks if they are not already being reassigned

* adding tests addressing PR comments

* addressing Pr COmments

* addressing PR comments + style"

* improving test rigor
elasticsearchmachine pushed a commit that referenced this pull request Aug 10, 2021
…y being reassigned (#76258) (#76314)

* Only start re-assigning persistent tasks if they are not already being reassigned (#76258)

* Only start re-assigning persistent tasks if they are not already being reassigned

* adding tests addressing PR comments

* addressing Pr COmments

* addressing PR comments + style"

* improving test rigor

* test improvement
elasticsearchmachine pushed a commit that referenced this pull request Aug 10, 2021
…dy being reassigned (#76258) (#76315)

* Only start re-assigning persistent tasks if they are not already being reassigned (#76258)

* Only start re-assigning persistent tasks if they are not already being reassigned

* adding tests addressing PR comments

* addressing Pr COmments

* addressing PR comments + style"

* improving test rigor

* test improvement
@ferozsalam

We were on 7.14.0 and experienced the issue described in #76258 (comment) - upgrading to 7.14.1 (which includes the fix in this PR) fixed the problem. Posting some symptoms here in case it helps anyone searching for solutions to their problems:

  • A large (and growing) number of pending tasks
  • A significant number of these tasks having the source "reassign persistent tasks"

Upgrading to 7.14.1 and rolling all our pods (we run ECK) resolved the issue entirely.

Labels
  • auto-backport (Automatically create backport pull requests when merged)
  • auto-merge (Automatically merge pull request when CI checks pass; NB doesn't wait for reviews!)
  • >bug
  • :Distributed/Task Management (Issues for anything around the Tasks API - both persistent and node level)
  • v7.14.1
  • v7.15.0
  • v8.0.0-alpha2