Fix Issues with CS Handling in ILM Async Steps #68361

original-brownbear
Member

A few findings from debugging an SLM/ILM test issue:

  • There is no need to spin on every cluster state (CS) change while waiting for a snapshot to go away;
    an appropriate predicate does the job.
  • Stop waiting for a new state once the node is no longer master, since the action can't
    complete at that point anyway. Worse, if the action uses the client to execute its requests,
    it could run concurrently on the local node and on the master.
    • This has been causing bursts of noisy logging and needlessly burns cycles on the previous master.
  • Properly fail the CS listener if the node is shutting down, so that the resulting logging is
    easier to debug.
  • Stop throwing exceptions in clusterStateProcessed in the error step mover; this is forbidden.

Marking as non-issue for now since it mostly makes test debugging noisy.
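
The first three findings above can be illustrated with a minimal sketch. It is not the code from this PR and uses no Elasticsearch classes: `State`, `Listener`, `Observer`, and `simulate` are hypothetical names made up for the illustration. It only shows the general shape of the idea: a waiter registers a predicate instead of reacting to every state change, gives up as soon as the local node is no longer master, and is failed explicitly when the service shuts down.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical, self-contained model; none of these types are Elasticsearch classes.
class ClusterStateWaitSketch {

    /** Simplified stand-in for a cluster state. */
    static final class State {
        final boolean localNodeIsMaster;
        final boolean snapshotRunning;

        State(boolean localNodeIsMaster, boolean snapshotRunning) {
            this.localNodeIsMaster = localNodeIsMaster;
            this.snapshotRunning = snapshotRunning;
        }
    }

    /** Callbacks a waiter receives; at most one of them fires per registration. */
    interface Listener {
        void onNewState(State state);
        void onNoLongerMaster();
        void onClose();
    }

    /** Delivers states to a single waiter until one of its callbacks has fired. */
    static final class Observer {
        private Listener listener;
        private Predicate<State> predicate;

        void waitFor(Predicate<State> predicate, Listener listener) {
            this.predicate = predicate;
            this.listener = listener;
        }

        void publish(State state) {
            if (listener == null) {
                return; // nobody is waiting any more
            }
            if (!state.localNodeIsMaster) {
                take().onNoLongerMaster(); // stop waiting, the step cannot complete here
            } else if (predicate.test(state)) {
                take().onNewState(state); // only matching states wake the waiter
            }
        }

        void close() {
            if (listener != null) {
                take().onClose(); // fail the waiter explicitly on shutdown
            }
        }

        /** Clears the registration and returns the listener that was waiting. */
        private Listener take() {
            Listener current = listener;
            listener = null;
            predicate = null;
            return current;
        }
    }

    /** Feeds states to an observer, optionally closes it, and records what the waiter saw. */
    static List<String> simulate(List<State> states, boolean closeAtEnd) {
        List<String> events = new ArrayList<>();
        Observer observer = new Observer();
        // Wake up only once no snapshot is running, rather than on every change.
        observer.waitFor(state -> !state.snapshotRunning, new Listener() {
            @Override public void onNewState(State state) { events.add("retry step"); }
            @Override public void onNoLongerMaster() { events.add("abort: no longer master"); }
            @Override public void onClose() { events.add("fail: node shutting down"); }
        });
        for (State state : states) {
            observer.publish(state);
        }
        if (closeAtEnd) {
            observer.close();
        }
        return events;
    }

    public static void main(String[] args) {
        State running = new State(true, true);
        State done = new State(true, false);
        State demoted = new State(false, true);
        System.out.println(simulate(List.of(running, running, done), false));
        System.out.println(simulate(List.of(running, demoted, done), false));
        System.out.println(simulate(List.of(running), true));
    }
}
```

In this sketch the waiter is cleared before its callback runs, so a state published after the node lost mastership is ignored rather than triggering a late retry.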

@original-brownbear original-brownbear added >bug :Data Management/ILM+SLM Index and Snapshot lifecycle management v8.0.0 v7.12.0 labels Feb 2, 2021
@elasticmachine elasticmachine added the Team:Data Management Meta label for data/management team label Feb 2, 2021
@elasticmachine
Collaborator

Pinging @elastic/es-core-features (Team:Core/Features)

Contributor

@andreidan andreidan left a comment

LGTM, thanks for this cleanup and fix @original-brownbear

Member

@dakrone dakrone left a comment

LGTM also, thanks Armin

@original-brownbear
Member Author

npnp + thanks Andrei + Lee!

@original-brownbear original-brownbear merged commit c5232e7 into elastic:master Feb 2, 2021
@original-brownbear original-brownbear deleted the minor-cleanups-async-retry-during-snapshot-action-step branch February 2, 2021 16:35
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this pull request Feb 2, 2021
original-brownbear added a commit that referenced this pull request Feb 2, 2021
alyokaz pushed a commit to alyokaz/elasticsearch that referenced this pull request Mar 10, 2021
easyice pushed a commit to easyice/elasticsearch that referenced this pull request Mar 25, 2021
@original-brownbear original-brownbear restored the minor-cleanups-async-retry-during-snapshot-action-step branch April 18, 2023 20:56
Labels
>bug :Data Management/ILM+SLM Index and Snapshot lifecycle management Team:Data Management Meta label for data/management team v7.12.0 v8.0.0-alpha1