Rollover sometimes does not attach Rollover Info to index metadata #49413

Closed
gwbrown opened this issue Nov 20, 2019 · 3 comments
Labels
>bug :Data Management/ILM+SLM Index and Snapshot lifecycle management :Data Management/Indices APIs APIs to create and manage indices and templates

Comments

@gwbrown
Contributor

gwbrown commented Nov 20, 2019

Elasticsearch version (bin/elasticsearch --version): Observed on 6.7.2 and 7.0.1

Issue

I've observed multiple instances where RolloverInfo does not get attached to an index when it is rolled over. In one instance, the cluster was heavily overloaded and had errors processing cluster state updates. In another, a node was repeatedly joining the cluster, being removed, and rejoining. This may or may not be relevant.

This would normally not be problematic, but ILM relies on the RolloverInfo to update the lifecycle reference date following rollover, so when this occurs, ILM encounters an error with a message similar to this one:
[2019-10-28T05:50:10,915][ERROR][o.e.x.i.ExecuteStepsUpdateTask] [name] policy [my-policy] for index [my-index-000001] failed on cluster state step [{"phase":"hot","action":"rollover","name":"update-rollover-lifecycle-date"}]. Moving to ERROR step

The step info from the ILM Explain API when this happens:
"step_info" : {
  "type" : "illegal_state_exception",
  "reason" : "no rollover info found for [my-index-000001] with alias [my-alias], the index has not yet rolled over with that alias",
  "stack_trace" : "[omitted for brevity]"
}
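
To see whether other indices are affected, the ILM Explain response can be filtered for this specific failure. The following is a minimal Python sketch, not an official tool. It assumes the response of `GET <index-pattern>/_ilm/explain` has already been parsed into a dict, and that each entry under `indices` reports `step` and `failed_step` fields alongside `step_info`. The function name and the sample data are hypothetical.

```python
def indices_missing_rollover_info(explain_response):
    """Return the names of indices that are in the ERROR step because
    update-rollover-lifecycle-date found no rollover info."""
    affected = []
    for name, info in explain_response.get("indices", {}).items():
        # Only indices currently parked in the ERROR step are of interest.
        if info.get("step") != "ERROR":
            continue
        # Assumption: the explain output names the step that failed.
        if info.get("failed_step") != "update-rollover-lifecycle-date":
            continue
        reason = (info.get("step_info") or {}).get("reason", "")
        if "no rollover info found" in reason:
            affected.append(name)
    return sorted(affected)


# Hypothetical, hand-written sample shaped like an ILM Explain response.
sample = {
    "indices": {
        "my-index-000001": {
            "phase": "hot",
            "action": "rollover",
            "step": "ERROR",
            "failed_step": "update-rollover-lifecycle-date",
            "step_info": {
                "type": "illegal_state_exception",
                "reason": "no rollover info found for [my-index-000001] "
                          "with alias [my-alias], the index has not yet "
                          "rolled over with that alias",
            },
        },
        "my-index-000002": {
            "phase": "hot",
            "action": "rollover",
            "step": "check-rollover-ready",
        },
    }
}

print(indices_missing_rollover_info(sample))
```

Matching on both the failed step and the reason text keeps unrelated ILM errors out of the result.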

Workaround

There is no way to manually attach RolloverInfo, so if this happens to an index, ILM must be forcibly moved past this step. NOTE THAT THIS MEANS THE CREATION DATE OF THE INDEX WILL BE USED INSTEAD OF THE ROLLOVER DATE for all following phases. If this is problematic, then the index.lifecycle.origination_date setting in 7.5+ may be useful.

To do this, you can use the following request:

POST _ilm/move/my-index-000001
{
  "current_step": { 
    "phase": "hot",
    "action": "rollover",
    "name": "ERROR"
  },
  "next_step": { 
    "phase": "hot",
    "action": "rollover",
    "name": "set-indexing-complete"
  }
}
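
If several indices are stuck, the same request can be generated per index. This is a minimal Python sketch that only builds and prints the request path and body shown above. It sends nothing; how the request is sent (curl, Kibana Dev Tools, a client library) is left open. The function name and the index list are hypothetical.

```python
import json


def build_move_request(index, phase="hot", action="rollover",
                       next_step="set-indexing-complete"):
    """Build the path and body for moving one index out of the ERROR step.

    Defaults mirror the request in this issue; adjust them if the policy
    uses a different phase or action.
    """
    path = f"/_ilm/move/{index}"
    body = {
        "current_step": {"phase": phase, "action": action, "name": "ERROR"},
        "next_step": {"phase": phase, "action": action, "name": next_step},
    }
    return path, body


# Hypothetical list of affected indices.
for index in ["my-index-000001", "my-index-000002"]:
    path, body = build_move_request(index)
    print("POST", path)
    print(json.dumps(body, indent=2))
```

Each printed path and body pair can be reviewed before it is submitted, which is worth doing because the move changes which date ILM uses for the following phases.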
@gwbrown gwbrown added >bug :Data Management/Indices APIs APIs to create and manage indices and templates :Data Management/ILM+SLM Index and Snapshot lifecycle management labels Nov 20, 2019
@elasticmachine
Collaborator

Pinging @elastic/es-core-features (:Core/Features/ILM+SLM)

@andreidan andreidan self-assigned this Nov 23, 2019
@andreidan
Contributor

andreidan commented Nov 25, 2019

I tried to reproduce this with manual and disruption tests (using SlowClusterStateProcessing and BlockClusterStateProcessing) but was unsuccessful, probably because I couldn't get the exact timing right.

We'll keep this case in mind as we work on making ILM more resilient (progress is tracked in #48183).

@dakrone
Member

dakrone commented Jan 14, 2020

I believe we can close this now with the merging of #50388, since the rollover happens in a single cluster state (plus it's retryable on error now).

4 participants