Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ILM: Skip rolling indexes that are already rolled #47324

Merged
merged 10 commits into from
Oct 4, 2019

Conversation

andreidan
Copy link
Contributor

@andreidan andreidan commented Sep 30, 2019

An index with an ILM policy that has a rollover action in one of the
phases was rolled over when the ILM conditions dictated regardless if
it was already rolled over (eg. manually after modifying an index
template in order to force the creation of a new index that uses the new
mappings).
This changes this behaviour and has ILM check if the index it's about to
roll has not been rolled over in the meantime.

Closes #44175

An index with an ILM policy that has a `rollover` action in one of the
phases was rolled over when the ILM conditions dictated regardless if
it was already rolled over (eg. manually after modifying an index
template in order to force the creation of a new index that uses the new
mappings).
This commit adds the `index.lifecycle.rollover.skip_rolled` setting,
which defaults to true, to change this behaviour and have ILM check if
the index it's about to roll has not been rolled over in the meantime.
@andreidan andreidan added >enhancement :Data Management/ILM+SLM Index and Snapshot lifecycle management v8.0.0 v7.5.0 labels Sep 30, 2019
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-core-features

@@ -23,6 +23,17 @@ policy that contains a rollover action. When the index rolls over, the alias is
updated to reflect that the index is no longer the write index. For more
information about rollover, see <<using-policies-rollover>>.

`index.lifecycle.rollover.skip_rolled`:: (Expert Setting)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a better suggestion for naming this setting? I also went ahead and made it true by default. What do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

allow_multiple_rollovers maybe? I don't love that either.
skip_rollover_if_already_rolled? More verbose, but that might be okay.

I'm honestly not sure why someone would want to change this setting in the first place. It's probably good to have a setting for it just in case, but I can't think of any actual use case for it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this setting is a little strange, if a user were to set it to false they would essentially be rolling over an index that was no longer the write index, and in the case that ILM is configured like it usually is (the policy is applied to every index), it will lead to unintended behavior.

I think we should remove this setting, since to me this feels a little like having a setting to re-introduce buggy behavior.

IndexMetaData indexMetaData = IndexMetaData.builder(randomAlphaOfLength(10))
.putAlias(AliasMetaData.builder(rolloverAlias))
.settings(settings(Version.CURRENT).put(RolloverAction.LIFECYCLE_ROLLOVER_ALIAS, rolloverAlias))
.putRolloverInfo(new RolloverInfo(randomAlphaOfLength(5),
Copy link
Contributor Author

@andreidan andreidan Sep 30, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you think this scenario makes sense? (ie. allowing ILM to attempt a rollover if the index was manually rolled under a different alias or should it skip rolling over a rolled index regardless of the alias?)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is like, the corneriest of corner cases. I suspect the behavior in this case will not ever matter. That said, I think we should only pay attention to rollovers using the same alias, and if we're doing that intentionally, might as well test it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should check to see if rolling the index using a different alias prompts the same error that #44175 sees (manually is fine), and if so, we should fix that too. I think Gordon is right that it won't really matter.

Copy link
Contributor

@gwbrown gwbrown left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like a good start, thanks! Left some comments, I also think an integration test in TimeSeriesLifecycleActionsIT would probably be good here to make sure we get through all the steps in rollover without erroring.

@@ -23,6 +23,17 @@ policy that contains a rollover action. When the index rolls over, the alias is
updated to reflect that the index is no longer the write index. For more
information about rollover, see <<using-policies-rollover>>.

`index.lifecycle.rollover.skip_rolled`:: (Expert Setting)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

allow_multiple_rollovers maybe? I don't love that either.
skip_rollover_if_already_rolled? More verbose, but that might be okay.

I'm honestly not sure why someone would want to change this setting in the first place. It's probably good to have a setting for it just in case, but I can't think of any actual use case for it.

@@ -53,6 +55,12 @@ public void evaluateCondition(IndexMetaData indexMetaData, Listener listener) {
return;
}

if (indexMetaData.getSettings().getAsBoolean(LIFECYCLE_ROLLOVER_SKIP_ROLLED, true) &&
indexMetaData.getRolloverInfos().get(rolloverAlias) != null) {
logger.trace("Index {} was already rolled, skipping", indexMetaData.getIndex().getName());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. I think this should be info (rather than `trace)
  2. Use brackets around index name, and don't capitalize, to fit better with the typical log format
  3. Nitpicky, but I'd use a message more like index [{}] was already rolled over for alias [{}], not attempting to roll over again

@@ -53,6 +55,12 @@ public void evaluateCondition(IndexMetaData indexMetaData, Listener listener) {
return;
}

if (indexMetaData.getSettings().getAsBoolean(LIFECYCLE_ROLLOVER_SKIP_ROLLED, true) &&
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we'll also need this check in RolloverStep, or it'll just try to issue the rollover request there and fail a step later. I think UpdateRolloverLifecycleDateStepis fine not checking it, because we should probably update the reference date anyway, and the step to set indexing complete is probably good to do anyway.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, otherwise we pass the check step but fail on the "actual" rollover

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I wasn't seeing this in the manual tests as we need to complete this step here (call the listener callback). I'll add the check to the RolloverStep too with the listener callback as well. Will capture this in an integration test too as you guys suggested

IndexMetaData indexMetaData = IndexMetaData.builder(randomAlphaOfLength(10))
.putAlias(AliasMetaData.builder(rolloverAlias))
.settings(settings(Version.CURRENT).put(RolloverAction.LIFECYCLE_ROLLOVER_ALIAS, rolloverAlias))
.putRolloverInfo(new RolloverInfo(randomAlphaOfLength(5),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is like, the corneriest of corner cases. I suspect the behavior in this case will not ever matter. That said, I think we should only pay attention to rollovers using the same alias, and if we're doing that intentionally, might as well test it.

step.evaluateCondition(indexMetaData, new AsyncWaitStep.Listener() {

@Override
public void onResponse(boolean complete, ToXContentObject informationContext) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably check that complete is true.

step.evaluateCondition(indexMetaData, new AsyncWaitStep.Listener() {

@Override
public void onResponse(boolean complete, ToXContentObject informationContext) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably check that complete is true.

Mockito.verify(indicesClient, Mockito.only()).rolloverIndex(Mockito.any(), Mockito.any());
}

private IndicesAdminClient indicesAdminClientMock() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick: Helper methods should come after all of the actual test methods.

Copy link
Member

@dakrone dakrone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this Andrei! I left some comments, in particular there is a bug where we don't actually advance the step. I think it's not caught because we don't have any integration tests for this (only unit tests). I think we should add some REST tests (should be something we can exercise from a .yaml test) for manually rolling over an index.

@@ -53,6 +55,12 @@ public void evaluateCondition(IndexMetaData indexMetaData, Listener listener) {
return;
}

if (indexMetaData.getSettings().getAsBoolean(LIFECYCLE_ROLLOVER_SKIP_ROLLED, true) &&
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, otherwise we pass the check step but fail on the "actual" rollover

IndexMetaData indexMetaData = IndexMetaData.builder(randomAlphaOfLength(10))
.putAlias(AliasMetaData.builder(rolloverAlias))
.settings(settings(Version.CURRENT).put(RolloverAction.LIFECYCLE_ROLLOVER_ALIAS, rolloverAlias))
.putRolloverInfo(new RolloverInfo(randomAlphaOfLength(5),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should check to see if rolling the index using a different alias prompts the same error that #44175 sees (manually is fine), and if so, we should fix that too. I think Gordon is right that it won't really matter.

}

private IndicesAdminClient indicesAdminClientMock() {
AdminClient adminClient = Mockito.mock(AdminClient.class);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would vastly prefer that we subclass NoOpClient to test this rather than using mocks, but since this test already has precedence for using them, it's okay.

@@ -23,6 +23,17 @@ policy that contains a rollover action. When the index rolls over, the alias is
updated to reflect that the index is no longer the write index. For more
information about rollover, see <<using-policies-rollover>>.

`index.lifecycle.rollover.skip_rolled`:: (Expert Setting)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this setting is a little strange, if a user were to set it to false they would essentially be rolling over an index that was no longer the write index, and in the case that ILM is configured like it usually is (the policy is applied to every index), it will lead to unintended behavior.

I think we should remove this setting, since to me this feels a little like having a setting to re-introduce buggy behavior.

@andreidan
Copy link
Contributor Author

Thanks for your comments @dakrone and @gwbrown. I agree on the setting being odd (and offers little value). I'll drop it.

#47324 (comment)

@dakrone a manual rollover on a different alias between ilm iterations will cause the ilm rollover to fail because the target index will already have been created (eg. a manual roll of logs-001 will create logs-002. when ilm will attempt to rollover the pre-conditions for rolling logs-001 will be valid but the rollover will fail because logs-002 already exists - unless of course the user archived/deleted/renamed it but I can't really think of an use case where this makes sense)

@andreidan
Copy link
Contributor Author

@elasticmachine update branch

@andreidan
Copy link
Contributor Author

@elasticmachine update branch

Copy link
Member

@dakrone dakrone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, I did leave a comment about a minor change to the test, but it shouldn't need another review. Thanks for iterating on this Andrei!


@Override
public void onResponse(boolean complete) {
assertThat(complete, is(true));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is asynchronous, it's possible that the onResponse never actually gets called (we don't verify that it was called), can you add a CountDownLatch that in onResponse is counted down, we can then .await() on the latch and ensure that onResponse did check the complete boolean?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes! Good shout

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, this would be true in an integration test but in this case we call evaluateStep from the main test thread, making this effectively synchronous.


@Override
public void onResponse(boolean complete, ToXContentObject informationContext) {
assertThat(complete, is(true));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment here about the latch, we need to check that onResponse is actually called in the test.


@Override
public void onResponse(boolean complete, ToXContentObject informationContext) {
assertThat(complete, is(true));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment here about the latch :)

@andreidan andreidan merged commit 37d6106 into elastic:master Oct 4, 2019
andreidan added a commit to andreidan/elasticsearch that referenced this pull request Oct 4, 2019
An index with an ILM policy that has a rollover action in one of the
phases was rolled over when the ILM conditions dictated regardless if
it was already rolled over (eg. manually after modifying an index
template in order to force the creation of a new index that uses the new
mappings).
This changes this behaviour and has ILM check if the index it's about to
roll has not been rolled over in the meantime.

(cherry picked from commit 37d6106)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
andreidan added a commit that referenced this pull request Oct 7, 2019
An index with an ILM policy that has a rollover action in one of the
phases was rolled over when the ILM conditions dictated regardless if
it was already rolled over (eg. manually after modifying an index
template in order to force the creation of a new index that uses the new
mappings).
This changes this behaviour and has ILM check if the index it's about to
roll has not been rolled over in the meantime.

(cherry picked from commit 37d6106)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Manual rollover on ILM indices leads to Error
5 participants