Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle failure to release retention leases in ILM #39281

Merged
merged 8 commits into from Feb 26, 2019

Conversation

@gwbrown
Copy link
Contributor

commented Feb 22, 2019

It is possible that the Unfollow API may fail to release shard history
retention leases when unfollowing, so this needs to be handled by the
ILM Unfollow action. There's nothing much that can be done automatically
about it from the follower side, so this change makes the ILM unfollow
action simply ignore those failures.

Resolves #39181

Handle failure to release retention leases in ILM
It is possible that the Unfollow API may fail to release shard history
retention leases when unfollowing, so this needs to be handled by the
ILM Unfollow action. There's nothing much that can be done automatically
about it from the follower side, so this change makes the ILM unfollow
action simply ignore those failures.
@elasticmachine

This comment has been minimized.

Copy link
Collaborator

commented Feb 22, 2019

@elasticmachine

This comment has been minimized.

Copy link
Collaborator

commented Feb 22, 2019

@jasontedor
Copy link
Member

left a comment

The production code looks good, but yeah, let us see some tests.

gwbrown added some commits Feb 24, 2019

@gwbrown gwbrown marked this pull request as ready for review Feb 24, 2019

@gwbrown gwbrown requested a review from martijnvg Feb 24, 2019

@martijnvg
Copy link
Member

left a comment

The tests and production code look good to me.

I just wonder whether the location of the CcrIndexLifecycleIT is the right place. Currently it resides in the ccr module (the reason that this works is because ccr module has a testCompile dependency on the monitoring module and the monitoring module has a testCompile dependency on the ilm module). However this integration test tests whether the UnfollowFollowIndexStep handles the es.failed_to_remove_retention_leases metadata on a exception correctly. So from that perspective ideally this integ test should reside in the ilm module. (we would do that then need to add a testCompile dependency to ccr in ilm module). What do you think?


final ClusterStateResponse followerClusterState = followerClient().admin().cluster().prepareState().clear().setNodes(true).get();
try {
for (final ObjectCursor<DiscoveryNode> senderNode : followerClusterState.getState().nodes().getNodes().values()) {

This comment has been minimized.

Copy link
@martijnvg

martijnvg Feb 25, 2019

Member

Alternatively this test can also set cluster.remote.leader_cluster.seeds setting to null via cluster update settings api, which disables the follower cluster to communicate to the leader cluster. (this test would then also need to extend overwrite configureRemoteClusterViaNodeSettings() and return false).

I think this is a bit cleaner, as it it requires fewer lines of code and doesn't involve interacting with mock transport service. But since this is a common practice in other tests, I think it is okay for this test too. Just wanted to mention the alternative. Also the upside of the current approach, is that it is a more realistic simulation of when the leader cluster isn't reachable from the follower cluster in production.

This comment has been minimized.

Copy link
@gwbrown

gwbrown Feb 25, 2019

Author Contributor

Thanks - I'll try this. If that's the case, then this might not need to be a CcrIntegTestCase and we could roll it into the existing REST-based test cases in the ILM module, rather than having to add a dependency on CCR.

I didn't like that this test was in CCR, rather than ILM, but I wanted to get eyes on the code, and it can always be moved. I'll look into this and update this PR moving this test into the ILM module one way or another.

@gwbrown

This comment has been minimized.

Copy link
Contributor Author

commented Feb 26, 2019

@martijnvg After working with this a bit, I wasn't able to get setting cluster.remote.leader_cluster.seeds to null to work in a REST test, so I went about trying to move that test case to ilm, but ran into some issues with adding a dependency on CCR.

Adding a testCompile dependency on CCR's test classes to ILM produces jar hell failures, I think because of exactly what you mentioned: CCR depends on ILM transitively via monitoring. Let's sync up tomorrow and figure out if there's something we can do about this that I'm missing, or if we just have to live with this test being in a weird place.

martijnvg and others added some commits Feb 26, 2019

add a rest test, that tests unfollow index step works as expected
when follower cluster is unable to remove retention leases.

@gwbrown gwbrown requested a review from dakrone Feb 26, 2019

@dakrone
Copy link
Member

left a comment

LGTM, I left two minor comments

@gwbrown

This comment has been minimized.

Copy link
Contributor Author

commented Feb 26, 2019

Thanks @dakrone, fixed!

@gwbrown gwbrown merged commit 3f38b37 into elastic:master Feb 26, 2019

8 checks passed

CLA Commit author is a member of Elasticsearch
Details
elasticsearch-ci/1 Build finished.
Details
elasticsearch-ci/2 Build finished.
Details
elasticsearch-ci/bwc Build finished.
Details
elasticsearch-ci/default-distro Build finished.
Details
elasticsearch-ci/docbldesx Build finished.
Details
elasticsearch-ci/oss-distro-docs Build finished.
Details
elasticsearch-ci/packaging-sample Build finished.
Details

gwbrown added a commit to gwbrown/elasticsearch that referenced this pull request Feb 26, 2019

Handle failure to release retention leases in ILM (elastic#39281)
It is possible that the Unfollow API may fail to release shard history
retention leases when unfollowing, so this needs to be handled by the
ILM Unfollow action. There's nothing much that can be done automatically
about it from the follower side, so this change makes the ILM unfollow
action simply ignore those failures.

gwbrown added a commit to gwbrown/elasticsearch that referenced this pull request Feb 26, 2019

Handle failure to release retention leases in ILM (elastic#39281)
It is possible that the Unfollow API may fail to release shard history
retention leases when unfollowing, so this needs to be handled by the
ILM Unfollow action. There's nothing much that can be done automatically
about it from the follower side, so this change makes the ILM unfollow
action simply ignore those failures.

gwbrown added a commit to gwbrown/elasticsearch that referenced this pull request Feb 26, 2019

Handle failure to release retention leases in ILM (elastic#39281)
It is possible that the Unfollow API may fail to release shard history
retention leases when unfollowing, so this needs to be handled by the
ILM Unfollow action. There's nothing much that can be done automatically
about it from the follower side, so this change makes the ILM unfollow
action simply ignore those failures.

gwbrown added a commit that referenced this pull request Feb 26, 2019

Handle failure to release retention leases in ILM (#39281)
It is possible that the Unfollow API may fail to release shard history
retention leases when unfollowing, so this needs to be handled by the
ILM Unfollow action. There's nothing much that can be done automatically
about it from the follower side, so this change makes the ILM unfollow
action simply ignore those failures.

gwbrown added a commit that referenced this pull request Feb 26, 2019

Handle failure to release retention leases in ILM (#39281) (#39418)
It is possible that the Unfollow API may fail to release shard history
retention leases when unfollowing, so this needs to be handled by the
ILM Unfollow action. There's nothing much that can be done automatically
about it from the follower side, so this change makes the ILM unfollow
action simply ignore those failures.

gwbrown added a commit that referenced this pull request Feb 26, 2019

Handle failure to release retention leases in ILM (#39281) (#39417)
It is possible that the Unfollow API may fail to release shard history
retention leases when unfollowing, so this needs to be handled by the
ILM Unfollow action. There's nothing much that can be done automatically
about it from the follower side, so this change makes the ILM unfollow
action simply ignore those failures.

@michaelbaamonde michaelbaamonde added v7.0.0-rc1 and removed v7.0.0 labels Mar 25, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.