Add Pause/Resume Auto Follower APIs #47510

Merged: tlrx merged 8 commits into elastic:master from pause-auto-followers on Oct 11, 2019


@tlrx (Member) commented Oct 3, 2019

This pull request adds two APIs that allow users to pause and resume CCR auto-follower patterns:

// pause auto-follower
POST /_ccr/auto_follow/my_pattern/pause

// resume auto-follower
POST /_ccr/auto_follow/my_pattern/resume
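To make the two endpoints above concrete, here is a minimal Java sketch that builds (but does not send) the pause and resume requests with the JDK's java.net.http client. The base URL http://localhost:9200, the pattern name my_pattern and the AutoFollowRequests helper are assumptions for illustration only; a real client would also need authentication and error handling.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper: builds (but does not send) the pause/resume requests.
// The base URL and pattern name used below are illustrative assumptions.
final class AutoFollowRequests {

    private AutoFollowRequests() {}

    // Builds POST <baseUrl>/_ccr/auto_follow/<patternName>/<action>,
    // where action must be "pause" or "resume".
    static HttpRequest build(String baseUrl, String patternName, String action) {
        if (!"pause".equals(action) && !"resume".equals(action)) {
            throw new IllegalArgumentException("action must be 'pause' or 'resume', got: " + action);
        }
        // Assumes patternName contains no characters that need URL-encoding.
        URI uri = URI.create(baseUrl + "/_ccr/auto_follow/" + patternName + "/" + action);
        return HttpRequest.newBuilder(uri)
                .POST(HttpRequest.BodyPublishers.noBody()) // the examples above send no body
                .build();
    }

    public static void main(String[] args) {
        HttpRequest pause = build("http://localhost:9200", "my_pattern", "pause");
        HttpRequest resume = build("http://localhost:9200", "my_pattern", "resume");
        // Prints the HTTP method and path of each request.
        System.out.println(pause.method() + " " + pause.uri().getPath());
        System.out.println(resume.method() + " " + resume.uri().getPath());
    }
}
```

To actually call a cluster, such a request could be passed to HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()).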

The ability to pause and resume auto-follow patterns can be useful in some situations, including rolling upgrades of clusters that use a bi-directional cross-cluster replication scheme (see #46665).

This pull request adds a new active flag to the AutoFollowPattern and adapts the AutoCoordinator and AutoFollower classes so that they stop fetching the remote cluster's state when all auto-follow patterns associated with that remote cluster are paused.
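The effect of the active flag on remote cluster state fetching can be sketched as follows. PatternSketch and RemoteFetchPolicy are hypothetical, simplified stand-ins, not the actual AutoFollowPattern or AutoCoordinator code; the sketch only captures the rule stated above, that a remote cluster's state stops being fetched once every pattern associated with it is paused.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical, simplified stand-in for an auto-follow pattern: only the fields
// needed to illustrate the "active" flag (not the real AutoFollowPattern class).
final class PatternSketch {
    final String name;
    final String remoteCluster;
    final boolean active;

    PatternSketch(String name, String remoteCluster, boolean active) {
        this.name = name;
        this.remoteCluster = remoteCluster;
        this.active = active;
    }
}

final class RemoteFetchPolicy {

    private RemoteFetchPolicy() {}

    // Returns the remote clusters whose state still has to be fetched.
    // A remote is left out only when every pattern targeting it is paused.
    static Set<String> remotesToFetch(List<PatternSketch> patterns) {
        Set<String> remotes = new TreeSet<>();
        for (PatternSketch pattern : patterns) {
            if (pattern.active) {
                remotes.add(pattern.remoteCluster);
            }
        }
        return remotes;
    }
}
```

With patterns a (leader1, paused), b (leader1, active) and c (leader2, paused), leader1 is still fetched because b is active, while leader2 is skipped because its only pattern is paused.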

When an auto-follower is paused, remote indices that match the pattern are simply ignored: they are not added to the pattern's list of followed index UUIDs, which is maintained in the local cluster state. This way, when the auto-follow pattern is resumed, indices created in the remote cluster in the meantime are picked up and added as new follower indices. Indices that were created and then deleted in the remote cluster while the pattern was paused are ignored, because the auto-follower never sees them when it resumes.
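A minimal simulation of these pause/resume semantics, assuming a single pattern and a simplified prefix match in place of real index patterns. AutoFollowerSimulation and its methods are hypothetical; each poll receives a snapshot of the remote indices (name to UUID), whereas the real implementation does not fetch the remote state at all while every pattern for that remote is paused.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical simulation of one auto-follow pattern. The index pattern is
// simplified to a prefix match ("logs-" instead of "logs-*").
final class AutoFollowerSimulation {
    private final String indexPrefix;
    // Stand-in for the followed leader index UUIDs kept in the local cluster state.
    private final Set<String> followedLeaderIndexUuids = new LinkedHashSet<>();
    private boolean active = true;

    AutoFollowerSimulation(String indexPrefix) {
        this.indexPrefix = indexPrefix;
    }

    void pause() { active = false; }

    void resume() { active = true; }

    // One polling round against a snapshot of remote indices (name -> UUID).
    // Returns the names of indices that start being followed in this round.
    List<String> poll(Map<String, String> remoteIndices) {
        List<String> newlyFollowed = new ArrayList<>();
        if (!active) {
            // Paused: matching indices are ignored and NOT recorded,
            // so they remain candidates once the pattern is resumed.
            return newlyFollowed;
        }
        for (Map.Entry<String, String> entry : remoteIndices.entrySet()) {
            // Set.add returns false for UUIDs that are already followed.
            if (entry.getKey().startsWith(indexPrefix) && followedLeaderIndexUuids.add(entry.getValue())) {
                newlyFollowed.add(entry.getKey());
            }
        }
        return newlyFollowed;
    }
}
```

An index created and deleted between pause() and resume() never appears in a post-resume snapshot, so it is never recorded or followed.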

As this pull request is already massive, it does not include documentation.

@tlrx added the >enhancement, :Distributed/CCR and v8.0.0 labels Oct 3, 2019
@elasticmachine (Collaborator)

Pinging @elastic/es-distributed (:Distributed/CCR)

@tlrx (Member Author) commented Oct 3, 2019

Marking as work in progress until tests are green.

@tlrx (Member Author) commented Oct 4, 2019

@elasticmachine update branch

@tlrx removed the WIP label Oct 4, 2019
@tlrx (Member Author) commented Oct 4, 2019

This is ready for review.

@tlrx (Member Author) commented Oct 8, 2019

@elasticmachine update branch

@martijnvg (Member) left a comment

Took a quick look and left 2 nits and a question. Looks great, so LGTM.

 final Thread thread = Thread.currentThread();
-getRemoteClusterState(remoteCluster, metadataVersion + 1, (remoteClusterStateResponse, remoteError) -> {
+getRemoteClusterState(remoteCluster, Math.max(1L, nextMetadataVersion), (remoteClusterStateResponse, remoteError) -> {
Member:

Why is Math.max(1L, nextMetadataVersion) needed?

Member Author:

Now that I'm reading it again, I'm not sure it is really needed. I'll dig into that.

@jasontedor (Member) left a comment

Thanks @tlrx, the production code looks great. I left one comment about a method override.

}

// for testing purpose
public AutoFollowPattern(String remoteCluster,
Member:
It looks like this was only added for avoiding changing call sites but I'd rather we didn't. You can use IntelliJ refactoring (change signature on this method to add the boolean active parameter as the fourth argument setting the default value to true, and ignore the conflict with the other override) to automatically handle the call sites (although you might have to worry about line-wrapping). Then we can avoid adding this method.
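The suggestion amounts to keeping a single constructor that takes active explicitly, with existing call sites passing true, rather than a convenience overload that defaults it. Below is a hypothetical, trimmed-down Java sketch: only remoteCluster as the first parameter and active as the fourth come from this discussion, while the class name and the other two fields are assumptions.

```java
import java.util.List;

// Hypothetical, trimmed-down pattern class. Only remoteCluster (first) and
// active (fourth) come from the review discussion; the other fields are assumed.
final class PatternWithActiveFlag {
    final String remoteCluster;
    final List<String> leaderIndexPatterns; // assumed field
    final String followIndexPattern;        // assumed field
    final boolean active;

    // Single constructor: callers state the active flag explicitly instead of
    // relying on a separate "for testing purpose" overload that defaults it.
    PatternWithActiveFlag(String remoteCluster, List<String> leaderIndexPatterns,
                          String followIndexPattern, boolean active) {
        this.remoteCluster = remoteCluster;
        this.leaderIndexPatterns = List.copyOf(leaderIndexPatterns);
        this.followIndexPattern = followIndexPattern;
        this.active = active;
    }

    // Example of an updated call site: the former default is now passed as `true`.
    static PatternWithActiveFlag example() {
        return new PatternWithActiveFlag("leader", List.of("logs-*"), "follower-logs", true);
    }
}
```

Passing the flag explicitly keeps every call site honest about whether the pattern starts active, at the cost of touching more lines in the change.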

@tlrx (Member Author) Oct 9, 2019

Thanks Jason for spotting this, it should not have been committed. I added this ctor to avoid some work and planned to come back to it later, but didn't 🤦‍♂️

I pushed 0a551a2

@tlrx (Member Author) commented Oct 10, 2019

@elasticmachine run elasticsearch-ci/2
@elasticmachine run elasticsearch-ci/packaging-sample

@dnhatn (Member) left a comment

LGTM. Thanks @tlrx.

@dnhatn (Member) commented Oct 10, 2019

run elasticsearch-ci/packaging-sample

@ywelsch added the v7.5.0 label Oct 11, 2019
@tlrx merged commit 8b82e62 into elastic:master Oct 11, 2019
@tlrx deleted the pause-auto-followers branch October 11, 2019 10:33
@tlrx (Member Author) commented Oct 11, 2019

Thanks a lot @martijnvg, @jasontedor and @dnhatn

@cjcenizal (Contributor)

@tlrx Great work on this! I've created a Kibana issue to update the UI to surface this functionality (elastic/kibana#48008). Is the active state returned as part of the GET response? If not, is there another way to retrieve it? We will need this state on the UI so that we will know which button (pause or resume) we should show the user.

tlrx added a commit to tlrx/elasticsearch that referenced this pull request Oct 12, 2019
Backport of elastic#47510 for 7.x
tlrx added a commit that referenced this pull request Oct 13, 2019
Backport of #47510 for 7.x
@tlrx (Member Author) commented Oct 13, 2019

@cjcenizal as mentioned via another channel, the state is exposed by the Get Auto-Follower API in the active boolean field.
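Given the active boolean returned by the Get Auto-Follower API, a UI could choose which button to show, and which of the two endpoints that button calls, along these lines. AutoFollowToggle is a hypothetical helper; parsing the active field out of the API response is deliberately left out.

```java
// Hypothetical UI helper: maps the "active" flag of an auto-follow pattern
// to the button to display and the endpoint that button should call.
final class AutoFollowToggle {

    private AutoFollowToggle() {}

    // An active pattern can be paused; a paused pattern can be resumed.
    static String buttonLabel(boolean active) {
        return active ? "Pause" : "Resume";
    }

    // Path of the POST endpoint the button triggers for the given pattern.
    static String endpointFor(String patternName, boolean active) {
        return "/_ccr/auto_follow/" + patternName + (active ? "/pause" : "/resume");
    }
}
```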

howardhuanghua pushed a commit to TencentCloudES/elasticsearch that referenced this pull request Oct 14, 2019
tlrx added a commit that referenced this pull request Oct 14, 2019
This commit adds support for Pause/Resume Auto-Follower APIs 
to the HLRC, with the documentation.

Relates #47510
tlrx added a commit that referenced this pull request Oct 14, 2019
This commit adds support for Pause/Resume Auto-Follower APIs 
to the HLRC, with the documentation.

Relates #47510
tlrx added a commit that referenced this pull request Oct 15, 2019
Relates #47510

Co-Authored-By: James Rodewig <james.rodewig@elastic.co>
tlrx added a commit that referenced this pull request Oct 15, 2019
Relates #47510

Co-Authored-By: James Rodewig <james.rodewig@elastic.co>
Labels: :Distributed/CCR, >enhancement, release highlight, v7.5.0, v8.0.0-alpha1

8 participants