CCR: Replicate existing ops with old term on follower #34412

Open. Wants to merge 13 commits into master.
@dnhatn
Contributor

dnhatn commented Oct 12, 2018

Since #34288, we might hit a deadlock if the FollowTask has more fetchers than writers. This can happen in the following scenario:

Suppose the leader has two operations [seq#0, seq#1]; the FollowTask has two fetchers and one writer.

  1. The FollowTask issues two concurrent fetch requests: {from_seq_no: 0, num_ops:1} and {from_seq_no: 1, num_ops:1} to read seq#0 and seq#1 respectively.

  2. The second request, which fetches seq#1, completes first and triggers a write request containing only seq#1.

  3. The primary of the follower fails after it has replicated seq#1 to its replicas.

  4. Since the old primary did not respond, the FollowTask issues another write request containing seq#1 (it resends the previous write request).

  5. The new primary already has seq#1; thus it won't replicate seq#1 to the replicas but will wait for the global checkpoint to advance to at least seq#1.

The problem is that the FollowTask has only one writer, and that writer is waiting for seq#0, which won't be delivered until the writer completes.
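
To make the stall concrete, here is a minimal Python sketch of how the checkpoints behave in this scenario. It assumes the usual definitions (a copy's local checkpoint is the highest seq_no with no gaps below it, and the global checkpoint is the minimum local checkpoint over the in-sync copies). The function names are hypothetical and are not taken from the Elasticsearch code base.

```python
NO_OPS_PERFORMED = -1  # checkpoint value before any operation is processed


def local_checkpoint(processed_seq_nos):
    """Highest seq_no such that every seq_no at or below it was processed."""
    checkpoint = NO_OPS_PERFORMED
    while checkpoint + 1 in processed_seq_nos:
        checkpoint += 1
    return checkpoint


def global_checkpoint(copies):
    """Minimum local checkpoint across all in-sync shard copies."""
    return min(local_checkpoint(seq_nos) for seq_nos in copies)


# New primary and replica both hold only seq#1: seq#0 is a gap on every copy,
# so the global checkpoint cannot reach 1 and the resent write keeps waiting.
stalled = global_checkpoint([{1}, {1}])

# Once seq#0 is delivered everywhere, the checkpoint advances past seq#1.
unblocked = global_checkpoint([{0, 1}, {0, 1}])
```

With a single writer, the delivery of seq#0 is queued behind the very request that is waiting for it, which is why `stalled` never changes.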

This PR originally proposed to delay the write requests if there is a gap in the write buffer. With that change, if a writer is waiting for seq_no N, then all the operations below N have been delivered or have been scheduled for delivery by other writers.
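
The gap check could look roughly like the following Python sketch. It is an illustration under assumptions, not the FollowTask implementation: `deliverable_prefix` and its parameters are hypothetical names, and the buffer is modeled as a plain collection of seq_nos.

```python
def deliverable_prefix(buffered_seq_nos, next_expected_seq_no):
    """Return the buffered seq_nos that can be written without leaving a gap.

    Only the contiguous run starting at next_expected_seq_no is released;
    everything after the first gap stays in the buffer.
    """
    ready = []
    expected = next_expected_seq_no
    for seq_no in sorted(set(buffered_seq_nos)):
        if seq_no != expected:
            break  # gap found: hold back this operation and all later ones
        ready.append(seq_no)
        expected += 1
    return ready
```

In the scenario above, a buffer holding only seq#1 releases nothing until seq#0 arrives, so the lone writer is never tied up waiting on an operation that has not been scheduled.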

This PR proposes to replicate existing operations with the old primary term (instead of the current term) on the follower. In particular, when the following primary detects that it has already processed an operation, it looks up the term of the existing operation with the same seq_no in the Lucene index, then rewrites that operation with the old term before replicating it to the following replicas. This approach is wait-free but requires soft-deletes on the follower. I will make a follow-up to enforce soft-deletes on the follower.
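
A minimal Python sketch of the idea, assuming a dictionary as a stand-in for the seq_no lookup in the Lucene index. The class, method, and field names here are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Operation:
    seq_no: int
    primary_term: int
    source: str


class FollowingPrimary:
    def __init__(self, current_term):
        self.current_term = current_term
        # Stand-in for looking up the term of an existing seq_no in Lucene.
        self.term_by_seq_no = {}

    def prepare_for_replicas(self, op):
        """Return the operation to replicate, without waiting on checkpoints."""
        existing_term = self.term_by_seq_no.get(op.seq_no)
        if existing_term is not None:
            # Already processed: replicate it again under its original term.
            return replace(op, primary_term=existing_term)
        # New operation: record it and replicate under the current term.
        self.term_by_seq_no[op.seq_no] = self.current_term
        return replace(op, primary_term=self.current_term)


# seq#1 was indexed under term 1 before the old primary failed.
primary = FollowingPrimary(current_term=2)
primary.term_by_seq_no[1] = 1
resent = primary.prepare_for_replicas(Operation(1, 2, "doc-1"))
fresh = primary.prepare_for_replicas(Operation(0, 2, "doc-0"))
```

Here `resent` carries term 1 and `fresh` carries term 2, so neither request has to wait for the global checkpoint to advance.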

Relates #34288

CCR: Delay write requests if gaps in write buffer

elasticmachine commented Oct 12, 2018

Contributor

dnhatn commented Oct 12, 2018

Another approach is to let the following primary wait for the global checkpoint to advance only if its local checkpoint is at least the waiting_for_global checkpoint. Otherwise, it returns the unapplied operations to the FollowTask without waiting. In the latter case, the FollowTask puts the unapplied operations back into the buffer, then delivers the head of the buffer (the current behavior), i.e., the operations before the waiting_for_gcp.
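
A rough Python sketch of this alternative, under the assumption that operations are tracked by seq_no only. The function names and the string results are hypothetical, introduced here just to show the decision and the requeue step.

```python
import heapq


def primary_response(local_checkpoint, waiting_for_gcp, unapplied_seq_nos):
    """Decide whether the following primary waits or hands operations back."""
    if local_checkpoint >= waiting_for_gcp:
        # No gap below the target: waiting for the global checkpoint is safe.
        return ("wait_for_global_checkpoint", [])
    # A gap exists below the target: return immediately instead of blocking.
    return ("returned_without_waiting", list(unapplied_seq_nos))


def requeue(buffer, returned_seq_nos):
    """Put returned operations back so the lowest seq_no is delivered first."""
    for seq_no in returned_seq_nos:
        heapq.heappush(buffer, seq_no)
    return buffer


# Scenario from the description: the new primary has seq#1 but not seq#0,
# so its local checkpoint is -1 while the request waits for checkpoint 1.
action, returned = primary_response(-1, 1, [1])
buffer = requeue([0], returned)
head = heapq.heappop(buffer)
```

The writer is freed right away, and the head of the buffer (`head`, which is seq#0 here) goes out next.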

dnhatn added a commit that referenced this pull request Oct 14, 2018

dnhatn added a commit that referenced this pull request Oct 14, 2018

@dnhatn dnhatn changed the title from "CCR: Delay write requests if gaps in write buffer" to "CCR: Replicate existing ops with old term on follower" Oct 16, 2018

@dnhatn dnhatn removed the team-discuss label Oct 16, 2018

@dnhatn dnhatn requested a review from s1monw Oct 16, 2018

Contributor

dnhatn commented Oct 16, 2018

@bleskes This is ready. Could you please give it a shot?

dnhatn added some commits Oct 16, 2018

Contributor

dnhatn commented Oct 17, 2018

@s1monw I have addressed your comments. Could you please have another look?

@dnhatn dnhatn requested a review from s1monw Oct 17, 2018

@s1monw

s1monw approved these changes Oct 17, 2018

LGTM

Contributor

s1monw commented Oct 17, 2018

@bleskes should also look at this.

Contributor

dnhatn commented Oct 18, 2018

@bleskes I've addressed your comments. Would you please take another look?
