Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

raft: fix panic when campaign with snapshot #12163

Merged
merged 1 commit into from
Jul 26, 2020

Conversation

BusyJay
Copy link
Contributor

@BusyJay BusyJay commented Jul 23, 2020

If a node starts campaign without applying the snapshot data, loading entries can fail. This can happen when testing transfer leader after snapshot.

This commit is extracted from #12134. Should also fix the issue reported in comment.

@xiang90 PTAL

@xiang90
Copy link
Contributor

xiang90 commented Jul 23, 2020

If a node starts campaign without applying the snapshot data, loading entries can fail. This can happen when testing transfer leader after snapshot.

is this a problem of the test case or it can also happen in real world case? if a node has not applied the most recent snapshot, shall we even allow it to start a campaign?

@BusyJay
Copy link
Contributor Author

BusyJay commented Jul 24, 2020

is this a problem of the test case?

The test case should advance the snapshot before sending out any messages.

it can also happen in real world case?

If implementer follows the implicit rule that don't send any messages and tick until snapshot is advanced, then this can't happen in real world. Otherwise, the raft can still start campaign when there is a pending snapshot. In my opinion, allow campaign here is OK as the raft machine has latest configuration restored from snapshot. As long as there is no pending configuration in following committed entries, it should be safe for it to campaign. Before the request vote messages are sent, the snapshot should be written into disk according to the contract of handling ready, so it's safe even the raft crashes and restarts.

Though I'm also OK to prevent it from campaigning. It should reduce corner cases.

@xiang90
Copy link
Contributor

xiang90 commented Jul 24, 2020

Though I'm also OK to prevent it from campaigning. It should reduce corner cases.

I would suggest this if you agree.

@BusyJay
Copy link
Contributor Author

BusyJay commented Jul 25, 2020

Updated. Two affected test cases are also updated.

Signed-off-by: Jay Lee <BusyJayLee@gmail.com>
@xiang90
Copy link
Contributor

xiang90 commented Jul 26, 2020

lgtm

@xiang90 xiang90 merged commit 26b89fd into etcd-io:master Jul 26, 2020
@BusyJay BusyJay deleted the fix-load-log-error branch July 26, 2020 18:09
craig bot pushed a commit to cockroachdb/cockroach that referenced this pull request Nov 13, 2020
56233: *: upgrade etcd/raft to master r=knz a=tbg

We were using a custom fork that we picked up during our vgo transition,
but in the meantime upstream has turned `raft` into a small module which
has very few deps, and which we can switch back to.

Helpfully, this module already pins gogoproto v1.3+, which simplifies
PR #56147.

I did review the commits we're picking up here in etcd/raft and they look good. The ones worth pointing out are:

etcd-io/etcd#12137
etcd-io/etcd#12134
etcd-io/etcd#12163

These all seem to fix real bugs.

Release note: None


Co-authored-by: Tobias Grieger <tobias.b.grieger@gmail.com>
absolute8511 added a commit to youzan/ZanRedisDB that referenced this pull request Jun 18, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

2 participants