raft: fix panic when campaign with snapshot #12163

BusyJay · 2020-07-23T18:09:07Z

If a node starts a campaign before applying the snapshot data, loading entries can fail. This can happen when testing leader transfer after a snapshot.
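The failure mode can be illustrated with a toy model. This is a minimal sketch, not etcd's code: `toyLog`, `errCompacted`, and `unappliedEntries` are hypothetical names, and the sketch assumes that entries at or below the snapshot index can no longer be read from the log.

```go
package main

import (
	"errors"
	"fmt"
)

// errCompacted is a hypothetical error for reads that start at or below the snapshot index.
var errCompacted = errors.New("requested index is unavailable due to compaction")

// toyLog is a simplified stand-in for a raft log; it is not etcd's raftLog.
type toyLog struct {
	snapshotIndex uint64   // index covered by the most recent snapshot
	applied       uint64   // highest index applied to the state machine
	committed     uint64   // highest index known to be committed
	entries       []uint64 // indexes of the entries stored after the snapshot
}

// slice returns the entry indexes in [lo, hi) and fails if lo is covered by the snapshot.
func (l *toyLog) slice(lo, hi uint64) ([]uint64, error) {
	if lo <= l.snapshotIndex {
		return nil, errCompacted
	}
	var out []uint64
	for _, idx := range l.entries {
		if idx >= lo && idx < hi {
			out = append(out, idx)
		}
	}
	return out, nil
}

// unappliedEntries loads the committed but unapplied range, as a campaigning node might.
func (l *toyLog) unappliedEntries() ([]uint64, error) {
	return l.slice(l.applied+1, l.committed+1)
}

func main() {
	// A snapshot at index 10 was received but not applied, so applied still lags at 5.
	pending := &toyLog{snapshotIndex: 10, applied: 5, committed: 12, entries: []uint64{11, 12}}
	_, err := pending.unappliedEntries()
	fmt.Println("pending snapshot:", err)

	// Once the snapshot is applied, applied catches up to 10 and the read succeeds.
	caughtUp := &toyLog{snapshotIndex: 10, applied: 10, committed: 12, entries: []uint64{11, 12}}
	ents, err := caughtUp.unappliedEntries()
	fmt.Println("snapshot applied:", ents, err)
}
```

In this model the read fails only while `applied` trails the snapshot index, which matches the window described above.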

This commit is extracted from #12134. It should also fix the issue reported in the comment.

@xiang90 PTAL

xiang90 · 2020-07-23T22:06:43Z

> If a node starts a campaign before applying the snapshot data, loading entries can fail. This can happen when testing leader transfer after a snapshot.

Is this a problem with the test case, or can it also happen in the real world? If a node has not applied the most recent snapshot, should we even allow it to start a campaign?

BusyJay · 2020-07-24T04:52:36Z

> Is this a problem with the test case?

The test case should advance the snapshot before sending out any messages.

> Can it also happen in the real world?

If the implementer follows the implicit rule of not sending any messages or ticking until the snapshot is advanced, this can't happen in the real world. Otherwise, raft can still start a campaign while there is a pending snapshot. In my opinion, allowing the campaign here is OK, as the raft state machine has the latest configuration restored from the snapshot. As long as there is no pending configuration change in the committed entries that follow, it should be safe for it to campaign. Before the request vote messages are sent, the snapshot should have been written to disk according to the contract for handling Ready, so it's safe even if raft crashes and restarts.
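The ordering argument can be sketched with a toy Ready handler. This is an illustration under stated assumptions, not etcd's API: `toyReady`, `toyApp`, and `handleReady` are hypothetical, and the sketch only records the order of steps (persist the snapshot, send messages, apply the snapshot, advance).

```go
package main

import "fmt"

// toyReady is a hypothetical, trimmed-down stand-in for a raft Ready batch.
type toyReady struct {
	snapshotIndex uint64   // 0 means the batch carries no snapshot
	messages      []string // outbound messages, for example vote requests
}

// toyApp records the order in which a Ready batch is handled.
type toyApp struct {
	steps []string
}

// handleReady persists the snapshot before sending any message,
// then applies the snapshot, and finally acknowledges the batch.
func (a *toyApp) handleReady(rd toyReady) {
	if rd.snapshotIndex != 0 {
		a.steps = append(a.steps, fmt.Sprintf("persist snapshot@%d", rd.snapshotIndex))
	}
	for _, m := range rd.messages {
		a.steps = append(a.steps, "send "+m)
	}
	if rd.snapshotIndex != 0 {
		a.steps = append(a.steps, fmt.Sprintf("apply snapshot@%d", rd.snapshotIndex))
	}
	a.steps = append(a.steps, "advance")
}

func main() {
	app := &toyApp{}
	app.handleReady(toyReady{snapshotIndex: 10, messages: []string{"MsgVote"}})
	for _, s := range app.steps {
		fmt.Println(s)
	}
}
```

Because the snapshot is persisted before any vote request leaves the node, a crash after sending still finds the snapshot on disk at restart.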

That said, I'm also OK with preventing it from campaigning. That should reduce corner cases.
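The more conservative option could look roughly like the guard below. This is a minimal sketch with hypothetical names (`toyNode`, `pendingSnapshot`, `promotable`, `campaign`); it is not presented as the actual change in this PR, only as one way to refuse a campaign while a snapshot is pending.

```go
package main

import "fmt"

// toyNode is a hypothetical, simplified raft node; the field names are illustrative.
type toyNode struct {
	isLearner       bool
	pendingSnapshot bool // a snapshot was received but has not been applied yet
	state           string
}

// promotable reports whether the node is allowed to start a campaign.
func (n *toyNode) promotable() bool {
	return !n.isLearner && !n.pendingSnapshot
}

// campaign moves the node to candidate unless the guard rejects it,
// and reports whether the campaign was started.
func (n *toyNode) campaign() bool {
	if !n.promotable() {
		return false
	}
	n.state = "candidate"
	return true
}

func main() {
	n := &toyNode{state: "follower", pendingSnapshot: true}
	fmt.Println(n.campaign(), n.state)

	// After the snapshot is applied, the same node may campaign.
	n.pendingSnapshot = false
	fmt.Println(n.campaign(), n.state)
}
```

With the guard in place, the node never reaches the code that loads unapplied entries while a snapshot is pending, so that corner case does not need separate handling.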

xiang90 · 2020-07-24T22:10:32Z

> Though I'm also OK to prevent it from campaigning. It should reduce corner cases.

I would suggest this if you agree.

BusyJay · 2020-07-25T05:37:34Z

Updated. Two affected test cases are also updated.

Signed-off-by: Jay Lee <BusyJayLee@gmail.com>

xiang90 · 2020-07-26T07:04:42Z

lgtm

56233: *: upgrade etcd/raft to master r=knz a=tbg

We were using a custom fork that we picked up during our vgo transition, but in the meantime upstream has turned `raft` into a small module which has very few deps, and which we can switch back to. Helpfully, this module already pins gogoproto v1.3+, which simplifies PR #56147.

I did review the commits we're picking up here in etcd/raft and they look good. The ones worth pointing out are:

etcd-io/etcd#12137

etcd-io/etcd#12134

etcd-io/etcd#12163

These all seem to fix real bugs.

Release note: None

Co-authored-by: Tobias Grieger <tobias.b.grieger@gmail.com>

BusyJay mentioned this pull request Jul 23, 2020

raft: check pending conf change before campaign #12134

Merged

BusyJay force-pushed the fix-load-log-error branch from 16eb485 to 06f3e04 Compare July 25, 2020 05:36

raft: don't campaign with pending snapshot

b4feaed

Signed-off-by: Jay Lee <BusyJayLee@gmail.com>

BusyJay force-pushed the fix-load-log-error branch from 06f3e04 to b4feaed Compare July 25, 2020 05:39

xiang90 merged commit 26b89fd into etcd-io:master Jul 26, 2020

BusyJay deleted the fix-load-log-error branch July 26, 2020 18:09

tbg mentioned this pull request Nov 4, 2020

*: upgrade etcd/raft to master cockroachdb/cockroach#56233

Merged

absolute8511 mentioned this pull request Jun 16, 2021

Some issues on the leader transfer youzan/ZanRedisDB#91

Closed

absolute8511 added a commit to youzan/ZanRedisDB that referenced this pull request Jun 18, 2021

merge fix from etcd-io/etcd#12163

eced5e5
