-
Notifications
You must be signed in to change notification settings - Fork 9.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
raft: fix panic when campaign with snapshot #12163
Conversation
is this a problem of the test case or it can also happen in real world case? if a node has not applied the most recent snapshot, shall we even allow it to start a campaign? |
The test case should advance the snapshot before sending out any messages.
If implementer follows the implicit rule that don't send any messages and tick until snapshot is advanced, then this can't happen in real world. Otherwise, the raft can still start campaign when there is a pending snapshot. In my opinion, allow campaign here is OK as the raft machine has latest configuration restored from snapshot. As long as there is no pending configuration in following committed entries, it should be safe for it to campaign. Before the request vote messages are sent, the snapshot should be written into disk according to the contract of handling ready, so it's safe even the raft crashes and restarts. Though I'm also OK to prevent it from campaigning. It should reduce corner cases. |
I would suggest this if you agree. |
16eb485
to
06f3e04
Compare
Updated. Two affected test cases are also updated. |
Signed-off-by: Jay Lee <BusyJayLee@gmail.com>
06f3e04
to
b4feaed
Compare
lgtm |
56233: *: upgrade etcd/raft to master r=knz a=tbg We were using a custom fork that we picked up during our vgo transition, but in the meantime upstream has turned `raft` into a small module which has very few deps, and which we can switch back to. Helpfully, this module already pins gogoproto v1.3+, which simplifies PR #56147. I did review the commits we're picking up here in etcd/raft and they look good. The ones worth pointing out are: etcd-io/etcd#12137 etcd-io/etcd#12134 etcd-io/etcd#12163 These all seem to fix real bugs. Release note: None Co-authored-by: Tobias Grieger <tobias.b.grieger@gmail.com>
If a node starts campaign without applying the snapshot data, loading entries can fail. This can happen when testing transfer leader after snapshot.
This commit is extracted from #12134. Should also fix the issue reported in comment.
@xiang90 PTAL