Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Raft learner may fail to start in some condition #9498

Closed
absolute8511 opened this issue Mar 28, 2018 · 9 comments
Closed

Raft learner may fail to start in some condition #9498

absolute8511 opened this issue Mar 28, 2018 · 9 comments

Comments

@absolute8511
Copy link
Contributor

absolute8511 commented Mar 28, 2018

Raft learner may fail to start if the learner is added before starting and then a snapshot is sent to the learner while starting.

https://github.com/coreos/etcd/blob/f5c56401d74401f040258afeb5fbdd875fafaf69/raft/raft.go#L1273

Also, currently there is no way to start a learner node since the Config.learners in raft Config cannot be set outside the raft package.

So if the raft node is starting fresh and there is a snapshot which including this node as learner, we
will fail to start this node.

Maybe we need to add an interface to allow start new learner raft directly?

@siddontang

@siddontang
Copy link
Contributor

Hi @absolute8511

What I discussed with @xiang90 before about learner is that just supporting in Raft library and let TiKV use it, I didn't consider using this in etcd.

Maybe we can add a learner flag to start etcd if we know we will add this new node as a learner.

@absolute8511
Copy link
Contributor Author

absolute8511 commented Mar 29, 2018

I think any distributed system using this raft library need to handle this situation. But currently, the raft library has no interface to init raft node with learner state from outside. Is there any workaround solution in TiKV?

@siddontang
Copy link
Contributor

/cc @hicqu

@siddontang
Copy link
Contributor

In raft-rs, we can restore the snapshot to the node if the progress has nothing even the node is not the learner, see https://github.com/pingcap/raft-rs/blob/master/src/raft.rs#L1776.

But I can't remember when to add it, maybe we should add this in etcd too.

@gyuho
Copy link
Contributor

gyuho commented Mar 30, 2018

c.f. #9161

@hicqu
Copy link
Contributor

hicqu commented Mar 30, 2018

In TiKV, the flag is_learner in Raft has no meaning if the progress set is empty, which is means the node is new created by ConfChange. This is a little trick, but it works fine.

@absolute8511
Copy link
Contributor Author

@hicqu the solution in TiKV is fine. Maybe we need something like that in etcd raft library.

@hicqu
Copy link
Contributor

hicqu commented Apr 4, 2018

Sorry for get back late. I will create a little PR to fix it later. :)

@stale
Copy link

stale bot commented Apr 7, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Apr 7, 2020
@stale stale bot closed this as completed Apr 28, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
4 participants