Task list: support raft learner in etcd. #10537

jingyih · 2019-03-12T23:39:03Z

Background

Design doc for raft learner (non-voting member)
Original issue Support non-voting members in etcd server #9161

Open questions

Task List
The list is subject to change.

Phase I

Create new feature branch - The feature branch will be merged back to master when development is done. We will not squash commits, so that each contributor's commits stay attributed.
Make API changes and regenerate Go bindings.
- Add isLearner flag to MemberAdd API request.
- Add MemberPromote API.
- Add isLearner field to Status maintenance API response.
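
To illustrate how the three API changes above fit together, here is a minimal in-memory sketch in Go. It is not the etcd implementation: the `Member` and `Cluster` types, the error values, and the method signatures are hypothetical stand-ins chosen for illustration. It only models the intended state change, where a member added with the learner flag stays non-voting until it is promoted.

```go
package main

import (
	"errors"
	"fmt"
)

// Member is a hypothetical, simplified view of a cluster member.
type Member struct {
	ID        uint64
	PeerURL   string
	IsLearner bool // mirrors the proposed isLearner flag
}

// Cluster is an in-memory stand-in for the membership store (assumption).
type Cluster struct {
	nextID  uint64
	members map[uint64]*Member
}

// Hypothetical error values used only by this sketch.
var (
	ErrNotFound   = errors.New("member not found")
	ErrNotLearner = errors.New("member is not a learner")
)

// NewCluster returns an empty cluster whose first member gets ID 1.
func NewCluster() *Cluster {
	return &Cluster{nextID: 1, members: make(map[uint64]*Member)}
}

// MemberAdd models a MemberAdd request carrying an isLearner flag.
func (c *Cluster) MemberAdd(peerURL string, isLearner bool) *Member {
	m := &Member{ID: c.nextID, PeerURL: peerURL, IsLearner: isLearner}
	c.members[m.ID] = m
	c.nextID++
	return m
}

// MemberPromote models the proposed MemberPromote call: learner to voter.
func (c *Cluster) MemberPromote(id uint64) error {
	m, ok := c.members[id]
	if !ok {
		return ErrNotFound
	}
	if !m.IsLearner {
		return ErrNotLearner
	}
	m.IsLearner = false
	return nil
}

func main() {
	c := NewCluster()
	voter := c.MemberAdd("http://10.0.0.1:2380", false)
	learner := c.MemberAdd("http://10.0.0.2:2380", true)
	fmt.Println("learner flag before promote:", learner.IsLearner)

	err := c.MemberPromote(learner.ID)
	fmt.Println("promote learner:", err)
	fmt.Println("learner flag after promote:", learner.IsLearner)

	// Promoting a member that is already a voter is rejected.
	err = c.MemberPromote(voter.ID)
	fmt.Println("promote voter:", err)
}
```

A Status-style response would simply report the `IsLearner` value of the member that serves the request.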

Phase II
Multiple developers can work in parallel on these tasks.

Phase III
Multiple developers can work in parallel on these tasks.

Future Work
These items will likely not be included in v3.4 release.

Add auto-promote, but put behind a feature flag?
Throttle the process of a learner catching up with the leader.

Pull Requests
Incremental PRs opened against feature branch.
Aggregated PR of feature branch against master branch.

jingyih · 2019-03-12T23:39:29Z

/cc @WIZARD-CXY @xiang90 @gyuho @jpbetz

jingyih · 2019-03-12T23:41:01Z

Thanks @jpbetz for helping with the task list.

@WIZARD-CXY and @jingyih will be working on this. Please let us know if you want to help with some of the tasks.

WIZARD-CXY · 2019-03-13T04:33:34Z

LGTM, the task is clear. I can help in phase II and III.

jingyih · 2019-03-14T23:14:07Z

@WIZARD-CXY

I created the feature branch here:
https://github.com/jingyih/etcd/tree/learner

For our development work, we can create pull requests against jingyih/etcd/learner. In the end, we will merge all the changes from jingyih/etcd/learner into etcd-io/etcd/master. Since we will not squash commits, all the commits in the feature branch will appear in the master branch. Here is an example: #9860

Please let me know if you have any questions.

jpbetz · 2019-03-14T23:38:47Z

Thanks Jingyi! We used the same feature branch approach for the clientv3 balancer improvements last summer. Author attribution of all commits is retained, so it works out really nicely.

WIZARD-CXY · 2019-03-15T01:39:26Z

Roger that

WIZARD-CXY · 2019-03-21T08:29:43Z

@jingyih Maybe add "modify existing code to work with the learner implementation" to Phase II.
I found that the TransferLeadership function may transfer leadership to a learner node, which is unexpected.

jingyih · 2019-03-21T17:45:35Z

@WIZARD-CXY Thanks for pointing out:) Added. I changed the wording to be more specific: "Learner reject leadership transfer."

WIZARD-CXY · 2019-03-22T09:45:02Z

@jingyih Maybe "exclude the learner when transferring leadership". I think it is more accurate in this case.

jingyih · 2019-03-22T21:10:32Z

> @jingyih Maybe "exclude the learner when transferring leadership". I think it is more accurate in this case.

Sounds good.
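
The agreed wording, excluding learners when transferring leadership, can be sketched as a simple filter over the member list. This is an illustrative sketch only: the `member` type and the `transferCandidates` helper are hypothetical names and do not correspond to functions in the etcd code base.

```go
package main

import (
	"fmt"
	"sort"
)

// member is a hypothetical, simplified membership record.
type member struct {
	ID        uint64
	IsLearner bool
}

// transferCandidates returns the IDs of members that may receive leadership:
// every voting member except the current leader. Learners are skipped because
// a non-voting member must never become leader.
func transferCandidates(members []member, leaderID uint64) []uint64 {
	var ids []uint64
	for _, m := range members {
		if m.IsLearner || m.ID == leaderID {
			continue
		}
		ids = append(ids, m.ID)
	}
	// Sort for a deterministic result regardless of input order.
	sort.Slice(ids, func(i, j int) bool { return ids[i] < ids[j] })
	return ids
}

func main() {
	members := []member{{ID: 1}, {ID: 2}, {ID: 3, IsLearner: true}}
	// Member 1 is the leader and member 3 is a learner, so only 2 remains.
	fmt.Println(transferCandidates(members, 1)) // prints [2]
}
```

If the resulting list is empty, a caller would have no valid transferee and would presumably skip the transfer.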

WIZARD-CXY · 2019-04-05T04:02:18Z

@purpleidea https://etcd.readthedocs.io/en/latest/server-learner.html

xiang90 · 2019-04-05T04:06:37Z

> For anyone not working "face to face" internally on etcd, it's not obvious what this is. It was mentioned in the etcd community meeting, but I think everyone who attended was a paid RH employee. Thanks!

This feature is currently driven by @jingyih from Google and @WIZARD-CXY from Alibaba. So it is a joint effort from the community :P. If you are interested, we definitely want your help.

purpleidea · 2019-04-05T04:13:33Z

@WIZARD-CXY Thanks for the links!

@xiang90 I didn't realize it wasn't just RH people. Will remove my comment, thanks for the info!

purpleidea · 2019-04-05T04:44:57Z

I just read

https://etcd.readthedocs.io/en/latest/server-learner.html

I have some thoughts about this, based on my experience building etcd clusters automatically in https://github.com/purpleidea/mgmt/. If these insights are useful, I am happy to share. I don't have a lot of time to contribute new code for this particular implementation at the moment.

Some background:

I first published this in 2016: https://purpleidea.com/blog/2016/06/20/automatic-clustering-in-mgmt/
I re-wrote it from etcd V2 to V3.
My golang skills are now improved significantly.

I am about one week away from releasing a re-write of this code to remove all the cruft caused by my lack of knowledge in golang.

Some thoughts:

Instead of doing member add as a follower, wouldn't it be better for the new prospective member to connect as a client, and do some sort of "sync" to pull most of the data down, and then re-join as a member? This means existing cluster automation tools don't need to change much, they just need to add a single etcd client sync which is optional if they want the initial log data. It also means we DON'T need to break API for this.
Currently there's a difficult race:
Suppose you have an N member cluster (perhaps N is 1 and hostname is A). You decide to add member B. You run member add API from A, and also member B starts up server. If there is any problem with startup, member A is in a borked state. What can you do to unbork? Raft log is blocked. A way to force remove the pending member would be useful.
If you did this operation with a "raft learner" the server is already started but not voting, so doing the switchover from non-voting server to voting server might be a more difficult problem to deal with, since it's already part of the cluster specification!
I think the best way to model all this is to add MemberAPI operations into etcd.Op so that they can happen in a Txn. If this is something that intrigues anyone, then I can write up a proposal. Related proposals are "The Txn Cmp section should support AND and OR and NOT" (#10571) and "[feat] Watches/events on the member API should be possible" (#5277).

Thanks for reading. If this is not helpful, please be blunt and tell me, and I won't make more noise.

jingyih · 2019-04-05T19:54:41Z

This task list was meant to be used for tracking the implementation progress. But I just added some context in the beginning. Sorry for the confusion.

WIZARD-CXY · 2019-05-15T05:09:17Z

@jingyih It seems we are at the end of Phase III. What else do I need to do? From your task list, three things are left:

Add metrics.
integration test: partitioned learner
e2e test: etcdctl member add --learner, etcdctl endpoint status, etcdctl member list

WIZARD-CXY · 2019-05-15T05:10:47Z

How about I work on the "Add metrics" task?

jingyih · 2019-05-15T05:23:53Z

@WIZARD-CXY I am thinking about dropping "integration test: partitioned learner" and the e2e tests. We already added an e2e test for etcdctl member add --learner, and I do not think we need e2e tests for etcdctl endpoint status and etcdctl member list. So the last item is probably "Add metrics". Maybe first come up with a list of metrics that we want to add?

jingyih · 2019-05-15T05:28:13Z

We might want to add:

etcd_server_is_learner

Ref: etcd/docs/metrics/latest, line 526 in 919b93b:

# name: "etcd_server_is_leader"

We might also want some debugging metrics to count the total, failed, and successful learner promotions.
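
As a rough illustration of the metrics discussed above, the following Go sketch keeps an `etcd_server_is_learner` gauge plus promotion counters and renders them in a Prometheus-style text form. Only the name `etcd_server_is_learner` comes from this thread; the counter names, the `learnerMetrics` type, and `errNotReady` are assumptions made for the example. A real implementation would use the Prometheus client library rather than hand-rolled counters.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// errNotReady is a hypothetical promotion failure used for the demo.
var errNotReady = errors.New("learner is not ready to be promoted")

// learnerMetrics holds the values to export. It is not safe for concurrent
// use; it exists only to show which values would be tracked.
type learnerMetrics struct {
	isLearner        bool
	promoteSucceeded uint64
	promoteFailed    uint64
}

// recordPromotion counts one promotion attempt by its outcome.
func (m *learnerMetrics) recordPromotion(err error) {
	if err != nil {
		m.promoteFailed++
		return
	}
	m.promoteSucceeded++
}

// render formats the values as "name value" lines. The counter names are
// hypothetical; only etcd_server_is_learner was proposed in the thread.
func (m *learnerMetrics) render() string {
	gauge := 0
	if m.isLearner {
		gauge = 1
	}
	var b strings.Builder
	fmt.Fprintf(&b, "etcd_server_is_learner %d\n", gauge)
	fmt.Fprintf(&b, "learner_promote_succeeded_total %d\n", m.promoteSucceeded)
	fmt.Fprintf(&b, "learner_promote_failed_total %d\n", m.promoteFailed)
	return b.String()
}

func main() {
	m := &learnerMetrics{isLearner: true}
	m.recordPromotion(errNotReady) // a failed attempt
	m.recordPromotion(nil)         // a successful attempt
	m.isLearner = false            // the member is now a voter
	fmt.Print(m.render())
}
```

The total number of promotion attempts is the sum of the two counters, so it does not need its own series in this sketch.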

WIZARD-CXY · 2019-05-15T05:52:30Z

Will do. I will submit a PR collecting the metrics.

jingyih · 2019-05-16T02:09:33Z

> Will do. I will submit a PR collecting the metrics.

Thanks. When you are ready, please create a PR against etcd-io/master. Let's stop using the learner feature branch.

WIZARD-CXY · 2019-05-16T03:06:02Z

ok

fabriziopandini · 2019-06-07T16:41:29Z

@jingyih Is it possible for an etcd member to answer requests from clients while it is still "learning"?
We are experiencing something that might be related in Kubernetes; see kubernetes-sigs/kind#588 (comment) ...

jingyih · 2019-06-07T20:28:31Z

@fabriziopandini Are you testing with etcd binary built from the master branch? The learner feature is not released yet.

Learner does accept certain types of requests in the current implementation. I can provide more info if needed. Just want to first make sure if this is related to the issue you are facing.

fabriziopandini · 2019-06-10T06:05:43Z

@jingyih thanks for the quick answer. My use case is the following:
I'm a kubeadm contributor and I'm investigating an issue in kind, where control-plane nodes are joined aggressively (in parallel) to a K8s cluster.

Joining control-plane nodes is a new kubeadm feature that creates a new etcd member on the joining node and then calls AddMember of the etcd V3 API (currently on etcd 3.1.10).

Now, the problem we are experiencing is that etcd / the API server starts to answer before the new etcd member is aligned with the rest of the nodes, giving false answers (after a few seconds everything starts to behave as expected). From my understanding, what we are experiencing is etcd answering while still learning.

What I'm looking for is:

possibly a confirmation that the above analysis makes sense (or at least that this is a possible condition)
any suggestions about how to prevent this; the "learner" API is really promising, but I understand it is still WIP and I'm open to any suggestion

Thank you in advance for your help

jingyih · 2019-06-10T07:14:19Z

Please follow the instructions [1] on how to add a new etcd member to a cluster. The AddMember API should be called before starting the new etcd member; otherwise the existing cluster will not be able to recognize and verify the new member.

[1] https://github.com/etcd-io/etcd/blob/master/Documentation/op-guide/runtime-configuration.md#add-a-new-member

It looks as though the etcd team was planning to implement "auto promote" (see etcd-io#10537). This commit attempts to lay the groundwork for auto-promoting learners to voters by adding an --auto-promote flag to the add member command. One reason for having the ability to automatically promote learners to voters is that it would enable implementing the Raft paper's recommendation to handle reconfiguration changes by first adding new members as non-voters until they have caught up with the leader.
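
The "caught up with the leader" condition behind auto-promotion could be expressed as a lag check on log indexes. The sketch below is an assumption about how such a check might look, not the logic from the referenced commit or from etcd: `readyToPromote` and its `maxLag` threshold are hypothetical names introduced here.

```go
package main

import "fmt"

// readyToPromote reports whether a learner's log is close enough to the
// leader's to be promoted. maxLag is a hypothetical threshold: the largest
// number of entries the learner may trail behind the leader.
func readyToPromote(leaderIndex, learnerIndex, maxLag uint64) bool {
	if learnerIndex > leaderIndex {
		// Inconsistent input (for example a stale view of the leader);
		// be conservative and do not promote.
		return false
	}
	return leaderIndex-learnerIndex <= maxLag
}

func main() {
	fmt.Println(readyToPromote(100, 95, 10)) // lag 5, within threshold
	fmt.Println(readyToPromote(100, 50, 10)) // lag 50, too far behind
}
```

An auto-promote loop would poll this condition and call the promote API once it holds; a manual workflow would run the same check before an operator promotes the learner.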

arjunsingri · 2019-08-31T23:28:49Z

How many learner nodes can an etcd cluster support? Does it make sense to replace watches with learner nodes? Can a learner remain a learner indefinitely?

jingyih · 2019-09-01T01:35:13Z

> How many learner nodes can an etcd cluster support?

Currently, in 3.4, the maximum number of learners is 1. This may increase in the future.

> Does it make sense to replace watches with learner nodes?

Currently, a learner does not support the watch API. To scale watches you might want to use: https://github.com/etcd-io/etcd/blob/master/Documentation/op-guide/grpc_proxy.md#scalable-watch-api

> Can a learner remain a learner indefinitely?

Yes. Currently, in 3.4, a learner will not be auto-promoted.
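
The one-learner limit mentioned for 3.4 amounts to a guard on the add-member path. Here is a minimal Go sketch of such a guard, using hypothetical names (`member`, `addMember`, `errTooManyLearners`) and an in-memory slice instead of the real membership store:

```go
package main

import (
	"errors"
	"fmt"
)

// maxLearners is the limit stated in this thread for etcd 3.4.
const maxLearners = 1

// errTooManyLearners is a hypothetical error value for this sketch.
var errTooManyLearners = errors.New("too many learner members in cluster")

// member is a hypothetical, simplified membership record.
type member struct {
	ID        uint64
	IsLearner bool
}

// countLearners returns how many members are currently learners.
func countLearners(members []member) int {
	n := 0
	for _, m := range members {
		if m.IsLearner {
			n++
		}
	}
	return n
}

// addMember appends m to the list, rejecting a new learner when the
// cluster already holds maxLearners learners. Voters are always accepted.
func addMember(members []member, m member) ([]member, error) {
	if m.IsLearner && countLearners(members) >= maxLearners {
		return members, errTooManyLearners
	}
	return append(members, m), nil
}

func main() {
	var members []member
	var err error

	members, err = addMember(members, member{ID: 1})
	fmt.Println(len(members), err)

	members, err = addMember(members, member{ID: 2, IsLearner: true})
	fmt.Println(len(members), err)

	// A second learner exceeds the limit and is rejected.
	members, err = addMember(members, member{ID: 3, IsLearner: true})
	fmt.Println(len(members), err)
}
```

Because a learner is never auto-promoted in 3.4, the slot stays occupied until the learner is promoted or removed explicitly.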

stale · 2020-04-06T21:55:55Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.

jingyih mentioned this issue Mar 13, 2019

Support non-voting members in etcd server #9161

Closed

jingyih changed the title ~~Task list: support raft leaner in etcd.~~ Task list: support raft learner in etcd. Mar 13, 2019

purpleidea mentioned this issue Apr 11, 2019

Start server fails with "error validating peerURLs" (race?) #10626

Closed

jingyih mentioned this issue Apr 15, 2019

*: support raft learner in etcd #10645

Closed

purpleidea mentioned this issue May 3, 2019

raft: improve the availability related to member change #7625

Closed

fabriziopandini mentioned this issue Jun 7, 2019

create HA cluster is flaky kubernetes-sigs/kind#588

Closed

maxenglander mentioned this issue Jul 11, 2019

etcdserver: add ability to auto-promote learners to voters #10887

Closed

Marlinc mentioned this issue Nov 12, 2019

Add replacement nodes as learner nodes cbws/etcd-operator#31

Open

stale bot added the stale label Apr 6, 2020

stale bot closed this as completed Apr 28, 2020
