raft: add learner #8605

Closed
wants to merge 7 commits into from

lishuai87
Contributor

Partly implements #8568: this only adds Nonvoter support.

For protobuf, maybe we can add a version for pb.ConfState.

@lishuai87
Contributor Author

lishuai87 commented Sep 27, 2017

@xiang90 @siddontang PTAL

@xiang90
Contributor

xiang90 commented Sep 28, 2017

Sorry for the delay. I will try to give this a review this week.

message ConfState {
repeated uint64 nodes = 1;
repeated uint64 nodes = 1;
repeated Server servers = 2;
Contributor

Can we make this simpler by just having the following?

repeated uint64 nodes = 1;
repeated uint64 learners = 2;

Contributor Author

@lishuai87 lishuai87 Sep 30, 2017

We may also have staging servers after the addVoter feature.

Contributor

I do not like the idea of a staging server. I would rather let the application itself promote the learner to a node explicitly.

Contributor

Basically, the less the raft layer needs to care about, the better.

@xiang90
Contributor

xiang90 commented Sep 29, 2017

Can we rename Nonvoter to Learners?

Also, I would like to simplify the code path and minimize code changes by having an explicit list of learners instead of a mixed list of Servers (which can be peers or non-voters).

@lishuai87 lishuai87 changed the title raft: add non-voter raft: add learner Sep 30, 2017
@xiang90
Contributor

xiang90 commented Sep 30, 2017

The learner should always stay in the follower state and never try to promote itself or vote, right? Where do we prevent that from happening?

@lishuai87
Contributor Author

> the learner should always stay as follower state, and never try to promote or vote, right? where do we prevent that from happening?

promotable() always returns false for a learner:

func (r *raft) promotable() bool {
	_, ok := r.prs[r.id]
	return ok && !r.isLearner
}

A learner should be able to vote, because if we addNode a learner, the voters may think that node is a voter while the node itself still thinks it is a learner.
For example:

  1. The cluster has [A, B, C], and A is the leader.
  2. addLearner D. The majority is still 2.
  3. addNode D. A, B, and C all apply the conf change, but D doesn't receive the appended log entry because of log lag or network jitter.
  4. A goes down. B and C are voters and think D is a voter too, so the majority is 3, but D thinks it is still a learner.
    If a learner can't vote, the cluster hangs until A recovers. If a learner can vote, the cluster recovers immediately.
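The vote arithmetic in step 4 can be checked with a small Go sketch. It is a simplification under stated assumptions: `electionWins` and the `canVote` map are hypothetical, and real raft vote granting also depends on terms and log up-to-dateness, which are ignored here.

```go
package main

import "fmt"

// quorum returns the votes needed for a majority of n voters,
// mirroring the voterCount()/2 + 1 formula used in this PR.
func quorum(voters int) int { return voters/2 + 1 }

// electionWins reports whether a candidate collecting votes from the nodes in
// alive reaches a quorum of the voter set it sees. canVote is a hypothetical
// per-node flag: a node that still believes it is a learner refuses to vote.
func electionWins(voters []string, alive []string, canVote map[string]bool) bool {
	isVoter := make(map[string]bool, len(voters))
	for _, v := range voters {
		isVoter[v] = true
	}
	votes := 0
	for _, n := range alive {
		if isVoter[n] && canVote[n] {
			votes++
		}
	}
	return votes >= quorum(len(voters))
}

func main() {
	// Step 4: B and C applied "addNode D", so they see four voters; A is down.
	voters := []string{"A", "B", "C", "D"}
	alive := []string{"B", "C", "D"}

	// D has not applied the change and refuses to vote: 2 votes < quorum 3.
	fmt.Println(electionWins(voters, alive, map[string]bool{"B": true, "C": true, "D": false})) // false
	// Learners may vote, so D's vote counts: 3 votes >= quorum 3.
	fmt.Println(electionWins(voters, alive, map[string]bool{"B": true, "C": true, "D": true})) // true
}
```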

@xiang90
Contributor

xiang90 commented Oct 3, 2017

> learner should be able to vote. because if we addNode a learner, the voters may think that node is voter, but that node may think its still learner.

The same thing happens in the normal add-node case. There are two motivations for adding learners:

  1. scale up read requests when consistency can be relaxed
  2. prevent a slow member from disrupting the cluster (only promote it to a normal member once it can catch up)

What you are trying to do is to enhance 2. I would like to know why 2 itself is not strong enough.

Do you have a specific use case or failure mode in mind?

@lishuai87
Contributor Author

In our case, we have 3 nodes in one raft group. If we want to replace a node, we add the new node first and then remove the old one.
But if a node is down, the replace operation can't proceed as usual: if we add the new node first, the majority becomes 3, and client requests will hang or time out until the new node catches up.
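The commit arithmetic behind this replace scenario, as a minimal Go sketch. `canCommit` is a hypothetical helper; it assumes only voters count toward the quorum and that a node still catching up cannot acknowledge new entries.

```go
package main

import "fmt"

// quorum mirrors the voterCount()/2 + 1 majority formula from this PR.
func quorum(voters int) int { return voters/2 + 1 }

// canCommit reports whether a new entry can commit when only the nodes in
// acking (alive and caught up) acknowledge it. Learners are simply not in
// voters, so they never count toward the quorum.
func canCommit(voters []string, acking map[string]bool) bool {
	acks := 0
	for _, v := range voters {
		if acking[v] {
			acks++
		}
	}
	return acks >= quorum(len(voters))
}

func main() {
	// A is down, B and C are healthy, and the new node D is still catching up.
	acking := map[string]bool{"B": true, "C": true}

	// Adding D directly as a voter: 4 voters, quorum 3, only 2 acks.
	fmt.Println(canCommit([]string{"A", "B", "C", "D"}, acking)) // false
	// Adding D as a learner first: still 3 voters, quorum 2.
	fmt.Println(canCommit([]string{"A", "B", "C"}, acking)) // true
}
```

Adding the new node as a learner keeps it out of the voter list until it has caught up, so the two healthy voters can keep committing in the meantime.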

@siddontang
Contributor

Assume we have 3 nodes, A, B and C, and want to replace A with D, so now we can:

  1. Add learner D
  2. Remove A
  3. Add D

The corner case here is that after step 2 we have only B and C, and if either of them fails, the cluster can't work.

If learner D can vote, this is not a problem, but then there is another corner case: after step 1 we have A, B, C, and D, with A and D on the same node, so if that node crashes, the cluster can't work either.

We have the same problem when we do:

  1. Add learner D
  2. Add D
  3. Remove A

I don't think the learner mechanism can fix this corner case; maybe only Raft Membership Change can. So I suggest keeping things simple and not letting learners vote.
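A minimal Go sketch for checking these corner cases: it asks whether a voter set still has a quorum after any single machine fails. The `host` mapping and `survivesAnyHostFailure` are hypothetical, assuming A and D share a machine as described.

```go
package main

import "fmt"

func quorum(voters int) int { return voters/2 + 1 }

// survivesAnyHostFailure reports whether the voter set keeps a quorum after
// any one host crashes. host maps each raft member to the machine it runs on.
func survivesAnyHostFailure(voters []string, host map[string]string) bool {
	hosts := map[string]bool{}
	for _, v := range voters {
		hosts[host[v]] = true
	}
	for h := range hosts {
		alive := 0
		for _, v := range voters {
			if host[v] != h {
				alive++
			}
		}
		if alive < quorum(len(voters)) {
			return false
		}
	}
	return true
}

func main() {
	host := map[string]string{"A": "m1", "B": "m2", "C": "m3", "D": "m1"}

	// After "Remove A" but before "Add D": losing B or C breaks quorum.
	fmt.Println(survivesAnyHostFailure([]string{"B", "C"}, host)) // false
	// D votes while A still exists: losing m1 removes 2 of 4 voters.
	fmt.Println(survivesAnyHostFailure([]string{"A", "B", "C", "D"}, host)) // false
	// The original 3-voter cluster tolerates any single machine failure.
	fmt.Println(survivesAnyHostFailure([]string{"A", "B", "C"}, host)) // true
}
```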

@xiang90
Contributor

xiang90 commented Oct 9, 2017

I would say keep it as simple as possible for now. If the corner case does happen in reality, or if we have a more concrete real-world use case and failure case, we can improve it.

@xiang90
Contributor

xiang90 commented Oct 9, 2017

Just to be clear: by adding the learner concept, we already greatly improve the reliability of the membership change process. We will only promote a learner to a member if it can catch up.
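A sketch of what "only promote a learner if it can catch up" might look like on the application side. `readyToPromote` and the `maxLag` threshold are hypothetical; the learner's match index could come, for example, from the `Match` field of its progress as reported by the leader.

```go
package main

import "fmt"

// readyToPromote is a hypothetical application-side check: promote a learner
// to a voter only once its match index is within maxLag entries of the
// leader's last index. maxLag is an application-chosen threshold.
func readyToPromote(learnerMatch, leaderLastIndex, maxLag uint64) bool {
	if learnerMatch >= leaderLastIndex {
		return true
	}
	return leaderLastIndex-learnerMatch <= maxLag
}

func main() {
	fmt.Println(readyToPromote(990, 1000, 100)) // true: 10 entries behind
	fmt.Println(readyToPromote(500, 1000, 100)) // false: 500 entries behind
}
```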

mis := make(uint64Slice, 0, len(r.prs))
for id := range r.prs {
mis = append(mis, r.prs[id].Match)
mis := make(uint64Slice, 0, r.voterCount())
Contributor

It seems we can still use len(r.prs) here to avoid the extra learner check in voterCount.
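A minimal sketch of computing the commit index from voters only, using a trimmed-down hypothetical `progress` type rather than raft's real `Progress`. Learners are skipped when collecting `Match` values, and the slice capacity uses `len(prs)` as suggested above.

```go
package main

import (
	"fmt"
	"sort"
)

// progress is a hypothetical stand-in for raft's Progress.
type progress struct {
	Match     uint64
	IsLearner bool
}

// committedIndex returns the highest index replicated on a quorum of voters.
// Learners are skipped, so their Match never advances the commit index.
// Capacity len(prs) may over-allocate when learners exist, but avoids a
// separate voter-counting pass.
func committedIndex(prs map[uint64]*progress) uint64 {
	mis := make([]uint64, 0, len(prs))
	for _, pr := range prs {
		if pr.IsLearner {
			continue
		}
		mis = append(mis, pr.Match)
	}
	if len(mis) == 0 {
		return 0
	}
	// Sort descending; the quorum-th largest Match is on a majority of voters.
	sort.Slice(mis, func(i, j int) bool { return mis[i] > mis[j] })
	quorum := len(mis)/2 + 1
	return mis[quorum-1]
}

func main() {
	prs := map[uint64]*progress{
		1: {Match: 10},
		2: {Match: 4},
		3: {Match: 3},
		4: {Match: 9, IsLearner: true}, // ignored for commit purposes
	}
	fmt.Println(committedIndex(prs)) // 4
}
```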

@siddontang
Contributor

PTAL @xiang90

@xiang90
Contributor

xiang90 commented Oct 11, 2017

@siddontang I would like to take another look after @javaforfun disables voting for learners.

@siddontang
Contributor

@lishuai87
Contributor Author

@xiang90 If learners can't vote, how do we prevent this problem? #8605 (comment)

@xiang90
Contributor

xiang90 commented Oct 12, 2017

@javaforfun We are not going to solve the problem you brought up for now. We should focus on solving:

  1. scale up read requests when consistency can be relaxed
  2. prevent a slow member from disrupting the cluster (only promote it to a normal member once it can catch up)

@siddontang
Contributor

I agree with @xiang90: we don't need a 100% solution for ConfChange here.

@siddontang
Contributor

PTAL @xiang90

raft/raft.go Outdated
@@ -116,6 +116,8 @@ type Config struct {
// used for testing right now.
peers []uint64

learners []uint64
Contributor

doc string?

raft/raft.go Outdated
peers := c.peers
if len(cs.Nodes) > 0 {
if len(peers) > 0 {
voters := c.peers
Contributor

Use peers instead of voters. I do not want to introduce a new term for now.

raft/raft.go Outdated
for _, p := range peers {
r.prs[p] = &Progress{Next: 1, ins: newInflights(r.maxInflight)}
for _, n := range voters {
r.prs[n] = &Progress{Next: 1, ins: newInflights(r.maxInflight), isLearner: false}
Contributor

Revert this change. A bool is false by default.

}

enum ConfChangeType {
ConfChangeAddNode = 0;
ConfChangeRemoveNode = 1;
ConfChangeUpdateNode = 2;
ConfChangeAddLearner = 3;
Contributor

ConfChangeAddLearnerNode

raft/raft.go Outdated
if _, has := r.prs[n]; has {
panic(fmt.Sprintf("cannot specify both Voter and Learner for node: %x", n))
}
r.prs[n] = &Progress{Next: 1, ins: newInflights(r.maxInflight), isLearner: true}
Contributor

Maybe it is easier to have a separate prs map for learners.

return true
}

func (r *raft) restoreNode(nodes []uint64, isLearner bool) {
Contributor

Perhaps restoreNode can take both nodes and learnerNodes as args.

Contributor

+1
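A sketch of the suggested shape, where one function receives both the voter and the learner lists, using hypothetical names (`restoreNodes`, `progress`). It keeps the duplicate check from the diff: a node listed as both a voter and a learner panics.

```go
package main

import "fmt"

// progress is a hypothetical stand-in for raft's Progress.
type progress struct {
	Next      uint64
	IsLearner bool
}

// restoreNodes builds the progress map from separate voter and learner lists.
// A node present in both lists is rejected, like the panic in the diff.
func restoreNodes(nodes, learners []uint64) map[uint64]*progress {
	prs := make(map[uint64]*progress, len(nodes)+len(learners))
	for _, n := range nodes {
		prs[n] = &progress{Next: 1}
	}
	for _, n := range learners {
		if _, ok := prs[n]; ok {
			panic(fmt.Sprintf("cannot specify both Voter and Learner for node: %x", n))
		}
		prs[n] = &progress{Next: 1, IsLearner: true}
	}
	return prs
}

func main() {
	prs := restoreNodes([]uint64{1, 2, 3}, []uint64{4})
	fmt.Println(len(prs), prs[4].IsLearner) // 4 true
}
```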

raft/raft.go Outdated
}

// promotable indicates whether state machine can be promoted to leader,
// which is true when its own id is in progress list.
// which is true when its own id is in progress list and its not learner.
Contributor

it is not a learner

raft/raft.go Outdated
}

// promotable indicates whether state machine can be promoted to leader,
// which is true when its own id is in progress list.
// which is true when its own id is in progress list and its not learner.
func (r *raft) promotable() bool {
_, ok := r.prs[r.id]
Contributor

if r.isLearner {
	return false
}
...

is cleaner
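A runnable version of the early-return form, with a hypothetical `raftNode` type standing in for `raft` and carrying only the fields `promotable` reads.

```go
package main

import "fmt"

// raftNode is a hypothetical stand-in with just the fields promotable needs.
type raftNode struct {
	id        uint64
	isLearner bool
	prs       map[uint64]struct{}
}

// promotable indicates whether this node can be promoted to leader: it must
// not be a learner, and its own id must be in the progress map.
func (r *raftNode) promotable() bool {
	if r.isLearner {
		return false
	}
	_, ok := r.prs[r.id]
	return ok
}

func main() {
	prs := map[uint64]struct{}{1: {}, 2: {}}
	fmt.Println((&raftNode{id: 1, prs: prs}).promotable())                  // true
	fmt.Println((&raftNode{id: 2, isLearner: true, prs: prs}).promotable()) // false
	fmt.Println((&raftNode{id: 3, prs: prs}).promotable())                  // false
}
```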

raft/raft.go Outdated
r.addLearnerNode(id, true)
}

func (r *raft) addLearnerNode(id uint64, isLearner bool) {
Contributor

Maybe call it addNodeOrLearnerNode.

Or perhaps just duplicate the code a little and remove this func.

return
}
if isLearner {
// can only change Learner to Voter
Contributor

log here.

raft/raft.go Outdated
}
if isLearner {
// can only change Learner to Voter
r.logger.Infof("%x ignore addLearner for %x [%s]", r.id, id, r.prs[id])
Contributor

need to log the reason why addLearner is ignored.
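A sketch of a combined add function under the name suggested in review, using hypothetical `raftNode` and `progress` types. It assumes the only allowed transition is learner to voter, as the diff's comment says, and logs why an addLearner is ignored. It uses the standard `log` package rather than raft's own logger.

```go
package main

import "log"

type progress struct {
	Next      uint64
	IsLearner bool
}

type raftNode struct {
	id  uint64
	prs map[uint64]*progress
}

// addNodeOrLearnerNode adds id as a voter or a learner. For an existing
// member the only allowed change is learner -> voter; re-adding a member as
// a learner is ignored and the reason is logged.
func (r *raftNode) addNodeOrLearnerNode(id uint64, isLearner bool) {
	pr, ok := r.prs[id]
	if !ok {
		r.prs[id] = &progress{Next: 1, IsLearner: isLearner}
		return
	}
	if isLearner {
		log.Printf("%x ignored addLearner for %x: already a member (learner=%v); only learner -> voter is allowed",
			r.id, id, pr.IsLearner)
		return
	}
	if pr.IsLearner {
		// Promote in place, keeping the existing replication progress.
		pr.IsLearner = false
		log.Printf("%x promoted learner %x to voter", r.id, id)
	}
	// Adding an existing voter as a voter is a no-op.
}

func main() {
	r := &raftNode{id: 1, prs: map[uint64]*progress{1: {Next: 1}}}
	r.addNodeOrLearnerNode(2, true)  // new learner
	r.addNodeOrLearnerNode(2, true)  // ignored, reason logged
	r.addNodeOrLearnerNode(2, false) // promoted to voter
	log.Printf("node 2 learner=%v", r.prs[2].IsLearner)
}
```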

return count
}

func (r *raft) quorum() int { return r.voterCount()/2 + 1 }
Contributor

This function is called frequently; maybe we can now cache the quorum in a variable.

Contributor

agreed
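A sketch of caching the quorum instead of recounting voters on every call, with a hypothetical `cluster` type. It assumes every membership change goes through `add` or `remove`, which refresh the cached value.

```go
package main

import "fmt"

// cluster is a hypothetical stand-in that caches the quorum and refreshes it
// on every membership change instead of recounting voters on each call.
type cluster struct {
	isLearner map[uint64]bool // member id -> whether it is a learner
	quorumN   int
}

func newCluster() *cluster { return &cluster{isLearner: map[uint64]bool{}} }

// recompute refreshes the cached quorum from the current voter count.
func (c *cluster) recompute() {
	voters := 0
	for _, learner := range c.isLearner {
		if !learner {
			voters++
		}
	}
	c.quorumN = voters/2 + 1
}

// add inserts or updates a member; re-adding a learner with
// isLearner=false promotes it to a voter.
func (c *cluster) add(id uint64, isLearner bool) {
	c.isLearner[id] = isLearner
	c.recompute()
}

func (c *cluster) remove(id uint64) {
	delete(c.isLearner, id)
	c.recompute()
}

// quorum is a plain field read on the hot path.
func (c *cluster) quorum() int { return c.quorumN }

func main() {
	c := newCluster()
	c.add(1, false)
	c.add(2, false)
	c.add(3, false)
	c.add(4, true)          // a learner does not change the quorum
	fmt.Println(c.quorum()) // 2
	c.add(4, false)         // promote 4 to voter
	fmt.Println(c.quorum()) // 3
	c.remove(1)
	fmt.Println(c.quorum()) // 2
}
```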

@@ -3278,7 +3388,7 @@ func entsWithConfig(configFunc func(*Config), terms ...uint64) *raft {
for i, term := range terms {
storage.Append([]pb.Entry{{Index: uint64(i + 1), Term: term}})
}
cfg := newTestConfig(1, []uint64{}, 5, 1, storage)
cfg := newTestConfig(1, []uint64{1, 2, 3, 4, 5}, 5, 1, storage)
Contributor

Why was this changed?

@@ -3293,7 +3403,7 @@ func entsWithConfig(configFunc func(*Config), terms ...uint64) *raft {
func votedWithConfig(configFunc func(*Config), vote, term uint64) *raft {
storage := NewMemoryStorage()
storage.SetHardState(pb.HardState{Vote: vote, Term: term})
cfg := newTestConfig(1, []uint64{}, 5, 1, storage)
cfg := newTestConfig(1, []uint64{1, 2, 3, 4, 5}, 5, 1, storage)
Contributor

Why was this changed?

@@ -76,13 +76,15 @@ message HardState {
}

message ConfState {
repeated uint64 nodes = 1;
repeated uint64 nodes = 1; // Voters
Contributor

Remove this comment. "Voter" is meaningless without any context; otherwise we need to explain exactly what nodes are.

@@ -76,13 +76,15 @@ message HardState {
}

message ConfState {
repeated uint64 nodes = 1;
repeated uint64 nodes = 1; // Voters
repeated uint64 learners = 2; // Nonvoters
Contributor

Same here. We need to explain what nonvoters are.

@lishuai87
Contributor Author

> maybe it is easier to have a separate prs map for learners.

We may need to handle prs before merging this PR, and also the prs in the Status struct: https://github.com/coreos/etcd/blob/master/raft/status.go#L30

type Status struct {
	ID uint64

	pb.HardState
	SoftState

	Applied  uint64
	Progress map[uint64]Progress

	LeadTransferee uint64
}

@siddontang
Contributor

siddontang commented Oct 20, 2017

> maybe it is easier to have a separate prs map for learners.

It seems this would make handling learners easier, but it may introduce complexity in other places, such as bcast.

/cc @xiang90

@xiang90
Contributor

xiang90 commented Oct 20, 2017

@siddontang

Right. Someone should at least give it a try to see if it is simpler.
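One way to try the separate-map idea, as a hypothetical sketch (`learnerPrs` and `forEachProgress` are illustrative names, not etcd API). Quorum math only looks at the voter map, and a small iteration helper keeps bcast-style loops from having to know about both maps. A real bcast would also skip the local node.

```go
package main

import (
	"fmt"
	"sort"
)

type progress struct{ Match, Next uint64 }

// raftNode keeps voters and learners in separate maps.
type raftNode struct {
	prs        map[uint64]*progress // voters only
	learnerPrs map[uint64]*progress // learners only (hypothetical field)
}

// forEachProgress visits every voter and then every learner, so code that
// replicates to all members does not need to walk both maps itself.
func (r *raftNode) forEachProgress(f func(id uint64, pr *progress)) {
	for id, pr := range r.prs {
		f(id, pr)
	}
	for id, pr := range r.learnerPrs {
		f(id, pr)
	}
}

// quorum needs no learner check because prs holds voters only.
func (r *raftNode) quorum() int { return len(r.prs)/2 + 1 }

func main() {
	r := &raftNode{
		prs:        map[uint64]*progress{1: {}, 2: {}, 3: {}},
		learnerPrs: map[uint64]*progress{4: {}},
	}
	var sentTo []uint64
	r.forEachProgress(func(id uint64, _ *progress) { sentTo = append(sentTo, id) })
	sort.Slice(sentTo, func(i, j int) bool { return sentTo[i] < sentTo[j] })
	fmt.Println(sentTo, r.quorum()) // [1 2 3 4] 2
}
```

The trade-off siddontang raised shows up directly: quorum logic gets simpler, while every loop over all members has to go through a helper like this or handle both maps.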

@xiang90
Contributor

xiang90 commented Nov 1, 2017

Closed in favor of #8751.

@xiang90 xiang90 closed this Nov 1, 2017
@lishuai87 lishuai87 deleted the raft-non-voter branch November 12, 2017 03:00

3 participants