Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add ForgetLeader #78

Merged
merged 3 commits into from
Jun 26, 2023
Merged

add ForgetLeader #78

merged 3 commits into from
Jun 26, 2023

Conversation

erikgrinaker
Copy link
Contributor

@erikgrinaker erikgrinaker commented Jun 18, 2023

This patch adds ForgetLeader(), which causes a follower to forget its current leader, changing it to None. It remains a leaderless follower in the current term, without campaigning.

This is particularly useful with PreVote+CheckQuorum, if the caller has strong reason to believe the leader is dead, since it will grant prevotes but also revert to follower if it hears from the leader. A quorum of such leaderless followers can thus allow a pre-candidate to hold an election if they believe the leader to be dead.

Signed-off-by: Erik Grinaker grinaker@cockroachlabs.com

cc @pavelkalinnikov @tbg

See cockroachdb/cockroach#105126 for a CockroachDB change that depends on this (there will be a couple more along the same lines).

Copy link
Collaborator

@tbg tbg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code looks good. @ahrtr could you also give this a look? This is adding a new method to the library. It's the result of lengthy experimentation in CRDB - we'd like to avoid waiting for election timeouts when the leader removes itself. This is the best option we've found - in CRDB, we have a designated "next leader" that we campaign when it sees the ConfChange that removes the next leader, but it won't be granted votes if the other replicas still remember the leader. So we call ForgetLeader on all followers when they see the ConfChange, which - at least often - allows the election to succeed.

testdata/forget_leader.txt Show resolved Hide resolved
testdata/forget_leader.txt Outdated Show resolved Hide resolved
INFO 2 [logterm: 2, index: 12] sent MsgVote request to 1 at term 3
INFO 2 [logterm: 2, index: 12] sent MsgVote request to 3 at term 3

stabilize
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

Might be worth adding a log-level=X as a per-command option, i.e.

stabilize log-level=none

but no worries if you don't want to fry that fish now

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, done.

@tbg tbg requested a review from ahrtr June 20, 2023 08:38
@erikgrinaker
Copy link
Contributor Author

we'd like to avoid waiting for election timeouts when the leader removes itself

This isn't really the motivating case -- we handle that by calling an immediate election on the leaseholder, bypassing prevote since we know there is no leader to disrupt (cockroachdb/cockroach#104969).

The motivating case is when we unquiesce a range to find a dead leader according to liveness gossip -- if a quorum of replicas consider the leader dead according to liveness we'd like to elect a new leader immediately rather than waiting out the election timeout. We only want to do this when a quorum agrees though, and without disrupting a current leader, since it's possible for individual replicas to have stale or incorrect views of liveness. ForgetLeader provides these properties.

@erikgrinaker erikgrinaker force-pushed the forget-leader branch 2 times, most recently from 21f3296 to 5cb9207 Compare June 20, 2023 10:53
Copy link
Contributor

@pav-kv pav-kv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

rafttest/interaction_env_handler_stabilize.go Show resolved Hide resolved
node.go Outdated Show resolved Hide resolved
rawnode.go Outdated Show resolved Hide resolved
rawnode.go Outdated Show resolved Hide resolved
Comment on lines 38 to 41
env.Output.Lvl = old
if level == "none" {
env.Output.Builder = &strings.Builder{}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you clarify why do you reset the env.Output.Builder when level == "none"? If we really need to do it, we should get the details encapsulated in (*InteractionEnv) LogLevel.

Suggested change
env.Output.Lvl = old
if level == "none" {
env.Output.Builder = &strings.Builder{}
}
_ = env.LogLevel(old)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is because we not only want to hide the log output, we also want to hide the test output. Consider this example:

stabilize
----
> 2 handling Ready
  Ready MustSync=false:
  Lead:0 State:StatePreCandidate
  Messages:
  2->1 MsgPreVote Term:2 Log:1/12
  2->3 MsgPreVote Term:2 Log:1/12
  INFO 2 received MsgPreVoteResp from 2 at term 1
  INFO 2 has received 1 MsgPreVoteResp votes and 0 vote rejections

If we only set env.LogLevel(none) then it will hide the lines starting with INFO, but still output the other lines which are not emitted by the logger but rather the test framework.

This is the same logic as with the log-level command, which happens here:

https://github.com/etcd-io/raft/blob/768abf8a1c58ebd9dd859c71b3fba6704400bc82/rafttest/interaction_env_handler.go#L201-L207

However, we can't rely on this logic, because we've already reset the log level by the time we return.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thx for the clarification.

Two options.

  1. apply my previous suggested change, and below change. The point is to encapsulate the details in just one place.
$ git diff
diff --git a/rafttest/interaction_env_handler_log_level.go b/rafttest/interaction_env_handler_log_level.go
index 2194c9e..09ff718 100644
--- a/rafttest/interaction_env_handler_log_level.go
+++ b/rafttest/interaction_env_handler_log_level.go
@@ -29,6 +29,10 @@ func (env *InteractionEnv) handleLogLevel(t *testing.T, d datadriven.TestData) e
 func (env *InteractionEnv) LogLevel(name string) error {
        for i, s := range lvlNames {
                if strings.EqualFold(s, name) {
+                       // If previous log-level is "none", we need to reset all cached log.
+                       if env.Output.Lvl == len(lvlNames)-1 {
+                               env.Output.Builder = &strings.Builder{}
+                       }
                        env.Output.Lvl = i
                        return nil
                }
  1. No any code change at all, just put a log-level none before the command which you want to turn off the log, and put a log-level xxx after it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went a bit further, and made the Env.Output.Write* methods respect the NONE log level. This seemed cleaner, and will only suppress output for the duration that Lvl was actually set to NONE.

One minor effect is that we no longer trivially know whether we suppressed any output, so we now always output ok with NONE instead of ok (quiet). That seems fine.

@erikgrinaker erikgrinaker force-pushed the forget-leader branch 4 times, most recently from fcb6b45 to f1f83d2 Compare June 23, 2023 08:53
Signed-off-by: Erik Grinaker <grinaker@cockroachlabs.com>
Signed-off-by: Erik Grinaker <grinaker@cockroachlabs.com>
node.go Outdated
Comment on lines 192 to 216
// ForgetLeader forgets a follower's current leader, changing it to None. It
// remains a leaderless follower in the current term, without campaigning.
//
// This is particularly useful with PreVote+CheckQuorum, if the caller has
// strong reason to believe the leader is dead, since it will grant prevotes
// but also revert to follower if it hears from the leader. A quorum of such
// leaderless followers can thus allow a pre-candidate to hold an election
// if they all believe the leader to be dead.
//
// Does nothing with ReadOnlyLeaseBased, since it would allow a new leader to
// be elected without the old leader knowing.
ForgetLeader(ctx context.Context) error
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we provide an example something like below to make it clearer?

For example, for a three member (e.g 1, 2, and 3) cluster, assuming the current
leader is 3. If we believe that the leader 3 is dead, then we can force member 2
to forget the leader, and make member 1 to campaign. Eventually member 1 will
be elected as a new leader by member 2 and itself. Note if we don't force 
member 2 to forget the leader beforehand, the the campaign from member 1
might be rejected by member 2 because it may think there is already a valid 
leader in current term.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How's this?

	// ForgetLeader forgets a follower's current leader, changing it to None. It
	// remains a leaderless follower in the current term, without campaigning.
	//
	// This is useful with PreVote+CheckQuorum, where followers will normally not
	// grant pre-votes if they've heard from the leader in the past election
	// timeout interval. Leaderless followers can grant pre-votes immediately, so
	// if a quorum of followers have strong reason to believe the leader is dead
	// (for example via a side-channel or external failure detector) and forget it
	// then they can elect a new leader immediately, without waiting out the
	// election timeout. They will also revert to normal followers if they hear
	// from the leader again, or transition to candidates on an election timeout.
	//
	// For example, consider a three-node cluster where 1 is the leader and 2+3
	// have just received a heartbeat from it. If 2 and 3 believe the leader has
	// now died (maybe they know that an orchestration system shut down 1's VM),
	// we can instruct 2 to forget the leader and 3 to campaign. 2 will then be
	// able to grant 3's pre-vote and elect 3 as leader immediately (normally 2
	// would reject the vote until an election timeout passes because it has heard
	// from the leader recently). However, 3 can not campaign unilaterally, a
	// quorum have to agree that the leader is dead, which avoids disrupting the
	// leader if individual nodes are wrong about it being dead.
	//
	// This does nothing with ReadOnlyLeaseBased, since it would allow a new
	// leader to be elected without the old leader knowing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me!

cc @tbg to take another look before we merge this PR later.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, appreciate the review!

@ahrtr
Copy link
Member

ahrtr commented Jun 23, 2023

Overall looks good to me, great work! The only minor concern is the users of the raft library might not be very clear when and how to use ForgetLeader. So suggest to add an example in the comment.

This patch adds `ForgetLeader()`, which causes a follower to forget its
current leader, changing it to None. It remains a leaderless follower in
the current term, without campaigning.

This is particularly useful with PreVote+CheckQuorum, if the caller has
strong reason to believe the leader is dead, since it will grant
prevotes but also revert to follower if it hears from the leader. A
quorum of such leaderless followers can thus allow a pre-candidate to
hold an election if they believe the leader to be dead.

Signed-off-by: Erik Grinaker <grinaker@cockroachlabs.com>
@ahrtr ahrtr merged commit a10cd45 into etcd-io:main Jun 26, 2023
7 checks passed
@ahrtr ahrtr added this to the v3.6.0 milestone Jun 26, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants