Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

*: configure Raft Pre-Vote to reduce disruptive rejoining servers #9352

Merged
merged 4 commits into from Mar 7, 2018

Conversation

@gyuho
Copy link
Member

commented Feb 23, 2018

Address #9333 and #8499.

/cc @wojtek-t @jpbetz

In simplest form, Raft leader steps down to follower when it receives a message with higher term. For instance, a flaky(or rejoining) member drops in and out, and starts campaign. This member will end up with a higher term, and ignore all incoming messages with lower term. In this case, a new leader eventually need to get elected, thus disruptive to cluster availability. Same happens with isolated member or slow incoming network. In etcd, isolated member with higher term would send pb.MsgAppResp, in response to pb.MsgHeartbeat from leader, and pb.MsgAppResp to leader forces leader to step down, and cluster becomes unavailable.

Raft implements Pre-Vote phase to prevent this kind of disruptions. If enabled, Raft runs an additional phase of election to check if pre-candidate can get enough votes to win an election. If not, it would remain as follower, and most likely, receive leader heartbeats to reset its elapsed election ticks.

Pre-Vote feature is recommended in deployments that would require higher availability, at the cost of more expensive election protocols. Thus, it's added as an experimental feature. Will be enabled by default in 3.5.


Added tests in Raft package, but not in etcd server layer, since CheckQuorum is true by default for etcd, hard to test this Raft logic in server-side.

// TestDisruptiveRejoinPreVote simulates isolated follower starts campaign
// to disrupt current leader, which can be prevented by Raft Pre-Vote.
func TestDisruptiveRejoinPreVote(t *testing.T) {
	defer testutil.AfterTest(t)

	clus := NewClusterV3(t, &ClusterConfig{
		Size:    3,
		PreVote: true,
	})
	defer clus.Terminate(t)

	lead1 := clus.WaitLeader(t)

	// simluate m[lead1+2] sending pre-vote requests to leader
	// current leader never steps down from timeouts
	clus.Members[lead1].s.Cfg.ElectionTicks *= 1000
	// m[lead1+1] never starts a campaign
	clus.Members[(lead1+1)%3].s.Cfg.ElectionTicks *= 1000

	// manually disable check quorum in m[lead1]

	// drop incoming messages to m[lead1+2]
	// m[lead1+2] will pre-vote request to m[lead1]
	clus.Members[lead1].s.PauseSending()
	clus.Members[(lead1+1)%3].s.PauseSending()

	// wait until m[lead1+2] becomes pre-candidate
	timeout := clus.Members[(lead1+2)%3].ElectionTimeout()
	time.Sleep(timeout + timeout/2)

	clus.Members[lead1].s.ResumeSending()
	clus.Members[(lead1+1)%3].s.ResumeSending()

	// If pre-vote were not enabled,
	// force leader m[lead1] to step down by sending
	//   1. MsgAppResp with higher term
	//   2. MsgVote with higher term

	lead2 := clus.WaitLeader(t)
	if lead1 != lead2 {
		t.Fatal("with pre-vote enabled, expect no leader disruption")
	}
}

We can add more tests around this once we refactor integration package with embedded etcd and proxy layer.

/cc @bdarnell @siddontang for Raft tests.

@gyuho gyuho added the Release-Note label Feb 23, 2018

@gyuho gyuho changed the title *: configure Rat Pre-Vote to reduce disruptive rejoining servers *: configure Raft Pre-Vote to reduce disruptive rejoining servers Feb 23, 2018

@gyuho gyuho force-pushed the gyuho:raft-pre-vote branch 2 times, most recently from 2ae5b2a to 08f3072 Feb 23, 2018

@gyuho gyuho added the area/raft label Feb 25, 2018

@siddontang siddontang referenced this pull request Feb 26, 2018
@xiang90

This comment has been minimized.

Copy link
Contributor

commented Feb 26, 2018

this wont fix #9333.

as @wojtek-t pointed out later in the thread, the restarted node starts the ticking too fast.

@gyuho

This comment has been minimized.

Copy link
Member Author

commented Feb 26, 2018

@xiang90 Will investigate more based on the server logs.

@gyuho gyuho added the WIP label Feb 26, 2018

@gyuho gyuho removed the WIP label Mar 6, 2018

gyuho added 4 commits Feb 23, 2018
raft: document disruptive rejoining server, add tests
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
etcdserver: support Raft Pre-Vote
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
embed: add "ExperimentalPreVote" for Raft
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
etcdmain: add "--experimental-pre-vote" flag
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>

@gyuho gyuho force-pushed the gyuho:raft-pre-vote branch from 08f3072 to b48d3eb Mar 6, 2018

@codecov-io

This comment has been minimized.

Copy link

commented Mar 6, 2018

Codecov Report

Merging #9352 into master will decrease coverage by 0.09%.
The diff coverage is 70%.

Impacted file tree graph

@@            Coverage Diff            @@
##           master    #9352     +/-   ##
=========================================
- Coverage   72.54%   72.44%   -0.1%     
=========================================
  Files         362      362             
  Lines       30783    30793     +10     
=========================================
- Hits        22330    22309     -21     
- Misses       6826     6855     +29     
- Partials     1627     1629      +2
Impacted Files Coverage Δ
raft/raft.go 91.68% <ø> (ø) ⬆️
etcdserver/config.go 79.8% <ø> (ø) ⬆️
etcdmain/config.go 81.21% <100%> (+0.1%) ⬆️
embed/etcd.go 67.42% <100%> (+0.1%) ⬆️
etcdserver/raft.go 89.36% <100%> (+0.1%) ⬆️
embed/config.go 60.85% <100%> (+0.13%) ⬆️
etcdserver/server.go 78.44% <25%> (-0.59%) ⬇️
auth/simple_token.go 87.03% <0%> (-6.49%) ⬇️
clientv3/namespace/watch.go 72.72% <0%> (-6.07%) ⬇️
pkg/adt/interval_tree.go 84.98% <0%> (-4.81%) ⬇️
... and 18 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 211523f...b48d3eb. Read the comment docs.

@gyuho

This comment has been minimized.

Copy link
Member Author

commented Mar 7, 2018

Tested pre-vote feature in functional-testing with no failure. Will keep testing. Pre-vote is disable by default, so should be safe to merge.

#9333 will be addressed in #9364.

@gyuho gyuho merged commit 89292af into etcd-io:master Mar 7, 2018

5 checks passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
jenkins-cov Build finished.
Details
jenkins-ppc64le Build finished.
Details
jenkins-proxy-ci Build finished.
Details
semaphoreci The build passed on Semaphore.
Details

@gyuho gyuho deleted the gyuho:raft-pre-vote branch Mar 7, 2018

july2993 added a commit to july2993/raft-rs that referenced this pull request Mar 12, 2018
BusyJay added a commit to pingcap/raft-rs that referenced this pull request Mar 23, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
3 participants
You can’t perform that action at this time.