
etcdserver: adjust election timeout on restart #9415

Merged 3 commits into etcd-io:master Mar 11, 2018

Conversation

@gyuho
Member

commented Mar 9, 2018

Address #9333 with simpler logic.

Single-node restart with no snapshot needs no special handling, because the node elects itself leader before the peer-connection report wait times out.
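The tick adjustment can be sketched roughly as follows (illustrative only, not etcd's exact API; `adjustedTicks` is a hypothetical helper): a node with active peers fast-forwards its election timer but leaves 2 ticks of margin instead of 1, so it can receive a heartbeat from an existing leader before campaigning, while a lone node keeps the old 1-tick margin.

```go
package main

import "fmt"

// adjustedTicks is a hypothetical sketch of this PR's logic: how many
// election ticks to fast-forward on restart, given the configured
// election tick count and the number of currently active peers.
func adjustedTicks(electionTicks, activePeers int) int {
	if activePeers == 0 {
		// single node: campaign almost immediately (1 tick left)
		return electionTicks - 1
	}
	// peers are reachable: leave 2 ticks so a leader heartbeat can arrive
	return electionTicks - 2
}

func main() {
	fmt.Println(adjustedTicks(10, 0)) // 9, matching "fast-forwarding 9 ticks"
	fmt.Println(adjustedTicks(10, 2)) // 8, matching "fast-forwarding 8 ticks"
}
```

The two outputs line up with the "fast-forwarding 9 ticks" and "fast-forwarding 8 ticks" log lines quoted below.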


Fresh start 1-node cluster

15:45:34.940350 I | etcdserver: 8e9e05c52164694d as single-node; fast-forwarding 9 ticks (election ticks 10)
15:45:34.940911 I | etcdserver/membership: added member 8e9e05c52164694d [http://localhost:2380] to cluster cdf818194e3a8c32
15:45:35.233465 I | raft: 8e9e05c52164694d is starting a new election at term 1
15:45:35.233525 I | raft: 8e9e05c52164694d became candidate at term 2
15:45:35.233559 I | raft: 8e9e05c52164694d received MsgVoteResp from 8e9e05c52164694d at term 2
15:45:35.233596 I | raft: 8e9e05c52164694d became leader at term 2


Restart 1-node cluster from snapshot

15:47:29.182917 I | etcdserver: recovered store from snapshot at index 8
15:47:29.194451 I | etcdserver: 8e9e05c52164694d as single-node; fast-forwarding 9 ticks (election ticks 10)
15:47:29.487854 I | raft: 8e9e05c52164694d is starting a new election at term 2
15:47:29.487939 I | raft: 8e9e05c52164694d became candidate at term 3
15:47:29.487985 I | raft: 8e9e05c52164694d received MsgVoteResp from 8e9e05c52164694d at term 3
15:47:29.488032 I | raft: 8e9e05c52164694d became leader at term 3
15:47:29.488062 I | raft: raft.node: 8e9e05c52164694d elected leader 8e9e05c52164694d at term 3


Restart 1-node with no snapshot

15:49:43.412234 I | raft: 8e9e05c52164694d is starting a new election at term 2
15:49:43.412299 I | raft: 8e9e05c52164694d became candidate at term 3
15:49:43.412339 I | raft: 8e9e05c52164694d received MsgVoteResp from 8e9e05c52164694d at term 3
15:49:43.412380 I | raft: 8e9e05c52164694d became leader at term 3
15:49:43.412407 I | raft: raft.node: 8e9e05c52164694d elected leader 8e9e05c52164694d at term 3
15:49:47.020895 I | etcdserver: 8e9e05c52164694d waited 5s but no active peer found (or restarted 1-node cluster); currently, 1 member(s)

A leader is elected while waiting for the peer-connection report to time out, so there is no side effect.


Fresh start 3-node

node A:

15:53:47.895306 I | etcdserver: 7339c4e5e833c029 waited 5s but no active peer found (or restarted 1-node cluster); currently, 3 member(s)
15:53:48.690716 I | raft: 7339c4e5e833c029 is starting a new election at term 4
15:53:49.991580 I | raft: 7339c4e5e833c029 became leader at term 6

node B:

15:54:02.194297 I | etcdserver: b548c2511513015 initialzed peer connection; fast-forwarding 8 ticks (election ticks 10) with 2 active peer(s)
15:54:02.197587 I | raft: b548c2511513015 [term: 1] received a MsgHeartbeat message with higher term from 7339c4e5e833c029 [term: 6]

No side-effect.


Rejoining to 3-node cluster with snapshot

16:01:12.800882 I | rafthttp: peer 729934363faa4a24 became active
16:01:12.800894 I | etcdserver: 7339c4e5e833c029 initialzed peer connection; fast-forwarding 8 ticks (election ticks 10) with 2 active peer(s)
16:01:12.857434 I | raft: raft.node: 7339c4e5e833c029 elected leader 729934363faa4a24 at term 7

The peer connection is reported, and the node advances with the adjusted tick count.
Previously it advanced 9 ticks, leaving only one tick before campaigning; now it advances 8 ticks.


Rejoining to 3-node cluster with no snapshot

16:05:48.368674 I | etcdserver: 7339c4e5e833c029 initialzed peer connection; fast-forwarding 8 ticks (election ticks 10) with 2 active peer(s)
16:05:48.439695 I | raft: raft.node: 7339c4e5e833c029 elected leader 729934363faa4a24 at term 6


/cc @xiang90 @jpbetz
@codecov-io


commented Mar 9, 2018

Codecov Report

❗️ No coverage uploaded for pull request base (master@9e84f2d).
The diff coverage is 97.36%.


@@            Coverage Diff            @@
##             master    #9415   +/-   ##
=========================================
  Coverage          ?   72.48%           
=========================================
  Files             ?      362           
  Lines             ?    30827           
  Branches          ?        0           
=========================================
  Hits              ?    22344           
  Misses            ?     6854           
  Partials          ?     1629
Impacted Files Coverage Δ
etcdserver/raft.go 89.47% <100%> (ø)
etcdserver/server.go 79.73% <100%> (ø)
rafthttp/transport.go 83.42% <87.5%> (ø)

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9e84f2d...9680b8a. Read the comment docs.

// 1. all connections failed, or
// 2. no active peers, or
// 3. restarted single-node with no snapshot
plog.Infof("%s waited %s but no active peer found (or restarted 1-node cluster); currently, %d member(s)", srv.ID(), rafthttp.ConnReadTimeout, len(cl.Members()))

@xiang90

xiang90 Mar 10, 2018

Contributor

just remove this logging? we only log if we do something special.

@gyuho

gyuho Mar 10, 2018

Author Member

Removed. PTAL. Thanks!


if s.once != nil {
s.once.Do(func() {
plog.Infof("notifying of active peer %q", s.id)

@xiang90

xiang90 Mar 10, 2018

Contributor

this logging does not seem to be useful.

@gyuho

gyuho Mar 10, 2018

Author Member

Removed.

// InitialPeerNotify returns a channel that closes when an initial
// peer connection has been established. Use this to wait until the
// first peer connection becomes active.
InitialPeerNotify() <-chan struct{}

@xiang90

xiang90 Mar 10, 2018

Contributor

hmm... can you try to find another way to do this without introducing a new method to the interface? this interface is too heavy already.

@gyuho

gyuho Mar 10, 2018

Author Member

Agree. Just removed.

@gyuho gyuho force-pushed the gyuho:adjust-advancing-ticks branch from fad0db1 to 60b3d7f Mar 10, 2018

// This can be used for fast-forwarding election
// ticks in multi data-center deployments, thus
// speeding up election process.
advanceRaftTicks func(ticks int)

@xiang90

xiang90 Mar 11, 2018

Contributor

this should be a method on raft related struct.

@@ -527,6 +539,32 @@ func NewServer(cfg ServerConfig) (srv *EtcdServer, err error) {
}
srv.r.transport = tr

srv.goAttach(func() {

@xiang90

xiang90 Mar 11, 2018

Contributor

why this needs to be async? can we start the network first? then wait here? then decide to advance ticks or not. then start the raft routine?

@xiang90

xiang90 Mar 11, 2018

Contributor

also if it works like what i described, we do not need the lock to protect the tick.

@gyuho

gyuho Mar 11, 2018

Author Member

It needs to be async, since we start the peer handler in the embed package after we create the etcd server here.

Addressed others. PTAL.

@@ -32,11 +32,16 @@ type peerStatus struct {
mu sync.Mutex // protect variables below
active bool
since time.Time

once *sync.Once
notify chan struct{}

@xiang90

xiang90 Mar 11, 2018

Contributor

activeNotify


if s.once != nil {
s.once.Do(func() {
close(s.notify)

@xiang90

xiang90 Mar 11, 2018

Contributor

probably just do

select {
    case s.notify<- struct{}{}:
    default: 
}

we do not need the once struct.
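The suggested non-blocking send works because a buffered channel of size 1 absorbs exactly one signal and the `default` case drops the rest, so `sync.Once` becomes unnecessary. A minimal, self-contained sketch of that pattern (names are illustrative, not etcd's):

```go
package main

import "fmt"

// peerStatus sketches the reviewed pattern: activate signals at most one
// waiter without blocking, no sync.Once required.
type peerStatus struct {
	notify chan struct{} // buffered, capacity 1
}

func (s *peerStatus) activate() {
	select {
	case s.notify <- struct{}{}:
	default: // already signaled; drop silently instead of blocking
	}
}

func main() {
	s := &peerStatus{notify: make(chan struct{}, 1)}
	s.activate()
	s.activate() // second call hits the default case and returns immediately
	<-s.notify
	fmt.Println("notified once")
}
```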

// InitialPeerNotify returns a channel that closes when an initial
// peer connection has been established. Use this to wait until the
// first peer connection becomes active.
func (t *Transport) InitialPeerNotify() <-chan struct{} { return t.initPeerNotifyCh }

@xiang90

xiang90 Mar 11, 2018

Contributor

well... actually, an easy way to solve this problem is to have a for loop to loop over the peer status and check if active is true.

for p := range peers {
    if p.status.isActive() {
        // send to chan
    }
}
// InitialPeerNotify returns a channel that closes when an initial
// peer connection has been established. Use this to wait until the
// first peer connection becomes active.
func (t *Transport) InitialPeerNotify() <-chan struct{} { return t.initPeerNotifyCh }

@xiang90

xiang90 Mar 11, 2018

Contributor

Even better:

InitialPeerNotify() -> ActivePeers()

ActivePeers simply calculates how many peers are active. We move the for-loop to the raft node side in the etcdserver pkg to wait for the first active peer.

@gyuho gyuho force-pushed the gyuho:adjust-advancing-ticks branch 2 times, most recently from 8c4a077 to e42dd2a Mar 11, 2018

}
}

// 1. all connections failed, or

@xiang90

xiang90 Mar 11, 2018

Contributor

move this comment to line 544

// retry up to "rafthttp.ConnReadTimeout", which is 5-sec
for i := 0; i < 5; i++ {
select {
case <-time.After(time.Second):

@xiang90

xiang90 Mar 11, 2018

Contributor

reduce this to 50ms to be more responsive.

@xiang90

xiang90 Mar 11, 2018

Contributor

define 50ms as wait time

}

// retry up to "rafthttp.ConnReadTimeout", which is 5-sec
for i := 0; i < 5; i++ {

@xiang90

xiang90 Mar 11, 2018

Contributor

define waitTime; then this can be 5*time.Second / waitTime

@gyuho gyuho force-pushed the gyuho:adjust-advancing-ticks branch from e42dd2a to 9f356c8 Mar 11, 2018

@@ -76,6 +76,9 @@ type Peer interface {
// activeSince returns the time that the connection with the
// peer becomes active.
activeSince() time.Time
// isActive returns true if the connection to this peer
// has been established
isActive() bool

@xiang90

xiang90 Mar 11, 2018

Contributor

we can reuse activeSince. if it is earlier than the current time, the peer is active, right?

@xiang90

Contributor

commented Mar 11, 2018

lgtm.

gyuho added 2 commits Mar 11, 2018
rafthttp: add "ActivePeers" to "Transport"
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>
etcdserver: make "advanceTicks" method
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>

@gyuho gyuho force-pushed the gyuho:adjust-advancing-ticks branch from 9f356c8 to 33adce4 Mar 11, 2018

etcdserver: adjust election ticks on restart
Signed-off-by: Gyuho Lee <gyuhox@gmail.com>

@gyuho gyuho merged commit 249b7a1 into etcd-io:master Mar 11, 2018

5 checks passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
jenkins-cov Build finished.
Details
jenkins-ppc64le Build finished.
Details
jenkins-proxy-ci Build finished.
Details
semaphoreci The build passed on Semaphore.
Details

@gyuho gyuho deleted the gyuho:adjust-advancing-ticks branch Mar 11, 2018

@mborsz

Contributor

commented Mar 26, 2018

When do we expect this fix to be released in 3.1 branch?

@gyuho

Member Author

commented Mar 26, 2018

#9485 is merged.
So, we are ready to publish another set of patch releases.

@jpbetz Are you available for 3.1, 3.2 releasing this week?
Let's coordinate at #9411.

Any day this week works for me.

@jpbetz

Contributor

commented Mar 28, 2018

@gyuho Yes, I'm available. Thursday (tomorrow) work? I'm free Friday as well.

@gyuho

Member Author

commented Mar 28, 2018

@jpbetz Tomorrow (Thursday) sounds good. Will ping you when the key is ready. Thanks.

@@ -97,6 +97,7 @@ type raftNode struct {
term uint64
lead uint64

tickMu *sync.Mutex

@jpbetz

jpbetz Mar 28, 2018

Contributor

Out of curiosity, any reason for pointer here? My understanding is that pointers are not typically needed for sync.Mutex.

@jpbetz

jpbetz Mar 28, 2018

Contributor

I'm planning to backport this to 3.1 without a pointer since that removes the need to initialize the mutex, which simplifies the backport: https://github.com/coreos/etcd/pull/9500/files#diff-8c6a0ae3bb0763acd9c96a35d89131feR99

@gyuho

gyuho Mar 28, 2018

Author Member

@jpbetz govet would complain something like this

passes lock by value: github.com/coreos/etcd/clientv3.Client contains sync.Mutex

jpbetz added a commit that referenced this pull request Mar 28, 2018
jpbetz added a commit that referenced this pull request Mar 28, 2018