Small number of high latency requests after a leader election #12680

Closed
Cjen1 opened this issue Feb 9, 2021 · 13 comments
Labels: priority/important-soon (Must be staffed and worked on either currently, or very soon, ideally in time for the next release.), stage/investigating

Comments

@Cjen1 (Contributor) commented Feb 9, 2021

I've recently been testing etcd's availability around leader failures.

If I create a new client as follows:

  cli_v3, err := clientv3.New(clientv3.Config{
          Endpoints:            endpoints,
          DialTimeout:          dialTimeout,
          DialKeepAliveTime:    dialTimeout / 2,
          DialKeepAliveTimeout: dialTimeout * 2,
          AutoSyncInterval:     dialTimeout / 2,
  })

The load generator dispatches 1000 requests per second, using a new goroutine for each request so that each request is applied asynchronously.
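
For reference, a minimal sketch of this kind of load generator (not the actual harness; the endpoints, key names, run length, and per-request timeout are assumptions, and the import path is the v3.4 client's):

    // Sketch only: ~1000 asynchronous Put requests per second, each in its own
    // goroutine, printing dispatch time, latency, and error for each request.
    package main

    import (
        "context"
        "fmt"
        "time"

        "go.etcd.io/etcd/clientv3"
    )

    func main() {
        cli, err := clientv3.New(clientv3.Config{
            Endpoints:   []string{"127.0.0.1:2379", "127.0.0.1:22379", "127.0.0.1:32379"}, // assumed
            DialTimeout: 2 * time.Second,
        })
        if err != nil {
            panic(err)
        }
        defer cli.Close()

        ticker := time.NewTicker(time.Millisecond) // ~1000 requests per second
        defer ticker.Stop()
        for i := 0; i < 60*1000; i++ { // run for roughly 60s
            <-ticker.C
            go func(i int) {
                start := time.Now()
                ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
                _, err := cli.Put(ctx, fmt.Sprintf("key-%d", i%1000), "value")
                cancel()
                // dispatch time, observed latency, and any error: the latency plot is built from this
                fmt.Printf("%d,%v,%v\n", start.UnixNano(), time.Since(start), err)
            }(i)
        }
        time.Sleep(35 * time.Second) // let in-flight requests finish or time out
    }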

In a three-node configuration, when I kill the leader 20s into the test (it restarts at 40s), I observe the following odd behaviour:

[Figure: etcd-leader request latency plot]

This shows the latency of requests dispatched at a given point in time.

At 20s the leader is killed and a leader election occurs, so requests submitted during this period are stalled until the election completes. This explains why there are no requests submitted between 20s and 21s with latencies lower than 100ms.

However, after the election there is a subset of requests (~10-15%) which are stalled for roughly a further 8s (the downward-curving line of high-latency requests).

Are there any possible reasons for this? I've found it to be relatively reproducible, but I can't identify a cause.

(The nodes are all running on my local machine. The nodes and the client are running etcd v3.4.14)

@ptabor (Contributor) commented Feb 9, 2021

This is aligned with my recent observations on 3.4.13.

What I found is that the node that was a follower and becomes the leader during the election has some trouble responding to the read-only requests it used to be serving. Can you check whether the 'delayed' requests are served by the 'new leader' or by any of the other nodes?

ptabor added the priority/important-soon and stage/investigating labels on Feb 9, 2021
@Cjen1 (Contributor, Author) commented Feb 9, 2021

So I'm not actually sure, and I've had difficulty figuring this kind of thing out in the past. Specifically, tracking which node a request is dispatched to has been troublesome. Is there an endpoint I can hook into for that?

Additionally, this should be a write-only workload (just client.Put requests).

@ptabor (Contributor) commented Feb 9, 2021

You can run 3 concurrent clients, each of them connecting to a specific etcd node.
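
For example, roughly like this (a sketch only; the endpoint addresses are placeholders, and it assumes the same clientv3 and time imports as the snippet above):

    // Sketch: one clientv3.Client per endpoint, so every request is pinned to a
    // known node and its measurements can be tagged with that endpoint.
    func perNodeClients() (map[string]*clientv3.Client, error) {
        endpoints := []string{"10.0.0.1:2379", "10.0.0.2:2379", "10.0.0.3:2379"} // placeholders
        clients := make(map[string]*clientv3.Client, len(endpoints))
        for _, ep := range endpoints {
            c, err := clientv3.New(clientv3.Config{
                Endpoints:   []string{ep}, // a single endpoint: no client-side failover
                DialTimeout: 2 * time.Second,
            })
            if err != nil {
                return nil, err
            }
            clients[ep] = c
        }
        return clients, nil
    }

Running the same workload through each client tells you which node actually served the delayed requests.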

@Cjen1 (Contributor, Author) commented Feb 17, 2021

Hi, I managed to get all that working today. It seems you were correct: the delayed requests do appear to be read requests served by the new leader.

[Figures: two per-endpoint request latency plots]

(The leader is killed at 20s and that process is restarted at 40s; read requests only.)

As far as I can tell, the original leader is "10.0.0.1" and "10.0.0.2" is subsequently elected.

@wpedrak (Contributor) commented Mar 4, 2021

It might be caused by the request being dropped in etcd/raft/raft.go, lines 1075 to 1078 at aefbd22:

// Reject read only request when this leader has not committed any log entry at its term.
if !r.committedEntryInCurrentTerm() {
    return nil
}

as later on we wait for a response (which never arrives) in

case rs = <-s.r.readStateC:

so nothing happens until the timeout.

On my local setup, commenting out the return nil in the first file resolves the timeout issue (but that is still not a solution, as this if is there for a reason: #7331).
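
In miniature, the client-visible effect is something like this (illustrative only, not etcd's actual server code; it assumes the context and go.etcd.io/etcd/raft imports):

    // Illustrative only: waiting for a ReadState notification that the dropped
    // MsgReadIndex will never produce, so the caller only returns on timeout.
    func waitForReadState(ctx context.Context, readStateC <-chan raft.ReadState) (raft.ReadState, error) {
        select {
        case rs := <-readStateC: // never fires for the dropped request
            return rs, nil
        case <-ctx.Done():
            return raft.ReadState{}, ctx.Err() // surfaces as a slow or timed-out read
        }
    }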

@Cjen1 (Contributor, Author) commented Mar 4, 2021

@wpedrak I think that the if is there for the Leader Completeness Property (Section 8 of the Raft paper).

Is it possible to do a case-split kind of thing? In the standard case everything proceeds as normal, while in the !committedEntryInCurrentTerm() case the request waits on that condition itself.
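
To make the idea concrete, a toy model of that case split (hypothetical types and names, not etcd's raft code):

    // Hypothetical sketch: reads arriving before the first commit of the term
    // are parked and replayed once that commit happens, instead of being dropped.
    package sketch

    type readReq struct{ id uint64 }

    type leader struct {
        committedInTerm bool
        pending         []readReq
    }

    func (l *leader) stepReadIndex(m readReq) {
        if !l.committedInTerm {
            l.pending = append(l.pending, m) // park instead of `return nil`
            return
        }
        l.serve(m)
    }

    func (l *leader) onFirstCommitInTerm() {
        l.committedInTerm = true
        for _, m := range l.pending { // drain the parked reads
            l.serve(m)
        }
        l.pending = nil
    }

    func (l *leader) serve(readReq) { /* the normal ReadIndex path */ }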

@ptabor (Contributor) commented Mar 4, 2021

@Cjen1 Could you please rerun your test with the if commented out to confirm that root cause?

I agree that we should somehow delay execution of this code in etcd/raft/raft.go, lines 1083 to 1093 at aefbd22, until post-commit:

switch r.readOnly.option {
// If more than the local vote is needed, go through a full broadcast.
case ReadOnlySafe:
    r.readOnly.addRequest(r.raftLog.committed, m)
    // The local node automatically acks the request.
    r.readOnly.recvAck(r.id, m.Entries[0].Data)
    r.bcastHeartbeatWithCtx(m.Entries[0].Data)
case ReadOnlyLeaseBased:
    if resp := r.responseToReadIndexReq(m, r.raftLog.committed); resp.To != None {
        r.send(resp)
    }
}

func (r *raft) bcastAppend() seems to always be executed post-commit.

I don't know whether, in clusters without any read/write traffic, we can assume a commit soon after the election that would drain the queued 'readIndex' updates.

@wpedrak (Contributor) commented Mar 11, 2021

Hi @Cjen1,

I've just submitted a PR which might fix your issue. When you have time, please take a look at #12762.

wpedrak added commits to wpedrak/etcd that referenced this issue on Mar 12 and Mar 16, 2021
@Cjen1 (Contributor, Author) commented Mar 17, 2021

@wpedrak below are repeats of the test using the fixes. It appears that they work as expected.

As a minor point, I don't think the current build-from-source instructions are correct; I had a devil of a time building your branches.
What worked was abusing make compile-with-docker-test to get a working build environment.

For interpretation's sake, there are three repeats of each of the tests (stock etcd, etcd with the reads postponed, and etcd with the reads retried after 500ms).

[Figure: latency visualization of the three test variants]

@wpedrak (Contributor) commented Mar 17, 2021

@Cjen1 Great to hear that it works for you. Could you elaborate on the build issues you encountered? I just did

git clone git@github.com:wpedrak/etcd.git tmp-etcd
cd tmp-etcd
git checkout read_index_retry
./build.sh

and it went through without any issue.

@Cjen1 (Contributor, Author) commented Mar 17, 2021

I was getting a "build directory wasn't etcd/v3" error.
Specifically, it's the first check in scripts/test_lib.sh:

if [[ "$(go list)" != "${ROOT_MODULE}/v3" ]]; then
  echo "must be run from '${ROOT_MODULE}/v3' module directory"
  exit 255
fi

(this was from within the ~/go/go.etcd.io/etcd folder with your branch checked out).

I think it might have been related to using an older version of golang (v1.11.5), but I don't know enough about go's tooling to fix it.

@ptabor (Contributor) commented Mar 17, 2021

For the master branch we expect golang 1.15+.
I think that 1.11 didn't yet understand modules (by default).

@Cjen1 (Contributor, Author) commented Mar 17, 2021

@ptabor Ah ok!
