Small number of high latency requests after a leader election #12680

Closed
Cjen1 opened this issue Feb 9, 2021 · 13 comments
Labels: priority/important-soon (Must be staffed and worked on either currently, or very soon, ideally in time for the next release.), stage/investigating

Comments

@Cjen1 (Contributor) commented Feb 9, 2021

I've recently been testing etcd's availability around leader failures.

If I create a new client as follows:

  cli_v3, err := clientv3.New(clientv3.Config{
          Endpoints:            endpoints,
          DialTimeout:          dialTimeout,
          DialKeepAliveTime:    dialTimeout / 2,
          DialKeepAliveTimeout: dialTimeout * 2,
          AutoSyncInterval:     dialTimeout / 2,
  })

The load generator dispatches 1000 requests per second, using a new goroutine for each request so that each request is applied asynchronously.
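
For reference, a minimal sketch of this kind of load generator (not the actual harness; the endpoints, key names, run length, and per-request timeout are assumptions, and the import path is the v3.4 client's):

    // Sketch only: ~1000 asynchronous Put requests per second, each in its own
    // goroutine, printing dispatch time, latency, and error for each request.
    package main

    import (
        "context"
        "fmt"
        "time"

        "go.etcd.io/etcd/clientv3"
    )

    func main() {
        cli, err := clientv3.New(clientv3.Config{
            Endpoints:   []string{"127.0.0.1:2379", "127.0.0.1:22379", "127.0.0.1:32379"}, // assumed
            DialTimeout: 2 * time.Second,
        })
        if err != nil {
            panic(err)
        }
        defer cli.Close()

        ticker := time.NewTicker(time.Millisecond) // ~1000 requests per second
        defer ticker.Stop()
        for i := 0; i < 60*1000; i++ { // run for roughly 60s
            <-ticker.C
            go func(i int) {
                start := time.Now()
                ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
                _, err := cli.Put(ctx, fmt.Sprintf("key-%d", i%1000), "value")
                cancel()
                // dispatch time, observed latency, and any error: the latency plot is built from this
                fmt.Printf("%d,%v,%v\n", start.UnixNano(), time.Since(start), err)
            }(i)
        }
        time.Sleep(35 * time.Second) // let in-flight requests finish or time out
    }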

In a three-node configuration, when I kill the leader 20s into the test (it restarts at 40s), I observe the following odd behaviour:

[Figure: etcd-leader request latency plot]

This shows the latency of requests dispatched at a given point in time.

At 20s the leader is killed and a leader election occurs, so requests submitted during this period are stalled until the election completes. This explains why there are no requests submitted between 20s and 21s with latencies lower than 100ms.

However, after the election there is a subset of requests (~10-15%) which are stalled for roughly a further 8s (the downward-curving line of high-latency requests).

Are there any possible reasons for this? I've found it to be relatively reproducible, but I can't identify a cause.

(The nodes are all running on my local machine. The nodes and the client are running etcd v3.4.14)

@ptabor (Contributor) commented Feb 9, 2021

This is aligned with my recent observations on 3.4.13.

What I found is that the node that was a follower and becomes the leader during the election has some trouble responding to the read-only requests it used to be serving. Can you check whether the 'delayed' requests are served by the 'new leader' or by any of the other nodes?

ptabor added the priority/important-soon and stage/investigating labels on Feb 9, 2021
@Cjen1 (Contributor, Author) commented Feb 9, 2021

So I'm not actually sure, and I've had difficulty figuring this kind of thing out in the past. Specifically, tracking which node a request is dispatched to has been troublesome. Is there an endpoint I can hook into for that?

Additionally, this should be a write-only workload (just client.Put requests).

@ptabor (Contributor) commented Feb 9, 2021

You can run 3 concurrent clients, each of them connecting to a specific etcd node.
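
For example, roughly like this (a sketch only; the endpoint addresses are placeholders, and it assumes the same clientv3 and time imports as the snippet above):

    // Sketch: one clientv3.Client per endpoint, so every request is pinned to a
    // known node and its measurements can be tagged with that endpoint.
    func perNodeClients() (map[string]*clientv3.Client, error) {
        endpoints := []string{"10.0.0.1:2379", "10.0.0.2:2379", "10.0.0.3:2379"} // placeholders
        clients := make(map[string]*clientv3.Client, len(endpoints))
        for _, ep := range endpoints {
            c, err := clientv3.New(clientv3.Config{
                Endpoints:   []string{ep}, // a single endpoint: no client-side failover
                DialTimeout: 2 * time.Second,
            })
            if err != nil {
                return nil, err
            }
            clients[ep] = c
        }
        return clients, nil
    }

Running the same workload through each client tells you which node actually served the delayed requests.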

@Cjen1 (Contributor, Author) commented Feb 17, 2021

Hi, I managed to get all that working today. It seems you were correct: the delayed requests do appear to be read requests served by the new leader.

[Figures: two per-endpoint request latency plots]

(The leader is killed at 20s and that process is restarted at 40s; read requests only.)

As far as I can tell, the original leader is "10.0.0.1" and "10.0.0.2" is subsequently elected.

@wpedrak (Contributor) commented Mar 4, 2021

It might be caused by the request being dropped in etcd/raft/raft.go, lines 1075 to 1078 at aefbd22:

// Reject read only request when this leader has not committed any log entry at its term.
if !r.committedEntryInCurrentTerm() {
    return nil
}

as later on we wait for a response (which never arrives) in

case rs = <-s.r.readStateC:

so nothing happens until the timeout.

On my local setup, commenting out the return nil in the first file resolves the timeout issue (but that is still not a solution, as this if is there for a reason: #7331).
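
In miniature, the client-visible effect is something like this (illustrative only, not etcd's actual server code; it assumes the context and go.etcd.io/etcd/raft imports):

    // Illustrative only: waiting for a ReadState notification that the dropped
    // MsgReadIndex will never produce, so the caller only returns on timeout.
    func waitForReadState(ctx context.Context, readStateC <-chan raft.ReadState) (raft.ReadState, error) {
        select {
        case rs := <-readStateC: // never fires for the dropped request
            return rs, nil
        case <-ctx.Done():
            return raft.ReadState{}, ctx.Err() // surfaces as a slow or timed-out read
        }
    }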

@Cjen1 (Contributor, Author) commented Mar 4, 2021

@wpedrak I think that the if is there for the Leader Completeness Property (Section 8 of the Raft paper).

Is it possible to do a case-split kind of thing? In the standard case everything proceeds as normal, while in the !committedEntryInCurrentTerm() case the request waits on that condition itself.
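
To make the idea concrete, a toy model of that case split (hypothetical types and names, not etcd's raft code):

    // Hypothetical sketch: reads arriving before the first commit of the term
    // are parked and replayed once that commit happens, instead of being dropped.
    package sketch

    type readReq struct{ id uint64 }

    type leader struct {
        committedInTerm bool
        pending         []readReq
    }

    func (l *leader) stepReadIndex(m readReq) {
        if !l.committedInTerm {
            l.pending = append(l.pending, m) // park instead of `return nil`
            return
        }
        l.serve(m)
    }

    func (l *leader) onFirstCommitInTerm() {
        l.committedInTerm = true
        for _, m := range l.pending { // drain the parked reads
            l.serve(m)
        }
        l.pending = nil
    }

    func (l *leader) serve(readReq) { /* the normal ReadIndex path */ }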

@ptabor (Contributor) commented Mar 4, 2021

@Cjen1 Could you please rerun your test with the if commented out to confirm that root cause?

I agree that we should somehow delay execution of this code in etcd/raft/raft.go, lines 1083 to 1093 at aefbd22, until post-commit:

switch r.readOnly.option {
// If more than the local vote is needed, go through a full broadcast.
case ReadOnlySafe:
    r.readOnly.addRequest(r.raftLog.committed, m)
    // The local node automatically acks the request.
    r.readOnly.recvAck(r.id, m.Entries[0].Data)
    r.bcastHeartbeatWithCtx(m.Entries[0].Data)
case ReadOnlyLeaseBased:
    if resp := r.responseToReadIndexReq(m, r.raftLog.committed); resp.To != None {
        r.send(resp)
    }
}

func (r *raft) bcastAppend() seems to always be executed post-commit.

I don't know whether, in clusters without any read/write traffic, we can assume a commit soon after the election that would drain the queued 'readIndex' updates.

@wpedrak (Contributor) commented Mar 11, 2021

Hi @Cjen1,

I've just submitted a PR which might fix your issue. When you have time, please take a look at #12762.

wpedrak added commits to wpedrak/etcd that referenced this issue on Mar 12 and Mar 16, 2021
@Cjen1 (Contributor, Author) commented Mar 17, 2021

@wpedrak below are repeats of the test using the fixes. It appears that they work as expected.

As a minor point, I don't think the current build-from-source instructions are correct; I had a devil of a time building your branches.
What worked was abusing make compile-with-docker-test to get a working build environment.

For interpretation's sake, there are three repeats of each of the tests (stock etcd, etcd with the reads postponed, and etcd with the reads retried after 500ms).

[Figure: latency visualization of the three test variants]

@wpedrak (Contributor) commented Mar 17, 2021

@Cjen1 Great to hear that it works for you. Could you elaborate on the build issues you encountered? I just did

git clone git@github.com:wpedrak/etcd.git tmp-etcd
cd tmp-etcd
git checkout read_index_retry
./build.sh

and it went through without any issue.

@Cjen1 (Contributor, Author) commented Mar 17, 2021

I was getting a "build directory wasn't etcd/v3" error.
Specifically, it's the first check in scripts/test_lib.sh:

if [[ "$(go list)" != "${ROOT_MODULE}/v3" ]]; then
  echo "must be run from '${ROOT_MODULE}/v3' module directory"
  exit 255
fi

(this was from within the ~/go/go.etcd.io/etcd folder with your branch checked out).

I think it might have been related to using an older version of golang (v1.11.5), but I don't know enough about go's tooling to fix it.

@ptabor (Contributor) commented Mar 17, 2021

For the master branch we expect golang 1.15+.
I think that 1.11 didn't yet understand modules (by default).

@Cjen1 (Contributor, Author) commented Mar 17, 2021

@ptabor Ah ok!
