etcd container fails healthcheck probe due to context deadline exceeded #12755

Closed
liiri opened this issue Mar 9, 2021 · 3 comments

@liiri

liiri commented Mar 9, 2021

I have etcd 3.4.13-0 running in my bare metal Kubernetes cluster.

I'm seeing many of the infamous "error:context deadline exceeded" / "took too long (2.000051514s) to execute" warnings, and I'm fairly sure there is an actual disk issue behind them.

However, I'm more concerned with the fact that my container gets shut down after running for some time, due to Kubernetes health check failures. What I see in the logs:

...
2021-03-09 08:16:54.404418 W | etcdserver: read-only range request "key:\"/registry/health\" " with result "error:context deadline exceeded" took too long (2.000051514s) to execute
WARNING: 2021/03/09 08:16:54 grpc: Server.processUnaryRPC failed to write status: connection error: desc = "transport is closing"
2021-03-09 08:16:55.901302 W | etcdserver: failed to revoke 74867816086e2e90 ("etcdserver: request timed out")

What actually makes the request time out here? Is the "took too long" warning actually an error, or is it in the logs but unrelated to the gRPC error? Where is the context deadline value configured?

@ptabor
Contributor

ptabor commented Mar 9, 2021

error:context deadline exceeded means that the deadline was reached and the request failed.

Since a user can set arbitrarily short deadlines, from etcd's perspective it's just a 'warning' that one of the callers was left unsatisfied.
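
For illustration, a minimal Go clientv3 sketch of the kind of call that produces this pattern; the endpoint, key, and 2-second timeout are assumptions for the sketch (roughly matching the ~2.000051514s in the log), not the actual kube-apiserver code:

package main

import (
	"context"
	"fmt"
	"time"

	"go.etcd.io/etcd/clientv3"
)

func main() {
	// Assumed endpoint; TLS configuration omitted for brevity.
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"127.0.0.1:2379"},
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		panic(err)
	}
	defer cli.Close()

	// The deadline lives in the caller's context, not in etcd's configuration.
	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()

	// A read of the health key similar to the one in the log. If the backend
	// (e.g. a slow disk) cannot serve it before the deadline, the client sees
	// "context deadline exceeded" and etcd logs the "took too long" warning.
	if _, err := cli.Get(ctx, "/registry/health"); err != nil {
		fmt.Println("read failed:", err)
	}
}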

k8s has recent changes that attach fewer objects to individual 'leases', so lease revocation failures should become less likely.
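
A rough sketch of the lease mechanism behind the "failed to revoke 74867816086e2e90" line, using the same assumed client setup as above; the key name and TTL are made up for illustration:

package main

import (
	"context"
	"fmt"
	"time"

	"go.etcd.io/etcd/clientv3"
)

func demoLease(cli *clientv3.Client) error {
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()

	// Grant a lease with a 15-second TTL (arbitrary for this sketch).
	lease, err := cli.Grant(ctx, 15)
	if err != nil {
		return err
	}

	// Attach a key to the lease; the kube-apiserver attaches TTL'd objects
	// such as Events to leases in a similar way, so one lease can carry many keys.
	if _, err := cli.Put(ctx, "/registry/example", "v", clientv3.WithLease(lease.ID)); err != nil {
		return err
	}

	// Revoking the lease deletes every attached key in a single write. On a
	// slow disk that write can time out, which is what "failed to revoke ...
	// (etcdserver: request timed out)" reports.
	_, err = cli.Revoke(ctx, lease.ID)
	return err
}

func main() {
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"127.0.0.1:2379"}, // assumed endpoint, TLS omitted
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		panic(err)
	}
	defer cli.Close()

	if err := demoLease(cli); err != nil {
		fmt.Println("lease demo failed:", err)
	}
}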

@liiri
Author

liiri commented Mar 9, 2021

So in this case, the user is kube-apiserver?

Can you please explain (or refer me to something explaining) how leases affect the health API?

@ptabor
Contributor

ptabor commented Mar 9, 2021

ptabor closed this as completed Mar 9, 2021