Readiness probe is failing, but the service is accessible when exec'd #51096
@chrissound
/sig network
If I SSH into the GCP instance I'm also able to get an HTTP 200 from the container IP.
Nginx logs are showing this:
Which is probably the readiness probe.
Nginx debug logs:
And according to https://stackoverflow.com/questions/15613452/nginx-issues-http-499-error-after-60-seconds-despite-config-php-and-aws#answer-15621223 it sounds like the probe is closing the connection... Hmm.
@chrissound
The issue was: the container was taking longer than a second to respond.
@chrissound Would you mind suggesting what you did to resolve this? I am facing a similar issue.
@saurabhdevops You can increase the probe timeout if you don't expect the container to respond within 1 second (
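For reference, raising the probe timeout is a standard pod-spec change. This is a minimal sketch only — the path, port, and values below are illustrative placeholders, not the reporter's actual config:

```yaml
readinessProbe:
  httpGet:
    path: /
    port: 80
  initialDelaySeconds: 5
  periodSeconds: 10
  timeoutSeconds: 5      # default is 1; raise this if the app can be slow to answer
  failureThreshold: 3
```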
Sorry to bump this, but I'm on EKS (if it matters) and my nginx ingress readiness probes are taking more than 1 second to respond too. Sometimes they hang upwards of 5-8 seconds and then respond. I increased
I'm having the same issue and I have increased the timeoutSeconds to 5. |
@Macmee Having the same strange issue here. Basically we started using NodePort, and it sporadically times out the readiness/liveness probe checks. When I inspect our monitoring metrics, the endpoints used for this have a maximum response time of 4.5 ms; in other words, something is stalling the connection into the container/pod somehow. The way I run it is that two services attach to the container, one headless and one HTTP. I'm curious if that could be part of the problem? Have you learned anything more about the issue?
Same experience here.
Seeing this issue in GCP as well. Not sure where this stall is coming from. |
We are using the same C5 instances, but not EKS; we deploy k8s with kops. We set the
Same experience. I'm using kops on AWS v1.10.6. |
Did anyone ever solve this? |
Same problem with
What is super odd is that I see logs from the pod, and this is coming from the probe:
We have other services running
Did anyone find a solution to this? |
Any updates on this? |
Ran into this issue today. Really strange behaviour |
Any updates?
Same issue. This is affecting only one of my GCP projects: I have 3 private GKE clusters and only 2 of them are being affected. The same code running in different projects does not behave the same. Only one container is failing the readiness/liveness probes, and this is seen with multiple pods across different namespaces. In the kube-system namespace the kube-dns, l7-default-backend, and heapster pods are restarting with the same behavior. Any insight is much appreciated.
Try doubling your CPU requests to see if that helps. I saw readiness probes failing due to lack of resources. |
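As a sketch of the suggestion above (the quantities are hypothetical — there are no actual request values in this thread), raising the CPU request means editing the container's resources block:

```yaml
resources:
  requests:
    cpu: "500m"       # doubled from an assumed 250m starting point
    memory: "256Mi"
  limits:
    cpu: "1"
    memory: "512Mi"
```

The idea is that with a larger request the scheduler places the pod where more CPU is actually available, so the process serving the probe endpoint is less likely to be starved.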
I have encountered this with a resource-limited namespace that was close to capacity (unsure if this was causative). Setting the timeout as suggested here fixed it. |
Hi everyone. I wonder how we can tell whether the root cause is resources?
Adding more info: the service is still available when I make a REST call from Postman. But inside the Kubernetes ecosystem, when other services call it, the unhealthy pods take a long time to respond, which causes a timeout exception.
did you resolve this issue? |
Hi rockysingh, not yet.
I tried to increase (twice) the timeout in RestTemplate while calling the API in the unhealthy pods, but I still received some timeout exceptions. Still looking for the root cause of this.
Here facing the same issue |
I have the same issue... It is not a resources problem. I can see the correct response in the pod's logs:
But if I describe the pod:
/triage unresolved |
It is worth checking whether the pod's CPU spikes on startup.
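If the CPU does spike on startup, one common mitigation (assuming Kubernetes 1.16+, where startupProbe is available; path, port, and numbers below are placeholders, not from this thread) is to gate the other probes behind a startup probe so a slow boot doesn't count as unhealthy:

```yaml
startupProbe:
  httpGet:
    path: /
    port: 80
  periodSeconds: 10
  failureThreshold: 30   # tolerates up to 30 * 10s = 300s of slow startup
```

While the startup probe is running, liveness and readiness probes are suspended, so the container isn't killed during an expensive boot.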
Started seeing the problem when there was very high CPU and memory usage during performance testing. Continued testing after increasing the liveness and readiness timeouts, as it was working after the change, but it then started failing again even with the increased values. Some of the pods fail to register as services in Kubernetes even though the Docker container is up properly. Looking for any help understanding what causes this problem. Note: MongoDB also runs in the same setup, as a StatefulSet, and it is consuming almost all the memory. After a few more rounds of testing even the MongoDB pods went into CrashLoopBackOff due to liveness failures.
I am also facing this issue on Kubernetes versions 1.26, 1.27, and 1.28, on Rocky Linux.
What happened: Readiness probe failed
What you expected to happen: Successful container startup
How to reproduce it (as minimally and precisely as possible): I created a NodePort service, and then the issue seemed to start. I deleted the NodePort service and reverted back to the previous LoadBalancer service but the issue remains.
If I remove the liveness probe from my deployment config, the containers start up successfully, and I'm able to access them via the LoadBalancer service external IP
Cloud provider or hardware configuration: Google Cloud Platform
The container in question is a Nginx Container listening on port 80.
The event from kubectl describe is:
Unhealthy  Readiness probe failed: Get http://10.24.1.33:80/: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
I can exec into the container and do:
wget 10.24.1.33
which works correctly (HTTP 200 response).
Readiness probe config:
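The reporter's actual probe config was not preserved above. As a hedged reconstruction consistent with the symptoms described (HTTP GET against port 80 with the default 1-second timeout — every value here is assumed, not taken from the report):

```yaml
readinessProbe:
  httpGet:
    path: /
    port: 80
  initialDelaySeconds: 3
  periodSeconds: 3
  timeoutSeconds: 1   # the default; the net/http Client.Timeout error above fires when this is exceeded
```

With timeoutSeconds at 1, any response slower than one second is reported as "request canceled (Client.Timeout exceeded while awaiting headers)" even though the endpoint eventually answers — which matches why wget from inside the container still returns HTTP 200.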