Readiness probe is failing, but the service is accessible when exec'd #51096

Closed
chrissound opened this issue Aug 22, 2017 · 36 comments
Labels
sig/network Categorizes an issue or PR as relevant to SIG Network. triage/unresolved Indicates an issue that can not or will not be resolved.

Comments

@chrissound

chrissound commented Aug 22, 2017

What happened: Readiness probe failed

What you expected to happen: Successful container startup

How to reproduce it (as minimally and precisely as possible): I created a NodePort service, and that is when the issue seemed to start. I deleted the NodePort service and reverted to the previous LoadBalancer service, but the issue remains.

If I remove the liveness probe from my deployment config, the containers start up successfully, and I'm able to access them via the LoadBalancer service's external IP.

Cloud provider or hardware configuration: Google Cloud Platform

The container in question is an Nginx container listening on port 80.


The event from kubectl describe is:
Unhealthy Readiness probe failed: Get http://10.24.1.33:80/: net/http: request canceled (Client.Timeout exceeded while awaiting headers)

I can exec into the container and run wget 10.24.1.33, which works correctly (HTTP 200 response).


Readiness probe config:

          readinessProbe:
            httpGet:
              path: /
              port: 80
            initialDelaySeconds: 5 
            periodSeconds: 5
@k8s-github-robot

@chrissound
There are no sig labels on this issue. Please add a sig label by:

  1. mentioning a sig: @kubernetes/sig-<group-name>-<group-suffix>
    e.g., @kubernetes/sig-contributor-experience-<group-suffix> to notify the contributor experience sig, OR

  2. specifying the label manually: /sig <label>
    e.g., /sig scalability to apply the sig/scalability label

Note: Method 1 will trigger an email to the group. You can find the group list here and label list here.
The <group-suffix> in method 1 has to be replaced with one of these: bugs, feature-requests, pr-reviews, test-failures, proposals

@k8s-github-robot k8s-github-robot added the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Aug 22, 2017
@chrissound
Author

/sig network

@k8s-ci-robot k8s-ci-robot added the sig/network Categorizes an issue or PR as relevant to SIG Network. label Aug 22, 2017
@k8s-github-robot k8s-github-robot removed the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Aug 22, 2017
@chrissound
Author

If I SSH into the GCP instance I'm also able to get an HTTP 200 from the container IP.

@chrissound
Author

The Nginx logs show this:

10.24.1.1 - - [23/Aug/2017:16:47:19 +0000] "GET / HTTP/1.1" 499 0 "-" "Go-http-client/1.1"
10.24.1.1 - - [23/Aug/2017:16:47:24 +0000] "GET / HTTP/1.1" 499 0 "-" "Go-http-client/1.1"
10.24.1.1 - - [23/Aug/2017:16:47:29 +0000] "GET / HTTP/1.1" 499 0 "-" "Go-http-client/1.1"
10.24.1.1 - - [23/Aug/2017:16:47:34 +0000] "GET / HTTP/1.1" 499 0 "-" "Go-http-client/1.1"
10.24.1.1 - - [23/Aug/2017:16:47:39 +0000] "GET / HTTP/1.1" 499 0 "-" "Go-http-client/1.1"
10.24.1.1 - - [23/Aug/2017:16:47:44 +0000] "GET / HTTP/1.1" 499 0 "-" "Go-http-client/1.1"
10.24.1.1 - - [23/Aug/2017:16:47:49 +0000] "GET / HTTP/1.1" 499 0 "-" "Go-http-client/1.1"
10.24.1.1 - - [23/Aug/2017:16:47:54 +0000] "GET / HTTP/1.1" 499 0 "-" "Go-http-client/1.1"
10.24.1.1 - - [23/Aug/2017:16:47:59 +0000] "GET / HTTP/1.1" 499 0 "-" "Go-http-client/1.1"

Which is probably the readiness probe.

@chrissound
Author

Nginx debug logs:

2017/08/23 17:03:26 [debug] 5#5: accept on 0.0.0.0:80, ready: 1
2017/08/23 17:03:26 [debug] 5#5: posix_memalign: 000055DEEC1D5BE0:512 @16
2017/08/23 17:03:26 [debug] 5#5: *9 accept: 10.24.1.1:53422 fd:3
2017/08/23 17:03:26 [debug] 5#5: *9 event timer add: 3: 60000:1503507866741
2017/08/23 17:03:26 [debug] 5#5: *9 reusable connection: 1
2017/08/23 17:03:26 [debug] 5#5: *9 epoll add event: fd:3 op:1 ev:80002001
2017/08/23 17:03:26 [debug] 5#5: accept() not ready (11: Resource temporarily unavailable)
2017/08/23 17:03:26 [debug] 5#5: *9 http wait request handler
2017/08/23 17:03:26 [debug] 5#5: *9 malloc: 000055DEEC1CBEC0:1024
2017/08/23 17:03:26 [debug] 5#5: *9 recv: eof:0, avail:1
2017/08/23 17:03:26 [debug] 5#5: *9 recv: fd:3 113 of 1024
2017/08/23 17:03:26 [debug] 5#5: *9 reusable connection: 0
2017/08/23 17:03:26 [debug] 5#5: *9 posix_memalign: 000055DEEC1D9E80:4096 @16
2017/08/23 17:03:26 [debug] 5#5: *9 http process request line
2017/08/23 17:03:26 [debug] 5#5: *9 http request line: "GET / HTTP/1.1"
2017/08/23 17:03:26 [debug] 5#5: *9 http uri: "/"
2017/08/23 17:03:26 [debug] 5#5: *9 http args: ""
2017/08/23 17:03:26 [debug] 5#5: *9 http exten: ""
2017/08/23 17:03:26 [debug] 5#5: *9 posix_memalign: 000055DEEC1DAEA0:4096 @16
2017/08/23 17:03:26 [debug] 5#5: *9 http process request header line
2017/08/23 17:03:26 [debug] 5#5: *9 http header: "Host: 10.24.1.56:80"
2017/08/23 17:03:26 [debug] 5#5: *9 http header: "User-Agent: Go-http-client/1.1"
2017/08/23 17:03:26 [debug] 5#5: *9 http header: "Accept-Encoding: gzip"
2017/08/23 17:03:26 [debug] 5#5: *9 http header: "Connection: close"
2017/08/23 17:03:26 [debug] 5#5: *9 http header done
2017/08/23 17:03:26 [debug] 5#5: *9 event timer del: 3: 1503507866741
2017/08/23 17:03:26 [debug] 5#5: *9 generic phase: 0
2017/08/23 17:03:26 [debug] 5#5: *9 rewrite phase: 1
2017/08/23 17:03:26 [debug] 5#5: *9 test location: "/"
2017/08/23 17:03:26 [debug] 5#5: *9 test location: ~ "\.php$"
2017/08/23 17:03:26 [debug] 5#5: *9 using configuration "/"
2017/08/23 17:03:26 [debug] 5#5: *9 http cl:-1 max:1048576
2017/08/23 17:03:26 [debug] 5#5: *9 rewrite phase: 3
2017/08/23 17:03:26 [debug] 5#5: *9 post rewrite phase: 4
2017/08/23 17:03:26 [debug] 5#5: *9 generic phase: 5
2017/08/23 17:03:26 [debug] 5#5: *9 generic phase: 6
2017/08/23 17:03:26 [debug] 5#5: *9 generic phase: 7
2017/08/23 17:03:26 [debug] 5#5: *9 access phase: 8
2017/08/23 17:03:26 [debug] 5#5: *9 access phase: 9
2017/08/23 17:03:26 [debug] 5#5: *9 access phase: 10
2017/08/23 17:03:26 [debug] 5#5: *9 post access phase: 11
2017/08/23 17:03:26 [debug] 5#5: *9 try files phase: 12
2017/08/23 17:03:26 [debug] 5#5: *9 http script var: "/"
2017/08/23 17:03:26 [debug] 5#5: *9 trying to use file: "/" "/var/www/html/"
2017/08/23 17:03:26 [debug] 5#5: *9 http script var: "/"
2017/08/23 17:03:26 [debug] 5#5: *9 trying to use dir: "/" "/var/www/html/"
2017/08/23 17:03:26 [debug] 5#5: *9 try file uri: "/"
2017/08/23 17:03:26 [debug] 5#5: *9 content phase: 13
2017/08/23 17:03:26 [debug] 5#5: *9 content phase: 14
2017/08/23 17:03:26 [debug] 5#5: *9 open index "/var/www/html/index.php"
2017/08/23 17:03:26 [debug] 5#5: *9 internal redirect: "/index.php?"
2017/08/23 17:03:26 [debug] 5#5: *9 rewrite phase: 1
2017/08/23 17:03:26 [debug] 5#5: *9 test location: "/"
2017/08/23 17:03:26 [debug] 5#5: *9 test location: ~ "\.php$"
2017/08/23 17:03:26 [debug] 5#5: *9 using configuration "\.php$"
2017/08/23 17:03:26 [debug] 5#5: *9 http cl:-1 max:1048576
2017/08/23 17:03:26 [debug] 5#5: *9 rewrite phase: 3
2017/08/23 17:03:26 [debug] 5#5: *9 post rewrite phase: 4
2017/08/23 17:03:26 [debug] 5#5: *9 generic phase: 5
2017/08/23 17:03:26 [debug] 5#5: *9 generic phase: 6
2017/08/23 17:03:26 [debug] 5#5: *9 generic phase: 7
2017/08/23 17:03:26 [debug] 5#5: *9 access phase: 8
2017/08/23 17:03:26 [debug] 5#5: *9 access phase: 9
2017/08/23 17:03:26 [debug] 5#5: *9 access phase: 10
2017/08/23 17:03:26 [debug] 5#5: *9 post access phase: 11
2017/08/23 17:03:26 [debug] 5#5: *9 try files phase: 12
2017/08/23 17:03:26 [debug] 5#5: *9 http init upstream, client timer: 0
2017/08/23 17:03:26 [debug] 5#5: *9 epoll add event: fd:3 op:3 ev:80002005
2017/08/23 17:03:26 [debug] 5#5: *9 http script copy: "GATEWAY_INTERFACE"
2017/08/23 17:03:26 [debug] 5#5: *9 http script copy: "CGI/1.1"
2017/08/23 17:03:26 [debug] 5#5: *9 fastcgi param: "GATEWAY_INTERFACE: CGI/1.1"
2017/08/23 17:03:26 [debug] 5#5: *9 http script copy: "SERVER_SOFTWARE"
2017/08/23 17:03:26 [debug] 5#5: *9 http script copy: "nginx"
2017/08/23 17:03:26 [debug] 5#5: *9 fastcgi param: "SERVER_SOFTWARE: nginx"
2017/08/23 17:03:26 [debug] 5#5: *9 http script copy: "QUERY_STRING"
2017/08/23 17:03:26 [debug] 5#5: *9 fastcgi param: "QUERY_STRING: "
2017/08/23 17:03:26 [debug] 5#5: *9 http script copy: "REQUEST_METHOD"
2017/08/23 17:03:26 [debug] 5#5: *9 http script var: "GET"
2017/08/23 17:03:26 [debug] 5#5: *9 fastcgi param: "REQUEST_METHOD: GET"
2017/08/23 17:03:26 [debug] 5#5: *9 http script copy: "CONTENT_TYPE"
2017/08/23 17:03:26 [debug] 5#5: *9 fastcgi param: "CONTENT_TYPE: "
2017/08/23 17:03:26 [debug] 5#5: *9 http script copy: "CONTENT_LENGTH"
2017/08/23 17:03:26 [debug] 5#5: *9 fastcgi param: "CONTENT_LENGTH: "
2017/08/23 17:03:26 [debug] 5#5: *9 http script copy: "SCRIPT_FILENAME"
2017/08/23 17:03:26 [debug] 5#5: *9 http script var: "/var/www/html"
2017/08/23 17:03:26 [debug] 5#5: *9 http script var: "/index.php"
2017/08/23 17:03:26 [debug] 5#5: *9 fastcgi param: "SCRIPT_FILENAME: /var/www/html/index.php"
2017/08/23 17:03:26 [debug] 5#5: *9 http script copy: "SCRIPT_NAME"
2017/08/23 17:03:26 [debug] 5#5: *9 http script var: "/index.php"
2017/08/23 17:03:26 [debug] 5#5: *9 fastcgi param: "SCRIPT_NAME: /index.php"
2017/08/23 17:03:26 [debug] 5#5: *9 http script copy: "REQUEST_URI"
2017/08/23 17:03:26 [debug] 5#5: *9 http script var: "/"
2017/08/23 17:03:26 [debug] 5#5: *9 fastcgi param: "REQUEST_URI: /"
2017/08/23 17:03:26 [debug] 5#5: *9 http script copy: "DOCUMENT_URI"
2017/08/23 17:03:26 [debug] 5#5: *9 http script var: "/index.php"
2017/08/23 17:03:26 [debug] 5#5: *9 fastcgi param: "DOCUMENT_URI: /index.php"
2017/08/23 17:03:26 [debug] 5#5: *9 http script copy: "DOCUMENT_ROOT"
2017/08/23 17:03:26 [debug] 5#5: *9 http script var: "/var/www/html"
2017/08/23 17:03:26 [debug] 5#5: *9 fastcgi param: "DOCUMENT_ROOT: /var/www/html"
2017/08/23 17:03:26 [debug] 5#5: *9 http script copy: "SERVER_PROTOCOL"
2017/08/23 17:03:26 [debug] 5#5: *9 http script var: "HTTP/1.1"
2017/08/23 17:03:26 [debug] 5#5: *9 fastcgi param: "SERVER_PROTOCOL: HTTP/1.1"
2017/08/23 17:03:26 [debug] 5#5: *9 http script copy: "REMOTE_ADDR"
2017/08/23 17:03:26 [debug] 5#5: *9 http script var: "10.24.1.1"
2017/08/23 17:03:26 [debug] 5#5: *9 fastcgi param: "REMOTE_ADDR: 10.24.1.1"
2017/08/23 17:03:26 [debug] 5#5: *9 http script copy: "REMOTE_PORT"
2017/08/23 17:03:26 [debug] 5#5: *9 http script var: "53422"
2017/08/23 17:03:26 [debug] 5#5: *9 fastcgi param: "REMOTE_PORT: 53422"
2017/08/23 17:03:26 [debug] 5#5: *9 http script copy: "SERVER_ADDR"
2017/08/23 17:03:26 [debug] 5#5: *9 http script var: "10.24.1.56"
2017/08/23 17:03:26 [debug] 5#5: *9 fastcgi param: "SERVER_ADDR: 10.24.1.56"
2017/08/23 17:03:26 [debug] 5#5: *9 http script copy: "SERVER_PORT"
2017/08/23 17:03:26 [debug] 5#5: *9 http script var: "80"
2017/08/23 17:03:26 [debug] 5#5: *9 fastcgi param: "SERVER_PORT: 80"
2017/08/23 17:03:26 [debug] 5#5: *9 http script copy: "SERVER_NAME"
2017/08/23 17:03:26 [debug] 5#5: *9 http script var: ""
2017/08/23 17:03:26 [debug] 5#5: *9 fastcgi param: "SERVER_NAME: "
2017/08/23 17:03:26 [debug] 5#5: *9 fastcgi param: "HTTP_HOST: 10.24.1.56:80"
2017/08/23 17:03:26 [debug] 5#5: *9 fastcgi param: "HTTP_USER_AGENT: Go-http-client/1.1"
2017/08/23 17:03:26 [debug] 5#5: *9 fastcgi param: "HTTP_ACCEPT_ENCODING: gzip"
2017/08/23 17:03:26 [debug] 5#5: *9 fastcgi param: "HTTP_CONNECTION: close"
2017/08/23 17:03:26 [debug] 5#5: *9 http cleanup add: 000055DEEC1DB990
2017/08/23 17:03:26 [debug] 5#5: *9 get rr peer, try: 2
2017/08/23 17:03:26 [debug] 5#5: *9 get rr peer, current: 000055DEEC1EA890 -1
2017/08/23 17:03:26 [debug] 5#5: *9 stream socket 12
2017/08/23 17:03:26 [debug] 5#5: *9 epoll add connection: fd:12 ev:80002005
2017/08/23 17:03:26 [debug] 5#5: *9 connect to 127.0.0.1:9000, fd:12 #10
2017/08/23 17:03:26 [debug] 5#5: *9 http upstream connect: -2
2017/08/23 17:03:26 [debug] 5#5: *9 posix_memalign: 000055DEEC1F1460:128 @16
2017/08/23 17:03:26 [debug] 5#5: *9 event timer add: 12: 60000:1503507866742
2017/08/23 17:03:26 [debug] 5#5: *9 http finalize request: -4, "/index.php?" a:1, c:3
2017/08/23 17:03:26 [debug] 5#5: *9 http request count:3 blk:0
2017/08/23 17:03:26 [debug] 5#5: *9 http finalize request: -4, "/index.php?" a:1, c:2
2017/08/23 17:03:26 [debug] 5#5: *9 http request count:2 blk:0
2017/08/23 17:03:26 [debug] 5#5: *9 http run request: "/index.php?"
2017/08/23 17:03:26 [debug] 5#5: *9 http upstream check client, write event:1, "/index.php"
2017/08/23 17:03:26 [debug] 5#5: *9 http upstream request: "/index.php?"
2017/08/23 17:03:26 [debug] 5#5: *9 http upstream send request handler
2017/08/23 17:03:26 [debug] 5#5: *9 http upstream send request
2017/08/23 17:03:26 [debug] 5#5: *9 http upstream send request body
2017/08/23 17:03:26 [debug] 5#5: *9 chain writer buf fl:0 s:504
2017/08/23 17:03:26 [debug] 5#5: *9 chain writer in: 000055DEEC1DB9D0
2017/08/23 17:03:26 [debug] 5#5: *9 writev: 504 of 504
2017/08/23 17:03:26 [debug] 5#5: *9 chain writer out: 0000000000000000
2017/08/23 17:03:26 [debug] 5#5: *9 event timer del: 12: 1503507866742
2017/08/23 17:03:26 [debug] 5#5: *9 event timer add: 12: 60000:1503507866742
2017/08/23 17:03:27 [debug] 5#5: *9 http run request: "/index.php?"
2017/08/23 17:03:27 [debug] 5#5: *9 http upstream check client, write event:0, "/index.php"
2017/08/23 17:03:27 [info] 5#5: *9 epoll_wait() reported that client prematurely closed connection, so upstream connection is closed too while sending request to upstream, client: 10.24.1.1, server: , request: "GET / HTTP/1.1", upstream: "fastcgi://127.0.0.1:9000", host: "10.24.1.56:80"
2017/08/23 17:03:27 [debug] 5#5: *9 finalize http upstream request: 499
2017/08/23 17:03:27 [debug] 5#5: *9 finalize http fastcgi request
2017/08/23 17:03:27 [debug] 5#5: *9 free rr peer 2 0
2017/08/23 17:03:27 [debug] 5#5: *9 close http upstream connection: 12
2017/08/23 17:03:27 [debug] 5#5: *9 free: 000055DEEC1F1460, unused: 48
2017/08/23 17:03:27 [debug] 5#5: *9 event timer del: 12: 1503507866742
2017/08/23 17:03:27 [debug] 5#5: *9 reusable connection: 0
2017/08/23 17:03:27 [debug] 5#5: *9 http finalize request: 499, "/index.php?" a:1, c:1
2017/08/23 17:03:27 [debug] 5#5: *9 http terminate request count:1
2017/08/23 17:03:27 [debug] 5#5: *9 http terminate cleanup count:1 blk:0
2017/08/23 17:03:27 [debug] 5#5: *9 http posted request: "/index.php?"
2017/08/23 17:03:27 [debug] 5#5: *9 http terminate handler count:1
2017/08/23 17:03:27 [debug] 5#5: *9 http request count:1 blk:0
2017/08/23 17:03:27 [debug] 5#5: *9 http close request
2017/08/23 17:03:27 [debug] 5#5: *9 http log handler
2017/08/23 17:03:27 [debug] 5#5: *9 free: 000055DEEC1D9E80, unused: 0
2017/08/23 17:03:27 [debug] 5#5: *9 free: 000055DEEC1DAEA0, unused: 1089
2017/08/23 17:03:27 [debug] 5#5: *9 close http connection: 3
2017/08/23 17:03:27 [debug] 5#5: *9 reusable connection: 0
2017/08/23 17:03:27 [debug] 5#5: *9 free: 000055DEEC1CBEC0
2017/08/23 17:03:27 [debug] 5#5: *9 free: 000055DEEC1D5BE0, unused: 120

And according to https://stackoverflow.com/questions/15613452/nginx-issues-http-499-error-after-60-seconds-despite-config-php-and-aws#answer-15621223 it sounds like the probe is closing the connection....

Hmm

@jiaxuanzhou
Contributor

@chrissound
What happens when the "host" is set to "127.0.0.1"?

@chrissound
Author

The issue was that the container was taking longer than a second to respond.

@saurabhdevops

@chrissound Would you mind suggesting what you did to resolve this? I am facing a similar issue.

@mgalgs

mgalgs commented Jan 17, 2018

@saurabhdevops You can increase the probe timeout if you don't expect the container to respond within 1 second (livenessProbe.timeoutSeconds).
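
A minimal sketch of that change, applied to the readiness probe from the original report (timeoutSeconds defaults to 1 second when unset, so any response slower than that counts as a failure; the values below are illustrative, not a recommendation):

          readinessProbe:
            httpGet:
              path: /
              port: 80
            initialDelaySeconds: 5
            periodSeconds: 5
            timeoutSeconds: 5      # default is 1; allow the container up to 5s to answer
            failureThreshold: 3    # default; the probe must fail this many times in a row before the pod is marked unready

The same timeoutSeconds field exists on livenessProbe; note that a persistently failing liveness probe restarts the container, whereas a failing readiness probe only removes the pod from Service endpoints.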

@Macmee

Macmee commented Sep 29, 2018

sorry to bump this, but I'm on EKS (if it matters) and my nginx ingress readiness probes are taking more than 1 second to respond too. Sometimes they hang upwards of 5-8 seconds and then respond. I increased livenessProbe.timeoutSeconds but I'm trying to figure out why this delay occurs. Does anyone here have any suggestions?

@tproenca

tproenca commented Oct 9, 2018

I'm having the same issue and I have increased the timeoutSeconds to 5.

@JorgenG

JorgenG commented Nov 19, 2018

@Macmee Having the same strange issue here. Basically we started using NodePort, and it sporadically times out the readiness / liveness probe checks.

When I inspect our monitoring metrics, the endpoints used for this have a maximum response time of 4.5ms; in other words, something is stalling the connection into the container/pod somehow.

The way I run it is that two services attach to the container, one headless and one http. I'm curious if that could be part of the problem?

Have you learned anything more about the issue?
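
For context, a headless Service plus a regular Service selecting the same pods would look roughly like the sketch below (names and ports are illustrative, not taken from this thread). Note that the kubelet runs httpGet readiness/liveness probes directly against the pod IP, so neither Service is on the probe path.

apiVersion: v1
kind: Service
metadata:
  name: myapp-headless        # hypothetical name
spec:
  clusterIP: None             # headless: DNS resolves straight to the pod IPs
  selector:
    app: myapp
  ports:
    - port: 80
---
apiVersion: v1
kind: Service
metadata:
  name: myapp                 # hypothetical name
spec:
  type: NodePort              # matches the NodePort setup mentioned above
  selector:
    app: myapp
  ports:
    - port: 80
      targetPort: 80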

@stafot
Contributor

stafot commented Dec 4, 2018

Same experience here on EKS too, c5 instances.

@nomcopter

Seeing this issue in GCP as well. Not sure where this stall is coming from.

@wwyhy

wwyhy commented Feb 18, 2019

Same experience here on EKS too, c5 instances.

We are using the same C5 instances, but not EKS; we deploy k8s with kops.

We set timeoutSeconds to 5, no help.

@futare

futare commented Mar 12, 2019

Same experience. I'm using kops on AWS v1.10.6, with timeoutSeconds set to 5 (I tried higher also), but metrics show sub-second response times. Using Tomcat in the pod. I get 200s in the Tomcat access logs, and externally the app responds fine. I can also reach it fine from the kubelet host. Using r5 instances. Does anyone here have suggestions?

@blaketastic2

Did anyone ever solve this?

@mbelang

mbelang commented May 31, 2019

Same problem with m5 instances on EKS (I don't think EKS is related), k8s 1.12.x and a gunicorn app. If I expose the Flask server directly the issue doesn't appear anymore...

What is super odd is that I see a log line from the pod, and it is coming from the probe:

2019-05-31T13:16:46.863528Z {"url": "/healthcheck", "event": "Healthcheck", "timestamp": "2019-05-31T13:16:46.863284Z", "logger": "demo.main", "level": "info"}

We have other services running uvicorn and gunicorn with no problem.

@cattermo

Did anyone find a solution to this?

@gitnik

gitnik commented Nov 18, 2019

Any updates on this?

@AndrewGrachov

Ran into this issue today. Really strange behaviour.

@Biga-incorta

Any updates?

@rohitrd0987

Same issue; this is affecting only one of my GCP projects. I have 3 private GKE clusters and only 2 of them are being affected. The same code running in different projects does not behave the same. Only one container is failing the readiness/liveness probes, and this is seen with multiple pods across different namespaces. In the kube-system namespace, the kube-dns, l7-default-backend, and heapster pods are restarting with the same behavior. Any insight is much appreciated.

@ervinb

ervinb commented Jan 13, 2020

Try doubling your CPU requests to see if that helps. I saw readiness probes failing due to lack of resources.
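
As a rough illustration (the numbers are placeholders, not taken from this thread), CPU requests and limits are set per container in the pod spec; a container that is throttled or starved on CPU can answer probes slowly enough to exceed the probe timeout:

          resources:
            requests:
              cpu: 500m        # e.g. doubled from a hypothetical 250m
              memory: 256Mi
            limits:
              cpu: "1"
              memory: 512Mi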

@muscovitebob

I have encountered this with a resource-limited namespace that was close to capacity (unsure if this was causative). Setting the timeout as suggested here fixed it.

@vanhaxl

vanhaxl commented Jan 17, 2020

Hi everyone. I wonder how we can know that the root cause is resources?
This issue is currently happening to our software. I checked CPU usage and it is only at 40%, and memory at 36%.
Is this a case of thread starvation?

@vanhaxl

vanhaxl commented Jan 17, 2020

To add more info: the service is still available when I make a REST call from Postman. But within the Kubernetes ecosystem, when other services call it, the unhealthy pods take a long time to respond, which causes a timeout exception.

@rockysingh

To add more info: the service is still available when I make a REST call from Postman. But within the Kubernetes ecosystem, when other services call it, the unhealthy pods take a long time to respond, which causes a timeout exception.

did you resolve this issue?

@vanhaxl

vanhaxl commented Jan 22, 2020

Hi rockysingh, not yet.

@vanhaxl

vanhaxl commented Jan 22, 2020

I tried to increase the timeout (twice) in RestTemplate when calling the API on the unhealthy pods, but I still received some timeout exceptions. Still looking for the root cause of this.

@vatsramkesh

Facing the same issue here: Readiness probe failed: Get http://127.0.0.1:8085/healthz: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers), even though the service returns a 200 response.

@Nittarab

I have the same issue... It is not a resources problem.

I can see the correct response in the pod's logs:

10.48.8.1 - - [25/Feb/2020:12:30:05 +0000] "GET /wp-login.php HTTP/1.1" 200 1340
10.48.8.1 - - [25/Feb/2020:12:30:09 +0000] "GET /wp-login.php HTTP/1.1" 200 1340
10.156.0.12 - - [25/Feb/2020:12:30:11 +0000] "GET /wp-login.php HTTP/1.1" 200 3206

But if I describe the pod:

Events:
  Type     Reason     Age                   From                                          Message
  ----     ------     ----                  ----                                          -------
  Normal   Scheduled  3m6s                  default-scheduler                             Successfully assigned default/prod-shopmates-crm-74db965897-g9xzl to gke-shop-mates-pool-3-10336c2f-q5jf
  Warning  Unhealthy  105s (x4 over 2m15s)  kubelet, gke-shop-mates-pool-3-10336c2f-q5jf  Readiness probe failed: Get http://10.48.8.21:80/wp-login.php: dial tcp 10.48.8.21:80: connect: connection refused
  Warning  Unhealthy  101s                  kubelet, gke-shop-mates-pool-3-10336c2f-q5jf  Liveness probe failed: Get http://10.48.8.21:80/wp-login.php: dial tcp 10.48.8.21:80: connect: connection refused
  Warning  Unhealthy  25s (x6 over 75s)     kubelet, gke-shop-mates-pool-3-10336c2f-q5jf  Readiness probe failed: Get http://10.48.8.21:80/wp-login.php: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
  Warning  Unhealthy  21s (x6 over 71s)     kubelet, gke-shop-mates-pool-3-10336c2f-q5jf  Liveness probe failed: Get http://10.48.8.21:80/wp-login.php: net/http: request canceled (Client.Timeout exceeded while awaiting headers)

        livenessProbe:
          failureThreshold: 6
          httpGet:
            path: /wp-login.php
            port: 80
            scheme: HTTP
          initialDelaySeconds: 80
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 10
        readinessProbe:
          failureThreshold: 6
          httpGet:
            path: /wp-login.php
            port: 80
            scheme: HTTP
          initialDelaySeconds: 40
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 10



Server Version: version.Info{Major:"1", Minor:"15+", GitVersion:"v1.15.9-gke.8", GitCommit:"a9973cbb2722793e2ea08d20880633ca61d3e669", GitTreeState:"clean", BuildDate:"2020-02-07T00:50:57Z", GoVersion:"go1.12.12b4", Compiler:"gc", Platform:"linux/amd64"}

@yuripastushenko

/triage unresolved

@rockysingh

It is worth checking if the pod's CPU spikes on startup.

@rajeshkumaratappviewx

rajeshkumaratappviewx commented Nov 9, 2020

Started seeing the problem when there was very high CPU and memory usage during performance testing. We continued testing after increasing the liveness and readiness timeouts, since it worked after that change, but it later started failing again even with the increased values. Some of the pods fail to register as services in Kubernetes even though the Docker service is up properly.

Looking for any help to understand what causes this problem.

Note: MongoDB also runs in the same setup and is consuming almost all the memory; it runs as a StatefulSet. After a few more rounds of testing, even the MongoDB pods went into CrashLoopBackOff due to liveness failures.

@hs18manish

I am also facing this issue on k8s versions 1.26, 1.27 and 1.28.

NAME="Rocky Linux"
VERSION="8.9 (Green Obsidian)"
ID="rocky"
ID_LIKE="rhel centos fedora"
VERSION_ID="8.9"
PLATFORM_ID="platform:el8"
PRETTY_NAME="Rocky Linux 8.9 (Green Obsidian)"
ANSI_COLOR="0;32"
LOGO="fedora-logo-icon"
CPE_NAME="cpe:/o:rocky:rocky:8:GA"
HOME_URL="https://rockylinux.org/"
BUG_REPORT_URL="https://bugs.rockylinux.org/"
SUPPORT_END="2029-05-31"
ROCKY_SUPPORT_PRODUCT="Rocky-Linux-8"
ROCKY_SUPPORT_PRODUCT_VERSION="8.9"
REDHAT_SUPPORT_PRODUCT="Rocky Linux"
REDHAT_SUPPORT_PRODUCT_VERSION="8.9"
