Readiness probe failed: net/http: request canceled (Client.Timeout exceeded while awaiting headers) #88555

Closed
Nittarab opened this issue Feb 26, 2020 · 10 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. sig/network Categorizes an issue or PR as relevant to SIG Network. triage/unresolved Indicates an issue that can not or will not be resolved.

Comments

@Nittarab

Nittarab commented Feb 26, 2020

What happened:

Readiness probe failed: Get http://10.48.8.31:80/wp-login.php: net/http: request canceled (Client.Timeout exceeded while awaiting headers)

My pod configuration:

        ports:
        - containerPort: 80
          name: http
          protocol: TCP
        - containerPort: 433
          name: https
          protocol: TCP
        livenessProbe:
          failureThreshold: 6
          httpGet:
            path: /wp-login.php
            port: http
            scheme: HTTP
          initialDelaySeconds: 120
          periodSeconds: 30
          successThreshold: 1
          timeoutSeconds: 10
        readinessProbe:
          failureThreshold: 6
          httpGet:
            path: /wp-login.php
            port: http
            scheme: HTTP
          initialDelaySeconds: 60
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 30

It is a Helm chart based on stable/wordpress.
If I run kubectl describe pod [pod-name], I get:

Events:
  Type     Reason                  Age                   From                                          Message
  ----     ------                  ----                  ----                                          -------
  Normal   Scheduled               23m                   default-scheduler                             Successfully assigned default/prod-shopmates-crm-bb5c5d8b7-f9rjs to gke-shop-mates-pool-3-10336c2f-q5jf
  Normal   SuccessfulAttachVolume  23m                   attachdetach-controller                       AttachVolume.Attach succeeded for volume "pvc-7475aa53-299b-11e9-b850-42010a9c0105"
  Warning  Unhealthy               18m (x2 over 19m)     kubelet, gke-shop-mates-pool-3-10336c2f-q5jf  Liveness probe failed: Get http://10.48.8.33:80/wp-login.php: EOF
  Normal   Killing                 17m                   kubelet, gke-shop-mates-pool-3-10336c2f-q5jf  Container prod-shopmates-crm failed liveness probe, will be restarted
  Warning  Unhealthy               17m (x2 over 18m)     kubelet, gke-shop-mates-pool-3-10336c2f-q5jf  Readiness probe failed: Get http://10.48.8.33:80/wp-login.php: EOF
  Normal   Pulled                  17m (x2 over 23m)     kubelet, gke-shop-mates-pool-3-10336c2f-q5jf  Container image "docker.io/pixdev/shopmates-crm:v1.0.1-wp4.9.8-9" already present on machine
  Normal   Created                 17m (x2 over 23m)     kubelet, gke-shop-mates-pool-3-10336c2f-q5jf  Created container prod-shopmates-crm
  Normal   Started                 17m (x2 over 23m)     kubelet, gke-shop-mates-pool-3-10336c2f-q5jf  Started container prod-shopmates-crm
  Warning  Unhealthy               13m (x7 over 20m)     kubelet, gke-shop-mates-pool-3-10336c2f-q5jf  Liveness probe failed: Get http://10.48.8.33:80/wp-login.php: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
  Warning  Unhealthy               7m48s (x12 over 20m)  kubelet, gke-shop-mates-pool-3-10336c2f-q5jf  Readiness probe failed: Get http://10.48.8.33:80/wp-login.php: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
  Warning  Unhealthy               3m11s (x11 over 22m)  kubelet, gke-shop-mates-pool-3-10336c2f-q5jf  Readiness probe failed: Get http://10.48.8.33:80/wp-login.php: dial tcp 10.48.8.33:80: connect: connection refused

This issue suggests increasing timeoutSeconds; I tried increasing it to 40 without success.

This is a good way to debug the error:
https://stackoverflow.com/questions/53067308/readiness-probe-failed-get-http-10-32-1-7180-setting-s-net-http-request-ca

I tried running this command: kubectl exec -t [another_pod] -- curl -I [pod IP]
In my case: kubectl exec -t barsanti-tv-shopmates-67f7958444-84hq7 -- curl -I http://10.48.8.31:80/wp-login.php

 % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:--  0:00:28 --:--:--     0
HTTP/1.1 200 OK
Date: Tue, 25 Feb 2020 20:49:51 GMT
Server: Apache/2.4.37 (Unix) OpenSSL/1.1.0j PHP/7.1.24
X-Powered-By: PHP/7.1.24
Expires: Wed, 11 Jan 1984 05:00:00 GMT
Cache-Control: no-cache, must-revalidate, max-age=0
Set-Cookie: wordpress_test_cookie=WP+Cookie+check; path=/
X-Frame-Options: SAMEORIGIN
Content-Type: text/html; charset=UTF-8

Also in the Pod logs I can see the requests and the 200 responses:

10.48.8.1 - - [25/Feb/2020:20:49:44 +0000] "GET /wp-login.php HTTP/1.1" 200 1341
10.156.0.16 - - [25/Feb/2020:20:49:52 +0000] "GET /wp-login.php HTTP/1.1" 200 3206
10.48.8.14 - - [25/Feb/2020:20:49:51 +0000] "HEAD /wp-login.php HTTP/1.1" 200 -
10.156.0.20 - - [25/Feb/2020:20:49:52 +0000] "GET /wp-login.php HTTP/1.1" 200 3206
10.48.8.1 - - [25/Feb/2020:20:49:54 +0000] "GET /wp-login.php HTTP/1.1" 200 1341
10.156.0.12 - - [25/Feb/2020:20:49:51 +0000] "GET /wp-login.php HTTP/1.1" 200 3206
10.156.0.14 - - [25/Feb/2020:20:49:52 +0000] "GET /wp-login.php HTTP/1.1" 200 3206
10.156.0.6 - - [25/Feb/2020:20:49:59 +0000] "GET /wp-login.php HTTP/1.1" 200 3195
10.156.0.5 - - [25/Feb/2020:20:49:58 +0000] "GET /wp-login.php HTTP/1.1" 200 3195
10.156.0.19 - - [25/Feb/2020:20:49:59 +0000] "GET /wp-login.php HTTP/1.1" 200 3206
10.156.0.12 - - [25/Feb/2020:20:49:59 +0000] "GET /wp-login.php HTTP/1.1" 200 3206
10.48.8.1 - - [25/Feb/2020:20:51:02 +0000] "GET /wp-login.php HTTP/1.1" 200 1341
10.156.0.12 - - [25/Feb/2020:20:51:01 +0000] "GET /wp-login.php HTTP/1.1" 200 3206
10.156.0.15 - - [25/Feb/2020:20:52:01 +0000] "GET /wp-login.php HTTP/1.1" 200 3206

What you expected to happen:
Readiness should not fail...

How to reproduce it (as minimally and precisely as possible):
I really don't know how to reproduce the error...

Anything else we need to know?:

Environment:

  • Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.3", GitCommit:"06ad960bfd03b39c8310aaf92d1e7c12ce618213", GitTreeState:"clean", BuildDate:"2020-02-11T18:14:22Z", GoVersion:"go1.13.6", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"15+", GitVersion:"v1.15.9-gke.8", GitCommit:"a9973cbb2722793e2ea08d20880633ca61d3e669", GitTreeState:"clean", BuildDate:"2020-02-07T00:50:57Z", GoVersion:"go1.12.12b4", Compiler:"gc", Platform:"linux/amd64"}

  • Cloud provider or hardware configuration: GKE
  • Network plugin and version (if this is a network-related bug): no

/sig network

Nittarab added the kind/bug label Feb 26, 2020
k8s-ci-robot added the needs-sig and sig/network labels Feb 26, 2020
@k8s-ci-robot
Contributor

@Nittarab: The label(s) sig/ cannot be applied, because the repository doesn't have them


k8s-ci-robot removed the needs-sig label Feb 26, 2020
@athenabot

/triage unresolved

Comment /remove-triage unresolved when the issue is assessed and confirmed.

🤖 I am a bot run by vllry. 👩‍🔬

k8s-ci-robot added the triage/unresolved label Feb 26, 2020
@Adithya-copart

@Nittarab I may have run into this issue as well.
Is this happening frequently on your end?

Thanks!

@Nittarab
Author

Nittarab commented Feb 26, 2020

@Adithya-copart At the moment I am unable to start one of the most important pods in my system. It happens all the time...

@Nittarab
Author

I tried a simple Docker container (https://hub.docker.com/r/jmalloc/echo-server/), and the readiness probe no longer fails. So it is not a Kubernetes bug...

@funding-invoice-admin

funding-invoice-admin commented Jun 28, 2020

Sorry, can you tell me how you resolved the error in the end?

@grvcurefit

@Nittarab Were you able to resolve it finally?

@Nittarab
Author

Hi, sorry for my late reply. In the end, it was a resource issue. The container was a WordPress application, and during initialization it needed more resources than its resource limit allowed. As a result, it did not respond within the probe timeout, and the probe failed with "Client.Timeout".
It was not easy to analyze the problem, because the metrics on Kubernetes are sometimes hard to read. Also, the extra resources were only needed while the container was initializing; once WordPress was running, it used a third of the resources.

@grvcurefit

@Nittarab
I'm also facing a similar issue, but in my case a call to MongoDB times out.
Rolling over the pods fixes it every time.

@sun-mir

sun-mir commented Nov 24, 2021

Here's the relevant issue that has a workaround for this problem, plus some deep-dive details about the root cause.
#89898 (comment)


7 participants