
coredns pods connect to coredns service timeout #4674

Closed
rootdeep opened this issue Apr 29, 2019 · 12 comments
Labels
kind/bug lifecycle/rotten

Comments


@rootdeep rootdeep commented Apr 29, 2019

Environment:

  • Cloud provider or hardware configuration:

  • OS:
    Linux 3.10.0-693.el7.x86_64 x86_64
    NAME="CentOS Linux"
    VERSION="7 (Core)"
    ID="centos"
    ID_LIKE="rhel fedora"
    VERSION_ID="7"
    PRETTY_NAME="CentOS Linux 7 (Core)"
    ANSI_COLOR="0;31"
    CPE_NAME="cpe:/o:centos:centos:7"
    HOME_URL="https://www.centos.org/"
    BUG_REPORT_URL="https://bugs.centos.org/"

    CENTOS_MANTISBT_PROJECT="CentOS-7"
    CENTOS_MANTISBT_PROJECT_VERSION="7"
    REDHAT_SUPPORT_PRODUCT="centos"
    REDHAT_SUPPORT_PRODUCT_VERSION="7"

  • Version of Ansible (ansible --version):
    ansible 2.7.10

  • Kubespray version (commit) (git rev-parse --short HEAD):
    tag 2.9.0
  • Kubernetes version: v1.13.5
  • Network plugin used:
    calico

I used Kubespray to deploy a Kubernetes v1.13.5 cluster, but an error occurs. Below are the CoreDNS version and the error log.

Version of CoreDNS: v1.4.0 (also tested with coredns v1.5.0)

Corefile:
apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns
  namespace: kube-system
  labels:
    addonmanager.kubernetes.io/mode: EnsureExists
data:
  Corefile: |
    .:53 {
        errors
        health
        kubernetes cluster.local in-addr.arpa ip6.arpa {
            pods insecure
            upstream /etc/resolv.conf
            fallthrough in-addr.arpa ip6.arpa
        }
        prometheus :9153
        forward . /etc/resolv.conf
        cache 30
        loop
        reload
        loadbalance
    }

Logs:

[root@k8s-node-1 ~]# kubectl logs -f -n kube-system coredns-9d85fb698-tnrgn
.:53
2019-04-29T12:26:42.180Z [INFO] plugin/reload: Running configuration MD5 = 1335ba7188be742fe37cd05805faa0fa
2019-04-29T12:26:42.180Z [INFO] CoreDNS-1.5.0
2019-04-29T12:26:42.180Z [INFO] linux/amd64, go1.12.2, e3f9a80
CoreDNS-1.5.0
linux/amd64, go1.12.2, e3f9a80
2019-04-29T12:26:48.181Z [ERROR] plugin/errors: 2 8373768935828175380.8715076686105595443. HINFO: read udp 10.233.113.56:51809->10.233.0.3:53: i/o timeout
2019-04-29T12:26:51.181Z [ERROR] plugin/errors: 2 8373768935828175380.8715076686105595443. HINFO: read udp 10.233.113.56:52463->10.233.0.3:53: i/o timeout
2019-04-29T12:26:52.181Z [ERROR] plugin/errors: 2 8373768935828175380.8715076686105595443. HINFO: read udp 10.233.113.56:44654->10.233.0.3:53: i/o timeout
2019-04-29T12:26:53.181Z [ERROR] plugin/errors: 2 8373768935828175380.8715076686105595443. HINFO: read udp 10.233.113.56:35028->10.233.0.3:53: i/o timeout
2019-04-29T12:26:56.181Z [ERROR] plugin/errors: 2 8373768935828175380.8715076686105595443. HINFO: read udp 10.233.113.56:44331->10.233.0.3:53: i/o timeout
2019-04-29T12:26:59.182Z [ERROR] plugin/errors: 2 8373768935828175380.8715076686105595443. HINFO: read udp 10.233.113.56:38640->10.233.0.3:53: i/o timeout
2019-04-29T12:27:02.182Z [ERROR] plugin/errors: 2 8373768935828175380.8715076686105595443. HINFO: read udp 10.233.113.56:57424->10.233.0.3:53: i/o timeout
2019-04-29T12:27:05.182Z [ERROR] plugin/errors: 2 8373768935828175380.8715076686105595443. HINFO: read udp 10.233.113.56:56166->10.233.0.3:53: i/o timeout
2019-04-29T12:27:08.182Z [ERROR] plugin/errors: 2 8373768935828175380.8715076686105595443. HINFO: read udp 10.233.113.56:59509->10.233.0.3:53: i/o timeout
2019-04-29T12:27:11.183Z [ERROR] plugin/errors: 2 8373768935828175380.8715076686105595443. HINFO: read udp 10.233.113.56:56157->10.233.0.3:53: i/o timeout

[screenshots of iptables rule output omitted]

10.233.0.3:53 is the coredns service IP and 10.233.113.56 is the coredns pod IP.
When I configure /etc/resolv.conf on the host the coredns pod is running on, the pod comes up; but when /etc/resolv.conf is left empty, the pod does not run, and the coredns log reports timeouts.
From the iptables rule output, the coredns service has no endpoints.

The 'timeout' and 'no endpoints' symptoms look like a deadlock.
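As a first diagnostic pass (a sketch, not installer-specific advice: the service name and the k8s-app=kube-dns label are the usual defaults for a Kubespray/kubeadm CoreDNS deployment, but they vary between installers), it can help to confirm the endpoint and iptables state directly:

```shell
# Does the coredns service have any ready endpoints?
kubectl -n kube-system get endpoints coredns

# Are the coredns pods Running and Ready, and on which nodes?
kubectl -n kube-system get pods -l k8s-app=kube-dns -o wide

# On each node: a FORWARD chain policy of DROP breaks pod-to-service traffic
sudo iptables -S FORWARD | head -n 1
```

The last command prints the chain policy line (e.g. "-P FORWARD DROP"), which is the symptom several commenters below ran into.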

@rootdeep rootdeep added the kind/bug label Apr 29, 2019
@woopstar woopstar (Member) commented Apr 29, 2019

What is the content of /etc/resolv.conf on the host?

@rootdeep rootdeep (Author) commented Apr 29, 2019

What is the content of /etc/resolv.conf on the host?

@woopstar The content is my local DNS server's IP, e.g. "nameserver 10.19.8.10"; this DNS server is active.


@debarshibasak debarshibasak commented Apr 29, 2019

We are facing the same issue with a two-replica coredns deployment: one of the coredns pods throws the same error.


@jesusmariajurado jesusmariajurado commented Jul 1, 2019

I was using microk8s, which showed this warning:

WARNING: IPtables FORWARD policy is DROP. Consider enabling traffic forwarding with: sudo iptables -P FORWARD ACCEPT

Fixing it worked for me.
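For anyone copy-pasting, a minimal sketch of that fix, assuming root on each node (note that changing the policy this way is not persistent across reboots, and Docker may reset it when it restarts):

```shell
# Switch the default FORWARD policy from DROP to ACCEPT
sudo iptables -P FORWARD ACCEPT

# Verify: should print "-P FORWARD ACCEPT"
sudo iptables -S FORWARD | head -n 1
```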


@crystaldust crystaldust commented Jul 27, 2019

I was using microk8s, which showed this warning:

WARNING: IPtables FORWARD policy is DROP. Consider enabling traffic forwarding with: sudo iptables -P FORWARD ACCEPT

Fixing it worked for me.

You saved my life, bro! I was trying to set up a k8s cluster on several Raspberry Pis, hit the same problem, and tried everything I could; in the end it was the iptables problem, fixed by setting the FORWARD policy to ACCEPT.


@crystaldust crystaldust commented Jul 31, 2019

Following @jeremyrickard's comments, another thing to note is that Docker sets iptables rules that change the FORWARD policy to DROP. So if you are using systemd to manage Docker, remember to add the command 'iptables -w -P FORWARD ACCEPT' as a post-start step in the docker.service unit.

Hope this helps.
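A sketch of that systemd approach (the drop-in path and the iptables binary location are assumptions; on some distributions iptables lives in /sbin rather than /usr/sbin, so check with `which iptables`):

```shell
# Create a drop-in that re-applies the FORWARD policy after Docker starts,
# since Docker's own rules reset FORWARD to DROP
sudo mkdir -p /etc/systemd/system/docker.service.d
sudo tee /etc/systemd/system/docker.service.d/10-forward-accept.conf <<'EOF'
[Service]
ExecStartPost=/usr/sbin/iptables -w -P FORWARD ACCEPT
EOF

# Reload systemd and restart Docker so the drop-in takes effect
sudo systemctl daemon-reload
sudo systemctl restart docker
```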

@idcrook idcrook (Contributor) commented Aug 18, 2019

I was using microk8s, which showed this warning:

WARNING: IPtables FORWARD policy is DROP. Consider enabling traffic forwarding with: sudo iptables -P FORWARD ACCEPT

Fixing it worked for me.

The iptables command also fixed DNS failures seen on a recent kubeadm-based cluster on Raspberry Pi 3/4 running Raspbian Buster (Docker version 19.03.1).


@fejta-bot fejta-bot commented Nov 16, 2019

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale label Nov 16, 2019

@nilbacardit26 nilbacardit26 commented Nov 19, 2019

Just for anyone looking into the same issue: I am running a cluster of 4 Raspberry Pi 4 boards and encountered this issue on each node. @jesusmariajurado was right about the solution, but it was not enough for me; other rules were preventing my pods from reaching external IPs. Flushing all rules, deleting all chains, and accepting everything worked for me, as described in: https://www.digitalocean.com/community/tutorials/how-to-list-and-delete-iptables-firewall-rules

What I am observing now is kubernetes/kubernetes#82361, running on 1.16.2. I am going to update k8s to 1.17, where it seems to be solved.

It might also be worth noting (for others having coredns problems such as connection timeouts or failing name resolution) that the nameserver defined in /etc/resolv.conf must be an upstream server, or it will create a forwarding loop: https://coredns.io/plugins/loop/#troubleshooting
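To illustrate that loop point with a hedged Corefile sketch (10.19.8.10 is the local DNS server mentioned earlier in this thread, standing in for any resolver that is reachable from the nodes and is not the cluster DNS itself), the forward plugin can point at an explicit upstream instead of /etc/resolv.conf:

```
.:53 {
    errors
    health
    kubernetes cluster.local in-addr.arpa ip6.arpa {
        pods insecure
        fallthrough in-addr.arpa ip6.arpa
    }
    # forward to an explicit external resolver, never back to the cluster DNS
    forward . 10.19.8.10
    cache 30
    loop
    reload
    loadbalance
}
```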


@fejta-bot fejta-bot commented Dec 20, 2019

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten and removed lifecycle/stale labels Dec 20, 2019

@fejta-bot fejta-bot commented Jan 19, 2020

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot k8s-ci-robot (Contributor) commented Jan 19, 2020

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
