Unable to Connect to CoreDNS #3148

Closed
wangzheyuan opened this issue Jul 13, 2022 · 1 comment

wangzheyuan commented Jul 13, 2022

Hi, I have an RKE2 cluster. I disabled the firewall and SELinux on every node, but pods on agent-gpu can't resolve hostnames.
Environmental Info:

[root@istio-245 ~]# kubectl get node -o wide
NAME        STATUS   ROLES                       AGE   VERSION           INTERNAL-IP     EXTERNAL-IP   OS-IMAGE                           KERNEL-VERSION                 CONTAINER-RUNTIME
agent-gpu   Ready    <none>                      40h   v1.21.14+rke2r1   192.168.186.6   <none>        CentOS Linux 8                     4.18.0-305.12.1.el8_4.x86_64   containerd://1.4.13-k3s1
istio-245   Ready    control-plane,etcd,master   41h   v1.21.14+rke2r1   172.16.40.245   <none>        Rocky Linux 8.6 (Green Obsidian)   4.18.0-372.9.1.el8.x86_64      containerd://1.4.13-k3s1
istio-246   Ready    <none>                      41h   v1.21.14+rke2r1   172.16.40.246   <none>        Rocky Linux 8.6 (Green Obsidian)   4.18.0-372.9.1.el8.x86_64      containerd://1.4.13-k3s1
istio-247   Ready    <none>                      41h   v1.21.14+rke2r1   172.16.40.247   <none>        Rocky Linux 8.6 (Green Obsidian)   4.18.0-372.9.1.el8.x86_64      containerd://1.4.13-k3s1

[root@istio-245 ~]# kubectl -n kube-system get pods -l k8s-app=kube-dns -o wide
NAME                                         READY   STATUS    RESTARTS   AGE   IP          NODE        NOMINATED NODE   READINESS GATES
rke2-coredns-rke2-coredns-6775f768c8-9sg9b   1/1     Running   0          42h   10.42.1.4   istio-246   <none>           <none>
rke2-coredns-rke2-coredns-6775f768c8-fphvb   1/1     Running   0          42h   10.42.0.2   istio-245   <none>           <none>

Describe the bug:
According to the log and output below, pods on agent-gpu can ping pods on other nodes successfully, but their DNS queries to CoreDNS time out.

2022-07-12T02:27:36.861438Z     warn    ca      ca request failed, starting attempt 1 in 96.641121ms
2022-07-12T02:27:36.958787Z     warn    ca      ca request failed, starting attempt 2 in 206.455727ms
2022-07-12T02:27:37.166162Z     warn    ca      ca request failed, starting attempt 3 in 436.48165ms
2022-07-12T02:27:37.603792Z     warn    ca      ca request failed, starting attempt 4 in 769.681644ms
2022-07-12T02:27:38.373820Z     warn    sds     failed to warm certificate: failed to generate workload certificate: create certificate: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp: lookup istiod.istio-system.svc on 10.43.0.10:53: read udp 10.42.3.20:40374->10.43.0.10:53: i/o timeout"
2022-07-12T02:27:49.724136Z     warning envoy config    StreamAggregatedResources gRPC config stream closed: 14, connection error: desc = "transport: Error while dialing dial tcp: i/o timeout"
2022-07-12T02:28:26.584569Z     warning envoy config    StreamAggregatedResources gRPC config stream closed: 14, connection error: desc = "transport: Error while dialing dial tcp: i/o timeout"
[root@istio-245 dns]# export DOMAIN=www.google.com; echo "=> Start DNS resolve test"; kubectl get pods -l name=dnstest --no-headers -o custom-columns=NAME:.metadata.name,HOSTIP:.status.hostIP | while read pod host; do kubectl exec $pod -- /bin/sh -c "nslookup $DOMAIN > /dev/null 2>&1"; RC=$?; if [ $RC -ne 0 ]; then echo $host cannot resolve $DOMAIN; fi; done; echo "=> End DNS resolve test"
=> Start DNS resolve test
command terminated with exit code 1
192.168.186.6 cannot resolve www.google.com
=> End DNS resolve test

[root@istio-245 ~]# kubectl get pod -o wide
NAME            READY   STATUS    RESTARTS   AGE   IP           NODE        NOMINATED NODE   READINESS GATES
dnstest-fsdfc   1/1     Running   0          23m   10.42.2.19   istio-247   <none>           <none>
dnstest-lrdww   1/1     Running   0          23m   10.42.0.3    istio-245   <none>           <none>
dnstest-vc5bk   1/1     Running   0          23m   10.42.3.21   agent-gpu   <none>           <none>
dnstest-wzj44   1/1     Running   0          23m   10.42.1.10   istio-246   <none>           <none>

[root@istio-245 dns]# kubectl exec -it dnstest-vc5bk  -- bash
bash-4.3# ping 10.42.2.19 -c 1
PING 10.42.2.19 (10.42.2.19): 56 data bytes
64 bytes from 10.42.2.19: seq=0 ttl=62 time=0.780 ms

--- 10.42.2.19 ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max = 0.430/0.567/0.780 ms

# Ping the CoreDNS pod.
bash-4.3# ping 10.42.1.4 -c 1
PING 10.42.1.4 (10.42.1.4): 56 data bytes
64 bytes from 10.42.1.4: seq=0 ttl=62 time=0.579 ms

--- 10.42.1.4 ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max = 0.579/0.579/0.579 ms

# 172.16.40.114 is a DNS server outside the cluster.
bash-4.3# nslookup www.google.com 172.16.40.114
Server:         172.16.40.114
Address:        172.16.40.114#53

Non-authoritative answer:
Name:   www.google.com
Address: 103.226.246.99

bash-4.3# nslookup www.google.com 10.43.0.10
;; connection timed out; no servers could be reached

bash-4.3# nslookup www.google.com 10.42.1.4
;; connection timed out; no servers could be reached

bash-4.3# nslookup www.google.com 10.42.0.2
;; connection timed out; no servers could be reached
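
The checks above show ICMP working while DNS over UDP times out. One further step, not part of the original report, is to test the same servers over both UDP and TCP. The helper below is a minimal sketch: `probe_dns` is a hypothetical name, and it assumes `dig` is available in the pod image, which the output above does not confirm (only `nslookup` is shown).

```shell
#!/bin/sh
# Sketch only: compare UDP and TCP DNS reachability for one server.
# Assumption: `dig` is installed in the image the pod runs.

# probe_dns SERVER NAME
# Prints one line: "SERVER udp=ok|fail tcp=ok|fail"
probe_dns() {
    server=$1
    name=$2
    udp=fail
    tcp=fail
    # +notcp forces UDP; +time and +tries keep an unreachable server
    # from stalling the check for long.
    if dig +notcp +time=2 +tries=1 +short "@$server" "$name" >/dev/null 2>&1; then
        udp=ok
    fi
    # +tcp sends the same query over TCP.
    if dig +tcp +time=2 +tries=1 +short "@$server" "$name" >/dev/null 2>&1; then
        tcp=ok
    fi
    echo "$server udp=$udp tcp=$tcp"
}
```

It could be run inside the dnstest pod on agent-gpu against the service IP and both CoreDNS pod IPs, for example `for s in 10.43.0.10 10.42.1.4 10.42.0.2; do probe_dns "$s" www.google.com; done`. If both transports fail while ping succeeds, the fault is probably not specific to DNS but affects TCP/UDP traffic between nodes in general; that reading is an interpretation, not something the report states.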

wangzheyuan (Author) commented

I fixed the problem by following k3s-io/k3s#5013.
