Unable to Connect to CoreDNS #3148

wangzheyuan · 2022-07-13T03:25:33Z

Hi, I have an RKE2 cluster. I disabled the firewall and SELinux on every node. Pods on agent-gpu can't resolve hostnames.
Environmental Info:

[root@istio-245 ~]# kubectl get node -o wide
NAME        STATUS   ROLES                       AGE   VERSION           INTERNAL-IP     EXTERNAL-IP   OS-IMAGE                           KERNEL-VERSION                 CONTAINER-RUNTIME
agent-gpu   Ready    <none>                      40h   v1.21.14+rke2r1   192.168.186.6   <none>        CentOS Linux 8                     4.18.0-305.12.1.el8_4.x86_64   containerd://1.4.13-k3s1
istio-245   Ready    control-plane,etcd,master   41h   v1.21.14+rke2r1   172.16.40.245   <none>        Rocky Linux 8.6 (Green Obsidian)   4.18.0-372.9.1.el8.x86_64      containerd://1.4.13-k3s1
istio-246   Ready    <none>                      41h   v1.21.14+rke2r1   172.16.40.246   <none>        Rocky Linux 8.6 (Green Obsidian)   4.18.0-372.9.1.el8.x86_64      containerd://1.4.13-k3s1
istio-247   Ready    <none>                      41h   v1.21.14+rke2r1   172.16.40.247   <none>        Rocky Linux 8.6 (Green Obsidian)   4.18.0-372.9.1.el8.x86_64      containerd://1.4.13-k3s1

[root@istio-245 ~]# kubectl -n kube-system get pods -l k8s-app=kube-dns -o wide
NAME                                         READY   STATUS    RESTARTS   AGE   IP          NODE        NOMINATED NODE   READINESS GATES
rke2-coredns-rke2-coredns-6775f768c8-9sg9b   1/1     Running   0          42h   10.42.1.4   istio-246   <none>           <none>
rke2-coredns-rke2-coredns-6775f768c8-fphvb   1/1     Running   0          42h   10.42.0.2   istio-245   <none>           <none>

Describe the bug:
According to the log and the command output below, pods on agent-gpu can ping pods on other nodes (including the CoreDNS pod), but DNS queries to CoreDNS time out.

2022-07-12T02:27:36.861438Z     warn    ca      ca request failed, starting attempt 1 in 96.641121ms
2022-07-12T02:27:36.958787Z     warn    ca      ca request failed, starting attempt 2 in 206.455727ms
2022-07-12T02:27:37.166162Z     warn    ca      ca request failed, starting attempt 3 in 436.48165ms
2022-07-12T02:27:37.603792Z     warn    ca      ca request failed, starting attempt 4 in 769.681644ms
2022-07-12T02:27:38.373820Z     warn    sds     failed to warm certificate: failed to generate workload certificate: create certificate: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp: lookup istiod.istio-system.svc on 10.43.0.10:53: read udp 10.42.3.20:40374->10.43.0.10:53: i/o timeout"
2022-07-12T02:27:49.724136Z     warning envoy config    StreamAggregatedResources gRPC config stream closed: 14, connection error: desc = "transport: Error while dialing dial tcp: i/o timeout"
2022-07-12T02:28:26.584569Z     warning envoy config    StreamAggregatedResources gRPC config stream closed: 14, connection error: desc = "transport: Error while dialing dial tcp: i/o timeout"

[root@istio-245 dns]# export DOMAIN=www.google.com; echo "=> Start DNS resolve test"; kubectl get pods -l name=dnstest --no-headers -o custom-columns=NAME:.metadata.name,HOSTIP:.status.hostIP | while read pod host; do kubectl exec $pod -- /bin/sh -c "nslookup $DOMAIN > /dev/null 2>&1"; RC=$?; if [ $RC -ne 0 ]; then echo $host cannot resolve $DOMAIN; fi; done; echo "=> End DNS resolve test"
=> Start DNS resolve test
command terminated with exit code 1
192.168.186.6 cannot resolve www.google.com
=> End DNS resolve test
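
For anyone who wants to rerun this check, the one-liner above may be easier to read and reuse as a function. This is only a sketch: the function name `dns_resolve_test` is made up for illustration, and it assumes pods labelled `name=dnstest` exist in the current namespace, as in this report.

```shell
#!/bin/sh
# Sketch: report the host IP of every node whose dnstest pod cannot
# resolve a given name. dns_resolve_test is a hypothetical helper name.

dns_resolve_test() {
    domain="$1"
    echo "=> Start DNS resolve test"
    # Two columns per pod: the pod name and the IP of the node it runs on.
    kubectl get pods -l name=dnstest --no-headers \
        -o custom-columns=NAME:.metadata.name,HOSTIP:.status.hostIP |
    while read -r pod host; do
        # stdin is redirected so kubectl exec cannot consume the pod list.
        if ! kubectl exec "$pod" -- /bin/sh -c "nslookup $domain > /dev/null 2>&1" < /dev/null; then
            echo "$host cannot resolve $domain"
        fi
    done
    echo "=> End DNS resolve test"
}
```

Calling `dns_resolve_test www.google.com` should print one line per node whose pod fails the lookup, between the start and end markers.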

[root@istio-245 ~]# kubectl get pod -o wide
NAME            READY   STATUS    RESTARTS   AGE   IP           NODE        NOMINATED NODE   READINESS GATES
dnstest-fsdfc   1/1     Running   0          23m   10.42.2.19   istio-247   <none>           <none>
dnstest-lrdww   1/1     Running   0          23m   10.42.0.3    istio-245   <none>           <none>
dnstest-vc5bk   1/1     Running   0          23m   10.42.3.21   agent-gpu   <none>           <none>
dnstest-wzj44   1/1     Running   0          23m   10.42.1.10   istio-246   <none>           <none>

[root@istio-245 dns]# kubectl exec -it dnstest-vc5bk  -- bash
bash-4.3# ping 10.42.2.19 -c 1
PING 10.42.2.19 (10.42.2.19): 56 data bytes
64 bytes from 10.42.2.19: seq=0 ttl=62 time=0.780 ms

--- 10.42.2.19 ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max = 0.430/0.567/0.780 ms

# Ping the CoreDNS pod.
bash-4.3# ping 10.42.1.4 -c 1
PING 10.42.1.4 (10.42.1.4): 56 data bytes
64 bytes from 10.42.1.4: seq=0 ttl=62 time=0.579 ms

--- 10.42.1.4 ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max = 0.579/0.579/0.579 ms

# 172.16.40.114 is a dns server outside the cluster.
bash-4.3# nslookup www.google.com 172.16.40.114
Server:         172.16.40.114
Address:        172.16.40.114#53

Non-authoritative answer:
Name:   www.google.com
Address: 103.226.246.99

bash-4.3# nslookup www.google.com 10.43.0.10
;; connection timed out; no servers could be reached

bash-4.3# nslookup www.google.com 10.42.1.4
;; connection timed out; no servers could be reached

bash-4.3# nslookup www.google.com 10.42.0.2
;; connection timed out; no servers could be reached
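
The checks above can be condensed into a small helper that compares ICMP reachability with DNS reachability for each target, run from inside the test pod. This is a rough sketch: `probe` and `probe_all` are hypothetical names, and it assumes the pod image ships `ping` (with the `-c` and `-W` options) and `nslookup`. A Service ClusterIP such as 10.43.0.10 may not answer ping at all, depending on the kube-proxy mode, so the comparison is mainly meaningful for pod IPs.

```shell
#!/bin/sh
# Sketch: for each target IP, report whether ICMP and DNS queries succeed.
# probe and probe_all are hypothetical helper names.

probe() {
    target="$1"
    domain="$2"
    # One echo request, waiting up to 2 seconds for the reply.
    if ping -c 1 -W 2 "$target" > /dev/null 2>&1; then icmp=ok; else icmp=fail; fi
    # Ask the target itself to resolve the name.
    if nslookup "$domain" "$target" > /dev/null 2>&1; then dns=ok; else dns=fail; fi
    echo "$target icmp=$icmp dns=$dns"
}

probe_all() {
    name="$1"
    shift
    for t in "$@"; do
        probe "$t" "$name"
    done
}
```

For example, `probe_all www.google.com 10.42.1.4 10.42.0.2 172.16.40.114` prints one line per target. A line showing `icmp=ok dns=fail` for a CoreDNS pod IP would match the symptom described here: the pod network carries ICMP, but DNS queries get no answer.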

wangzheyuan · 2022-07-23T15:05:14Z

I fixed the problem by following k3s-io/k3s#5013.

wangzheyuan closed this as completed Jul 23, 2022
