DNS query time out after receiving some GB of data #3924

Closed
tgruenert opened this issue Feb 18, 2023 · 47 comments
@tgruenert

I am probably running into a UDP packet loss bug. It happens reproducibly, though not deterministically. Let me explain.

With an Elasticsearch cluster of 3 nodes running on Kubernetes (RKE2), I tried to restore indices from an S3 store. The nodes write with moderate I/O to NVMe storage, so there is no bottleneck there.

The problem appears after a more or less large amount of received data, around 20-40 GB of restored data. Java DNS queries time out; as far as I understand, this implies that a DNS query failed three times on the network. Elasticsearch cannot deal with that during a restore from the S3 bucket, throws an exception, and the incomplete indices cannot be restored afterwards.

My setup:

  • Kubernetes with 3 master / 3 worker nodes (6 different hosts)
  • Kubernetes provisioned with RKE2
  • Cluster with calico / felix and wireguard enabled (a check that wireguard is actually active is sketched after this list):
    /var/lib/rancher/rke2/server/manifests/rke2-canal-config.yaml
     apiVersion: helm.cattle.io/v1
     kind: HelmChartConfig
     metadata:
       name: rke2-canal
       namespace: kube-system
     spec:
       valuesContent: |-
         flannel:
           backend: "wireguard"
    
  • checked different OSes for the worker nodes
    • Ubuntu 22.04
    • Debian 11
  • CoreDNS
    • 2 instances for 3 nodes
    • 3 instances, one per node
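
A hedged check that the wireguard backend is actually active on a node (assumes wireguard-tools is installed; the interface name can vary between flannel versions):

    # list wireguard interfaces and their peers on the node
    wg show
    # or filter links by type
    ip -d link show type wireguard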

I identified the container network interface of, for example, one of the Elasticsearch pods. Note the UDP packet receive errors.

$ ip netns exec cni-b6b393ff-7a33-356e-a41f-573fbe9c6835  netstat -s
Ip:
    Forwarding: 2
    505376488 total packets received
    0 forwarded
    0 incoming packets discarded
    505376488 incoming packets delivered
    338727934 requests sent out
    116 fragments failed
Icmp:
    502 ICMP messages received
    0 input ICMP message failed
    ICMP input histogram:
        destination unreachable: 502
    116 ICMP messages sent
    0 ICMP messages failed
    ICMP output histogram:
        destination unreachable: 116
IcmpMsg:
        InType3: 502
        OutType3: 116
Tcp:
    18159 active connection openings
    18097 passive connection openings
    24 failed connection attempts
    0 connection resets received
    65 connections established
    505374708 segments received
    591854326 segments sent out
    49648 segments retransmitted
    9 bad segments received
    453 resets sent
Udp:
    1133 packets received
    0 packets to unknown port received
    145 packet receive errors
    1278 packets sent
    0 receive buffer errors
    0 send buffer errors
UdpLite:
TcpExt:
    556058 packets pruned from receive queue because of socket buffer overrun
    91 ICMP packets dropped because they were out-of-window
    17597 TCP sockets finished time wait in fast timer
    10 time wait sockets recycled by time stamp
    30962 packetes rejected in established connections because of timestamp
    6477 delayed acks sent
    101 delayed acks further delayed because of locked socket
    Quick ack mode was activated 7500 times
    231553418 packet headers predicted
    14372045 acknowledgments not containing data payload received
    62029979 predicted acknowledgments
    TCPSackRecovery: 8247
    Detected reordering 3062220 times using SACK
    Detected reordering 2618 times using time stamp
    18 congestion windows fully recovered without slow start
    1772 congestion windows partially recovered using Hoe heuristic
    TCPDSACKUndo: 16
    TCPLostRetransmit: 250
    TCPSackFailures: 2
    47897 fast retransmits
    1573 retransmits in slow start
    TCPTimeouts: 23
    TCPLossProbes: 163
    TCPLossProbeRecovery: 60
    TCPSackRecoveryFail: 6
    TCPBacklogCoalesce: 440830
    TCPDSACKOldSent: 7525
    TCPDSACKOfoSent: 38
    TCPDSACKRecv: 2875
    TCPDSACKOfoRecv: 5
    6 connections reset due to unexpected data
    273 connections reset due to early user close
    TCPDSACKIgnoredOld: 168
    TCPDSACKIgnoredNoUndo: 2032
    TCPSackShifted: 474657
    TCPSackMerged: 63328
    TCPSackShiftFallback: 5288255
    TCPRcvCoalesce: 216333034
    TCPOFOQueue: 100417949
    TCPOFOMerge: 37
    TCPChallengeACK: 9
    TCPSYNChallenge: 9
    TCPAutoCorking: 23805
    TCPSynRetrans: 15
    TCPOrigDataSent: 262344109
    TCPHystartTrainDetect: 113
    TCPHystartTrainCwnd: 3959
    TCPACKSkippedPAWS: 30239
    TCPACKSkippedSeq: 1239
    TCPKeepAlive: 5135
    TCPDelivered: 262363107
    TCPAckCompressed: 6469077
    TcpTimeoutRehash: 23
    TCPDSACKRecvSegs: 3179
IpExt:
    InOctets: 1782450173047
    OutOctets: 372854174329
    InNoECTPkts: 1328814496

I have read a lot about UDP and buffers, but I am completely unsure what I should tune there.

$ cat /proc/net/sockstat
sockets: used 312
TCP: inuse 28 orphan 0 tw 93 alloc 384 mem 1316
UDP: inuse 1 mem 5
UDPLITE: inuse 0
RAW: inuse 0
FRAG: inuse 0 memory 0
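
For reference, a hedged sketch of the buffer-related sysctls that usually come up in this context (these are just the knobs to inspect, not recommended values):

    # per-socket receive buffer default and ceiling
    sysctl net.core.rmem_default net.core.rmem_max
    # backlog between the NIC and the protocol stack
    sysctl net.core.netdev_max_backlog
    # global UDP memory accounting and per-socket minimum
    sysctl net.ipv4.udp_mem net.ipv4.udp_rmem_min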

To be complete, here is the Elasticsearch log line:

org.elasticsearch.common.io.stream.NotSerializableExceptionWrapper: sdk_client_exception: Unable to execute HTTP request: geomap2app-prod-es7-de-odb-wohnen-backup.s3.eu-central-1.wasabisys.com\n\t\t... 36 more\n\tCaused by: java.io.IOException: geomap2app-prod-es7-de-odb-wohnen-backup.s3.eu-central-1.wasabisys.com\n\t\tat java.net.InetAddress$CachedAddresses.get(InetAddress.java:952)\n\t\tat java.net.InetAddress.getAllByName0(InetAddress.java:1663)\n\t\tat java.net.InetAddress.getAllByName(InetAddress.java:1528)\n\t\tat com.amazonaws.SystemDefaultDnsResolver.resolve(SystemDefaultDnsResolver.java:27)\n\t\tat com.amazonaws.http.DelegatingDnsResolver.resolve(DelegatingDnsResolver.java:38)\n\t\tat org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:112)\n\t\tat org.apache.http.impl.conn.PoolingHttpClientConnectionManager.connect(PoolingHttpClientConnectionManager.java:376)\n\t\tat jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104)\n\t\tat java.lang.reflect.Method.invoke(Method.java:578)\n\t\tat com.amazonaws.http.conn.ClientConnectionManagerFactory$Handler.invoke(ClientConnectionManagerFactory.java:76)\n\t\tat com.amazonaws.http.conn.$Proxy29.connect(Unknown Source)\n\t\tat org.apache.http.impl.execchain.MainClientExec.establishRoute(MainClientExec.java:393)\n\t\tat org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:236)\n\t\tat org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:186)\n\t\tat org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)\n\t\tat org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)\n\t\tat org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56)\n\t\tat com.amazonaws.http.apache.client.impl.SdkHttpClient.execute(SdkHttpClient.java:72)\n\t\tat com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1346)\n\t\tat com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1157)\n\t\t... 34 more\n\tSuppressed: org.elasticsearch.common.io.stream.NotSerializableExceptionWrapper: sdk_client_exception: Unable to execute HTTP request: geomap2app-prod-es7-de-odb-wohnen-backup.s3.eu-central-1.wasabisys.com\n\t\t... 
36 more\n\tCaused by: java.io.IOException: geomap2app-prod-es7-de-odb-wohnen-backup.s3.eu-central-1.wasabisys.com\n\t\tat java.net.InetAddress$CachedAddresses.get(InetAddress.java:952)\n\t\tat java.net.InetAddress.getAllByName0(InetAddress.java:1663)\n\t\tat java.net.InetAddress.getAllByName(InetAddress.java:1528)\n\t\tat com.amazonaws.SystemDefaultDnsResolver.resolve(SystemDefaultDnsResolver.java:27)\n\t\tat com.amazonaws.http.DelegatingDnsResolver.resolve(DelegatingDnsResolver.java:38)\n\t\tat org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:112)\n\t\tat org.apache.http.impl.conn.PoolingHttpClientConnectionManager.connect(PoolingHttpClientConnectionManager.java:376)\n\t\tat jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104)\n\t\tat java.lang.reflect.Method.invoke(Method.java:578)\n\t\tat com.amazonaws.http.conn.ClientConnectionManagerFactory$Handler.invoke(ClientConnectionManagerFactory.java:76)\n\t\tat com.amazonaws.http.conn.$Proxy29.connect(Unknown Source)\n\t\tat org.apache.http.impl.execchain.MainClientExec.establishRoute(MainClientExec.java:393)\n\t\tat org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:236)\n\t\tat org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:186)\n\t\tat org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)\n\t\tat org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)\n\t\tat org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56)\n\t\tat com.amazonaws.http.apache.client.impl.SdkHttpClient.execute(SdkHttpClient.java:72)\n\t\tat com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1346)\n\t\tat com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1157)\n\t\t... 
34 more\nCaused by: java.io.IOException: geomap2app-prod-es7-de-odb-wohnen-backup.s3.eu-central-1.wasabisys.com\n\tat java.net.InetAddress$CachedAddresses.get(InetAddress.java:952)\n\tat java.net.InetAddress.getAllByName0(InetAddress.java:1663)\n\tat java.net.InetAddress.getAllByName(InetAddress.java:1528)\n\tat com.amazonaws.SystemDefaultDnsResolver.resolve(SystemDefaultDnsResolver.java:27)\n\tat com.amazonaws.http.DelegatingDnsResolver.resolve(DelegatingDnsResolver.java:38)\n\tat org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:112)\n\tat org.apache.http.impl.conn.PoolingHttpClientConnectionManager.connect(PoolingHttpClientConnectionManager.java:376)\n\tat jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104)\n\tat java.lang.reflect.Method.invoke(Method.java:578)\n\tat com.amazonaws.http.conn.ClientConnectionManagerFactory$Handler.invoke(ClientConnectionManagerFactory.java:76)\n\tat com.amazonaws.http.conn.$Proxy29.connect(Unknown Source)\n\tat org.apache.http.impl.execchain.MainClientExec.establishRoute(MainClientExec.java:393)\n\tat org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:236)\n\tat org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:186)\n\tat org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)\n\tat org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)\n\tat org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56)\n\tat com.amazonaws.http.apache.client.impl.SdkHttpClient.execute(SdkHttpClient.java:72)\n\tat com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1346)\n\tat com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1157)\n\t... 34 more\n"}

Any hints on how to fix this, or how I could investigate further?

Thank you!

@tgruenert
Author

Additionally, I found this Kubernetes issue addressed by the RKE2 NodeLocal DNSCache. Implementing it does not fix the problem.

Furthermore, I inspected conntrack. Here I see high 'invalid' and 'search_restart' counters:

root@n1 ~ # conntrack -S
cpu=0           found=0 invalid=1 insert=0 insert_failed=0 drop=0 early_drop=0 error=0 search_restart=3
cpu=1           found=0 invalid=1 insert=0 insert_failed=0 drop=0 early_drop=0 error=0 search_restart=3
cpu=2           found=0 invalid=2 insert=0 insert_failed=0 drop=0 early_drop=0 error=0 search_restart=6
cpu=3           found=0 invalid=137 insert=0 insert_failed=0 drop=0 early_drop=0 error=0 search_restart=8
cpu=4           found=0 invalid=140 insert=0 insert_failed=0 drop=0 early_drop=0 error=1 search_restart=9
cpu=5           found=0 invalid=125 insert=0 insert_failed=0 drop=0 early_drop=0 error=0 search_restart=12
cpu=6           found=0 invalid=139 insert=0 insert_failed=0 drop=0 early_drop=0 error=0 search_restart=199
cpu=7           found=0 invalid=4 insert=0 insert_failed=0 drop=0 early_drop=0 error=0 search_restart=1
cpu=8           found=0 invalid=0 insert=0 insert_failed=0 drop=0 early_drop=0 error=0 search_restart=8
cpu=9           found=0 invalid=1 insert=0 insert_failed=0 drop=0 early_drop=0 error=0 search_restart=7
...

Does anybody know what that means and whether it is related?

@tgruenert
Author

I actually don't know which information is important, so I think it is better to write more rather than less.

conntrack buffer message

conntrack -E | grep UDP
WARNING: We have hit ENOBUFS! We are losing events.
This message means that the current netlink socket buffer size is too small.
Please, check --buffer-size in conntrack(8) manpage.

This message is printed occasionally. It seems to have no direct impact.

netstat

Setup:

  • 3 Elasticsearch containers on Kubernetes (with a root shell for debugging)
  • on 3 different hosts
  • a shell on the Elasticsearch master
  • identifying the container network namespace on the host
    root@n2 ~ # for n in $( ip netns list | awk '{ print $1 }' ); do echo $n; ip netns exec $n ip a s ; done | grep 8.57 -B 10
    cni-b4b05fd0-e33f-acbb-9f77-bd1c882fe05d
    1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
        link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
        inet 127.0.0.1/8 scope host lo
          valid_lft forever preferred_lft forever
        inet6 ::1/128 scope host
          valid_lft forever preferred_lft forever
    3: eth0@if37: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default
        link/ether 7e:4e:91:f3:68:df brd ff:ff:ff:ff:ff:ff link-netnsid 0
        inet 10.42.8.57/32 scope global eth0
    root@n2 ~ # ip netns exec cni-b4b05fd0-e33f-acbb-9f77-bd1c882fe05d netstat -s | grep Udp:  -A 8
    Udp:
        7 packets received
        0 packets to unknown port received
        0 packet receive errors
        7 packets sent
        0 receive buffer errors
        0 send buffer errors
    UdpLite:
    TcpExt:
    
  • running DNS queries as fast as possible
    i=0; while timeout 2s dig @8.8.8.8 google.de > /dev/null; do i=$(( i + 1 )); echo -n "$i "; done; netstat -s | grep -e "Udp:" -A 7 ; cat /proc/net/sockstat
    
    no errors so far:
    ip netns exec cni-b4b05fd0-e33f-acbb-9f77-bd1c882fe05d netstat -s | grep Udp:  -A 8
    Udp:
        10469 packets received
        0 packets to unknown port received
        0 packet receive errors
        10469 packets sent
        0 receive buffer errors
        0 send buffer errors
    UdpLite:
    TcpExt:
    
  • start a restore on Elasticsearch (downloading some GB of data into its own index)
    • after some minutes the DNS queries running in parallel (see above) stop - their delay exceeds 2 seconds
      bash: netstat
      sockets: used 67
      TCP: inuse 65 orphan 0 tw 12 alloc 433 mem 498
      UDP: inuse 0 mem 23
      UDPLITE: inuse 0
      RAW: inuse 0
      FRAG: inuse 0 memory 0
      
    • netstat on host:
      ip netns exec cni-b4b05fd0-e33f-acbb-9f77-bd1c882fe05d netstat -s | grep Udp:  -A 8
      Udp:
          32325 packets received
          1 packets to unknown port received
          9 packet receive errors
          32337 packets sent
          0 receive buffer errors
          0 send buffer errors
      
    • from now on, the packet receive errors counter increases continuously

@brandond
Contributor

Have you considered raising the relevant buffer sizes?

cc @rbrtbnfgl @thomasferrandiz @manuelbuil

@tgruenert
Author

Thank you for your response @brandond.
Unfortunately, I do not know what the relevant buffer sizes are. Do you have any suggestions?

@rbrtbnfgl
Contributor

rbrtbnfgl commented Feb 21, 2023

Could you try to change the UDP buffer size with

sudo sysctl -w net.core.rmem_max=33554432
sudo sysctl -w net.core.netdev_max_backlog=2000
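
If these values turn out to help, a hedged sketch of persisting them across reboots (the file name under /etc/sysctl.d/ is arbitrary):

    printf 'net.core.rmem_max = 33554432\nnet.core.netdev_max_backlog = 2000\n' \
      | sudo tee /etc/sysctl.d/90-udp-buffers.conf
    sudo sysctl --system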

@brandond
Contributor

Have you done as the error message suggested?

conntrack -E | grep UDP
WARNING: We have hit ENOBUFS! We are losing events.
This message means that the current netlink socket buffer size is too small.
Please, check --buffer-size in conntrack(8)
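
A hedged sketch of what that would look like (the 2 MiB buffer size is an arbitrary example):

    # enlarge the netlink event buffer so conntrack -E stops hitting ENOBUFS
    conntrack -E --buffer-size 2097152 | grep UDP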

@tgruenert
Author

Thank you for your response @rbrtbnfgl.
I made this change on the worker nodes; the behavior did not change. I assume no reboot should be required.

@rbrtbnfgl
Contributor

With netstat -unlp you should see the buffer size for each process. You could check whether the configured value is enough.

@tgruenert
Author

@rbrtbnfgl

root@n2 ~ # netstat -unlp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
udp        0      0 0.0.0.0:51820           0.0.0.0:*                           -
udp        0      0 10.43.0.10:53           0.0.0.0:*                           8505/node-cache
udp        0      0 169.254.20.10:53        0.0.0.0:*                           8505/node-cache
udp6       0      0 :::51820                :::*                                -

This is how it looks on all three nodes (host level) during the restore and after the first failure. I can't see any problem here.

@rbrtbnfgl
Contributor

You need to run it in the right namespace, the one where you got the errors.
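
A hedged sketch of that, reusing the namespace id from the netstat output above (the cni-* id differs per pod):

    # list the CNI network namespaces on the node
    ip netns list
    # run the socket listing inside the pod's namespace
    ip netns exec cni-b6b393ff-7a33-356e-a41f-573fbe9c6835 netstat -unlp
    # alternative, via the container's PID:
    # nsenter -t <container-pid> -n netstat -unlp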

@tgruenert
Author

@brandond
I've read a little about conntrack ENOBUFS.
The current setting is:

root@n2 ~ # cat /proc/sys/net/core/rmem_default
212992
root@n2 ~ # cat /proc/sys/net/core/rmem_max
33554432

On GitHub I found https://github.com/weaveworks/scope/pull/2739/files/9cc6cdbd5f81e78567886d32bfe66582cb57552f#diff-1ef0abf47eda947a298ede4cfaee9676714793745038108a28a5696e8df9cc1dL291 which changes the value to 4096*1024 = 4194304.
On each host I ran

sysctl net.core.rmem_default=4194304
sysctl net.core.rmem_max=4194304

On the next restore test the system failed again.

Do you know other / further parameters to set?
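
In the meantime, a hedged sketch of checking per-socket drops instead of the global counters (ss is part of iproute2; the d<N> field inside skmem:(...) is the per-socket drop counter; replace the placeholder with the pod's cni-* namespace id):

    ip netns exec <cni-namespace> ss -uanpm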

@rbrtbnfgl
Contributor

So changing that value doesn't fix the issue? It seems related to the UDP socket of the application, not directly to the networking. With netstat -unlp you should see the bytes in the receive buffer (Recv-Q), and rmem_max should be at least that size. How does the application manage the UDP socket? It looks like it is processing the received data more slowly than the data arrives on the UDP connection, which is why the buffer fills up.

@tgruenert
Author

Right, the issue still exists.
Until now I have not managed to run netstat -unlp inside the right network namespace as you mentioned. (It's a skill issue on my side, but I am learning.)

The application restores data from an S3 bucket, so the main traffic is TCP. UDP is needed only for DNS, so the UDP traffic is very low. The issue also came up when I limited the restore bandwidth to half; it just took longer.

@tgruenert
Author

I am not sure whether I am misunderstanding something.

Running this inside the Elasticsearch container:

root@es7-debug-es-default-1:~# netstat -tulnp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 0.0.0.0:9300            0.0.0.0:*               LISTEN      7/java
tcp        0      0 0.0.0.0:9200            0.0.0.0:*               LISTEN      7/java

But without -t the list is empty. Does that mean no UDP traffic happens, or is UDP so short-lived that I can't see it with a one-shot shell command?
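
A hedged way around the one-shot problem is to sample continuously or to capture the traffic itself (assuming watch and tcpdump are available in the image):

    # sample UDP sockets several times per second
    watch -n 0.2 'ss -uanp'
    # or capture the DNS packets directly while a query runs
    tcpdump -ni eth0 udp port 53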

@tgruenert
Author

To be complete, I am attaching the Elasticsearch log file from this operation. The error occurs starting at line 290.
es7-debug-es-default-2.log

As I read

"Caused by: java.io.IOException: geomap2app-prod-es7-de-odb-wohnen-backup.s3.eu-central-1.wasabisys.com",
"at java.net.InetAddress$CachedAddresses.get(InetAddress.java:801) ~[?:?]",
"at java.net.InetAddress.getAllByName0(InetAddress.java:1509) ~[?:?]",

I looked deeper into Java DNS and set the parameters -Dnetworkaddress.cache.ttl=0 -Dnetworkaddress.cache.negative.ttl=0 for the process.
This did not help either.

@rbrtbnfgl
Contributor

Which version of RKE2 are you using? Maybe it's not a buffer-related issue.

@tgruenert
Author

RKE2 v1.25.5+rke2r2
containerd://1.6.14-k3s1
Kernel: 5.10.0-20-amd64
OS Image Debian GNU/Linux 11 (bullseye)

@tgruenert
Author

tgruenert commented Feb 21, 2023

Just another observation:
I assumed that installing a node-cache (see #3924 (comment)) should redirect DNS queries from pods to this node-cache.
I did a tcpdump of one curl request, but the pod asks the CoreDNS service instead of the node-cache.

root@es7-debug-es-default-2:~# tcpdump port 53
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
16:02:58.979529 IP es7-debug-es-default-2.es7-debug-es-default.geomap-stage-es7-debug.svc.cluster.local.40616 > rke2-coredns-rke2-coredns.kube-system.svc.cluster.local.53: 1187+ A? google.de.geomap-stage-es7-debug.svc.cluster.local. (68)
16:02:58.979562 IP es7-debug-es-default-2.es7-debug-es-default.geomap-stage-es7-debug.svc.cluster.local.40616 > rke2-coredns-rke2-coredns.kube-system.svc.cluster.local.53: 54439+ AAAA? google.de.geomap-stage-es7-debug.svc.cluster.local. (68)
16:02:58.979835 IP es7-debug-es-default-2.es7-debug-es-default.geomap-stage-es7-debug.svc.cluster.local.57190 > rke2-coredns-rke2-coredns.kube-system.svc.cluster.local.53: 65512+ PTR? 10.0.43.10.in-addr.arpa. (41)
16:02:58.980670 IP rke2-coredns-rke2-coredns.kube-system.svc.cluster.local.53 > es7-debug-es-default-2.es7-debug-es-default.geomap-stage-es7-debug.svc.cluster.local.40616: 54439 NXDomain*- 0/1/0 (161)
16:02:58.981194 IP rke2-coredns-rke2-coredns.kube-system.svc.cluster.local.53 > es7-debug-es-default-2.es7-debug-es-default.geomap-stage-es7-debug.svc.cluster.local.57190: 65512*- 1/0/0 PTR rke2-coredns-rke2-coredns.kube-system.svc.cluster.local. (133)
16:02:58.981326 IP rke2-coredns-rke2-coredns.kube-system.svc.cluster.local.53 > es7-debug-es-default-2.es7-debug-es-default.geomap-stage-es7-debug.svc.cluster.local.40616: 1187 NXDomain*- 0/1/0 (161)
16:02:58.981402 IP es7-debug-es-default-2.es7-debug-es-default.geomap-stage-es7-debug.svc.cluster.local.33149 > rke2-coredns-rke2-coredns.kube-system.svc.cluster.local.53: 54032+ A? google.de.svc.cluster.local. (45)
16:02:58.981443 IP es7-debug-es-default-2.es7-debug-es-default.geomap-stage-es7-debug.svc.cluster.local.33149 > rke2-coredns-rke2-coredns.kube-system.svc.cluster.local.53: 15599+ AAAA? google.de.svc.cluster.local. (45)
16:02:58.982028 IP rke2-coredns-rke2-coredns.kube-system.svc.cluster.local.53 > es7-debug-es-default-2.es7-debug-es-default.geomap-stage-es7-debug.svc.cluster.local.33149: 15599 NXDomain*- 0/1/0 (138)
16:02:58.982080 IP rke2-coredns-rke2-coredns.kube-system.svc.cluster.local.53 > es7-debug-es-default-2.es7-debug-es-default.geomap-stage-es7-debug.svc.cluster.local.33149: 54032 NXDomain*- 0/1/0 (138)
16:02:58.982131 IP es7-debug-es-default-2.es7-debug-es-default.geomap-stage-es7-debug.svc.cluster.local.47423 > rke2-coredns-rke2-coredns.kube-system.svc.cluster.local.53: 34483+ A? google.de.cluster.local. (41)
16:02:58.982152 IP es7-debug-es-default-2.es7-debug-es-default.geomap-stage-es7-debug.svc.cluster.local.47423 > rke2-coredns-rke2-coredns.kube-system.svc.cluster.local.53: 26037+ AAAA? google.de.cluster.local. (41)
16:02:58.982830 IP rke2-coredns-rke2-coredns.kube-system.svc.cluster.local.53 > es7-debug-es-default-2.es7-debug-es-default.geomap-stage-es7-debug.svc.cluster.local.47423: 26037 NXDomain*- 0/1/0 (134)
16:02:58.982875 IP rke2-coredns-rke2-coredns.kube-system.svc.cluster.local.53 > es7-debug-es-default-2.es7-debug-es-default.geomap-stage-es7-debug.svc.cluster.local.47423: 34483 NXDomain*- 0/1/0 (134)
16:02:58.982924 IP es7-debug-es-default-2.es7-debug-es-default.geomap-stage-es7-debug.svc.cluster.local.59591 > rke2-coredns-rke2-coredns.kube-system.svc.cluster.local.53: 16595+ A? google.de. (27)
16:02:58.982944 IP es7-debug-es-default-2.es7-debug-es-default.geomap-stage-es7-debug.svc.cluster.local.59591 > rke2-coredns-rke2-coredns.kube-system.svc.cluster.local.53: 55516+ AAAA? google.de. (27)
16:02:58.983613 IP rke2-coredns-rke2-coredns.kube-system.svc.cluster.local.53 > es7-debug-es-default-2.es7-debug-es-default.geomap-stage-es7-debug.svc.cluster.local.59591: 16595 1/0/0 A 142.250.74.195 (52)
16:02:58.983644 IP rke2-coredns-rke2-coredns.kube-system.svc.cluster.local.53 > es7-debug-es-default-2.es7-debug-es-default.geomap-stage-es7-debug.svc.cluster.local.59591: 55516 1/0/0 AAAA 2a00:1450:4001:827::2003 (64)
^C
18 packets captured
18 packets received by filter
0 packets dropped by kernel

Is this correct? And if it is true - why ... ?

edit: beautify log

@rbrtbnfgl
Contributor

Are you doing the tcpdump on the node where the DNS is located, or on the node where the pod is doing the curl? I don't know how the cache is implemented, but I presume it will always contact the DNS service IP and the proxy will then redirect it to the right pod.

@tgruenert
Author

I did this inside the Elasticsearch pod.
And I checked that the Kubernetes service points to CoreDNS.

The docs say "Please be aware that nodelocal modifies the iptables of the node to intercept DNS traffic." I am not sure what that means. From the Kubernetes point of view, the node-cache pods will, as far as I can tell, get no requests.

@rbrtbnfgl
Contributor

By default the IP address of the DNS service is rewritten by iptables to the IP address of the CoreDNS pod. With this feature enabled, I think it creates additional rules to redirect the traffic to the local cache.
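
A hedged check on the node: if node-local-dns installed its intercept rules, the link-local address should appear in the ruleset:

    # count and show iptables rules referencing the NodeLocal DNSCache address
    iptables-save | grep -c 169.254.20.10
    iptables-save | grep 169.254.20.10 | head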

@rbrtbnfgl
Contributor

you can try to use dropwatch to check where the packets are dropped in the kernel.
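
A hedged usage sketch (dropwatch is interactive; -l kas resolves drop points to kernel symbols):

    dropwatch -l kas
    # at the dropwatch> prompt: type "start", reproduce the failure, then "stop"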

@tgruenert
Author

After installing dropwatch on the host I got a lot of log output (see attachment). Do you know how to interpret it?
The log covers the restore process and its failure somewhere near the end. I stopped dropwatch manually after getting the failure.
dropwatch.log

BTW: thank you for your support @rbrtbnfgl

@rbrtbnfgl
Contributor

I thought that this could give more meaningful output. There are a lot of unrelated drops.

@tgruenert
Author

First of all - thank you @rbrtbnfgl and @brandond for looking into this issue.

To sum up so far:

  • the dropped packets on this host look similar to other systems where I installed dropwatch. I cross-checked this with other systems - some losses seem to be normal

  • dropwatch reported no packet losses in hardware, only in software

  • conntrack reported 'invalid' and 'search_restart' packets, but we don't yet know enough to interpret these values

  • so far I cannot find a queue that is full and losing packets

  • the problem is limited to UDP, while the application uses UDP only for a few DNS queries

  • increasing the buffers

    sysctl net.core.rmem_default=4194304
    sysctl net.core.rmem_max=4194304
    

    does not fix this issue

  • using the NodeLocal DNSCache does not fix this issue

  • disabling the Java DNS cache does not fix this issue

Are there any other ideas on how to dig deeper into this?

Thanks!

@rbrtbnfgl
Contributor

It seems that it's not related to the buffer; the counter is 0:

 0 receive buffer errors

If you check dmesg, do you see any UDP-related errors?

@tgruenert
Author

@rbrtbnfgl
I've checked dmesg during the Elasticsearch restore operation. There are no messages at all.

@eifelmicha

I did this inside the Elasticsearch pod. And I checked that the Kubernetes service points to CoreDNS.

The docs say "Please be aware that nodelocal modifies the iptables of the node to intercept DNS traffic." I am not sure what that means. From the Kubernetes point of view, the node-cache pods will, as far as I can tell, get no requests.

This didn't work for me either with Calico/Canal; I verified it by watching the metrics on the NodeLocalDNS pods / CoreDNS itself. The only thing that helped was setting the following in /etc/rancher/rke2/config.yaml:

kubelet-arg:
- cluster-dns=169.254.20.10
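
A hedged way to verify the kubelet change took effect (after restarting the rke2 agent): a freshly created pod should get the link-local address as its nameserver:

    kubectl run dns-check --rm -it --image=busybox --restart=Never -- cat /etc/resolv.conf
    # expected: nameserver 169.254.20.10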

@tgruenert
Author

After digging deeper into the issue, I assume the problem is around UDP, not DNS.
We also used an external DNS for our tests and the problem occurred there too.

We also figured out that the UDP answer packet for the DNS request does reach the container.

tcpdump

# tcpdump port 53 -n
21:01:50.334725 IP 10.42.9.42.34092 > 8.8.8.8.53: 38996+ [1au] A? google.de. (50)
21:01:50.340702 IP 8.8.8.8.53 > 10.42.9.42.34092: 38996 1/0/1 A 142.250.186.131 (54)

These lines came from a DNS request via dig which timed out.

We also watched netstat during failed DNS queries. See how the 'packet receive errors' counter increments:

root@es7-debug-es-default-0:~# netstat -s | grep -e "Udp:" -A 5

Udp:
    463 packets received
    0 packets to unknown port received
    89 packet receive errors
    552 packets sent
    0 receive buffer errors
root@es7-debug-es-default-0:~# dig @8.8.8.8 google.de ; netstat -s | grep -e "Udp:" -A 5

; <<>> DiG 9.16.1-Ubuntu <<>> @8.8.8.8 google.de

... (dig command timed out) ...

Udp:
    465 packets received
    0 packets to unknown port received
    90 packet receive errors
    555 packets sent
    0 receive buffer errors
root@es7-debug-es-default-0:~#
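
One counter worth a look with this pattern (receive errors climbing while receive buffer errors stay at 0): UDP checksum failures are counted separately from buffer problems, so a hedged check from inside the container:

    # nstat is part of iproute2; -az also prints counters that are zero
    nstat -az UdpInErrors UdpInCsumErrors
    # raw counters for comparison
    grep 'Udp:' /proc/net/snmp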

Do you have any hints where to go next @rbrtbnfgl @brandond? Thanks!

@rbrtbnfgl
Contributor

Was the tcpdump done inside the container or on the node? Could you capture with -w and attach the file, to check whether there are any issues with the dropped packets?

@tgruenert
Author

@rbrtbnfgl
Yes, the tcpdump was done inside the container.

I tried to make this more understandable and made a screencast of it:
https://user-images.githubusercontent.com/6465773/221170445-990ab5e4-8938-47af-a072-123366bb7688.mp4

The pcap file is also attached.
udplost.pcap.zip

You can see the requests up to the first complete timeout; some slow DNS queries show up during this period.

@tgruenert
Author

After inspecting the pcap file with Wireshark (I am not familiar with it), it seems that in packets 117..122 I can see:

  • DNS queries are sent and answered within milliseconds
  • this happens 3 times
  • after the 3rd time the application ends with a timeout

So let me make my question more precise:
Why does the application not get the answer packet if it is received by the container?

@rbrtbnfgl
Contributor

Could you try this?
ethtool -K ethX rx off tx off
(ethX is the name of the interface inside the container)
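
A hedged addition: checking the current offload state before and after toggling it may help correlate the change with the error counters:

    # show checksum and segmentation offload state for the container interface
    ethtool -k eth0 | grep -iE 'checksum|segmentation'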

@tgruenert
Author

@rbrtbnfgl I tried this, but unfortunately it changes nothing. The system shows the same behavior as in the video above.

@rbrtbnfgl
Contributor

rbrtbnfgl commented Feb 24, 2023

ethtool -S eth0? It should have more counters

@tgruenert
Author

Yes, it has, but most of them do not increase. I've checked this under high network load as well.

root@es7-debug-es-default-0:~# ethtool -S eth0
NIC statistics:
     peer_ifindex: 36
     rx_queue_0_xdp_packets: 0
     rx_queue_0_xdp_bytes: 0
     rx_queue_0_drops: 0
     rx_queue_0_xdp_redirect: 0
     rx_queue_0_xdp_drops: 0
     rx_queue_0_xdp_tx: 0
     rx_queue_0_xdp_tx_errors: 0
     tx_queue_0_xdp_xmit: 0
     tx_queue_0_xdp_xmit_errors: 0

@tgruenert
Author

@eifelmicha
Thank you for the pointer. I checked this with and without NodeLocalDNS. For me it works as expected: CoreDNS gets more queries without NodeLocalDNS and fewer queries with it enabled.

@eifelmicha

Did you test with or without the cluster-dns=169.254.20.10 argument? For me this is the only valid way on Calico to utilize NodeLocalDNS. Tested multiple times with fresh clusters.

@tgruenert
Author

@eifelmicha it was tested without the argument cluster-dns=169.254.20.10. I am on RKE2 1.25.

@rbrtbnfgl
Contributor

If you start a new, empty pod on the failing node and try to contact the DNS, does it work? This is only to check whether there are issues with the networking of the node or whether it is specific to the failing pod.

@tgruenert
Author

Other pods are working fine, but they don't have comparable network traffic. So far I have seen the issue only on the failing pod.

@rbrtbnfgl
Contributor

And you get the same issue with other CNIs too?

@tgruenert
Author

Well, I can't test this right now. It would require setting up a completely new cluster, I think. Since everything is bare metal, that is not so easy at this point.

@tgruenert
Author

I installed a k0s distribution with Calico + wireguard as the CNI. The problem occurs there as well.
Now I am switching back to RKE2.
Next I will try to force DNS requests over TCP instead of UDP.

@rbrtbnfgl
Contributor

I don't know; maybe it is an issue in the application itself which brings the container interface into an error state.

@tgruenert
Author

Well, it is Elasticsearch - a Java application. I am not able to inspect it more deeply.
For the moment I am trying a workaround: I patched the pod to use TCP instead of UDP for DNS. I will post the result later.

@tgruenert
Author

I checked my workaround a little. DNS via TCP works as it should 🎉

The workaround for a Kubernetes pod is:

spec:
  dnsConfig:
    options:
      - name: use-vc 

How this works:

  • kubelet places the file /etc/resolv.conf in every container according to the cluster DNS settings
  • the dnsConfig directive (see above) can control some details of this
  • the typical Linux resolver supports options like the one described
  • the option use-vc forces the resolver to use TCP instead of UDP for DNS queries inside a system (a container in this case)

As a result, my /etc/resolv.conf inside the container looks like this after deployment:

$ cat /etc/resolv.conf
search es7-debug.svc.cluster.local svc.cluster.local cluster.local
nameserver 10.43.0.10
options ndots:5 use-vc
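
A hedged sketch of double-checking that lookups really go over TCP after the change (the glibc resolver honors use-vc, while dig uses its own resolver, so getent is the better test here):

    # capture a few DNS-over-TCP packets while a glibc-based lookup runs
    tcpdump -ni eth0 -c 4 'tcp port 53' &
    sleep 1
    getent hosts google.de
    wait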

Thank you @rbrtbnfgl for supporting me!
