DNS query time out after receiving some GB of data #3924

Closed
tgruenert opened this issue Feb 18, 2023 · 47 comments
@tgruenert

I am probably running into a UDP packet loss bug. It happens reproducibly, though not deterministically. Let me explain.

With an Elasticsearch cluster of 3 nodes running on Kubernetes (RKE2), I tried to restore indices from an S3 store. The nodes write with moderate I/O to NVMe storage, so there is no bottleneck there.

The problem appears after a more or less large amount of received data, around 20-40 GB of restored data. Java DNS queries time out; as far as I understand, this implies that a DNS query failed three times on the network. Elasticsearch cannot deal with that during a restore from the S3 bucket, throws an exception, and the incomplete indices cannot be restored afterwards.

My setup:

  • Kubernetes with 3 master / 3 worker nodes (6 different hosts)
  • Kubernetes provisioned with RKE2
  • Cluster with calico / felix and wireguard enabled (a check that wireguard is actually active is sketched after this list):
    /var/lib/rancher/rke2/server/manifests/rke2-canal-config.yaml
     apiVersion: helm.cattle.io/v1
     kind: HelmChartConfig
     metadata:
       name: rke2-canal
       namespace: kube-system
     spec:
       valuesContent: |-
         flannel:
           backend: "wireguard"
    
  • checked different OSes for the worker nodes
    • Ubuntu 22.04
    • Debian 11
  • CoreDNS
    • 2 instances for 3 nodes
    • 3 instances, one per node
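
A hedged check that the wireguard backend is actually active on a node (assumes wireguard-tools is installed; the interface name can vary between flannel versions):

    # list wireguard interfaces and their peers on the node
    wg show
    # or filter links by type
    ip -d link show type wireguard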

I identified the container network interface of, for example, one of the Elasticsearch pods. Note the UDP packet receive errors.

$ ip netns exec cni-b6b393ff-7a33-356e-a41f-573fbe9c6835  netstat -s
Ip:
    Forwarding: 2
    505376488 total packets received
    0 forwarded
    0 incoming packets discarded
    505376488 incoming packets delivered
    338727934 requests sent out
    116 fragments failed
Icmp:
    502 ICMP messages received
    0 input ICMP message failed
    ICMP input histogram:
        destination unreachable: 502
    116 ICMP messages sent
    0 ICMP messages failed
    ICMP output histogram:
        destination unreachable: 116
IcmpMsg:
        InType3: 502
        OutType3: 116
Tcp:
    18159 active connection openings
    18097 passive connection openings
    24 failed connection attempts
    0 connection resets received
    65 connections established
    505374708 segments received
    591854326 segments sent out
    49648 segments retransmitted
    9 bad segments received
    453 resets sent
Udp:
    1133 packets received
    0 packets to unknown port received
    145 packet receive errors
    1278 packets sent
    0 receive buffer errors
    0 send buffer errors
UdpLite:
TcpExt:
    556058 packets pruned from receive queue because of socket buffer overrun
    91 ICMP packets dropped because they were out-of-window
    17597 TCP sockets finished time wait in fast timer
    10 time wait sockets recycled by time stamp
    30962 packetes rejected in established connections because of timestamp
    6477 delayed acks sent
    101 delayed acks further delayed because of locked socket
    Quick ack mode was activated 7500 times
    231553418 packet headers predicted
    14372045 acknowledgments not containing data payload received
    62029979 predicted acknowledgments
    TCPSackRecovery: 8247
    Detected reordering 3062220 times using SACK
    Detected reordering 2618 times using time stamp
    18 congestion windows fully recovered without slow start
    1772 congestion windows partially recovered using Hoe heuristic
    TCPDSACKUndo: 16
    TCPLostRetransmit: 250
    TCPSackFailures: 2
    47897 fast retransmits
    1573 retransmits in slow start
    TCPTimeouts: 23
    TCPLossProbes: 163
    TCPLossProbeRecovery: 60
    TCPSackRecoveryFail: 6
    TCPBacklogCoalesce: 440830
    TCPDSACKOldSent: 7525
    TCPDSACKOfoSent: 38
    TCPDSACKRecv: 2875
    TCPDSACKOfoRecv: 5
    6 connections reset due to unexpected data
    273 connections reset due to early user close
    TCPDSACKIgnoredOld: 168
    TCPDSACKIgnoredNoUndo: 2032
    TCPSackShifted: 474657
    TCPSackMerged: 63328
    TCPSackShiftFallback: 5288255
    TCPRcvCoalesce: 216333034
    TCPOFOQueue: 100417949
    TCPOFOMerge: 37
    TCPChallengeACK: 9
    TCPSYNChallenge: 9
    TCPAutoCorking: 23805
    TCPSynRetrans: 15
    TCPOrigDataSent: 262344109
    TCPHystartTrainDetect: 113
    TCPHystartTrainCwnd: 3959
    TCPACKSkippedPAWS: 30239
    TCPACKSkippedSeq: 1239
    TCPKeepAlive: 5135
    TCPDelivered: 262363107
    TCPAckCompressed: 6469077
    TcpTimeoutRehash: 23
    TCPDSACKRecvSegs: 3179
IpExt:
    InOctets: 1782450173047
    OutOctets: 372854174329
    InNoECTPkts: 1328814496

I have read a lot about UDP and buffers, but I am completely unsure what I should tune there.

$ cat /proc/net/sockstat
sockets: used 312
TCP: inuse 28 orphan 0 tw 93 alloc 384 mem 1316
UDP: inuse 1 mem 5
UDPLITE: inuse 0
RAW: inuse 0
FRAG: inuse 0 memory 0
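
For reference, a hedged sketch of the buffer-related sysctls that usually come up in this context (these are just the knobs to inspect, not recommended values):

    # per-socket receive buffer default and ceiling
    sysctl net.core.rmem_default net.core.rmem_max
    # backlog between the NIC and the protocol stack
    sysctl net.core.netdev_max_backlog
    # global UDP memory accounting and per-socket minimum
    sysctl net.ipv4.udp_mem net.ipv4.udp_rmem_min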

To be complete, here is the Elasticsearch log line:

org.elasticsearch.common.io.stream.NotSerializableExceptionWrapper: sdk_client_exception: Unable to execute HTTP request: geomap2app-prod-es7-de-odb-wohnen-backup.s3.eu-central-1.wasabisys.com\n\t\t... 36 more\n\tCaused by: java.io.IOException: geomap2app-prod-es7-de-odb-wohnen-backup.s3.eu-central-1.wasabisys.com\n\t\tat java.net.InetAddress$CachedAddresses.get(InetAddress.java:952)\n\t\tat java.net.InetAddress.getAllByName0(InetAddress.java:1663)\n\t\tat java.net.InetAddress.getAllByName(InetAddress.java:1528)\n\t\tat com.amazonaws.SystemDefaultDnsResolver.resolve(SystemDefaultDnsResolver.java:27)\n\t\tat com.amazonaws.http.DelegatingDnsResolver.resolve(DelegatingDnsResolver.java:38)\n\t\tat org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:112)\n\t\tat org.apache.http.impl.conn.PoolingHttpClientConnectionManager.connect(PoolingHttpClientConnectionManager.java:376)\n\t\tat jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104)\n\t\tat java.lang.reflect.Method.invoke(Method.java:578)\n\t\tat com.amazonaws.http.conn.ClientConnectionManagerFactory$Handler.invoke(ClientConnectionManagerFactory.java:76)\n\t\tat com.amazonaws.http.conn.$Proxy29.connect(Unknown Source)\n\t\tat org.apache.http.impl.execchain.MainClientExec.establishRoute(MainClientExec.java:393)\n\t\tat org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:236)\n\t\tat org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:186)\n\t\tat org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)\n\t\tat org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)\n\t\tat org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56)\n\t\tat com.amazonaws.http.apache.client.impl.SdkHttpClient.execute(SdkHttpClient.java:72)\n\t\tat com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1346)\n\t\tat com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1157)\n\t\t... 34 more\n\tSuppressed: org.elasticsearch.common.io.stream.NotSerializableExceptionWrapper: sdk_client_exception: Unable to execute HTTP request: geomap2app-prod-es7-de-odb-wohnen-backup.s3.eu-central-1.wasabisys.com\n\t\t... 
36 more\n\tCaused by: java.io.IOException: geomap2app-prod-es7-de-odb-wohnen-backup.s3.eu-central-1.wasabisys.com\n\t\tat java.net.InetAddress$CachedAddresses.get(InetAddress.java:952)\n\t\tat java.net.InetAddress.getAllByName0(InetAddress.java:1663)\n\t\tat java.net.InetAddress.getAllByName(InetAddress.java:1528)\n\t\tat com.amazonaws.SystemDefaultDnsResolver.resolve(SystemDefaultDnsResolver.java:27)\n\t\tat com.amazonaws.http.DelegatingDnsResolver.resolve(DelegatingDnsResolver.java:38)\n\t\tat org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:112)\n\t\tat org.apache.http.impl.conn.PoolingHttpClientConnectionManager.connect(PoolingHttpClientConnectionManager.java:376)\n\t\tat jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104)\n\t\tat java.lang.reflect.Method.invoke(Method.java:578)\n\t\tat com.amazonaws.http.conn.ClientConnectionManagerFactory$Handler.invoke(ClientConnectionManagerFactory.java:76)\n\t\tat com.amazonaws.http.conn.$Proxy29.connect(Unknown Source)\n\t\tat org.apache.http.impl.execchain.MainClientExec.establishRoute(MainClientExec.java:393)\n\t\tat org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:236)\n\t\tat org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:186)\n\t\tat org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)\n\t\tat org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)\n\t\tat org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56)\n\t\tat com.amazonaws.http.apache.client.impl.SdkHttpClient.execute(SdkHttpClient.java:72)\n\t\tat com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1346)\n\t\tat com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1157)\n\t\t... 
34 more\nCaused by: java.io.IOException: geomap2app-prod-es7-de-odb-wohnen-backup.s3.eu-central-1.wasabisys.com\n\tat java.net.InetAddress$CachedAddresses.get(InetAddress.java:952)\n\tat java.net.InetAddress.getAllByName0(InetAddress.java:1663)\n\tat java.net.InetAddress.getAllByName(InetAddress.java:1528)\n\tat com.amazonaws.SystemDefaultDnsResolver.resolve(SystemDefaultDnsResolver.java:27)\n\tat com.amazonaws.http.DelegatingDnsResolver.resolve(DelegatingDnsResolver.java:38)\n\tat org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:112)\n\tat org.apache.http.impl.conn.PoolingHttpClientConnectionManager.connect(PoolingHttpClientConnectionManager.java:376)\n\tat jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104)\n\tat java.lang.reflect.Method.invoke(Method.java:578)\n\tat com.amazonaws.http.conn.ClientConnectionManagerFactory$Handler.invoke(ClientConnectionManagerFactory.java:76)\n\tat com.amazonaws.http.conn.$Proxy29.connect(Unknown Source)\n\tat org.apache.http.impl.execchain.MainClientExec.establishRoute(MainClientExec.java:393)\n\tat org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:236)\n\tat org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:186)\n\tat org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)\n\tat org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)\n\tat org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56)\n\tat com.amazonaws.http.apache.client.impl.SdkHttpClient.execute(SdkHttpClient.java:72)\n\tat com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1346)\n\tat com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1157)\n\t... 34 more\n"}

Any hints on how to fix this, or how I could investigate further?

Thank you!

@tgruenert
Author

Additionally, I found this Kubernetes issue addressed by the RKE2 NodeLocal DNSCache. Implementing it does not fix the problem.

Furthermore, I inspected conntrack. Here I see high 'invalid' and 'search_restart' counters:

root@n1 ~ # conntrack -S
cpu=0           found=0 invalid=1 insert=0 insert_failed=0 drop=0 early_drop=0 error=0 search_restart=3
cpu=1           found=0 invalid=1 insert=0 insert_failed=0 drop=0 early_drop=0 error=0 search_restart=3
cpu=2           found=0 invalid=2 insert=0 insert_failed=0 drop=0 early_drop=0 error=0 search_restart=6
cpu=3           found=0 invalid=137 insert=0 insert_failed=0 drop=0 early_drop=0 error=0 search_restart=8
cpu=4           found=0 invalid=140 insert=0 insert_failed=0 drop=0 early_drop=0 error=1 search_restart=9
cpu=5           found=0 invalid=125 insert=0 insert_failed=0 drop=0 early_drop=0 error=0 search_restart=12
cpu=6           found=0 invalid=139 insert=0 insert_failed=0 drop=0 early_drop=0 error=0 search_restart=199
cpu=7           found=0 invalid=4 insert=0 insert_failed=0 drop=0 early_drop=0 error=0 search_restart=1
cpu=8           found=0 invalid=0 insert=0 insert_failed=0 drop=0 early_drop=0 error=0 search_restart=8
cpu=9           found=0 invalid=1 insert=0 insert_failed=0 drop=0 early_drop=0 error=0 search_restart=7
...

Does anybody know what that means and whether it is related?

@tgruenert
Author

I actually don't know which information is important, so I think it is better to write more rather than less.

conntrack buffer message

conntrack -E | grep UDP
WARNING: We have hit ENOBUFS! We are losing events.
This message means that the current netlink socket buffer size is too small.
Please, check --buffer-size in conntrack(8) manpage.

This message is printed occasionally. It seems to have no direct impact.

netstat

Setup:

  • 3 Elasticsearch containers on Kubernetes (with a root shell for debugging)
  • on 3 different hosts
  • a shell on the Elasticsearch master
  • identifying the container network namespace on the host
    root@n2 ~ # for n in $( ip netns list | awk '{ print $1 }' ); do echo $n; ip netns exec $n ip a s ; done | grep 8.57 -B 10
    cni-b4b05fd0-e33f-acbb-9f77-bd1c882fe05d
    1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
        link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
        inet 127.0.0.1/8 scope host lo
          valid_lft forever preferred_lft forever
        inet6 ::1/128 scope host
          valid_lft forever preferred_lft forever
    3: eth0@if37: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default
        link/ether 7e:4e:91:f3:68:df brd ff:ff:ff:ff:ff:ff link-netnsid 0
        inet 10.42.8.57/32 scope global eth0
    root@n2 ~ # ip netns exec cni-b4b05fd0-e33f-acbb-9f77-bd1c882fe05d netstat -s | grep Udp:  -A 8
    Udp:
        7 packets received
        0 packets to unknown port received
        0 packet receive errors
        7 packets sent
        0 receive buffer errors
        0 send buffer errors
    UdpLite:
    TcpExt:
    
  • running DNS queries as fast as possible
    i=0; while timeout 2s dig @8.8.8.8 google.de > /dev/null; do i=$(( i + 1 )); echo -n "$i "; done; netstat -s | grep -e "Udp:" -A 7 ; cat /proc/net/sockstat
    
    no errors so far:
    ip netns exec cni-b4b05fd0-e33f-acbb-9f77-bd1c882fe05d netstat -s | grep Udp:  -A 8
    Udp:
        10469 packets received
        0 packets to unknown port received
        0 packet receive errors
        10469 packets sent
        0 receive buffer errors
        0 send buffer errors
    UdpLite:
    TcpExt:
    
  • start a restore on Elasticsearch (downloading some GB of data into its own index)
    • after some minutes the DNS queries running in parallel (see above) stop - their delay exceeds 2 seconds
      bash: netstat
      sockets: used 67
      TCP: inuse 65 orphan 0 tw 12 alloc 433 mem 498
      UDP: inuse 0 mem 23
      UDPLITE: inuse 0
      RAW: inuse 0
      FRAG: inuse 0 memory 0
      
    • netstat on host:
      ip netns exec cni-b4b05fd0-e33f-acbb-9f77-bd1c882fe05d netstat -s | grep Udp:  -A 8
      Udp:
          32325 packets received
          1 packets to unknown port received
          9 packet receive errors
          32337 packets sent
          0 receive buffer errors
          0 send buffer errors
      
    • from now on, the packet receive errors counter increases continuously

@brandond
Contributor

Have you considered raising the relevant buffer sizes?

cc @rbrtbnfgl @thomasferrandiz @manuelbuil

@tgruenert
Author

Thank you for your response @brandond.
Unfortunately, I do not know what the relevant buffer sizes are. Do you have any suggestions?

@rbrtbnfgl
Contributor

rbrtbnfgl commented Feb 21, 2023

Could you try to change the UDP buffer size with

sudo sysctl -w net.core.rmem_max=33554432
sudo sysctl -w net.core.netdev_max_backlog=2000
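
If these values turn out to help, a hedged sketch of persisting them across reboots (the file name under /etc/sysctl.d/ is arbitrary):

    printf 'net.core.rmem_max = 33554432\nnet.core.netdev_max_backlog = 2000\n' \
      | sudo tee /etc/sysctl.d/90-udp-buffers.conf
    sudo sysctl --system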

@brandond
Contributor

Have you done as the error message suggested?

conntrack -E | grep UDP
WARNING: We have hit ENOBUFS! We are losing events.
This message means that the current netlink socket buffer size is too small.
Please, check --buffer-size in conntrack(8)
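
A hedged sketch of what that would look like (the 2 MiB buffer size is an arbitrary example):

    # enlarge the netlink event buffer so conntrack -E stops hitting ENOBUFS
    conntrack -E --buffer-size 2097152 | grep UDP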

@tgruenert
Author

Thank you for your response @rbrtbnfgl.
I made this change on the worker nodes; the behavior did not change. I assume no reboot should be required.

@rbrtbnfgl
Contributor

With netstat -unlp you should see the buffer size for each process. You could check whether the configured value is enough.

@tgruenert
Author

@rbrtbnfgl

root@n2 ~ # netstat -unlp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
udp        0      0 0.0.0.0:51820           0.0.0.0:*                           -
udp        0      0 10.43.0.10:53           0.0.0.0:*                           8505/node-cache
udp        0      0 169.254.20.10:53        0.0.0.0:*                           8505/node-cache
udp6       0      0 :::51820                :::*                                -

This is how it looks on all three nodes (host level) during the restore and after the first failure. I can't see any problem here.

@rbrtbnfgl
Contributor

You need to run it in the right namespace, the one where you got the errors.
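
A hedged sketch of that, reusing the namespace id from the netstat output above (the cni-* id differs per pod):

    # list the CNI network namespaces on the node
    ip netns list
    # run the socket listing inside the pod's namespace
    ip netns exec cni-b6b393ff-7a33-356e-a41f-573fbe9c6835 netstat -unlp
    # alternative, via the container's PID:
    # nsenter -t <container-pid> -n netstat -unlp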

@tgruenert
Author

@brandond
I've read a little about conntrack ENOBUFS.
The current setting is:

root@n2 ~ # cat /proc/sys/net/core/rmem_default
212992
root@n2 ~ # cat /proc/sys/net/core/rmem_max
33554432

On GitHub I found https://github.com/weaveworks/scope/pull/2739/files/9cc6cdbd5f81e78567886d32bfe66582cb57552f#diff-1ef0abf47eda947a298ede4cfaee9676714793745038108a28a5696e8df9cc1dL291 which changes the value to 4096*1024 = 4194304.
On each host I ran

sysctl net.core.rmem_default=4194304
sysctl net.core.rmem_max=4194304

On the next restore test the system failed again.

Do you know other / further parameters to set?
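
In the meantime, a hedged sketch of checking per-socket drops instead of the global counters (ss is part of iproute2; the d<N> field inside skmem:(...) is the per-socket drop counter; replace the placeholder with the pod's cni-* namespace id):

    ip netns exec <cni-namespace> ss -uanpm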

@rbrtbnfgl
Contributor

So changing that value doesn't fix the issue? It seems related to the UDP socket of the application, not directly to the networking. With netstat -unlp you should see the bytes in the receive buffer (Recv-Q), and rmem_max should be at least that size. How does the application manage the UDP socket? It looks like it is processing the received data more slowly than the data arrives on the UDP connection, which is why the buffer fills up.

@tgruenert
Author

Right, the issue still exists.
Until now I have not managed to run netstat -unlp inside the right network namespace as you mentioned. (It's a skill issue on my side, but I am learning.)

The application restores data from an S3 bucket, so the main traffic is TCP. UDP is needed only for DNS, so the UDP traffic is very low. The issue also came up when I limited the restore bandwidth to half; it just took longer.

@tgruenert
Author

I am not sure whether I am misunderstanding something.

Running this inside the Elasticsearch container:

root@es7-debug-es-default-1:~# netstat -tulnp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 0.0.0.0:9300            0.0.0.0:*               LISTEN      7/java
tcp        0      0 0.0.0.0:9200            0.0.0.0:*               LISTEN      7/java

But without -t the list is empty. Does that mean no UDP traffic happens, or is UDP so short-lived that I can't see it with a one-shot shell command?
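
A hedged way around the one-shot problem is to sample continuously or to capture the traffic itself (assuming watch and tcpdump are available in the image):

    # sample UDP sockets several times per second
    watch -n 0.2 'ss -uanp'
    # or capture the DNS packets directly while a query runs
    tcpdump -ni eth0 udp port 53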

@tgruenert
Author

To be complete, I am attaching the Elasticsearch log file from this operation. The error occurs starting at line 290.
es7-debug-es-default-2.log

As I read

"Caused by: java.io.IOException: geomap2app-prod-es7-de-odb-wohnen-backup.s3.eu-central-1.wasabisys.com",
"at java.net.InetAddress$CachedAddresses.get(InetAddress.java:801) ~[?:?]",
"at java.net.InetAddress.getAllByName0(InetAddress.java:1509) ~[?:?]",

I looked deeper into Java DNS and set the parameters -Dnetworkaddress.cache.ttl=0 -Dnetworkaddress.cache.negative.ttl=0 for the process.
This did not help either.

@rbrtbnfgl
Contributor

Which version of RKE2 are you using? Maybe it's not a buffer-related issue.

@tgruenert
Author

RKE2 v1.25.5+rke2r2
containerd://1.6.14-k3s1
Kernel: 5.10.0-20-amd64
OS Image Debian GNU/Linux 11 (bullseye)

@tgruenert
Author

tgruenert commented Feb 21, 2023

Just another observation:
I assumed that installing a node-cache (see #3924 (comment)) should redirect DNS queries from pods to this node-cache.
I did a tcpdump of one curl request, but the pod asks the CoreDNS service instead of the node-cache.

root@es7-debug-es-default-2:~# tcpdump port 53
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
16:02:58.979529 IP es7-debug-es-default-2.es7-debug-es-default.geomap-stage-es7-debug.svc.cluster.local.40616 > rke2-coredns-rke2-coredns.kube-system.svc.cluster.local.53: 1187+ A? google.de.geomap-stage-es7-debug.svc.cluster.local. (68)
16:02:58.979562 IP es7-debug-es-default-2.es7-debug-es-default.geomap-stage-es7-debug.svc.cluster.local.40616 > rke2-coredns-rke2-coredns.kube-system.svc.cluster.local.53: 54439+ AAAA? google.de.geomap-stage-es7-debug.svc.cluster.local. (68)
16:02:58.979835 IP es7-debug-es-default-2.es7-debug-es-default.geomap-stage-es7-debug.svc.cluster.local.57190 > rke2-coredns-rke2-coredns.kube-system.svc.cluster.local.53: 65512+ PTR? 10.0.43.10.in-addr.arpa. (41)
16:02:58.980670 IP rke2-coredns-rke2-coredns.kube-system.svc.cluster.local.53 > es7-debug-es-default-2.es7-debug-es-default.geomap-stage-es7-debug.svc.cluster.local.40616: 54439 NXDomain*- 0/1/0 (161)
16:02:58.981194 IP rke2-coredns-rke2-coredns.kube-system.svc.cluster.local.53 > es7-debug-es-default-2.es7-debug-es-default.geomap-stage-es7-debug.svc.cluster.local.57190: 65512*- 1/0/0 PTR rke2-coredns-rke2-coredns.kube-system.svc.cluster.local. (133)
16:02:58.981326 IP rke2-coredns-rke2-coredns.kube-system.svc.cluster.local.53 > es7-debug-es-default-2.es7-debug-es-default.geomap-stage-es7-debug.svc.cluster.local.40616: 1187 NXDomain*- 0/1/0 (161)
16:02:58.981402 IP es7-debug-es-default-2.es7-debug-es-default.geomap-stage-es7-debug.svc.cluster.local.33149 > rke2-coredns-rke2-coredns.kube-system.svc.cluster.local.53: 54032+ A? google.de.svc.cluster.local. (45)
16:02:58.981443 IP es7-debug-es-default-2.es7-debug-es-default.geomap-stage-es7-debug.svc.cluster.local.33149 > rke2-coredns-rke2-coredns.kube-system.svc.cluster.local.53: 15599+ AAAA? google.de.svc.cluster.local. (45)
16:02:58.982028 IP rke2-coredns-rke2-coredns.kube-system.svc.cluster.local.53 > es7-debug-es-default-2.es7-debug-es-default.geomap-stage-es7-debug.svc.cluster.local.33149: 15599 NXDomain*- 0/1/0 (138)
16:02:58.982080 IP rke2-coredns-rke2-coredns.kube-system.svc.cluster.local.53 > es7-debug-es-default-2.es7-debug-es-default.geomap-stage-es7-debug.svc.cluster.local.33149: 54032 NXDomain*- 0/1/0 (138)
16:02:58.982131 IP es7-debug-es-default-2.es7-debug-es-default.geomap-stage-es7-debug.svc.cluster.local.47423 > rke2-coredns-rke2-coredns.kube-system.svc.cluster.local.53: 34483+ A? google.de.cluster.local. (41)
16:02:58.982152 IP es7-debug-es-default-2.es7-debug-es-default.geomap-stage-es7-debug.svc.cluster.local.47423 > rke2-coredns-rke2-coredns.kube-system.svc.cluster.local.53: 26037+ AAAA? google.de.cluster.local. (41)
16:02:58.982830 IP rke2-coredns-rke2-coredns.kube-system.svc.cluster.local.53 > es7-debug-es-default-2.es7-debug-es-default.geomap-stage-es7-debug.svc.cluster.local.47423: 26037 NXDomain*- 0/1/0 (134)
16:02:58.982875 IP rke2-coredns-rke2-coredns.kube-system.svc.cluster.local.53 > es7-debug-es-default-2.es7-debug-es-default.geomap-stage-es7-debug.svc.cluster.local.47423: 34483 NXDomain*- 0/1/0 (134)
16:02:58.982924 IP es7-debug-es-default-2.es7-debug-es-default.geomap-stage-es7-debug.svc.cluster.local.59591 > rke2-coredns-rke2-coredns.kube-system.svc.cluster.local.53: 16595+ A? google.de. (27)
16:02:58.982944 IP es7-debug-es-default-2.es7-debug-es-default.geomap-stage-es7-debug.svc.cluster.local.59591 > rke2-coredns-rke2-coredns.kube-system.svc.cluster.local.53: 55516+ AAAA? google.de. (27)
16:02:58.983613 IP rke2-coredns-rke2-coredns.kube-system.svc.cluster.local.53 > es7-debug-es-default-2.es7-debug-es-default.geomap-stage-es7-debug.svc.cluster.local.59591: 16595 1/0/0 A 142.250.74.195 (52)
16:02:58.983644 IP rke2-coredns-rke2-coredns.kube-system.svc.cluster.local.53 > es7-debug-es-default-2.es7-debug-es-default.geomap-stage-es7-debug.svc.cluster.local.59591: 55516 1/0/0 AAAA 2a00:1450:4001:827::2003 (64)
^C
18 packets captured
18 packets received by filter
0 packets dropped by kernel

Is this correct? And if it is true - why ... ?

edit: beautify log

@rbrtbnfgl
Contributor

Are you doing the tcpdump on the node where the DNS is located, or on the node where the pod is doing the curl? I don't know how the cache is implemented, but I presume it will always contact the DNS service IP and the proxy will then redirect it to the right pod.

@tgruenert
Author

I did this inside the Elasticsearch pod.
And I checked that the Kubernetes service points to CoreDNS.

The docs say "Please be aware that nodelocal modifies the iptables of the node to intercept DNS traffic." I am not sure what that means. From the Kubernetes point of view, the node-cache pods will, as far as I can tell, get no requests.

@rbrtbnfgl
Contributor

By default the IP address of the DNS service is rewritten by iptables to the IP address of the CoreDNS pod. With this feature enabled, I think it creates additional rules to redirect the traffic to the local cache.
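
A hedged check on the node: if node-local-dns installed its intercept rules, the link-local address should appear in the ruleset:

    # count and show iptables rules referencing the NodeLocal DNSCache address
    iptables-save | grep -c 169.254.20.10
    iptables-save | grep 169.254.20.10 | head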

@rbrtbnfgl
Contributor

you can try to use dropwatch to check where the packets are dropped in the kernel.
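
A hedged usage sketch (dropwatch is interactive; -l kas resolves drop points to kernel symbols):

    dropwatch -l kas
    # at the dropwatch> prompt: type "start", reproduce the failure, then "stop"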

@tgruenert
Author

After installing dropwatch on the host I got a lot of log output (see attachment). Do you know how to interpret it?
The log covers the restore process and its failure somewhere near the end. I stopped dropwatch manually after getting the failure.
dropwatch.log

BTW: thank you for your support @rbrtbnfgl

@rbrtbnfgl
Contributor

I thought that this could give more meaningful output. There are a lot of unrelated drops.

@tgruenert
Author

First of all - thank you @rbrtbnfgl and @brandond for looking into this issue.

To sum up so far:

  • the dropped packets on this host look similar to other systems where I installed dropwatch. I cross-checked this with other systems - some losses seem to be normal

  • dropwatch reported no packet losses in hardware, only in software

  • conntrack reported 'invalid' and 'search_restart' packets, but we don't yet know enough to interpret these values

  • so far I cannot find a queue that is full and losing packets

  • the problem is limited to UDP, while the application uses UDP only for a few DNS queries

  • increasing the buffers

    sysctl net.core.rmem_default=4194304
    sysctl net.core.rmem_max=4194304
    

    does not fix this issue

  • using the NodeLocal DNSCache does not fix this issue

  • disabling the Java DNS cache does not fix this issue

Are there any other ideas on how to dig deeper into this?

Thanks!

@rbrtbnfgl
Contributor

It seems that it's not related to the buffer; the counter is 0:

 0 receive buffer errors

If you check dmesg, do you see any UDP-related errors?

@tgruenert
Author

@rbrtbnfgl
I've checked dmesg during the Elasticsearch restore operation. There are no messages at all.

@eifelmicha

I did this inside the Elasticsearch pod. And I checked that the Kubernetes service points to CoreDNS.

The docs say "Please be aware that nodelocal modifies the iptables of the node to intercept DNS traffic." I am not sure what that means. From the Kubernetes point of view, the node-cache pods will, as far as I can tell, get no requests.

This didn't work for me either with Calico/Canal; I verified it by watching the metrics on the NodeLocalDNS pods / CoreDNS itself. The only thing that helped was setting the following in /etc/rancher/rke2/config.yaml:

kubelet-arg:
- cluster-dns=169.254.20.10
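
A hedged way to verify the kubelet change took effect (after restarting the rke2 agent): a freshly created pod should get the link-local address as its nameserver:

    kubectl run dns-check --rm -it --image=busybox --restart=Never -- cat /etc/resolv.conf
    # expected: nameserver 169.254.20.10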

@tgruenert
Author

After digging deeper into the issue, I assume the problem is around UDP, not DNS.
We also used an external DNS for our tests and the problem occurred there too.

We also figured out that the UDP answer packet for the DNS request does reach the container.

tcpdump

# tcpdump port 53 -n
21:01:50.334725 IP 10.42.9.42.34092 > 8.8.8.8.53: 38996+ [1au] A? google.de. (50)
21:01:50.340702 IP 8.8.8.8.53 > 10.42.9.42.34092: 38996 1/0/1 A 142.250.186.131 (54)

These lines came from a DNS request via dig which timed out.

We also watched netstat during failed DNS queries. See how the 'packet receive errors' counter increments:

root@es7-debug-es-default-0:~# netstat -s | grep -e "Udp:" -A 5

Udp:
    463 packets received
    0 packets to unknown port received
    89 packet receive errors
    552 packets sent
    0 receive buffer errors
root@es7-debug-es-default-0:~# dig @8.8.8.8 google.de ; netstat -s | grep -e "Udp:" -A 5

; <<>> DiG 9.16.1-Ubuntu <<>> @8.8.8.8 google.de

... (dig command timed out) ...

Udp:
    465 packets received
    0 packets to unknown port received
    90 packet receive errors
    555 packets sent
    0 receive buffer errors
root@es7-debug-es-default-0:~#
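
One counter worth a look with this pattern (receive errors climbing while receive buffer errors stay at 0): UDP checksum failures are counted separately from buffer problems, so a hedged check from inside the container:

    # nstat is part of iproute2; -az also prints counters that are zero
    nstat -az UdpInErrors UdpInCsumErrors
    # raw counters for comparison
    grep 'Udp:' /proc/net/snmp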

Do you have any hints where to go next @rbrtbnfgl @brandond? Thanks!

@rbrtbnfgl
Contributor

Was the tcpdump done inside the container or on the node? Could you capture with -w and attach the file, to check whether there are any issues with the dropped packets?

@tgruenert
Author

@rbrtbnfgl
Yes, the tcpdump was done inside the container.

I tried to make this more understandable and made a screencast of it:
https://user-images.githubusercontent.com/6465773/221170445-990ab5e4-8938-47af-a072-123366bb7688.mp4

The pcap file is also attached.
udplost.pcap.zip

You can see the requests up to the first complete timeout; some slow DNS queries show up during this period.

@tgruenert
Author

After inspecting the pcap file with Wireshark (I am not familiar with it), it seems that in packets 117..122 I can see:

  • DNS queries are sent and answered within milliseconds
  • this happens 3 times
  • after the 3rd time the application ends with a timeout

So let me make my question more precise:
Why does the application not get the answer packet if it is received by the container?

@rbrtbnfgl
Contributor

Could you try this?
ethtool -K ethX rx off tx off
(ethX is the name of the interface inside the container)
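
A hedged addition: checking the current offload state before and after toggling it may help correlate the change with the error counters:

    # show checksum and segmentation offload state for the container interface
    ethtool -k eth0 | grep -iE 'checksum|segmentation'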

@tgruenert
Author

@rbrtbnfgl I tried this, but unfortunately it changes nothing. The system shows the same behavior as in the video above.

@rbrtbnfgl
Contributor

rbrtbnfgl commented Feb 24, 2023

ethtool -S eth0? It should have more counters

@tgruenert
Author

Yes, it has, but most of them do not increase. I've checked this under high network load as well.

root@es7-debug-es-default-0:~# ethtool -S eth0
NIC statistics:
     peer_ifindex: 36
     rx_queue_0_xdp_packets: 0
     rx_queue_0_xdp_bytes: 0
     rx_queue_0_drops: 0
     rx_queue_0_xdp_redirect: 0
     rx_queue_0_xdp_drops: 0
     rx_queue_0_xdp_tx: 0
     rx_queue_0_xdp_tx_errors: 0
     tx_queue_0_xdp_xmit: 0
     tx_queue_0_xdp_xmit_errors: 0

@tgruenert
Author

@eifelmicha
Thank you for the pointer. I checked this with and without NodeLocalDNS. For me it works as expected: CoreDNS gets more queries without NodeLocalDNS and fewer queries with it enabled.

@eifelmicha

Did you test with or without the cluster-dns=169.254.20.10 argument? For me this is the only valid way on Calico to utilize NodeLocalDNS. Tested multiple times with fresh clusters.

@tgruenert
Author

@eifelmicha it was tested without the argument cluster-dns=169.254.20.10. I am on RKE2 1.25.

@rbrtbnfgl
Contributor

If you start a new, empty pod on the failing node and try to contact the DNS, does it work? This is only to check whether there are issues with the networking of the node or whether it is specific to the failing pod.

@tgruenert
Author

Other pods are working fine, but they don't have comparable network traffic. So far I have seen the issue only on the failing pod.

@rbrtbnfgl
Contributor

And you get the same issue with other CNIs too?

@tgruenert
Author

Well, I can't test this right now. It would require setting up a completely new cluster, I think. Since everything is bare metal, that is not so easy at this point.

@tgruenert
Author

I installed a k0s distribution with Calico + wireguard as the CNI. The problem occurs there as well.
Now I am switching back to RKE2.
Next I will try to force DNS requests over TCP instead of UDP.

@rbrtbnfgl
Contributor

I don't know; maybe it is an issue in the application itself which brings the container interface into an error state.

@tgruenert
Author

Well, it is Elasticsearch - a Java application. I am not able to inspect it more deeply.
For the moment I am trying a workaround: I patched the pod to use TCP instead of UDP for DNS. I will post the result later.

@tgruenert
Author

I checked my workaround a little. DNS via TCP works as it should 🎉

The workaround for a Kubernetes pod is:

spec:
  dnsConfig:
    options:
      - name: use-vc 

How this works:

  • kubelet places the file /etc/resolv.conf in every container according to the cluster DNS settings
  • the dnsConfig directive (see above) can control some details of this
  • the typical Linux resolver supports options like the one described
  • the option use-vc forces the resolver to use TCP instead of UDP for DNS queries inside a system (a container in this case)

As a result, my /etc/resolv.conf inside the container looks like this after deployment:

$ cat /etc/resolv.conf
search es7-debug.svc.cluster.local svc.cluster.local cluster.local
nameserver 10.43.0.10
options ndots:5 use-vc
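
A hedged sketch of double-checking that lookups really go over TCP after the change (the glibc resolver honors use-vc, while dig uses its own resolver, so getent is the better test here):

    # capture a few DNS-over-TCP packets while a glibc-based lookup runs
    tcpdump -ni eth0 -c 4 'tcp port 53' &
    sleep 1
    getent hosts google.de
    wait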

Thank you @rbrtbnfgl for supporting me!
