kube-dns: dnsmasq intermittent connection refused #45976
Comments
|
cc @bowei, we are also seeing this intermittently in our clusters, specifically from Java-based containers. The lookups that fail are for non-cluster domains. @someword, what version of the kube-dns manifest are you running? Specifically, does it have the #41212 change? We are running an older version without that change, and I'm just wondering whether that change helps.
|
@ravilr So we have been running an internal, experimental version of the manifest while trying to unravel all of this. The dnsmasq flags that we are currently running are here; I think that's what you were asking for, right? The combination of those flags AND over-provisioning has helped, but obviously we are still having issues.
|
|
OK, good to know. I was just trying to see if there are any patterns. Another thing that is definitely related in our case is kube-proxy being up and available at all times. During Kubernetes version upgrades on the nodes (in-place upgrades without drain), we've observed this happening when kube-proxy gets restarted. It reminds me of #32749, which helps remove the dependency on some of the kube components for pod DNS resolution of non-cluster-local queries.
|
It's interesting that you both noted this problem was present in a Java app. Can you note whether you are using a specific TTL setting within your library or the JVM?
|
There is a limit in dnsmasq on the number of concurrent forwarded queries (--dns-forward-max, which defaults to 150).
|
@bowei - We pondered that as well. We do not have any log messages from dnsmasq indicating the limit was reached. We tested on a kube-dns pod, set the max to a very low number, and verified that the log messages do get written when we hit the max. We have been tempted to set it to 300 as a test, but from what I've seen dnsmasq will log if this is the reason.
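For anyone who wants to repeat that test, a sketch of where the knob lives, assuming the stock kube-dns deployment layout (container and arg names may differ in your manifest):

```sh
kubectl -n kube-system edit deployment kube-dns
# then, under the dnsmasq container's args, add or adjust (values are only examples):
#   - --dns-forward-max=300   # raise the concurrent forwarded-query limit (default 150)
#   - --log-facility=-        # log to stderr so the "Maximum number of concurrent
#                             # DNS queries reached" warning shows up in the pod logs
```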
|
@cmluciano - We do not pass either of these flags to the JVM: networkaddress.cache.ttl or networkaddress.cache.negative.ttl. I am going to investigate what networkaddress.cache.ttl is set to and see whether the Java-based apps are maybe not doing any caching. However, the issue we are seeing, where dnsmasq-metrics in the same pod as dnsmasq gets connection refused when trying to do a DNS lookup against dnsmasq, makes me think the issue is in the kube-dns pod itself: either dnsmasq is locked up, or some resource shortfall (ephemeral ports, file descriptors, etc.) is causing the attempted UDP connection from the dnsmasq-metrics container to the dnsmasq container to fail at some layer lower than dnsmasq.
|
@cmluciano we use OpenJDK, and the default for networkaddress.cache.ttl is 30 seconds according to https://github.com/openjdk-mirror/jdk7u-jdk/blob/master/src/share/classes/sun/net/InetAddressCachePolicy.java#L48. I verified this by capturing traffic from a Java app that just does a DNS lookup in a loop for kinesis.us-east-1.amazonaws.com: requests hit the wire about every 30 seconds even though the loop runs at 10-second intervals. Increasing this to 60 seconds may lighten the load on the name servers, but dnsmasq would still refuse queries occasionally.
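For anyone wanting to experiment with that TTL, a sketch of the two usual places to change it; the java.security path shown is the JDK 7/8 layout and app.jar is just a stand-in for your application:

```sh
# networkaddress.cache.ttl is a Java *security* property, normally set in java.security;
# sun.net.inetaddr.ttl is the legacy system-property fallback that
# InetAddressCachePolicy also honours.
echo "networkaddress.cache.ttl=60" >> "$JAVA_HOME/jre/lib/security/java.security"
# or per process:
java -Dsun.net.inetaddr.ttl=60 -jar app.jar
```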
|
@someword do you know what DNS QPS is hitting dnsmasq? (This can be obtained by measuring the delta of the hits/misses counters from the DNS pod at http://127.0.0.1:10054/metrics.)
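A rough way to measure that, as a sketch; 10054 is the dnsmasq sidecar's metrics port in the stock manifest, the pod IP is a placeholder, and exact metric names vary by kube-dns version, so the snippet just diffs whatever dnsmasq counters are exposed:

```sh
POD_IP=<kube-dns-pod-ip>
curl -s "http://${POD_IP}:10054/metrics" | grep -i dnsmasq > /tmp/dns_a
sleep 30
curl -s "http://${POD_IP}:10054/metrics" | grep -i dnsmasq > /tmp/dns_b
# Counter deltas over 30s; divide by 30 for an approximate QPS / miss rate.
diff /tmp/dns_a /tmp/dns_b
```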
|
@bowei When I look at the available metrics I don't see a cache-hit metric; I have a cache-miss counter and request totals. What's strange is that for this particular pod I have 4 million cache misses but only 98,618 requests? I would assume that cache misses have to be a smaller number than total requests. We are just in the process of getting these metrics into Datadog for visualization across our cluster, and something doesn't seem to be accurate. In this screenshot we are looking at all of our kube-dns pods in a specific production cluster; the request counter is converted to a QPS datapoint for each pod, and it shows on average 1/2 to 1 QPS, which seems low. The other screenshot shows the cache misses broken down by second (MPS?): we are averaging 1.3K MPS but only 13 QPS. The query count seems low and the miss count seems very high. Does the above make any sense, or am I missing something? I'll capture some traffic and see what sort of DNS query rate I see.
|
I looked at a 30-second snapshot of traffic going to a single dnsmasq container and my numbers don't line up with the dnsmasq-metrics QPS.
I did this a few times on a couple of nodes, and the count was in the high 4K to low 5K range each time, so the above seems like a decent representation. Also, we are at a low point in our usage, so I would expect those numbers to be about 20% higher during peak load.
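For reference, one way to take that kind of snapshot from the node; the pod IP is a placeholder, and the count is raw UDP/53 packets rather than deduplicated queries:

```sh
timeout 30 tcpdump -ni any "dst host <kube-dns-pod-ip> and udp dst port 53" -w /tmp/dns.pcap
tcpdump -nr /tmp/dns.pcap | wc -l   # divide by 30 for an approximate inbound QPS
```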
|
The dnsmasq metrics are available from the dnsmasq-metrics sidecar. Given ~4k-5k QPS at dnsmasq, and given the CPU request that dnsmasq has, you may need another replica to handle the load; or I would try increasing the CPU request for dnsmasq.
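A sketch of both options, assuming the stock kube-dns deployment and container names; note that if kube-dns-autoscaler is running, manual replica changes will likely be reverted, so adjust its ConfigMap instead:

```sh
# Give the dnsmasq container a larger CPU request (200m is only an example value).
kubectl -n kube-system set resources deployment kube-dns -c dnsmasq --requests=cpu=200m
# Or add replicas (only meaningful if no autoscaler manages the deployment).
kubectl -n kube-system scale deployment kube-dns --replicas=20
```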
|
@bowei - I was hitting port 10055 (skydns) for metrics, and it turns out I want the different set of metrics on port 10054 for dnsmasq. Doh!
So with 7,992,626 cache hits, the miss ratio works out to 0.000005342244, which seems like a pretty healthy caching name server. We do have 16 kube-dns replicas and have tried running with 30, but we still experience dnsmasq refusing connections. When I look at CPU stats for dnsmasq I don't see anything that makes me think it's underpowered: with a 100m CPU limit we have 0 for docker.cpu.throttled, with a max of 20m for kubernetes.cpu.usage.total.
|
Interesting -- can you check the conntrack (http://conntrack-tools.netfilter.org/conntrack.html) tables on your node (not the pod)? The way a lot of resolver libraries work is that they bind an ephemeral port to send the request, and each request results in a conntrack entry. If you exceed the conntrack limits, you will start getting dropped packets.
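For anyone following along, the quick checks on the node look like this (the conntrack CLI comes from the conntrack-tools package linked above):

```sh
cat /proc/sys/net/netfilter/nf_conntrack_count   # entries currently tracked
cat /proc/sys/net/netfilter/nf_conntrack_max     # node-wide limit
conntrack -S   # per-CPU stats; non-zero insert_failed/drop points at table pressure
```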
|
From the kube-proxy logs I see these values logged
It looks like at this point in time my count is under the max, which I gather is either 65536 or 262144. Would conntrack come into play when the dnsmasq-metrics container performs a DNS lookup against the dnsmasq container in the same pod? I was thinking that since the traffic traverses the pod's localhost network it would not go through conntrack, but I've not found specific details covering the traffic path between containers in the same pod over the localhost interface. I'll check whether we are tracking the size of the conntrack tables, whether hitting the maximum would be logged, and whether we have anything about it in our log aggregation system.
|
/assign |
|
@bowei - I'm curious if you have any thoughts on this. We have instrumented a variety of additional metrics to refer to when this issue comes up again. In doing so, I noticed that a busy UDP-based app running in a pod (not kube-dns) is seeing UDP rcv_buf_errors. For this exercise of gathering supporting data to help determine the cause of DNS resolution errors, should I only be concerned about UDP packet loss at the physical-host level? As I write this it makes me think I should add a sidecar to the kube-dns pod to gather network stats specific to the kube-dns pod's network namespace. Sorry if this is getting off topic.
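As a lighter-weight alternative to a dedicated sidecar, here is a sketch of reading those UDP counters for one pod from the node; the docker PID lookup is an assumption (substitute the crictl/containerd equivalent if that is what your nodes run), and the container ID is a placeholder:

```sh
PID=$(docker inspect -f '{{.State.Pid}}' <dnsmasq-container-id>)
nsenter -t "$PID" -n cat /proc/net/snmp | grep -A1 '^Udp:'
# The InErrors / RcvbufErrors columns here are scoped to the pod's network
# namespace, unlike a plain `netstat -su` run on the node itself.
```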
|
@someword -- can you open a new issue re: UDP? That sounds like a problem that should be investigated by itself...
|
@bowei - would the issue just be my question about whether tracking UDP metrics within a pod's network namespace is important versus tracking at the physical-machine level? Also, does keeping this current issue open provide any benefit, or should it be closed?
|
It sounds like two issues to me:
The second one may already be filed somewhere. Keep this one open for now. |
|
I've run into this issue, and I believe the root cause in my case is saturating the nf_conntrack limits on the kube-dns node. I have a script and configuration that can reproduce this issue on GKE 1.7.6 if it is helpful. The workaround was to set |
|
@evanj Can you post the script to a gist and link it here? |
|
@evanj - Are you hitting the limits of nf_conntrack in the kube-dns pod or in the physical instance hosting the pod? We are monitoring nf_conntrack counts and we are not hitting the maximums. I'll check out our
|
I've posted the program and config files, with steps for how to reproduce the issue at the top. I'm hitting the node-level nf_conntrack limit. One thing I noticed when I created a brand-new cluster, rather than using my existing cluster that has been upgraded: the kube-dns-autoscaler default configuration now has a different default.
Code: https://gist.github.com/evanj/261ffbee061d4309673425b705a78c18
|
I posted my findings in another issue which seems to be about the same problem (or at least related): #47142. I'd like to also share them here to contribute to the discussion. In my case I explored the logs and found something that I cannot yet explain: one log excerpt shows the output when an external host is successfully resolved, the other shows the output when an external host isn't found. Both cases are for the same external hostname (though it's reproducible with any), same containers, same application and same cluster configuration. For some reason, depending on something I cannot identify, it decides to resolve using either local records only or upstream only. Another thing I'm going to try is to replace Kube-DNS with CoreDNS.
|
Replacing Kube-DNS with CoreDNS resulted in the same behaviour... It looks like the issue isn't with the DNS servers themselves; it must be higher up in the Kubernetes DNS middleware.
|
@bboreham To be clear, the linked issue is about SNAT specifically. The linked article refers to use of host-gw mode on flannel, which masquerades all cross node pod communication. If you're using a setup without masquerading everything (like vxlan, cloud routing, etc.), then accessing kube-dns or any service IP will use DNAT only, and is not affected by the described issue. |
|
@jsravn could you clarify "the linked issue", "The linked article" and "the described issue" with specific links please? I got lost. |
|
@joekohlsdorf that's a quick win!! thank you for the tip. |
|
@bboreham The weave issue you linked has the article https://tech.xing.com/a-reason-for-unexplained-connection-timeouts-on-kubernetes-docker-abd041cf7e02. This describes a problem with source NATing: if there is a source-port collision, it's possible for the packet to get dropped, and for UDP and DNS this causes timeouts. When using "host-gw" mode on flannel, every pod connection is source-NAT'd to the host VM's IP, making collisions much more likely. Anyway, I think this may actually explain my own DNS failures. My cluster nodes are all set up with a local dnsmasq which proxies all pod DNS queries. I discovered that, due to the way service IPs work in iptables, this dnsmasq picks the wrong source IP when establishing a connection to the kube-dns pod, so everything is source-NAT'd. I've changed it to force dnsmasq to use the flannel interface for the source IP, which stops the source NATing; I'm hoping that fixes things for me!
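For the record, a sketch of what that source pinning can look like in the node-level dnsmasq configuration; the @source-address suffix on server= is from the dnsmasq man page, while the conf path, the kube-dns service IP and the flannel source IP below are all placeholders:

```sh
cat >> /etc/dnsmasq.d/kube-dns.conf <<'EOF'
# Forward cluster DNS to kube-dns, sourcing upstream queries from the flannel address
server=10.96.0.10@10.244.1.0
EOF
```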
|
@jsravn ok, I refute your assertion "the linked issue is about SNAT specifically". E.g. this comment is about DNAT: weaveworks/weave#3287 (comment) |
Cool, didn't see that. The issue OP and its linked post are about SNAT specifically, but that comment explains that the same race condition can also occur with DNAT.
|
We're seeing this in GKE as well as in a kops-deployed AWS environment. We're starting to move this up into our production environments, but if DNS has transient issues it's a bit concerning. Reading through this thread, it looks like we don't really have a full idea of what's causing this, do we? Edit: I've noticed that a pod can sometimes get into a state where DNS will never resolve internal services; deleting that pod fixes the issue.
|
This comment explains the root cause pretty well: weaveworks/weave#3287 (comment) We have switched our resolvers to TCP and have not seen these issues since. This is probably better than the 4ms artificial delay to avoid the race which was suggested in the weave issue, and it is much easier to implement. The title of this issue should be updated; it doesn't only affect kube-dns.
|
@joekohlsdorf "We have switched our resolvers to TCP" Could you elaborate on how you made the change? |
|
I just wanted to jump in on this issue, as the problems I am experiencing in my environment are extremely similar to the OP's: Java containers specifically, and no dnsmasq messages about the query limit. I have attempted adding
|
@YoniTapingo I run this little script from my container entrypoint:

```sh
#!/usr/bin/env sh
# Append "options use-vc" so the glibc resolver uses TCP for DNS lookups.
echo >> /etc/resolv.conf
echo "options use-vc" >> /etc/resolv.conf
```

You could also do it in a postStart lifecycle hook if you have root or sudo.
|
I was seeing this a lot, and confirmed I was seeing conntrack-related packet drops. We have applications performing a very large number of lookups, some external, but many internal to the cluster (so the default resolver was not an option). I was able to fix it by setting a very high value on kube-proxy's conntrack command-line parameter:
The nodes seem to have been stable for a couple months with this now. |
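For reference, the kube-proxy flags involved look roughly like this; the flag names are kube-proxy's own, but the values below are only placeholders, since the right numbers depend on node memory and workload:

```sh
kube-proxy \
  --conntrack-min=524288 \
  --conntrack-max-per-core=131072 \
  --conntrack-tcp-timeout-established=24h
```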
|
@joekohlsdorf The most relevant point I could find about DNS TCP support in musl is: https://twitter.com/RichFelker/status/994629795551031296 |
This value seems very high. How did you determine that this was the right value for your use case?
So 536870912 effectively gives us the nf_conntrack_buckets value for 2.19 TB of memory. I wonder why the conntrack table increase stated in the blog post https://tech.xing.com/a-reason-for-unexplained-connection-timeouts-on-kubernetes-docker-abd041cf7e02 didn't work for them:
But I tested dns-test myself with the default conntrack-min and with the higher value. I also have to test this with our application. Is there any information about when this bug was introduced? We run without this issue on k8s 1.6.7...
If you know that you have a lot of external queries, you should verify that you are not hitting any limits of your upstream DNS or cloud provider. I ran into this because someone thought it would be a good idea not to cache NXDOMAIN responses in kube-dns by default. Unfortunately, checking this is a bit tricky because the number of upstream DNS requests isn't (yet) provided as a metric.
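If repeated NXDOMAIN lookups are what is hammering the upstream, one knob worth knowing about is dnsmasq's negative-cache TTL; a sketch of adding it to the kube-dns manifest (--neg-ttl is from the dnsmasq man page, and 30 seconds is an arbitrary example):

```sh
kubectl -n kube-system edit deployment kube-dns
# then, under the dnsmasq container's args, add e.g.:
#   - --neg-ttl=30   # cache negative replies that carry no SOA TTL for 30 seconds
```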
@joekohlsdorf How do you change that? |
|
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
|
/remove-lifecycle stale |
|
@bowei If you aren't able to handle this issue, consider unassigning yourself and/or adding the
🤖 I am a bot run by vllry. 👩‍🔬
|
reopen if needed |


Is this a request for help? (If yes, you should use our troubleshooting guide and community support channels, see http://kubernetes.io/docs/troubleshooting/.):
What keywords did you search in Kubernetes issues before filing this one? (If you have found any duplicates, you should instead reply there.):
Is this a BUG REPORT or FEATURE REQUEST? (choose one):
Kubernetes version (use kubectl version):
Environment:
Kernel (e.g. uname -a): 4.10.1-coreos
What happened:
java.net.UnknownHostException: dynamodb.us-east-1.amazonaws.com
What you expected to happen:
Receive a response to the name lookup request.
How to reproduce it (as minimally and precisely as possible):
This is the kicker: we are not able to reproduce this issue on purpose. However, we experience it in our production cluster 1-500 times a week.
Anything else we need to know:
In the past two months or so we experienced a handful of events where DNS was failing for most/all of our production pods, and each event would last for 5-10 minutes. During this time the kube-dns service was healthy, with 3-6 available endpoints at all times. We increased our kube-dns pod count to 20 in 20-node production clusters. This level of provisioning alleviated the DNS issues that were taking down our production services. However, we still experience at least weekly smaller events, ranging from 1 second to 30 seconds, which affect a small subset of pods. During these events 1-5 pods on different nodes across the cluster experience a burst of DNS failures, which has a much smaller end-user impact.

We enabled query logging in dnsmasq, as we were not sure whether the queries made it from the client pod to one of the kube-dns pods or not. What was interesting is that, during the DNS events where query logging was enabled, none of the name lookup requests that resulted in an exception were received by dnsmasq. At this point my colleague noticed these errors coming from dnsmasq-metrics
That error, as near as I can tell, is basically a name resolution error from dnsmasq-metrics as it tries to query the dnsmasq container in the same pod for dnsmasq's internal metrics, similar to running `dig +short chaos txt cachesize.bind`. All of our DNS events are happening at the exact same time that one or more dnsmasq-metrics containers are throwing those errors. We thought we might possibly be exceeding the default 150 concurrent-query limit that dnsmasq has, but we do not see any logs indicating that. If we did, we would expect to see these log messages
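For completeness, those internal dnsmasq counters are exposed as CHAOS-class TXT records, so they can also be pulled by hand; a sketch, assuming dig is available wherever you run it and using a placeholder pod IP:

```sh
for stat in cachesize.bind insertions.bind evictions.bind misses.bind hits.bind; do
  printf '%s: ' "$stat"
  dig +short chaos txt "$stat" @<kube-dns-pod-ip>
done
```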
Based on conversations with other cluster operators and users in Slack, I know that other users are experiencing these same problems. I'm hoping that this issue can be used to centralize our efforts and determine whether dnsmasq refusing connections is the problem or a symptom of something else.