coredns has a caching plugin installed, which causes non-authoritative responses most of the time #1512
What keywords did you search in kubeadm issues before filing this one?
Is this a BUG REPORT or FEATURE REQUEST?
kubeadm version (use `kubeadm version`):
The coredns configmap is:
What you expected to happen?
When querying a service DNS name, I expect the result to be authoritative ("aa"), i.e.:
This is a successful query as expected:
This is an unsuccessful query:
This is unsuccessful because the query did not arrive within the first second of the entry being cached. Once the TTL of the entry is less than the min TTL, it is, by definition, no longer authoritative. This is because it's being served from the cache instead of from the authoritative zone data.
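For context, the "aa" flag being discussed here is bit 5 of the flags word in the DNS message header (RFC 1035, section 4.1.1). A minimal sketch of how a client could check it on a raw response (the example header bytes below are fabricated for illustration):

```python
import struct

def is_authoritative(dns_response: bytes) -> bool:
    """Return True if the AA (Authoritative Answer) bit is set in a
    raw DNS response message (RFC 1035, section 4.1.1)."""
    if len(dns_response) < 12:
        raise ValueError("truncated DNS header")
    # The 16-bit flags word is bytes 2-3 of the header, big-endian.
    (flags,) = struct.unpack("!H", dns_response[2:4])
    return bool(flags & 0x0400)  # AA is the 0x0400 bit

# Fake 12-byte headers: QR+AA set (0x8400) vs. QR only (0x8000),
# mimicking an authoritative answer vs. one served from a cache.
authoritative_hdr = b"\x12\x34" + struct.pack("!H", 0x8400) + b"\x00" * 8
cached_hdr = b"\x12\x34" + struct.pack("!H", 0x8000) + b"\x00" * 8
```

A client that insists on authoritative answers would be rejecting any response where this bit is clear, which is exactly what a cached response looks like.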
How to reproduce it (as minimally and precisely as possible)?
Install a cluster.
The first query will be authoritative because that query populates the cache. One second later, the next query is served from the cache and is not authoritative.
Anything else we need to know?
This problem was reported to me as affecting some of our customers' software, written in Python, which simply fails if the DNS response is not authoritative.
The cache plugin in the default Corefile is used to reduce traffic to the upstream DNS.
If you don't want kubernetes records to be cached, you have a couple of options, each with possible drawbacks:
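As an illustration only (this is a sketch, not the kubeadm default Corefile): one such option is to give the cluster zone its own server block without the cache plugin, so cluster records are always answered directly by the kubernetes plugin while external lookups keep their cache. Zone names here assume the default cluster.local domain:

```
# Cluster records: answered directly by the kubernetes plugin, never cached,
# so responses keep the aa flag. Drawback: every cluster query hits the plugin.
cluster.local:53 {
    errors
    kubernetes cluster.local {
        pods insecure
    }
}

# Everything else: forwarded upstream, with a 30-second cache.
.:53 {
    errors
    forward . /etc/resolv.conf
    cache 30
}
```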
FWIW, I recall a recent issue opened requesting that we cache kubernetes records for longer than 5 seconds. It's hard to pick a default value that suits everyone.
My view on this is that caching entries breaks existing user software, while not caching increases latency. One is breaking; the other is inconvenient. IMHO, the breakage should take precedence, and optimizing for a given use case should be the responsibility of the cluster maintainer.
that is true, the umbrella ticket for allowing such customization of kubeadm generated manifests is here:
if the coredns maintainers give their +1 on modifying the default corefile in kubeadm we can proceed to change it, otherwise this ticket should be closed and mentioned in a comment in the above ticket - e.g. "allow customization of the CoreDNS deployment".
is modifying the coredns config map of a running cluster and restarting the pods a viable, immediate solution for you?
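For reference, that edit-and-restart step can be done with standard kubectl commands (the label selector assumes the kubeadm default `k8s-app=kube-dns`):

```shell
# Open the CoreDNS Corefile for editing (changes take effect on pod restart)
kubectl -n kube-system edit configmap coredns

# Delete the CoreDNS pods; the Deployment recreates them with the new config
kubectl -n kube-system delete pod -l k8s-app=kube-dns
```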
It depends on how wide the breakage is. I don't think it is common for clients to reject non-authoritative responses, but I'm not a DNS expert. Is this a Python-wide thing, or is it something specific to your customer's application?
If this is something that is fairly common, then we should accommodate it in the default config. If it turns out to be an unusual special case, then probably not.
@chrisohaver it appears to be common to Python, via the
In CoreDNS you can disable cache so that all local cluster zone responses will be authoritative. But it won't change responses from upstream servers. Those would mostly be non-authoritative, retrieved from the caches of intermediate recursive servers. This is normal, so it is puzzling that Python would only be able to resolve names directly from authoritative servers.
I just sanity-checked this on a k8s cluster running CoreDNS with cache enabled: in my test, Python 3 (3.6.5) seems to be fine with non-aa responses from CoreDNS.
... and the CoreDNS logs ...