Lowers the default nodelocaldns denial cache TTL #74093
Conversation
Similar to `--no-negcache` on dnsmasq, this prevents issues for components that poll DNS for orchestration, such as operators managing StatefulSets. It can also be very confusing for users when negative caching makes a change they just made appear broken until the cache expires. This assumes that 5 seconds is reasonable and will still catch repeated AAAA negative responses. We could also set the denial cache size to zero, which should effectively disable it entirely (matching dnsmasq's behavior in kube-dns), but testing shows this approach seems to work well in our (albeit small) test clusters (see the Corefile sketch below).
/sig network |
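Concretely, the change amounts to tightening the `denial` TTL in the `cache` block of the node-local-dns Corefile. A minimal sketch, assuming a typical node-local-dns setup; the listen and upstream addresses, the success TTL, and the surrounding plugin list are illustrative and not taken from this PR:

```
cluster.local:53 {
    errors
    cache {
        # capacity and TTL: success entries keep the default TTL,
        # denial (negative) entries expire after 5 seconds
        success 10000
        denial 10000 5
    }
    reload
    loop
    bind 169.254.20.10          # illustrative node-local listen address
    forward . 10.96.0.10 {      # illustrative cluster DNS service IP
        force_tcp
    }
    prometheus :9253
}
```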
Hi @blakebarnett. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test. Once the patch is verified, the new status will be reflected by the ok-to-test label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/assign @prameshj |
/ok-to-test |
In case negative responses in the cluster dns domain are the primary concern here, these negative responses default to a TTL of 5 as of CoreDNS 1.3.1, e.g.: |
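For example, a lookup of a non-existent service against a CoreDNS 1.3.1 cluster DNS would come back NXDOMAIN with a 5-second TTL on the SOA in the authority section. A hypothetical dig session (the service name, cluster DNS IP, and SOA serial below are made up):

```
$ dig does-not-exist.default.svc.cluster.local @10.96.0.10

;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 12345
;; AUTHORITY SECTION:
cluster.local.  5  IN  SOA  ns.dns.cluster.local. hostmaster.cluster.local. 1550000000 7200 1800 86400 5
```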
Thanks! I was unaware, I verified we've been testing against a build using CoreDNS v1.2.5, I'll update and test again now. |
Ah, seems the latest published image is still only v1.2.6. @prameshj is upgrading to 1.3.1 and publishing an image viable? |
CoreDNS 1.3.1 is already in the k8s.gcr.io repo: k8s.gcr.io/coredns:1.3.1. These negative responses originate from the cluster dns, not the node local dns. The version of the node local dns should not make a difference. IIUC |
Yes, that's how I understand it too: clusterDNS is setting the TTL on the record. If we use CoreDNS as cluster dns, this TTL on the record will be 5s, so there is no need to set any TTL on the cache side. |
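In other words, with CoreDNS as cluster DNS the 5s negative TTL comes from the records themselves, so an ordinary cluster-DNS Corefile needs no extra negative-cache tuning. A sketch along the lines of the common kubeadm-style defaults (the plugin list and `ttl 30` here are illustrative assumptions, not taken from this thread):

```
.:53 {
    errors
    health
    kubernetes cluster.local in-addr.arpa ip6.arpa {
        pods insecure
        fallthrough in-addr.arpa ip6.arpa
        ttl 30              # TTL for positive answers from the kubernetes plugin
    }
    prometheus :9153
    forward . /etc/resolv.conf
    cache 30
    loop
    reload
    loadbalance
}
```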
Ah, we've not moved to CoreDNS yet, so in this case I suppose this change is still needed. |
GKE is still defaulting to kube-dns also AFAIK (at least when we built our 1.11.5 cluster) |
In this change, the cache size for success and denial is specified as 10000, but it will be rounded down to the nearest multiple of 256 = 9984, which is the default cache size value if nothing is specified. So the only change here is specifying a lower TTL for denial, and that looks good to me. |
/lgtm |
/priority important-soon |
https://github.com/coredns/coredns/blob/master/plugin/cache/cache.go#L236 This gets rounded down to the nearest multiple of 256: 9984
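The effect of that rounding can be sketched as simple arithmetic (illustrative Go mirroring the sharded-cache behavior linked above, not the actual CoreDNS source): with 256 shards, each shard holds size/256 entries, so the usable capacity is floored to a multiple of 256.

```go
package main

import "fmt"

const numShards = 256 // the cache is split across 256 shards

// effectiveCapacity shows the rounding: each shard gets size/numShards
// entries, so the total usable capacity is size floored to a multiple of 256.
func effectiveCapacity(size int) int {
	return (size / numShards) * numShards
}

func main() {
	fmt.Println(effectiveCapacity(10000)) // prints 9984
}
```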
/lgtm |
/approve
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: blakebarnett, MrHohn, prameshj. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment. |
What type of PR is this?
/kind bug
What this PR does / why we need it:
Similar to `--no-negcache` on dnsmasq, this prevents issues for components that poll DNS for orchestration, such as operators managing StatefulSets. It can also be very confusing for users when negative caching makes a change they just made appear broken until the cache expires. This assumes that 5 seconds is reasonable and will still catch repeated AAAA negative responses. We could also set the denial cache size to zero, which should effectively disable it entirely (matching dnsmasq's behavior in kube-dns), but testing shows this approach seems to work well in our (albeit small) test clusters.
Which issue(s) this PR fixes:
Fixes #74092
Does this PR introduce a user-facing change?: