Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve documentation for using Node-local DNS Cache add-on #17944

Conversation

sftim
Copy link
Contributor

@sftim sftim commented Dec 4, 2019

The Node-local DNS Cache is a very simple add-on, essentially one DaemonSet and a related ConfigMap.

I haven't used tried using this feature but I did spot that the existing documentation was a bit far from the style guide, so I'm proposing these changes.
I think there's more work needed (what fills in the missing information? what are appropriate values? is link-local autoconfiguration the risk I think it is?); I'm aiming for these changes to be a first set of improvements.
Preview of new page

Reviewers: I've seen various names and capitalizations for this add-on. For documentation, I selected the name “Node-local DNS Cache” because it avoids the risk of looking like a Kubernetes feature flag or object. I don't want readers trying to kubectl get NodeLocalDNSCache, or even having to make a decision on whether that makes sense.

Relevant to #14822
/sig network

@k8s-ci-robot k8s-ci-robot added sig/network Categorizes an issue or PR as relevant to SIG Network. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. language/en Issues or PRs related to English language sig/docs Categorizes an issue or PR as relevant to SIG Docs. labels Dec 4, 2019
Comment on lines 30 to 44
In detail:

* With the current DNS architecture, it is possible that Pods with the highest DNS QPS have to reach out to a different node, if there is no local kube-dns/CoreDNS instance.
Having a local cache will help improve the latency in such scenarios.
* With the current DNS architecture, it is possible that Pods with the
highest DNS QPS have to reach out to a different node, if there is no
local kube-dns / CoreDNS instance.
Having a local cache help improve the latency in such scenarios.

* Skipping iptables DNAT and connection tracking will help reduce [conntrack races](https://github.com/kubernetes/kubernetes/issues/56903) and avoid UDP DNS entries filling up conntrack table.
* Skipping iptables DNAT and connection tracking helps to reduce [conntrack races](https://github.com/kubernetes/kubernetes/issues/56903) and to avoid UDP DNS entries filling up conntrack table.

* Connections from local caching agent to kube-dns servie can be upgraded to TCP. TCP conntrack entries will be removed on connection close in contrast with UDP entries that have to timeout ([default](https://www.kernel.org/doc/Documentation/networking/nf_conntrack-sysctl.txt) `nf_conntrack_udp_timeout` is 30 seconds)
* Connections from local caching agent to kube-dns servie can be upgraded to TCP. TCP conntrack entries will be removed on connection close in contrast with UDP entries that have to time out ([default](https://www.kernel.org/doc/Documentation/networking/nf_conntrack-sysctl.txt) `nf_conntrack_udp_timeout` is 30 seconds)

* Upgrading DNS queries from UDP to TCP would reduce tail latency attributed to dropped UDP packets and DNS timeouts usually up to 30s (3 retries + 10s timeout). Since the nodelocal cache listens for UDP DNS queries, applications don't need to be changed.
* Upgrading DNS queries from UDP to TCP would reduce tail latency attributed to dropped UDP packets and DNS timeouts usually up to 30s (3 retries + 10s timeout). Since the node-local cache container listens for UDP DNS queries, you don't need to change applications.

* Metrics & visibility into dns requests at a node level.

* Negative caching can be re-enabled, thereby reducing number of queries to kube-dns service.

## Architecture Diagram
* Negative caching can be re-enabled, thereby reducing number of queries to the kube-dns service.

This is the path followed by DNS Queries after NodeLocal DNSCache is enabled:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As this is a Task page I'm tempted to omit these details, or to move them to the end of the page. What do people think?


{{< figure src="/images/docs/nodelocaldns.jpg" alt="NodeLocal DNSCache flow" title="Nodelocal DNSCache flow" caption="This image shows how NodeLocal DNSCache handles DNS queries." >}}
{{< figure src="/images/docs/nodelocaldns.jpg" alt="Node-local DNS Cache query flow" title="Node-local DNS Cache query flow" caption="This image shows how Node-local DNS Cache handles DNS queries." >}}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice, but not essential, to redraw this in SVG. SVG diagrams are much more easy to localize.


## Configuration

This feature can be enabled using the command:

`KUBE_ENABLE_NODELOCAL_DNS=true go run hack/e2e.go -v --up`
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dropped this because the prerequisites don't explain what you need to do (eg: install Golang) to make it work.

@netlify
Copy link

netlify bot commented Dec 4, 2019

Deploy preview for kubernetes-io-master-staging ready!

Built with commit 6948e83

https://deploy-preview-17944--kubernetes-io-master-staging.netlify.com

@sftim
Copy link
Contributor Author

sftim commented Dec 16, 2019

/retest

@sftim sftim force-pushed the 20191204_tweak_node_local_dns_addon_information branch from 3569dc7 to 3bb8c2c Compare January 11, 2020 18:35
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please assign sftim
You can assign the PR to them by writing /assign @sftim in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kbhawkey
Copy link
Contributor

@sftim, Would you look into the build failure? Thanks!

1:36:29 PM: Total in 1951 ms
1:36:29 PM: Error: Error building site: "/opt/build/repo/content/en/docs/tasks/administer-cluster/nodelocaldns.md:88:6": failed to extract shortcode: shortcode "capture" has no .Inner, yet a closing tag was provided
1:36:29 PM: Function Dir: /opt/build/repo/functions
1:36:29 PM: Skipping functions preparation step: /opt/build/repo/functions not found
1:36:29 PM: Caching artifact

@sftim
Copy link
Contributor Author

sftim commented Feb 10, 2020

I'll rebase this
/retitle [WIP] Improve(?) documentation for using Node-local DNS Cache add-on

@k8s-ci-robot k8s-ci-robot changed the title Improve(?) documentation for using Node-local DNS Cache add-on [WIP] Improve(?) documentation for using Node-local DNS Cache add-on Feb 10, 2020
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 10, 2020
@sftim sftim force-pushed the 20191204_tweak_node_local_dns_addon_information branch 2 times, most recently from 03c4a57 to 4818d72 Compare February 19, 2020 00:19
@sftim sftim force-pushed the 20191204_tweak_node_local_dns_addon_information branch from 4818d72 to 6948e83 Compare February 19, 2020 00:25
@sftim
Copy link
Contributor Author

sftim commented Feb 19, 2020

/retitle Improve documentation for using Node-local DNS Cache add-on

@k8s-ci-robot k8s-ci-robot changed the title [WIP] Improve(?) documentation for using Node-local DNS Cache add-on Improve documentation for using Node-local DNS Cache add-on Feb 19, 2020
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 19, 2020
@sftim
Copy link
Contributor Author

sftim commented Mar 14, 2020

#18716 got there first

/close

@k8s-ci-robot
Copy link
Contributor

@sftim: Closed this PR.

In response to this:

#18716 got there first

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot
Copy link
Contributor

@sftim: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 14, 2020
@sftim sftim deleted the 20191204_tweak_node_local_dns_addon_information branch June 9, 2021 17:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. language/en Issues or PRs related to English language needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. sig/docs Categorizes an issue or PR as relevant to SIG Docs. sig/network Categorizes an issue or PR as relevant to SIG Network. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants