
GA graduation criteria for NodeLocal DNSCache #1351

Merged (4 commits into kubernetes:master, Jan 23, 2020)

Conversation

@prameshj (Contributor) commented Nov 2, 2019:

Modified the existing KEP, as discussed in #1307

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Nov 2, 2019
@k8s-ci-robot k8s-ci-robot added size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory sig/network Categorizes an issue or PR as relevant to SIG Network. labels Nov 2, 2019
- Upgrade to a newer CoreDNS version (1.6.x) in [node-cache](https://github.com/kubernetes/dns/pull/328).
- Ensure that Kubernetes [e2e tests with NodeLocal DNSCache](https://k8s-testgrid.appspot.com/sig-network-gce#gci-gce-kube-dns-nodecache) are passing.
- Scalability tests with NodeLocal DNSCache enabled, verifying the [HA modes](https://github.com/kubernetes/enhancements/blob/master/keps/sig-network/20190424-NodeLocalDNS-beta-proposal.md#design-details) as well as the regular mode.
- Have N clusters (number TBD) running in production with NodeLocal DNSCache enabled.

@sftim (Contributor) commented Dec 4, 2019:

I'd like to get the documentation updated and improved, so that there is a good-quality Task page, before GA.

I'd like to see docs clear enough that someone who has just qualified as CKA can follow that Task page without help, and without needing to do research or fill in missing details.

At the moment I don't think https://kubernetes.io/docs/tasks/administer-cluster/nodelocaldns/ meets that criterion.

Contributor:

As this is an add-on, does it also make sense to document how to remove it? (documenting that would help encourage production use)

@prameshj (Author):

Thanks for the feedback! I submitted the change to docs on how to remove the addon - kubernetes/website@master...prameshj:patch-4

Can you help improve the Tasks page with suggestions on what to add? Would adding cluster-create commands and kubectl commands to check that the node-local-dns pods are running and healthy make it more useful?

Contributor:

I can try to help with documenting this add-on (I've already opened kubernetes/website#17944). I'd want to be clear on intent here. A few things:

  1. What does it mean for this add-on to be GA? I think it means that it's optional and that, if you enable it, the Kubernetes project believes the add-on is finished.
  2. Is https://github.com/kubernetes/kubernetes/blob/master/cluster/addons/dns/nodelocaldns/nodelocaldns.yaml a manifest or a template that can become a manifest? What kind of template?
  3. How can the reader verify that their changes worked as intended?

From the proposed changes:

KUBE_ENABLE_NODELOCAL_DNS=true kubetest --up

What are the prerequisites for that to work? (and are those explained to the reader?)

Contributor:

@prameshj, are you planning to make a pull request based on kubernetes/website@master...prameshj:patch-4 ?

Contributor:

For this KEP, I think it's enough to add an extra line about providing clear / comprehensible documentation aimed at cluster operators.

@prameshj (Author):

kubernetes/website@master...prameshj:patch-4

I just did; I thought I already had. Thanks for the reminder!
kubernetes/website#18190

@prameshj (Author):

I can try to help with documenting this add-on (I've already opened kubernetes/website#17944). I'd want to be clear on intent here. A few things:

  1. What does it mean for this add-on to be GA? I think it means that it's optional and that, if you enable it, the Kubernetes project believes the add-on is finished.
  2. Is https://github.com/kubernetes/kubernetes/blob/master/cluster/addons/dns/nodelocaldns/nodelocaldns.yaml a manifest or a template that can become a manifest? What kind of template?
  3. How can the reader verify that their changes worked as intended?
  1. IMO, the add-on being GA means it is suitable to run in production.
  2. It is a template where most of the PILLAR_ variables need to be substituted. Not sure what you are referring to by "kind of template".
  3. Changes can be verified by (a) checking that the node-local-dns pods are running and (b) checking the node-local-dns pod metrics to make sure the request count is going up.
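
The template substitution described in (2) can be sketched as below. This is a hypothetical, trimmed-down fragment rather than the real nodelocaldns.yaml (which contains more __PILLAR__ variables), and the IP values are examples, not defaults:

```shell
# Trimmed-down stand-in for the nodelocaldns.yaml template (hypothetical fragment).
cat > nodelocaldns-sample.yaml <<'EOF'
args: [ "-localip", "__PILLAR__LOCAL__DNS__,__PILLAR__DNS__SERVER__" ]
# Corefile zone: __PILLAR__DNS__DOMAIN__
EOF

localdns="169.254.20.10"   # node-local listen IP (picked from the link-local range)
domain="cluster.local"     # cluster DNS domain
kubedns="10.0.0.10"        # kube-dns service ClusterIP (example value)

# Substitute the PILLAR_ variables to turn the template into a manifest.
sed -i \
  -e "s/__PILLAR__LOCAL__DNS__/$localdns/g" \
  -e "s/__PILLAR__DNS__DOMAIN__/$domain/g" \
  -e "s/__PILLAR__DNS__SERVER__/$kubedns/g" \
  nodelocaldns-sample.yaml

cat nodelocaldns-sample.yaml
```

After substitution the result is a plain manifest that could be applied with kubectl apply -f.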

From the proposed changes:

KUBE_ENABLE_NODELOCAL_DNS=true kubetest --up

What are the prerequisites for that to work? (and are those explained to the reader?)
Good point. That command assumes you have a working Kubernetes repo set up to build and run tests. I can include it in the other PR.

@prameshj (Author):

Added documentation to the criteria.

@johnbelamaric (Member):
/assign

@@ -192,7 +192,13 @@ Having the pods query the nodelocal cache introduces a single point of failure.

Contributor:

I've just had a thought: 169.254.20.10 is link-local. If Node-local DNS cache relies, by default, on Pods being able to route to 169.254.20.10 then that is a dependency on undefined / nonstandard behavior.
According to (my reading of) internet standards, 169.254.0.0/16 is only reachable for a client on the same link.

I recommend addressing that, somehow, before GA. If the decision is to go against the standard, let's record that explicitly.

@prameshj (Author):

The link-local IP is attached to an interface on the host. You are right, it does not exist in each pod. Since all pod traffic goes through the host network stack and is subject to the configured iptables rules, this works. I am not aware of any CNI plugins where this would not work. From what I understand, only 127.0.0.0/8 traffic does not leave the pod.

Contributor:

A CNI plugin will need to route traffic to 169.254.20.10, but there's no standardized mechanism to make this happen (because 169.254.20.10 is link-local, and link-local addresses aren't routeable).

@prameshj (Author):

The packets just need to hit the hostNetwork, since the link-local IP address exists on an interface on the node. This creates a route in the "local" routing table of the form:
"local 169.254.20.10 dev nodelocaldns proto kernel scope host src 169.254.20.10"

This is sufficient to deliver the packet to anything listening locally on that address. Isn't that true?

Also, if the k8s cluster is using kube-proxy in iptables mode, the pods can use either the link-local IP or the kube-dns service IP to access node-local-dns.

Contributor:

local 169.254.20.10 dev nodelocaldns proto kernel scope host src 169.254.20.10

I'm really not keen on setting that in stone. Allocating from the link-local range like this and expecting routing to work, even within only one host, just doesn't seem like an approach that can scale.

It's different when a cloud provider does it at the network layer, because there's no concept of those packets being routed. There's a (virtual) link and it turns out that when the instance boots something else is defending 169.254.169.254 or whatever. That works fine.

Ideally, I can set up a Kubernetes cluster that puts all Pods on the same link-local network, and that's a valid supported configuration (provided I find a CNI that makes it happen). I realize that 169.254.0.0/16 looks like a convenient range to borrow an address from, and I can see why it's often going to be a pragmatic choice.

To be clear: my concern is about making & documenting 169.254.20.10 as a default value. If you have to provide an IPv4 address at deploy time and the documentation suggests picking an address inside 169.254.0.0/16 then that isn't a problem and doesn't set any problematic precedent.

@prameshj (Author):

OK, if I understand correctly, we want to tweak the documentation to say:
"Selecting a listen IP for NodeLocal DNSCache from 169.254.0.0/16 is recommended; in our examples we will use 169.254.20.10." Is that correct? I have added this to the website doc PR - kubernetes/website#18716

We can continue the discussion there. From the GA graduation perspective, are the changes here ok to implement?

Contributor:

Selecting a listen IP for NodeLocal DNSCache from 169.254.0.0/16 is recommended; in our examples we will use 169.254.20.10

(Retrospectively) I think you've taken an appropriate and pragmatic tack. Thanks.

@k8s-ci-robot k8s-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Jan 8, 2020
@prameshj (Author):
/assign @bowei @johnbelamaric @thockin

@thockin (Member) left a comment:

Good criteria, thanks for updating.

Do we have/need a long-term plan to keep the CoreDNS version updated?

Do we have an easy-to-run benchmark test that shows the value of cache over no-cache?

@prameshj (Author):

Good criteria, thanks for updating.

Do we have/need a long-term plan to keep the CoreDNS version updated?

Currently, the update is on an as-needed basis. It would be a good idea to update when there is a major bug fix or additional functionality.

Do we have an easy-to-run benchmark test that shows the value of cache over no-cache?

We have https://github.com/kubernetes/perf-tests/tree/master/dns, but these tests are not long-running by default and look up only a few hostnames. Enhancing this is part of the scale-test improvements listed in the GA criteria.
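
As a sketch of what "looking up more hostnames" could mean for those benchmarks: dnsperf-style tools take an input file with one "name query-type" pair per line, so broadening coverage amounts to generating a larger query list. The service names below are hypothetical; a real test would use names that actually resolve in the cluster:

```shell
# Generate a dnsperf-style query file: one "name query-type" pair per line.
# svc-1 ... svc-100 are hypothetical service names used only for illustration.
: > queries.txt
for i in $(seq 1 100); do
  echo "svc-${i}.default.svc.cluster.local A" >> queries.txt
done
# The API server service name exists in every cluster.
echo "kubernetes.default.svc.cluster.local A" >> queries.txt

wc -l < queries.txt
```

A longer-running benchmark could then replay this list repeatedly against the node-local cache and against kube-dns directly, comparing latency and error rates.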

@prameshj (Author):

I have updated the criteria based on the review comments. PTAL @thockin @johnbelamaric @bowei @sftim

@sftim (Contributor) commented Jan 22, 2020:

I'm still dubious about taking 169.254.20.10 as a default to GA, because it seems to me like a misuse of the link-local IPv4 address space.

@prameshj (Author):

I'm still dubious about taking 169.254.20.10 as a default to GA, because it seems to me like a misuse of the link-local IPv4 address space.

We are not mentioning 169.254.20.10 as a default in this KEP, and in the docs we will mention 169.254.0.0/16 as a potential range to pick the listen IP from. Do you have suggestions for what else to do here? Also, just to be clear, this feature is not going to be enabled by default in GA.

@johnbelamaric (Member) left a comment:

/lgtm

keps/sig-network/0030-nodelocal-dns-cache.md (review thread outdated, resolved)
@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 22, 2020
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 22, 2020
@johnbelamaric (Member):

@sftim I don't think that concern should block this KEP change. It can be taken up again in the actual docs PR if you're still concerned about the language there.

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 22, 2020
@thockin (Member) left a comment:

Thanks!

/lgtm
/approve

@k8s-ci-robot (Contributor):

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: johnbelamaric, prameshj, thockin

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 23, 2020
@k8s-ci-robot k8s-ci-robot merged commit 74e610b into kubernetes:master Jan 23, 2020
@k8s-ci-robot k8s-ci-robot added this to the v1.18 milestone Jan 23, 2020
@sftim (Contributor) commented Jan 23, 2020:

@prameshj thanks for clarifying - #1351 (comment) makes sense to me.
If the sample manifests and documentation only suggest using a link-local address and leave that up to the cluster administrator to implement, I've no concern.
