
Add CoreDNS as feature in kubeadm #52501

Merged
merged 1 commit into from
Nov 8, 2017
Conversation

rajansandeep
Contributor

@rajansandeep rajansandeep commented Sep 14, 2017

What this PR does / why we need it:
This PR adds CoreDNS as a DNS plugin via the feature-gate option in Kubeadm init.

Which issue this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close that issue when PR gets merged):
Fixes kubernetes/kubeadm#446

Special notes for your reviewer:

Release note:

kubeadm: Add an experimental mode to deploy CoreDNS instead of KubeDNS

/cc @johnbelamaric

@k8s-ci-robot
Contributor

@rajansandeep: GitHub didn't allow me to request PR reviews from the following users: johnbelamaric.

Note that only kubernetes members can review this PR, and authors cannot review their own PRs.

In response to this:

What this PR does / why we need it:
This PR adds CoreDNS as a DNS plugin via the feature-gate option in Kubeadm init.

Which issue this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close that issue when PR gets merged):
Fixes kubernetes/enhancements#427

Special notes for your reviewer:

Release note:

/cc @johnbelamaric

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Sep 14, 2017
@k8s-ci-robot
Contributor

Hi @rajansandeep. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.



@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Sep 14, 2017
@k8s-github-robot k8s-github-robot added the do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. label Sep 14, 2017

@luxas
Member

luxas commented Sep 15, 2017

/ok-to-test
/release-note

cc @kubernetes/sig-cluster-lifecycle-pr-reviews
/assign

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Sep 15, 2017
@luxas
Member

luxas commented Sep 15, 2017

(I'll take a look at this later when I get to it)

Thanks for this PR!!

@mattmoyer (Contributor) left a comment

This is really great to see, thanks! I left a few comments.

I have one question for @luxas or someone else in @kubernetes/sig-cluster-lifecycle-feature-requests: do we want to create a way to switch from kube-dns to CoreDNS in a running cluster, or is that out of scope for kubeadm? It feels similar to upgrade but not quite the same.

effect: NoSchedule
containers:
- name: coredns
image: coredns/coredns:{{ .Version }}
Contributor

We probably need to add {{ .ImageRepository }} and {{.Arch}} here, similar to the existing kube-dns manifests.

I think this probably also means mirroring CoreDNS into the gcr.io/google_containers registry. Is this an option? I'm not sure who owns that decision.

Member

coredns/coredns should be built as a manifest list (ref: k8s multiarch proposal)

Then we don't need to pass {{ .Arch }} at all, docker will just pull the right variant of the image.

Reach out to me if you want to know how to build a manifest list: https://docs.docker.com/registry/spec/manifest-v2-2/

Member

ping @rajansandeep @johnbelamaric on the manifest list building

name: coredns
items:
- key: Corefile
path: Corefile
Contributor

If we add multi-architecture support above, we need the beta.kubernetes.io/arch nodeAffinity selector here to make sure it schedules to an appropriate node.

Member

not needed anymore as the latest image is a manifest list

errors
log stdout
health
kubernetes {{ .DNSDomain }} {{ .Servicecidr }}
Contributor

nit: maybe s/Servicecidr/ServiceCIDR/ for consistency?

return nil
}

//convSubnet fetches the servicecidr and modifies the mask to the nearest class
Contributor

I wonder if this should error out when the configured CIDR doesn't match a full class instead of trying to fix it. Any idea how kube-dns handles this case?

Could this lead to answering reverse DNS queries with the wrong response if they were for IPs that were part of the service subnet, but not in the nearest class?

Member

Default is a /12 so it would error out by default 😉

This is a general DNS problem. We're looking at some solutions here coredns/coredns#1074

kube-dns today captures ALL PTRs. Leading to things like kubernetes/dns#124 and (I think) the ability to hijack the PTR of any IP in the world.

Even if we nail it down to just the service CIDR, we still have problems with PTR and manually added endpoints in K8s, especially in a multi-tenant deployment.
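For context, a minimal sketch of the kind of nearest-class rounding being discussed (my own illustrative implementation, not the PR's convSubnet): widening an IPv4 mask to the nearest classful /8, /16, or /24 boundary means a /12 service CIDR ends up covering a whole /8 of reverse-DNS space, which is exactly the over-capture concern raised above.

```go
package main

import (
	"fmt"
	"net"
)

// roundToClassful is an illustrative stand-in for convSubnet: it widens an
// IPv4 CIDR mask down to the nearest classful boundary (/8, /16, /24),
// since CoreDNS reverse zones of that era had to be classful. Note how a
// /12 becomes a /8, answering PTR queries for IPs outside the real
// service CIDR.
func roundToClassful(cidr string) (string, error) {
	_, ipNet, err := net.ParseCIDR(cidr)
	if err != nil {
		return "", err
	}
	ones, bits := ipNet.Mask.Size()
	if bits != 32 {
		return "", fmt.Errorf("IPv4 only in this sketch: %s", cidr)
	}
	classful := (ones / 8) * 8 // round down to an 8-bit boundary
	if classful == 0 {
		classful = 8
	}
	ip := ipNet.IP.Mask(net.CIDRMask(classful, 32))
	return fmt.Sprintf("%s/%d", ip, classful), nil
}

func main() {
	out, _ := roundToClassful("10.96.0.0/12")
	fmt.Println(out) // 10.0.0.0/8
}
```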


// GetCoreDNSVersion returns the right CoreDNS version for a specific k8s version
func GetCoreDNSVersion(kubeVersion *version.Version) string {
// v1.7.0+ uses CoreDNS-011, just return that here
Contributor

Should be v1.8.0+ (here and again below).

@luxas
Member

luxas commented Sep 15, 2017

do we want to create a way to switch from kube-dns to CoreDNS in a running cluster, or is that out of scope for kubeadm? It feels similar to upgrade but not quite the same.

Possibly in phases/upgrade/postupgrade.go... If the feature gate is enabled and if we think that's worth the effort.

Thanks for the review @mattmoyer!

@timothysc
Member

Please squash the PRs. Given that @luxas and @mattmoyer are on it I'll defer to them.

@timothysc timothysc removed their assignment Sep 18, 2017
@timothysc timothysc added the kind/feature Categorizes issue or PR as related to a new feature. label Sep 18, 2017
errors
log stdout
health
kubernetes {{ .DNSDomain }} {{ .Servicecidr }}
Contributor

Should prometheus be added in as part of the default to align with the rest of Kubernetes components?

Member

yes, that makes sense
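For illustration, a rendered Corefile with the prometheus plugin enabled might look like the following. The server block, metrics port, and CIDR here are assumptions based on common CoreDNS defaults, not this PR's exact output:

```
.:53 {
    errors
    log stdout
    health
    prometheus :9153
    kubernetes cluster.local 10.96.0.0/12
}
```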

@amitkumarj441 amitkumarj441 mentioned this pull request Sep 24, 2017
@k8s-github-robot k8s-github-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 24, 2017
if err != nil {
log.Fatal(err)
}
servicecidr = ipv4Net.String()
Member

Maybe it would be good to test that functionality with IPv6 as well ?
IPv6 is coming soon anyway, doesn't make sense to write IPv4 code only.
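A minimal sketch (my own, not the PR's code) of the address-family check that IPv6-aware handling would start from, instead of assuming IPv4 everywhere:

```go
package main

import (
	"fmt"
	"net"
)

// cidrFamily reports whether a service CIDR is IPv4 or IPv6 -- the branch
// point an IPv6-capable version of this code would need.
func cidrFamily(cidr string) (string, error) {
	ip, _, err := net.ParseCIDR(cidr)
	if err != nil {
		return "", err
	}
	if ip.To4() != nil {
		return "ipv4", nil
	}
	return "ipv6", nil
}

func main() {
	f, _ := cidrFamily("fd00:10:96::/112")
	fmt.Println(f) // ipv6
}
```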

@k8s-github-robot k8s-github-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 3, 2017
@luxas (Member) left a comment

some quick feedback.
We have a couple of open questions still:

  • Should this be seen as a "drop-in" replacement still called kube-dns, or is it a net-new replacement for kube-dns?
  • Should RBAC ClusterRole & Binding be auto-bootstrapped?
  • Should the service still be called kube-dns?

Action items:

  • Make sure it's possible to "upgrade" any v1.8 cluster to using coredns. Make sure that kubeadm upgrade does the right things, shows the right text, removes the old kube-dns if not needed, etc.
  • The coredns image must be a manifest list. Please make the next release a manifest list so we can use it. Reach out to me if you need help with doing that.

@@ -382,11 +382,6 @@ func (i *Init) Run(out io.Writer) error {
return err
}

// Create/update RBAC rules that makes the nodes to rotate certificates and get their CSRs approved automatically
Member

Why remove this?

return err
}
} else {
if err := dnsaddonphase.EnsureDNSAddon(i.cfg, client); err != nil {
Member

Can you move this logic inside of EnsureDNSAddon? (i.e. the logic whether to use kube- or core-dns)
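The suggested shape, sketched with stand-in types (the gate name mirrors the CoreDNS feature gate discussed in this PR; everything else, including the function names, is illustrative rather than kubeadm's real API):

```go
package main

import "fmt"

// config is a stand-in for kubeadm's MasterConfiguration.
type config struct {
	FeatureGates map[string]bool
}

// selectDNSAddon keeps the kube-dns vs CoreDNS decision in one place,
// which is the shape the review asks EnsureDNSAddon to take, so callers
// never branch on the feature gate themselves.
func selectDNSAddon(cfg *config) string {
	if cfg.FeatureGates["CoreDNS"] {
		return "coredns"
	}
	return "kube-dns"
}

func main() {
	cfg := &config{FeatureGates: map[string]bool{"CoreDNS": true}}
	fmt.Println(selectDNSAddon(cfg)) // coredns
}
```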

effect: NoSchedule
containers:
- name: coredns
image: coredns/coredns:{{ .Version }}
Member

coredns/coredns should be built as a manifest list (ref: k8s multiarch proposal)

Then we don't need to pass {{ .Arch }} at all, docker will just pull the right variant of the image.

Reach out to me if you want to know how to build a manifest list: https://docs.docker.com/registry/spec/manifest-v2-2/

protocol: TCP
`

//ConfigMap is the CoreDNS ConfigMap manifest
Member

needs to be a godoc comment
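For reference, a godoc comment must begin with the name of the identifier it documents. Assuming the constant is named CoreDNSConfigMap (the name and truncated body here are only illustrative), that would look like:

```go
package main

import "fmt"

// CoreDNSConfigMap is the CoreDNS ConfigMap manifest.
//
// The comment above starts with the declared name, as godoc requires.
const CoreDNSConfigMap = `apiVersion: v1
kind: ConfigMap`

func main() {
	fmt.Println(CoreDNSConfigMap)
}
```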

kind: ClusterRole
metadata:
labels:
kubernetes.io/bootstrapping: rbac-defaults
Member

should not be here

@@ -38,3 +39,18 @@ func GetKubeDNSManifest(kubeVersion *version.Version) string {
// In the future when the kube-dns version is bumped at HEAD; add conditional logic to return the right manifest
return v170AndAboveKubeDNSDeployment
}

// GetCoreDNSVersion returns the right CoreDNS version for a specific k8s version
func GetCoreDNSVersion(kubeVersion *version.Version) string {
Member

Let's use the same functions -- don't create a new one. Instead, choose what version to return based on DNS provider.
This way the right values will be shown for upgrades as well.

k8s-app: coredns
annotations:
scheduler.alpha.kubernetes.io/critical-pod: ''
scheduler.alpha.kubernetes.io/tolerations: '[{"key":"CriticalAddonsOnly", "operator":"Exists"}]'
Member

This annotation isn't respected anymore, tolerations are now set on the PodSpec. See other manifests in cmd/kubeadm
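That is, instead of the annotation, the toleration would live on the PodSpec, roughly like this (a sketch in the style of the other kubeadm manifests of that era):

```yaml
spec:
  template:
    spec:
      tolerations:
      - key: CriticalAddonsOnly
        operator: Exists
```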

@@ -127,6 +131,161 @@ func createKubeDNSAddon(deploymentBytes, serviceBytes []byte, client clientset.I
return nil
}

func EnsureCoreDNSAddon(cfg *kubeadmapi.MasterConfiguration, client clientset.Interface) error {
Member

make this private and call from EnsureDNSAddon

// CoreDNSService is the CoreDNS Service manifest
CoreDNSService = `
apiVersion: v1
kind: Service
Member

What did we think here? Would it be possible to keep the same kube-dns service?

Member

ping @kubernetes/sig-network-pr-reviews should we keep the kube-dns name or not?
In some way, it would be cool to keep the name to signal "this is the DNS service for Kubernetes, regardless of implementation"

Member

That sounds reasonable to me


//convSubnet fetches the serviceCIDR and modifies the mask to the nearest class
//CoreDNS requires CIDR notations for reverse zones as classful.
func convSubnet(cidr string) (serviceCIDR string) {
Member

unit tests please

@k8s-ci-robot k8s-ci-robot added the sig/auth Categorizes an issue or PR as relevant to SIG Auth. label Oct 11, 2017
@k8s-github-robot k8s-github-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. and removed approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm "Looks good to me", indicates that a PR is ready to be merged. labels Nov 7, 2017
@rajansandeep
Contributor Author

@luxas I fixed the golint errors. I am not able to verify the other error which is failing the test.

@kad
Member

kad commented Nov 7, 2017

@rajansandeep you need to run hack/update-bazel.sh it will update cmd/kubeadm/app/phases/addons/dns/BUILD file, which you need to add to your squashed commit.

@rajansandeep
Contributor Author

/test pull-kubernetes-unit

@fturib

fturib commented Nov 7, 2017

/retest

@rajansandeep
Contributor Author

Did the test flake? @luxas @kad

@fturib

fturib commented Nov 7, 2017

Looks like the very same tests were OK on the previous run (which was ignored because I issued /retest before it finished). So it seems to be a flaky test... let's retry once.

@fturib

fturib commented Nov 7, 2017

/retest

@luxas
Member

luxas commented Nov 7, 2017

/lgtm

Yes, flaked

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 7, 2017
@k8s-github-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: luxas, rajansandeep

Associated issue: 427

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@rajansandeep
Contributor Author

/retest

@fejta-bot

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to @fejta).

Review the full test history for this PR.

1 similar comment

@kad
Member

kad commented Nov 8, 2017

The bazel builds seem to be failing with legitimate errors:

W1011 19:16:09.889] 2017/10/11 19:16:09 missing strict dependencies:
W1011 19:16:09.890] 	cmd/kubeadm/app/phases/addons/dns/dns.go: import of k8s.io/api/rbac/v1, which is not a direct dependency

and

W1023 21:46:32.637] ERROR: /go/src/k8s.io/kubernetes/cmd/kubeadm/app/phases/addons/dns/BUILD:28:1: GoCompile cmd/kubeadm/app/phases/addons/dns/~normal~go_default_library~/k8s.io/kubernetes/cmd/kubeadm/app/phases/addons/dns.a failed (Exit 1): compile failed: error executing command 
W1023 21:46:32.643] 2017/10/23 21:46:32 missing strict dependencies:
W1023 21:46:32.644] 	cmd/kubeadm/app/phases/addons/dns/dns.go: import of k8s.io/kubernetes/cmd/kubeadm/app/features, which is not a direct dependency
W1023 21:46:32.645] 	cmd/kubeadm/app/phases/addons/dns/versions.go: import of k8s.io/kubernetes/cmd/kubeadm/app/features, which is not a direct dependency
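Strict-deps errors like these are typically fixed by regenerating the BUILD file with hack/update-bazel.sh so that go_library lists each import directly. A rough sketch of the resulting rule (target paths and srcs here are illustrative, not the actual generated file):

```
go_library(
    name = "go_default_library",
    srcs = [
        "dns.go",
        "versions.go",
    ],
    deps = [
        "//cmd/kubeadm/app/features:go_default_library",
        "//vendor/k8s.io/api/rbac/v1:go_default_library",
    ],
)
```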

@rajansandeep
Contributor Author

The BUILD file seems to already include the missing dependencies. Am I missing something?
Also, the dates in the error logs seem old.

@luxas
Member

luxas commented Nov 8, 2017 via email

@rajansandeep
Contributor Author

/retest

@rajansandeep
Contributor Author

The command for retest isn't restarting the test.

@luxas
Member

luxas commented Nov 8, 2017

/test all

@rajansandeep
Contributor Author

/retest

@k8s-ci-robot
Contributor

k8s-ci-robot commented Nov 8, 2017

@rajansandeep: The following tests failed, say /retest to rerun them all:

Test name | Commit | Rerun command
pull-kubernetes-e2e-gce-bazel | 8902354 | /test pull-kubernetes-e2e-gce-bazel
pull-kubernetes-e2e-gce-gpu | a64ce7b3a96a480a40ea47d23a9f39a72487cc59 | /test pull-kubernetes-e2e-gce-gpu

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.


@rajansandeep
Contributor Author

/retest

@k8s-github-robot

/test all [submit-queue is verifying that this PR is safe to merge]

@k8s-github-robot

Automatic merge from submit-queue (batch tested with PRs 54493, 52501, 55172, 54780, 54819). If you want to cherry-pick this change to another branch, please follow the instructions here.

@k8s-github-robot k8s-github-robot merged commit d42be07 into kubernetes:master Nov 8, 2017
@rajansandeep rajansandeep deleted the featurecoredns branch November 10, 2017 18:49
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/auth Categorizes an issue or PR as relevant to SIG Auth. sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. sig/network Categorizes an issue or PR as relevant to SIG Network. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Development

Successfully merging this pull request may close these issues.

Add an option to use CoreDNS instead of KubeDNS in v1.9