Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bind coredns containers to linux nodes to avoid Windows scheduling #69940

Merged
merged 1 commit into from
Jan 23, 2019
Merged

Conversation

MarcPow
Copy link
Contributor

@MarcPow MarcPow commented Oct 17, 2018

What this PR does / why we need it:

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #69773

Special notes for your reviewer:

This prevents scheduling attempts that are sure to fail on Windows agents. Given the current set of Windows issues, scheduling attempts may sometimes hang and prevent DNS from coming up cleanly in a cluster.

Release note:

CoreDNS is only officially supported on Linux at this time.  As such, when kubeadm is used to deploy this component into your kubernetes cluster, it will be restricted (using nodeSelectors) to run only on nodes with that operating system. This ensures that in clusters which include Windows nodes, the scheduler will not ever attempt to place CoreDNS pods on these machines, reducing setup latency and enhancing initial cluster stability.

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-kind Indicates a PR lacks a `kind/foo` label and requires one. needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Oct 17, 2018
@MarcPow
Copy link
Contributor Author

MarcPow commented Oct 18, 2018

/kind bug

@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. and removed needs-kind Indicates a PR lacks a `kind/foo` label and requires one. labels Oct 18, 2018
@MarcPow
Copy link
Contributor Author

MarcPow commented Oct 18, 2018

/sig Windows

This is a micro-improvement to Windows clusters at startup time.

@k8s-ci-robot k8s-ci-robot added the sig/windows Categorizes an issue or PR as relevant to SIG Windows. label Oct 18, 2018
@MrHohn
Copy link
Member

MrHohn commented Oct 18, 2018

@chrisohaver @rajansandeep do you mind taking a look? Thanks.

@chrisohaver
Copy link
Contributor

Is CoreDNS the only core k8s component that cant run on windows?

@MarcPow
Copy link
Contributor Author

MarcPow commented Oct 18, 2018

Is CoreDNS the only core k8s component that cant run on windows?

I don't have sufficient expertise (yet) to comment on that. I just happened to get bitten by this one when setting a cluster up on a slightly older environment in Azure (where a combination of [seperately filed] bugs in the Docker EE + Azure CNI stacks led to the container to get 'stuck' on a particular node), leaving the cluster without DNS.

Copy link
Contributor

@chrisohaver chrisohaver left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense to me. CoreDNS can't run on windows nodes currently.

The change should be proposed in other places too...

  • kubernetes/kubeadm (has it's own copy of the coredns manifests)
  • coredns/deployment (intended to be the authoritative k8s deployment default configuration)

@rajansandeep
Copy link
Contributor

rajansandeep commented Oct 19, 2018

The change should be proposed in other places too...

@chrisohaver I'll update them in other install tools and in coredns/deployment
/ok-to-test

@k8s-ci-robot k8s-ci-robot added area/kubeadm and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Oct 19, 2018
@MarcPow
Copy link
Contributor Author

MarcPow commented Oct 21, 2018

/retest

@MarcPow
Copy link
Contributor Author

MarcPow commented Oct 21, 2018

Test failures related to pre-existing flaky test: #66542

@MrHohn
Copy link
Member

MrHohn commented Oct 22, 2018

/approve

@MarcPow
Copy link
Contributor Author

MarcPow commented Oct 22, 2018

/assign @luxas
/assign @fabriziopandini

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 14, 2018
@timothysc
Copy link
Member

/hold

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 14, 2018
@@ -237,6 +237,8 @@ spec:
operator: Exists
- key: {{ .MasterTaintKey }}
effect: NoSchedule
nodeSelector:
beta.kubernetes.io/os: linux
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where are the docs that outline this is how we are planning todo hostOS steering? It's weird to use labels for things that need to be enforced.

Copy link
Contributor Author

@MarcPow MarcPow Nov 14, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The os node labels are automatically injected.

But if you're suggesting this should be less manual, that's a fair point - I'd love to see the scheduler to inspect the relevant images and dynamically impose these constraints. But in the absence of such a feature, this seems like a reasonable place to start.

Copy link
Member

@neolit123 neolit123 Nov 14, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably best to have an issue in the k/kubeadm repo about this and add a TODO comment in the code if this change is to make it in.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@brendanburns - ping! What's the policy you are using for denoting windows nodes? and is there a KEP somewhere?

/cc @jbeda

@chrisohaver
Copy link
Contributor

chrisohaver commented Nov 14, 2018

CoreDNS does not work on Windows yet

To clarify, CoreDNS itself works fine on Windows. But we don't compile a windows executable into the CoreDNS docker manifest (AFAIK neither does kube-dns, so we probably want to add the node selector there too).

As I understand it, there are some hurdles in generally getting K8s cluster DNS on Windows (not coredns specific). A significant one is that Windows systems don't have an /etc/resolv.conf, and kubelet is geared toward assuming all pods do. So there needs to be some way of informing windows Pods of the DNS policy.

@neolit123
Copy link
Member

neolit123 commented Nov 14, 2018

To clarify, CoreDNS itself works fine on Windows.

there is a new WG in k8s that is currently discussing how multi-platform images are to be build and tagged.

@chrisohaver
Copy link
Contributor

IMO, If it makes sense to add this node selector for the CoreDNS deployment manifests, it should also be added to the kube-dns deployment manifests. AFAIK, both have the same issue.

@timothysc timothysc removed the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 14, 2018
@timothysc
Copy link
Member

I'm taking off approval, we don't have well defined policies on heterogeneous clustering that I'm aware of.

@daschott
Copy link
Contributor

There are no core-DNS pods based off of windowsservercore container images, so I do think we need a way to schedule to the right OS. The current thoughts are captured here:
https://docs.google.com/document/d/1XLs8Mbz1-xOIiDW9XSSuhx9fshpxJM1NDD1a0oVbzfc/edit

and I'm sure it will be discussed again in future SIG-Windows meetings.

@neolit123
Copy link
Member

what is preventing the workaround of updating the deployment with amended node selector toleration?
in kubeadm we had reports that CNCF clusters work with this approach, but it was a slightly different case.

making a node selector OS bound makes sense only as a temporary measure.
cc @kubernetes/sig-windows-misc

@timothysc
Copy link
Member

@daschott This requires a KEP and affects multiple SIGs.
There needs to be:

  • KEP outlining long term strategy
  • Buy-in from both sig-cluster-lifecycle and sig-node
  • Test automation!!!!
  • etc.

@michmike
Copy link
Contributor

cc @PatrickLang for our RuntimePolicy discussion

@PatrickLang
Copy link
Contributor

@timothysc What should the KEP be titled? I'm trying to understand the scope of what you're looking for.

@neolit123
Copy link
Member

going back to @chrisohaver 's comment.

As I understand it, there are some hurdles in generally getting K8s cluster DNS on Windows (not coredns specific). A significant one is that Windows systems don't have an /etc/resolv.conf, and kubelet is geared toward assuming all pods do. So there needs to be some way of informing windows Pods of the DNS policy.

it feels to me that we might want to get a DNS solution working on Windows eventually.
node selectors seem like a temporary solution.

@PatrickLang
Copy link
Contributor

Dns works on Windows pods and is a function of CNI doing the right thing.

CoreDns is used as a Kubernetes service. Windows pods can simply do dns lookups using that service IP and it will route to a Linux node running a matching pod for that service. Because CoreDns doesn't have a Windows container published, it needs the nodeselector.

@PatrickLang
Copy link
Contributor

cc @yujuhong I discussed with SIG-Node today, and they don't have any concern with this as a point in time solution. NodeSelectors are widely used, and the labels needed in this PR are stable as of v1.14 #73048.

Long term, we could remove this node selector if one of the following is done

  1. CoreDNS is published as a multi-arch, multi-OS image
  2. Scheduler is updated to automatically pick nodes based on OS/arch (or other criteria) without NodeSelector. This could happen on its own, or if RuntimeClass is updated to factor in os/arch

Copy link
Member

@neolit123 neolit123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I discussed with SIG-Node today, and they don't have any concern with this as a point in time solution. NodeSelectors are widely used, and the labels needed in this PR are stable as of v1.14 #73048.

Long term, we could remove this node selector if one of the following is done

CoreDNS is published as a multi-arch, multi-OS image
Scheduler is updated to automatically pick nodes based on OS/arch (or other criteria) without NodeSelector. This could happen on its own, or if RuntimeClass is updated to factor in os/arch

SGTM,
will leave the approval to @timothysc

Copy link
Member

@timothysc timothysc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/approve
/lgtm

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: MarcPow, MrHohn, timothysc

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 23, 2019
@michmike
Copy link
Contributor

@timothysc could you please remove the hold so that it can be merged in?

@neolit123
Copy link
Member

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 23, 2019
@PatrickLang
Copy link
Contributor

/test pull-kubernetes-godeps

@k8s-ci-robot k8s-ci-robot merged commit baaaa15 into kubernetes:master Jan 23, 2019
@k8s-ci-robot
Copy link
Contributor

@MarcPow: The following test failed, say /retest to rerun them all:

Test name Commit Details Rerun command
pull-kubernetes-e2e-kops-aws eb818f9 link /test pull-kubernetes-e2e-kops-aws

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/kubeadm cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. sig/windows Categorizes an issue or PR as relevant to SIG Windows. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Official coredns should be bound to nodeSelector: beta.kubernetes.io/os=linux