Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prevent nodes from updating taints #63167

Merged
merged 1 commit into from
May 16, 2018

Conversation

liggitt
Copy link
Member

@liggitt liggitt commented Apr 25, 2018

Prevents kubelets from modifying or removing taints on update.

Nodes can set taints when they register themselves, but do not update/remove those taints after creation (that is done by the node controller based on reported node conditions).

xref kubernetes/community#911 kubernetes/enhancements#279

/sig node
/sig auth
/sig scheduling

/assign @mikedanese @k82cn

The NodeRestriction admission plugin now prevents kubelets from modifying/removing taints applied to their Node API object.

@k8s-ci-robot k8s-ci-robot added sig/node Categorizes an issue or PR as relevant to SIG Node. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. sig/auth Categorizes an issue or PR as relevant to SIG Auth. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Apr 25, 2018
@k8s-ci-robot k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Apr 25, 2018
@tallclair
Copy link
Member

/cc @vishh
xref #62747

@vishh
Copy link
Contributor

vishh commented Apr 25, 2018

Aren't existing conditions expected to be transitioned to taints in which case k'let will place and remove taints?
I have use cases around bootstrapping and handling termination where the current plan is to re-use k'let's credentials.
Why is this restriction necessary especially given that taints are yet to be fully adopted even in core?

@liggitt
Copy link
Member Author

liggitt commented Apr 25, 2018

Aren't existing conditions expected to be transitioned to taints

They already are, but are translated by the node controller, not set directly by the kubelet (which works really well to allow a limited set of conditions to affect taints without giving the kubelet full control over them)

Why is this restriction necessary especially given that taints are yet to be fully adopted even in core?

Because kubelets removing/modifying their own taints removes a primary tool an administrator has for preventing nodes from accepting certain workloads (imagine a "compromised" or "unverified" taint set on a node intended to keep the scheduler from sending workloads to it)

See kubernetes/community#911 for a fuller discussion of the issues with letting kubelets steer workloads via self-labeling or self-untainting

@liggitt
Copy link
Member Author

liggitt commented Apr 25, 2018

cc @kubernetes/sig-auth-pr-reviews @kubernetes/sig-node-pr-reviews @kubernetes/sig-scheduling-pr-reviews

@k82cn
Copy link
Member

k82cn commented Apr 26, 2018

The diff is ok to me :) There's a discussion (#62311) on replacing node condition with taints, will we add white list for node conditions?

@aveshagarwal
Copy link
Member

Would it impact these scenario (cloud specific taint being applied on the fly/runtime): https://github.com/kubernetes/kubernetes/blob/master/pkg/cloudprovider/providers/aws/aws.go#L1777 ?

@liggitt
Copy link
Member Author

liggitt commented Apr 26, 2018

Would it impact these scenario (cloud specific taint being applied on the fly/runtime): https://github.com/kubernetes/kubernetes/blob/master/pkg/cloudprovider/providers/aws/aws.go#L1777 ?

No, that is only called from AttachDisk, which has been done by the attach/detach controller, not the kubelet, since 1.3

xref #55517

@liggitt
Copy link
Member Author

liggitt commented Apr 26, 2018

There's a discussion (#62311) on replacing node condition with taints, will we add white list for node conditions?

If we want the kubelet to start setting/updating/removing taints on itself directly, we can discuss the pros/cons of whitelisting specific taints at that point. I could see whitelisting the set that is automatically translated from node-reported conditions->taints, but would want that to be an intentional decision.

@k82cn
Copy link
Member

k82cn commented Apr 26, 2018

There's a discussion (#62311) on replacing node condition with taints, will we add white list for node conditions?

If we want the kubelet to start setting/updating/removing taints on itself directly, we can discuss the pros/cons of whitelisting specific taints at that point. I could see whitelisting the set that is automatically translated from node-reported conditions->taints, but would want that to be an intentional decision.

That's ok to me, and PR is LGTM; waiting for mikedanese@'s comments if any :)

@mikedanese
Copy link
Member

I would like to restrict this since nothing uses it, and loosen the restriction as needed going forward. @vishh has a reasonable usecase. How should a node agent out of core request evacuation? Does the node agent need an accompanying controller that understands a new node condition?

@liggitt
Copy link
Member Author

liggitt commented Apr 30, 2018

I would like to restrict this since nothing uses it, and loosen the restriction as needed going forward.

I agree.

@vishh has a reasonable usecase. How should a node agent out of core request evacuation? Does the node agent need an accompanying controller that understands a new node condition?

I could see any of the following (roughly in order of preference):

  • running a separate controller for managing a custom condition<->taint (part of the custom node agent setup)
  • deciding on a well-known key or prefix for taints that nodes would be allowed to manage themselves
  • options to the existing node controller to tell it additional custom conditions to translate into taints
  • options to the admission plugin to configure specific taints to allow the node to manage

@luxas
Copy link
Member

luxas commented May 1, 2018

cc @kubernetes/sig-cluster-lifecycle-pr-reviews

@k8s-ci-robot k8s-ci-robot added the sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. label May 1, 2018
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 10, 2018
@mikedanese
Copy link
Member

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 11, 2018
@mikedanese
Copy link
Member

mikedanese commented May 11, 2018

Let's get this in and relax constraints when we need to.

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 15, 2018
@liggitt
Copy link
Member Author

liggitt commented May 15, 2018

rebased

@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 15, 2018
@ericchiang
Copy link
Contributor

Download error for bazel build

W0515 17:57:42.467] ERROR: /bazel-scratch/.cache/bazel/_bazel_root/e9f728bbd90b3fba632eb31b20e1dacd/external/io_bazel_rules_go/BUILD.bazel:9:1: GoStdlib external/io_bazel_rules_go/linux_amd64_stripped/stdlib~/pkg failed: Failed to delete output files after incomplete download. Cannot continue with local execution.: /bazel-scratch/.cache/bazel/_bazel_root/e9f728bbd90b3fba632eb31b20e1dacd/execroot/main/bazel-out/host/bin/external/io_bazel_rules_go/linux_amd64_stripped/stdlib~/pkg (Directory not empty)

/retest

@neolit123
Copy link
Member

/test pull-kubernetes-e2e-kops-aws

@k82cn
Copy link
Member

k82cn commented May 16, 2018

running a separate controller for managing a custom condition<->taint (part of the custom node agent setup)

Separate controller manager has race condition issue between condition <-> taints :( , well-known key or prefix for taints makes sense to me.

xref #63897

@k82cn
Copy link
Member

k82cn commented May 16, 2018

/lgtm

help to re-lgtm :)

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 16, 2018
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: k82cn, liggitt, mikedanese

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-github-robot
Copy link

/test all [submit-queue is verifying that this PR is safe to merge]

@k8s-github-robot
Copy link

Automatic merge from submit-queue (batch tested with PRs 63167, 63357). If you want to cherry-pick this change to another branch, please follow the instructions here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/auth Categorizes an issue or PR as relevant to SIG Auth. sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. sig/node Categorizes an issue or PR as relevant to SIG Node. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet