Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Decouple TaintManager from NodeLifeCycleController (KEP-3902) #119208

Merged
merged 1 commit into from
Oct 30, 2023

Conversation

atosatto
Copy link
Contributor

@atosatto atosatto commented Jul 10, 2023

What type of PR is this?

/kind feature

What this PR does / why we need it:

This is implementing KEP-3902 by introducing a feature-flag, SeparateTaintManager, allowing to de-couple taint-manager from node-lifecycle-controller.

Which issue(s) this PR fixes:

kubernetes/enhancements#3902

Special notes for your reviewer:

Unlike the KEP proposal, the flag used to disable/enable the controller in KCM has been named taint-eviction-controller in order to preserve the existing convention of having all of the controllers names ending with -controller. This has been preferred to the taint-manager-controller and/or taint-manager alternatives. For compatibility with the KEP a taint-manager legacy flag has been provided in the implementation.

Does this PR introduce a user-facing change?

Decouple TaintManager from NodeLifeCycleController (KEP-3902)

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. kind/feature Categorizes issue or PR as related to a new feature. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jul 10, 2023
@k8s-ci-robot
Copy link
Contributor

Hi @atosatto. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the needs-priority Indicates a PR lacks a `priority/foo` label and requires one. label Jul 10, 2023
@Huang-Wei
Copy link
Member

/ok-to-test

/sig scheduling
/sig node
/sig apps

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. sig/node Categorizes an issue or PR as relevant to SIG Node. sig/apps Categorizes an issue or PR as relevant to SIG Apps. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Jul 10, 2023
@Huang-Wei
Copy link
Member

/milestone v1.28

@k8s-ci-robot k8s-ci-robot added this to the v1.28 milestone Jul 10, 2023
@k8s-ci-robot k8s-ci-robot added the sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. label Jul 10, 2023
@atosatto
Copy link
Contributor Author

atosatto commented Jul 10, 2023

/test pull-kubernetes-e2e-gce-cos-alpha-features

Copy link
Member

@Huang-Wei Huang-Wei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR. The logic looks good. Could you add some integration tests to cover that:

  • if the feature gate is enabled, we are able to disable taintmanager from controller manager
  • if the feature gate is disable, there is no way to disable taintmanager (taintmanger functions by default)

cmd/kube-controller-manager/app/core.go Outdated Show resolved Hide resolved
pkg/controller/taintmanager/taint_manager.go Outdated Show resolved Hide resolved
pkg/features/kube_features.go Outdated Show resolved Hide resolved
pkg/features/kube_features.go Outdated Show resolved Hide resolved
pkg/controller/taintmanager/OWNERS Outdated Show resolved Hide resolved
pkg/controller/nodelifecycle/node_lifecycle_controller.go Outdated Show resolved Hide resolved
@zelenushechka
Copy link
Member

Hey, I'm Anhelina from the Bug Triage Release team 1.28 👋. I wanted to reach out to discuss its progress and ensure it's on track for the v1.28 release, considering the Code Freeze is coming up on 01:00 UTC Wednesday 19th July 2023 / 18:00 PDT Tuesday 18th July 2023. Could you please provide an update on the current status of the issue? Is it planned for the v1.28 release? We are currently in Week 9, and Code Freeze is just over a week away. If there's anything I can do, please let me know:)

@k8s-ci-robot k8s-ci-robot added do-not-merge/invalid-commit-message Indicates that a PR should not merge because it has an invalid commit message. do-not-merge/invalid-owners-file Indicates that a PR should not merge because it has an invalid OWNERS file in it. labels Jul 11, 2023
Copy link
Contributor

@k8s-ci-robot k8s-ci-robot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@atosatto: 1 invalid OWNERS file

In response to this:

What type of PR is this?

/kind feature

What this PR does / why we need it:

This is implementing KEP-3902 by introducing a feature-flag, SeparateTaintManager, allowing to de-couple taint-manager from node-lifecycle-controller.

Which issue(s) this PR fixes:

kubernetes/enhancements#3902

Special notes for your reviewer:

Unlike the KEP proposal, the flag used to disable/enable the controller in KCM has been named taint-eviction-controller in order to preserve the existing convention of having all of the controllers names ending with -controller. This has been preferred to the taint-manager-controller and/or taint-manager alternatives. For compatibility with the KEP a taint-manager legacy flag has been provided in the implementation.

Does this PR introduce a user-facing change?

Decouple TaintManager from NodeLifeCycleController (KEP-3902)

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

pkg/controller/taintmanager/OWNERS Outdated Show resolved Hide resolved
@atosatto
Copy link
Contributor Author

atosatto commented Jul 11, 2023

Thanks for the PR. The logic looks good. Could you add some integration tests to cover that:

  • if the feature gate is enabled, we are able to disable taintmanager from controller manager
  • if the feature gate is disable, there is no way to disable taintmanager (taintmanger functions by default)

I've added some specific tests covering the controller registration in e9b28ea8923 which should cover this from the point of view of the kube-controller-manager registered controllers. Is this the scenario you had in mind? If not I am happy to add more tests.

@Huang-Wei
Copy link
Member

/retest

Copy link
Member

@yuanchen8911 yuanchen8911 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 26, 2023
@k8s-ci-robot
Copy link
Contributor

LGTM label has been added.

Git tree hash: f4a6fbd596ab190095e7a47e7a630f1b0e03b491

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 29, 2023
@k8s-ci-robot
Copy link
Contributor

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 30, 2023
@atosatto atosatto force-pushed the separate-taint-manager branch 2 times, most recently from 2f323d4 to 62cde68 Compare October 30, 2023 12:16
Copy link
Member

@atiratree atiratree left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry about the last minute changes to the controllers registration in #120371.

Feature gating approach has changed a needs to be updated. Otherwise LGTM

@@ -556,6 +557,10 @@ func NewControllerDescriptors() map[string]*ControllerDescriptor {
register(newResourceClaimControllerDescriptor())
register(newLegacyServiceAccountTokenCleanerControllerDescriptor())
register(newValidatingAdmissionPolicyStatusControllerDescriptor())
if utilfeature.DefaultFeatureGate.Enabled(features.SeparateTaintEvictionController) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

with #120371 there is no need for defining the gating here anymore, since the gating should be automatic (taken from the descriptor)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@atiratree not quite, this controller is on by default, since it's beta, so this check ensures with the feature gate off it's not turned on. Not sure how to handle that with your controller initiation changes. So I think we can leave it as is, for now, and fix in a followup.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this check has been moved to

for _, featureGate := range controllerDescriptor.GetRequiredFeatureGates() {
if !utilfeature.DefaultFeatureGate.Enabled(featureGate) {
logger.Info("Controller is disabled by a feature gate", "controller", controllerName, "requiredFeatureGates", controllerDescriptor.GetRequiredFeatureGates())
return nil, nil
}
}

and it should work properly even when it is beta and on by default and then turned off by the user.

So it is redundant to have it in this place as well. But I don't mind merging this PR first and fixing this later.

fyi, I am trying to enforce this in #121611

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right, but you remove a controller explicitly, but when you also separately have a feature gate, which allows disabling the split, what then?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

requiredFeatureGates field is meant as a replacement of this logic and the other gated controllers were ported to it without any issue. It is possible to gate a controller with multiple feature gates as you can see for example in StorageVersionGarbageCollectorController.

I made the simplest implementation I could. The implementation can be changed in the future if more custom decision logic (not just AND) is required. Since this controller has just one, it should be okay in this situation as well.

The reason behind this is simple, we want to reason about controllers without running them and have the gating information available declaratively. For example we want to show available controllers in kube-controller-manager --help. We can then also show the gating information there.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kind of warned this was going to happen #120371 (comment) 🙃

Copy link
Member

@atiratree atiratree Oct 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have updated the #121611 PR and included a test to show that the gating works properly with this new controller. Please let me know which part of the test you think is insufficient. Or if you think it would be a good idea to implement a generic solution for future controllers that would require more complex gating.


// TestTaintEvictionControllerDeclaration ensures that it is possible to run taint-manager as a separated controller
// only when the SeparateTaintEvictionController feature is enabled
func TestTaintEvictionControllerDeclaration(t *testing.T) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can remove this test now, or just check the presence of descriptor.requiredFeatureGates if you really want to keep it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same thing as the other comment.

Copy link
Contributor

@soltysh soltysh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
/approve

@@ -556,6 +557,10 @@ func NewControllerDescriptors() map[string]*ControllerDescriptor {
register(newResourceClaimControllerDescriptor())
register(newLegacyServiceAccountTokenCleanerControllerDescriptor())
register(newValidatingAdmissionPolicyStatusControllerDescriptor())
if utilfeature.DefaultFeatureGate.Enabled(features.SeparateTaintEvictionController) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@atiratree not quite, this controller is on by default, since it's beta, so this check ensures with the feature gate off it's not turned on. Not sure how to handle that with your controller initiation changes. So I think we can leave it as is, for now, and fix in a followup.


// TestTaintEvictionControllerDeclaration ensures that it is possible to run taint-manager as a separated controller
// only when the SeparateTaintEvictionController feature is enabled
func TestTaintEvictionControllerDeclaration(t *testing.T) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same thing as the other comment.

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 30, 2023
@k8s-ci-robot
Copy link
Contributor

LGTM label has been added.

Git tree hash: 17798ff8a66e3c744af8fdb388a29ff4bdc64c19

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: atosatto, Huang-Wei, soltysh, yuanchen8911

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 30, 2023
@soltysh soltysh removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 30, 2023
@soltysh
Copy link
Contributor

soltysh commented Oct 30, 2023

/milestone v1.29

@k8s-ci-robot k8s-ci-robot modified the milestones: next-candidate, v1.29 Oct 30, 2023
@k8s-ci-robot k8s-ci-robot merged commit e421287 into kubernetes:master Oct 30, 2023
16 checks passed
SIG Node PR Triage automation moved this from Waiting on Author to Done Oct 30, 2023
@atosatto atosatto deleted the separate-taint-manager branch October 31, 2023 16:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/stable-metrics Issues or PRs involving stable metrics area/test cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/apps Categorizes an issue or PR as relevant to SIG Apps. sig/instrumentation Categorizes an issue or PR as relevant to SIG Instrumentation. sig/node Categorizes an issue or PR as relevant to SIG Node. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
Archived in project
Archived in project
Archived in project
Archived in project
Archived in project
Development

Successfully merging this pull request may close these issues.

None yet