New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix handling of NoExecute taint when PodDisruptionConditions is enabled #112518
Fix handling of NoExecute taint when PodDisruptionConditions is enabled #112518
Conversation
Skipping CI for Draft Pull Request. |
@mimowo: This issue is currently awaiting triage. If a SIG or subproject determines this is a relevant issue, they will accept it by applying the The Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
Initial investigation, after adding a generic error message suggests this is a permissions issue: |
dba0c00
to
f50f4a7
Compare
16b3e68
to
880d15d
Compare
880d15d
to
458b2b9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
/retest |
dcb21a4
to
b7f7cc4
Compare
/retest |
@alculquicondor please renew LGTM, no changes in this PR, but I rebased as there were some unrelated tests failing - don't know why, but rebase helped. |
/lgtm |
/assign @liggitt |
low priority, because it's alpha, but we should. |
@@ -192,19 +196,46 @@ func TestBootstrapClusterRoleBindings(t *testing.T) { | |||
} | |||
|
|||
func TestBootstrapControllerRoles(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
revert the changes to this test, I don't think we need/want a proliferation of fixtures for every feature gate
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, reverted the unit test changes. Will rely on manual testing + e2e testing once in Beta.
@alculquicondor please renew the LGTM if you are ok with the change.
Improve error logging from timed workers which are used for pod eviction Co-authored-by: Aldo Culquicondor <1299064+alculquicondor@users.noreply.github.com>
b7f7cc4
to
bb561e0
Compare
/approve |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: liggitt, mimowo The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/lgtm |
What type of PR is this?
/kind bug
What this PR does / why we need it:
Which issue(s) this PR fixes:
Fixes #112517
The issue is triggered by this GET (protected by the
PodDisruptionConditions
feature gate):kubernetes/pkg/controller/nodelifecycle/scheduler/taint_manager.go
Line 124 in a4ad8f4
What is more the error was not logged as it was propagated up to here:
kubernetes/pkg/controller/nodelifecycle/scheduler/timed_workers.go
Line 59 in a4ad8f4
Thus, as a part of this PR I propose to make sure errors from the functions invoked by timed_workers are logged.
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Yes, it fix that pods running on nodes tainted with NoExecute continue to run when the PodDisruptionConditions feature gate is enabled
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: