Fix handling of NoExecute taint when PodDisruptionConditions is enabled #112518

Merged
merged 1 commit into from Sep 23, 2022

Conversation

mimowo
Contributor

@mimowo mimowo commented Sep 16, 2022

What type of PR is this?

/kind bug

What this PR does / why we need it:

Which issue(s) this PR fixes:

Fixes #112517

The issue is triggered by this GET (protected by the PodDisruptionConditions feature gate):

pod, err := c.CoreV1().Pods(ns).Get(ctx, name, metav1.GetOptions{})

What is more, the error was not logged: it propagated only up to here, where the return value of f is dropped:

timer := clock.AfterFunc(delay, func() { f(ctx, args) })

Thus, as part of this PR, I propose logging errors returned by the functions that timed_workers invokes.

Special notes for your reviewer:

Does this PR introduce a user-facing change?

Yes, it fixes a bug where pods running on nodes tainted with NoExecute continue to run when the PodDisruptionConditions feature gate is enabled

Fix a bug where pods running on nodes tainted with NoExecute continue to run when the PodDisruptionConditions feature gate is enabled

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot
Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Sep 16, 2022
@k8s-ci-robot
Contributor

@mimowo: This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. sig/apps Categorizes an issue or PR as relevant to SIG Apps. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Sep 16, 2022
@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Sep 16, 2022
@mimowo
Contributor Author

mimowo commented Sep 16, 2022

Initial investigation, after adding a generic error log message, suggests this is a permissions issue:
I0916 16:04:35.577007 1 timed_workers.go:62] "Timed worker function failed. " err="pods "job-longrun-c7jcp" is forbidden: User "system:serviceaccount:kube-system:node-controller" cannot get resource "pods" in API group "" in the namespace "default""
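To make the failure mode concrete, here is a simplified sketch of a verb check against a set of rules. The `rule` type and the `allows` helper are hypothetical stand-ins, not the Kubernetes RBAC API, and the rules are illustrative rather than the actual node-controller role. It only shows that a role which can list and delete pods is still denied a "get" until that verb is granted.

```go
package main

import "fmt"

// rule is a simplified, hypothetical stand-in for an RBAC policy rule.
type rule struct {
	verbs     []string
	apiGroup  string
	resources []string
}

// contains reports whether want is present in items.
func contains(items []string, want string) bool {
	for _, item := range items {
		if item == want {
			return true
		}
	}
	return false
}

// allows reports whether any rule grants verb on resource in apiGroup.
func allows(rules []rule, verb, apiGroup, resource string) bool {
	for _, r := range rules {
		if r.apiGroup == apiGroup && contains(r.verbs, verb) && contains(r.resources, resource) {
			return true
		}
	}
	return false
}

func main() {
	// Illustrative rules only: list and delete on pods in the core ("") group.
	before := []rule{
		{verbs: []string{"list", "delete"}, apiGroup: "", resources: []string{"pods"}},
	}

	// Copy the rules and add a rule granting "get" on pods.
	after := append([]rule{}, before...)
	after = append(after, rule{verbs: []string{"get"}, apiGroup: "", resources: []string{"pods"}})

	fmt.Println(allows(before, "get", "", "pods")) // false
	fmt.Println(allows(after, "get", "", "pods"))  // true
}
```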

@k8s-ci-robot k8s-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. sig/auth Categorizes an issue or PR as relevant to SIG Auth. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Sep 19, 2022
@mimowo mimowo force-pushed the fix-disruption-conditions branch 2 times, most recently from dba0c00 to f50f4a7 Compare September 19, 2022 10:47
@k8s-ci-robot k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Sep 19, 2022
@mimowo mimowo force-pushed the fix-disruption-conditions branch 2 times, most recently from 16b3e68 to 880d15d Compare September 19, 2022 12:04
@mimowo mimowo marked this pull request as ready for review September 19, 2022 12:04
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Sep 19, 2022
Member

@alculquicondor alculquicondor left a comment

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 22, 2022
@mimowo
Contributor Author

mimowo commented Sep 23, 2022

/retest

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 23, 2022
@mimowo
Contributor Author

mimowo commented Sep 23, 2022

/retest

@mimowo
Contributor Author

mimowo commented Sep 23, 2022

@alculquicondor please renew the LGTM. There are no changes in this PR, but I rebased because some unrelated tests were failing. I don't know why, but the rebase helped.

@alculquicondor
Member

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 23, 2022
@mimowo
Contributor Author

mimowo commented Sep 23, 2022

/assign @liggitt
Should we also cherry-pick it for 1.25?

@alculquicondor
Member

low priority, because it's alpha, but we should.

@@ -192,19 +196,46 @@ func TestBootstrapClusterRoleBindings(t *testing.T) {
}

func TestBootstrapControllerRoles(t *testing.T) {
Member

revert the changes to this test, I don't think we need/want a proliferation of fixtures for every feature gate

Contributor Author

Ok, reverted the unit test changes. Will rely on manual testing + e2e testing once in Beta.
@alculquicondor please renew the LGTM if you are ok with the change.

Improve error logging from timed workers which are used for pod eviction

Co-authored-by: Aldo Culquicondor <1299064+alculquicondor@users.noreply.github.com>
@k8s-ci-robot k8s-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed lgtm "Looks good to me", indicates that a PR is ready to be merged. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Sep 23, 2022
@liggitt
Member

liggitt commented Sep 23, 2022

/approve

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: liggitt, mimowo

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 23, 2022
@alculquicondor
Member

/lgtm

Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/apps Categorizes an issue or PR as relevant to SIG Apps. sig/auth Categorizes an issue or PR as relevant to SIG Auth. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.
Development

Successfully merging this pull request may close these issues.

When PodDIsruptionConditions enabled pods are not terminated based on the NoExecute taint
4 participants