
OCPBUGS-9274: canary: Tolerate infra node NoExecute taint #932

Conversation

Miciah
Contributor

@Miciah Miciah commented May 18, 2023

Configure the canary daemonset to tolerate a "node-role.kubernetes.io/infra" node taint that specifies the NoExecute effect (or any other effect).

Before this change, the canary daemonset tolerated the "node-role.kubernetes.io/infra" node taint only if it specified the NoSchedule effect. However, some cluster admins taint infra nodes with both NoSchedule and NoExecute, in which case canary pods would not be scheduled on these nodes. This inconsistency caused some confusion. With this change, canary pods should be scheduled on infra nodes whether or not they are tainted using the NoExecute effect.
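The scheduling behavior described above follows from how Kubernetes matches tolerations against taints: a toleration that names an effect matches only taints with that same effect, while a toleration with no effect matches every effect. The Go sketch below models those rules with simplified, hypothetical types (`Taint`, `Toleration`, `tolerates`, `canRun`) instead of the real `k8s.io/api/core/v1` types. It assumes the canary toleration uses the `Exists` operator, and it only models whether a pod may be placed on the node, not the eviction of already-running pods that NoExecute also triggers. It is an illustration, not the operator's code.

```go
package main

import "fmt"

// TaintEffect mirrors the string-valued effect on Kubernetes taints/tolerations.
type TaintEffect string

const (
	EffectNoSchedule       TaintEffect = "NoSchedule"
	EffectPreferNoSchedule TaintEffect = "PreferNoSchedule"
	EffectNoExecute        TaintEffect = "NoExecute"
)

// Taint is a simplified node taint (hypothetical stand-in type).
type Taint struct {
	Key    string
	Value  string
	Effect TaintEffect
}

// Toleration is a simplified pod toleration (hypothetical stand-in type).
// An empty Operator is treated as "Equal".
type Toleration struct {
	Key      string
	Operator string
	Value    string
	Effect   TaintEffect
}

// tolerates models the matching rules: an empty Effect matches every effect,
// an empty Key matches every key (Kubernetes only allows this with "Exists"),
// "Exists" ignores the value, and "Equal" (or empty) requires equal values.
func tolerates(t Toleration, taint Taint) bool {
	if t.Effect != "" && t.Effect != taint.Effect {
		return false
	}
	if t.Key != "" && t.Key != taint.Key {
		return false
	}
	switch t.Operator {
	case "Exists":
		return true
	case "", "Equal":
		return t.Value == taint.Value
	default:
		return false
	}
}

// canRun reports whether every hard taint (NoSchedule or NoExecute) on a node
// is tolerated by at least one toleration. PreferNoSchedule is only a soft
// preference, so it is skipped here.
func canRun(tols []Toleration, taints []Taint) bool {
	for _, taint := range taints {
		if taint.Effect == EffectPreferNoSchedule {
			continue
		}
		tolerated := false
		for _, t := range tols {
			if tolerates(t, taint) {
				tolerated = true
				break
			}
		}
		if !tolerated {
			return false
		}
	}
	return true
}

func main() {
	const infraKey = "node-role.kubernetes.io/infra"

	// An infra node tainted with both effects, as some cluster admins do.
	infraTaints := []Taint{
		{Key: infraKey, Effect: EffectNoSchedule},
		{Key: infraKey, Effect: EffectNoExecute},
	}

	// Before the change: the toleration names the NoSchedule effect only.
	before := []Toleration{{Key: infraKey, Operator: "Exists", Effect: EffectNoSchedule}}
	// After the change: Effect is left unspecified, so any effect matches.
	after := []Toleration{{Key: infraKey, Operator: "Exists"}}

	fmt.Println("before:", canRun(before, infraTaints)) // before: false
	fmt.Println("after:", canRun(after, infraTaints))   // after: true
}
```

Leaving `Effect` empty, rather than listing one toleration per effect, also covers PreferNoSchedule and any effect an admin might add later with the same key.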

  • assets/canary/daemonset.yaml: Tolerate the "node-role.kubernetes.io/infra" taint irrespective of the specified effect.
  • pkg/manifests/bindata.go: Regenerate.
  • pkg/operator/controller/canary/daemonset_test.go (TestDesiredCanaryDaemonSet): Expect Effect to be unspecified on the toleration.
    (TestCanaryDaemonsetChanged): Add a test case to verify that canaryDaemonSetChanged correctly detects the change and updates the daemonset if the specified effect changes on the toleration.
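The new `TestCanaryDaemonsetChanged` case relies on the operator noticing when only the toleration's effect differs between the current and desired daemonsets. A minimal sketch of that kind of check, assuming a field-by-field comparison of the tolerations; `Toleration`, `tolerationsChanged`, and `updatedTolerations` are hypothetical names for illustration, not the actual `canaryDaemonSetChanged` implementation:

```go
package main

import (
	"fmt"
	"reflect"
)

// Toleration is a simplified stand-in for the Kubernetes core/v1 Toleration
// (hypothetical type for this sketch).
type Toleration struct {
	Key      string
	Operator string
	Value    string
	Effect   string
}

// tolerationsChanged is a hypothetical helper in the spirit of
// canaryDaemonSetChanged: it reports whether the current tolerations differ
// from the desired ones. Because every field is compared, a change in Effect
// alone is detected. Note that reflect.DeepEqual treats a nil slice and an
// empty slice as different.
func tolerationsChanged(current, desired []Toleration) bool {
	return !reflect.DeepEqual(current, desired)
}

// updatedTolerations returns the tolerations to write back and whether an
// update is needed. The desired slice is copied so the result does not alias
// the caller's value.
func updatedTolerations(current, desired []Toleration) ([]Toleration, bool) {
	if !tolerationsChanged(current, desired) {
		return current, false
	}
	updated := make([]Toleration, len(desired))
	copy(updated, desired)
	return updated, true
}

func main() {
	const infraKey = "node-role.kubernetes.io/infra"
	current := []Toleration{{Key: infraKey, Operator: "Exists", Effect: "NoSchedule"}}
	desired := []Toleration{{Key: infraKey, Operator: "Exists"}}

	updated, changed := updatedTolerations(current, desired)
	fmt.Println("changed:", changed)                           // changed: true
	fmt.Printf("effect after update: %q\n", updated[0].Effect) // effect after update: ""

	_, changed = updatedTolerations(desired, desired)
	fmt.Println("changed:", changed) // changed: false
}
```

A real controller would typically compare only the fields it manages, so that defaults filled in by the API server do not trigger spurious updates; the whole-slice comparison here is a simplification.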

Configure the canary daemonset to tolerate a
"node-role.kubernetes.io/infra" node taint that specifies the "NoExecute"
effect (or any other effect).

Before this commit, the canary daemonset tolerated the
"node-role.kubernetes.io/infra" node taint only if it specified the
"NoSchedule" effect.  However, some cluster admins taint infra nodes using
both "NoSchedule" and "NoExecute", in which case canary pods would
not be scheduled on these nodes.  This inconsistency caused some confusion.
With this commit, canary pods should be scheduled on infra nodes whether or
not they are tainted using the "NoExecute" effect.

This commit fixes OCPBUGS-9274.

https://issues.redhat.com/browse/OCPBUGS-9274

* assets/canary/daemonset.yaml: Tolerate the
"node-role.kubernetes.io/infra" taint irrespective of the specified effect.
* pkg/manifests/bindata.go: Regenerate.
* pkg/operator/controller/canary/daemonset_test.go
(TestDesiredCanaryDaemonSet): Expect "Effect" to be unspecified on the
toleration.
(TestCanaryDaemonsetChanged): Add a test case to verify that
canaryDaemonSetChanged correctly detects the change and updates the
daemonset if the specified effect changes on the toleration.
@openshift-ci-robot openshift-ci-robot added jira/severity-moderate Referenced Jira bug's severity is moderate for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. labels May 18, 2023
@openshift-ci-robot
Contributor

@Miciah: This pull request references Jira Issue OCPBUGS-9274, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.14.0) matches configured target version for branch (4.14.0)
  • bug is in the state New, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @ShudiLi

The bug has been updated to refer to the pull request using the external bug tracker.


@openshift-ci-robot openshift-ci-robot added the bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. label May 18, 2023
@candita
Contributor

candita commented May 24, 2023

/assign

@candita
Contributor

candita commented May 30, 2023

/retest-required

@candita
Contributor

candita commented May 31, 2023

Hypershift error:
2023-05-30T15:22:47Z INFO Detected Issuer URL {"issuer": "https://hypershift-ci-1-oidc.s3.us-east-1.amazonaws.com/1b0ddf9389406da45310-mgmt"}
2023-05-30T15:22:47Z ERROR Failed to create cluster {"error": "failed to create iam: LimitExceeded: Cannot exceed quota for OpenIdConnectProvidersPerAccount: 100\n\tstatus

/retest-required

@candita
Contributor

candita commented May 31, 2023

Seeing this repeatedly, wonder if it is transient:

[controller-runtime] log.SetLogger(...) was never called, logs will not be displayed:

/retest-required

@Miciah
Contributor Author

Miciah commented Jun 1, 2023

e2e-aws-operator failed because kube-apiserver and ovnkube-master failed to roll out some pods.
/test e2e-aws-operator

@candita
Contributor

candita commented Jun 1, 2023

/lgtm
/approve

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jun 1, 2023
@openshift-ci
Contributor

openshift-ci bot commented Jun 1, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: candita


@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 1, 2023
@Miciah
Contributor Author

Miciah commented Jun 1, 2023

/hold
#939 is top priority; make sure it merges first.

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 1, 2023
@Miciah
Contributor Author

Miciah commented Jul 6, 2023

/hold cancel
/test all

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 6, 2023
@Miciah
Contributor Author

Miciah commented Jul 6, 2023

e2e-aws-operator, e2e-aws-ovn, e2e-aws-ovn-serial, e2e-aws-ovn-single-node, and e2e-aws-ovn-upgrade failed because the build cluster had no available nodes.
/test e2e-aws-operator

@openshift-ci-robot
Contributor

/retest-required

Remaining retests: 0 against base HEAD 0cdc132 and 2 for PR HEAD aa400d8 in total

@openshift-ci
Contributor

openshift-ci bot commented Jul 6, 2023

@Miciah: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name                          Commit   Required  Rerun command
ci/prow/e2e-gcp-ovn                aa400d8  false     /test e2e-gcp-ovn
ci/prow/e2e-azure-ovn              aa400d8  false     /test e2e-azure-ovn
ci/prow/e2e-aws-ovn-single-node    aa400d8  false     /test e2e-aws-ovn-single-node


@candita
Contributor

candita commented Jul 19, 2023

/retest-required

@openshift-merge-robot openshift-merge-robot merged commit 80cfabf into openshift:master Jul 19, 2023
11 of 14 checks passed
@openshift-ci-robot
Contributor

@Miciah: Jira Issue OCPBUGS-9274: All pull requests linked via external trackers have merged:

Jira Issue OCPBUGS-9274 has been moved to the MODIFIED state.


Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. jira/severity-moderate Referenced Jira bug's severity is moderate for the branch this PR is targeting. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged.

4 participants