Only check for tolerationSeconds on NoExecute tolerations #23665

Merged

danwinship
Contributor

The test "[Feature:Platform][Smoke] Managed cluster should ensure control plane operators do not make themselves unevictable" requires all Exists tolerations in non-whitelisted control plane pods to have a tolerationSeconds field. But it mistakenly checks NoSchedule tolerations too, even though a NoSchedule toleration doesn't make a pod unevictable, and tolerationSeconds has no effect on NoSchedule tolerations anyway.

(This is currently breaking that test in e2e-aws-ovn-kubernetes:

1 pods found with invalid tolerations:
openshift-ovn-kubernetes/ovnkube-master-7b846b56b-76vs8 tolerates node.kubernetes.io/not-ready with no tolerationSeconds

but ovnkube-master has:

  tolerations:
  - key: "node-role.kubernetes.io/master"
    operator: "Exists"
    effect: "NoSchedule"
  - key: "node.kubernetes.io/not-ready"
    operator: "Exists"
    effect: "NoSchedule"

so it's not tolerating any NoExecute taints.)
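
For readers following along, the corrected check can be sketched roughly as below. This is a minimal illustration, not the code from the merged commit: the `Toleration` struct is a hypothetical local stand-in for the Kubernetes core/v1 type, and `needsTolerationSeconds` is an invented helper name. Treating an empty effect like NoExecute is an assumption based on Kubernetes toleration semantics (an empty effect matches all effects); the description above does not say how the real test handles that case.

```go
package main

import "fmt"

// Toleration is a hypothetical, trimmed-down stand-in for the Kubernetes
// core/v1 Toleration type, kept local so the sketch is self-contained.
type Toleration struct {
	Key               string
	Operator          string // "Exists" or "Equal"
	Effect            string // "NoSchedule", "PreferNoSchedule", "NoExecute", or ""
	TolerationSeconds *int64 // nil means "tolerate forever"
}

// needsTolerationSeconds (hypothetical helper) reports whether an Exists
// toleration should be flagged for lacking tolerationSeconds.
func needsTolerationSeconds(t Toleration) bool {
	if t.Operator != "Exists" {
		return false
	}
	// Only NoExecute taints evict running pods. Assumption: an empty effect
	// is treated like NoExecute because it matches taints of every effect.
	if t.Effect != "NoExecute" && t.Effect != "" {
		return false
	}
	return t.TolerationSeconds == nil
}

func main() {
	seconds := int64(120)
	tolerations := []Toleration{
		// The two ovnkube-master tolerations quoted above.
		{Key: "node-role.kubernetes.io/master", Operator: "Exists", Effect: "NoSchedule"},
		{Key: "node.kubernetes.io/not-ready", Operator: "Exists", Effect: "NoSchedule"},
		// Illustrative NoExecute cases, with and without a time limit.
		{Key: "node.kubernetes.io/not-ready", Operator: "Exists", Effect: "NoExecute"},
		{Key: "node.kubernetes.io/unreachable", Operator: "Exists", Effect: "NoExecute", TolerationSeconds: &seconds},
	}
	for _, t := range tolerations {
		fmt.Printf("%s (%s) flagged: %v\n", t.Key, t.Effect, needsTolerationSeconds(t))
	}
}
```

Under these assumptions, neither of the ovnkube-master tolerations quoted above is flagged, because both are NoSchedule.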

/assign @sjenning

@openshift-ci-robot added the size/XS label (denotes a PR that changes 0-9 lines, ignoring generated files) on Aug 23, 2019
@danwinship
Contributor Author

/test e2e-aws-ovn-kubernetes

@squeed
Contributor

squeed commented Oct 8, 2019

I updated the ovn-kube master tolerations to also tolerate NoExecute, so I'm not sure that this will have the desired effect for us:

      tolerations:
      - key: "node-role.kubernetes.io/master"
        operator: "Exists"
      - key: "node.kubernetes.io/not-ready"
        operator: "Exists"
      - key: "node.kubernetes.io/unreachable"
        operator: "Exists"
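
Why dropping the effect changes the picture: in Kubernetes, a toleration with no effect matches taints of every effect, so the tolerations above also cover NoExecute taints. A minimal sketch of that matching rule follows; the `Taint` and `Toleration` structs and the `tolerates` function are hypothetical local stand-ins written for illustration, not the upstream library code.

```go
package main

import "fmt"

// Taint and Toleration are hypothetical, simplified stand-ins for the
// Kubernetes core/v1 types, defined locally to keep the sketch self-contained.
type Taint struct {
	Key, Value, Effect string
}

type Toleration struct {
	Key, Operator, Value, Effect string
}

// tolerates sketches the matching rule: an empty toleration effect matches
// any taint effect, and an empty toleration key matches any taint key.
func tolerates(tol Toleration, taint Taint) bool {
	if tol.Effect != "" && tol.Effect != taint.Effect {
		return false
	}
	if tol.Key != "" && tol.Key != taint.Key {
		return false
	}
	switch tol.Operator {
	case "Exists":
		return true
	case "", "Equal":
		return tol.Value == taint.Value
	default:
		return false
	}
}

func main() {
	notReady := Taint{Key: "node.kubernetes.io/not-ready", Effect: "NoExecute"}

	withEffect := Toleration{Key: "node.kubernetes.io/not-ready", Operator: "Exists", Effect: "NoSchedule"}
	withoutEffect := Toleration{Key: "node.kubernetes.io/not-ready", Operator: "Exists"}

	fmt.Println(tolerates(withEffect, notReady))    // false: NoSchedule only
	fmt.Println(tolerates(withoutEffect, notReady)) // true: empty effect matches NoExecute
}
```

So a check restricted to NoExecute tolerations would plausibly still need to consider effect-less Exists tolerations like these.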

@danwinship
Contributor Author

> I updated the ovn-kube master tolerations to also tolerate NoExecute

That seems wrong but we should debate that elsewhere

> so I'm not sure that this will have the desired effect for us:

It would not unbreak e2e-aws-ovn-kubernetes, but the patch here is still correct; we should not be checking for tolerationSeconds on NoSchedule tolerations.

@danwinship
Contributor Author

/retest

@openshift-bot
Contributor

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci-robot added the lifecycle/stale label (denotes an issue or PR that has remained open with no activity and has become stale) on Jan 20, 2020
@openshift-bot
Contributor

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci-robot added the lifecycle/rotten label (denotes an issue or PR that has aged beyond stale and will be auto-closed) and removed the lifecycle/stale label on Feb 19, 2020
@danwinship
Contributor Author

/remove-lifecycle rotten
/lifecycle frozen

@openshift-ci-robot added the lifecycle/frozen label (indicates that an issue or PR should not be auto-closed due to staleness) and removed the lifecycle/rotten label on Feb 20, 2020
@sjenning
Contributor

sjenning commented Mar 5, 2020

/approve
/lgtm

@openshift-ci-robot added the lgtm label (indicates that a PR is ready to be merged) on Mar 5, 2020
@openshift-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: danwinship, sjenning

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot added the approved label (indicates a PR has been approved by an approver from all required OWNERS files) on Mar 5, 2020
@danwinship
Contributor Author

/test e2e-gcp

@openshift-bot
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-merge-robot merged commit 0374596 into openshift:master on Mar 10, 2020
@openshift-ci-robot

@danwinship: The following test failed, say /retest to rerun all failed tests:

Test name | Commit | Details | Rerun command
ci/prow/e2e-aws-fips | 190a7cb | link | /test e2e-aws-fips

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@danwinship deleted the toleration-seconds-check branch on January 6, 2021 12:40
Labels
- approved: Indicates a PR has been approved by an approver from all required OWNERS files.
- lgtm: Indicates that a PR is ready to be merged.
- lifecycle/frozen: Indicates that an issue or PR should not be auto-closed due to staleness.
- size/XS: Denotes a PR that changes 0-9 lines, ignoring generated files.
6 participants