[release-4.12] OCPBUGS-23020: Introduce upgrading label to block concurrent upgrades #1932
/jira cherrypick OCPBUGS-23016 |
@jrvaldes: Jira Issue OCPBUGS-23016 has been cloned as Jira Issue OCPBUGS-23020. Will retitle bug to link to clone. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
Skipping CI for Draft Pull Request. |
@jrvaldes: This pull request references Jira Issue OCPBUGS-23020, which is invalid:
Comment The bug has been updated to refer to the pull request using the external bug tracker. In response to this:
/test ? |
@jrvaldes: The following commands are available to trigger required jobs:
Use In response to this:
/hold |
/hold cancel |
d2d4d25 to c2c8a14
/retest-required |
/retest-required |
/jira refresh |
@jrvaldes: This pull request references Jira Issue OCPBUGS-23020, which is invalid:
Comment The bug has been updated to refer to the pull request using the external bug tracker. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
/retest-required |
/jira refresh |
/hold need WINC-1191 for validation |
This change introduces a maximum number of parallel upgrades that can take place concurrently for Windows nodes during reconciliation. The `windowsmachineconfig.openshift.io/upgrading` label is proposed as the locking mechanism among the Windows nodes: it is used to count how many instances are performing an upgrade and to keep that count within a threshold, MaxParallelUpgrades, which is fixed to 1. (cherry picked from commit f1dd8f5)
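As a rough illustration of the label-as-lock idea described in this commit message, here is a minimal in-memory sketch. It is not the WMCO implementation: the `node` type, the helpers `countUpgrading`, `tryMarkUpgrading`, and `clearUpgrading`, and the label value `"true"` are all hypothetical, and a real operator would read and patch Node objects through the Kubernetes API rather than mutate local structs.

```go
package main

import "fmt"

// UpgradingLabel is the label key named in the commit message.
const UpgradingLabel = "windowsmachineconfig.openshift.io/upgrading"

// MaxParallelUpgrades is fixed to 1, as stated in the commit message.
const MaxParallelUpgrades = 1

// node is a hypothetical stand-in for a Kubernetes Node object.
type node struct {
	Name   string
	Labels map[string]string
}

// countUpgrading returns how many nodes currently carry the upgrading label.
func countUpgrading(nodes []*node) int {
	count := 0
	for _, n := range nodes {
		if _, ok := n.Labels[UpgradingLabel]; ok {
			count++
		}
	}
	return count
}

// tryMarkUpgrading (hypothetical) labels target only if doing so keeps the
// number of upgrading nodes within MaxParallelUpgrades. It reports whether
// target holds the label when the call returns.
func tryMarkUpgrading(nodes []*node, target *node) bool {
	if _, ok := target.Labels[UpgradingLabel]; ok {
		return true // target already holds an upgrade slot
	}
	if countUpgrading(nodes) >= MaxParallelUpgrades {
		return false // threshold reached; caller should retry later
	}
	if target.Labels == nil {
		target.Labels = map[string]string{}
	}
	target.Labels[UpgradingLabel] = "true" // label value is an assumption
	return true
}

// clearUpgrading (hypothetical) releases the slot once the upgrade is done.
func clearUpgrading(target *node) {
	delete(target.Labels, UpgradingLabel)
}

func main() {
	a := &node{Name: "win-a"}
	b := &node{Name: "win-b"}
	nodes := []*node{a, b}

	fmt.Println(tryMarkUpgrading(nodes, a)) // true: first node takes the slot
	fmt.Println(tryMarkUpgrading(nodes, b)) // false: blocked while win-a upgrades
	clearUpgrading(a)
	fmt.Println(tryMarkUpgrading(nodes, b)) // true: slot is free again
}
```

Note that this sketch has no protection against two callers checking the count at the same moment; how the operator handles that case is outside what the commit message states.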
This commit introduces a test to check the maximum allowed number of Windows nodes upgrading in parallel. The test is divided into two phases: 1) setup and 2) test. The setup phase deploys a job with a fixed name that constantly fetches the number of Windows nodes with the `windowsmachineconfig.openshift.io/upgrading` label and fails if that number is greater than the maximum allowed. The polling frequency is set to 5 seconds. The test phase checks the number of failed pods for the checker job and requires no failures; otherwise, the e2e test fails. A new service account is proposed in the test namespace to hold the RBAC required by the checker job to list the nodes in the test cluster. The test is designed to run as a separate job due to the structure of the new upgrade test in vSphere (vsphere-e2e-upgrade), which is scattered between the steps in the release repo and the code in the WMCO test suite. (cherry picked from commit e03c792)
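The checker logic described in this commit message can be sketched roughly as follows. This is an assumption-laden illustration, not the code from the PR: the `listNodeLabels` function type stands in for whatever call lists the Windows nodes, `checkOnce` and `poll` are hypothetical names, and the loop is bounded by a `rounds` parameter only so the example terminates (the commit message describes a job that polls constantly, every 5 seconds).

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// upgradingLabel is the label key named in the commit message.
const upgradingLabel = "windowsmachineconfig.openshift.io/upgrading"

// errTooManyUpgrading is a hypothetical sentinel error for the failure case.
var errTooManyUpgrading = errors.New("too many Windows nodes upgrading in parallel")

// listNodeLabels is a hypothetical stand-in for an API call that returns the
// label set of every Windows node in the cluster.
type listNodeLabels func() ([]map[string]string, error)

// checkOnce counts the nodes carrying the upgrading label and returns an
// error if the count is greater than maxAllowed.
func checkOnce(list listNodeLabels, maxAllowed int) error {
	nodes, err := list()
	if err != nil {
		return err
	}
	count := 0
	for _, labels := range nodes {
		if _, ok := labels[upgradingLabel]; ok {
			count++
		}
	}
	if count > maxAllowed {
		return fmt.Errorf("%w: found %d, maximum allowed %d", errTooManyUpgrading, count, maxAllowed)
	}
	return nil
}

// poll runs checkOnce up to rounds times, sleeping interval between checks,
// and stops at the first error. The bound on rounds is an assumption made to
// keep this sketch finite.
func poll(list listNodeLabels, maxAllowed int, interval time.Duration, rounds int) error {
	for i := 0; i < rounds; i++ {
		if err := checkOnce(list, maxAllowed); err != nil {
			return err
		}
		if i < rounds-1 {
			time.Sleep(interval)
		}
	}
	return nil
}

func main() {
	// Simulated cluster state: on the third poll a second node becomes labeled.
	calls := 0
	list := func() ([]map[string]string, error) {
		calls++
		nodes := []map[string]string{
			{upgradingLabel: "true"}, // label value is an assumption
			{},
		}
		if calls >= 3 {
			nodes[1][upgradingLabel] = "true"
		}
		return nodes, nil
	}

	// A short interval is used here; the commit message states 5 seconds.
	err := poll(list, 1, 10*time.Millisecond, 5)
	fmt.Println("checker result:", err)
}
```

In the design the commit describes, a non-nil result would translate into a failed pod for the checker job, which the test phase then detects by counting failed pods.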
This change aggregates the RBAC resources required by the test runner job in the proposed function ensureTestRunnerRBAC() to avoid duplicating the intended functionality. (cherry picked from commit a2a6f6b)
26542b0 to 0a99dc1
/test azure-e2e-upgrade |
/hold cancel blocker pr merged. |
/test remaining-required |
/lgtm
This change enables the version annotation check while waiting for the Windows nodes to be fully configured after triggering an upgrade. Before, both nodes were getting configured at the same time, so this wasn't an issue. Now, with the sequential order of the upgrade process, WMCO takes some time to start processing the next node, and the test was failing due to a version annotation mismatch. (cherry picked from commit 32ab771)
/approve |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: jrvaldes, sebsoto The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/lgtm |
/test remaining-required |
@jrvaldes: all tests passed! Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
/tide refresh |
/lgtm |
/lgtm |
21bf7cb into openshift:release-4.12
@jrvaldes: Jira Issue OCPBUGS-23020: All pull requests linked via external trackers have merged: Jira Issue OCPBUGS-23020 has been moved to the MODIFIED state. In response to this:
This is a manual cherry-pick of #1901