[release-4.12] OCPBUGS-3955: daemon: gate done state on uncordon completion #3425
The "complete update" workflow in the daemon is currently quite convoluted, which can cause the daemon to never uncordon the node if the first attempt reaches the timeout. This patch moves storePendingState to the very end of update completion, so that if the drain fails, we just start from the beginning again. I thought we had worked around this via the drain controller (the MCD isn't uncordoning directly anymore), but we can now just fail the request and then get stuck in a cordoned state forever.
@openshift-cherrypick-robot: Jira Issue OCPBUGS-1491 has been cloned as Jira Issue OCPBUGS-3955. Retitling PR to link against new bug. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
@openshift-cherrypick-robot: No Bugzilla bug is referenced in the title of this pull request. In response to this:
@openshift-cherrypick-robot: This pull request references Jira Issue OCPBUGS-3955, which is valid. The bug has been moved to the POST state. 6 validation(s) were run on this bug
Requesting review from QA contact: The bug has been updated to refer to the pull request using the external bug tracker. In response to this:
Hmm, for this backport, should we be waiting for the master PR to be verified? The bot seems to indicate that is not the case.
I don't think so, if Tide is not set up that way.
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: openshift-cherrypick-robot, sinnykumari The full list of commands accepted by this bot can be found here. The pull request process is described here
@openshift-cherrypick-robot: all tests passed!
@openshift-cherrypick-robot: All pull requests linked via external trackers have merged: Jira Issue OCPBUGS-3955 has been moved to the MODIFIED state. In response to this:
/cherry-pick release-4.11
@yuqi-zhang: new pull request created: #3434 In response to this:
This is an automated cherry-pick of #3399
/assign yuqi-zhang