[release-4.6] Bug 1987221: Backport drain timeout optimizations #2707
Refactor drain to remove exponential backoff, which does not scale well with larger values (hours). (cherry picked from commit 816bd51)
- Refactor metric to correctly fire with new drain refactor.
- Remove unnecessary metric labels to simplify metrics and ensure there is always only one metric per node/mcd.
(cherry picked from commit 3005487)
We no longer need to build in a delay, since it now takes 1h to produce a drain failure. (cherry picked from commit 95d26ca)
Test time was bumped in 4.8 from 75m to 90m via openshift#2474. The test has been failing often in 4.7 and will consistently take a little longer now that the exponential backoff has been removed in favor of 5-minute retries (which is faster in slower cases but a little slower in faster cases). (cherry picked from commit 654bd00)
Also, add log and event for successful node cordon. Cordoning the node accidentally got removed in openshift#2605. (cherry picked from commit 0c9ee85)
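For readers skimming the commits above, the pattern they describe (fixed-interval retries bounded by an overall time limit, instead of exponential backoff) can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `retryUntil`, `simulate`, and the injected `sleep` function are hypothetical names, and only the 5-minute interval and 1-hour limit come from the commit messages.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// retryUntil is a hypothetical helper, not the MCO implementation.
// It calls attempt until it succeeds or the time budget would be exceeded,
// waiting a fixed interval after each failure. sleep is injected so the
// loop can be exercised without actually waiting.
func retryUntil(attempt func() error, interval, timeout time.Duration, sleep func(time.Duration)) (int, error) {
	var waited time.Duration
	tries := 0
	for {
		tries++
		err := attempt()
		if err == nil {
			return tries, nil
		}
		// Stop once another wait would push past the overall limit.
		if waited+interval > timeout {
			return tries, fmt.Errorf("giving up after %d attempts: %w", tries, err)
		}
		sleep(interval)
		waited += interval
	}
}

// simulate runs retryUntil with a 5-minute interval and a 1-hour limit
// against an attempt that fails failFirst times before succeeding.
// It returns the number of attempts made and whether the run succeeded.
func simulate(failFirst int) (int, bool) {
	calls := 0
	attempt := func() error {
		calls++
		if calls <= failFirst {
			return errors.New("drain failed")
		}
		return nil
	}
	noSleep := func(time.Duration) {} // skip real waiting in the simulation
	tries, err := retryUntil(attempt, 5*time.Minute, time.Hour, noSleep)
	return tries, err == nil
}

func main() {
	tries, ok := simulate(3)
	fmt.Println(tries, ok)
}
```

With a fixed interval the worst-case wait between attempts stays constant, whereas an exponential schedule with an hours-long limit would leave very long gaps between the last few attempts.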
@kikisdeliveryservice: This pull request references Bugzilla bug 1987221, which is valid. The bug has been updated to refer to the pull request using the external bug tracker. 6 validation(s) were run on this bug
No GitHub users were found matching the public email listed for the QA contact in Bugzilla (rioliu@redhat.com), skipping review request.
waiting for e2e on this
/hold
Could you PTAL? Don't want to miss anything this time.
/assign @sinnykumari
pkg/daemon/update.go
Outdated
	default:
		if err := drain.RunNodeDrain(dn.drainer, dn.node.Name); err != nil {
			glog.Infof("Draining failed with: %v, retrying", err)
			time.Sleep(5 * time.Minute)
Should we also include patch #2612, so that the first few retries occur every 1 minute on failure?
ahh good catch!
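As a rough illustration of the review suggestion above (a few quick retries before falling back to the 5-minute interval), the delay could be chosen from the failure count. This is only a sketch under assumptions: `retryDelay`, `delayMinutes`, and the `fastRetries` value of 5 are hypothetical, since the thread says only that the first "few" retries happen every minute.

```go
package main

import (
	"fmt"
	"time"
)

// fastRetries is an assumed value for illustration only; the thread does
// not state how many initial retries use the shorter interval.
const fastRetries = 5

// retryDelay is a hypothetical helper that returns how long to wait after
// the given number of consecutive failures: 1 minute for the first
// fastRetries failures, 5 minutes afterwards.
func retryDelay(failures int) time.Duration {
	if failures <= fastRetries {
		return time.Minute
	}
	return 5 * time.Minute
}

// delayMinutes reports retryDelay as whole minutes, for easy printing.
func delayMinutes(failures int) int {
	return int(retryDelay(failures) / time.Minute)
}

func main() {
	for f := 1; f <= 7; f++ {
		fmt.Printf("failure %d: wait %dm\n", f, delayMinutes(f))
	}
}
```

The short initial interval lets transient drain failures recover quickly, while the longer interval keeps retries cheap when a drain is blocked for an extended period.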
One comment; other than that, lgtm.
will update
/hold
Manual backport of openshift#2611 (cherry picked from commit a5ed55c)
/hold cancel
@kikisdeliveryservice: The following tests failed, say
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: kikisdeliveryservice, sinnykumari
/retest-required
Please review the full test history for this PR and help us cut down flakes.
/skip
patch manager: high bz, increases appearance of reliability for upgrade
@kikisdeliveryservice: Some pull requests linked via external trackers have merged: The following pull requests linked via external trackers have not merged:
These pull requests must merge or be unlinked from the Bugzilla bug in order for it to move to the next state. Once unlinked, request a bug refresh with
Bugzilla bug 1987221 has not been moved to the MODIFIED state.
Supersedes #2698