[release-4.6] Bug 1987221: Backport drain timeout optimizations #2707

kikisdeliveryservice
Contributor

Supersedes #2698

kikisdeliveryservice and others added 5 commits August 9, 2021 15:49
Refactor drain to remove exponential backoff which does not
scale well with larger values (hours)

(cherry picked from commit 816bd51)
- Refactor metric to correctly fire with new drain refactor.
- Remove unnecessary metric labels to simplify metrics & ensure
there is always only one metric per node/mcd

(cherry picked from commit 3005487)
We no longer need to build in a delay since it now takes 1h to
produce a drain failure.

(cherry picked from commit 95d26ca)
Test time was bumped in 4.8 from 75m to 90m via:
openshift#2474

Test has been failing often in 4.7 and will be a little longer
consistently as the exponential backoff has been removed in favor
of 5 min retries (which is faster in slower cases but a little
slower in faster cases).

(cherry picked from commit 654bd00)
Also, add log and event for successful node cordon.

Cordoning the node was accidentally removed in
openshift#2605.

(cherry picked from commit 0c9ee85)
@openshift-ci
Contributor

openshift-ci bot commented Aug 9, 2021

@kikisdeliveryservice: This pull request references Bugzilla bug 1987221, which is valid. The bug has been updated to refer to the pull request using the external bug tracker.

6 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.6.z) matches configured target release for branch (4.6.z)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)
  • dependent bug Bugzilla bug 1968759 is in the state CLOSED (ERRATA), which is one of the valid states (VERIFIED, RELEASE_PENDING, CLOSED (ERRATA), CLOSED (CURRENTRELEASE))
  • dependent Bugzilla bug 1968759 targets the "4.7.z" release, which is one of the valid target releases: 4.7.0, 4.7.z
  • bug has dependents

No GitHub users were found matching the public email listed for the QA contact in Bugzilla (rioliu@redhat.com), skipping review request.

In response to this:

Bug 1987221: Backport drain timeout optimizations

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci openshift-ci bot added bugzilla/severity-high Referenced Bugzilla bug's severity is high for the branch this PR is targeting. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. labels Aug 9, 2021
@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 9, 2021
@kikisdeliveryservice kikisdeliveryservice removed the request for review from cgwalters August 9, 2021 22:56
@kikisdeliveryservice kikisdeliveryservice changed the title Bug 1987221: Backport drain timeout optimizations [release=4.6] Bug 1987221: Backport drain timeout optimizations Aug 9, 2021
@kikisdeliveryservice kikisdeliveryservice changed the title [release=4.6] Bug 1987221: Backport drain timeout optimizations [release-4.6] Bug 1987221: Backport drain timeout optimizations Aug 9, 2021
@kikisdeliveryservice
Contributor Author

waiting for e2e on this

/hold

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 9, 2021
@kikisdeliveryservice
Contributor Author

Could you PTAL? I don't want to miss anything this time.

/assign @sinnykumari
/hold cancel

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 11, 2021
default:
    if err := drain.RunNodeDrain(dn.drainer, dn.node.Name); err != nil {
        glog.Infof("Draining failed with: %v, retrying", err)
        time.Sleep(5 * time.Minute)
Contributor

Should we also include patch #2612 so that the first few retries occur every 1 minute on failure?

Contributor Author

ahh good catch!

@sinnykumari
Contributor

One comment; other than that, LGTM.

@kikisdeliveryservice
Contributor Author

will update

/hold

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 12, 2021
@kikisdeliveryservice
Contributor Author

/hold cancel

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 16, 2021
@openshift-ci
Contributor

openshift-ci bot commented Aug 16, 2021

@kikisdeliveryservice: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Rerun command
ci/prow/okd-e2e-aws ef478e8 link /test okd-e2e-aws
ci/prow/e2e-aws-workers-rhel7 ef478e8 link /test e2e-aws-workers-rhel7
ci/prow/okd-e2e-gcp-op ef478e8 link /test okd-e2e-gcp-op

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

Contributor

@sinnykumari sinnykumari left a comment

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Aug 17, 2021
@openshift-ci
Contributor

openshift-ci bot commented Aug 17, 2021

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: kikisdeliveryservice, sinnykumari

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [kikisdeliveryservice,sinnykumari]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-bot
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

5 similar comments

@sinnykumari
Contributor

/skip

@deads2k deads2k added the cherry-pick-approved Indicates a cherry-pick PR into a release branch has been approved by the release branch manager. label Aug 18, 2021
@deads2k
Contributor

deads2k commented Aug 18, 2021

patch manager: high bz, increases appearance of reliability for upgrade

@openshift-merge-robot openshift-merge-robot merged commit 0ba370e into openshift:release-4.6 Aug 18, 2021
@openshift-ci
Contributor

openshift-ci bot commented Aug 18, 2021

@kikisdeliveryservice: Some pull requests linked via external trackers have merged:

The following pull requests linked via external trackers have not merged:

These pull requests must merge or be unlinked from the Bugzilla bug in order for it to move to the next state. Once unlinked, request a bug refresh with /bugzilla refresh.

Bugzilla bug 1987221 has not been moved to the MODIFIED state.

In response to this:

[release-4.6] Bug 1987221: Backport drain timeout optimizations

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. bugzilla/severity-high Referenced Bugzilla bug's severity is high for the branch this PR is targeting. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. cherry-pick-approved Indicates a cherry-pick PR into a release branch has been approved by the release branch manager. lgtm Indicates that a PR is ready to be merged.