Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug 1943804: increases termination timeouts for AWS #1079

Conversation

p0lyn0mial
Copy link
Contributor

@p0lyn0mial p0lyn0mial commented Mar 29, 2021

Due to a known AWS issue: https://bugzilla.redhat.com/show_bug.cgi?id=1943804
the time needed for an LB to notice and remove unhealthy instances must be extended.

shutdown-delay-duration flag has been changed from 70s to 210s and terminationGracePeriodSeconds field from 135s to 275s.

For terminationGracePeriodSeconds the initial 210s is reserved for the minimal termination period.
Additional 60s for finishing all in-flight requests and an extra 5s to make sure a potential SIGTERM will be sent after a server terminates itself.

@p0lyn0mial
Copy link
Contributor Author

/hold

for testing

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 29, 2021
@p0lyn0mial p0lyn0mial force-pushed the increase-shutdown-delay-duration branch from 0a43a9b to 77ab7c4 Compare April 26, 2021 10:22
@p0lyn0mial p0lyn0mial changed the title increases shutdown-delay-duration to 240s increases shutdown-delay-duration to 150s Apr 26, 2021
@p0lyn0mial p0lyn0mial force-pushed the increase-shutdown-delay-duration branch from 77ab7c4 to 7be44f0 Compare April 28, 2021 18:19
@p0lyn0mial p0lyn0mial changed the title increases shutdown-delay-duration to 150s increases shutdown-delay-duration to 210s Apr 28, 2021
@smarterclayton
Copy link
Contributor

i suspect this will trigger alerts and apiserver degraded due to the extended duration (but maybe not.

@p0lyn0mial p0lyn0mial force-pushed the increase-shutdown-delay-duration branch from 7be44f0 to 23d451a Compare April 29, 2021 09:42
@p0lyn0mial
Copy link
Contributor Author

/retest

@p0lyn0mial p0lyn0mial force-pushed the increase-shutdown-delay-duration branch from 382d2e5 to fdbeffc Compare May 5, 2021 11:57
@openshift-ci openshift-ci bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 5, 2021
@p0lyn0mial p0lyn0mial force-pushed the increase-shutdown-delay-duration branch from fdbeffc to 1b58143 Compare May 5, 2021 12:15
@openshift-ci openshift-ci bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 5, 2021
@p0lyn0mial p0lyn0mial changed the title increases shutdown-delay-duration to 210s increases termination timeouts for AWS May 5, 2021
@p0lyn0mial p0lyn0mial changed the title increases termination timeouts for AWS Bug 1943804: increases termination timeouts for AWS May 5, 2021
@openshift-ci-robot openshift-ci-robot added the bugzilla/severity-urgent Referenced Bugzilla bug's severity is urgent for the branch this PR is targeting. label May 5, 2021
@openshift-ci-robot
Copy link

@p0lyn0mial: This pull request references Bugzilla bug 1943804, which is invalid:

  • expected the bug to target the "4.8.0" release, but it targets "---" instead

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

In response to this:

Bug 1943804: increases termination timeouts for AWS

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot openshift-ci-robot added the bugzilla/invalid-bug Indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting. label May 5, 2021
@p0lyn0mial
Copy link
Contributor Author

/assign @sttts

@p0lyn0mial
Copy link
Contributor Author

/hold cancel

@openshift-ci-robot openshift-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 5, 2021
@p0lyn0mial
Copy link
Contributor Author

/bugzilla refresh

@openshift-ci-robot openshift-ci-robot added bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. and removed bugzilla/invalid-bug Indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting. labels May 5, 2021
@openshift-ci-robot
Copy link

@p0lyn0mial: This pull request references Bugzilla bug 1943804, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.8.0) matches configured target release for branch (4.8.0)
  • bug is in the state NEW, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)

Requesting review from QA contact:
/cc @wangke19

In response to this:

/bugzilla refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@p0lyn0mial p0lyn0mial force-pushed the increase-shutdown-delay-duration branch from b54bd3a to 702a485 Compare May 19, 2021 07:33
observedShutdownDelayDuration = "210s"
default:
// just return the existing configuration since we don't have an opinion anyway
return existingConfig, errs
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

existingConfig is not no opinion. You have to return an empty map.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why an empty map? I haven't changed existingConfig so I am simply returning what I have got.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You return the input for another merge. This is not about changing, it's about outputting desired state.

observedGracefulTerminationDuration = "275"
default:
// just return the existing configuration since we don't have an opinion anyway
return existingConfig, errs
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as above, empty map

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here, I haven't changed existingConfig so I am simply returning what I have got.

// the initial 70s is reserved fo the minimal termination period
// additional 60s for finishing all in-flight requests
// an extra 5s to make sure the potential SIGTERM will be sent after the server terminates itself
gracefulTerminationDuration = 135
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need this? Shouldn't this come from defaultconfig.yaml?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, we need this.

defaultconfig.yaml defines shutdown-delay-duration

@p0lyn0mial p0lyn0mial force-pushed the increase-shutdown-delay-duration branch from 729b5fe to 78122f8 Compare May 20, 2021 13:09
@sttts
Copy link
Contributor

sttts commented May 20, 2021

/lgtm
/approve

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label May 20, 2021
@openshift-ci
Copy link
Contributor

openshift-ci bot commented May 20, 2021

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: p0lyn0mial, sttts

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 20, 2021
@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

1 similar comment
@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-ci
Copy link
Contributor

openshift-ci bot commented May 20, 2021

@p0lyn0mial: The following tests failed, say /retest to rerun all failed tests:

Test name Commit Details Rerun command
ci/prow/e2e-aws-operator 6e27d99 link /test e2e-aws-operator
ci/prow/e2e-aws-operator-encryption 6e27d99 link /test e2e-aws-operator-encryption

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

1 similar comment
@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-merge-robot openshift-merge-robot merged commit ce4170b into openshift:master May 20, 2021
@openshift-ci
Copy link
Contributor

openshift-ci bot commented May 20, 2021

@p0lyn0mial: All pull requests linked via external trackers have merged:

Bugzilla bug 1943804 has been moved to the MODIFIED state.

In response to this:

Bug 1943804: increases termination timeouts for AWS

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. bugzilla/severity-urgent Referenced Bugzilla bug's severity is urgent for the branch this PR is targeting. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

9 participants