Bug 1809665: The console should wait until it is out of rotation to shut down #385
Conversation
When a pod is marked deleted, its endpoints are updated immediately and the change propagates to load balancers, routers, and nodes. A component that wishes to remain available during upgrades must wait longer than the default propagation interval for these changes to avoid having requests delivered to pods that are shutting down. Change the console to wait 25s before terminating the serving process, and allow up to 40s on the node, to ensure all front ends have time to drain. The minimum interval is how long an average connection behind the router can take to drain once new connections stop being created, i.e. wait = time to propagate endpoints (5s) + time for router reload (5s) + time for longest request to finish (15s) = 25s.
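For illustration, here is a minimal Go sketch of the delayed-shutdown pattern described above. It is not the console's actual implementation; the listen address, timeouts, and log messages are assumptions, and the 25s/40s figures simply mirror the numbers in this PR description.

```go
package main

import (
	"context"
	"log"
	"net/http"
	"os"
	"os/signal"
	"syscall"
	"time"
)

func main() {
	srv := &http.Server{Addr: ":8443"} // hypothetical listen address

	// Serve in the background so main can block waiting for SIGTERM.
	go func() {
		if err := srv.ListenAndServe(); err != nil && err != http.ErrServerClosed {
			log.Fatalf("serve: %v", err)
		}
	}()

	// The kubelet sends SIGTERM when the pod is marked deleted.
	sig := make(chan os.Signal, 1)
	signal.Notify(sig, syscall.SIGTERM, os.Interrupt)
	<-sig

	// Keep serving while the endpoint removal propagates:
	// 5s endpoint propagation + 5s router reload + 15s for the longest
	// in-flight request = 25s.
	log.Print("received termination signal, delaying shutdown for 25s")
	time.Sleep(25 * time.Second)

	// Drain whatever is left. The pod's terminationGracePeriodSeconds
	// (40s in this scheme) bounds the total time before the kubelet
	// force-kills the process.
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()
	if err := srv.Shutdown(ctx); err != nil {
		log.Printf("shutdown: %v", err)
	}
}
```

The essential point, however it is implemented (in-process delay, preStop hook, or readiness probe changes), is that the server keeps accepting requests for the full propagation window before it stops listening.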
/hold while testing
/test images
/hold cancel Verified during upgrade that route ingresses remain available (https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade/19806) when testing all three PRs together. This is ready for review.
@smarterclayton: This pull request references Bugzilla bug 1809665, which is invalid:
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/cherry-pick release-4.4
@smarterclayton: once the present PR merges, I will cherry-pick it on top of release-4.4 in a new PR and assign it to you. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/cherry-pick release-4.3
@smarterclayton: once the present PR merges, I will cherry-pick it on top of release-4.3 in a new PR and assign it to you. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
4.4 bug is 1809667 and 4.3 bug is 1809668
/approve
/lgtm
Looking into the e2e test failure to see if it's a flake.
/retest
/retest
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: benjaminapetersen, smarterclayton. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
/retest Please review the full test history for this PR and help us cut down flakes.
1 similar comment
/bugzilla refresh
@smarterclayton: This pull request references Bugzilla bug 1809665, which is invalid:
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/bugzilla refresh
@smarterclayton: This pull request references Bugzilla bug 1809665, which is valid. The bug has been updated to refer to the pull request using the external bug tracker. 3 validation(s) were run on this bug.
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/retest
/retest Please review the full test history for this PR and help us cut down flakes.
8 similar comments
@smarterclayton: All pull requests linked via external trackers have merged. Bugzilla bug 1809665 has been moved to the MODIFIED state. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
@smarterclayton: new pull request created: #387 In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
@smarterclayton: new pull request created: #388 In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
This change should result in the console not being disrupted during upgrade (0s disruption to the console route).
Downtime is documented in https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade/19447