Bug 1809665: Start graceful shutdown on SIGTERM #94
Conversation
Force-pushed from 7fc4d56 to 1b7654c.
Depends on openshift/cluster-ingress-operator#366 (we're getting aborted early).
Testing upgrade from this PR + 366 to itself (changed images only), https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade/19735 got 2s of disruption in the console, which should be fixed by openshift/console-operator#385.
Remaining change here: the pod needs to start reporting not-ready as soon as it is marked deleted. Otherwise I'm testing the logging and output.
This approach, and the code, looks good to me. I'm avoiding saying lgtm in case you intend to add the not-ready reporting before it gets merged.
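To make the not-ready reporting concrete, here is a minimal sketch of a readiness endpoint that starts failing once shutdown begins, so the endpoints controller pulls the pod out of rotation. The handler name, probe path, and port are assumptions for illustration, not this PR's actual code:

```go
package main

import (
	"net/http"
	"sync/atomic"
)

// shuttingDown is set non-zero once graceful shutdown has started.
var shuttingDown int32

// readyzHandler reports 200 while serving and 503 once shutdown begins.
// A failing readiness probe causes the pod to be removed from endpoints,
// so new connections stop arriving while existing ones drain.
func readyzHandler(w http.ResponseWriter, r *http.Request) {
	if atomic.LoadInt32(&shuttingDown) != 0 {
		http.Error(w, "shutting down", http.StatusServiceUnavailable)
		return
	}
	w.WriteHeader(http.StatusOK)
}

func main() {
	// In the real router, the signal handler would flip the flag with
	// something like: atomic.StoreInt32(&shuttingDown, 1)
	http.HandleFunc("/healthz/ready", readyzHandler) // probe path is illustrative
	http.ListenAndServe(":1936", nil)                // port is illustrative
}
```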
Force-pushed from 1b7654c to ecdbae6.
Yeah, I want to see all three of the PRs working together before I merge any one.
/hold
Force-pushed from ecdbae6 to 636cecd.
/hold cancel
Verified the health check during shutdown with:
Verified during upgrade that route ingresses remain available (https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade/19806) when testing all three PRs together. This is ready for review.
```go
// which is closed on one of these signals. If a second signal is caught, the program
// is terminated with exit code 1.
func SetupSignalHandler() <-chan struct{} {
	close(onlyOneSignalHandler) // panics when called twice
```
What could call `SetupSignalHandler` twice? Is the purpose of this line to guard against future coding errors?
I would say so, and I ran into problems previously:
- Revert "Make NewCommandStartNode() handle SIGTERM and SIGINT" origin#19986
- Remove signal handler registration from pkg/kubelet kubernetes/kubernetes#63859
@smarterclayton any reason we don't pull an existing implementation?
This is the same code as genericapiserver. It's not appropriate for router to take a dependency on it, and this code is straightforward.
Yes, the panic guards against coding errors.
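For reference, here is a self-contained sketch of the pattern this hunk follows (it mirrors the genericapiserver helper mentioned above; details may differ slightly from the PR's code):

```go
package signals

import (
	"os"
	"os/signal"
	"syscall"
)

var onlyOneSignalHandler = make(chan struct{})

// SetupSignalHandler returns a channel that is closed on SIGTERM or SIGINT.
// A second signal terminates the program with exit code 1. A second call to
// this function panics via the close below, guarding against registering
// handlers twice.
func SetupSignalHandler() <-chan struct{} {
	close(onlyOneSignalHandler) // panics when called twice

	stop := make(chan struct{})
	c := make(chan os.Signal, 2)
	signal.Notify(c, syscall.SIGTERM, syscall.SIGINT)
	go func() {
		<-c
		close(stop) // first signal: begin graceful shutdown
		<-c
		os.Exit(1) // second signal: exit immediately
	}()

	return stop
}
```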
images/router/haproxy/reload-haproxy (outdated)
```sh
fi
readonly wait_time=${ROUTER_MAX_SHUTDOWN_TIMEOUT:-${MAX_RELOAD_WAIT_TIME:-$max_wait_time}}
kill -USR1 $old_pids
for i in $( seq 1 $wait_time ); do
```
Not sure it matters enough to worry about, but the following would be more precise:

```sh
stop=$((SECONDS + wait_time))
while ((SECONDS < stop)); do
  # ...
done
```
Hrm, that construct looks much harder to understand for a casual reader. Why is it more precise?
Because the condition is true once `$wait_time` seconds have passed since the assignment. In contrast, sleeping for 1 second n times fails to account for execution time. It probably does not matter here, as the only commands are builtins and `pidof`, which should be fast; it would matter more for a command like, say, `curl`, where the loop may spend more time executing commands than sleeping. That is why I say it probably does not matter enough to worry about, and if the `for` plus `seq` is easier to read, we can run with that.
Yeah, I'd prefer to use the simpler construct if we don't have a concrete reason to avoid it (waiting slightly longer should not be an issue).
```go
// endpoint out of rotation.
delay := getIntervalFromEnv("ROUTER_GRACEFUL_SHUTDOWN_DELAY", 45)
log.Info(fmt.Sprintf("Shutdown requested, waiting %s for new connections to cease", delay))
time.Sleep(delay)
```
The thing that sends the signal that triggers the graceful shutdown is the kubelet, when the pod is marked for deletion, right? As soon as a pod is marked for deletion, the pod is removed from endpoints, so it should not be receiving new connections. So once the endpoints controller updates the endpoints in response to the pod's deletion and the service proxy updates in response to the endpoints update, we're really just waiting for already established connections to drain, right? Where does the 45-second delay come from?
No, you're waiting for distributed load balancers to take you out of rotation:

1. Pod marked for deletion in etcd.
2. Delete notification propagates to all consumers.
3. Consumers stop directing new traffic to the endpoint.

Step 1 is fast. Step 2 may take up to 5-10s depending on load. Step 3 takes as long as any type of global load balancer in front of the service takes to detect a not-ready service, which is `(unhealthy checks + 1) * check interval`, or 32s for GCP (e.g., 3 unhealthy checks at an 8s interval gives (3 + 1) * 8 = 32s). See https://docs.google.com/document/d/1BUmtdTth49V02UZ5EjRvJ92A5vjF8wMJMSdPb1Wz3wQ/edit# for an explanation (that will become part of openshift/enhancements).
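As an aside, the `getIntervalFromEnv` helper used in the hunk above is not shown in this diff; a plausible sketch (the parsing behavior here is an assumption, not the PR's actual implementation) might be:

```go
package router

import (
	"os"
	"strconv"
	"time"
)

// getIntervalFromEnv interprets the named environment variable as a number
// of seconds, falling back to defaultSeconds when it is unset or unparsable.
// Illustrative only; the PR's real helper may differ.
func getIntervalFromEnv(name string, defaultSeconds int) time.Duration {
	if v := os.Getenv(name); v != "" {
		if n, err := strconv.Atoi(v); err == nil && n >= 0 {
			return time.Duration(n) * time.Second
		}
	}
	return time.Duration(defaultSeconds) * time.Second
}
```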
@smarterclayton: This pull request references Bugzilla bug 1809665, which is invalid.
The 4.4 bug is 1809667 and the 4.3 bug is 1809668.
/cherry-pick release-4.4
@smarterclayton: once the present PR merges, I will cherry-pick it on top of release-4.4 in a new PR and assign it to you.
/cherry-pick release-4.3
Both jobs passed with zero disruption (just verifying I didn't break the graceful shutdown with the changes to reload-haproxy).
/retest
Hrm, that's super weird. AWS install fails because it can't find the `haproxy-config.template`. That's scary.
/retest
/lgtm
/retest
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: Miciah, smarterclayton.
/retest Please review the full test history for this PR and help us cut down flakes.
/bugzilla refresh
@smarterclayton: This pull request references Bugzilla bug 1809665, which is valid. The bug has been updated to refer to the pull request using the external bug tracker. 3 validation(s) were run on this bug.
/retest Please review the full test history for this PR and help us cut down flakes.
@smarterclayton: new pull request created: #97
@smarterclayton: new pull request created: #98
Graceful shutdown in Kubernetes requires:

- reporting not-ready as soon as the pod is marked for deletion, so that endpoints and load balancers stop sending new connections
- waiting long enough for that removal to propagate to all consumers
- draining existing connections before the process exits

This PR updates the router to implement the above logic.
When the main process receives SIGTERM or SIGINT, it waits `ROUTER_GRACEFUL_SHUTDOWN_DELAY` (default: 45) seconds and then signals the reload script with `ROUTER_SHUTDOWN=true` that a graceful termination is requested. The reload-haproxy script then sends USR1 to all child haproxy processes and waits `ROUTER_MAX_SHUTDOWN_TIMEOUT` or `MAX_RELOAD_WAIT_TIME` (default: 30) seconds before sending TERM to the child processes. If TERM is sent, the script exits with 1, indicating that not all processes completed their work.

Clients with long-running requests should set `ROUTER_MAX_SHUTDOWN_TIMEOUT` as appropriate to ensure all connections exit cleanly.
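Putting the main-process side of this flow together, here is a hedged sketch. The script path and function names are assumptions, and it reuses the hypothetical `getIntervalFromEnv` sketched in the review discussion above; the PR's exact wiring may differ:

```go
package router

import (
	"os"
	"os/exec"
	"time"
)

// gracefulShutdown implements the sequence described above: after the stop
// channel closes, wait out the grace delay so load balancers stop sending
// new connections, then run the reload script with ROUTER_SHUTDOWN=true so
// it sends USR1 to the haproxy children and waits for them to drain. A
// non-zero exit means some children had to be terminated with TERM before
// finishing their work.
func gracefulShutdown(stop <-chan struct{}) error {
	<-stop
	delay := getIntervalFromEnv("ROUTER_GRACEFUL_SHUTDOWN_DELAY", 45)
	time.Sleep(delay) // let load balancers take this endpoint out of rotation

	cmd := exec.Command("reload-haproxy") // path is illustrative
	cmd.Env = append(os.Environ(), "ROUTER_SHUTDOWN=true")
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	return cmd.Run()
}
```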