Bug 1809665: Start graceful shutdown on SIGTERM #94
@@ -0,0 +1,29 @@
package shutdown

import (
	"os"
	"os/signal"
)

var onlyOneSignalHandler = make(chan struct{})
var shutdownHandler chan os.Signal

// SetupSignalHandler registers for SIGTERM and SIGINT. A stop channel is returned
// which is closed on one of these signals. If a second signal is caught, the program
// is terminated with exit code 1.
func SetupSignalHandler() <-chan struct{} {
	close(onlyOneSignalHandler) // panics when called twice

What could call

I would say so; and I ran into problems previously:
@smarterclayton any reason we don't pull an existing implementation?

This is the same code as genericapiserver. It's not appropriate for router to take a dependency on it, and this code is straightforward.

Yes, the panic guards against coding errors.

	shutdownHandler = make(chan os.Signal, 2)

	stop := make(chan struct{})
	signal.Notify(shutdownHandler, shutdownSignals...)
	go func() {
		<-shutdownHandler
		close(stop)
		<-shutdownHandler
		os.Exit(1) // second signal. Exit directly.
	}()

	return stop
}
@@ -0,0 +1,10 @@
// +build !windows

package shutdown

import (
	"os"
	"syscall"
)

var shutdownSignals = []os.Signal{os.Interrupt, syscall.SIGTERM}
@@ -0,0 +1,7 @@
package shutdown

import (
	"os"
)

var shutdownSignals = []os.Signal{os.Interrupt}
The thing that sends the signal that triggers the graceful shutdown is the kubelet, when the pod is marked for deletion, right? As soon as a pod is marked for deletion, the pod is removed from endpoints, so it should not be receiving new connections. So once the endpoints controller updates the endpoints in response to the pod's deletion and the service proxy updates in response to the endpoints update, we're really just waiting for already established connections to drain, right? Where does the 45-second delay come from?
No, you're waiting for distributed load balancers to take you out of rotation.
1 is fast. 2 may take up to 5-10s depending on load. 3 takes as long as any type of global load balancer in front of the service takes to detect a not-ready service, which is (unhealthy checks + 1) * interval check, or 32s for GCP. See https://docs.google.com/document/d/1BUmtdTth49V02UZ5EjRvJ92A5vjF8wMJMSdPb1Wz3wQ/edit# for an explanation (that will become part of openshift/enhancements).