-
Notifications
You must be signed in to change notification settings - Fork 695
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove non-functional pre-stop hook parts #4801
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
I actually like the simplicity of this more than our initial approximation that relies on the (unrelated) StatefulSet headless service DNS.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm also +1 on this change. Should we mark it as breaking though? For users that set PRE_STOP_ADDITIONAL_WAIT_SECONDS
to 0
this change will mean going from somewhat safe 20 seconds wait to no waiting at all.
First, the PreStop lifecycle hook keeps querying DNS for `PRE_STOP_MAX_WAIT_SECONDS` (defaulting to 20) until the Pod IP is no longer referenced. | ||
Then, it waits for an additional `PRE_STOP_ADDITIONAL_WAIT_SECONDS` (defaulting to 30). Additional wait is used to: | ||
To address this issue and minimize unavailability, ECK relies on a link:https://kubernetes.io/docs/concepts/containers/container-lifecycle-hooks/[PreStop lifecycle hook]. | ||
It waits for an additional `PRE_STOP_ADDITIONAL_WAIT_SECONDS` (defaulting to 50). The additional wait time is used to: | ||
|
||
1. Give time to in-flight requests to be completed. | ||
2. Give clients time to use the terminating Pod IP resolved just before DNS record was updated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Line 22 should say environment variable
instead of environment variableS
:
The exact behavior is configurable using environment variable, for example:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
Fixes #4698
The pre-stop hook script intended to improve availability during rolling upgrades has not been doing what it is advertising to do since ECK 1.3.0.
cloud-on-k8s/pkg/controller/elasticsearch/nodespec/statefulset.go
Line 51 in 72bfb73
We are including non-ready Pods in the headless service which we then try to use to determine when a terminating Pod has been removed from DNS (never, in this constellation). We have since added functionality in #3837 that relies on this setting for client-side node discovery (aka sniffing).
This means that our pre-stop hook was always running the full 20+30 secs before terminating the Pod. I see two potential fixes for this problem:
I have opted for sticking with 50 seconds of wait time (but I am open to cutting this down)
NOTE: I also kept the
terminationGracePeriod
at 180s to avoid a rolling restart on ECK upgrade.