hook-image-puller: -pod-scheduling-wait-duration flag added for reliability during helm upgrades #1763
Conversation
It can be a problem to require a pod to be successfully scheduled on all nodes. If we require this, chart upgrades will fail whenever any node has insufficient capacity for additional pods. This is not because the --wait part of the helm upgrade requires daemonset pods to be scheduled and ready (it doesn't), but because the image-awaiter pod requires the desired number of pods to be scheduled and running. This PR introduces a flag for this and sets its default value to 10. With that default, the criterion for successfully awaiting image pulling is to, within the initial ten seconds, have as many ready image puller pods as are desired. After this initial duration the criterion changes: the awaiter then waits only for the currently scheduled pods to become ready.
Minor suggestions, but otherwise LGTM. Thanks for working on this, @consideRatio
Everything looks super nice @consideRatio 👍 🎉 ❤️
Is it possible for other pods to get scheduled after the `strict-check-duration` runs out and the `hook-image-awaiter` gets "unblocked"?
@yuvipanda @GeorgianaElena thank you for your reviews ❤️ 🎉 @GeorgianaElena yes, a pod that was pending for say 30 seconds can still get scheduled. The scheduling of these pods is done by the Kubernetes pod scheduler and isn't influenced by the image-awaiter pod, which only monitors the status of the daemonset object. That status contains: the number of scheduled pods (scheduled), the number of pods that should be scheduled (desired), and the number of ready pods (ready). So assuming a strict-check-duration of 10, what could happen is...
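To illustrate one plausible timeline (my own sketch with made-up numbers, not the scenario the comment goes on to describe): with three desired pods but only two schedulable at first, the awaiter unblocks just after the strict phase ends, while the kube-scheduler can still place the pending pod later on its own:

```go
package main

import "fmt"

// status holds a made-up snapshot of the daemonset's pod counts.
type status struct{ desired, scheduled, ready int32 }

// unblockTick walks a per-second timeline of status snapshots and returns
// the first tick at which the awaiter's criterion is met: strict
// (ready == desired) through the wait duration, relaxed
// (ready == scheduled) afterwards. Returns -1 if the criterion is never met.
func unblockTick(timeline map[int]status, wait, horizon int) int {
	var s status
	for t := 0; t <= horizon; t++ {
		if snap, ok := timeline[t]; ok {
			s = snap
		}
		strict := t <= wait
		if (strict && s.ready == s.desired) || (!strict && s.ready == s.scheduled) {
			return t
		}
	}
	return -1
}

func main() {
	timeline := map[int]status{
		0:  {3, 2, 0}, // two pods scheduled immediately, third stays pending
		5:  {3, 2, 2}, // the two scheduled pods become ready
		30: {3, 3, 2}, // the kube-scheduler still fits the third pod later
	}
	fmt.Println(unblockTick(timeline, 10, 60)) // prints 11: just after the strict phase
}
```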
PR Summary
This PR addresses a risk of `helm upgrade` failing for a bad reason: that the hook-image-puller pods fail to schedule because the nodes cannot accept more pods on them.

How `helm upgrade` currently can fail for a bad reason

When `helm upgrade` is run, certain k8s resources are rendered and deployed before others using Chart hooks. We deploy a `hook-image-awaiter` Pod and a `hook-image-puller` daemonset. The `hook-image-awaiter` pod awaits the pods of the `hook-image-puller` daemonset becoming ready, as they do when they have been scheduled and have pulled all images in dummy init containers.

The failure we can avoid is when the `hook-image-awaiter` pod gets stuck waiting for an image puller pod that fails to schedule on a node because the node is out of capacity to schedule more pods, most likely due to a maximum pods-per-node constraint. GKE has a maximum of 110 pods per node, for example, and there is no workaround for this limit as far as I know.
How this PR addresses the issue

This PR updates the logic of the `hook-image-awaiter` pod's binary, which we have written ourselves in Go. Before this PR, it waits for all image puller pods to be scheduled, start, and get their main container running/ready. With this PR, it awaits only the pods that actually have been scheduled, and it only does so after an initial duration which is configurable with a `-pod-scheduling-wait-duration` flag.

We can accomplish this by using the status set on the hook-image-puller DaemonSet resource, which describes `CurrentNumberScheduled`, `DesiredNumberScheduled`, and `NumberReady`.

Configuration added
This duration is configurable with `prePuller.hook.podSchedulingWaitDuration` and defaults to `10` (seconds). It can be configured to be an infinite duration with `-1`.
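In values terms this might look like the following (a sketch; I'm assuming the surrounding `prePuller.hook` keys from the chart's existing configuration layout):

```yaml
prePuller:
  hook:
    enabled: true
    # Seconds to wait for all hook-image-puller pods to be scheduled before
    # relaxing the check to only the currently scheduled pods.
    # Use -1 to wait indefinitely for all desired pods to become ready.
    podSchedulingWaitDuration: 10
```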