
hook-image-puller: -pod-scheduling-wait-duration flag added for reliability during helm upgrades #1763

Merged

Conversation

@consideRatio (Member) commented Sep 1, 2020

PR Summary

This PR addresses a risk of helm upgrade failing for a bad reason: the hook-image-puller pods failing to schedule because the nodes cannot accept more pods.

How helm upgrade can currently fail for a bad reason

When helm upgrade is run, certain k8s resources are rendered and deployed before others using Chart hooks. We deploy a hook-image-awaiter Pod and a hook-image-puller DaemonSet. The hook-image-awaiter pod waits for the pods of the hook-image-puller DaemonSet to become ready, which they do once they have been scheduled and have pulled all images in dummy init containers.

The failure we can avoid is when the hook-image-awaiter pod gets stuck waiting for one of the image puller pods that fails to schedule on a node because the node is out of capacity to schedule more pods, most likely because of a maximum-pods-per-node constraint.

GKE, for example, has a maximum of 110 pods per node, and as far as I know there is no workaround for this limit.

How this PR addresses the issue

This PR updates the logic of the hook-image-awaiter pod's binary, which we have written ourselves in Go. Before this PR, it waited for all image puller pods to schedule, start, and get the main container running/ready. With this PR, after an initial duration configurable with the -pod-scheduling-wait-duration flag, it awaits only the pods that have actually been scheduled.

We can accomplish this by using the status set on the hook-image-puller DaemonSet resource that describes CurrentNumberScheduled, DesiredNumberScheduled, and NumberReady.
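To make the decision concrete, here is a minimal Go sketch of how such a check could look. This is an illustrative reconstruction, not the actual images/image-awaiter/main.go; the function name `isDone` and its signature are hypothetical, while the status fields come from `k8s.io/api/apps/v1` as named above.

```go
package main

import (
	"fmt"
	"time"

	appsv1 "k8s.io/api/apps/v1"
)

// isDone decides whether the image-awaiter can stop waiting, based on
// the hook-image-puller DaemonSet's status and the elapsed wait time.
func isDone(ds *appsv1.DaemonSet, elapsed, podSchedulingWait time.Duration) bool {
	s := ds.Status

	// A negative wait duration means the criteria is never relaxed:
	// all desired pods must schedule and become ready (old behavior).
	if podSchedulingWait < 0 || elapsed < podSchedulingWait {
		// Initially: every desired pod must be scheduled and ready.
		return s.NumberReady == s.DesiredNumberScheduled
	}
	// Afterwards: only pods that actually got scheduled must be ready.
	return s.NumberReady == s.CurrentNumberScheduled
}

func main() {
	ds := &appsv1.DaemonSet{
		Status: appsv1.DaemonSetStatus{
			DesiredNumberScheduled: 6,
			CurrentNumberScheduled: 5,
			NumberReady:            5,
		},
	}
	// After 30s with a 10s wait duration, scheduled == ready counts as done.
	fmt.Println(isDone(ds, 30*time.Second, 10*time.Second)) // prints: true
}
```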

Configuration added

This duration is configurable with prePuller.hook.podSchedulingWaitDuration and defaults to 10 seconds. It can be configured to be an infinite duration with -1.
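For example, in a Helm values override file this could look like the following (the key is the one named above; the inline comment summarizes the semantics described here):

```yaml
prePuller:
  hook:
    podSchedulingWaitDuration: 10  # seconds; set to -1 to wait indefinitely for scheduling
```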

It can be a problem to require a pod to be successfully scheduled on all nodes. If we require this, then chart upgrades will fail whenever any node has insufficient capacity for more pods. This is not because the --wait part of the Helm upgrade requires daemonset pods to be scheduled and ready (it doesn't), but because the image-awaiter pod requires that the desired number of pods are scheduled and running.

This PR introduces a flag and sets a default value of 10 for it. With this, the criterion to successfully have awaited image pulling is, during the initial ten seconds, to have as many ready image puller pods as desired. After this initial duration, the criterion changes to something new: waiting only for the currently scheduled pods to become ready.
@consideRatio changed the title from "hook-image-puller: -strict-check-duration flag added" to "Relax hook-image-puller for reliability: -strict-check-duration flag added" on Sep 9, 2020
@yuvipanda (Collaborator) left a comment


Minor suggestions, but otherwise LGTM. Thanks for working on this, @consideRatio

(Inline review threads on images/image-awaiter/main.go and jupyterhub/values.yaml — outdated, resolved)
@GeorgianaElena (Member) left a comment


Everything looks super nice @consideRatio 👍 🎉 ❤️

Is it possible for other pods to get scheduled after the strict-check-duration runs out and the hook-image-awaiter gets "unblocked"?

@consideRatio (Member, Author) commented

@yuvipanda @GeorgianaElena thank you for your reviews ❤️ 🎉

@GeorgianaElena yes, a pod that was pending for, say, 30 seconds can still get scheduled. The scheduling of these pods is done by the Kubernetes pod scheduler and isn't influenced by the image-awaiter pod, which only monitors the status of the daemonset object. That status in turn contains information about the number of scheduled pods (scheduled), the number of pods that should be scheduled (desired), and the number of ready pods (ready).

So assuming a strict-check-duration of 10, what could happen is...

  1. After 5 seconds --- desired: 5, scheduled: 4, ready: 4 --- keep waiting
  2. After 9 seconds --- desired: 5, scheduled: 5, ready: 4 --- keep waiting
  3. After 10 seconds --- image-awaiter will now accept scheduled==ready rather than desired==ready
  4. After 15 seconds --- desired: 6, scheduled: 5, ready: 4 --- keep waiting
  5. After 30 seconds --- desired: 6, scheduled: 5, ready: 5 --- completed!

@consideRatio changed the title from "Relax hook-image-puller for reliability: -strict-check-duration flag added" to "Relax hook-image-puller for reliability: -pod-scheduling-wait-duration flag added" on Sep 23, 2020
@consideRatio changed the title from "Relax hook-image-puller for reliability: -pod-scheduling-wait-duration flag added" to "hook-image-puller: -pod-scheduling-wait-duration flag added for reliability during helm upgrades" on Sep 23, 2020
@consideRatio merged commit f000e94 into jupyterhub:master on Sep 23, 2020