Infinite ImagePullBackOff CronJob results in resource leak #76570
What you expected to happen:
This is especially bad with CronJob: unlike a Deployment, where an upper limit for horizontal scaling has to be set, a CronJob with no history limit and no ConcurrencyPolicy set will slowly consume all resources on a cluster.
While this is up for debate, I would personally say when a scheduled Job has the ImagePullBackOff error, it shouldn't try to keep scheduling new pods. It should probably kill the pod trying to pull an image and make a new one, or wait for the pod to successfully pull the image.
In the worst case, it will consume all cluster resources. In the best case, there is a thundering herd of Jobs from the CronJob all rushing to completion when the image becomes available.
How to reproduce it (as minimally and precisely as possible):
    apiVersion: batch/v1beta1
    kind: CronJob
    spec:
      schedule: "* * * * *"
      jobTemplate:
        spec:
          template:
            spec:
              containers:
              - name: hello
                image: darrienglasser.com/busybox:does-not-exist
Deploy the above and wait. The schedule creates a new Job every minute, each Job's pod stays stuck in ImagePullBackOff, and your cluster will collapse over time.
Anything else we need to know?:
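For comparison, here is a rough sketch of the same CronJob with explicit bounds set. It is not a fix for the underlying behavior. It only shows which existing fields can cap the growth. The name `hello-bounded` and all numeric values are illustrative assumptions and are not taken from a real deployment. `concurrencyPolicy: Forbid` corresponds to the "wait for the pod" behavior described above, and `Replace` would correspond to "kill the pod and make a new one". As far as I understand, the history limits only prune finished Jobs, so a Job stuck pulling an image would still need `activeDeadlineSeconds` to be marked failed and cleaned up.

```yaml
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: hello-bounded            # hypothetical name, for illustration only
spec:
  schedule: "* * * * *"
  concurrencyPolicy: Forbid      # skip a new run while the previous Job is still active
  successfulJobsHistoryLimit: 3  # illustrative value
  failedJobsHistoryLimit: 1      # illustrative value
  jobTemplate:
    spec:
      backoffLimit: 2            # illustrative cap on pod retries per Job
      activeDeadlineSeconds: 120 # illustrative; fail the Job if it is still active after 2 minutes
      template:
        spec:
          restartPolicy: Never
          containers:
          - name: hello
            image: darrienglasser.com/busybox:does-not-exist
```

With settings along these lines, at most one Job from this CronJob should be active at a time, so the pile-up described above should not occur. That is a workaround on the user side, though. The default configuration still has the problem.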