
Helm should delete job after it is successfully finished #1769

Closed
dzavalkinolx opened this issue Dec 29, 2016 · 43 comments

@dzavalkinolx

I have a job bound to post-install,post-upgrade,post-rollback:

    "helm.sh/hook": post-install,post-upgrade,post-rollback

When I upgrade the chart I get Error: UPGRADE FAILED: jobs.batch "opining-quetzal-dns-upsert" already exists.
kubectl get jobs returns

opining-quetzal-dns-upsert   1         1            12m

So how are we supposed to use jobs as hooks if they are not deleted after finishing successfully?
There is no way to upgrade a chart that has such a job.

@thomastaylor312 (Contributor)

I would love to see this, and I will try to get around to submitting a PR for it if I have the time. In the meantime, I tack this onto the end of the job name as a workaround:

metadata:
  name: {{ template "fullname" . }}-{{ randAlphaNum 5 | lower }}

@dzavalkinolx (Author)

I'm currently using the workaround from @thomastaylor312, and it is a very bad way to do it.
There are two issues:

  1. If the job doesn't finish successfully (for whatever reason), it is rescheduled again and again. I think we need something like a maxRetry parameter here.
  2. If the chart is deleted, the job is not deleted by helm, so it continues to be scheduled and fail indefinitely.

As a result I've just run into a performance issue with a storm of kill container / start container events.

@thomastaylor312 (Contributor)

@dzavalkinolx: We should probably bring this up in the Kubernetes issue tracker. This isn't so much a problem with Helm as with how Kubernetes Jobs work. Have you also tried setting restartPolicy: OnFailure so that it just restarts the container instead of creating a new pod?
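
As a sketch of that suggestion (the Job name and image are placeholders, and backoffLimit arrived in later Kubernetes releases): restartPolicy: OnFailure makes the kubelet restart the failed container in place instead of the Job controller replacing the pod:

    apiVersion: batch/v1
    kind: Job
    metadata:
      name: dns-upsert                       # placeholder name
      annotations:
        "helm.sh/hook": post-install,post-upgrade
    spec:
      backoffLimit: 3                        # cap retries so a broken hook cannot loop forever
      template:
        spec:
          restartPolicy: OnFailure           # restart the container in place
          containers:
            - name: dns-upsert
              image: example/dns-upsert:1.0  # placeholder image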

@technosophos (Member)

This is a tricky one. Helm doesn't implicitly manage the lifecycles of things like this. In particular, it never deletes a user-created resource without receiving an explicit request from a user.

Now, we recently introduced an annotation called resource-policy that we could use for something like this. Currently, the only defined resource policy is keep, which is used to tell Tiller not to delete that resource on a normal deletion request. I suppose we could implement another policy with something like delete-on-completion that deleted a Pod or Job on completion.

Since Tiller does not actively monitor resources once they are deployed, I'm not sure this would be a terribly powerful annotation, but it could work on hooks because we do watch hooks for lifecycle events.
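
For illustration only, the proposed policy might look like the annotation below; delete-on-completion is hypothetical and never shipped, and the feature that eventually landed is the "helm.sh/hook-delete-policy" annotation shown later in this thread:

    metadata:
      annotations:
        # hypothetical policy sketched in the comment above; the shipped
        # equivalent is "helm.sh/hook-delete-policy": hook-succeeded
        "helm.sh/resource-policy": delete-on-completion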

@technosophos added this to the 2.3.0-Triage milestone Jan 3, 2017
@longseespace (Contributor)

@dzavalkinolx Have you tried activeDeadlineSeconds?
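
For reference, activeDeadlineSeconds sits on the Job spec and terminates the Job (and its pods) once the deadline passes, successful or not; the value here is illustrative:

    apiVersion: batch/v1
    kind: Job
    spec:
      activeDeadlineSeconds: 300   # terminate the Job and its pods after 5 minutes
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: main
              image: example/task:1.0   # placeholder image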

@thomastaylor312 (Contributor)

@longseespace I tried that too, but it just kills the job after the amount of time and spins up a new one

@javiercr (Contributor) commented Feb 3, 2017

I ran into this problem today too. I found out about Helm hooks recently and thought they would be perfect for implementing pre-upgrade database migrations (rake db:migrate, for those of you familiar with Rails). But then I found that Helm does not clean up completed jobs, so this doesn't work in a CI environment.

I'm probably missing something, but what's the point of running a job in a pre-upgrade / post-upgrade hook if, once the job has run on the first helm upgrade, every subsequent upgrade fails?

@technosophos (Member)

That's why we recommend appending a random string to the job name if you know for sure you are going to re-run it.

@javiercr (Contributor)

I see. I'm not sure I like the approach of leaving behind a new successful job for every deployment. For now I've added a step to our deployment script that runs kubectl delete job db-migrate-job --ignore-not-found before our helm upgrade.

@johnw188 commented Apr 5, 2017

It feels like Helm should be managing the full lifecycle of its hooks, as they're documented as the approach to take for executing lifecycle events such as migrations. Someone without a strong understanding of Kubernetes could end up with hundreds of useless jobs.

I almost feel like the ideal solution here would be to have Tiller delete an old job when it hits a name conflict with a hook job it attempts to create. You could verify that the job belongs to Helm by ensuring the correct hook annotation is present. You could also put this behavior behind a command-line flag, such as --overwrite-hooks, with a better error message for users:

Error: UPGRADE FAILED: jobs.batch "opining-quetzal-dns-upsert" already exists
Rerun your upgrade with --overwrite-hooks to automatically replace existing hooks

@docmatrix

I am having exactly the same challenges with a Python/Django app. Does anyone have a mechanism such that Helm aborts an upgrade if the pre-upgrade job fails?

@thomastaylor312 added the feature and help wanted labels May 18, 2017
@thomastaylor312 (Contributor) commented May 18, 2017

I may take this once I get some other work done for 2.5. Assigning it to myself for now; if someone else wants to take it before I get to it, let me know.

@DoctorZK (Contributor)

@thomastaylor312 Have you finished this feature yet? If not, I would like to take this work.

@thomastaylor312 (Contributor)

@DoctorZK Feel free to take it. Thank you for offering to do it!

@thomastaylor312 removed the help wanted label Jun 15, 2017
@gianrubio

What about a simple annotation, helm.sh/resource-policy: delete-job-after-run? When the job runs successfully, Helm deletes it.

@DoctorZK (Contributor) commented Jun 21, 2017

Good suggestion. I have thought of two approaches to solve this problem:

  1. Add a simple annotation in the hook templates, which is the easiest to implement. However, it cannot handle this case: Helm fails during the pre-install/pre-upgrade process, but users try to install/upgrade the release with the same chart again, which causes a resource-object name conflict in Kubernetes. Therefore, with this approach we would need another annotation, such as helm.sh/resource-policy: delete-job-if-job-fails.
  2. Add flags to the install/upgrade/rollback/delete commands (e.g., upgrade $release_name --include-hooks), which solves the name conflict problem but would also remove hooks that users do not intend to delete, such as configmaps and secrets that are designed to be reused across versions of the same release.

I prefer the first one, which controls hooks at a finer granularity.

@gianrubio

@DoctorZK As you suggest, I would like to have another annotation, helm.sh/resource-policy: delete-job-if-succeed. This is important when you're deleting a release and have a cleanup job; for now it's not possible to delete that job without running another job.

Are you willing to work on this? If not, I can take care of it.

@DoctorZK (Contributor)

Thanks for your help. I have finished the coding and it is now under test. I will submit the pull request as soon as possible.

@libesz (Contributor) commented Jun 27, 2017

For those who want a workaround until a final solution is implemented: instead of appending random characters to the Job object's name, the Job can delete itself from the API server as its last task before exiting (see the sketch after this list). It is equally "elegant" but does not leak Job objects.
Sad facts:

  • You need to pass API server credentials into the Job to do this.
  • You lose the Job logs, as those are deleted as well.
  • It works only for objects that contain running code (Jobs, Pods..., but not PVCs, third-party resources, etc.).
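
A rough sketch of that workaround, assuming a ServiceAccount bound to a Role that allows deleting Jobs; the names, image, and the run-migrations command are placeholders:

    apiVersion: batch/v1
    kind: Job
    metadata:
      name: self-deleting-job                # placeholder
      annotations:
        "helm.sh/hook": post-install,post-upgrade
    spec:
      template:
        spec:
          serviceAccountName: job-deleter    # must be allowed to delete Jobs in this namespace
          restartPolicy: Never
          containers:
            - name: main
              image: bitnami/kubectl         # any image that ships kubectl
              command: ["/bin/sh", "-c"]
              args:
                # do the real work, then remove this Job from the API server
                - run-migrations && kubectl delete job self-deleting-job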

@DonMartin76

@thomastaylor312 Did this land in 2.7?

@bacongobbler (Member)

Yes, everything currently in master landed in 2.7.

@macropin commented Nov 8, 2017

Just to be clear... if I add "helm.sh/hook-delete-policy": hook-succeeded to my job, then every deployment should recreate and re-run that job? Because that's not what I'm seeing here.

@thomastaylor312 (Contributor)

@macropin Is your hook defined as post-install,post-upgrade (or pre-, as your case may be)? If it only has post-install, it will only run the first time.

@macropin

@thomastaylor312 It's defined as post-install,post-upgrade, so you're saying it should be working?

@thomastaylor312 (Contributor)

It will create a new object (generally a Pod or Job) each time you release. If you use the feature mentioned here, it will delete the job when it is done running. If for some reason a hook isn't deploying, that would be a separate issue.

@macropin

The job has the following annotations:

      annotations:
        "helm.sh/hook": post-install,post-upgrade
        "helm.sh/hook-weight": "5"
        "helm.sh/hook-delete-policy": hook-succeeded,hook-failed

The job only runs once on the first install, and never again on subsequent upgrades. Running Helm v2.7.0. Should I create a separate issue for this?

@thomastaylor312 (Contributor)

@macropin Yes. Could you please create another issue with details about your cluster and, if possible, an example chart that reproduces the issue?

omar-nahhas added a commit to omar-nahhas/charts that referenced this issue Jan 18, 2018
Without the annotation, helm upgrade fails.
helm/helm#1769
omar-nahhas added a commit to omar-nahhas/charts that referenced this issue Jan 18, 2018
Without those annotations, helm upgrade fails because of:
helm/helm#1769
k8s-ci-robot pushed a commit to helm/charts that referenced this issue Jan 27, 2018
* Update job.yaml

Without those annotations, helm upgrade fails because of:
helm/helm#1769

* Increasing version number
@sohel2020 commented Nov 13, 2019

The job has the following annotations:

      annotations:
        "helm.sh/hook": post-install,post-upgrade
        "helm.sh/hook-weight": "5"
        "helm.sh/hook-delete-policy": hook-succeeded,hook-failed

The job only runs once on the first install, and never again on subsequent upgrades. Running Helm v2.7.0. Should I create a separate issue for this?

@macropin How did you solve it? I'm facing a similar issue: the job is never created on subsequent upgrades. I'm using the same annotations as yours.

helm version: v2.14.3

@guice commented Dec 13, 2019

@sohel2020 and @macropin - Same boat here, on Helm v3. The job is never re-run on subsequent upgrades.

@thecrazzymouse

What is the solution for rerunning jobs on helm v3?

@renepardon

Same problem with Helm version: version.BuildInfo{Version:"v3.1.2", GitCommit:"d878d4d45863e42fd5cff6743294a11d28a9abce", GitTreeState:"clean", GoVersion:"go1.13.8"}

Jobs are neither deleted nor run on subsequent upgrade commands.

@jabdoa2 commented May 12, 2020

We regularly hit this one too in Helm 3.1.

almerico added a commit to almerico/alfresco-identity-service that referenced this issue Aug 3, 2020
vol-clean-{{ template "alfresco-identity.fullname" . }} updated to name: vol-clean-{{ template "alfresco-identity.fullname" . }}-{{ randAlphaNum 5 | lower }}
based on helm/helm#1769; our issue is https://issues.alfresco.com/jira/browse/AAE-3212
endrec pushed a commit to Rungway/charts-we-use that referenced this issue Aug 14, 2020
* Update job.yaml

Without those annotations, helm upgrade fails because of:
helm/helm#1769

* Increasing version number
torstenwalter pushed a commit to grafana/helm-charts that referenced this issue Sep 4, 2020
* Update job.yaml

Without those annotations, helm upgrade fails because of:
helm/helm#1769

* Increasing version number
@paologallinaharbur

In the future I believe we will also be able to rely on the Job TTL mechanism.
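
That refers to Kubernetes' TTL-after-finished mechanism: setting ttlSecondsAfterFinished on a Job (stable since Kubernetes 1.23) makes the cluster garbage-collect the Job and its pods after it completes:

    apiVersion: batch/v1
    kind: Job
    spec:
      ttlSecondsAfterFinished: 300   # delete the Job and its pods 5 minutes after it finishes
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: main
              image: example/task:1.0   # placeholder image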

@schollii

@paologallinaharbur Only partially: for example, if the job TTL is 5 minutes and you push a new commit within that window while the previous commit's job failed and is still there, you will hit the same issue.

The two options that have worked for me:

  1. before a helm upgrade, run kubectl delete job on the hook job;
  2. use the "helm.sh/hook-delete-policy": hook-succeeded,hook-failed policy, which is a better approach than item 1, BUT it drops the logs of a failed job, which can be detrimental for troubleshooting.

The best option would be a hook policy that is applied before a hook is run, e.g. something like

annotations:
  "helm.sh/hook-delete-policy": previous-hook-failed

MichaelMorrisEst pushed a commit to Nordix/helm that referenced this issue Nov 17, 2023
@joelmathew003 commented Jan 18, 2024

What is the solution for rerunning jobs on helm v3?

@thecrazzymouse Were you able to find a solution?
