
Hook BackoffLimitExceeded even though job completes successfully #6873

Closed
vmalloc opened this issue Nov 3, 2019 · 7 comments


@vmalloc commented Nov 3, 2019

I have a pre-upgrade hook that looks like this in my chart:

apiVersion: batch/v1
kind: Job
metadata:
  name: "{{.Release.Name}}-migrate"
  labels:
    app.kubernetes.io/managed-by: {{.Release.Service | quote }}
    app.kubernetes.io/instance: {{.Release.Name | quote }}
    app.kubernetes.io/version: {{ .Chart.AppVersion | quote}}
    helm.sh/chart: "{{.Chart.Name}}-{{.Chart.Version}}"
  annotations:
    "helm.sh/hook": "pre-upgrade,pre-install"
    "helm.sh/hook-weight": "-1"
    "helm.sh/hook-delete-policy": "before-hook-creation"
spec:
  ttlSecondsAfterFinished: 300
  activeDeadlineSeconds: 300
  template:
    metadata:
      name: "{{.Release.Name}}"
      labels:
        app.kubernetes.io/managed-by: {{.Release.Service | quote }}
        app.kubernetes.io/instance: {{.Release.Name | quote }}
        helm.sh/chart: "{{.Chart.Name}}-{{.Chart.Version}}"
    spec:
      restartPolicy: Never
      containers:
      - name: migrate-db
        image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
        imagePullPolicy: Always
        command: ["..."]
        env:
          ...
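
(As an aside, the spec above does not set backoffLimit, so the Kubernetes default of 6 pod retries applies. A minimal sketch of how it could be set explicitly, not part of the actual chart, assuming a fail-fast behaviour is wanted:)

spec:
  backoffLimit: 0          # illustrative value: fail the Job after the first pod failure instead of retrying
  ttlSecondsAfterFinished: 300
  activeDeadlineSeconds: 300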

I am running the upgrade and getting:

[debug] Created tunnel using local port: '54226'

[debug] SERVER: "127.0.0.1:54226"

UPGRADE FAILED
Error: Job failed: BackoffLimitExceeded
Error: UPGRADE FAILED: Job failed: BackoffLimitExceeded

In the Tiller log I also see an indication that Helm thinks the job failed:

warning: Release home-backend pre-upgrade home-backend/templates/migrate.yaml could not complete: Job failed: BackoffLimitExceeded

However, the job succeeded and looks fine in the kubectl describe output for its pod. I am also pretty sure the error gets returned even before the job completes successfully...
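
(For reference, the hook job's status can be verified with something like the commands below; the job name follows the template above, and <release-name> is a placeholder:)

kubectl get job <release-name>-migrate -o jsonpath='{.status.succeeded}'
# prints 1 once the job has completed successfully
kubectl describe job <release-name>-migrate
# "Pods Statuses" should show 1 Succeeded and 0 Failed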

Output of helm version:

Client: &version.Version{SemVer:"v2.15.2", GitCommit:"8dce272473e5f2a7bf58ce79bb5c3691db54c96b", GitTreeState:"clean"}
Server: &version.Version{SemVer:"v2.15.2", GitCommit:"8dce272473e5f2a7bf58ce79bb5c3691db54c96b", GitTreeState:"clean"}

Output of kubectl version:

Client Version: version.Info{Major:"1", Minor:"16", GitVersion:"v1.16.2", GitCommit:"c97fe5036ef3df2967d086711e6c0c405941e14b", GitTreeState:"clean", BuildDate:"2019-10-15T23:42:50Z", GoVersion:"go1.12.10", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"13+", GitVersion:"v1.13.10-eks-5ac0f1", GitCommit:"5ac0f1d9ab2c254ea2b0ce3534fd72932094c6e1", GitTreeState:"clean", BuildDate:"2019-08-20T22:39:46Z", GoVersion:"go1.11.13", Compiler:"gc", Platform:"linux/amd64"}

Cloud Provider/Platform (AKS, GKE, Minikube etc.): EKS


@vmalloc (Author) commented Nov 3, 2019

I found out that I had other (unrelated) jobs that were quite old and had not run to completion. Those were cron jobs that are not Helm hooks. Once I deleted those jobs, the error disappeared.

I still believe this is a bug, though: failures in unrelated jobs should not cause an upgrade to fail...
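
(For anyone else hitting this, the workaround amounts to something like the commands below; <stale-job-name> and <namespace> are placeholders for the old, non-hook jobs in question:)

kubectl get jobs --all-namespaces
# look for old jobs whose COMPLETIONS column never reached 1/1
kubectl delete job <stale-job-name> -n <namespace>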


@lisfox1 commented Nov 4, 2019

Removing failed jobs fixed the issue for me as well. Thank you @vmalloc


@froch commented Nov 4, 2019

Same for us; deleting failed jobs from another unrelated deployment "fixed" this. Thank you.


@walbertus commented Nov 5, 2019

This also happened to us, thanks @vmalloc.


@vmalloc (Author) commented Nov 5, 2019

It seems like this and #6873 are two very painful regressions in 2.15.x... I wonder if we could get any of the developers to comment on plans to address these issues...


@bacongobbler (Member) commented Nov 5, 2019

If you can identify the issue, we'd be happy to review any patches for 2.15. Right now we're focusing on getting Helm 3 released, so we haven't had time to look at this.


@jorge-gasca commented Nov 7, 2019

This is possibly fixed by #6907
