Job backoffLimit does not cap pod restarts when restartPolicy: OnFailure #54870

Closed
margosok opened this Issue Oct 31, 2017 · 19 comments

margosok commented Oct 31, 2017

Is this a BUG REPORT or FEATURE REQUEST?:

/kind bug

What happened:
When creating a job with backoffLimit: 2 and restartPolicy: OnFailure using the configuration included below, the pod continued to restart indefinitely and the job was never marked as failed:

$ kubectl get pods
NAME               READY     STATUS    RESTARTS   AGE
failed-job-t6mln   0/1       Error     4          57s
$ kubectl describe job failed-job
Name:           failed-job
Namespace:      default
Selector:       controller-uid=58c6d945-be62-11e7-86f7-080027797e6b
Labels:         controller-uid=58c6d945-be62-11e7-86f7-080027797e6b
                job-name=failed-job
Annotations:    kubectl.kubernetes.io/last-applied-configuration={"apiVersion":"batch/v1","kind":"Job","metadata":{"annotations":{},"name":"failed-job","namespace":"default"},"spec":{"backoffLimit":2,"template":{"met...
Parallelism:    1
Completions:    1
Start Time:     Tue, 31 Oct 2017 10:38:46 -0700
Pods Statuses:  1 Running / 0 Succeeded / 0 Failed
Pod Template:
  Labels:  controller-uid=58c6d945-be62-11e7-86f7-080027797e6b
           job-name=failed-job
  Containers:
   nginx:
    Image:  nginx:1.7.9
    Port:   <none>
    Command:
      bash
      -c
      exit 1
    Environment:  <none>
    Mounts:       <none>
  Volumes:        <none>
Events:
  Type    Reason            Age   From            Message
  ----    ------            ----  ----            -------
  Normal  SuccessfulCreate  1m    job-controller  Created pod: failed-job-t6mln

What you expected to happen:

We expected only 2 attempted pod restarts before the job was marked as failed. Instead, the pod kept restarting, and the job's status was never set to failed; it remained active.

How to reproduce it (as minimally and precisely as possible):

apiVersion: batch/v1
kind: Job
metadata:
  name: failed-job
  namespace: default
spec:
  backoffLimit: 2
  template:
    metadata:
      name: failed-job
    spec:
      containers:
      - name: nginx
        image: nginx:1.7.9
        command: ["bash", "-c", "exit 1"]
      restartPolicy: OnFailure

Create the above job and observe the number of pod restarts.

Anything else we need to know?:

The backoffLimit field works as expected when restartPolicy: Never.
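
For comparison, a sketch of the working variant (only restartPolicy changes relative to the reproduction manifest above): with Never, each failure yields a fresh failed pod that the job controller can count against backoffLimit, after which the job is marked failed.

apiVersion: batch/v1
kind: Job
metadata:
  name: failed-job
spec:
  backoffLimit: 2
  template:
    spec:
      containers:
      - name: nginx
        image: nginx:1.7.9
        command: ["bash", "-c", "exit 1"]
      restartPolicy: Never   # failed pods are replaced and counted against backoffLimit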

Environment:

  • Kubernetes version (use kubectl version): 1.8.0 using minikube
margosok commented Oct 31, 2017

@kubernetes/sig-api-machinery-bugs

k8s-ci-robot commented Oct 31, 2017

@ritsok: Reiterating the mentions to trigger a notification:
@kubernetes/sig-api-machinery-bugs

In response to this:

@kubernetes/sig-api-machinery-bugs

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

weiwei04 commented Nov 2, 2017

I think this is because, with restartPolicy: "OnFailure", the kubelet restarts the failed pod in place, so the pod status stays Running even while the container repeatedly fails. The job controller is therefore under the illusion that the pod is still running.

Here are my naive thoughts: as a short-term workaround, add spec.activeDeadlineSeconds; long-term, we need to consider how to recognize a continually restarting pod and count it as a failed pod.

cc @soltysh
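
A minimal sketch of that short-term workaround, applied to the reproduction manifest from the report (the 60-second value is an arbitrary choice for illustration): activeDeadlineSeconds caps the job's total runtime, so even an endlessly restarting pod gets the job terminated and marked failed.

apiVersion: batch/v1
kind: Job
metadata:
  name: failed-job
spec:
  backoffLimit: 2
  activeDeadlineSeconds: 60   # fail the job after 60s, regardless of in-place restarts
  template:
    spec:
      containers:
      - name: nginx
        image: nginx:1.7.9
        command: ["bash", "-c", "exit 1"]
      restartPolicy: OnFailure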

@lavalamp lavalamp added sig/apps and removed sig/api-machinery labels Nov 2, 2017

timoreimann commented Nov 7, 2017

Just ran into this issue as well. It might be useful to amend the documentation to warn that backoffLimit is not effective when OnFailure is chosen as the restart policy. At least, it wasn't obvious to me, and I didn't catch any relevant note in the docs. Happy to do a quick PR if that's a desirable short-term action.

For pods failing as reported in this issue, what's the difference in behavior between a Never restart policy and a hypothetical scenario where backoffLimit is respected by OnFailure? Anything other than failing pods being replaced by newly created ones in the former case vs. existing pods being restarted in place in the latter?

weiwei04 commented Nov 8, 2017

With OnFailure, the failed pod restarts on the same kubelet host, so its emptyDir/hostPath volumes are still there. With Never, the failed pod gets deleted and a replacement is created on another (possibly the same) kubelet host; if the pod used emptyDir/hostPath to store temporary data, that data may be lost.

For pods that don't use emptyDir/hostPath, I think there's not much difference in practice (Never makes the controller, apiserver, and scheduler do extra work to create each replacement, and the backoffLimit policy keeps a bad job spec or container from creating so many replacements that it consumes control-plane bandwidth).
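
To make that concrete, a hypothetical pod-template fragment (names and paths are mine, for illustration only): with restartPolicy: OnFailure the restarted container still sees prior attempts logged in /scratch, while under Never each replacement pod starts with a fresh emptyDir.

spec:
  containers:
  - name: worker
    image: busybox
    command: ["sh", "-c", "date >> /scratch/attempts.log; exit 1"]
    volumeMounts:
    - name: scratch
      mountPath: /scratch
  volumes:
  - name: scratch
    emptyDir: {}      # lives as long as the pod; survives in-place container restarts
  restartPolicy: OnFailure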

timoreimann commented Nov 8, 2017

Might it be a reasonable idea to move the backoffLimit parameter into the Pod? I'd think it could prove valuable to have an upper bound on restarts on Pods themselves, independently of Jobs, and it may solve the "how does the CronJob controller determine that the limit has been reached" problem more elegantly if the Pod's state automatically transitions to failed once the limit is hit.

WDYT?

innovia commented Dec 13, 2017

Can anyone tell me: if I set restartPolicy: Never and the backoff limit to 3, will it retry 3 times? This is very confusing.

timoreimann commented Dec 13, 2017

@innovia yes, it will. In fact, setting it to Never is the only way right now to have a back-off limit be applied and effective.

timoreimann commented Dec 13, 2017

I should mention that my statement is based on Kubernetes 1.8. I don't know if any fixes possibly made it into the upcoming 1.9. Then again, if there were, this issue should have been updated or at least referenced.

innovia commented Dec 25, 2017

@timoreimann this is not working for a CronJob job spec (I am running 1.8.6).
In the screenshot I still get new pods when I kill the previous one (5 were created; only 3 should have been, and the job should have failed).

Any idea how to get this to actually work?

jobTemplate:
  spec:
    backoffLimit: 3
    template:
      spec:
        containers:
        - name: cron-job
          command:
          - "/bin/bash"
          - "-c"
          - |
            WORKER_PID=""

            handle_sig_term() {
              echo -e "Sending SIGTERM to $WORKER_PID"
              kill -SIGTERM $WORKER_PID
              echo -e "Waiting for $WORKER_PID"
              wait $WORKER_PID
              exit 143  # exit code of SIGTERM is 128 + 15
            }

            trap "handle_sig_term" TERM

            x=1
            while [ $x -le 100 ]; do
              echo "Welcome $x times"
              x=$(( $x + 1 ))
              sleep 5
            done & WORKER_PID=$!

            echo "Job PID: ${WORKER_PID}"

            # wait <n> blocks until the process with that PID completes
            wait $WORKER_PID

            EXIT_CODE=$?

            if [ $EXIT_CODE -ne 0 ]; then
              echo "Job failed with exit code: $EXIT_CODE"
              exit $EXIT_CODE
            else
              echo "Job done!"
            fi
        restartPolicy: Never

(screenshot: pod listing, Dec 25 2017, 12:07 PM)

timoreimann commented Dec 25, 2017

@innovia what's your spec's schedule? I don't see it in your manifest.

innovia commented Dec 25, 2017

Every minute, just for testing.

timoreimann commented Dec 25, 2017

Aren't you risking, then, that restarting jobs from a failed schedule and those from a succeeding schedule might overlap? What does it look like when you set the interval to, say, 10 minutes?
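
As an aside: CronJob's concurrencyPolicy field addresses exactly this overlap risk. A minimal sketch, assuming the batch/v1beta1 API used later in this thread (schedule and command are placeholder values):

apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: cron-job
spec:
  schedule: "*/10 * * * *"
  concurrencyPolicy: Forbid   # skip a new run while the previous job is still active
  jobTemplate:
    spec:
      backoffLimit: 3
      template:
        spec:
          containers:
          - name: cron-job
            image: busybox
            command: ["sh", "-c", "exit 1"]
          restartPolicy: Never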

innovia commented Dec 25, 2017

@timoreimann same result with a 10-minute interval.

timoreimann commented Dec 25, 2017

@innovia I ran a simplified version of your manifest against my Kubernetes 1.8.2 cluster with the command replaced by /bin/bash -c "exit 1" so that every run fails, the schedule set to every 5 minutes, and the back-off limit reduced to 2. Here's my manifest:

apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: hello
spec:
  schedule: "*/5 * * * *"
  jobTemplate:
    spec:
      backoffLimit: 2
      template:
        spec:
          containers:
          - name: cron-job
            image: ubuntu
            command:
            - "/bin/bash"
            - "-c"
            - "exit 1"
          restartPolicy: Never

Initially, I get a series of three failing pods:

Mon Dec 25 23:15:28 CET 2017
NAME                     READY     STATUS    RESTARTS   AGE
hello-1514240100-7mktd   0/1       Error     0          19s
hello-1514240100-bf6rw   0/1       Error     0          5s
hello-1514240100-h44kz   0/1       Error     0          15s

Then it takes pretty much 5 minutes for the next batch to start up and fail:

Mon Dec 25 23:20:23 CET 2017
NAME                     READY     STATUS    RESTARTS   AGE
hello-1514240400-cj4bn   0/1       Error     0          4s
hello-1514240400-shnph   0/1       Error     0          14s

Interestingly, I only get two new pods instead of three. The next batch, though, comes with three again:

Mon Dec 25 23:25:16 CET 2017
NAME                     READY     STATUS    RESTARTS   AGE
hello-1514240700-jr9sp   0/1       Error     0          3s
hello-1514240700-jxck9   0/1       Error     0          13s
hello-1514240700-x26kg   0/1       Error     0          16s

and so does the final batch I tested:

Mon Dec 25 23:30:16 CET 2017
NAME                     READY     STATUS    RESTARTS   AGE
hello-1514241000-mm8c9   0/1       Error     0          2s
hello-1514241000-pqx8f   0/1       Error     0          16s
hello-1514241000-qjtn5   0/1       Error     0          12s

Apart from the supposed off-by-one error, things seem to work for me.

timoreimann commented Dec 25, 2017

Maybe you can double-check using my manifest?

soltysh commented Jan 29, 2018

In the original proposal we stated it should work for both restart policies; I've addressed the problem in #58972. It counts the pods' restartCount: if the overall number (i.e. across all containers in all pods) exceeds the backoff limit, the job will fail.
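
To see the number the fix compares against backoffLimit, here's a small illustrative shell snippet (my own sketch, not from the PR) that sums restartCount across all containers of a job's pods, assuming the job-name=failed-job label from the original report:

kubectl get pods -l job-name=failed-job \
  -o jsonpath='{.items[*].status.containerStatuses[*].restartCount}' \
  | tr ' ' '\n' | awk '{sum+=$1} END {print sum}'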

@kow3ns kow3ns added this to Backlog in Workloads Feb 27, 2018

Workloads automation moved this from Backlog to Done Apr 19, 2018

k8s-merge-robot added a commit that referenced this issue Apr 19, 2018

Merge pull request #58972 from soltysh/issue54870
Automatic merge from submit-queue (batch tested with PRs 61962, 58972, 62509, 62606). If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md.

Fix job's backoff limit for restart policy OnFailure

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #54870

**Release note**:
```release-note
NONE
```

/assign janetkuo

zhangguanzhang commented Jun 3, 2018

I think the backoffLimit doesn't work at all. My Kubernetes version is 1.10.3.

[root@k8s-m1 k8s]# cat job.yml 
apiVersion: batch/v1
kind: Job
metadata:
  name: test-job
spec:
  backoffLimit: 2
  template:
    spec:
      containers:
      - name: pi
        image: busybox
        command: ["exit 1"]
      restartPolicy: Never
[root@k8s-m1 k8s]# kubectl apply -f job.yml 
job.batch "test-job" created
[root@k8s-m1 k8s]# kubectl get pod
NAME             READY     STATUS               RESTARTS   AGE
pod-test         1/1       Running              0          6h
test-affinity    1/1       Running              0          6h
test-job-g4r8w   0/1       ContainerCreating    0          1s
test-job-tqbvl   0/1       ContainerCannotRun   0          3s
[root@k8s-m1 k8s]# kubectl get pod
NAME             READY     STATUS               RESTARTS   AGE
pod-test         1/1       Running              0          6h
test-affinity    1/1       Running              0          6h
test-job-2h5vp   0/1       ContainerCannotRun   0          13s
test-job-2klzs   0/1       ContainerCannotRun   0          15s
test-job-4c2xz   0/1       ContainerCannotRun   0          24s
test-job-6bcqc   0/1       ContainerCannotRun   0          8s
test-job-6r7wq   0/1       ContainerCannotRun   0          11s
test-job-7cz7c   0/1       Terminating          0          8s
test-job-7mkdn   0/1       Terminating          0          16s
test-job-88ws8   0/1       ContainerCannotRun   0          26s
test-job-bqfk4   0/1       ContainerCannotRun   0          22s
test-job-jh7dp   0/1       ContainerCannotRun   0          4s
test-job-k2c4r   0/1       ContainerCannotRun   0          18s
test-job-qfj7m   0/1       ContainerCannotRun   0          6s
test-job-r8794   0/1       ContainerCreating    0          1s
test-job-r9gz6   0/1       Terminating          0          6s
test-job-w4f9r   0/1       Terminating          0          1s
[root@k8s-m1 k8s]# kubectl delete  -f job.yml 
job.batch "test-job" deleted

soltysh commented Jun 5, 2018

See #62382.

diegodelemos added a commit to diegodelemos/reana-job-controller that referenced this issue Jun 6, 2018

k8s: use restartpolicy never
* This will cause a new pod to be started for each retry instead of
  retrying with the same pod all the time. Even though it seems to be
  fixed (kubernetes/kubernetes#54870 (comment)), in Kubernetes 1.9.4
  backoffLimit is ignored with the OnFailure policy.

diegodelemos added a commit to diegodelemos/reana-job-controller that referenced this issue Jun 6, 2018

installation: use kubernetes official client
* Upgrades code to support Kubernetes 1.9.4 (already 1.10 compatible).

* Uses restartPolicy: Never. This will cause a new pod to be started for
  each retry instead of retrying with the same pod all the time. Even
  though it seems to be fixed (kubernetes/kubernetes#54870 (comment)),
  in Kubernetes 1.9.4 backoffLimit is ignored with the OnFailure policy.
