
"Completed" Container should not be in back-off that resulting in "CrashLoopBackOff" #53125

Closed
dixudx opened this issue Sep 27, 2017 · 5 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling.

Comments

@dixudx
Member

dixudx commented Sep 27, 2017

Is this a BUG REPORT or FEATURE REQUEST?:

/kind bug
/sig scheduling

What happened:
Deploy a simple pod and let it exit normally.

apiVersion: v1
kind: Pod
metadata:
  name: demo
  labels:
    purpose: demonstrate
spec:
  containers:
  - name: command-demo
    image: nginx:1.10
    command: ["echo"]
    args: ["$HOSTNAME"]

After it finished successfully, the pod ran into a back-off loop that kept restarting it, ending up in "CrashLoopBackOff".

$ kubectl get pod
NAME                    READY     STATUS             RESTARTS   AGE
demo                    0/1       CrashLoopBackOff   24         1h
$ kubectl describe pod demo
Name:		demo
Namespace:	default
Node:		172.17.8.102/172.17.8.102
Start Time:	Wed, 27 Sep 2017 15:38:22 +0800
Labels:		purpose=demonstrate
Annotations:	<none>
Status:		Running
IP:		10.244.99.9
Containers:
  command-demo:
    Container ID:	docker://ca01ccae7431003bb4c7aeab1811d429bd290cd15474bb2b90a59f90cdc5fe3a
    Image:		nginx:1.10
    Image ID:		docker-pullable://nginx@sha256:6202beb06ea61f44179e02ca965e8e13b961d12640101fca213efbfd145d7575
    Port:		<none>
    Command:
      echo
    Args:
      HOSTNAME
    State:		Waiting
      Reason:		CrashLoopBackOff
    Last State:		Terminated
      Reason:		Completed
      Exit Code:	0
      Started:		Wed, 27 Sep 2017 15:41:24 +0800
      Finished:		Wed, 27 Sep 2017 15:41:24 +0800
    Ready:		False
    Restart Count:	5
    Environment:	<none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-8td07 (ro)
Conditions:
  Type		Status
  Initialized 	True 
  Ready 	False 
  PodScheduled 	True 
Volumes:
  default-token-8td07:
    Type:	Secret (a volume populated by a Secret)
    SecretName:	default-token-8td07
    Optional:	false
QoS Class:	BestEffort
Node-Selectors:	<none>
Tolerations:	<none>
Events:
  FirstSeen	LastSeen	Count	From			SubObjectPath			Type		Reason			Message
  ---------	--------	-----	----			-------------			--------	------			-------
  4m		4m		1	default-scheduler					Normal		Scheduled		Successfully assigned demo to 172.17.8.102
  4m		4m		1	kubelet, 172.17.8.102					Normal		SuccessfulMountVolume	MountVolume.SetUp succeeded for volume "default-token-8td07" 
  4m		3m		4	kubelet, 172.17.8.102	spec.containers{command-demo}	Normal		Pulled			Container image "nginx:1.10" already present on machine
  4m		3m		4	kubelet, 172.17.8.102	spec.containers{command-demo}	Normal		Created			Created container
  4m		3m		4	kubelet, 172.17.8.102	spec.containers{command-demo}	Normal		Started			Started container
  4m		3m		6	kubelet, 172.17.8.102	spec.containers{command-demo}	Warning		BackOff			Back-off restarting failed container
  4m		3m		6	kubelet, 172.17.8.102					Warning		FailedSync		Error syncing pod

What you expected to happen:
The container should end up in "Completed" status rather than being restarted into back-off.
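For illustration only, a sketch of the expected output (not captured from the reporter's cluster):

$ kubectl get pod
NAME                    READY     STATUS      RESTARTS   AGE
demo                    0/1       Completed   0          1h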

How to reproduce it (as minimally and precisely as possible):
Apply the manifest above and watch the pod after the container exits; a sketch follows.
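A minimal reproduction sketch, assuming the manifest above is saved as demo.yaml (the file name is an assumption):

$ kubectl apply -f demo.yaml
$ kubectl get pod demo -w    # watch the pod cycle from Completed into CrashLoopBackOff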

Anything else we need to know?:

Environment:

  • Kubernetes version (use kubectl version):
    master branch
  • Cloud provider or hardware configuration:
  • OS (e.g. from /etc/os-release):
  • Kernel (e.g. uname -a):
  • Install tools:
  • Others:
@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. labels Sep 27, 2017
@tallclair
Member

I'm going to close this as WAI (working as intended). The way to get the desired behavior in this case is to set the Pod's restartPolicy to "OnFailure". See https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#restart-policy
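For reference, a minimal sketch of that suggestion applied to the reporter's manifest (spec.restartPolicy is the only change):

apiVersion: v1
kind: Pod
metadata:
  name: demo
  labels:
    purpose: demonstrate
spec:
  restartPolicy: OnFailure   # restart only on non-zero exit; a clean exit leaves the pod in "Completed"
  containers:
  - name: command-demo
    image: nginx:1.10
    command: ["echo"]
    args: ["$HOSTNAME"]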

@alcohol

alcohol commented Oct 22, 2018

@tallclair does that mean there is no way to have a container exit (with a success state / exit code 0) and be restarted without ending up in the crash loop back-off?

At my current employer we have a range of queue consumers that, due to language implementation limitations and other assorted reasons, gracefully exit after a minute of idle time and restart. While trying to migrate these processes to Kubernetes, we've run into the problem that they end up in said loop.

It seems to me that the crash loop back-off should only apply to containers that actually crashed? If a container exits gracefully, I am not sure why that is considered a crash?

@absolutejam

This is exactly my use case, and it seems like a scheduler-level issue, not an application-level one. Having to wrap what seems like correct behaviour in a supervising process adds another layer of complication (signal handling, etc.); see the sketch below.
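For context, the supervising-process workaround amounts to roughly this kind of wrapper script used as the container entrypoint (a sketch only; /app/worker is a hypothetical binary, and the loop does not forward signals, which is exactly the added complication mentioned above):

#!/bin/sh
# Keep the container alive by restarting the worker in-process after each clean exit.
while true; do
  /app/worker    # hypothetical worker that exits 0 after a minute of idle time
done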

@tallclair
Member

It seems to me that the crash loop back-off should only apply to containers that actually crashed? If a container exits gracefully, I am not sure why that is considered a crash?

The system has to do work to restart the container. If the container is continuously exiting (even successfully), then without a backoff that creates a lot of churn for the system.

@tmartin

tmartin commented Jun 27, 2024

Hi there, we have exactly the same situation: our consumers/workers pulling messages from a FIFO need to restart regularly due to technical limitations. We thought K8S would handle this for us, but we just discovered that this is not the case due to the CrashLoopBackOff.

We end up having to use a process manager like supervisord inside our container as a workaround.

Is there a chance this behavior will change in the future?
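For illustration, that workaround amounts to making supervisord the long-running container process and letting it restart the worker; a minimal sketch of such a config (the program name and command are assumptions):

[supervisord]
nodaemon=true

[program:worker]
command=/app/worker
autorestart=true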
