
"Completed" Container should not be in back-off that resulting in "CrashLoopBackOff" #53125

Closed
dixudx opened this issue Sep 27, 2017 · 5 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling.

Comments

@dixudx
Member

dixudx commented Sep 27, 2017

Is this a BUG REPORT or FEATURE REQUEST?:

/kind bug
/sig scheduling

What happened:
Deploy a simple pod and let it exit normally.

apiVersion: v1
kind: Pod
metadata:
  name: demo
  labels:
    purpose: demonstrate
spec:
  containers:
  - name: command-demo
    image: nginx:1.10
    command: ["echo"]
    args: ["$HOSTNAME"]

After it finished successfully, the pod ran into a back-off loop that kept restarting it, ending up in "CrashLoopBackOff".

$ kubectl get pod
NAME                    READY     STATUS             RESTARTS   AGE
demo                    0/1       CrashLoopBackOff   24         1h
$ kubectl describe pod demo
Name:		demo
Namespace:	default
Node:		172.17.8.102/172.17.8.102
Start Time:	Wed, 27 Sep 2017 15:38:22 +0800
Labels:		purpose=demonstrate
Annotations:	<none>
Status:		Running
IP:		10.244.99.9
Containers:
  command-demo:
    Container ID:	docker://ca01ccae7431003bb4c7aeab1811d429bd290cd15474bb2b90a59f90cdc5fe3a
    Image:		nginx:1.10
    Image ID:		docker-pullable://nginx@sha256:6202beb06ea61f44179e02ca965e8e13b961d12640101fca213efbfd145d7575
    Port:		<none>
    Command:
      echo
    Args:
      HOSTNAME
    State:		Waiting
      Reason:		CrashLoopBackOff
    Last State:		Terminated
      Reason:		Completed
      Exit Code:	0
      Started:		Wed, 27 Sep 2017 15:41:24 +0800
      Finished:		Wed, 27 Sep 2017 15:41:24 +0800
    Ready:		False
    Restart Count:	5
    Environment:	<none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-8td07 (ro)
Conditions:
  Type		Status
  Initialized 	True 
  Ready 	False 
  PodScheduled 	True 
Volumes:
  default-token-8td07:
    Type:	Secret (a volume populated by a Secret)
    SecretName:	default-token-8td07
    Optional:	false
QoS Class:	BestEffort
Node-Selectors:	<none>
Tolerations:	<none>
Events:
  FirstSeen	LastSeen	Count	From			SubObjectPath			Type		Reason			Message
  ---------	--------	-----	----			-------------			--------	------			-------
  4m		4m		1	default-scheduler					Normal		Scheduled		Successfully assigned demo to 172.17.8.102
  4m		4m		1	kubelet, 172.17.8.102					Normal		SuccessfulMountVolume	MountVolume.SetUp succeeded for volume "default-token-8td07" 
  4m		3m		4	kubelet, 172.17.8.102	spec.containers{command-demo}	Normal		Pulled			Container image "nginx:1.10" already present on machine
  4m		3m		4	kubelet, 172.17.8.102	spec.containers{command-demo}	Normal		Created			Created container
  4m		3m		4	kubelet, 172.17.8.102	spec.containers{command-demo}	Normal		Started			Started container
  4m		3m		6	kubelet, 172.17.8.102	spec.containers{command-demo}	Warning		BackOff			Back-off restarting failed container
  4m		3m		6	kubelet, 172.17.8.102					Warning		FailedSync		Error syncing pod

What you expected to happen:
The container should end up in "Completed" status rather than being restarted into back-off.
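For illustration only, a sketch of the expected output (not captured from the reporter's cluster):

$ kubectl get pod
NAME                    READY     STATUS      RESTARTS   AGE
demo                    0/1       Completed   0          1h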

How to reproduce it (as minimally and precisely as possible):
Apply the manifest above and watch the pod after the container exits; a sketch follows.
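A minimal reproduction sketch, assuming the manifest above is saved as demo.yaml (the file name is an assumption):

$ kubectl apply -f demo.yaml
$ kubectl get pod demo -w    # watch the pod cycle from Completed into CrashLoopBackOff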

Anything else we need to know?:

Environment:

  • Kubernetes version (use kubectl version):
    master branch
  • Cloud provider or hardware configuration:
  • OS (e.g. from /etc/os-release):
  • Kernel (e.g. uname -a):
  • Install tools:
  • Others:
@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. labels Sep 27, 2017
@tallclair
Member

I'm going to close this as WAI (working as intended). The way to get the desired behavior in this case is to set the Pod's restartPolicy to "OnFailure". See https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#restart-policy
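For reference, a minimal sketch of that suggestion applied to the reporter's manifest (spec.restartPolicy is the only change):

apiVersion: v1
kind: Pod
metadata:
  name: demo
  labels:
    purpose: demonstrate
spec:
  restartPolicy: OnFailure   # restart only on non-zero exit; a clean exit leaves the pod in "Completed"
  containers:
  - name: command-demo
    image: nginx:1.10
    command: ["echo"]
    args: ["$HOSTNAME"]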

@alcohol

alcohol commented Oct 22, 2018

@tallclair does that mean there is no way to have a container exit (with a success state / exit code 0) and be restarted without ending up in the crash loop back-off?

At my current employer we have a range of queue consumers that, due to language implementation limitations and other assorted reasons, gracefully exit after a minute of idle time and restart. While trying to migrate these processes to Kubernetes, we've run into the problem that they end up in said loop.

It seems to me that the crash loop back-off should only apply to containers that actually crashed? If a container exits gracefully, I am not sure why that is considered a crash?

@absolutejam

This is exactly my use case, and it seems like a scheduler-level issue, not an application-level one. Having to wrap what seems like correct behaviour in a supervising process adds another layer of complication (signal handling, etc.); see the sketch below.
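For context, the supervising-process workaround amounts to roughly this kind of wrapper script used as the container entrypoint (a sketch only; /app/worker is a hypothetical binary, and the loop does not forward signals, which is exactly the added complication mentioned above):

#!/bin/sh
# Keep the container alive by restarting the worker in-process after each clean exit.
while true; do
  /app/worker    # hypothetical worker that exits 0 after a minute of idle time
done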

@tallclair
Member

It seems to me that the crash loop back-off should only apply to containers that actually crashed? If a container exits gracefully, I am not sure why that is considered a crash?

The system has to do work to restart the container. If the container is continuously exiting (even successfully), then without a backoff that creates a lot of churn for the system.

@tmartin

tmartin commented Jun 27, 2024

Hi there, we have exactly the same situation: our consumers/workers pulling messages from a FIFO need to restart regularly due to technical limitations. We thought K8S would handle this for us, but we just discovered that this is not the case due to the CrashLoopBackOff.

We end up having to use a process manager like supervisord inside our container as a workaround.

Is there a chance this behavior will change in the future?
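For illustration, that workaround amounts to making supervisord the long-running container process and letting it restart the worker; a minimal sketch of such a config (the program name and command are assumptions):

[supervisord]
nodaemon=true

[program:worker]
command=/app/worker
autorestart=true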
