This repository has been archived by the owner on Jun 6, 2024. It is now read-only.

Pod stuck in Pending because kubelet skips creating the pod sandbox #3760

Closed
yqwang-ms opened this issue Oct 21, 2019 · 3 comments

yqwang-ms (Member) commented Oct 21, 2019

Short summary about the issue/question:
kubernetes/kubernetes#79398
kubernetes/kubernetes#79451

Long-term solution: upgrade to Kubernetes 1.17, which includes the fix kubernetes/kubernetes@3fac48f

Short-term workaround:
Set the Pod restartPolicy to OnFailure for elastic jobs (TBD: also for training jobs).
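To illustrate the workaround, here is a minimal Python sketch that builds a bare Pod manifest with restartPolicy set to OnFailure. The helper name build_pod_manifest, the pod name, and the container image are hypothetical and are not OpenPAI's actual job-submission code. The sketch relies only on the standard Pod manifest shape and the three valid values of the Kubernetes restartPolicy field (Always, OnFailure, Never).

```python
import json

# The three values Kubernetes accepts for PodSpec.restartPolicy.
VALID_RESTART_POLICIES = {"Always", "OnFailure", "Never"}


def build_pod_manifest(name, image, restart_policy="OnFailure"):
    """Return a minimal Pod manifest as a dict.

    Hypothetical helper for illustration only. It defaults to OnFailure,
    mirroring the short-term workaround for elastic jobs.
    """
    if restart_policy not in VALID_RESTART_POLICIES:
        raise ValueError(f"invalid restartPolicy: {restart_policy!r}")
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            # restartPolicy applies to every container in the Pod.
            "restartPolicy": restart_policy,
            "containers": [{"name": "main", "image": image}],
        },
    }


if __name__ == "__main__":
    # Hypothetical pod name and image, chosen only for the example.
    pod = build_pod_manifest("elastic-job-worker-0", "ubuntu:18.04")
    print(json.dumps(pod, indent=2))
```

The validation step rejects misspelled values such as "onfailure", since the field is case-sensitive and the API server would refuse such a manifest anyway.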

Anything else we need to know:

Related log:

kuberuntime_manager.go:427] No ready sandbox for pod "..." can be found. Need to start a new one

yqwang-ms (Member, Author):

@abuccts please help make this change for the elastic job :)

yqwang-ms (Member, Author):

Let's hold off on this first and test whether 1.16.0 can be included in this release.

abuccts added a commit that referenced this issue Nov 15, 2019
Update restart policy to avoid stuck pending pods #3760
abuccts added a commit that referenced this issue Nov 18, 2019
* Update restart policy to avoid stuck pending pods

* Add comments
abuccts closed this as completed Nov 25, 2019

yqwang-ms (Member, Author):

Tracked in #3863.
