Pods stuck in ContainerCreating state #973

Closed
ionutleca opened this issue May 11, 2021 · 7 comments
Labels
kind/bug Something isn't working kind/upstream-issue This issue appears to be caused by an upstream bug

Comments

@ionutleca

ionutleca commented May 11, 2021

Environmental Info:
RKE2 Version:
v1.20.6+rke2r1 (da4fc2f)

Node(s) CPU architecture, OS, and Version:
Linux server0 4.18.0-193.47.1.el8_2.x86_64 SMP Thu Mar 4 03:03:32 EST 2021 x86_64 x86_64 x86_64 GNU/Linux

Cluster Configuration:
1 server

Describe the bug:
When we try to deploy more than ~100 pods, some pods get stuck in the ContainerCreating state.
After digging further into the node logs, we found results similar to the ones reported in opencontainers/runc#2865.
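
For anyone trying to reproduce this, here is a minimal sketch for listing the affected pods. It assumes `kubectl` is installed and configured for the cluster; the helper names `fetch_pod_list` and `stuck_pods` are made up for illustration. It reads the JSON from `kubectl get pods --all-namespaces -o json` and keeps pods that have a container whose waiting reason is `ContainerCreating`.

```python
import json
import subprocess


def fetch_pod_list():
    """Fetch all pods as parsed JSON (assumes kubectl is configured for the cluster)."""
    result = subprocess.run(
        ["kubectl", "get", "pods", "--all-namespaces", "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)


def stuck_pods(pod_list):
    """Return (namespace, name) for pods with a container waiting in ContainerCreating."""
    stuck = []
    for pod in pod_list.get("items", []):
        statuses = pod.get("status", {}).get("containerStatuses") or []
        for container in statuses:
            waiting = container.get("state", {}).get("waiting") or {}
            if waiting.get("reason") == "ContainerCreating":
                meta = pod["metadata"]
                stuck.append((meta["namespace"], meta["name"]))
                break  # one waiting container is enough to flag the pod
    return stuck
```

Calling `stuck_pods(fetch_pod_list())` on the node returns the list of stuck pods, so `len(...)` of the result can be compared against the `max-pods` setting.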

Steps To Reproduce:

  • Installed RKE2 with the following configuration:
selinux: false
write-kubeconfig-mode: "0644"
etcd-snapshot-retention: 7
etcd-snapshot-schedule-cron: "*/5 * * * *"
tls-san:
- ***
disable:
- rke2-ingress-nginx
kubelet-arg:
- "max-pods=200"

Expected behavior:
Being able to create pods up to the 200 limit

Actual behavior:
Pods are stuck in ContainerCreating well before the 200 limit

Additional context / logs:
It looks like upgrading runC from 1.0.0-rc93 to 1.0.0-rc94 (opencontainers/runc#2871) does the trick.
We replaced the runC binary with the new version, and the rke2 cluster then seemed to schedule pods successfully up to the specified limit.
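
To check which runC version a node is actually running, something like the sketch below may help. It assumes the first line of `runc --version` has the form `runc version X`; the function names are hypothetical, and the path to the binary is left as a parameter because it depends on the install. The only affected version it knows about is the one reported in this issue (1.0.0-rc93).

```python
import re
import subprocess

# Per this issue: 1.0.0-rc93 showed the problem, 1.0.0-rc94 did not.
REPORTED_AFFECTED = "1.0.0-rc93"


def runc_version_output(runc_path="runc"):
    """Run `<runc_path> --version` and return its text output."""
    result = subprocess.run(
        [runc_path, "--version"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout


def parse_runc_version(output):
    """Extract the version string from `runc --version` output, or None if absent."""
    match = re.search(r"^runc version (\S+)", output, re.MULTILINE)
    return match.group(1) if match else None


def is_reported_affected(output):
    """True only if the output names the version reported as affected in this issue."""
    return parse_runc_version(output) == REPORTED_AFFECTED
```

For example, `is_reported_affected(runc_version_output("/path/to/runc"))` tells you whether the node still has the rc93 binary.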

@brandond brandond added this to To Triage in Development [DEPRECATED] via automation May 11, 2021
@brandond brandond added kind/bug Something isn't working kind/upstream-issue This issue appears to be caused by an upstream bug labels May 11, 2021
@brandond brandond added this to the v1.21.1+rke2r1 milestone May 11, 2021
@brandond brandond moved this from To Triage to Next Up in Development [DEPRECATED] May 11, 2021
@brandond brandond moved this from Next Up to Backlog in Development [DEPRECATED] May 11, 2021
@Oats87
Contributor

Oats87 commented May 12, 2021

@ionutleca how long are you leaving your node with the containers in "ContainerCreating" state? Do these containers ever get past this stage? In our testing with rc93, we have seen nodes hit this issue but end up getting past the backlog with a little bit of time.

@ionutleca
Author

They were stuck in that state for more than 10 minutes. That delay is not acceptable for us, so we never waited to see how long they actually stay in that state, especially since we see no such delays with the fixed runC version.
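
To put a number on how long pods have been waiting, a rough sketch is below. It uses each pod's `metadata.creationTimestamp` (as found in `kubectl get pods -o json` output), so the result is time since creation and therefore only an upper bound on time spent in ContainerCreating. The function names and the threshold are illustrative.

```python
from datetime import datetime, timezone


def minutes_since_creation(pod, now=None):
    """Minutes elapsed since the pod's creationTimestamp (UTC, e.g. 2021-05-11T10:00:00Z)."""
    created = datetime.strptime(
        pod["metadata"]["creationTimestamp"], "%Y-%m-%dT%H:%M:%SZ"
    ).replace(tzinfo=timezone.utc)
    if now is None:
        now = datetime.now(timezone.utc)
    return (now - created).total_seconds() / 60


def older_than(pods, threshold_minutes, now=None):
    """Names of pods created more than threshold_minutes ago."""
    return [
        pod["metadata"]["name"]
        for pod in pods
        if minutes_since_creation(pod, now) > threshold_minutes
    ]
```

Feeding it only the pods that are still in ContainerCreating gives a quick list of the ones that have exceeded, say, a 10 minute threshold.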

@davidnuzik davidnuzik moved this from Backlog to Next Up in Development [DEPRECATED] May 12, 2021
@cjellick
Contributor

@dweomer @brandond @Oats87 will this be addressed by our bump to runc?

@cjellick
Contributor

assigning to jacob assuming so

@rajivml

rajivml commented May 12, 2021

@Oats87 no, they never get past the ContainerCreating state. I have seen them stuck for more than a day.

@davidnuzik davidnuzik moved this from Next Up to Peer Review in Development [DEPRECATED] May 12, 2021
@davidnuzik
Contributor

k3s-io/k3s#3305

@rancher-max
Contributor

Validated this is working on v1.21.1-alpha5+rke2r1 and v1.20.7-rc1+rke2r1

Pods no longer get stuck in this state. Can increase pod limit successfully via kubelet arg.

Development [DEPRECATED] automation moved this from To Test to Done Issue / Merged PR May 21, 2021
8 participants