Docker container failing to start on GKE #552

Closed
isarkis opened this issue May 19, 2022 · 6 comments

isarkis commented May 19, 2022

I am facing a strange problem in GKE where OCI runtime creation is failing like so:

```
/usr/local/bin/docker create --name 353bb0779b0949a789aad2ce0c2fd4cf_alpine314_a47c67 --label 60e226 --workdir /__w/cobalt/cobalt --network github_network_09b19a3178cb4d4cb0e470134b93709d -e "HOME=/github/home" -e GITHUB_ACTIONS=true -e CI=true -v "/var/run/docker.sock":"/var/run/docker.sock" -v "/runner/_work":"/__w" -v "/runner/externals":"/__e":ro -v "/runner/_work/_temp":"/__w/_temp" -v "/runner/_work/_actions":"/__w/_actions" -v "/opt/hostedtoolcache":"/__t" -v "/runner/_work/_temp/_github_home":"/github/home" -v "/runner/_work/_temp/_github_workflow":"/github/workflow" --entrypoint "tail" alpine:3.14 "-f" "/dev/null"
e4b70102da9157b4d03fee786560dd5dcf63294f02efc006d342631f982d9789

/usr/local/bin/docker start e4b70102da9157b4d03fee786560dd5dcf63294f02efc006d342631f982d9789
Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: unable to apply cgroup configuration: mkdir /sys/fs/cgroup/rdma/docker: permission denied: unknown
```

Oddly enough, this error only happens on new nodes that were autoscaled recently. A handful of nodes created last month still work fine, but every newly created node hits this issue. I tried creating a new cluster, but the problem persists.

What can be causing this permission error?

Node OS: Ubuntu 20.04.4 LTS with containerd
Node Kernel: 5.4.0-1065-gke
Kubelet Version: 1.21.10-gke.2000
Sysbox: 0.5.1 (also tried 0.4.1)
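
When comparing logs from working and failing nodes, it can help to pull out which cgroup controller the runtime was denied on. The following is a minimal sketch, assuming the error text has the same shape as the message above; the function name `failing_cgroup` and the regular expression are illustrative and not part of Docker, runc, or Sysbox.

```python
import re
from typing import Optional, Tuple

# Assumed message shape: "mkdir /sys/fs/cgroup/<controller>/<rest>: permission denied"
_CGROUP_MKDIR_DENIED = re.compile(
    r"mkdir (/sys/fs/cgroup/([^/\s]+)/[^\s:]+): permission denied"
)


def failing_cgroup(message: str) -> Optional[Tuple[str, str]]:
    """Return (controller, path) for a cgroup mkdir denial, or None if absent."""
    match = _CGROUP_MKDIR_DENIED.search(message)
    if match is None:
        return None
    return match.group(2), match.group(1)


if __name__ == "__main__":
    error = (
        "Error response from daemon: failed to create shim task: "
        "OCI runtime create failed: runc create failed: unable to start "
        "container process: unable to apply cgroup configuration: "
        "mkdir /sys/fs/cgroup/rdma/docker: permission denied: unknown"
    )
    print(failing_cgroup(error))
```

For the message in this report, the controller is `rdma` and the path is `/sys/fs/cgroup/rdma/docker`.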

@rodnymolina
Member

@isarkis, thanks for filing this one.

This is a duplicate of a recently fixed issue (refer to #544 for details). We are about to publish a new release (v0.5.2) that contains the fix.
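
For anyone scripting a check across nodes, here is a minimal sketch of comparing an installed Sysbox version against the release that carries the fix. It assumes plain `major.minor.patch` version strings with an optional leading `v`, and that the fix first ships in v0.5.2 as stated above; the helper names are hypothetical and not part of any Sysbox tooling.

```python
from typing import Tuple

# Assumption from this thread: the fix first ships in v0.5.2.
FIXED_IN = (0, 5, 2)


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '0.5.1' or 'v0.5.1' into a tuple of ints for comparison."""
    return tuple(int(part) for part in text.strip().lstrip("v").split("."))


def has_fix(version: str) -> bool:
    """True if the given version is at or above the release with the fix."""
    return parse_version(version) >= FIXED_IN


if __name__ == "__main__":
    # Versions mentioned in this issue, plus the fixed release.
    for version in ("0.4.1", "0.5.1", "v0.5.2"):
        print(version, has_fix(version))
```

Comparing integer tuples rather than raw strings avoids ordering mistakes such as treating `0.10.0` as older than `0.5.2`.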

@rodnymolina rodnymolina self-assigned this May 19, 2022
@rodnymolina rodnymolina added the duplicate This issue or pull request already exists label May 19, 2022
@rodnymolina rodnymolina added this to To do in Sysbox Dev via automation May 19, 2022
@rodnymolina
Member

Fixed in the latest release. The sysbox-deploy-k8s images will be updated tomorrow.

@isarkis, please let us know if you have any other questions. Closing the issue now.

Sysbox Dev automation moved this from To do to Done May 19, 2022
@isarkis
Author

isarkis commented May 20, 2022

@rodnymolina, thank you for getting back to me. I noticed that the 0.5.2 release has been published, but the sysbox-deploy-k8s daemonset still points to 0.5.1.

@rodnymolina
Member

rodnymolina commented May 20, 2022

That's right, @isarkis, I'm working on it (ETA: tomorrow morning if there are no surprises).

@rodnymolina
Member

@isarkis, the sysbox-deploy-k8s images have now been updated. Please let us know if you run into any issues.

@isarkis
Author

isarkis commented May 23, 2022

@rodnymolina, 0.5.2 is working as expected, thank you so much for your help!
