RunContainerError: "runContainer: operation timeout: context deadline exceeded" #39028
Comments
It seems the root cause is: …
Could you please check what happened to your EBS volume "vol-02ff0c2158d03e018"? By the way, EBS volumes require the AWS cloud provider; are you running on AWS?
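A minimal sketch of how one might check that volume with the AWS CLI (assuming credentials and region are already configured; the volume id is the one from the comment above):

```
aws ec2 describe-volumes --volume-ids vol-02ff0c2158d03e018 \
  --query 'Volumes[0].{State:State,Attachments:Attachments}'
```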
Sorry, forgot to mention: I am. The volume was attached (as seen by …).
Also, could you please provide the related kubelet logs? Is there anything suspicious in them?
I did, in the original post: lines 2 and 3 are the error, and I included lines 1, 4, and 5 for context.
The kubelet's default timeout for Docker operations is 2 minutes; if you are not running at very high density, it should be enough. So what's the status of this container in …?
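If Docker operations are legitimately slow on a node, that timeout can be raised; a sketch, assuming a systemd-managed kubelet (the drop-in path and environment variable are illustrative, but --runtime-request-timeout is a real kubelet flag, default 2m0s):

```
# Illustrative: raise the kubelet's runtime request timeout.
# How the flag is passed depends on how kubelet is launched on your nodes.
sudo tee /etc/systemd/system/kubelet.service.d/10-runtime-timeout.conf <<'EOF'
[Service]
Environment="KUBELET_EXTRA_ARGS=--runtime-request-timeout=5m"
EOF
sudo systemctl daemon-reload && sudo systemctl restart kubelet
```

Note that raising the timeout only hides the slow operation; it does not fix whatever is making the Docker call slow in the first place.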
Output for …:
And I could start the container with …
So we can make sure it is due to the EBS volume. I can see a very suspicious ERROR: …
Sadly not; we had to recreate the deployment and the volume, but I will check next time we run into this.
Hi, we have a very similar setup to the above (CoreOS, docker 1.12.6, k8s 1.5.2, a prometheus Pod with EBS) and are seeing the same issue for the prometheus Pod and any other pods subsequently scheduled on the same node. After a lot of digging, our hypothesis is that the prometheus process, upon starting, tries to go through all the data in the EBS volume (~24 GB for us), which maxes out the provisioned IOPS and thus throttles all I/O on that volume. The IOPS are definitely maxed out; we've confirmed this via the AWS CloudWatch metrics. Could the kubelet be making a call that hits a similar code path? Here are the relevant kubelet logs: …
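For anyone wanting to confirm the IOPS-exhaustion hypothesis themselves, a sketch of the CloudWatch query (volume id and time window are placeholders; VolumeReadOps and, for gp2 volumes, BurstBalance are also worth plotting):

```
aws cloudwatch get-metric-statistics \
  --namespace AWS/EBS --metric-name VolumeQueueLength \
  --dimensions Name=VolumeId,Value=vol-02ff0c2158d03e018 \
  --start-time 2017-03-24T00:00:00Z --end-time 2017-03-24T06:00:00Z \
  --period 300 --statistics Average
```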
We have the same problem with a cluster on Google GKE. The only solution I have found so far is to provision the node again, but this is becoming quite unacceptable, as we are using the cluster for continuous integration of several projects and there is no automation that can handle the Docker failure on a node.
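A sketch of that re-provisioning workaround on GKE (node and instance names are placeholders):

```
kubectl drain <node-name> --force --ignore-daemonsets   # evict pods off the bad node
gcloud compute instances delete <node-name>             # the managed instance group replaces it
```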
We are also hitting this issue after upgrading to v1.5.4 from v1.4.7. We noticed that in v1.5.4, kubelet adds a … Our running theory is that this causes libcontainer to try to recursively label all the files in the prometheus volume, which has a lot of files, so the container create request times out.
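A rough way to test that theory outside of Kubernetes, assuming an SELinux-enabled host (the volume path is a placeholder for the real prometheus data mount): the :Z suffix asks Docker to recursively relabel every file in the volume, which is the code path linked in the next comment.

```
# Time the recursive SELinux relabel of a large volume; on ~24 GB of
# prometheus data this could plausibly exceed the kubelet's 2-minute timeout.
time docker run --rm -v /data/prometheus:/data:Z busybox true
```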
cc @pmorie, author of 21116a7. Does my theory in the comment above make sense? The relevant docker code is https://github.com/docker/docker/blob/v1.12.6/volume/volume.go#L127-L131, https://github.com/docker/runc/blob/54296cf40ad8143b62dbcaa1d90e520a2136ddfe/libcontainer/label/label_selinux.go#L144, and https://github.com/docker/runc/blob/54296cf40ad8143b62dbcaa1d90e520a2136ddfe/libcontainer/selinux/selinux.go#L481-L483.
AFAIK the problem is broader: it also affects older Docker releases and is not limited to volumes. The error message is too generic to allow a complete analysis. Maybe k8s should do some sanity checks against Docker/containerd to understand the reliability of pod provisioning.
For me, restarting docker on the node is quicker than provisioning a new node @paolomainardi. I am seeing this mostly on the node running a prometheus container (kube-prometheus).
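That restart workaround, sketched for a CoreOS node (the hostname is a placeholder):

```
ssh core@<node> 'sudo systemctl restart docker'
```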
Restarting the node is not always the answer; sometimes Docker can get stuck because the disk is full or there are broken inodes.
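A quick sketch for checking those two failure modes on a node (the path assumes Docker's default data root):

```
df -h /var/lib/docker   # disk space
df -i /var/lib/docker   # inode exhaustion
```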
Issues go stale after 90d of inactivity. Prevent issues from auto-closing with a /remove-lifecycle stale comment. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
Stale issues rot after 30d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
Rotten issues close after 30d of inactivity. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
Original report:
Kubernetes version: …
OS Version: CoreOS stable (1185.5.0)
From the kubelet logs: …
docker ps: …
No logs on the container: …
docker inspect: …
kubectl describe pod: …
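The elided outputs above would have been collected with commands along these lines (pod and container names are placeholders):

```
kubectl describe pod <prometheus-pod>
docker ps -a | grep <pod-name>
docker inspect <container-id>
docker logs <container-id>
```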