RunContainerError: "runContainer: operation timeout: context deadline exceeded" #39028
Comments
It seems the root cause is:
Could you please check what happens to your EBS volume "vol-02ff0c2158d03e018"? By the way, EBS volumes require the AWS cloud provider; are you using AWS?
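For readers who need to do the same check, a small sketch of inspecting the volume from the AWS CLI instead of the console (the volume ID is the one quoted above; a configured region/profile is assumed):

```sh
# Show the volume's state and its attachment (instance, device, attach state)
aws ec2 describe-volumes --volume-ids vol-02ff0c2158d03e018 \
  --query 'Volumes[0].{State:State,Attachments:Attachments}' --output table

# Show recent EBS volume status checks (ok / warning / impaired)
aws ec2 describe-volume-status --volume-ids vol-02ff0c2158d03e018
```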
george-angel
commented
Dec 21, 2016
Sorry, forgot to mention: I am on AWS, and the volume was attached (as seen by
And could you please provide the related kubelet logs? Is there anything suspicious in them?
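A sketch of how to pull those logs on a CoreOS node, assuming the kubelet runs as a systemd unit named kubelet; the pod-name filter is a placeholder:

```sh
# Recent kubelet logs, filtered for the affected pod
journalctl -u kubelet --since "1 hour ago" --no-pager | grep -i "<pod-name>"
```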
george-angel
commented
Dec 21, 2016
I did, from the original post:
Lines 2 and 3 are the error; I included lines 1, 4 and 5 for context.
The default timeout for kubelet's Docker operations is 2 minutes; unless you are running at very high density, that should be enough. So what's the status of this container in
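As a hedged aside on that 2-minute figure: newer kubelets expose a --runtime-request-timeout flag (default 2m0s) for runtime operations; in the kubelet versions discussed here the Docker operation timeout may instead be hardcoded. A sketch of checking a systemd-managed node:

```sh
# See whether the node's kubelet unit already sets a runtime request timeout
systemctl cat kubelet | grep -n runtime-request-timeout

# If supported, the flag looks like this; where to add it depends on how the
# kubelet command line is assembled (systemd drop-in, cloud-config, etc.):
#   --runtime-request-timeout=4m
```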
george-angel
commented
Dec 21, 2016
Output for
And I could start the container with
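For anyone answering the "what's the status of this container" question on their own node, a sketch (the container name and ID are placeholders):

```sh
# Include exited and created containers, then inspect the state of the suspect one
docker ps -a --filter "name=<container-name>"
docker inspect -f '{{.State.Status}} {{.State.Error}}' <container-id>
```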
So we can make sure it is due to the EBS volume. I can see a very suspicious ERROR:
george-angel
commented
Dec 22, 2016
Sadly not, we had to recreate the deployment and the volume, but I will check next time we run into this.
Hi, we have a very similar setup to the one above (CoreOS, Docker 1.12.6, Kubernetes 1.5.2, a Prometheus pod with an EBS volume) and are seeing the same issue for the Prometheus pod and any other pods subsequently scheduled on the same node. After a lot of digging, our hypothesis is that the Prometheus process, upon starting, tries to go through all the data in the EBS volume (~24 GB for us), which maxes out the provisioned IOPS and thus throttles any IO on that volume. The IOPS are definitely maxed out; we've confirmed this via the AWS CloudWatch metrics. Could the kubelet be making a call which hits a code path similar to
Here are the relevant kubelet logs:
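For anyone who wants to verify the throttling hypothesis the same way, a sketch of pulling the relevant CloudWatch metrics from the CLI (the volume ID and time window are placeholders; BurstBalance only exists for gp2 volumes):

```sh
# Per-5-minute read ops against the volume; compare against its provisioned IOPS
aws cloudwatch get-metric-statistics \
  --namespace AWS/EBS --metric-name VolumeReadOps \
  --dimensions Name=VolumeId,Value=vol-xxxxxxxx \
  --start-time 2017-02-01T00:00:00Z --end-time 2017-02-01T06:00:00Z \
  --period 300 --statistics Sum

# For gp2 volumes, a BurstBalance near 0 means the volume is being throttled
aws cloudwatch get-metric-statistics \
  --namespace AWS/EBS --metric-name BurstBalance \
  --dimensions Name=VolumeId,Value=vol-xxxxxxxx \
  --start-time 2017-02-01T00:00:00Z --end-time 2017-02-01T06:00:00Z \
  --period 300 --statistics Average
```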
dimpavloff referenced this issue on Mar 7, 2017: ContainerGCFailed / ImageGCFailed context deadline exceeded #42164 (open)
paolomainardi
commented
Mar 15, 2017
We have the same problems with a cluster on Google GKE. The only solution I have found so far is to re-provision the node, but that is becoming quite unacceptable: we are using the cluster for continuous integration of several projects, and there is no automation that can handle the Docker failure on a node.
We are also hitting this issue after upgrading to v1.5.4 from v1.4.7. We noticed that in v1.5.4, kubelet adds a SELinux label to the container's volume mounts. Our running theory is that this causes libcontainer to try to recursively label all the files in the Prometheus volume, which has a lot of files, and so the container create request times out.
cc @pmorie, author of 21116a7. Does my theory in the comment above make sense? The relevant docker code is
https://github.com/docker/docker/blob/v1.12.6/volume/volume.go#L127-L131,
https://github.com/docker/runc/blob/54296cf40ad8143b62dbcaa1d90e520a2136ddfe/libcontainer/label/label_selinux.go#L144,
and
https://github.com/docker/runc/blob/54296cf40ad8143b62dbcaa1d90e520a2136ddfe/libcontainer/selinux/selinux.go#L481-L483
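To gauge how much work that recursive relabel implies on a given volume, a rough sketch (the mount path under /var/lib/kubelet is illustrative; each file costs an individual setxattr call, so the file count approximates what docker must finish before the create call returns):

```sh
# Count the files docker would have to relabel on the EBS-backed volume
MOUNT=/var/lib/kubelet/pods/<pod-uid>/volumes/kubernetes.io~aws-ebs/<volume-name>
find "$MOUNT" -xdev | wc -l
```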
paolomainardi
commented
Mar 24, 2017
AFAIK the problem is broader: it also affects older Docker releases and is not limited to volumes. I think the error message is too generic to do a complete analysis.
Maybe k8s should do some sanity checks against Docker/containerd to understand how reliable pod provisioning is.
dmcnaught
commented
Mar 24, 2017
For me, restarting docker on the node is quicker than provisioning a new node @paolomainardi
paolomainardi
commented
Mar 24, 2017
Restarting the node is not always the answer; sometimes Docker can get stuck because the disk is full or there are broken inodes.
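A quick triage sketch for a node in this state, tying the two comments above together (generic commands; restarting Docker may disrupt other containers on the node):

```sh
# Rule out full disks or exhausted inodes first
df -h /var/lib/docker
df -i /var/lib/docker

# If the filesystem is healthy, restarting Docker is usually faster than
# re-provisioning the node
sudo systemctl restart docker
```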
This was referenced Mar 27, 2017
k8s-merge-robot added the needs-sig label on May 31, 2017
derekwaynecarr added the sig/node label and removed the needs-sig label on Jun 5, 2017
dimpavloff referenced this issue on Jul 5, 2017: Cannot retrieve logs of running pod via kubectl #45911 (open)
fejta-bot
commented
Dec 26, 2017
Issues go stale after 90d of inactivity. Prevent issues from auto-closing with an /lifecycle frozen comment. If this issue is safe to close now, please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
george-angel commented Dec 20, 2016
Kubernetes version:
OS Version: CoreOS stable (1185.5.0)
From the kubelet logs:
docker ps:
No logs on the container:
docker inspect:
kubectl describe pod: