
RunContainerError: "runContainer: operation timeout: context deadline exceeded" #39028

Closed
george-angel opened this issue Dec 20, 2016 · 18 comments
Labels
lifecycle/rotten, sig/node

Comments

@george-angel
Contributor

Kubernetes version:

Client Version: version.Info{Major:"1", Minor:"5", GitVersion:"v1.5.1", GitCommit:"82450d03cb057bab0950214ef122b67c83fb11df", GitTreeState:"clean", BuildDate:"2016-12-14T00:57:05Z", GoVersion:"go1.7.4", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"5", GitVersion:"v1.5.1+coreos.0", GitCommit:"cc65f5321f9230bf9a3fa171155c1213d6e3480e", GitTreeState:"clean", BuildDate:"2016-12-14T04:08:28Z", GoVersion:"go1.7.4", Compiler:"gc", Platform:"linux/amd64"}

OS Version: CoreOS stable (1185.5.0)

From the kubelet logs:

Dec 20 15:40:30 ip-10-66-21-87.eu-west-1.compute.internal kubelet-wrapper[1401]: I1220 15:40:30.077807    1401 operation_executor.go:917] MountVolume.SetUp succeeded for volume "kubernetes.io/secret/75207ba5-c6c6-11e6-8758-06099f2a458d-default-token-3nexu" (spec.Name: "default-token-3nexu") pod "75207ba5-c6c6-11e6-8758-06099f2a458d" (UID: "75207ba5-c6c6-11e6-8758-06099f2a458d").
Dec 20 15:40:31 ip-10-66-21-87.eu-west-1.compute.internal kubelet-wrapper[1401]: E1220 15:40:31.616962    1401 docker_manager.go:2324] container start failed: RunContainerError: runContainer: operation timeout: context deadline exceeded
Dec 20 15:40:31 ip-10-66-21-87.eu-west-1.compute.internal kubelet-wrapper[1401]: E1220 15:40:31.617011    1401 pod_workers.go:184] Error syncing pod 3d09fff2-c6c7-11e6-8758-06099f2a458d, skipping: failed to "StartContainer" for "prometheus" with RunContainerError: "runContainer: operation timeout: context deadline exceeded"
Dec 20 15:40:31 ip-10-66-21-87.eu-west-1.compute.internal kubelet-wrapper[1401]: I1220 15:40:31.880783    1401 operation_executor.go:917] MountVolume.SetUp succeeded for volume "kubernetes.io/secret/3d09fff2-c6c7-11e6-8758-06099f2a458d-default-token-0shxu" (spec.Name: "default-token-0shxu") pod "3d09fff2-c6c7-11e6-8758-06099f2a458d" (UID: "3d09fff2-c6c7-11e6-8758-06099f2a458d").
Dec 20 15:40:31 ip-10-66-21-87.eu-west-1.compute.internal kubelet-wrapper[1401]: I1220 15:40:31.881996    1401 operation_executor.go:917] MountVolume.SetUp succeeded for volume "kubernetes.io/configmap/3d09fff2-c6c7-11e6-8758-06099f2a458d-config-volume" (spec.Name: "config-volume") pod "3d09fff2-c6c7-11e6-8758-06099f2a458d" (UID: "3d09fff2-c6c7-11e6-8758-06099f2a458d").

docker ps:

8c2ea1120e4a        prom/prometheus:v1.2.1                                                        "/bin/prometheus -sto"   10 seconds ago       Created                                             k8s_prometheus.edf8dc49_prometheus-227300114-v7z1h_system_3d09fff2-c6c7-11e6-8758-06099f2a458d_af434bd8

No logs on the container:

core@ip-10-66-21-87 ~ $ docker logs 8c2ea1120e4a
core@ip-10-66-21-87 ~ $

docker inspect:

[
    {
        "Id": "8c2ea1120e4a8b41ca2a0c4ef9afca5d19d790c633e1b965434a0ec9fca9cd04",
        "Created": "2016-12-20T15:42:32.840639703Z",
        "Path": "/bin/prometheus",
        "Args": [
            "-storage.local.retention=336h",
            "-storage.local.memory-chunks=262144",
            "-config.file=/etc/prometheus/prometheus.yml",
            "-alertmanager.url=http://alertmanager:9093",
            "-web.external-url=https://prometheus.dev.uw.systems"
        ],
        "State": {
            "Status": "created",
            "Running": false,
            "Paused": false,
            "Restarting": false,
            "OOMKilled": false,
            "Dead": false,
            "Pid": 0,
            "ExitCode": 0,
            "Error": "",
            "StartedAt": "0001-01-01T00:00:00Z",
            "FinishedAt": "0001-01-01T00:00:00Z"
        },
        "Image": "sha256:18eb3e2914849e994e7a22ad84e1ce3c7d41a51eec2ce68584728f36181900bc",
        "ResolvConfPath": "",
        "HostnamePath": "",
        "HostsPath": "",
        "LogPath": "",
        "Name": "/k8s_prometheus.edf8dc49_prometheus-227300114-v7z1h_system_3d09fff2-c6c7-11e6-8758-06099f2a458d_af434bd8",
        "RestartCount": 0,
        "Driver": "overlay",
        "MountLabel": "system_u:object_r:svirt_lxc_file_t:s0:c241,c498",
        "ProcessLabel": "system_u:system_r:svirt_lxc_net_t:s0:c241,c498",
        "AppArmorProfile": "",
        "ExecIDs": null,
        "HostConfig": {
            "Binds": null,
            "ContainerIDFile": "",
            "LogConfig": {
                "Type": "json-file",
                "Config": {}
            },
            "NetworkMode": "",
            "PortBindings": null,
            "RestartPolicy": {
                "Name": "",
                "MaximumRetryCount": 0
            },
            "AutoRemove": false,
            "VolumeDriver": "",
            "VolumesFrom": null,
            "CapAdd": null,
            "CapDrop": null,
            "Dns": null,
            "DnsOptions": null,
            "DnsSearch": null,
            "ExtraHosts": null,
            "GroupAdd": null,
            "IpcMode": "",
            "Cgroup": "",
            "Links": null,
            "OomScoreAdj": 0,
            "PidMode": "",
            "Privileged": false,
            "PublishAllPorts": false,
            "ReadonlyRootfs": false,
            "SecurityOpt": null,
            "StorageOpt": null,
            "UTSMode": "",
            "UsernsMode": "",
            "ShmSize": 0,
            "ConsoleSize": [
                0,
                0
            ],
            "Isolation": "",
            "CpuShares": 0,
            "Memory": 0,
            "CgroupParent": "",
            "BlkioWeight": 0,
            "BlkioWeightDevice": null,
            "BlkioDeviceReadBps": null,
            "BlkioDeviceWriteBps": null,
            "BlkioDeviceReadIOps": null,
            "BlkioDeviceWriteIOps": null,
            "CpuPeriod": 0,
            "CpuQuota": 0,
            "CpusetCpus": "",
            "CpusetMems": "",
            "Devices": null,
            "DiskQuota": 0,
            "KernelMemory": 0,
            "MemoryReservation": 0,
            "MemorySwap": 0,
            "MemorySwappiness": null,
            "OomKillDisable": null,
            "PidsLimit": 0,
            "Ulimits": null,
            "CpuCount": 0,
            "CpuPercent": 0,
            "BlkioIOps": 0,
            "BlkioBps": 0,
            "SandboxSize": 0
        },
        "GraphDriver": {
            "Name": "overlay",
            "Data": {
                "LowerDir": "/var/lib/docker/overlay/a84216d9ef3e8348e9296b791fa044617d02ede93add2eaf4564effa5c31644d/root",
                "MergedDir": "/var/lib/docker/overlay/473a3409ce77a1846d1260b943d8861844c1b22a23cc32743587771efd2fcf68/merged",
                "UpperDir": "/var/lib/docker/overlay/473a3409ce77a1846d1260b943d8861844c1b22a23cc32743587771efd2fcf68/upper",
                "WorkDir": "/var/lib/docker/overlay/473a3409ce77a1846d1260b943d8861844c1b22a23cc32743587771efd2fcf68/work"
            }
        },
        "Mounts": [],
        "Config": {
            "Hostname": "8c2ea1120e4a",
            "Domainname": "",
            "User": "",
            "AttachStdin": false,
            "AttachStdout": false,
            "AttachStderr": false,
            "ExposedPorts": {
                "9090/tcp": {}
            },
            "Tty": false,
            "OpenStdin": false,
            "StdinOnce": false,
            "Env": [
                "PROMETHEUS_PORT_9090_TCP_PROTO=tcp",
                "ALERTMANAGER_PORT_9093_TCP=tcp://10.3.0.53:9093",
                "ALERTMANAGER_PORT_9093_TCP_PORT=9093",
                "GITLAB_PORT_80_TCP_ADDR=10.3.0.28",
                "EXTERNAL_INGRESS_PROXY_PORT_80_TCP_PORT=80",
                "PROM_PUSHGATEWAY_PORT_80_TCP_ADDR=10.3.0.251",
                "INTERNAL_INGRESS_PROXY_PORT_8080_TCP_PROTO=tcp",
                "PROMETHEUS_SERVICE_PORT=9090",
                "GITLAB_PORT_80_TCP_PROTO=tcp",
                "EXTERNAL_INGRESS_PROXY_SERVICE_PORT_HTTP_ADMIN=8080",
                "EXTERNAL_INGRESS_PROXY_PORT_8080_TCP_PROTO=tcp",
                "INTERNAL_INGRESS_PROXY_SERVICE_PORT_HTTP=80",
                "INTERNAL_INGRESS_PROXY_SERVICE_PORT_HTTP_ADMIN=8080",
                "GITHUB_SSHKEY_PROVIDER_PORT_80_TCP_PROTO=tcp",
                "PROM_PUSHGATEWAY_SERVICE_PORT_PROMETHEUS=80",
                "PROM_PUSHGATEWAY_PORT=tcp://10.3.0.251:80",
                "GITLAB_SERVICE_PORT_HTTPS=443",
                "GITLAB_PORT_443_TCP_PROTO=tcp",
                "GITHUB_SSHKEY_PROVIDER_PORT_80_TCP_PORT=80",
                "PROMETHEUS_SERVICE_HOST=10.3.0.110",
                "PROMETHEUS_PORT_9090_TCP_PORT=9090",
                "GITHUB_SSHKEY_PROVIDER_PORT_6379_TCP_PORT=6379",
                "PROMETHEUS_SERVICE_PORT_PROMETHEUS=9090",
                "PROMETHEUS_PORT_9090_TCP_ADDR=10.3.0.110",
                "GITHUB_SSHKEY_PROVIDER_PORT_80_TCP=tcp://10.3.0.44:80",
                "GITHUB_SSHKEY_PROVIDER_PORT_6379_TCP_PROTO=tcp",
                "BLACKBOX_EXPORTER_SERVICE_PORT_HTTP=80",
                "GRAFANA_PORT_3000_TCP=tcp://10.3.0.113:3000",
                "GITLAB_PORT_443_TCP=tcp://10.3.0.28:443",
                "GITLAB_PORT_22_TCP_PROTO=tcp",
                "PROM_PUSHGATEWAY_PORT_80_TCP_PORT=80",
                "GITLAB_SERVICE_HOST=10.3.0.28",
                "GITLAB_PORT_22_TCP=tcp://10.3.0.28:22",
                "GITHUB_SSHKEY_PROVIDER_SERVICE_PORT_REDIS=6379",
                "GRAFANA_SERVICE_HOST=10.3.0.113",
                "GITHUB_SSHKEY_PROVIDER_PORT_80_TCP_ADDR=10.3.0.44",
                "EXTERNAL_INGRESS_PROXY_PORT_8080_TCP_ADDR=10.3.0.161",
                "KUBERNETES_PORT_443_TCP_ADDR=10.3.0.1",
                "GITLAB_PORT_443_TCP_ADDR=10.3.0.28",
                "INTERNAL_INGRESS_PROXY_SERVICE_PORT=80",
                "INTERNAL_INGRESS_PROXY_PORT_8080_TCP=tcp://10.3.0.176:8080",
                "PROM_PUSHGATEWAY_SERVICE_HOST=10.3.0.251",
                "KUBERNETES_PORT_443_TCP_PORT=443",
                "KUBERNETES_SERVICE_PORT=443",
                "GITLAB_SERVICE_PORT=80",
                "BLACKBOX_EXPORTER_SERVICE_PORT=80",
                "GRAFANA_PORT_3000_TCP_ADDR=10.3.0.113",
                "GITHUB_SSHKEY_PROVIDER_PORT_6379_TCP=tcp://10.3.0.44:6379",
                "EXTERNAL_INGRESS_PROXY_SERVICE_HOST=10.3.0.161",
                "EXTERNAL_INGRESS_PROXY_SERVICE_PORT_HTTP=80",
                "KUBERNETES_PORT_443_TCP=tcp://10.3.0.1:443",
                "GITLAB_PORT_22_TCP_PORT=22",
                "ALERTMANAGER_PORT_9093_TCP_PROTO=tcp",
                "INTERNAL_INGRESS_PROXY_PORT_8080_TCP_PORT=8080",
                "PROMETHEUS_PORT_9090_TCP=tcp://10.3.0.110:9090",
                "KUBERNETES_PORT_443_TCP_PROTO=tcp",
                "EXTERNAL_INGRESS_PROXY_PORT_80_TCP=tcp://10.3.0.161:80",
                "ALERTMANAGER_SERVICE_HOST=10.3.0.53",
                "BLACKBOX_EXPORTER_PORT_80_TCP_PORT=80",
                "PROM_PUSHGATEWAY_SERVICE_PORT=80",
                "PROMETHEUS_PORT=tcp://10.3.0.110:9090",
                "KUBERNETES_SERVICE_HOST=10.3.0.1",
                "EXTERNAL_INGRESS_PROXY_SERVICE_PORT=80",
                "EXTERNAL_INGRESS_PROXY_PORT=tcp://10.3.0.161:80",
                "GRAFANA_PORT_3000_TCP_PORT=3000",
                "PROM_PUSHGATEWAY_PORT_80_TCP=tcp://10.3.0.251:80",
                "INTERNAL_INGRESS_PROXY_PORT_80_TCP_PORT=80",
                "INTERNAL_INGRESS_PROXY_PORT_8080_TCP_ADDR=10.3.0.176",
                "PROM_PUSHGATEWAY_PORT_80_TCP_PROTO=tcp",
                "BLACKBOX_EXPORTER_SERVICE_HOST=10.3.0.3",
                "BLACKBOX_EXPORTER_PORT_80_TCP=tcp://10.3.0.3:80",
                "ALERTMANAGER_PORT=tcp://10.3.0.53:9093",
                "GRAFANA_PORT=tcp://10.3.0.113:3000",
                "INTERNAL_INGRESS_PROXY_PORT=tcp://10.3.0.176:80",
                "EXTERNAL_INGRESS_PROXY_PORT_80_TCP_PROTO=tcp",
                "EXTERNAL_INGRESS_PROXY_PORT_8080_TCP_PORT=8080",
                "EXTERNAL_INGRESS_PROXY_PORT_80_TCP_ADDR=10.3.0.161",
                "GRAFANA_SERVICE_PORT=3000",
                "ALERTMANAGER_PORT_9093_TCP_ADDR=10.3.0.53",
                "GITLAB_PORT_443_TCP_PORT=443",
                "GITLAB_PORT_22_TCP_ADDR=10.3.0.28",
                "GITHUB_SSHKEY_PROVIDER_PORT=tcp://10.3.0.44:80",
                "INTERNAL_INGRESS_PROXY_SERVICE_HOST=10.3.0.176",
                "KUBERNETES_SERVICE_PORT_HTTPS=443",
                "KUBERNETES_PORT=tcp://10.3.0.1:443",
                "BLACKBOX_EXPORTER_PORT=tcp://10.3.0.3:80",
                "ALERTMANAGER_SERVICE_PORT_ALERTMANAGER=9093",
                "GITLAB_SERVICE_PORT_SSH=22",
                "GITLAB_PORT=tcp://10.3.0.28:80",
                "GITHUB_SSHKEY_PROVIDER_SERVICE_HOST=10.3.0.44",
                "GITHUB_SSHKEY_PROVIDER_PORT_6379_TCP_ADDR=10.3.0.44",
                "INTERNAL_INGRESS_PROXY_PORT_80_TCP=tcp://10.3.0.176:80",
                "GITLAB_PORT_80_TCP_PORT=80",
                "GITHUB_SSHKEY_PROVIDER_SERVICE_PORT=80",
                "EXTERNAL_INGRESS_PROXY_PORT_8080_TCP=tcp://10.3.0.161:8080",
                "GRAFANA_PORT_3000_TCP_PROTO=tcp",
                "INTERNAL_INGRESS_PROXY_PORT_80_TCP_ADDR=10.3.0.176",
                "INTERNAL_INGRESS_PROXY_PORT_80_TCP_PROTO=tcp",
                "GITLAB_SERVICE_PORT_HTTP=80",
                "GITLAB_PORT_80_TCP=tcp://10.3.0.28:80",
                "BLACKBOX_EXPORTER_PORT_80_TCP_PROTO=tcp",
                "BLACKBOX_EXPORTER_PORT_80_TCP_ADDR=10.3.0.3",
                "ALERTMANAGER_SERVICE_PORT=9093",
                "GITHUB_SSHKEY_PROVIDER_SERVICE_PORT_WEB=80",
                "GRAFANA_SERVICE_PORT_GRAFANA=3000",
                "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
            ],
            "Cmd": [
                "-storage.local.retention=336h",
                "-storage.local.memory-chunks=262144",
                "-config.file=/etc/prometheus/prometheus.yml",
                "-alertmanager.url=http://alertmanager:9093",
                "-web.external-url=https://prometheus.dev.uw.systems"
            ],
            "Image": "prom/prometheus:v1.2.1",
            "Volumes": {
                "/prometheus": {}
            },
            "WorkingDir": "/prometheus",
            "Entrypoint": [
                "/bin/prometheus"
            ],
            "OnBuild": null,
            "Labels": {
                "io.kubernetes.container.hash": "edf8dc49",
                "io.kubernetes.container.name": "prometheus",
                "io.kubernetes.container.ports": "[{\"name\":\"web\",\"containerPort\":9090,\"protocol\":\"TCP\"}]",
                "io.kubernetes.container.restartCount": "11",
                "io.kubernetes.container.terminationMessagePath": "/dev/termination-log",
                "io.kubernetes.pod.name": "prometheus-227300114-v7z1h",
                "io.kubernetes.pod.namespace": "system",
                "io.kubernetes.pod.terminationGracePeriod": "30",
                "io.kubernetes.pod.uid": "3d09fff2-c6c7-11e6-8758-06099f2a458d"
            }
        },
        "NetworkSettings": {
            "Bridge": "",
            "SandboxID": "",
            "HairpinMode": false,
            "LinkLocalIPv6Address": "",
            "LinkLocalIPv6PrefixLen": 0,
            "Ports": null,
            "SandboxKey": "",
            "SecondaryIPAddresses": null,
            "SecondaryIPv6Addresses": null,
            "EndpointID": "",
            "Gateway": "",
            "GlobalIPv6Address": "",
            "GlobalIPv6PrefixLen": 0,
            "IPAddress": "",
            "IPPrefixLen": 0,
            "IPv6Gateway": "",
            "MacAddress": "",
            "Networks": null
        }
    }
]

kubectl describe pod:

Name:		prometheus-227300114-v7z1h
Namespace:	system
Node:		ip-10-66-21-87.eu-west-1.compute.internal/10.66.21.87
Start Time:	Tue, 20 Dec 2016 15:16:10 +0000
Labels:		app=prometheus
		pod-template-hash=227300114
Status:		Pending
IP:		10.2.56.5
Controllers:	ReplicaSet/prometheus-227300114
Containers:
  prometheus:
    Container ID:	docker://8c2ea1120e4a8b41ca2a0c4ef9afca5d19d790c633e1b965434a0ec9fca9cd04
    Image:		prom/prometheus:v1.2.1
    Image ID:		docker://sha256:18eb3e2914849e994e7a22ad84e1ce3c7d41a51eec2ce68584728f36181900bc
    Port:		9090/TCP
    Args:
      -storage.local.retention=336h
      -storage.local.memory-chunks=262144
      -config.file=/etc/prometheus/prometheus.yml
      -alertmanager.url=http://alertmanager:9093
      -web.external-url=https://prometheus.dev.uw.systems
    Limits:
      cpu:	1500m
      memory:	6000Mi
    Requests:
      cpu:		250m
      memory:		2000Mi
    State:		Waiting
      Reason:		RunContainerError
    Ready:		False
    Restart Count:	11
    Volume Mounts:
      /etc/kubernetes/ssl from ssl (ro)
      /etc/prometheus from config-volume (rw)
      /prometheus from data (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-0shxu (ro)
    Environment Variables:	<none>
Conditions:
  Type		Status
  Initialized 	True 
  Ready 	False 
  PodScheduled 	True 
Volumes:
  config-volume:
    Type:	ConfigMap (a volume populated by a ConfigMap)
    Name:	prometheus
  data:
    Type:	PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:	prometheus-ebs-pvc
    ReadOnly:	false
  ssl:
    Type:	HostPath (bare host directory volume)
    Path:	/etc/kubernetes/ssl
  default-token-0shxu:
    Type:	Secret (a volume populated by a Secret)
    SecretName:	default-token-0shxu
QoS Class:	Burstable
Tolerations:	<none>
Events:
  FirstSeen	LastSeen	Count	From			SubObjectPath	Type		Reason		Message
  ---------	--------	-----	----			-------------	--------	------		-------
  29m		29m		1	{default-scheduler }			Normal		Scheduled	Successfully assigned prometheus-227300114-v7z1h to ip-10-66-21-87.eu-west-1.compute.internal
  29m		27m		8	{controller-manager }			Warning		FailedMount	Failed to attach volume "pvc-64b14071-c5db-11e6-8758-06099f2a458d" on node "ip-10-66-21-87.eu-west-1.compute.internal" with: Error attaching EBS volume "vol-02ff0c2158d03e018" to instance "i-01fb7ff93bad20e38": VolumeInUse: vol-02ff0c2158d03e018 is already attached to an instance
		status code: 400, request id: 
  27m		27m	1	{kubelet ip-10-66-21-87.eu-west-1.compute.internal}					Warning	FailedMount	Unable to mount volumes for pod "prometheus-227300114-v7z1h_system(3d09fff2-c6c7-11e6-8758-06099f2a458d)": timeout expired waiting for volumes to attach/mount for pod "prometheus-227300114-v7z1h"/"system". list of unattached/unmounted volumes=[data]
  27m		27m	1	{kubelet ip-10-66-21-87.eu-west-1.compute.internal}					Warning	FailedSync	Error syncing pod, skipping: timeout expired waiting for volumes to attach/mount for pod "prometheus-227300114-v7z1h"/"system". list of unattached/unmounted volumes=[data]
  24m		24m	1	{kubelet ip-10-66-21-87.eu-west-1.compute.internal}	spec.containers{prometheus}	Normal	Pulling		pulling image "prom/prometheus:v1.2.1"
  24m		24m	1	{kubelet ip-10-66-21-87.eu-west-1.compute.internal}	spec.containers{prometheus}	Normal	Pulled		Successfully pulled image "prom/prometheus:v1.2.1"
  22m		43s	12	{kubelet ip-10-66-21-87.eu-west-1.compute.internal}	spec.containers{prometheus}	Warning	Failed		Failed to create docker container "prometheus" of pod "prometheus-227300114-v7z1h_system(3d09fff2-c6c7-11e6-8758-06099f2a458d)" with error: operation timeout: context deadline exceeded
  22m		43s	12	{kubelet ip-10-66-21-87.eu-west-1.compute.internal}					Warning	FailedSync	Error syncing pod, skipping: failed to "StartContainer" for "prometheus" with RunContainerError: "runContainer: operation timeout: context deadline exceeded"

  22m	42s	12	{kubelet ip-10-66-21-87.eu-west-1.compute.internal}	spec.containers{prometheus}	Normal	Pulled	Container image "prom/prometheus:v1.2.1" already present on machine
@resouer
Contributor

resouer commented Dec 21, 2016

It seems the root cause is:

Failed to attach volume "pvc-64b14071-c5db-11e6-8758-06099f2a458d" on node "ip-10-66-21-87.eu-west-1.compute.internal" with: Error attaching EBS volume "vol-02ff0c2158d03e018" to instance "i-01fb7ff93bad20e38": VolumeInUse: vol-02ff0c2158d03e018 is already attached to an instance

Could you please check what is happening with your EBS volume "vol-02ff0c2158d03e018"?

By the way, EBS volumes require the AWS cloud provider; are you using AWS?
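
If it helps, something along these lines should show the volume's current state and attachment (an untested sketch; it assumes the AWS CLI is configured with access to the account, and the volume ID is the one from the event above):

aws ec2 describe-volumes --volume-ids vol-02ff0c2158d03e018 \
  --query 'Volumes[0].{State:State,Attachments:Attachments}' \
  --output json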

@george-angel
Contributor Author

Sorry, I forgot to mention: I am. The volume was attached (as seen by lsblk) and mounted: xvdbb 202:13568 0 400G 0 disk /var/lib/kubelet/pods/5d12738d-c6d1-11e6-8758-06099f2a458d/volumes/kubernetes.io~aws-ebs/pvc-5d05b2bc-c6d1-11e6-a4c9-0a677903e9ed (that's a different volume now, we had to recreate it).

@resouer
Contributor

resouer commented Dec 21, 2016

Could you please provide the related kubelet logs? Is there anything suspicious in them?

@george-angel
Contributor Author

george-angel commented Dec 21, 2016

I did, from the original post:

Dec 20 15:40:30 ip-10-66-21-87.eu-west-1.compute.internal kubelet-wrapper[1401]: I1220 15:40:30.077807    1401 operation_executor.go:917] MountVolume.SetUp succeeded for volume "kubernetes.io/secret/75207ba5-c6c6-11e6-8758-06099f2a458d-default-token-3nexu" (spec.Name: "default-token-3nexu") pod "75207ba5-c6c6-11e6-8758-06099f2a458d" (UID: "75207ba5-c6c6-11e6-8758-06099f2a458d").
Dec 20 15:40:31 ip-10-66-21-87.eu-west-1.compute.internal kubelet-wrapper[1401]: E1220 15:40:31.616962    1401 docker_manager.go:2324] container start failed: RunContainerError: runContainer: operation timeout: context deadline exceeded
Dec 20 15:40:31 ip-10-66-21-87.eu-west-1.compute.internal kubelet-wrapper[1401]: E1220 15:40:31.617011    1401 pod_workers.go:184] Error syncing pod 3d09fff2-c6c7-11e6-8758-06099f2a458d, skipping: failed to "StartContainer" for "prometheus" with RunContainerError: "runContainer: operation timeout: context deadline exceeded"
Dec 20 15:40:31 ip-10-66-21-87.eu-west-1.compute.internal kubelet-wrapper[1401]: I1220 15:40:31.880783    1401 operation_executor.go:917] MountVolume.SetUp succeeded for volume "kubernetes.io/secret/3d09fff2-c6c7-11e6-8758-06099f2a458d-default-token-0shxu" (spec.Name: "default-token-0shxu") pod "3d09fff2-c6c7-11e6-8758-06099f2a458d" (UID: "3d09fff2-c6c7-11e6-8758-06099f2a458d").
Dec 20 15:40:31 ip-10-66-21-87.eu-west-1.compute.internal kubelet-wrapper[1401]: I1220 15:40:31.881996    1401 operation_executor.go:917] MountVolume.SetUp succeeded for volume "kubernetes.io/configmap/3d09fff2-c6c7-11e6-8758-06099f2a458d-config-volume" (spec.Name: "config-volume") pod "3d09fff2-c6c7-11e6-8758-06099f2a458d" (UID: "3d09fff2-c6c7-11e6-8758-06099f2a458d").

Lines 2 and 3 are the error; I included lines 1, 4 and 5 for context.

@resouer
Contributor

resouer commented Dec 21, 2016

The kubelet's default timeout for Docker operations is 2 minutes; unless you are running at very high density, that should be enough.

So what is the status of this container in docker ps? Can you confirm that this prometheus container can run with a simple docker run?
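
For example, something like the following would be a minimal smoke test (image tag taken from the report above; note it runs without the data volume, so it only checks that the container itself can start):

docker run --rm -p 9090:9090 prom/prometheus:v1.2.1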

@george-angel
Contributor Author

Output for docker ps is in the original post:

8c2ea1120e4a        prom/prometheus:v1.2.1                                                        "/bin/prometheus -sto"   10 seconds ago       Created                                             k8s_prometheus.edf8dc49_prometheus-227300114-v7z1h_system_3d09fff2-c6c7-11e6-8758-06099f2a458d_af434bd8

And I could start the container with docker run, but it started up without its data volume.

@resouer
Contributor

resouer commented Dec 22, 2016

So we can be fairly sure it is due to the EBS volume.

I can see a very suspicious error: VolumeInUse: vol-02ff0c2158d03e018 is already attached to an instance. Can you check whether your EBS volume is attached to a different instance than the intended node?
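
A rough way to compare, run from the node itself (the instance ID comes from the EC2 metadata service; the volume query again assumes the AWS CLI is available):

curl -s http://169.254.169.254/latest/meta-data/instance-id
aws ec2 describe-volumes --volume-ids vol-02ff0c2158d03e018 \
  --query 'Volumes[0].Attachments[].InstanceId' --output text

If the two IDs differ, the volume is stuck on another instance and needs to be detached first.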

@george-angel
Contributor Author

Sadly not; we had to recreate the deployment and the volume, but I will check next time we run into this.

@dimpavloff
Contributor

dimpavloff commented Mar 7, 2017

Hi, we have a very similar setup to the one above (CoreOS, docker 1.12.6, k8s 1.5.2, a prometheus Pod with EBS) and are seeing the same issue for the prometheus Pod and for any other pods subsequently scheduled on the same node.

After a lot of digging, our hypothesis is that the prometheus process, upon starting, tries to go through all the data in the EBS volume (~24 GB for us), which maxes out the provisioned IOPS and thus throttles all IO on that volume. The IOPS are definitely maxed out; we've confirmed this via the AWS CloudWatch metrics.
We observed a correlated extreme slowdown of docker ps (it takes 5 minutes to return in our case; calling docker ps again immediately after the slow call returns is instantaneous). On the other hand, docker inspect remains responsive throughout the slowdown and reports the container as actually running.

Could the kubelet be making a call which hits a code path similar to docker ps? The worst part is that this affects the entire node whenever it happens: any other pods that get scheduled on it also get stuck in ContainerCreating, presumably because the docker runtime is completely unresponsive. Across the many attempts to start the prometheus pod, it only ever succeeded once. In other words, this was happening in a loop on the node, with backoff-period breaks in between, and whenever it was happening it was breaking every other Pod scheduled on the node.
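
For reference, a sketch of the kind of checks involved (commands are illustrative rather than exactly what we ran; the volume ID and time window are from the incident in the logs below, and the CloudWatch query assumes the AWS CLI):

# time the call that appears to hang, then call it again right after it returns
time docker ps

# read ops on the EBS volume over the incident window
aws cloudwatch get-metric-statistics --namespace AWS/EBS \
  --metric-name VolumeReadOps \
  --dimensions Name=VolumeId,Value=vol-003f7a49b6edab531 \
  --start-time 2017-03-07T15:00:00Z --end-time 2017-03-07T16:00:00Z \
  --period 300 --statistics Sum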

Here are the relevant kubelet logs:

Mar 07 15:11:36 kubelet-wrapper[1558]: I0307 15:11:36.010489    1558 reconciler.go:230] VerifyControllerAttachedVolume operation started for volume "kubernetes.io/configmap/c87dfaea-0347-11e7-9c33-024bab486553-config-volume" (spec.Name: "config-volume") pod "c87dfaea-0347-11e7-9c33-024bab486553" (UID: "c87dfaea-0347-11e7-9c33-024bab486553")
Mar 07 15:11:36 kubelet-wrapper[1558]: I0307 15:11:36.110789    1558 reconciler.go:230] VerifyControllerAttachedVolume operation started for volume "kubernetes.io/aws-ebs/aws://eu-west-1c/vol-003f7a49b6edab531" (spec.Name: "pvc-250f9066-fcdd-11e6-84dd-025aa3c7f579") pod "c87dfaea-0347-11e7-9c33-024bab486553" (UID: "c87dfaea-0347-11e7-9c33-024bab486553")
Mar 07 15:11:36 kubelet-wrapper[1558]: I0307 15:11:36.110859    1558 reconciler.go:230] VerifyControllerAttachedVolume operation started for volume "kubernetes.io/secret/c87dfaea-0347-11e7-9c33-024bab486553-default-token-z32z7" (spec.Name: "default-token-z32z7") pod "c87dfaea-0347-11e7-9c33-024bab486553" (UID: "c87dfaea-0347-11e7-9c33-024bab486553")
Mar 07 15:11:36 kubelet-wrapper[1558]: E0307 15:11:36.110955    1558 nestedpendingoperations.go:262] Operation for "\"kubernetes.io/aws-ebs/aws://eu-west-1c/vol-003f7a49b6edab531\"" failed. No retries permitted until 2017-03-07 15:11:36.610928208 +0000 UTC (durationBeforeRetry 500ms). Error: Volume "kubernetes.io/aws-ebs/aws://eu-west-1c/vol-003f7a49b6edab531" (spec.Name: "pvc-250f9066-fcdd-11e6-84dd-025aa3c7f579") pod "c87dfaea-0347-11e7-9c33-024bab486553" (UID: "c87dfaea-0347-11e7-9c33-024bab486553") has not yet been added to the list of VolumesInUse in the node's volume status.
Mar 07 15:11:36 kubelet-wrapper[1558]: I0307 15:11:36.115416    1558 operation_executor.go:917] MountVolume.SetUp succeeded for volume "kubernetes.io/configmap/c87dfaea-0347-11e7-9c33-024bab486553-config-volume" (spec.Name: "config-volume") pod "c87dfaea-0347-11e7-9c33-024bab486553" (UID: "c87dfaea-0347-11e7-9c33-024bab486553").
Mar 07 15:11:36 kubelet-wrapper[1558]: I0307 15:11:36.218029    1558 operation_executor.go:917] MountVolume.SetUp succeeded for volume "kubernetes.io/secret/c87dfaea-0347-11e7-9c33-024bab486553-default-token-z32z7" (spec.Name: "default-token-z32z7") pod "c87dfaea-0347-11e7-9c33-024bab486553" (UID: "c87dfaea-0347-11e7-9c33-024bab486553").
Mar 07 15:11:36 kubelet-wrapper[1558]: I0307 15:11:36.612390    1558 reconciler.go:230] VerifyControllerAttachedVolume operation started for volume "kubernetes.io/aws-ebs/aws://eu-west-1c/vol-003f7a49b6edab531" (spec.Name: "pvc-250f9066-fcdd-11e6-84dd-025aa3c7f579") pod "c87dfaea-0347-11e7-9c33-024bab486553" (UID: "c87dfaea-0347-11e7-9c33-024bab486553")
Mar 07 15:11:36 kubelet-wrapper[1558]: E0307 15:11:36.612501    1558 nestedpendingoperations.go:262] Operation for "\"kubernetes.io/aws-ebs/aws://eu-west-1c/vol-003f7a49b6edab531\"" failed. No retries permitted until 2017-03-07 15:11:37.612473326 +0000 UTC (durationBeforeRetry 1s). Error: Volume "kubernetes.io/aws-ebs/aws://eu-west-1c/vol-003f7a49b6edab531" (spec.Name: "pvc-250f9066-fcdd-11e6-84dd-025aa3c7f579") pod "c87dfaea-0347-11e7-9c33-024bab486553" (UID: "c87dfaea-0347-11e7-9c33-024bab486553") has not yet been added to the list of VolumesInUse in the node's volume status.
Mar 07 15:11:36 kubelet-wrapper[1558]: 2017/03/07 15:11:36 Error retriving last reserved ip: Failed to retrieve last reserved ip: open /var/lib/cni/networks/podnet/last_reserved_ip: no such file or directory
Mar 07 15:11:37 kubelet-wrapper[1558]: I0307 15:11:37.614844    1558 reconciler.go:230] VerifyControllerAttachedVolume operation started for volume "kubernetes.io/aws-ebs/aws://eu-west-1c/vol-003f7a49b6edab531" (spec.Name: "pvc-250f9066-fcdd-11e6-84dd-025aa3c7f579") pod "c87dfaea-0347-11e7-9c33-024bab486553" (UID: "c87dfaea-0347-11e7-9c33-024bab486553")
Mar 07 15:11:37 kubelet-wrapper[1558]: E0307 15:11:37.614955    1558 nestedpendingoperations.go:262] Operation for "\"kubernetes.io/aws-ebs/aws://eu-west-1c/vol-003f7a49b6edab531\"" failed. No retries permitted until 2017-03-07 15:11:39.61492792 +0000 UTC (durationBeforeRetry 2s). Error: Volume "kubernetes.io/aws-ebs/aws://eu-west-1c/vol-003f7a49b6edab531" (spec.Name: "pvc-250f9066-fcdd-11e6-84dd-025aa3c7f579") pod "c87dfaea-0347-11e7-9c33-024bab486553" (UID: "c87dfaea-0347-11e7-9c33-024bab486553") has not yet been added to the list of VolumesInUse in the node's volume status.
Mar 07 15:11:39 kubelet-wrapper[1558]: I0307 15:11:39.641143    1558 reconciler.go:230] VerifyControllerAttachedVolume operation started for volume "kubernetes.io/aws-ebs/aws://eu-west-1c/vol-003f7a49b6edab531" (spec.Name: "pvc-250f9066-fcdd-11e6-84dd-025aa3c7f579") pod "c87dfaea-0347-11e7-9c33-024bab486553" (UID: "c87dfaea-0347-11e7-9c33-024bab486553")
Mar 07 15:11:39 kubelet-wrapper[1558]: E0307 15:11:39.644808    1558 nestedpendingoperations.go:262] Operation for "\"kubernetes.io/aws-ebs/aws://eu-west-1c/vol-003f7a49b6edab531\"" failed. No retries permitted until 2017-03-07 15:11:43.644786251 +0000 UTC (durationBeforeRetry 4s). Error: Volume "kubernetes.io/aws-ebs/aws://eu-west-1c/vol-003f7a49b6edab531" (spec.Name: "pvc-250f9066-fcdd-11e6-84dd-025aa3c7f579") pod "c87dfaea-0347-11e7-9c33-024bab486553" (UID: "c87dfaea-0347-11e7-9c33-024bab486553") is not yet attached according to node status.
Mar 07 15:11:43 kubelet-wrapper[1558]: I0307 15:11:43.650793    1558 reconciler.go:230] VerifyControllerAttachedVolume operation started for volume "kubernetes.io/aws-ebs/aws://eu-west-1c/vol-003f7a49b6edab531" (spec.Name: "pvc-250f9066-fcdd-11e6-84dd-025aa3c7f579") pod "c87dfaea-0347-11e7-9c33-024bab486553" (UID: "c87dfaea-0347-11e7-9c33-024bab486553")
Mar 07 15:11:43 kubelet-wrapper[1558]: E0307 15:11:43.654648    1558 nestedpendingoperations.go:262] Operation for "\"kubernetes.io/aws-ebs/aws://eu-west-1c/vol-003f7a49b6edab531\"" failed. No retries permitted until 2017-03-07 15:11:51.65462126 +0000 UTC (durationBeforeRetry 8s). Error: Volume "kubernetes.io/aws-ebs/aws://eu-west-1c/vol-003f7a49b6edab531" (spec.Name: "pvc-250f9066-fcdd-11e6-84dd-025aa3c7f579") pod "c87dfaea-0347-11e7-9c33-024bab486553" (UID: "c87dfaea-0347-11e7-9c33-024bab486553") is not yet attached according to node status.
Mar 07 15:11:51 kubelet-wrapper[1558]: I0307 15:11:51.671185    1558 reconciler.go:230] VerifyControllerAttachedVolume operation started for volume "kubernetes.io/aws-ebs/aws://eu-west-1c/vol-003f7a49b6edab531" (spec.Name: "pvc-250f9066-fcdd-11e6-84dd-025aa3c7f579") pod "c87dfaea-0347-11e7-9c33-024bab486553" (UID: "c87dfaea-0347-11e7-9c33-024bab486553")
Mar 07 15:11:51 kubelet-wrapper[1558]: I0307 15:11:51.674794    1558 operation_executor.go:1219] Controller successfully attached volume "kubernetes.io/aws-ebs/aws://eu-west-1c/vol-003f7a49b6edab531" (spec.Name: "pvc-250f9066-fcdd-11e6-84dd-025aa3c7f579") pod "c87dfaea-0347-11e7-9c33-024bab486553" (UID: "c87dfaea-0347-11e7-9c33-024bab486553") devicePath: "/dev/xvdba"
Mar 07 15:11:51 kubelet-wrapper[1558]: I0307 15:11:51.771526    1558 operation_executor.go:812] Entering MountVolume.WaitForAttach for volume "kubernetes.io/aws-ebs/aws://eu-west-1c/vol-003f7a49b6edab531" (spec.Name: "pvc-250f9066-fcdd-11e6-84dd-025aa3c7f579") pod "c87dfaea-0347-11e7-9c33-024bab486553" (UID: "c87dfaea-0347-11e7-9c33-024bab486553") DevicePath: "/dev/xvdba"
Mar 07 15:11:52 kubelet-wrapper[1558]: I0307 15:11:52.771792    1558 attacher.go:148] Successfully found attached AWS Volume "aws://eu-west-1c/vol-003f7a49b6edab531".
Mar 07 15:11:52 kubelet-wrapper[1558]: I0307 15:11:52.771830    1558 operation_executor.go:832] MountVolume.WaitForAttach succeeded for volume "kubernetes.io/aws-ebs/aws://eu-west-1c/vol-003f7a49b6edab531" (spec.Name: "pvc-250f9066-fcdd-11e6-84dd-025aa3c7f579") pod "c87dfaea-0347-11e7-9c33-024bab486553" (UID: "c87dfaea-0347-11e7-9c33-024bab486553").
Mar 07 15:11:52 kubelet-wrapper[1558]: I0307 15:11:52.815703    1558 operation_executor.go:871] MountVolume.MountDevice succeeded for volume "kubernetes.io/aws-ebs/aws://eu-west-1c/vol-003f7a49b6edab531" (spec.Name: "pvc-250f9066-fcdd-11e6-84dd-025aa3c7f579") pod "c87dfaea-0347-11e7-9c33-024bab486553" (UID: "c87dfaea-0347-11e7-9c33-024bab486553") device mount path "/var/lib/kubelet/plugins/kubernetes.io/aws-ebs/mounts/aws/eu-west-1c/vol-003f7a49b6edab531"
Mar 07 15:11:52 kubelet-wrapper[1558]: I0307 15:11:52.829247    1558 operation_executor.go:917] MountVolume.SetUp succeeded for volume "kubernetes.io/aws-ebs/aws://eu-west-1c/vol-003f7a49b6edab531" (spec.Name: "pvc-250f9066-fcdd-11e6-84dd-025aa3c7f579") pod "c87dfaea-0347-11e7-9c33-024bab486553" (UID: "c87dfaea-0347-11e7-9c33-024bab486553").
Mar 07 15:11:53 kubelet-wrapper[1558]: I0307 15:11:53.335414    1558 docker_manager.go:2236] Determined pod ip after infra change: "prometheus-core-2400881639-968kr_monitoring(c87dfaea-0347-11e7-9c33-024bab486553)": "10.150.79.4"
Mar 07 15:11:58 kubelet-wrapper[1558]: W0307 15:11:58.910551    1558 conversion.go:110] Could not get instant cpu stats: different number of cpus
Mar 07 15:12:16 kubelet-wrapper[1558]: I0307 15:12:16.136584    1558 operation_executor.go:917] MountVolume.SetUp succeeded for volume "kubernetes.io/secret/56b943a3-0348-11e7-9c33-024bab486553-default-token-z32z7" (spec.Name: "default-token-z32z7") pod "56b943a3-0348-11e7-9c33-024bab486553" (UID: "56b943a3-0348-11e7-9c33-024bab486553").
Mar 07 15:12:17 kubelet-wrapper[1558]: I0307 15:12:17.139057    1558 operation_executor.go:917] MountVolume.SetUp succeeded for volume "kubernetes.io/secret/56b943a3-0348-11e7-9c33-024bab486553-default-token-z32z7" (spec.Name: "default-token-z32z7") pod "56b943a3-0348-11e7-9c33-024bab486553" (UID: "56b943a3-0348-11e7-9c33-024bab486553").
Mar 07 15:14:35 kubelet-wrapper[1558]: E0307 15:14:35.456471    1558 docker_manager.go:2324] container start failed: RunContainerError: runContainer: operation timeout: context deadline exceeded
Mar 07 15:14:35 kubelet-wrapper[1558]: E0307 15:14:35.456710    1558 pod_workers.go:184] Error syncing pod c87dfaea-0347-11e7-9c33-024bab486553, skipping: failed to "StartContainer" for "prometheus" with RunContainerError: "runContainer: operation timeout: context deadline exceeded"
Mar 07 15:14:36 kubelet-wrapper[1558]: E0307 15:14:36.291290    1558 generic.go:197] GenericPLEG: Unable to retrieve pods: operation timeout: context deadline exceeded
Mar 07 15:14:36 kubelet-wrapper[1558]: E0307 15:14:36.373902    1558 kubelet_pods.go:710] Error listing containers: dockertools.operationTimeout{err:context.deadlineExceededError{}}
Mar 07 15:14:36 kubelet-wrapper[1558]: E0307 15:14:36.373967    1558 kubelet.go:1860] Failed cleaning pods: operation timeout: context deadline exceeded
Mar 07 15:15:28 kubelet-wrapper[1558]: E0307 15:15:28.380811    1558 kubelet.go:1128] Container garbage collection failed: operation timeout: context deadline exceeded
Mar 07 15:16:28 kubelet-wrapper[1558]: E0307 15:16:28.447975    1558 container_manager_linux.go:625] error opening pid file /run/docker/libcontainerd/docker-containerd.pid: open /run/docker/libcontainerd/docker-containerd.pid: no such file or directory
Mar 07 15:16:36 kubelet-wrapper[1558]: E0307 15:16:36.374219    1558 kubelet_pods.go:710] Error listing containers: dockertools.operationTimeout{err:context.deadlineExceededError{}}
Mar 07 15:16:36 kubelet-wrapper[1558]: E0307 15:16:36.374260    1558 kubelet.go:1860] Failed cleaning pods: operation timeout: context deadline exceeded
Mar 07 15:16:36 kubelet-wrapper[1558]: I0307 15:16:36.405815    1558 operation_executor.go:917] MountVolume.SetUp succeeded for volume "kubernetes.io/secret/56b943a3-0348-11e7-9c33-024bab486553-default-token-z32z7" (spec.Name: "default-token-z32z7") pod "56b943a3-0348-11e7-9c33-024bab486553" (UID: "56b943a3-0348-11e7-9c33-024bab486553").
Mar 07 15:16:37 kubelet-wrapper[1558]: E0307 15:16:37.291702    1558 generic.go:197] GenericPLEG: Unable to retrieve pods: operation timeout: context deadline exceeded
Mar 07 15:18:28 kubelet-wrapper[1558]: W0307 15:18:28.374679    1558 image_gc_manager.go:164] [imageGCManager] Failed to monitor images: operation timeout: context deadline exceeded
Mar 07 15:18:28 kubelet-wrapper[1558]: E0307 15:18:28.381241    1558 kubelet.go:1128] Container garbage collection failed: operation timeout: context deadline exceeded
Mar 07 15:18:36 kubelet-wrapper[1558]: E0307 15:18:36.374724    1558 kubelet_pods.go:710] Error listing containers: dockertools.operationTimeout{err:context.deadlineExceededError{}}
Mar 07 15:18:36 kubelet-wrapper[1558]: E0307 15:18:36.374846    1558 kubelet.go:1860] Failed cleaning pods: operation timeout: context deadline exceeded
Mar 07 15:18:38 kubelet-wrapper[1558]: E0307 15:18:38.292173    1558 generic.go:197] GenericPLEG: Unable to retrieve pods: operation timeout: context deadline exceeded
Mar 07 15:20:29 kubelet-wrapper[1558]: E0307 15:20:29.701967    1558 kubelet_network.go:278] Failed to ensure marking rule for KUBE-MARK-DROP: error checking rule: exit status 4: iptables: Resource temporarily unavailable.
Mar 07 15:20:36 kubelet-wrapper[1558]: E0307 15:20:36.375206    1558 kubelet_pods.go:710] Error listing containers: dockertools.operationTimeout{err:context.deadlineExceededError{}}
Mar 07 15:20:36 kubelet-wrapper[1558]: E0307 15:20:36.375256    1558 kubelet.go:1860] Failed cleaning pods: operation timeout: context deadline exceeded
Mar 07 15:20:39 kubelet-wrapper[1558]: E0307 15:20:39.292772    1558 generic.go:197] GenericPLEG: Unable to retrieve pods: operation timeout: context deadline exceeded
Mar 07 15:21:28 kubelet-wrapper[1558]: E0307 15:21:28.381654    1558 kubelet.go:1128] Container garbage collection failed: operation timeout: context deadline exceeded
Mar 07 15:21:28 kubelet-wrapper[1558]: E0307 15:21:28.474488    1558 container_manager_linux.go:625] error opening pid file /run/docker/libcontainerd/docker-containerd.pid: open /run/docker/libcontainerd/docker-containerd.pid: no such file or directory
Mar 07 15:22:36 kubelet-wrapper[1558]: E0307 15:22:36.375676    1558 kubelet_pods.go:710] Error listing containers: dockertools.operationTimeout{err:context.deadlineExceededError{}}
Mar 07 15:22:36 kubelet-wrapper[1558]: E0307 15:22:36.375731    1558 kubelet.go:1860] Failed cleaning pods: operation timeout: context deadline exceeded
Mar 07 15:22:40 kubelet-wrapper[1558]: E0307 15:22:40.293299    1558 generic.go:197] GenericPLEG: Unable to retrieve pods: operation timeout: context deadline exceeded
Mar 07 15:24:28 kubelet-wrapper[1558]: E0307 15:24:28.382215    1558 kubelet.go:1128] Container garbage collection failed: operation timeout: context deadline exceeded
Mar 07 15:24:36 kubelet-wrapper[1558]: E0307 15:24:36.376020    1558 kubelet_pods.go:710] Error listing containers: dockertools.operationTimeout{err:context.deadlineExceededError{}}
Mar 07 15:24:36 kubelet-wrapper[1558]: E0307 15:24:36.376055    1558 kubelet.go:1860] Failed cleaning pods: operation timeout: context deadline exceeded
Mar 07 15:24:41 kubelet-wrapper[1558]: E0307 15:24:41.293879    1558 generic.go:197] GenericPLEG: Unable to retrieve pods: operation timeout: context deadline exceeded
Mar 07 15:25:28 kubelet-wrapper[1558]: W0307 15:25:28.378311    1558 image_gc_manager.go:164] [imageGCManager] Failed to monitor images: operation timeout: context deadline exceeded
Mar 07 15:26:28 kubelet-wrapper[1558]: E0307 15:26:28.507476    1558 container_manager_linux.go:625] error opening pid file /run/docker/libcontainerd/docker-containerd.pid: open /run/docker/libcontainerd/docker-containerd.pid: no such file or directory
Mar 07 15:26:36 kubelet-wrapper[1558]: E0307 15:26:36.376392    1558 kubelet_pods.go:710] Error listing containers: dockertools.operationTimeout{err:context.deadlineExceededError{}}
Mar 07 15:26:36 kubelet-wrapper[1558]: E0307 15:26:36.376439    1558 kubelet.go:1860] Failed cleaning pods: operation timeout: context deadline exceeded
Mar 07 15:26:42 kubelet-wrapper[1558]: E0307 15:26:42.294340    1558 generic.go:197] GenericPLEG: Unable to retrieve pods: operation timeout: context deadline exceeded
Mar 07 15:27:28 kubelet-wrapper[1558]: E0307 15:27:28.382669    1558 kubelet.go:1128] Container garbage collection failed: operation timeout: context deadline exceeded
Mar 07 15:28:36 kubelet-wrapper[1558]: E0307 15:28:36.376683    1558 kubelet_pods.go:710] Error listing containers: dockertools.operationTimeout{err:context.deadlineExceededError{}}
Mar 07 15:28:36 kubelet-wrapper[1558]: E0307 15:28:36.376721    1558 kubelet.go:1860] Failed cleaning pods: operation timeout: context deadline exceeded
Mar 07 15:28:43 kubelet-wrapper[1558]: E0307 15:28:43.294790    1558 generic.go:197] GenericPLEG: Unable to retrieve pods: operation timeout: context deadline exceeded
Mar 07 15:30:28 kubelet-wrapper[1558]: E0307 15:30:28.383069    1558 kubelet.go:1128] Container garbage collection failed: operation timeout: context deadline exceeded
Mar 07 15:30:36 kubelet-wrapper[1558]: E0307 15:30:36.377225    1558 kubelet_pods.go:710] Error listing containers: dockertools.operationTimeout{err:context.deadlineExceededError{}}
Mar 07 15:30:36 kubelet-wrapper[1558]: E0307 15:30:36.377262    1558 kubelet.go:1860] Failed cleaning pods: operation timeout: context deadline exceeded
Mar 07 15:30:44 kubelet-wrapper[1558]: E0307 15:30:44.295303    1558 generic.go:197] GenericPLEG: Unable to retrieve pods: operation timeout: context deadline exceeded
Mar 07 15:31:28 kubelet-wrapper[1558]: E0307 15:31:28.541567    1558 container_manager_linux.go:625] error opening pid file /run/docker/libcontainerd/docker-containerd.pid: open /run/docker/libcontainerd/docker-containerd.pid: no such file or directory

@paolomainardi

We have the same problem with a cluster on Google GKE. The only solution I have found so far is to re-provision the node, but that is becoming quite unacceptable, as we are using the cluster for continuous integration of several projects and there is no automation that can handle the Docker failure on a node.
@dimpavloff have you found any solutions?
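
For anyone else hitting this, here is a hedged sketch of what manually recycling a node looks like (the node name is illustrative, and on GKE the underlying VM also has to be deleted so the managed instance group recreates it; this is a generic recipe, not necessarily exactly what we run):

kubectl drain gke-ci-pool-node-1 --ignore-daemonsets --force
kubectl delete node gke-ci-pool-node-1
# then delete the backing VM so the instance group / node pool replaces it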

@guoshimin
Contributor

We are also hitting this issue after upgrading to v1.5.4 from v1.4.7. We noticed that in v1.5.4, kubelet adds a :Z to each bind mount spec if SELinux is enabled on the system. We patched kubelet to disable this behavior and the problem went away.

Our running theory is that this causes libcontainer to try to recursively label all the files in the prometheus volume, which has a lot of files, and the container create request times out.
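
A rough way to test that theory on an affected node (assumes SELinux is enforcing; the host path is the EBS device mount path seen earlier in this thread, so adjust it for your volume):

getenforce
# the :Z suffix is what triggers the recursive relabel; timing a throwaway
# container mounted over the data directory shows how long the relabel takes
time docker run --rm \
  -v /var/lib/kubelet/plugins/kubernetes.io/aws-ebs/mounts/aws/eu-west-1c/vol-003f7a49b6edab531:/data:Z \
  busybox true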

@paolomainardi

paolomainardi commented Mar 24, 2017 via email

@dmcnaught

For me, restarting docker on the node is quicker than provisioning a new node @paolomainardi.
I am seeing this mostly on the node running a prometheus container (kube-prometheus).
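
On CoreOS that is just the following; the kubelet should reconnect to the daemon on its own once it is back up:

sudo systemctl restart docker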

@paolomainardi

paolomainardi commented Mar 24, 2017 via email

@k8s-github-robot added the needs-sig label May 31, 2017
@derekwaynecarr added the sig/node label and removed the needs-sig label Jun 5, 2017
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

Prevent issues from auto-closing with an /lifecycle frozen comment.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label Dec 26, 2017
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle rotten
/remove-lifecycle stale

@k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label Jan 25, 2018
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close
