
Missing container metrics in kubelet (cAdvisor) in v1.5.1 #39812

Closed
ichekrygin opened this issue Jan 12, 2017 · 44 comments
Labels
area/cadvisor, area/os/coreos, lifecycle/rotten, sig/node

Comments

@ichekrygin

ichekrygin commented Jan 12, 2017

Is this a request for help? (If yes, you should use our troubleshooting guide and community support channels, see http://kubernetes.io/docs/troubleshooting/.): No, this looks like a regression in v1.5.1

What keywords did you search in Kubernetes issues before filing this one? (If you have found any duplicates, you should instead reply there.): kubelet, metrics, cAdvisor


Is this a BUG REPORT or FEATURE REQUEST? (choose one): BUG REPORT

Kubernetes version (use kubectl version):

Client Version: version.Info{Major:"1", Minor:"5", GitVersion:"v1.5.1", GitCommit:"82450d03cb057bab0950214ef122b67c83fb11df", GitTreeState:"clean", BuildDate:"2016-12-14T00:57:05Z", GoVersion:"go1.7.4", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"5", GitVersion:"v1.5.1", GitCommit:"82450d03cb057bab0950214ef122b67c83fb11df", GitTreeState:"clean", BuildDate:"2016-12-14T00:52:01Z", GoVersion:"go1.7.4", Compiler:"gc", Platform:"linux/amd64"}

Environment:

  • Cloud provider or hardware configuration: aws
  • OS (e.g. from /etc/os-release): os_image="Container Linux by CoreOS 1235.5.0 (Ladybug)"
  • Kernel (e.g. uname -a): Linux ip-10-72-161-5 4.7.3-coreos-r2 #1 SMP Sun Jan 8 00:32:25 UTC 2017 x86_64 Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz GenuineIntel GNU/Linux
  • Install tools:
  • Others:

What happened: after upgrading to v1.5.1, cAdvisor does not show any subcontainers, so ALL container system metrics are missing. For example:

core@ip-10-72-161-5 ~ $ curl localhost:10255/metrics | grep container_cpu_user_seconds_total
# HELP container_cpu_user_seconds_total Cumulative user cpu time consumed in seconds.
# TYPE container_cpu_user_seconds_total counter
container_cpu_user_seconds_total{id="/"} 0

What you expected to happen: cAdvisor returns subcontainers and container metrics, like so:

core@ip-10-72-6-143 ~ $ curl localhost:10255/metrics | grep container_cpu_user_seconds_total | more
# HELP container_cpu_user_seconds_total Cumulative user cpu time consumed in seconds.
# TYPE container_cpu_user_seconds_total counter
container_cpu_user_seconds_total{id="/"} 2.74665981e+06
container_cpu_user_seconds_total{id="/docker"} 2.33384234e+06
container_cpu_user_seconds_total{id="/init.scope"} 6.26

How to reproduce it (as minimally and precisely as possible):
upgrade the kubelet to v1.5.1 and check metrics:
via the metrics endpoint: curl localhost:10255/metrics
or via the cAdvisor UI: http://10.72.20.134:4194/containers/

Anything else we need to know:

docker version:

docker version
Client:
 Version:      1.12.3
 API version:  1.24
 Go version:   go1.6.3
 Git commit:   34a2ead
 Built:        
 OS/Arch:      linux/amd64

Server:
 Version:      1.12.3
 API version:  1.24
 Go version:   go1.6.3
 Git commit:   34a2ead
 Built:        
 OS/Arch:      linux/amd64
@pires
Contributor

pires commented Jan 14, 2017

Seems to be fixed in 1.5.2.

core@node-01 ~ $ curl 172.17.8.102:10255/metrics | grep container_cpu_user_seconds_total | more
# HELP container_cpu_user_seconds_total Cumulative user cpu time consumed in seconds.
# TYPE container_cpu_user_seconds_total counter
container_cpu_user_seconds_total{id="/"} 281.72
container_cpu_user_seconds_total{id="/docker"} 241.4
container_cpu_user_seconds_total{id="/init.scope"} 0.45
container_cpu_user_seconds_total{id="/system.slice"} 38.68
container_cpu_user_seconds_total{id="/system.slice/audit-rules.service"} 0
container_cpu_user_seconds_total{id="/system.slice/containerd.service"} 0.68
container_cpu_user_seconds_total{id="/system.slice/coreos-setup-environment.service"} 0
container_cpu_user_seconds_total{id="/system.slice/dbus.service"} 0.2
container_cpu_user_seconds_total{id="/system.slice/docker.service"} 29.61
container_cpu_user_seconds_total{id="/system.slice/etcd2.service"} 3.2
container_cpu_user_seconds_total{id="/system.slice/flanneld.service"} 1.62

@jakexks
Contributor

jakexks commented Jan 26, 2017

It's not fixed in 1.5.2. In my experience it works after restarting the kubelet, but it eventually gets back into a state where it stops reporting container metrics.

Exactly the same setup as the original poster:

$ uname -a
Linux qa1-worker0 4.7.3-coreos-r2 #1 SMP Sun Jan 8 00:32:25 UTC 2017 x86_64 Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz GenuineIntel GNU/Linux

Container Linux by CoreOS 1235.5.0 (Ladybug)

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"5", GitVersion:"v1.5.2", GitCommit:"08e099554f3c31f6e6f07b448ab3ed78d0520507", GitTreeState:"clean", BuildDate:"2017-01-12T04:52:34Z", GoVersion:"go1.7.4", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"5", GitVersion:"v1.5.2", GitCommit:"08e099554f3c31f6e6f07b448ab3ed78d0520507", GitTreeState:"clean", BuildDate:"2017-01-12T04:52:34Z", GoVersion:"go1.7.4", Compiler:"gc", Platform:"linux/amd64"}
$ curl -s qa1-worker0:10255/metrics | grep container_cpu_user_seconds_total
# HELP container_cpu_user_seconds_total Cumulative user cpu time consumed in seconds.
# TYPE container_cpu_user_seconds_total counter
container_cpu_user_seconds_total{id="/"} 0
$ docker version
Client:
 Version:      1.12.3
 API version:  1.24
 Go version:   go1.6.3
 Git commit:   34a2ead
 Built:        
 OS/Arch:      linux/amd64

Server:
 Version:      1.12.3
 API version:  1.24
 Go version:   go1.6.3
 Git commit:   34a2ead
 Built:        
 OS/Arch:      linux/amd64

@piosz
Member

piosz commented Jan 27, 2017

cc @dchen1107 @dashpole @timstclair

@piosz
Member

piosz commented Jan 27, 2017

@kubernetes/sig-node-bugs

@piosz
Member

piosz commented Jan 27, 2017

@ichekrygin @jakexks does the kubelet summary endpoint have this information? To verify, you can use:

curl -s http://localhost:10255/stats/summary
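For a quick pass/fail check, a hedged one-liner (assuming jq is installed and the read-only port 10255 is enabled):

# Count the pods reported by the kubelet summary API; 0 on a node that is
# actually running pods indicates the missing-stats condition.
curl -s http://localhost:10255/stats/summary | jq '.pods | length'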

@dashpole
Contributor

We have had some issues in the past with stats disappearing on certain CoreOS distributions:
#32304, #30939, cadvisor#1344
cc @euank (CoreOS), any ideas?

@dashpole
Contributor

@justinsb, is #30939 fixed for aws?

@jakexks
Contributor

jakexks commented Jan 30, 2017

@piosz There's nothing in the kubelet summary either:

$ curl -s http://localhost:10255/stats/summary
{
  "node": {
   "nodeName": "qa1-worker0",
   "startTime": null,
   "memory": {
    "time": "2017-01-30T11:21:23Z",
    "availableBytes": 16831172608,
    "usageBytes": 0,
    "workingSetBytes": 0,
    "rssBytes": 0,
    "pageFaults": 0,
    "majorPageFaults": 0
   },
   "fs": {
    "availableBytes": 4938715136,
    "capacityBytes": 6350921728,
    "usedBytes": 1064087552,
    "inodesFree": 1627641,
    "inodes": 1628800,
    "inodesUsed": 1159
   },
   "runtime": {
    "imageFs": {
     "availableBytes": 13845848064,
     "capacityBytes": 21003628544,
     "usedBytes": 1793178765,
     "inodesFree": 771906,
     "inodes": 1310720,
     "inodesUsed": 538814
    }
   }
  },
  "pods": []
 }

(There are many pods running on the node)

@dadux

dadux commented Feb 2, 2017

Same issue with latest CoreOS stable. Metrics seem to disappear after a few hours.

$ curl -sk  https://localhost:10250/metrics | head
# HELP cadvisor_version_info A metric with a constant '1' value labeled by kernel version, OS version, docker version, cadvisor version & cadvisor revision.
# TYPE cadvisor_version_info gauge
cadvisor_version_info{cadvisorRevision="",cadvisorVersion="",dockerVersion="1.12.6",kernelVersion="4.7.3-coreos-r2",osVersion="Container Linux by CoreOS 1235.8.0 (Ladybug)"} 1
# HELP container_cpu_system_seconds_total Cumulative system cpu time consumed in seconds.
# TYPE container_cpu_system_seconds_total counter
container_cpu_system_seconds_total{id="/"} 0
# HELP container_cpu_user_seconds_total Cumulative user cpu time consumed in seconds.
# TYPE container_cpu_user_seconds_total counter
container_cpu_user_seconds_total{id="/"} 0

I think #33192 might be related?

@ichekrygin
Author

@dadux yes, it is related to #33192.

@ichekrygin
Author

@piosz I think this issue impacts HPA, since with missing metrics HPA reports somewhat incorrect CPU usage. @jakexks, did you notice anything of this kind?

@piosz
Member

piosz commented Feb 3, 2017

@philk

philk commented Feb 3, 2017

It absolutely affects HPA (it stops HPAs with pods on broken nodes from scaling in either direction). That's how we initially discovered this issue on our systems.

@ichekrygin
Author

@philk just curious, how are you coping with it? I am at the point of setting up a cron job that restarts kube-kubelet.service every 12 hours.

@philk

philk commented Feb 4, 2017

@ichekrygin just a simple script that runs every 5 minutes: it checks curl -s localhost:10255/stats/summary | jq -Mr '.pods | any', and if that returns false it runs systemctl restart kubelet.
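A minimal sketch of that check-and-restart step, under the same assumptions (read-only port 10255 enabled, jq installed); scheduling via cron or a systemd timer is left out:

#!/bin/sh
# If the kubelet summary API reports no pods, assume stats collection is
# wedged and bounce the kubelet (the node should normally be running pods).
if [ "$(curl -s localhost:10255/stats/summary | jq -Mr '.pods | any')" != "true" ]; then
    systemctl restart kubelet
fi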

@justinsb
Member

justinsb commented Feb 5, 2017

@dashpole I believe it is fixed. I closed that issue (I'm sure someone will comment if not), but I have been unable to reproduce (so far) on our 1.5 AWS images.

@ichekrygin
Author

@justinsb Do you mean 1.5 AWS images that are after 1.5.2?
@jakexks has a repro on 1.5.2.

@ichekrygin
Author

This is what I see in kube-kubelet.service log when cAdvisor loses container metrics:

Feb 07 13:50:15 ip-10-72-161-195 kubelet[2401]: W0207 13:50:15.144290    2401 raw.go:87] Error while processing event ("/var/lib/rkt/pods/run/cd3ace64-d3de-4cb4-88ea-140752d3b570/stage1/rootfs/opt/stage2/flannel/rootfs/sys/fs/cgroup/cpu,cpuacct/system.slice/var-lib-docker-overlay-d7ec9fb3aaa33f4865b82c9682b4fd80461751043b4e8ec31a3561d08a72f1a4-merged.mount": 0x40000100 == IN_CREATE|IN_ISDIR): open /var/lib/rkt/pods/run/cd3ace64-d3de-4cb4-88ea-140752d3b570/stage1/rootfs/opt/stage2/flannel/rootfs/sys/fs/cgroup/cpu,cpuacct/system.slice/var-lib-docker-overlay-d7ec9fb3aaa33f4865b82c9682b4fd80461751043b4e8ec31a3561d08a72f1a4-merged.mount: no such file or directory
Feb 07 13:50:15 ip-10-72-161-195 kubelet[2401]: W0207 13:50:15.144765    2401 raw.go:87] Error while processing event ("/var/lib/rkt/pods/run/cd3ace64-d3de-4cb4-88ea-140752d3b570/stage1/rootfs/opt/stage2/flannel/rootfs/sys/fs/cgroup/blkio/system.slice/var-lib-docker-overlay-d7ec9fb3aaa33f4865b82c9682b4fd80461751043b4e8ec31a3561d08a72f1a4-merged.mount": 0x40000100 == IN_CREATE|IN_ISDIR): inotify_add_watch /var/lib/rkt/pods/run/cd3ace64-d3de-4cb4-88ea-140752d3b570/stage1/rootfs/opt/stage2/flannel/rootfs/sys/fs/cgroup/blkio/system.slice/var-lib-docker-overlay-d7ec9fb3aaa33f4865b82c9682b4fd80461751043b4e8ec31a3561d08a72f1a4-merged.mount: no such file or directory
Feb 07 13:50:15 ip-10-72-161-195 kubelet[2401]: W0207 13:50:15.144816    2401 raw.go:87] Error while processing event ("/var/lib/rkt/pods/run/cd3ace64-d3de-4cb4-88ea-140752d3b570/stage1/rootfs/opt/stage2/flannel/rootfs/sys/fs/cgroup/memory/system.slice/var-lib-docker-overlay-d7ec9fb3aaa33f4865b82c9682b4fd80461751043b4e8ec31a3561d08a72f1a4-merged.mount": 0x40000100 == IN_CREATE|IN_ISDIR): inotify_add_watch /var/lib/rkt/pods/run/cd3ace64-d3de-4cb4-88ea-140752d3b570/stage1/rootfs/opt/stage2/flannel/rootfs/sys/fs/cgroup/memory/system.slice/var-lib-docker-overlay-d7ec9fb3aaa33f4865b82c9682b4fd80461751043b4e8ec31a3561d08a72f1a4-merged.mount: no such file or directory

@fiksn

fiksn commented Feb 23, 2017

I have the same problem on CoreOS 1235.9.0 with Kubernetes 1.5.3.

@philk's workaround works like a charm, but the problem is that I don't want to keep restarting the kubelet on a (temporarily) cordoned host, where the pods section is legitimately empty. So I tried something like this:

kubelet-periodic-check.service

[Unit]
Description=Kubelet health check
Documentation=https://github.com/kubernetes/kubernetes/issues/33192 https://github.com/kubernetes/kubernetes/issues/39812

[Service]
Type=oneshot
ExecStart=/bin/sh -c "curl --connect-timeout 5 --max-time 10 http://127.0.0.1:4194/containers/system.slice/etcd2.service 2>/dev/null | grep -q failed && systemctl restart kubelet"

kubelet-periodic-check.timer

[Unit]
Description=Kubelet health check cron
Documentation=https://github.com/kubernetes/kubernetes/issues/33192 https://github.com/kubernetes/kubernetes/issues/39812

[Timer]
OnBootSec=13min
OnUnitActiveSec=13m
Unit=kubelet-periodic-check.service

[Install]
WantedBy=timers.target
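A usage sketch for the two units above, assuming they are installed under /etc/systemd/system:

sudo systemctl daemon-reload
sudo systemctl enable --now kubelet-periodic-check.timer
# Confirm the timer is scheduled and when it last ran.
systemctl list-timers kubelet-periodic-check.timer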

I (ab)use the fact that when the problem happens http://127.0.0.1:4194/containers/system.slice/etcd2.service returns:

failed to get container "/system.slice/etcd2.service" with error: unknown container "/system.slice/etcd2.service"

and etcd2 should always be running on my nodes. Perhaps this will help somebody. Hopefully this issue gets resolved properly soon, though.

@ajaybhande

This issue seems to affect older versions as well. Refer to #33192.

@andrewsykim
Member

I'm seeing this issue as well, and it's blocking HPA from doing any scaling in my production clusters. I would like to avoid hacks such as periodically restarting the kubelet if possible. Is anyone currently working on a fix for this? If not, I don't mind digging into it a bit; if someone can point me in the right direction for where to start, that would be great. It seems like the cAdvisor code in the kubelet would be a good place to start?

@vishh
Contributor

vishh commented Mar 9, 2017 via email

@dashpole
Contributor

dashpole commented Mar 9, 2017

I'll check it out.

@dashpole
Contributor

dashpole commented Mar 9, 2017

This may already be fixed, although I am not sure whether the fix has made it into Kubernetes yet.
It looks like what cadvisor#1573 fixes.

@dchen1107 added the area/os/coreos, area/cadvisor, and sig/node labels Mar 9, 2017
@andrewsykim
Member

@dashpole any chance we can get a cAdvisor vendor update in for the next release?

@mindw

mindw commented Mar 10, 2017

What are the chances for fixing this for 1.4.x / 1.5.x?

Thanks!

@dashpole
Contributor

So this will definitely be in 1.6; it was actually added two months ago in #40095. However, adding it to 1.5 or 1.4 would require cherry-picking #40095, which may not happen, since that PR includes updating the AWS dependencies and is ~80k lines of code.

@timstclair

@dashpole we should cherry-pick the cAdvisor fix into the release-v0.24 branch so it can be cherry-picked into the k8s 1.4/1.5 branches.

@dashpole
Contributor

Cherry-pick to 1.5: #43113

@andrewsykim
Member

A fix for this went out in 1.5.6; I upgraded my worker nodes to 1.5.6 and I still see this bug.

@piosz
Member

piosz commented Apr 4, 2017

@dashpole @timstclair can you please take a look?

1 similar comment

@dashpole
Contributor

dashpole commented Apr 5, 2017

@andrewsykim are you running on a systemd-based system?

@andrewsykim
Member

@dashpole yes I am (CoreOS)

@dashpole
Contributor

dashpole commented Apr 7, 2017

@andrewsykim would you mind opening an issue against cAdvisor and giving us some of the error messages you see in your logs? I have no experience with systemd, but I can find someone who does to check out your specific case, since it wasn't fixed by #1573.

@nicklan

nicklan commented Aug 1, 2017

We ran into this issue when running kubelet in a rkt container. We were seeing error messages of the form:
Aug 01 00:50:30 [nodename] kubelet-wrapper[20169]: E0801 00:50:30.626256 20169 manager.go:1031] Failed to create existing container: /docker/1f3994fd716cc132015b6059d47e84425370ce39d0d05016ba5c7f321c3b4f18: failed to identify the read-write layer ID for container "1f3994fd716cc132015b6059d47e84425370ce39d0d05016ba5c7f321c3b4f18". - open /storage/docker/image/overlay/layerdb/mounts/1f3994fd716cc132015b6059d47e84425370ce39d0d05016ba5c7f321c3b4f18/mount-id: no such file or directory

It appears cAdvisor tries to read those files, but inside the rkt container /storage/docker was not mounted. The solution was to bind-mount that directory when starting the kubelet in rkt with:

--volume dockerstorage,kind=host,source=/storage/docker,readOnly=true --mount volume=dockerstorage,target=/storage/docker

Interestingly, we only saw errors on our master nodes (i.e. those running kube-apiserver). Worker nodes don't have the issue even though the Docker directory isn't mounted, and can still export pod metrics. No idea why that is the case ¯\_(ツ)_/¯
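For reference, a hedged sketch of one way to wire those flags in when the kubelet is launched via the CoreOS kubelet-wrapper; the drop-in name, the kubelet.service unit name, and the /storage/docker path are assumptions to adapt to your setup (older wrappers read RKT_OPTS rather than RKT_RUN_ARGS):

# Hypothetical drop-in that passes the extra rkt volume/mount to the
# kubelet-wrapper via RKT_RUN_ARGS; note it overrides any RKT_RUN_ARGS
# already set by kubelet.service, so merge flags as needed.
sudo mkdir -p /etc/systemd/system/kubelet.service.d
sudo tee /etc/systemd/system/kubelet.service.d/20-docker-storage.conf <<'EOF'
[Service]
Environment="RKT_RUN_ARGS=--volume dockerstorage,kind=host,source=/storage/docker,readOnly=true --mount volume=dockerstorage,target=/storage/docker"
EOF
sudo systemctl daemon-reload && sudo systemctl restart kubelet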

@d-shi

d-shi commented Aug 18, 2017

Is there an update on this? I'm running k8s 1.7.2 and am also noticing that the kubelet's :10255/metrics endpoint does not show pod metrics. However, :10255/stats/summary does. Restarting the kubelet does not change anything.

@d-shi

d-shi commented Aug 19, 2017

Actually, never mind... apparently the behavior of cAdvisor changed in 1.7. I was able to get Prometheus to grab pod metrics by following the setup mentioned here: https://raw.githubusercontent.com/prometheus/prometheus/master/documentation/examples/prometheus-kubernetes.yml
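A hedged spot-check for 1.7+, assuming the read-only port is still enabled and that cAdvisor's Prometheus metrics are now served on a separate kubelet path (which is what the linked example config scrapes, via the API server proxy):

# Container metrics should appear here rather than on the top-level /metrics.
curl -s http://localhost:10255/metrics/cadvisor | grep container_cpu_user_seconds_total | head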

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

Prevent issues from auto-closing with an /lifecycle frozen comment.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label Jan 3, 2018
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
/remove-lifecycle stale

@k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label Feb 8, 2018
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@bboreham
Contributor

/reopen

We have seen this on two clusters in recent weeks, running k8s v1.9.3.

It appears that the kubelet's view of the universe has diverged significantly from Docker's, hence it does not have the metadata to tag container metrics. There are lots of these in the kubelet logs:

Apr 13 10:32:38 ip-172-20-2-209 kubelet[885]: E0413 10:32:38.395138     885 manager.go:1103] Failed to create existing container: /kubepods/besteffort/pod52908ce2-3c8a-11e8-9d5c-0a41257e78e8/f6276ace92468f578cddd61be4babd0ea3e03c3ad79735137a54ea4e72fdee08: failed to identify the read-write layer ID for container "f6276ace92468f578cddd61be4babd0ea3e03c3ad79735137a54ea4e72fdee08". - open /var/lib/docker/image/overlay2/layerdb/mounts/f6276ace92468f578cddd61be4babd0ea3e03c3ad79735137a54ea4e72fdee08/mount-id: no such file or directory
Apr 13 10:32:38 ip-172-20-2-209 kubelet[885]: E0413 10:32:38.395788     885 manager.go:1103] Failed to create existing container: /kubepods/burstable/pod97d16950-3d9f-11e8-9d5c-0a41257e78e8/2a21b4da59f8c7294e1551783637320cec4b51be4d3ed230274472565fa3d143: failed to identify the read-write layer ID for container "2a21b4da59f8c7294e1551783637320cec4b51be4d3ed230274472565fa3d143". - open /var/lib/docker/image/overlay2/layerdb/mounts/2a21b4da59f8c7294e1551783637320cec4b51be4d3ed230274472565fa3d143/mount-id: no such file or directory
Apr 13 10:32:38 ip-172-20-2-209 kubelet[885]: E0413 10:32:38.396443     885 manager.go:1103] Failed to create existing container: /kubepods/burstable/pod461ad3a8-3f01-11e8-9d5c-0a41257e78e8/9616bf49faab8e61fe9652796d0f37072addd8474b5da9171fdef0c933e02b07: failed to identify the read-write layer ID for container "9616bf49faab8e61fe9652796d0f37072addd8474b5da9171fdef0c933e02b07". - open /var/lib/docker/image/overlay2/layerdb/mounts/9616bf49faab8e61fe9652796d0f37072addd8474b5da9171fdef0c933e02b07/mount-id: no such file or directory

kubelet is running as a systemd service, not in a container.

Restarting the kubelet fixed the issue for today's instance. I'm told that on another occasion it was necessary to drain the node, delete all Docker files, and restart.

@k8s-ci-robot
Contributor

@bboreham: you can't re-open an issue/PR unless you authored it or you are assigned to it.

In response to this:

/reopen


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@dashpole
Contributor

@bboreham can you open an issue with cAdvisor? We can debug it there.
