kubelet Image garbage collection failed: unable to find data for container / #26000

azell · 2016-05-20T22:52:15Z

Cluster 1.2.2 settings:

AWS_DEFAULT_PROFILE=default

export DOCKER_STORAGE=btrfs
export KUBERNETES_PROVIDER=aws
export KUBE_AWS_ZONE=us-west-2a
export KUBE_ENABLE_CLUSTER_LOGGING=false
export KUBE_ENABLE_CLUSTER_MONITORING=none
export MULTIZONE=1
export NODE_ROOT_DISK_SIZE=32
export NODE_SIZE=m4.xlarge
export NUM_NODES=5

When the node disk space is low enough to trigger GC, nothing happens.

# df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/xvda1       32G   28G  3.0G  91% /
udev             10M     0   10M   0% /dev
tmpfs           3.2G  282M  2.9G   9% /run
tmpfs           7.9G  1.1M  7.9G   1% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
tmpfs           7.9G     0  7.9G   0% /sys/fs/cgroup

# journalctl -u kubelet | grep -i garbage
Apr 20 17:48:20 ip-172-20-0-149 kubelet[4441]: E0420 17:48:20.680505    4441 kubelet.go:956] Image garbage collection failed: unable to find data for container /
May 19 18:36:30 ip-172-20-0-149 kubelet[27507]: E0519 18:36:30.108168   27507 kubelet.go:956] Image garbage collection failed: unable to find data for container /

cAdvisor output looks OK:

# curl http://127.0.0.1:4194/validate/
...
Docker driver setup: [Supported and recommended]
    Docker exec driver is native-0.2. Storage driver is aufs.
    Docker container state directory is at "/var/lib/docker/containers" and is accessible.


Block device setup: [Supported and recommended]
    At least one device supports 'cfq' I/O scheduler. Some disk stats can be reported.
     Disk "xvda" Scheduler type "cfq".
...

Everything else in the cluster seems to be working. Any ideas on how to debug? For now I manually removed dangling and older Docker images.

vishh · 2016-05-20T22:57:45Z

I suspect this might be due to the btrfs storage driver. We only test against aufs, overlayfs, and device mapper, AFAIK.

matthughes · 2016-07-12T14:39:04Z

I'm getting the same thing with CoreOS / k8s 1.2.4 using default overlay storage driver:

docker info
Containers: 6
 Running: 6
 Paused: 0
 Stopped: 0
Images: 336
Server Version: 1.10.3
Storage Driver: overlay
 Backing Filesystem: extfs
Execution Driver: native-0.2
Logging Driver: json-file
Plugins:
 Volume: local
 Network: null host bridge
Kernel Version: 4.5.7-coreos
Operating System: CoreOS 1010.6.0 (MoreOS)
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 3.864 GiB
Name: ndc_cmc_e-worker0
ID: NEFH:JM4E:OIIX:OJB6:7Y4Q:ALF3:4NTW:T3QO:QR6R:3I2Z:5DNI:7Q5L

vishh · 2016-07-12T19:17:02Z

cc @ronnielai

ronnielai · 2016-07-12T19:40:14Z

The error seems to come from cadvisor. cc @timstclair

@matthughes, could you provide the output of localhost:4194/api/v2.1/storage?

mdshuai · 2016-08-10T03:04:55Z

I also get the same error in the node log. I use the overlay storage driver.
Aug 9 21:09:35 ip-172-18-11-227 atomic-openshift-node: E0809 21:09:35.301022 18211 kubelet.go:934] Image garbage collection failed: unable to find data for container /

[root@ip-172-18-11-227 ~]# docker info
Containers: 16
 Running: 8
 Paused: 0
 Stopped: 8
Images: 8
Server Version: 1.10.3
Storage Driver: overlay
 Backing Filesystem: xfs
Execution Driver: native-0.2
Logging Driver: json-file
Plugins: 
 Volume: local
 Network: null host bridge
 Authorization: rhel-push-plugin
Kernel Version: 3.10.0-327.28.2.el7.x86_64
Operating System: Red Hat Enterprise Linux Server 7.2 (Maipo)
OSType: linux
Architecture: x86_64
Number of Docker Hooks: 2
CPUs: 1
Total Memory: 3.518 GiB
Name: ip-172-18-11-227.ec2.internal
ID: TCBW:MYWO:ALID:K5YR:BK3I:D3PK:VOBC:GC2A:ZVAI:AWJV:RGUA:4COS
WARNING: bridge-nf-call-iptables is disabled
WARNING: bridge-nf-call-ip6tables is disabled
Registries: registry.ops.openshift.com (insecure), registry.access.redhat.com (secure), docker.io (secure)

victorgp · 2016-10-25T02:44:24Z

Same here, kubernetes version 1.3.4 using CoreOS with default overlay fs

chbatey · 2016-11-18T12:07:00Z

Getting this running Kubernetes v1.4.6, Ubuntu with AUFS

xmik · 2016-12-12T20:26:26Z

Also seeing this with kubernetes v1.5.0-beta.3, Ubuntu 16.04.1, docker storage driver: overlay, docker version: 1.12.3.

But sometimes image garbage collection succeeds:

root@cluster1-k8s-node-1a:/home/ubuntu# journalctl -u kubelet | grep -i garbage
Dec 11 15:05:37 cluster1-k8s-node-1a kubelet[1151]: E1211 15:05:37.175130    1151 kubelet.go:1145] Image garbage collection failed: unable to find data for container /
Dec 11 15:10:37 cluster1-k8s-node-1a kubelet[1151]: I1211 15:10:37.176284    1151 kubelet.go:1155] Image garbage collection succeeded
Dec 11 17:42:48 cluster1-k8s-node-1a kubelet[23107]: E1211 17:42:48.104934   23107 kubelet.go:1145] Image garbage collection failed: unable to find data for container /
Dec 11 17:47:48 cluster1-k8s-node-1a kubelet[23107]: I1211 17:47:48.105446   23107 kubelet.go:1155] Image garbage collection succeeded
Dec 12 14:03:25 cluster1-k8s-node-1a kubelet[18062]: E1212 14:03:25.161337   18062 kubelet.go:1145] Image garbage collection failed: unable to find data for container /
Dec 12 14:03:26 cluster1-k8s-node-1a kubelet[18130]: E1212 14:03:26.532640   18130 kubelet.go:1145] Image garbage collection failed: unable to find data for container /
Dec 12 14:08:26 cluster1-k8s-node-1a kubelet[18130]: I1212 14:08:26.532863   18130 kubelet.go:1155] Image garbage collection succeeded
Dec 12 19:41:27 cluster1-k8s-node-1a kubelet[31006]: E1212 19:41:27.364426   31006 kubelet.go:1145] Image garbage collection failed: unable to find data for container /
Dec 12 19:46:27 cluster1-k8s-node-1a kubelet[31006]: I1212 19:46:27.364837   31006 kubelet.go:1155] Image garbage collection succeeded

@ronnielai asked for localhost:4194/api/v2.1/storage , so:

root@cluster1-k8s-node-1a:/home/ubuntu# curl localhost:4194/api/v2.1/storage
[{"device":"/dev/vda1","mountpoint":"/","capacity":20749852672,"available":16741720064,"usage":3991355392,"labels":["docker-images","root"],"inodes":2560000,"inodes_free":2454482}]
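For anyone reading that storage output: a minimal sketch of turning it into a usage percentage for the filesystem labelled docker-images. The `bytes_to_free` helper and its `high_pct`/`low_pct` parameters are hypothetical and only illustrate a high/low watermark scheme as I understand it; they are not copied from the kubelet, and the threshold values passed in are placeholders, not the defaults of any particular version.

```python
import json

# Sample copied from the cAdvisor /api/v2.1/storage output quoted above.
STORAGE_JSON = (
    '[{"device":"/dev/vda1","mountpoint":"/","capacity":20749852672,'
    '"available":16741720064,"usage":3991355392,'
    '"labels":["docker-images","root"],"inodes":2560000,"inodes_free":2454482}]'
)


def image_fs_usage_percent(storage_json, label="docker-images"):
    """Return the integer usage percent of the first filesystem carrying `label`."""
    for fs in json.loads(storage_json):
        if label in fs.get("labels", []):
            return 100 - fs["available"] * 100 // fs["capacity"]
    raise LookupError(f"no filesystem labelled {label!r}")


def bytes_to_free(capacity, available, high_pct, low_pct):
    """Hypothetical helper: bytes to reclaim to get back under low_pct.

    Returns 0 when usage is still below high_pct (no collection needed).
    """
    usage_pct = 100 - available * 100 // capacity
    if usage_pct < high_pct:
        return 0
    return capacity * (100 - low_pct) // 100 - available


if __name__ == "__main__":
    print(image_fs_usage_percent(STORAGE_JSON))  # prints 20
```

On this node the image filesystem is at roughly 20% usage, so a watermark-based collector would have nothing to free regardless of whether stats collection is working.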

bamb00 · 2017-03-07T16:08:42Z

I'm getting the same garbage collection failed error (kubernetes server v1.5.3 & docker 1.12.6),

Mar 06 16:22:36 ip-10-43-0-20 kubelet[813]: E0306 16:22:36.439499 813 kubelet.go:1145] Image garbage collection failed: unable to find data for container /

vishh · 2017-03-07T17:09:20Z

Is this a one-off error?

bamb00 · 2017-03-07T17:33:07Z

@vishh, I don't understand your question?

vishh · 2017-03-07T18:21:33Z

Is GC failing continuously or is it failing at arbitrary times?

bamb00 · 2017-03-07T20:17:28Z

Correction: it looks like this is failing continuously (every minute). Here are the complete logs:

I0307 20:12:05.897288 9095 manager.go:204] Version: {KernelVersion:4.4.0-45-generic ContainerOsVersion:Ubuntu 16.04.1 LTS DockerVersion:1.12.6 CadvisorVersion: CadvisorRevision:}
I0307 20:12:05.897945 9095 cadvisor_linux.go:152] Failed to register cAdvisor on port 4194, retrying. Error: listen tcp :4194: bind: address already in use
W0307 20:12:05.899031 9095 server.go:669] No api server defined - no events will be sent to API server.
W0307 20:12:05.900525 9095 kubelet_network.go:69] Hairpin mode set to "promiscuous-bridge" but kubenet is not enabled, falling back to "hairpin-veth"
I0307 20:12:05.900548 9095 kubelet.go:477] Hairpin mode set to "hairpin-veth"
I0307 20:12:05.904911 9095 docker_manager.go:257] Setting dockerRoot to /var/lib/docker
I0307 20:12:05.904928 9095 docker_manager.go:260] Setting cgroupDriver to cgroupfs
I0307 20:12:05.905706 9095 server.go:770] Started kubelet v1.5.2
E0307 20:12:05.905780 9095 server.go:481] Starting health server failed: listen tcp 127.0.0.1:10248: bind: address already in use
E0307 20:12:05.905819 9095 kubelet.go:1145] Image garbage collection failed: unable to find data for container /
W0307 20:12:05.905853 9095 kubelet.go:1224] No api server defined - no node status update will be sent.
I0307 20:12:05.905878 9095 server.go:123] Starting to listen on 0.0.0.0:10250
I0307 20:12:05.905958 9095 kubelet_node_status.go:204] Setting node annotation to enable volume controller attach/detach
F0307 20:12:05.906897 9095 server.go:148] listen tcp 0.0.0.0:10255: bind: address already in use

I0307 20:41:46.658932 11568 manager.go:204] Version: {KernelVersion:4.4.0-45-generic ContainerOsVersion:Ubuntu 16.04.1 LTS DockerVersion:1.12.6 CadvisorVersion: CadvisorRevision:}
I0307 20:41:46.659568 11568 cadvisor_linux.go:152] Failed to register cAdvisor on port 4194, retrying. Error: listen tcp :4194: bind: address already in use
W0307 20:41:46.660545 11568 server.go:669] No api server defined - no events will be sent to API server.
W0307 20:41:46.661890 11568 kubelet_network.go:69] Hairpin mode set to "promiscuous-bridge" but kubenet is not enabled, falling back to "hairpin-veth"
I0307 20:41:46.661915 11568 kubelet.go:477] Hairpin mode set to "hairpin-veth"
I0307 20:41:46.667057 11568 docker_manager.go:257] Setting dockerRoot to /var/lib/docker
I0307 20:41:46.667072 11568 docker_manager.go:260] Setting cgroupDriver to cgroupfs
I0307 20:41:46.667824 11568 server.go:770] Started kubelet v1.5.2
E0307 20:41:46.667834 11568 kubelet.go:1145] Image garbage collection failed: unable to find data for container /
E0307 20:41:46.667877 11568 server.go:481] Starting health server failed: listen tcp 127.0.0.1:10248: bind: address already in use
I0307 20:41:46.667877 11568 server.go:123] Starting to listen on 0.0.0.0:10250
W0307 20:41:46.667915 11568 kubelet.go:1224] No api server defined - no node status update will be sent.
I0307 20:41:46.668133 11568 kubelet_node_status.go:204] Setting node annotation to enable volume controller attach/detach
F0307 20:41:46.668528 11568 server.go:148] listen tcp 0.0.0.0:10255: bind: address already in use

bamb00 · 2017-03-09T00:22:33Z

I am not clear on how to determine whether garbage collection is failing. For example, if you run the command "kubelet logs" every minute, you will see the messages:

Started kubelet v1.5.2
Image garbage collection failed: unable to find data for container /

So does that mean the kubelet process dies and then restarts every minute? I then checked the kubelet process's elapsed time, and the uptime of (/usr/bin/kubelet --kubeconfig=/etc/kubernetes/kubelet.conf) is 25 minutes.

Are there logs that show when and why garbage collection for images and containers failed?

Thanks in Advance.

vishh · 2017-03-09T01:03:25Z

cc @dashpole

r0bj · 2017-03-09T20:22:17Z

kubernetes: 1.5.3
OS: ubuntu 14.04
docker: 1.12.6.

I'm not sure if it's related, but there is a / entry in the "Managed containers" section of the cAdvisor output:

# curl http://127.0.0.1:4194/validate/
[...]
	/docker/3e6f17e68d05ae2ddbd75f34edfc0b6d5daae9ba8a7a63501855e128f57db226
		Namespace: docker
		Aliases:
			k8s_POD.d8dbe16c_fluentd-node-agent-hvd4v_kube-system_1025219a-04f3-11e7-aa27-525400146bee_d53fa8c7
			3e6f17e68d05ae2ddbd75f34edfc0b6d5daae9ba8a7a63501855e128f57db226
	/
	/docker/d2c0f8ebe5671313db964afbe7b7de37ff59a51be4ea4054b0117a5a364e7426
		Namespace: docker
		Aliases:
			k8s_POD.d8dbe16c_node-problem-detector-v0.3-7hjcw_kube-system_1029e0a4-04f3-11e7-aa27-525400146bee_985d23e7
			d2c0f8ebe5671313db964afbe7b7de37ff59a51be4ea4054b0117a5a364e7426
[...]

Full cadvisor validate report:
https://gist.github.com/r0bj/68ae481f89a68cb004208e55f2d5a403

log:

Warning		ImageGCFailed		unable to find data for container /

dashpole · 2017-03-10T21:14:41Z

This error may or may not be benign. This error usually occurs when the kubelet tries to get metrics before the first metrics have been collected. This is normally not a problem, as the kubelet eventually retries, and should succeed once metrics collection has started.

@bamb00, my best guess is that this is benign in your case, since I see Started kubelet v1.5.2 right before each error. This indicates to me that the kubelet just started. If the kubelet is restarting every minute continuously, you may have other problems. If you are continuously getting this error when the kubelet has not recently started, then there may still be issues with metrics collection.

For anyone else who thinks they may be having metrics collection issues, look for the following log lines (in kubelet.log) to help debug:
This indicates that we have started metrics collection: Start housekeeping for container "/"
This indicates that stats collection failed, and may be a sign of problems: Failed to update stats for container "/"
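A rough sketch of checking for those markers in a captured kubelet log. The function name `classify` and the counter keys are made up for illustration; the two sample lines are taken from logs posted in this thread, and the marker strings are the ones quoted above.

```python
# Marker substrings quoted in the comment above, plus the GC failure prefix.
HOUSEKEEPING = 'Start housekeeping for container "/"'
STATS_FAILED = 'Failed to update stats for container "/"'
GC_FAILED = "Image garbage collection failed"

# Two lines copied from logs posted in this thread.
SAMPLE = [
    'E0517 10:16:24.954370 35643 kubelet.go:1302] Image garbage collection failed once. '
    'Stats initialization may not have completed yet: failed to get imageFs info: '
    'unable to find data in memory cache',
    'W0517 10:16:25.003226 35643 container.go:523] Failed to update stats for container "/": '
    'strconv.ParseUint: parsing "(unknown)": invalid syntax, continuing to push stats',
]


def classify(lines):
    """Count how often each root-container marker appears in `lines`."""
    counts = {"housekeeping_started": 0, "stats_failed": 0, "gc_failed": 0}
    for line in lines:
        if HOUSEKEEPING in line:
            counts["housekeeping_started"] += 1
        if STATS_FAILED in line:
            counts["stats_failed"] += 1
        if GC_FAILED in line:
            counts["gc_failed"] += 1
    return counts


if __name__ == "__main__":
    print(classify(SAMPLE))
```

To run it against a real node, pass an open log file (or `sys.stdin` fed from journalctl) instead of `SAMPLE`. A GC failure with no housekeeping line, or with repeated stats failures, is the pattern worth a closer look.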

dashpole · 2017-03-10T21:19:42Z

Regardless of whether we find bugs in garbage collection, I'll update the error message to make it more obvious that this frequently occurs during initialization.

dashpole · 2017-03-10T21:23:48Z

@xmik you also appear to have a restarting kubelet. Note that the process numbers each time you see the error message are different.

dashpole · 2017-03-10T21:24:36Z

Also, for anyone debugging this: success is only recorded once after a failure. This was done to reduce log spamming.
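To illustrate why a single "succeeded" line can stand for many successful runs, here is a minimal sketch of the "log success only on the transition from failure" pattern. The class `GCResultLogger` and its method names are hypothetical and this is not the kubelet's code; it only mimics the behaviour described above.

```python
class GCResultLogger:
    """Hypothetical logger: every failure is recorded, success only once after a failure."""

    def __init__(self):
        self.prev_failed = False
        self.messages = []

    def record(self, err=None):
        if err is not None:
            # Failures are always recorded.
            self.messages.append(f"Image garbage collection failed: {err}")
            self.prev_failed = True
        elif self.prev_failed:
            # First success after a failure: record it once, then stay quiet.
            self.messages.append("Image garbage collection succeeded")
            self.prev_failed = False


if __name__ == "__main__":
    log = GCResultLogger()
    log.record("unable to find data for container /")
    for _ in range(3):  # three successful runs in a row
        log.record()
    print("\n".join(log.messages))
```

So the absence of further "succeeded" lines after the first one does not mean GC stopped running.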

dashpole · 2017-03-10T21:27:41Z

@azell, your kubelet also appears to have restarted, as the process numbers are different in each log.
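A quick way to check for this in your own logs is to pull the kubelet PID out of each failure line and see whether it changes. This is only a sketch: the regular expression assumes the journalctl format shown in this thread (`kubelet[PID]:`), and the helper names are made up for illustration.

```python
import re

# Matches the journald prefix "kubelet[<pid>]:" on GC failure lines.
PID_RE = re.compile(r"kubelet\[(\d+)\]:.*Image garbage collection failed")

# The two failure lines from the original report.
SAMPLE = [
    "Apr 20 17:48:20 ip-172-20-0-149 kubelet[4441]: E0420 17:48:20.680505    4441 kubelet.go:956] "
    "Image garbage collection failed: unable to find data for container /",
    "May 19 18:36:30 ip-172-20-0-149 kubelet[27507]: E0519 18:36:30.108168   27507 kubelet.go:956] "
    "Image garbage collection failed: unable to find data for container /",
]


def gc_failure_pids(lines):
    """Return the kubelet PID attached to each GC failure line, in order."""
    pids = []
    for line in lines:
        match = PID_RE.search(line)
        if match:
            pids.append(int(match.group(1)))
    return pids


def every_failure_is_new_process(lines):
    """True when there are several failures and each came from a different PID."""
    pids = gc_failure_pids(lines)
    return len(pids) > 1 and len(set(pids)) == len(pids)


if __name__ == "__main__":
    print(gc_failure_pids(SAMPLE))
    print(every_failure_is_new_process(SAMPLE))
```

If every failure comes from a different PID, the message is most likely the startup case described above rather than a persistent GC problem.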

xmik · 2017-03-15T19:54:09Z

Thanks @dashpole for the explanation. I confirm that I see this message right after the kubelet was just started and after 5 minutes garbage collection succeeds:

$ journalctl -u kubelet | grep -i garbage -A 1 -B 1
k8s-master-1 kubelet[8612]: I0315 16:45:33.440237    8612 server.go:770] Started kubelet v1.5.3
k8s-master-1 kubelet[8612]: E0315 16:45:33.442828    8612 kubelet.go:1145] Image garbage collection failed: unable to find data for container /
k8s-master-1 kubelet[8612]: I0315 16:45:33.444018    8612 kubelet_node_status.go:204] Setting node annotation to enable volume controller attach/detach
--
k8s-master-1 kubelet[8612]: I0315 16:50:29.437446    8612 conversion.go:134] failed to handle multiple devices for container. Skipping Filesystem stats
k8s-master-1 kubelet[8612]: I0315 16:50:33.444414    8612 kubelet.go:1155] Image garbage collection succeeded
k8s-master-1 kubelet[8612]: I0315 16:50:34.271999    8612 container_manager_linux.go:434] Discovered runtime cgroups name: /system.slice/docker.service

There are no more such messages since this vm was started. The same on k8s worker vm. (And I don't have restarting kubelet anymore).

I see however another log message, repeated in this manner:

k8s-master-1 kubelet[8612]: I0315 20:39:25.206217    8612 conversion.go:134] failed to handle multiple devices for container. Skipping Filesystem stats
k8s-master-1 kubelet[8612]: I0315 20:39:25.206850    8612 conversion.go:134] failed to handle multiple devices for container. Skipping Filesystem stats
k8s-master-1 kubelet[8612]: I0315 20:39:35.305632    8612 conversion.go:134] failed to handle multiple devices for container. Skipping Filesystem stats
k8s-master-1 kubelet[8612]: I0315 20:39:35.310915    8612 conversion.go:134] failed to handle multiple devices for container. Skipping Filesystem stats
k8s-master-1 kubelet[8612]: I0315 20:39:45.382430    8612 conversion.go:134] failed to handle multiple devices for container. Skipping Filesystem stats
k8s-master-1 kubelet[8612]: I0315 20:39:45.383145    8612 conversion.go:134] failed to handle multiple devices for container. Skipping Filesystem stats

I thought maybe it is connected to this issue, because it concerns stats. But you @dashpole already commented on this here which answered my concerns and I will happily wait for your PR to be cherrypicked.

xmik · 2017-03-15T20:05:07Z

To say it more clearly: I had an unstable cluster back then when I saw a lot of unable to find data for container / messages. Kubelet was restarted many times then. Now, there is 1 kubelet process running for a long time and there is only 1 such message.

Automatic merge from submit-queue

Clearer ImageGC failure errors. Fewer events.

Addresses kubernetes#26000. Kubelet often "fails" image garbage collection if cAdvisor has not completed the first round of stats collection. Don't create events for a single failure, and make log messages more specific.

@kubernetes/sig-node-bugs

dashpole · 2017-04-04T23:56:27Z

/close
via #42916

ichekrygin · 2017-05-31T22:48:07Z

Hi, I am getting this on v1.6.4

curl http://127.0.0.1:4194/validate/
cAdvisor version: 

OS version: Container Linux by CoreOS 1353.7.0 (Ladybug)

Kernel version: [Supported and recommended]
        Kernel version is 4.9.24-coreos. Versions >= 2.6 are supported. 3.0+ are recommended.


Cgroup setup: [Supported and recommended]
        Available cgroups: map[cpuacct:1 devices:1 freezer:1 net_cls:1 perf_event:1 pids:1 cpuset:1 cpu:1 blkio:1 memory:1 net_prio:1 hugetlb:1]
        Following cgroups are required: [cpu cpuacct]
        Following other cgroups are recommended: [memory blkio cpuset devices freezer]
        Hierarchical memory accounting enabled. Reported memory usage includes memory used by child containers.
...
Docker version: [Supported and recommended]
        Docker version is 1.12.6. Versions >= 1.0 are supported. 1.2+ are recommended.


Docker driver setup: [Supported and recommended]
        Docker exec driver is . Storage driver is overlay.


Block device setup: [Supported, but not recommended]
        None of the devices support 'cfq' I/O scheduler. No disk stats can be reported.
         Disk "dm-0" Scheduler type "none".
         Disk "xvda" Scheduler type "none".
...

phagunbaya · 2017-08-04T00:03:19Z

Experiencing same on kubernetes v1.6.2 on azure.

curl http://127.0.0.1:4194/validate/ never returns response.

xmik · 2017-08-04T08:21:11Z

@ichekrygin , @phagunbaya if you look into kubernetes source code across tags, you'll see that #42916 was merged in v1.7.0-alpha.1.

xiongraorao · 2018-05-25T05:20:12Z

Today I hit the same issue. I searched the kubelet log file (/var/log/upstart/kubelet.log) for possible causes and looked them up on Google, but found nothing that solved it; the node was still in "NotReady" status.

Like this:

It occurred to me that I had installed the "lxc" and "lxd" software on my Ubuntu system, and the volumes backing the containers had filled up. That looked like the probable cause.

These are the full volumes:

Solution:

Remove the "snap", "lxc", and "lxd" software.
Unmount the /dev/loop# volumes and remove their corresponding files.
Restart the kubelet.

Then, if you run "kubectl get nodes", you will find that the node's status has changed to "Ready".

Although the problem is solved, I still don't understand why these steps fixed it. Maybe today is just my lucky day.

code4happylife · 2019-07-13T09:12:08Z

I'm encountering the same issue.

dparv · 2020-03-19T09:25:02Z

same issue with 1.15.10

Mar 19 08:13:44 XXXXXXXX kubelet.daemon[6354]: E0319 08:13:44.820776 6354 kubelet.go:1294] Image garbage collection failed once. Stats initialization may not have completed yet: failed to garbage collect required amount of images. Wanted to free 788529152 bytes, but freed 0 bytes

gitnapn · 2023-05-17T02:24:02Z

I'm encountering the same issue:
kubelet v1.16.6
docker v18.09.2
May 17 10:16:24 localhost kubelet: I0517 10:16:24.640951 35643 docker_service.go:255] Docker cri networking managed by kubernetes.io/no-op
May 17 10:16:24 localhost kubelet: I0517 10:16:24.649077 35643 docker_service.go:260] Docker Info: &{ID:FTD3:SELZ:ELT6:SSHT:FS4M:DYKK:BDOF:INLB:6L2G:USMS:EOKK:IZED Containers:0 ContainersRunning:0 ContainersPaused:0 ContainersStopped:0 Images:0 Driver:overlay2 DriverStatus:[[Backing Filesystem xfs] [Supports d_type true] [Native Overlay Diff true]] SystemStatus:[] Plugins:{Volume:[local] Network:[bridge host macvlan null overlay] Authorization:[] Log:[awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog]} MemoryLimit:true SwapLimit:true KernelMemory:true KernelMemoryTCP:false CPUCfsPeriod:true CPUCfsQuota:true CPUShares:true CPUSet:true PidsLimit:false IPv4Forwarding:true BridgeNfIptables:true BridgeNfIP6tables:true Debug:false NFd:22 OomKillDisable:true NGoroutines:37 SystemTime:2023-05-17T10:16:24.641726946+08:00 LoggingDriver:json-file CgroupDriver:cgroupfs NEventsListener:0 KernelVersion:6.3.1-1.el7.elrepo.x86_64 OperatingSystem:CentOS Linux 7 (Core) OSType:linux Architecture:x86_64 IndexServerAddress:https://index.docker.io/v1/ RegistryConfig:0xc000540310 NCPU:2 MemTotal:8314265600 GenericResources:[] DockerRootDir:/var/lib/docker HTTPProxy: HTTPSProxy: NoProxy: Name:kubernetes-node01 Labels:[] ExperimentalBuild:false ServerVersion:18.09.2 ClusterStore: ClusterAdvertise: Runtimes:map[runc:{Path:runc Args:[]}] DefaultRuntime:runc Swarm:{NodeID: NodeAddr: LocalNodeState:inactive ControlAvailable:false Error: RemoteManagers:[] Nodes:0 Managers:0 Cluster: Warnings:[]} LiveRestoreEnabled:false Isolation: InitBinary:docker-init ContainerdCommit:{ID:3dce8eb055cbb6872793272b4f20ed16117344f8 Expected:3dce8eb055cbb6872793272b4f20ed16117344f8} RuncCommit:{ID:N/A Expected:N/A} InitCommit:{ID:fec3683 Expected:fec3683} SecurityOptions:[name=seccomp,profile=default] ProductLicense:Community Engine Warnings:[]}
May 17 10:16:24 localhost kubelet: I0517 10:16:24.649153 35643 docker_service.go:273] Setting cgroupDriver to cgroupfs
May 17 10:16:24 localhost kubelet: I0517 10:16:24.649266 35643 kubelet.go:641] Starting the GRPC server for the docker CRI shim.
May 17 10:16:24 localhost kubelet: I0517 10:16:24.649281 35643 docker_server.go:59] Start dockershim grpc server
May 17 10:16:24 localhost kubelet: I0517 10:16:24.659172 35643 remote_runtime.go:59] parsed scheme: ""
May 17 10:16:24 localhost kubelet: I0517 10:16:24.659187 35643 remote_runtime.go:59] scheme "" not registered, fallback to default scheme
May 17 10:16:24 localhost kubelet: I0517 10:16:24.659204 35643 passthrough.go:48] ccResolverWrapper: sending update to cc: {[{/var/run/dockershim.sock 0 }] }
May 17 10:16:24 localhost kubelet: I0517 10:16:24.659210 35643 clientconn.go:577] ClientConn switching balancer to "pick_first"
May 17 10:16:24 localhost kubelet: I0517 10:16:24.659323 35643 remote_image.go:50] parsed scheme: ""
May 17 10:16:24 localhost kubelet: I0517 10:16:24.659331 35643 remote_image.go:50] scheme "" not registered, fallback to default scheme
May 17 10:16:24 localhost kubelet: I0517 10:16:24.659338 35643 passthrough.go:48] ccResolverWrapper: sending update to cc: {[{/var/run/dockershim.sock 0 }] }
May 17 10:16:24 localhost kubelet: I0517 10:16:24.659342 35643 clientconn.go:577] ClientConn switching balancer to "pick_first"
May 17 10:16:24 localhost kubelet: I0517 10:16:24.659389 35643 balancer_conn_wrappers.go:127] pickfirstBalancer: HandleSubConnStateChange: 0xc0009311f0, CONNECTING
May 17 10:16:24 localhost kubelet: I0517 10:16:24.659998 35643 balancer_conn_wrappers.go:127] pickfirstBalancer: HandleSubConnStateChange: 0xc0009313d0, CONNECTING
May 17 10:16:24 localhost kubelet: I0517 10:16:24.663878 35643 balancer_conn_wrappers.go:127] pickfirstBalancer: HandleSubConnStateChange: 0xc0009311f0, READY
May 17 10:16:24 localhost kubelet: I0517 10:16:24.663890 35643 balancer_conn_wrappers.go:127] pickfirstBalancer: HandleSubConnStateChange: 0xc0009313d0, READY
May 17 10:16:24 localhost kubelet: E0517 10:16:24.948164 35643 aws_credentials.go:77] while getting AWS credentials NoCredentialProviders: no valid providers in chain. Deprecated.
May 17 10:16:24 localhost kubelet: For verbose messaging see aws.Config.CredentialsChainVerboseErrors
May 17 10:16:24 localhost kubelet: I0517 10:16:24.950724 35643 kuberuntime_manager.go:207] Container runtime docker initialized, version: 18.09.2, apiVersion: 1.39.0
May 17 10:16:24 localhost kubelet: I0517 10:16:24.951177 35643 plugins.go:630] Loaded volume plugin "kubernetes.io/aws-ebs"
May 17 10:16:24 localhost kubelet: I0517 10:16:24.951191 35643 plugins.go:630] Loaded volume plugin "kubernetes.io/azure-disk"
May 17 10:16:24 localhost kubelet: I0517 10:16:24.952257 35643 plugins.go:630] Loaded volume plugin "kubernetes.io/azure-file"
May 17 10:16:24 localhost kubelet: I0517 10:16:24.952279 35643 plugins.go:630] Loaded volume plugin "kubernetes.io/cinder"
May 17 10:16:24 localhost kubelet: I0517 10:16:24.952293 35643 plugins.go:630] Loaded volume plugin "kubernetes.io/gce-pd"
May 17 10:16:24 localhost kubelet: I0517 10:16:24.952299 35643 plugins.go:630] Loaded volume plugin "kubernetes.io/vsphere-volume"
May 17 10:16:24 localhost kubelet: I0517 10:16:24.952308 35643 plugins.go:630] Loaded volume plugin "kubernetes.io/empty-dir"
May 17 10:16:24 localhost kubelet: I0517 10:16:24.952314 35643 plugins.go:630] Loaded volume plugin "kubernetes.io/git-repo"
May 17 10:16:24 localhost kubelet: I0517 10:16:24.952323 35643 plugins.go:630] Loaded volume plugin "kubernetes.io/host-path"
May 17 10:16:24 localhost kubelet: I0517 10:16:24.952329 35643 plugins.go:630] Loaded volume plugin "kubernetes.io/nfs"
May 17 10:16:24 localhost kubelet: I0517 10:16:24.952335 35643 plugins.go:630] Loaded volume plugin "kubernetes.io/secret"
May 17 10:16:24 localhost kubelet: I0517 10:16:24.952343 35643 plugins.go:630] Loaded volume plugin "kubernetes.io/iscsi"
May 17 10:16:24 localhost kubelet: I0517 10:16:24.952352 35643 plugins.go:630] Loaded volume plugin "kubernetes.io/glusterfs"
May 17 10:16:24 localhost kubelet: I0517 10:16:24.952359 35643 plugins.go:630] Loaded volume plugin "kubernetes.io/rbd"
May 17 10:16:24 localhost kubelet: I0517 10:16:24.952364 35643 plugins.go:630] Loaded volume plugin "kubernetes.io/quobyte"
May 17 10:16:24 localhost kubelet: I0517 10:16:24.952370 35643 plugins.go:630] Loaded volume plugin "kubernetes.io/cephfs"
May 17 10:16:24 localhost kubelet: I0517 10:16:24.952378 35643 plugins.go:630] Loaded volume plugin "kubernetes.io/downward-api"
May 17 10:16:24 localhost kubelet: I0517 10:16:24.952384 35643 plugins.go:630] Loaded volume plugin "kubernetes.io/fc"
May 17 10:16:24 localhost kubelet: I0517 10:16:24.952390 35643 plugins.go:630] Loaded volume plugin "kubernetes.io/flocker"
May 17 10:16:24 localhost kubelet: I0517 10:16:24.952396 35643 plugins.go:630] Loaded volume plugin "kubernetes.io/configmap"
May 17 10:16:24 localhost kubelet: I0517 10:16:24.952402 35643 plugins.go:630] Loaded volume plugin "kubernetes.io/projected"
May 17 10:16:24 localhost kubelet: I0517 10:16:24.952419 35643 plugins.go:630] Loaded volume plugin "kubernetes.io/portworx-volume"
May 17 10:16:24 localhost kubelet: I0517 10:16:24.952428 35643 plugins.go:630] Loaded volume plugin "kubernetes.io/scaleio"
May 17 10:16:24 localhost kubelet: I0517 10:16:24.952434 35643 plugins.go:630] Loaded volume plugin "kubernetes.io/local-volume"
May 17 10:16:24 localhost kubelet: I0517 10:16:24.952440 35643 plugins.go:630] Loaded volume plugin "kubernetes.io/storageos"
May 17 10:16:24 localhost kubelet: I0517 10:16:24.952454 35643 plugins.go:630] Loaded volume plugin "kubernetes.io/csi"
May 17 10:16:24 localhost kubelet: I0517 10:16:24.953465 35643 server.go:1065] Started kubelet
May 17 10:16:24 localhost kubelet: E0517 10:16:24.954370 35643 kubelet.go:1302] Image garbage collection failed once. Stats initialization may not have completed yet: failed to get imageFs info: unable to find data in memory cache
May 17 10:16:24 localhost kubelet: I0517 10:16:24.955454 35643 certificate_manager.go:254] Certificate rotation is enabled.
May 17 10:16:24 localhost kubelet: I0517 10:16:24.955486 35643 fs_resource_analyzer.go:64] Starting FS ResourceAnalyzer
May 17 10:16:24 localhost kubelet: I0517 10:16:24.955502 35643 status_manager.go:156] Starting to sync pod status with apiserver
May 17 10:16:24 localhost kubelet: I0517 10:16:24.955513 35643 kubelet.go:1822] Starting kubelet main sync loop.
May 17 10:16:24 localhost kubelet: I0517 10:16:24.955575 35643 kubelet.go:1839] skipping pod synchronization - [container runtime status check may not have completed yet, PLEG is not healthy: pleg has yet to be successful]
May 17 10:16:24 localhost kubelet: I0517 10:16:24.955636 35643 server.go:145] Starting to listen on 10.0.0.30:10250
May 17 10:16:24 localhost kubelet: I0517 10:16:24.956344 35643 server.go:354] Adding debug handlers to kubelet server.
May 17 10:16:24 localhost kubelet: I0517 10:16:24.960555 35643 volume_manager.go:247] The desired_state_of_world populator starts
May 17 10:16:24 localhost kubelet: I0517 10:16:24.960570 35643 volume_manager.go:249] Starting Kubelet Volume Manager
May 17 10:16:24 localhost kubelet: I0517 10:16:24.964393 35643 desired_state_of_world_populator.go:131] Desired state populator starts to run
May 17 10:16:24 localhost kubelet: I0517 10:16:24.968845 35643 clientconn.go:104] parsed scheme: "unix"
May 17 10:16:24 localhost kubelet: I0517 10:16:24.968855 35643 clientconn.go:104] scheme "unix" not registered, fallback to default scheme
May 17 10:16:24 localhost kubelet: I0517 10:16:24.969061 35643 passthrough.go:48] ccResolverWrapper: sending update to cc: {[{unix:///run/containerd/containerd.sock 0 }] }
May 17 10:16:24 localhost kubelet: I0517 10:16:24.969072 35643 clientconn.go:577] ClientConn switching balancer to "pick_first"
May 17 10:16:24 localhost kubelet: I0517 10:16:24.969109 35643 balancer_conn_wrappers.go:127] pickfirstBalancer: HandleSubConnStateChange: 0xc000aed050, CONNECTING
May 17 10:16:24 localhost kubelet: I0517 10:16:24.969371 35643 balancer_conn_wrappers.go:127] pickfirstBalancer: HandleSubConnStateChange: 0xc000aed050, READY
May 17 10:16:24 localhost kubelet: I0517 10:16:24.971740 35643 factory.go:137] Registering containerd factory
May 17 10:16:24 localhost kubelet: I0517 10:16:24.985355 35643 factory.go:356] Registering Docker factory
May 17 10:16:24 localhost kubelet: I0517 10:16:24.985535 35643 factory.go:54] Registering systemd factory
May 17 10:16:24 localhost kubelet: I0517 10:16:24.985641 35643 factory.go:101] Registering Raw factory
May 17 10:16:24 localhost kubelet: I0517 10:16:24.985977 35643 manager.go:1159] Started watching for new ooms in manager
May 17 10:16:24 localhost kubelet: I0517 10:16:24.988847 35643 manager.go:272] Starting recovery of all containers
May 17 10:16:24 localhost dockerd: time="2023-05-17T10:16:24.980550112+08:00" level=warning msg="failed to retrieve runc version: unknown output format: runc version 1.1.7\ncommit: v1.1.7-0-g860f061\nspec: 1.0.2-dev\ngo: go1.19.9\nlibseccomp: 2.3.1\n"
May 17 10:16:25 localhost kubelet: I0517 10:16:25.001896 35643 manager.go:277] Recovery completed
May 17 10:16:25 localhost kubelet: W0517 10:16:25.003226 35643 container.go:523] Failed to update stats for container "/": strconv.ParseUint: parsing "(unknown)": invalid syntax, continuing to push stats
May 17 10:16:25 localhost kubelet: I0517 10:16:25.007865 35643 kubelet_network_linux.go:111] Not using --random-fully in the MASQUERADE rule for iptables because the local version of iptables does not support it
May 17 10:16:25 localhost kubelet: I0517 10:16:25.044843 35643 cpu_manager.go:161] [cpumanager] starting with none policy
May 17 10:16:25 localhost kubelet: I0517 10:16:25.044878 35643 cpu_manager.go:162] [cpumanager] reconciling every 10s
May 17 10:16:25 localhost kubelet: I0517 10:16:25.044885 35643 policy_none.go:42] [cpumanager] none policy: Start
May 17 10:16:25 localhost kubelet: F0517 10:16:25.044900 35643 kubelet.go:1380] Failed to start ContainerManager failed to get rootfs info: unable to find data in memory cache
May 17 10:16:25 localhost systemd: kubelet.service: main process exited, code=exited, status=255/n/a
May 17 10:16:25 localhost systemd: Unit kubelet.service entered failed state.
May 17 10:16:25 localhost systemd: kubelet.service failed.
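
In a startup log like the one above, the few lines that matter (the `E`, `W` and `F` entries) are easy to lose among the `I` lines. Below is a minimal sketch of a filter that keeps only the non-info kubelet entries. The regular expression is an assumption inferred from the header layout visible in these lines (severity letter, month and day, time, PID, `file:line]`), and `non_info_entries` is a hypothetical helper name, not part of any Kubernetes tooling. Lines without that header, such as the `dockerd` warning, are skipped.

```python
import re

# Header layout assumed from the log above:
#   <severity><mmdd> <hh:mm:ss.micros> <pid> <file>:<line>] <message>
KLOG_RE = re.compile(
    r"(?P<sev>[IWEF])(?P<mmdd>\d{4}) (?P<time>\d{2}:\d{2}:\d{2}\.\d+)\s+"
    r"(?P<pid>\d+) (?P<file>[\w.]+):(?P<line>\d+)\] (?P<msg>.*)"
)


def non_info_entries(lines):
    """Return (severity, "file:line", message) for each W/E/F entry."""
    found = []
    for raw in lines:
        m = KLOG_RE.search(raw)
        if m and m.group("sev") != "I":
            source = f"{m.group('file')}:{m.group('line')}"
            found.append((m.group("sev"), source, m.group("msg")))
    return found


# Four lines copied from the log above: one info, one error, one warning, one fatal.
sample = [
    'May 17 10:16:24 localhost kubelet: I0517 10:16:24.953465 35643 server.go:1065] Started kubelet',
    'May 17 10:16:24 localhost kubelet: E0517 10:16:24.954370 35643 kubelet.go:1302] Image garbage collection failed once. Stats initialization may not have completed yet: failed to get imageFs info: unable to find data in memory cache',
    'May 17 10:16:25 localhost kubelet: W0517 10:16:25.003226 35643 container.go:523] Failed to update stats for container "/": strconv.ParseUint: parsing "(unknown)": invalid syntax, continuing to push stats',
    'May 17 10:16:25 localhost kubelet: F0517 10:16:25.044900 35643 kubelet.go:1380] Failed to start ContainerManager failed to get rootfs info: unable to find data in memory cache',
]

for sev, source, msg in non_info_entries(sample):
    print(sev, source, msg)
```

Run against the full log, this leaves three kubelet entries: the image GC error, the stats warning for container "/", and the fatal ContainerManager failure that precedes the systemd exit.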

mikedanese added the sig/node label (Categorizes an issue or PR as relevant to SIG Node) May 25, 2016

MaikuMori mentioned this issue Sep 24, 2016: Image GC fails: Unable to remove filesystem - device or resource busy #33433 (Closed)

vishh assigned dashpole Mar 9, 2017

dashpole mentioned this issue Mar 10, 2017: Clearer ImageGC failure errors. Fewer events. #42916 (Merged)

k8s-ci-robot closed this as completed Apr 4, 2017
