kubelet Image garbage collection failed: unable to find data for container / #26000
Comments
I suspect that this might be due to On Fri, May 20, 2016 at 3:52 PM, Adam Zell notifications@github.com wrote:
|
I'm getting the same thing with CoreOS / k8s 1.2.4 using default overlay storage driver:
|
cc @ronnielai |
The error seems to come from cadvisor. cc @timstclair @matthughes. could you provide the output of localhost:4194/api/v2.1/storage |
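For anyone asked for the same output: any HTTP client run on the node can fetch that endpoint. Here is a minimal Python sketch, assuming cAdvisor is reachable on port 4194 as in the request above. The function names are illustrative, and the response is only pretty-printed because its schema is not shown in this thread.

```python
import json
import urllib.request

# URL taken from the request above; adjust host/port if cAdvisor is exposed elsewhere.
STORAGE_URL = "http://localhost:4194/api/v2.1/storage"


def pretty_json(raw: bytes) -> str:
    """Decode a JSON payload and re-indent it so it is easy to paste into an issue."""
    return json.dumps(json.loads(raw.decode("utf-8")), indent=2, sort_keys=True)


def fetch_storage(url: str = STORAGE_URL, timeout: float = 5.0) -> str:
    """Fetch the cAdvisor storage endpoint and return it as indented JSON text."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return pretty_json(resp.read())

# Usage (run on the affected node): print(fetch_storage())
```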
I also get the same error log on the node. I use the overlay storage driver.
|
Same here, kubernetes version 1.3.4 using CoreOS with default overlay fs |
Getting this when running Kubernetes v1.4.6 on Ubuntu with AUFS |
Also seeing this with kubernetes v1.5.0-beta.3, Ubuntu 16.04.1, docker storage driver: overlay, docker version: 1.12.3. But sometimes image garbage collection succeeds:
@ronnielai asked for localhost:4194/api/v2.1/storage , so:
|
I'm getting the same "garbage collection failed" error (Kubernetes server v1.5.3 and Docker 1.12.6):
Mar 06 16:22:36 ip-10-43-0-20 kubelet[813]: E0306 16:22:36.439499 813 kubelet.go:1145] Image garbage collection failed: unable to find data for container /
|
Is this a one-off error?
|
@vishh, I don't understand your question. |
Is GC failing continuously or is it failing at arbitrary times?
|
Corrections...
I0307 20:12:05.897288 9095 manager.go:204] Version: {KernelVersion:4.4.0-45-generic ContainerOsVersion:Ubuntu 16.04.1 LTS DockerVersion:1.12.6 CadvisorVersion: CadvisorRevision:}
I0307 20:41:46.658932 11568 manager.go:204] Version: {KernelVersion:4.4.0-45-generic ContainerOsVersion:Ubuntu 16.04.1 LTS DockerVersion:1.12.6 CadvisorVersion: CadvisorRevision:}
|
I am not clear on how to determine whether garbage collection is failing. For example, if you execute the command "kubelet logs" every minute, you will see these messages:
- Started kubelet v1.5.2
- Image garbage collection failed: unable to find data for container /
Does that mean the kubelet process dies and restarts every minute? I checked the kubelet process (/usr/bin/kubelet --kubeconfig=/etc/kubernetes/kubelet.conf) and its elapsed time is 25 minutes. Are there logs that show when and why garbage collection of images and containers failed? Thanks in advance. |
cc @dashpole
|
kubernetes: 1.5.3 I'm not sure if it's related but there is
Full cadvisor validate report: log:
|
This error may or may not be benign. It usually occurs when the kubelet tries to get metrics before the first metrics have been collected. This is normally not a problem, as the kubelet eventually retries and should succeed once metrics collection has started. @bamb00, my best guess is that this is benign in your case, since I see
For anyone else who thinks they may be having metrics collection issues, look for the following log lines (in kubelet.log) to help debug: |
Regardless of whether we find bugs in garbage collection, I'll update the error message to make it more obvious that this frequently occurs during initialization. |
@xmik you also appear to have a restarting kubelet. Note that the process numbers each time you see the error message are different. |
Also, for anyone debugging this: success is logged only once after a failure. This was done to reduce log spam. |
@azell, your kubelet also appears to have restarted, as the process numbers are different in each log. |
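The restart check described above (different process numbers on each log line) can be automated. This is a minimal sketch, assuming the log lines carry the header format seen in this thread: severity and date, time, process number, then `file:line]`. The regular expression and function name are illustrative. The sample lines reuse the headers from the "Corrections..." comment above, with the message bodies shortened.

```python
import re

# Header as seen in this thread: severity+MMDD, time, process number, file:line].
# Searched anywhere in the line so a syslog/journald prefix in front is tolerated.
KLOG_HEADER = re.compile(r"[IWEF]\d{4} \d{2}:\d{2}:\d{2}\.\d+\s+(\d+)\s+\S+:\d+\]")


def pids_in_order(lines):
    """Return the distinct process numbers in order of first appearance."""
    seen = []
    for line in lines:
        match = KLOG_HEADER.search(line)
        if match:
            pid = int(match.group(1))
            if pid not in seen:
                seen.append(pid)
    return seen


# Headers copied from the thread; message bodies shortened for the example.
sample = [
    "I0307 20:12:05.897288 9095 manager.go:204] Version: (shortened)",
    "I0307 20:41:46.658932 11568 manager.go:204] Version: (shortened)",
]
pids = pids_in_order(sample)
print(pids, "-> kubelet restarted" if len(pids) > 1 else "-> no restart seen")
```

More than one distinct process number means the kubelet was restarted between those log lines, so each "Image garbage collection failed" message may simply be the first GC attempt of a fresh process.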
Thanks @dashpole for the explanation. I confirm that I see this message right after the kubelet was just started and after 5 minutes garbage collection succeeds:
There have been no more such messages since this VM was started. The same holds on the k8s worker VM. (And I no longer have a restarting kubelet.) However, I see another log message, repeated in this manner:
I thought maybe it is connected to this issue, because it concerns stats. But you @dashpole already commented on this here which answered my concerns and I will happily wait for your PR to be cherrypicked. |
To say it more clearly: I had an unstable cluster back then when I saw a lot of |
Automatic merge from submit-queue
Clearer ImageGC failure errors. Fewer events.
Addresses kubernetes#26000. Kubelet often "fails" image garbage collection if cAdvisor has not completed the first round of stats collection. Don't create events for a single failure, and make log messages more specific.
@kubernetes/sig-node-bugs
/close |
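The behaviour described in this thread is that a single failure is logged without an event, repeated failures are reported more loudly, and success is logged only once after a failure. It can be sketched as a small state machine. This is an illustration in Python, not the kubelet's actual code. The "failed once" wording matches the message quoted later in this thread, while the other two message strings are hypothetical.

```python
def gc_log_messages(results):
    """Map GC outcomes (None = success, str = error text) to the log lines emitted."""
    messages = []
    prev_failed = False
    for err in results:
        if err is not None:
            if prev_failed:
                # Repeated failure: this is where an event would also be created.
                messages.append(
                    f"Image garbage collection failed multiple times in a row: {err}"
                )
            else:
                # First failure: log only, no event, and hint at initialization.
                messages.append(
                    "Image garbage collection failed once. "
                    f"Stats initialization may not have completed yet: {err}"
                )
            prev_failed = True
        else:
            if prev_failed:
                # Success is reported once, directly after a failure.
                messages.append("Image garbage collection succeeded")
            prev_failed = False
    return messages


# A failure at startup followed by two successful rounds yields two log lines.
for line in gc_log_messages(["unable to find data for container /", None, None]):
    print(line)
```

With this scheme, a lone "failed once" line right after kubelet startup followed by a single success line is the expected, benign pattern.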
Hi, I am getting this on
|
Experiencing same on kubernetes v1.6.2 on azure.
|
@ichekrygin, @phagunbaya: if you look at the Kubernetes source code across tags, you'll see that #42916 was merged in v1.7.0-alpha.1. |
I encounter the same issue like this |
Same issue with 1.15.10:
Mar 19 08:13:44 XXXXXXXX kubelet.daemon[6354]: E0319 08:13:44.820776 6354 kubelet.go:1294] Image garbage collection failed once. Stats initialization may not have completed yet: failed to garbage collect required amount of images. Wanted to free 788529152 bytes, but freed 0 bytes
|
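Note that the wrapped error in the 1.15.10 log is not "unable to find data for container /". It reports that GC ran but fell short of its target, which may point to a different cause than stats initialization. Here is a minimal sketch for pulling the two byte counts out of such a line and converting them to MiB. The regular expression and function name are illustrative.

```python
import re

FREE_RE = re.compile(r"Wanted to free (\d+) bytes, but freed (\d+) bytes")


def parse_free_amounts(message):
    """Return (wanted_bytes, freed_bytes), or None if the clause is absent."""
    match = FREE_RE.search(message)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


# Message text copied from the 1.15.10 log line above.
msg = (
    "Image garbage collection failed once. Stats initialization may not have "
    "completed yet: failed to garbage collect required amount of images. "
    "Wanted to free 788529152 bytes, but freed 0 bytes"
)
wanted, freed = parse_free_amounts(msg)
print(f"wanted {wanted / 2**20:.0f} MiB, freed {freed / 2**20:.0f} MiB")
# prints: wanted 752 MiB, freed 0 MiB
```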
Cluster 1.2.2 settings:
When the node disk space is low enough to trigger GC, nothing happens.
cAdvisor output looks OK:
Everything else in the cluster seems to be working. Any ideas on how to debug? For now I manually removed dangling and older Docker images.
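To reason about whether GC should have triggered at all, it can help to compute the expected target by hand. The sketch below is an assumption about how the kubelet's image GC policy behaves, not code taken from it. GC starts once disk usage reaches the high threshold (`--image-gc-high-threshold`) and tries to free enough to get back to the low threshold (`--image-gc-low-threshold`). All numbers in the example are hypothetical, since the cluster settings are not shown above.

```python
def bytes_to_free(capacity, available, high_pct, low_pct):
    """Return how many bytes image GC would aim to free, or 0 below the high threshold."""
    usage_pct = 100 - (available * 100 // capacity)
    if usage_pct < high_pct:
        return 0
    # Aim to bring usage back down to the low threshold.
    return capacity * (100 - low_pct) // 100 - available


# Hypothetical disk: 1000 units total, thresholds of 90% (high) and 80% (low).
print(bytes_to_free(1000, 50, 90, 80))   # 95% used, above the high threshold
print(bytes_to_free(1000, 200, 90, 80))  # 80% used, below the high threshold
```

If the first kind of result is non-zero for the node's real numbers and no images are removed, the next things to check are the cAdvisor stats and the kubelet log lines discussed in the comments.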