Error message not helpful when there is no space left on device #74169

itadventurer · 2019-02-16T21:04:10Z

After hours of debugging why Kubernetes boots up successfully but crashes after a few seconds or minutes, I finally nailed down the issue:

What happened:

Start kubelet
```
systemctl start kubelet
```
The control plane boots up
Wait 30 seconds to a few minutes
All pods/Docker containers are stopped (even kubeapi, etcd, …)

The only hint of insufficient disk space is buried deep in the kubelet logs (`journalctl --unit kubelet`):

kubelet[27516]: W0216 eviction_manager.go:329] eviction manager: attempting to reclaim ephemeral-storage
kubelet[27516]: I0216 eviction_manager.go:340] eviction manager: must evict pod(s) to reclaim ephemeral-storage
kubelet[27516]: I0216 eviction_manager.go:358] eviction manager: pods ranked for eviction: etcd-***_kube-system(939d
kubelet[27516]: I0216 eviction_manager.go:563] eviction manager: pod etcd-***_kube-system(939d4dd36e909e6a6bbc5874ae
kubelet[27516]: I0216 eviction_manager.go:187] eviction manager: pods etcd-***_kube-system(939d4dd36e909e6a6bbc5874a
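Because these lines are easy to miss in a long journal, a small filter can help surface them. The following is a minimal sketch, not part of the original report: the helper name `eviction_lines` is hypothetical, and the match string is simply the `eviction manager:` prefix seen in the excerpt above.

```shell
# Keep only eviction-manager lines from kubelet log text read on stdin.
# The match string comes from the log excerpt in this issue; the function
# name is a hypothetical helper, not a kubelet or kubeadm tool.
eviction_lines() {
    grep -F 'eviction manager:'
}

# Typical use on a systemd host (left as a comment so the sketch has no
# side effects when sourced):
#   journalctl --unit kubelet --no-pager | eviction_lines
```

If the filter prints anything shortly after the control plane goes down, resource pressure on the node is a likely cause.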

The underlying issue was that the disk was almost full (97% used, 300 MB free), so the kubelet evicted every pod on the node to reclaim ephemeral storage.

What you expected to happen:

A more helpful message would be nice. This one is very hard to find and understand.

Maybe this should be added as a preflight check to kubeadm?
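As an illustration of what such a check could look like, here is a rough sketch in shell. It is not kubeadm code. The function names are hypothetical, and the 10% figure in the usage note assumes the kubelet's default hard eviction threshold `nodefs.available<10%`, which a cluster may override. The percentage is derived from POSIX `df -P -k` output and may differ slightly from what the kubelet itself computes.

```shell
# Print the percentage of space available on the filesystem that holds
# directory $1, based on the "1024-blocks" and "Available" columns of
# POSIX `df -P -k`. Prints nothing if the filesystem reports zero size.
avail_percent() {
    df -P -k "$1" | awk 'NR == 2 && $2 > 0 { printf "%d\n", ($4 * 100) / $2 }'
}

# check_disk DIR MIN_PERCENT
# Returns 0 when at least MIN_PERCENT is available, 1 when less is
# available, and 2 when the percentage could not be determined.
check_disk() {
    dir=$1
    min=$2
    pct=$(avail_percent "$dir")
    [ -n "$pct" ] || return 2
    if [ "$pct" -lt "$min" ]; then
        echo "preflight: only ${pct}% available on the filesystem of ${dir} (minimum ${min}%); the kubelet may evict pods" >&2
        return 1
    fi
    return 0
}
```

A call such as `check_disk /var/lib/kubelet 10` would flag a node like the one in this report (about 3% free) before the control plane starts, assuming the kubelet uses its default root directory.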

How to reproduce it (as minimally and precisely as possible):

Install Kubernetes on a drive with too little free disk space.

Anything else we need to know?:

Environment:

Kubernetes version (use kubectl version): v1.13.3
Cloud provider or hardware configuration: Bare metal
OS (e.g. from /etc/os-release): CentOS 7
Kernel (e.g. uname -a): 3.10.0
Install tools: kubeadm
Others:

neolit123 · 2019-02-18T18:49:37Z

/sig node

zouyee · 2019-02-23T07:35:42Z

You can run `kubectl describe node {NODE-NAME}`, which should report the DiskPressure condition.

itadventurer · 2019-02-24T14:26:13Z

Yes, that would be one way to go. But bear in mind that in my case I had only one master, which stayed up for only ~30s. So I would need to start the kubelet service and be fast enough to enter the command.
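One way to work around the short window is to poll for the condition in a loop instead of typing the command by hand. The sketch below is only an illustration: the function names are hypothetical, it queries the condition with `kubectl get node` and a JSONPath filter rather than `kubectl describe`, and it assumes `kubectl` is already configured to reach the API server.

```shell
# Print the status ("True", "False" or "Unknown") of the DiskPressure
# condition of node $1.
disk_pressure_status() {
    kubectl get node "$1" \
        -o jsonpath='{.status.conditions[?(@.type=="DiskPressure")].status}'
}

# wait_for_disk_pressure NODE TRIES
# Query once per second, up to TRIES times, until the API server answers.
# Prints the status and returns 0 on success; returns 1 if no attempt
# produced an answer.
wait_for_disk_pressure() {
    node=$1
    tries=$2
    while [ "$tries" -gt 0 ]; do
        if status=$(disk_pressure_status "$node" 2>/dev/null) && [ -n "$status" ]; then
            echo "$status"
            return 0
        fi
        tries=$((tries - 1))
        sleep 1
    done
    return 1
}
```

Starting `wait_for_disk_pressure {NODE-NAME} 120` in a second terminal just before `systemctl start kubelet` would print the condition as soon as the API server responds. Whether the node has already reported DiskPressure at that moment depends on timing, so the kubelet journal remains the more reliable source.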

fejta-bot · 2019-05-25T14:51:35Z

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

fejta-bot · 2019-06-24T15:39:04Z

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

fejta-bot · 2019-07-24T16:25:02Z

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

k8s-ci-robot · 2019-07-24T16:25:10Z

@fejta-bot: Closing this issue.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

itadventurer added the kind/bug label Feb 16, 2019

k8s-ci-robot added the needs-sig label Feb 16, 2019

k8s-ci-robot added the sig/node label and removed the needs-sig label Feb 18, 2019

k8s-ci-robot added the lifecycle/stale label May 25, 2019

k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label Jun 24, 2019

k8s-ci-robot closed this as completed Jul 24, 2019
