kubelet cannot find device for dir /var/lib/kubelet in cached partitions map #38337

Closed
dmrub opened this issue Dec 8, 2016 · 11 comments

dmrub commented Dec 8, 2016

Is this a request for help? (If yes, you should use our troubleshooting guide and community support channels, see http://kubernetes.io/docs/troubleshooting/.):

What keywords did you search in Kubernetes issues before filing this one? (If you have found any duplicates, you should instead reply there.): "btrfs kubelet"


Is this a BUG REPORT or FEATURE REQUEST? (choose one): BUG REPORT

Kubernetes version (use kubectl version): v1.4.6

Environment:

What happened:
kubelet continuously reports the following error message:

Dec 06 01:27:26 vilnus kubelet[2447]: E1206 01:27:26.798004    2447 kubelet.go:2132] Failed to check if disk space is available on the root partition: failed to get fs info for "root": error trying to get filesystem Device for dir /var/lib/kubelet: err: could not find device with major: 0, minor: 37 in cached partitions map

As a side effect (I assume), the heapster pod fails with the following error message:

2016-12-07T22:48:05.074258000Z E1207 22:48:05.073677       1 summary.go:114] error while getting metrics summary from Kubelet antego(192.168.81.104:10255): request failed - "500 Internal Server Error", response: "Internal Error: failed RootFsInfo: error trying to get filesystem Device for dir /var/lib/kubelet: err: could not find device with major: 0, minor: 36 in cached partitions map"

When kubelet starts, its partition map contains no device with major:minor ID '0:37':

Dec 06 01:27:21 vilnus kubelet[2447]: I1206 01:27:21.304451    2447 fs.go:117] Filesystem partitions: map[/dev/sda4:{mountpoint:/ major:0 minor:35 fsType:btrfs blockSize:0} /dev/sda2:{mountpoint:/boot major:8 minor:2 fsType:xfs blockSize:0}]

The stat tool reports the root device as 0:37 (i.e. 25h/37d), not 0:35:

[root@vilnus rubinstein]# stat /
  File: ‘/’
  Size: 152       	Blocks: 0          IO Block: 4096   directory
Device: 25h/37d	Inode: 256         Links: 1
Access: (0555/dr-xr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root)
Context: system_u:object_r:root_t:s0
Access: 2016-12-07 23:49:42.368681876 +0100
Modify: 2016-12-06 01:27:50.993617031 +0100
Change: 2016-12-06 01:27:50.993617031 +0100
 Birth: -
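
For anyone who wants to repeat this comparison, the device number that stat prints can be split into its major and minor parts programmatically. The following is a minimal Python sketch, not kubelet or cAdvisor code; the helper names `decode` and `device_id` are made up for illustration, and it relies only on `os.stat`, `os.major` and `os.minor` from the standard library.

```python
import os


def decode(st_dev):
    """Split a raw st_dev value into (major, minor)."""
    return os.major(st_dev), os.minor(st_dev)


def device_id(path):
    """Return the (major, minor) pair that stat() reports for path."""
    return decode(os.stat(path).st_dev)


if __name__ == "__main__":
    # 0x25 is the "25h/37d" device value from the stat output above.
    print("stat output decodes to:", decode(0x25))
    # Device ID of the root directory on the machine running this script.
    print("device of /:", device_id("/"))
```

Decoding 0x25 this way gives major 0 and minor 37, which is the ID kubelet fails to find in its partition map.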

Here is /etc/fstab as well:

#
# /etc/fstab
# Created by anaconda on Tue Nov  8 15:25:05 2016
#
# Accessible filesystems, by reference, are maintained under '/dev/disk'
# See man pages fstab(5), findfs(8), mount(8) and/or blkid(8) for more info
#
UUID=4706dd7f-81aa-4bb6-bcb6-e742aae08456 /                       btrfs   subvol=root     0 0
UUID=72a8c6a4-b442-48ef-a6ad-a8da2146fab6 /boot                   xfs     defaults        0 0
UUID=039B-CD7E          /boot/efi               vfat    umask=0077,shortname=winnt 0 0
UUID=56f454ff-1d8e-4900-b034-1f777df92c30 swap                    swap    defaults        0 0
UUID=4706dd7f-81aa-4bb6-bcb6-e742aae08456 /mnt/disk-4706dd7f-81aa-4bb6-bcb6-e742aae08456 btrfs   subvol=/ 0 0

The same errors appear on all nodes, and every node has / mounted on a btrfs partition.

What you expected to happen:

No errors. Kubernetes 1.2.0, installed from the CentOS 7 package, had no such errors.

How to reproduce it (as minimally and precisely as possible):

Kubernetes binaries were installed from here:
https://storage.googleapis.com/kubernetes-release/release/v1.4.6/bin/linux/amd64

Run kubelet on a btrfs root partition.

Anything else do we need to know:

I would at least like some recommendations on how to figure out what exactly the issue is.

dmrub commented Dec 8, 2016

Looking into the /proc/PID/mountinfo file of the kubelet process, I see a 0:35 device but no 0:37:

# grep '0:35' /proc/24371/mountinfo
62 1 0:35 /root / rw,relatime shared:1 - btrfs /dev/sda4 rw,seclabel,ssd,space_cache
74 62 0:35 / /mnt/disk-4706dd7f-81aa-4bb6-bcb6-e742aae08456 rw,relatime shared:28 - btrfs /dev/sda4 rw,seclabel,ssd,space_cache

This looks like an issue with btrfs subvolumes, related to these threads:
https://www.spinics.net/lists/linux-btrfs/msg58908.html
https://www.spinics.net/lists/linux-btrfs/msg59039.html
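
The failing lookup can be imitated outside of kubelet. Below is a rough Python sketch, assuming the standard /proc/[pid]/mountinfo field layout; `parse_mountinfo` is a hypothetical helper and not the cAdvisor implementation. The sample text is the two lines from the grep output above.

```python
SAMPLE_MOUNTINFO = """\
62 1 0:35 /root / rw,relatime shared:1 - btrfs /dev/sda4 rw,seclabel,ssd,space_cache
74 62 0:35 / /mnt/disk-4706dd7f-81aa-4bb6-bcb6-e742aae08456 rw,relatime shared:28 - btrfs /dev/sda4 rw,seclabel,ssd,space_cache
"""


def parse_mountinfo(text):
    """Build a map from (major, minor) to the mount entries that use it."""
    table = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        # Fields before " - " are per-mount; fields after it describe the filesystem.
        left, _, right = line.partition(" - ")
        fields = left.split()
        fs_type, source = right.split()[:2]
        major, minor = (int(n) for n in fields[2].split(":"))
        table.setdefault((major, minor), []).append(
            {
                "root": fields[3],          # path inside the filesystem (e.g. subvolume)
                "mount_point": fields[4],
                "fs_type": fs_type,
                "source": source,
            }
        )
    return table


if __name__ == "__main__":
    table = parse_mountinfo(SAMPLE_MOUNTINFO)
    for wanted in [(0, 35), (0, 37)]:
        entries = table.get(wanted)
        if entries is None:
            print(wanted, "-> not in partitions map")
        else:
            print(wanted, "->", [e["mount_point"] for e in entries])
```

With the sample lines, 0:35 resolves to both btrfs mount points while 0:37, the ID stat reports for /, is missing. That is the same mismatch the kubelet error describes.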

dmrub added a commit to dmrub/kubernetes that referenced this issue Dec 9, 2016

ligc commented Jan 13, 2017

Hi, we ran into this issue on SLES 12 nodes, and the patch dc8b6cc does fix it, thanks. Is there any plan to integrate the patch into Kubernetes?

dmrub added a commit to dmrub/kubernetes that referenced this issue Jan 13, 2017

dmrub commented Jan 13, 2017

I created a pull request, but I still need to sign the CLA.

dmrub commented Jan 15, 2017

I created a pull request for cadvisor: google/cadvisor#1574

dmrub added a commit to dmrub/kubernetes that referenced this issue Feb 10, 2017
dmrub added a commit to dmrub/cadvisor that referenced this issue Mar 15, 2017
@dashpole

This will hopefully make it into one of the first few patch releases of 1.6.

mcluseau commented Apr 7, 2017

The same issue happens with tmpfs roots:

E0407 05:28:05.147388       1 summary.go:97] error while getting metrics summary from Kubelet 10.109.1.4(10.109.1.4:10255): request failed - "500 Internal Server Error", response: "Internal Error: failed RootFsInfo: error trying to get filesystem Device for dir /var/lib/kubelet: err: could not find device with major: 0, minor: 35 in cached partitions map"
# grep 0:35.*kubelet /proc/self/mountinfo 
462 199 0:35 /var/lib/kubelet /var/lib/kubelet rw,relatime shared:1 - tmpfs tmpfs rw,seclabel

In my case, I use CoreOS's kubelet-wrapper (rkt with the "fly" stage0) and /var/lib/kubelet is bind-mounted rshared.
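
To check which mount entry covers a directory and which device ID mountinfo lists for it, a longest-prefix match on the mount point is one simple approach. Here is a Python sketch under that assumption; `mount_for_path` is a hypothetical helper, and the sample line is the one from the grep output above.

```python
SAMPLE_MOUNTINFO = (
    "462 199 0:35 /var/lib/kubelet /var/lib/kubelet "
    "rw,relatime shared:1 - tmpfs tmpfs rw,seclabel\n"
)


def mount_for_path(mountinfo_text, path):
    """Return (device_id, mount_point) for the mount covering path, or None.

    The covering mount is taken to be the one with the longest mount point
    that is a prefix of path. Octal escapes such as \\040 in mount points
    are not decoded in this sketch.
    """
    best = None
    for line in mountinfo_text.splitlines():
        fields = line.split()
        if len(fields) < 5:
            continue
        device_id, mount_point = fields[2], fields[4]
        prefix = mount_point.rstrip("/") + "/"
        covers = path == mount_point or path.startswith(prefix)
        # ">=" lets a later mount on the same mount point shadow an earlier one.
        if covers and (best is None or len(mount_point) >= len(best[1])):
            best = (device_id, mount_point)
    return best


if __name__ == "__main__":
    print(mount_for_path(SAMPLE_MOUNTINFO, "/var/lib/kubelet"))
    # On a live system, the real table could be read instead:
    #   with open("/proc/self/mountinfo") as f:
    #       print(mount_for_path(f.read(), "/var/lib/kubelet"))
```

For the sample line this returns 0:35 for /var/lib/kubelet, the same ID that appears in the error message.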

dashpole commented Apr 7, 2017

@MikaelCluseau, your bug looks more similar to #44059, since the major and minor numbers all match up. @dmrub, I would prefer closing this issue, as it was related to btrfs, and moving discussion on @MikaelCluseau's bug to #44059. For anyone experiencing the original problem, @dmrub's solution is included in the v1.5.6 patch release.

@dmrub dmrub closed this as completed Apr 7, 2017

mcluseau commented Apr 8, 2017

Thanks @dashpole, I'll move there.

flavio added a commit to flavio/kubernetes that referenced this issue Jun 7, 2017
This commit fixes the warning messages reported by kubelet when
checking for the disk space on a btrfs `/` which has `/var/lib/kubelet`
inside of a btrfs sub-volume.

This fix follows the same principle adopted to fix issue kubernetes#38337 with
commit dc8b6cc.

This commit fixes issue 47046.

Signed-off-by: Flavio Castelli <fcastelli@suse.com>
@naevtamarkus

Does anybody know if this is still an issue?

zdzichu commented Dec 16, 2020

Yes, still happening with v1.19.4+k3s-fadc5a80 on Fedora 33 with btrfs /.

@adambkaplan

It appears there was a regression from v1.18 to v1.19 for systems with btrfs. A fix to cAdvisor was apparently required and will be in k8s 1.21; backports to 1.19 are in progress.

See also #95826 and #94335.
