
Do NOT install Rook in a Kind/k3os cluster(s) (it'll destroy your volumes) #9470

Closed
ghost opened this issue Dec 19, 2021 · 37 comments · Fixed by #10230

@ghost

ghost commented Dec 19, 2021

I was testing out Rook in a Kind cluster on my Linux laptop. I set up the CRDs and installed cluster-test.yaml last night. Today, I noticed that my drives started to behave strangely (they couldn't be decrypted or mounted anymore), so I decided to reboot. My root volume and my other volume had been converted into ceph_bluestore volumes, and were therefore unmountable. It literally destroyed over 3TB of data.

A wild guess: since my user is part of the docker group, the underlying system saw these volumes and decided to convert them all to Ceph volumes. This is either a major bug or a feature.

I have had to re-install my OS and restore a volume from backup due to this.

$ sudo cryptsetup luksOpen /dev/sda1 vol
[sudo] password for x:         
Device /dev/sda1 is not a valid LUKS device.

$ sudo lsblk -f
NAME                FSTYPE         LABEL UUID                                   FSAVAIL FSUSE% MOUNTPOINT
sda                                                                                            
└─sda1              ceph_bluestore       

Probably here: https://github.com/rook/rook/blob/master/deploy/examples/cluster-test.yaml#L43-L45

  storage:
    useAllNodes: true
    useAllDevices: true # <- this is probably not safe for kind test clusters
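For anyone hitting this from a search: a safer variant of that storage block tells Rook exactly which devices it may consume instead of claiming everything it can see. The following is only a sketch based on the CephCluster storage settings; the deviceFilter regex is a placeholder, not a value from this report.

  storage:
    useAllNodes: true
    useAllDevices: false        # do not claim every visible block device
    deviceFilter: "^sd[b-z]$"   # only devices matching this regex are considered (placeholder pattern)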
@ghost ghost added the bug label Dec 19, 2021
@leseb
Member

leseb commented Dec 20, 2021

Hi @withernet, sorry about this; we have indeed had similar reports with Kind in the past. Our dev environment guide reflects this: https://rook.io/docs/rook/v1.8/development-environment.html#minikube.
We would like to avoid such issues as much as possible and always encourage users to test in virtual machines instead of directly on their machines.

By any chance, do you have the logs from the prepare job, so that we can try to understand why the root disk was picked?
Thanks, and glad to see you had a backup.

@ghost
Author

ghost commented Dec 20, 2021

Unfortunately, I do not. I do remember seeing in the web UI that there was 3.9TB available and thinking that was weird. But... the two volumes I had added up to about 3.9TB.

I also want to say that it was all attached disks. It reformatted my NVMe root volume and my USB drive.

@leseb
Member

leseb commented Dec 21, 2021

> Unfortunately, I do not. I do remember seeing in the web UI that there was 3.9TB available and thinking that was weird. But... the two volumes I had added up to about 3.9TB.

Is Kind aggregating the storage and presenting it differently somehow?

> I also want to say that it was all attached disks. It reformatted my NVMe root volume and my USB drive.

Yes, that makes sense with useAllDevices: true.

@longtian

longtian commented Jan 7, 2022

😿 Same problem last year on my desktop. I lost all my ext4 drives while the NTFS drives remained untouched.

@ghost
Author

ghost commented Jan 7, 2022

Finally, independent confirmation ;)

@leseb leseb pinned this issue Jan 7, 2022
@preeefix

Leaving my mark here.

Environment: Bare Metal + k3os
Spun up a new instance, applied MetalLB, and ran:

kubectl apply -f crds.yaml
kubectl apply -f common.yaml
kubectl apply -f operator.yaml
kubectl apply -f cluster-test.yaml

Went to grab a drink from the fridge and chat with my fiancée. Came back to the console (iDRAC) spitting errors that k3s couldn't write to /var/log/.

Rebooted and got the GRUB screen.

@withernet Can you modify the title a bit to be specific to useAllDevices: true? It seems this isn't Kind-specific; rather, the prepare job isn't happy with the Docker passthrough of the devices.

@leseb Mind if I make a PR to change the cluster-test.yaml to have something like an example device selector and update the documentation accordingly?

@subhamkrai
Contributor

> Leaving my mark here.
>
> Environment: Bare Metal + k3os. Spun up a new instance, applied MetalLB, and ran:
>
> kubectl apply -f crds.yaml
> kubectl apply -f common.yaml
> kubectl apply -f operator.yaml
> kubectl apply -f cluster-test.yaml
>
> Went to grab a drink from the fridge and chat with my fiancée. Came back to the console (iDRAC) spitting errors that k3s couldn't write to /var/log/.
>
> Rebooted and got the GRUB screen.
>
> @withernet Can you modify the title a bit to be specific to useAllDevices: true? It seems this isn't Kind-specific; rather, the prepare job isn't happy with the Docker passthrough of the devices.
>
> @leseb Mind if I make a PR to change the cluster-test.yaml to have something like an example device selector and update the documentation accordingly?

@crdnl Sorry to hear that.
Regarding adding a device selector: we have the deviceSelector field, which accepts a regex. See this. Did you look at that?

@preeefix

@subhamkrai

> @crdnl Sorry to hear that.
> Regarding adding a device selector: we have the deviceSelector field, which accepts a regex. See this. Did you look at that?

Thankfully, it was a dev cluster, so there wasn't anything on it (yet). I did add a devices block to my nodes config (roughly along the lines of the sketch below).

I was more referring to modifying the provided sample file (in deploy/examples/) so that it does not use useAllDevices: true by default, but instead does something sane like a default device selector.
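A per-node devices block along those lines might look like the following. This is only a sketch; the node and device names are placeholders, not values from this cluster:

  storage:
    useAllNodes: false
    useAllDevices: false
    nodes:
      - name: "worker-1"      # placeholder node name
        devices:
          - name: "sdb"       # only explicitly listed devices may become OSDs
          - name: "nvme1n1"   # placeholder second device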


As for why it happened, I did some digging through my prepare logs on a new cluster, and paid attention to the preparer this time. Logs are at https://gist.github.com/crdnl/f63bc578e5108f72f4bfa0b405564368

On line #53, it does note that the device sda has a child, sda1, and considers the child instead. Following that, on L178, it does indeed mark it as available, and if I hadn't put a device selector on, it would have consumed it (again).

By tweaking the toolbox container (setting privileged, mounting /dev, and setting runAsUser: 0; see the sketch at the end of this comment), I got access to the Ceph tooling. Running ceph-volume inventory --format json-pretty, the partition is indeed reported as available:

{
    "available": true,
    "device_id": "",
    "lsm_data": {},
    "lvs": [],
    "path": "/dev/sda1",
    "rejected_reasons": [],
    "sys_api": {
        "holders": [],
        "human_readable_size": "279.40 GB",
        "sectors": "585935452",
        "sectorsize": 512,
        "size": 299998951424.0,
        "start": "2048"
    }
}

To add on, the output of lsblk for /dev/sda:

ouroboros [~]$ lsblk
NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
loop1    7:1    0  49.3M  1 loop /usr
loop2    7:2    0 312.4M  0 loop /usr/src
                                 /usr/lib/firmware
                                 /usr/lib/modules
sda      8:0    0 279.4G  0 disk
└─sda1   8:1    0 279.4G  0 part /k3os/system
                                 /boot
                                 /
sdb      8:16   0 279.4G  0 disk
<snip>

TL;DR:
My educated guess is that because the "root" of the container isn't actually mounted on /dev/sda or /dev/sda1, ceph-volume inventory isn't flagging it as locked like it probably should.
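As an aside, the toolbox tweaks mentioned above (privileged, /dev mounted, runAsUser: 0) correspond roughly to a pod spec fragment like the one below. It is a sketch of the idea only, not the exact manifest used here:

  containers:
    - name: rook-ceph-tools
      securityContext:
        privileged: true      # let ceph-volume probe raw block devices
        runAsUser: 0          # run as root inside the container
      volumeMounts:
        - name: dev
          mountPath: /dev     # expose the node's device files
  volumes:
    - name: dev
      hostPath:
        path: /dev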

@leseb
Member

leseb commented Jan 12, 2022

@crdnl Thanks for the detailed analysis. It's surprising that /dev/sda1 appears as available even though it is mounted. If you still have access to this reproducer, could you also set the CEPH_VOLUME_DEBUG env var, run ceph-volume again, and collect the ceph-volume.log content? That would be really useful for understanding why this device is reported as available.

@preeefix

@leseb Just ran CEPH_VOLUME_DEBUG=1 ceph-volume inventory /dev/sda1

[2022-01-14 21:27:55,502][ceph_volume.main][INFO  ] Running command: ceph-volume  inventory /dev/sda1
[2022-01-14 21:27:55,503][ceph_volume.process][INFO  ] Running command: /usr/bin/lsblk -plno KNAME,NAME,TYPE
[2022-01-14 21:27:55,508][ceph_volume.process][INFO  ] stdout /dev/loop1 /dev/loop1                                                                                                      loop
[2022-01-14 21:27:55,508][ceph_volume.process][INFO  ] stdout /dev/loop2 /dev/loop2                                                                                                      loop
[2022-01-14 21:27:55,508][ceph_volume.process][INFO  ] stdout /dev/sda   /dev/sda                                                                                                        disk
[2022-01-14 21:27:55,508][ceph_volume.process][INFO  ] stdout /dev/sda1  /dev/sda1                                                                                                       part
[2022-01-14 21:27:55,508][ceph_volume.process][INFO  ] stdout /dev/sdb   /dev/sdb                                                                                                        disk
[2022-01-14 21:27:55,508][ceph_volume.process][INFO  ] stdout /dev/sdc   /dev/sdc                                                                                                        disk
[2022-01-14 21:27:55,508][ceph_volume.process][INFO  ] stdout /dev/sdd   /dev/sdd                                                                                                        disk
[2022-01-14 21:27:55,508][ceph_volume.process][INFO  ] stdout /dev/sde   /dev/sde                                                                                                        disk
[2022-01-14 21:27:55,508][ceph_volume.process][INFO  ] stdout /dev/sdf   /dev/sdf                                                                                                        disk
[2022-01-14 21:27:55,508][ceph_volume.process][INFO  ] stdout /dev/sdg   /dev/sdg                                                                                                        disk
[2022-01-14 21:27:55,508][ceph_volume.process][INFO  ] stdout /dev/sdh   /dev/sdh                                                                                                        disk
[2022-01-14 21:27:55,509][ceph_volume.process][INFO  ] stdout /dev/sr0   /dev/sr0                                                                                                        rom
[2022-01-14 21:27:55,509][ceph_volume.process][INFO  ] stdout /dev/dm-0  /dev/mapper/ceph--f47c2c56--e8b1--43a9--8ef3--b7dbc48a939a-osd--block--89b9a18f--11f2--47bd--8b34--8fdb83a82242 lvm
[2022-01-14 21:27:55,509][ceph_volume.process][INFO  ] stdout /dev/dm-1  /dev/mapper/ceph--d4c98bae--148e--41d5--9e21--a0d0cf74024d-osd--block--e1aa46ad--8146--4118--8620--1966f7c9f61b lvm
[2022-01-14 21:27:55,509][ceph_volume.process][INFO  ] stdout /dev/dm-2  /dev/mapper/ceph--98f8cccf--0e83--489d--a138--9fd056f75e85-osd--block--629ae4b6--c966--4e68--804f--0f70acbfcf1d lvm
[2022-01-14 21:27:55,509][ceph_volume.process][INFO  ] stdout /dev/dm-3  /dev/mapper/ceph--47294791--355b--437e--aaa2--2d1d216b0db2-osd--block--237c1c3d--a0de--428c--a2ed--59d288cad08b lvm
[2022-01-14 21:27:55,509][ceph_volume.process][INFO  ] stdout /dev/dm-4  /dev/mapper/ceph--1797c0da--6c89--4409--a253--e9226df1bef9-osd--block--02062c5f--c8f0--43f4--a20e--bb7ed15e9899 lvm
[2022-01-14 21:27:55,509][ceph_volume.process][INFO  ] stdout /dev/dm-5  /dev/mapper/ceph--470f8894--67e8--43f4--a86e--697435a70af0-osd--block--ba3a5eb2--96a1--42e7--82f3--90a5da18cc90 lvm
[2022-01-14 21:27:55,509][ceph_volume.process][INFO  ] stdout /dev/dm-6  /dev/mapper/ceph--2aa44172--c97a--4e7c--ad91--5b25c6572980-osd--block--d90e0056--a7b2--4f22--a248--06292eb541e1 lvm
[2022-01-14 21:27:55,515][ceph_volume.process][INFO  ] Running command: /usr/sbin/lvs --noheadings --readonly --separator=";" -a --units=b --nosuffix -S lv_path=/dev/sda1 -o lv_tags,lv_path,lv_name,vg_name,lv_uuid,lv_size
[2022-01-14 21:27:55,621][ceph_volume.process][INFO  ] Running command: /usr/bin/lsblk --nodeps -P -o NAME,KNAME,MAJ:MIN,FSTYPE,MOUNTPOINT,LABEL,UUID,RO,RM,MODEL,SIZE,STATE,OWNER,GROUP,MODE,ALIGNMENT,PHY-SEC,LOG-SEC,ROTA,SCHED,TYPE,DISC-ALN,DISC-GRAN,DISC-MAX,DISC-ZERO,PKNAME,PARTLABEL /dev/sda1
[2022-01-14 21:27:55,625][ceph_volume.process][INFO  ] stdout NAME="sda1" KNAME="sda1" MAJ:MIN="8:1" FSTYPE="" MOUNTPOINT="/etc/resolv.conf" LABEL="" UUID="" RO="0" RM="0" MODEL="" SIZE="279.4G" STATE="" OWNER="root" GROUP="disk" MODE="brw-rw----" ALIGNMENT="0" PHY-SEC="512" LOG-SEC="512" ROTA="1" SCHED="mq-deadline" TYPE="part" DISC-ALN="0" DISC-GRAN="0B" DISC-MAX="0B" DISC-ZERO="0" PKNAME="sda" PARTLABEL=""
[2022-01-14 21:27:55,625][ceph_volume.process][INFO  ] Running command: /usr/sbin/blkid -c /dev/null -p /dev/sda1
[2022-01-14 21:27:55,627][ceph_volume.process][INFO  ] stdout /dev/sda1: LABEL="K3OS_STATE" UUID="5a9e85be-290b-4d7d-9fac-98209739e3d5" VERSION="1.0" BLOCK_SIZE="4096" TYPE="ext4" USAGE="filesystem" PART_ENTRY_SCHEME="dos" PART_ENTRY_UUID="12113f49-01" PART_ENTRY_TYPE="0x83" PART_ENTRY_FLAGS="0x80" PART_ENTRY_NUMBER="1" PART_ENTRY_OFFSET="2048" PART_ENTRY_SIZE="585935452" PART_ENTRY_DISK="8:0"
[2022-01-14 21:27:55,628][ceph_volume.process][INFO  ] Running command: /usr/sbin/pvs --noheadings --readonly --units=b --nosuffix --separator=";" -o vg_name,pv_count,lv_count,vg_attr,vg_extent_count,vg_free_count,vg_extent_size /dev/sda1
[2022-01-14 21:27:55,728][ceph_volume.process][INFO  ] stderr Failed to find physical volume "/dev/sda1".
[2022-01-14 21:27:55,729][ceph_volume.util.disk][INFO  ] opening device /dev/sda1 to check for BlueStore label
[2022-01-14 21:27:55,729][ceph_volume.util.disk][INFO  ] opening device /dev/sda to check for BlueStore label
[2022-01-14 21:27:55,729][ceph_volume.util.disk][INFO  ] opening device /dev/sda1 to check for BlueStore label
[2022-01-14 21:27:55,729][ceph_volume.util.disk][INFO  ] opening device /dev/sda to check for BlueStore label
[2022-01-14 21:27:55,729][ceph_volume.process][INFO  ] Running command: /usr/sbin/udevadm info --query=property /dev/sda1
[2022-01-14 21:27:55,736][ceph_volume.process][INFO  ] stdout DEVNAME=/dev/sda1
[2022-01-14 21:27:55,736][ceph_volume.process][INFO  ] stdout DEVPATH=/devices/pci0000:00/0000:00:01.0/0000:03:00.0/host11/target11:0:0/11:0:0:0/block/sda/sda1
[2022-01-14 21:27:55,736][ceph_volume.process][INFO  ] stdout DEVTYPE=partition
[2022-01-14 21:27:55,736][ceph_volume.process][INFO  ] stdout MAJOR=8
[2022-01-14 21:27:55,736][ceph_volume.process][INFO  ] stdout MINOR=1
[2022-01-14 21:27:55,736][ceph_volume.process][INFO  ] stdout PARTN=1
[2022-01-14 21:27:55,736][ceph_volume.process][INFO  ] stdout SUBSYSTEM=block

@ghost ghost changed the title Do NOT install Rook in a Kind cluster (it'll destroy your volumes) Do NOT install Rook in a Kind/k3os cluster(s) (it'll destroy your volumes) Jan 14, 2022
@leseb
Member

leseb commented Jan 17, 2022

@crdnl Thanks, but this isn't the full output, is it?

@ozzieba

ozzieba commented Jan 19, 2022

Just ran into this as well on k3os; I will try to provide a log after I reinstall the affected hosts. A couple of hosts were unaffected, probably because they used GPT.
Some preliminary analysis:

  • rook is calling ceph-volume, which checks for locked volumes by trying to open the device in read-write/exclusive mode
  • k3os does some strange things to mount its volumes. The root volume is mounted with --make-rprivate followed by pivot_root; /boot and /k3os/system mount parts of the same volume using bind mounts
  • I am able to mount my device read-write on a different directory after it has been mounted by k3os, not sure how to open it in exclusive mode without hosing that drive as well (note k3os has very few tools, eg no Python; I should be able to use crictl to try things, but figured I'd report what I've seen so far)

@leseb
Member

leseb commented Jan 24, 2022

> Just ran into this as well on k3os; I will try to provide a log after I reinstall the affected hosts. A couple of hosts were unaffected, probably because they used GPT. Some preliminary analysis:
>
>   • rook is calling ceph-volume, which checks for locked volumes by trying to open the device in read-write/exclusive mode
>   • k3os does some strange things to mount its volumes. The root volume is mounted with --make-rprivate followed by pivot_root; /boot and /k3os/system mount parts of the same volume using bind mounts
>   • I am able to mount my device read-write on a different directory after it has been mounted by k3os, not sure how to open it in exclusive mode without hosing that drive as well (note k3os has very few tools, eg no Python; I should be able to use crictl to try things, but figured I'd report what I've seen so far)

Thanks for the report. If we don't even have Python, it will be hard to simulate an O_EXCL open on a device. But honestly, what we need is a DEBUG log from ceph-volume inventory to really understand why the device is marked as available.

@github-actions

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in a week if no further activity occurs. Thank you for your contributions.

@github-actions

github-actions bot commented Apr 1, 2022

This issue has been automatically closed due to inactivity. Please re-open if this still requires investigation.

@github-actions github-actions bot closed this as completed Apr 1, 2022
@mateuszdrab

This definitely shouldn't be closed

@travisn
Member

travisn commented Apr 11, 2022

We really need to find more checks to ensure unintended devices aren't picked up...

@travisn travisn reopened this Apr 11, 2022
@galexrt
Member

galexrt commented Apr 11, 2022

As a quick "fix" to improve the docs, we could look into adding a red warning box to the Rook Ceph prerequisites and quickstart doc pages.
The red warning box would look like the existing "you are on the latest docs" warning box (a screenshot was attached here as an example).
We can change the color as well.
What do you think regarding the warning box?

Still, as @travisn mentioned in a quick chat, we should find out why ceph-volume thinks the volumes are not in use even though they are.
From the comments in this thread so far, it feels like the way the volumes are mounted inside the Kind/k3os container causes that information to be "lost" or not correctly accessible, making the volumes/disks appear unused.

@travisn
Member

travisn commented Apr 11, 2022

Since this issue was opened, we added a sentence in bold text to the opening section of the Quickstart guide. I wonder if it would be sufficient just to add a "WARNING:" prefix to make it more obvious, or move it to be the first thing in the prerequisites section.

@ghost
Author

ghost commented Apr 11, 2022

I think this error occurs when you use a tool other than minikube to play with Rook; it may be better to just make it abundantly clear to only use minikube, state that usage of any other tool is at the user's own risk, and then reference this bug.

@fzyzcjy

fzyzcjy commented Apr 25, 2022

Hi, are there any updates? This sounds like a big problem.

@travisn
Member

travisn commented Apr 25, 2022

> Hi, are there any updates? This sounds like a big problem.

Would the clarification in the docs suggested here be sufficient from your view?

@fzyzcjy

fzyzcjy commented Apr 25, 2022

@travisn Well, not really... Without reading this issue, I had almost forgotten that sentence (because, you know, we have to read a lot of documentation and compare multiple storage engines, etc. in a short time). If it were in a big red box it might be better, but that may make people think Rook is very dangerous.

@fzyzcjy

fzyzcjy commented Apr 25, 2022

Thus, IMHO this bug should be fixed, and with high priority: it is quite dangerous. It can be fixed either at the Rook level or by adding some protection in the Helm chart, etc.

@travisn
Member

travisn commented Apr 26, 2022

> Thus, IMHO this bug should be fixed, and with high priority: it is quite dangerous. It can be fixed either at the Rook level or by adding some protection in the Helm chart, etc.

Fixing it is of course desired, but detecting when the environment is invalid is the challenge. Do you have a suggestion for how Rook can detect when to prevent installation? Otherwise, we need to gather more logging as mentioned here.

@fzyzcjy

fzyzcjy commented Apr 27, 2022

@travisn I am not an expert here. But I guess Rook should not destroy volumes that originally had data inside them?

@travisn
Member

travisn commented Apr 27, 2022

> @travisn I am not an expert here. But I guess Rook should not destroy volumes that originally had data inside them?

Rook does check for existing filesystems or partitions on a device, and will not allow using such devices. So the remaining question is how we can detect other cases of disks that should be skipped where those checks are not sufficient.

@fzyzcjy

fzyzcjy commented Apr 27, 2022

@travisn Well, I am a bit confused. This bug report seems to say that existing files were destroyed. Those were existing filesystems or partitions, so Rook should not have allowed using them.

@galexrt
Member

galexrt commented Apr 27, 2022

@fzyzcjy I'll try to clear up the confusion: the problem seems to be that when using Kind/k3os (and similar environments where the host's disks can be seen from the node), the disks look "empty" for some reason and are therefore used by Rook Ceph as OSDs, e.g., #9470 (comment).

@leseb To get the ceph-volume debug logs, do I just need to set ROOK_LOG_LEVEL: DEBUG on the operator, or anything else?

@leseb
Member

leseb commented Apr 27, 2022

> @fzyzcjy I'll try to clear up the confusion: the problem seems to be that when using Kind/k3os (and similar environments where the host's disks can be seen from the node), the disks look "empty" for some reason and are therefore used by Rook Ceph as OSDs, e.g., #9470 (comment).
>
> @leseb To get the ceph-volume debug logs, do I just need to set ROOK_LOG_LEVEL: DEBUG on the operator, or anything else?

Wherever the CLI runs, ROOK_LOG_LEVEL: DEBUG is needed to get DEBUG-level logs.
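For reference, that is typically set in the operator configuration. A minimal sketch, assuming the rook-ceph-operator-config ConfigMap shipped with operator.yaml (the same value can also be set as an environment variable on the operator Deployment):

apiVersion: v1
kind: ConfigMap
metadata:
  name: rook-ceph-operator-config
  namespace: rook-ceph
data:
  ROOK_LOG_LEVEL: "DEBUG"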

@spdfnet
Contributor

spdfnet commented May 1, 2022

Rook has always ignored existing ext4 filesystems while using 'useAllDevices: true'.

@galexrt
Member

galexrt commented May 6, 2022

So... the good news is that I was able to reproduce the issue on the first try; the bad news is that I need to reinstall my laptop and wasn't able to save the logs...

I'll probably have the ceph-volume debug logs from the rook-ceph-osd-prepare Pod available in a few more hours, when I have my laptop ready for round 2.

@leseb Are you available today? I can gladly give you SSH access to get more debug info, as my laptop seemed quite responsive even with the disk being "re-organized" into multiple Ceph OSDs.

@leseb
Member

leseb commented May 6, 2022

> So... the good news is that I was able to reproduce the issue on the first try; the bad news is that I need to reinstall my laptop and wasn't able to save the logs...
>
> I'll probably have the ceph-volume debug logs from the rook-ceph-osd-prepare Pod available in a few more hours, when I have my laptop ready for round 2.
>
> @leseb Are you available today? I can gladly give you SSH access to get more debug info, as my laptop seemed quite responsive even with the disk being "re-organized" into multiple Ceph OSDs.

Logs should be enough for now :), please post them here.

@galexrt
Member

galexrt commented May 6, 2022

@leseb Here you go:

rook-operator-pod.log
rook-ceph-osd-prepare-pod.log

A quick look from my side shows that the ceph osd prepare job reports the disk as empty (no partitions).

rook-discover, on the other hand, correctly detects that nvme0n1 has partitions:

kubectl logs -n rook-ceph -f rook-discover-wjl5h
2022-05-06 13:42:43.212580 I | rookcmd: starting Rook v1.9.2 with arguments '/usr/local/bin/rook discover --discover-interval 60m --use-ceph-volume'
2022-05-06 13:42:43.212652 I | rookcmd: flag values: --discover-interval=1h0m0s, --help=false, --log-level=INFO, --operator-image=, --service-account=, --use-ceph-volume=true
2022-05-06 13:42:43.213225 I | rook-discover: updating device configmap
2022-05-06 13:42:43.216638 E | sys: failed to execute lsblk. output: .
2022-05-06 13:42:43.216681 W | inventory: skipping device "loop0". exit status 32
2022-05-06 13:42:43.218013 E | sys: failed to execute lsblk. output: .
2022-05-06 13:42:43.218032 W | inventory: skipping device "loop1". exit status 32
2022-05-06 13:42:43.219239 E | sys: failed to execute lsblk. output: .
2022-05-06 13:42:43.219256 W | inventory: skipping device "loop2". exit status 32
2022-05-06 13:42:43.220562 E | sys: failed to execute lsblk. output: .
2022-05-06 13:42:43.220582 W | inventory: skipping device "loop3". exit status 32
2022-05-06 13:42:43.221820 E | sys: failed to execute lsblk. output: .
2022-05-06 13:42:43.221839 W | inventory: skipping device "loop4". exit status 32
2022-05-06 13:42:43.223266 E | sys: failed to execute lsblk. output: .
2022-05-06 13:42:43.223284 W | inventory: skipping device "loop5". exit status 32
2022-05-06 13:42:43.224544 E | sys: failed to execute lsblk. output: .
2022-05-06 13:42:43.224562 W | inventory: skipping device "loop6". exit status 32
2022-05-06 13:42:43.225716 E | sys: failed to execute lsblk. output: .
2022-05-06 13:42:43.225734 W | inventory: skipping device "loop7". exit status 32
2022-05-06 13:42:43.234470 I | inventory: skipping device "nvme0n1" because it has child, considering the child instead.
2022-05-06 13:42:43.243723 I | rook-discover: localdevices: "nvme0n1p1, nvme0n1p2"
2022-05-06 13:42:43.243741 I | rook-discover: Getting ceph-volume inventory information
2022-05-06 13:42:44.797669 I | rook-discover: available devices: []
2022-05-06 13:42:44.812450 I | rook-discover: using the regular expressions ["(?i)dm-[0-9]+" "(?i)rbd[0-9]+" "(?i)nbd[0-9]+"]

@galexrt
Member

galexrt commented May 6, 2022

Added the ceph-volume.log file from the /var/lib/rook dir: ceph-volume-node-log.log. The log file got truncated, so I'm adding it in a new comment.

@galexrt
Member

galexrt commented May 6, 2022

@leseb
Member

leseb commented May 9, 2022

The issue is still that ceph-volume does not detect the device properties correctly.

It looks like this will be fixed by ceph/ceph@9f4b830. There is a pending backport, so the next Pacific/Quincy releases should get the fix.

Also, Rook should have been capable of handling this on its own; I found a bug there, so I'm fixing that too.

leseb added a commit to leseb/rook that referenced this issue May 9, 2022
When detecting the device property the filesystem was not passed but
used later to validate if the device should be taken or not. Now we read
the filesystem info from "lsblk" and populate it in the device type.

Closes: rook#9470
Signed-off-by: Sébastien Han <seb@redhat.com>
@leseb leseb added this to To do in v1.9 via automation May 9, 2022
leseb added a commit to leseb/rook that referenced this issue May 9, 2022
When detecting the device property the filesystem was not passed but
used later to validate if the device should be taken or not. Now we read
the filesystem info from "lsblk" and populate it in the device type.

Closes: rook#9470
Signed-off-by: Sébastien Han <seb@redhat.com>
mergify bot pushed a commit that referenced this issue May 10, 2022
When detecting the device property the filesystem was not passed but
used later to validate if the device should be taken or not. Now we read
the filesystem info from "lsblk" and populate it in the device type.

Closes: #9470
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 8c1bf0f)
mergify bot pushed a commit that referenced this issue May 10, 2022
When detecting the device property the filesystem was not passed but
used later to validate if the device should be taken or not. Now we read
the filesystem info from "lsblk" and populate it in the device type.

Closes: #9470
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 8c1bf0f)
guits pushed a commit that referenced this issue May 10, 2022
When detecting the device property the filesystem was not passed but
used later to validate if the device should be taken or not. Now we read
the filesystem info from "lsblk" and populate it in the device type.

Closes: #9470
Signed-off-by: Sébastien Han <seb@redhat.com>
(cherry picked from commit 8c1bf0f)
@travisn travisn moved this from To do to Done in v1.9 May 11, 2022
@alimaredia alimaredia unpinned this issue Jun 3, 2022