Do NOT install Rook in a Kind/k3os cluster(s) (it'll destroy your volumes) #9470

ghost · 2021-12-19T01:24:49Z

I was testing out rook in a kind cluster on my Linux laptop. I setup the CRDs and installed cluster-test.yaml last night. Today, I noticed that my drives started to behave weird (couldnt decrypt anymore, couldnt mount). I decided to reboot. My root volume and my other volume were converted into ceph_bluestore volumes, and therefore unmountable. Literally destroyed over 3TB of data.

A wild guess is that since my user is part of the docker group, the underlying system seen these volumes and decided to convert them all to ceph volumes. This is either a major bug or feature.

I have had to re-install my OS and restore a volume from backup due to this.

$ sudo cryptsetup luksOpen /dev/sda1 vol
[sudo] password for x:         
Device /dev/sda1 is not a valid LUKS device.

$ sudo lsblk -f
NAME                FSTYPE         LABEL UUID                                   FSAVAIL FSUSE% MOUNTPOINT
sda                                                                                            
└─sda1              ceph_bluestore

Probably here: https://github.com/rook/rook/blob/master/deploy/examples/cluster-test.yaml#L43-L45

  storage:
    useAllNodes: true
    useAllDevices: true # <- this is probably not safe for kind test clusters

The text was updated successfully, but these errors were encountered:

leseb · 2021-12-20T08:47:50Z

Hi @withernet, sorry about this, indeed we had similar reports in the past with Kind. Our dev env guide reflects this https://rook.io/docs/rook/v1.8/development-environment.html#minikube.
We would like to avoid such issues as much as possible and always encourage users to use virtual machines instead of testing directly on their machines.

By any chance, do you have the logs from the prepare job? So that we could try to understand why the root disk was picked?
Thanks and glad to see you had a backup.

ghost · 2021-12-20T18:50:51Z

Unfortunately I do not. I do remember seeing in the web UI that there was 3.9TB available and I thought that was weird. But... both the volumes I had added up to about 3.9TB.

I also want to say that it was all attached disks. It reformatted my NVMe root volume and my USB drive.

leseb · 2021-12-21T08:47:15Z

Unfortunately I do not. I do remember seeing in the web UI that there was 3.9TB available and I thought that was weird. But... both the volumes I had added up to about 3.9TB.

Is Kind aggregating the storage and presenting it differently somewhat?

I also want to say that it was all attached disks. It reformatted my NVMe root volume and my USB drive.

Yes, that makes sense with useAllDevices: true.

longtian · 2022-01-07T02:34:46Z

😿 same problem last year on my desktop. I lost all ext4 drivers while NTFS drivers remain untouched

ghost · 2022-01-07T02:35:30Z

Finally, independent confirmation ;)

preeefix · 2022-01-10T07:36:05Z

Leaving my mark here.

Environment: Bare Metal + k3os
Spun up a new instance, applied MetalLB, and

kubectl apply -f crds.yaml
kubectl apply -f common.yaml
kubectl apply -f operator.yaml
kubectl apply -f cluster-test.yaml

Went to go grab a drink from the fridge and chat with the fiancee. Came back to the console (iDRAC) spitting errors that k3s couldn't log to /var/log/.

Rebooted and got the grub screen.

@withernet Can you modify the title a bit to be specific to useAllDevices: true ? Seems like this isn't kind specific, but rather the prepare job isn't happy with the docker passthrough of the devices.

@leseb Mind if I make a PR to change the cluster-test.yaml to have something like an example device selector and update the documentation accordingly?

subhamkrai · 2022-01-10T08:20:29Z

Leaving my mark here.

Environment: Bare Metal + k3os Spun up a new instance, applied MetalLB, and
kubectl apply -f crds.yaml
kubectl apply -f common.yaml
kubectl apply -f operator.yaml
kubectl apply -f cluster-test.yaml
Went to go grab a drink from the fridge and chat with the fiancee. Came back to the console (iDRAC) spitting errors that k3s couldn't log to /var/log/.

Rebooted and got the grub screen.

@withernet Can you modify the title a bit to be specific to useAllDevices: true ? Seems like this isn't kind specific, but rather the prepare job isn't happy with the docker passthrough of the devices.

@leseb Mind if I make a PR to change the cluster-test.yaml to have something like an example device selector and update the documentation accordingly?

@crdnl sorry to hear that.
Regarding adding deviceSelector, we have field deviceSelector which accept regex. See this. Did you look at that?

preeefix · 2022-01-10T08:40:12Z

@subhamkrai

@crdnl sorry to hear that.
Regarding adding deviceSelector, we have field deviceSelector which accept regex. See this. Did you look at that?

Thankfully, it was a dev cluster so there wasn't anything on it (yet). I did add a devices block to my nodes config.

I was more-so referencing modifying the provided sample file (in deploy/examples/) to not use useAllDevices: true by default, but set it to something sane like adding a default device selector.

As for why it happened, I did some digging through my prepare logs on a new cluster, and paid attention to the preparer this time. Logs are at https://gist.github.com/crdnl/f63bc578e5108f72f4bfa0b405564368

On line #53, it does note that the device sda has a child, sda1 and considers it instead. Following that, on L178, it does indeed mark it as available and if I didn't put a device selector on, it would've consumed it (again).

By tweaking the toolbox container (setting privileged, mounting /dev, and setting runAsUser: 0): I got access to the ceph tooling and running ceph-volume inventory --format json-pretty, I do get available:

{
    "available": true,
    "device_id": "",
    "lsm_data": {},
    "lvs": [],
    "path": "/dev/sda1",
    "rejected_reasons": [],
    "sys_api": {
        "holders": [],
        "human_readable_size": "279.40 GB",
        "sectors": "585935452",
        "sectorsize": 512,
        "size": 299998951424.0,
        "start": "2048"
    }
}

To add on, the output of lsblk for /dev/sda:

ouroboros [~]$ lsblk
NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
loop1    7:1    0  49.3M  1 loop /usr
loop2    7:2    0 312.4M  0 loop /usr/src
                                 /usr/lib/firmware
                                 /usr/lib/modules
sda      8:0    0 279.4G  0 disk
└─sda1   8:1    0 279.4G  0 part /k3os/system
                                 /boot
                                 /
sdb      8:16   0 279.4G  0 disk
<snip>

TL; DR;
My educated guess is that because the "root" of the container isn't actually mounted on /dev/sda or /dev/sda1, ceph-volume inventory isn't flagging it as locked like it probably should.

leseb · 2022-01-12T13:59:12Z

@crdnl Thanks for the detailed analysis. It's surprising that even if /dev/sda1 is mounted it appears as available. If you still have access to this reproducer. Could you also add the CEPH_VOLUME_DEBUG env var, run ceph-volume again and collect the ceph-volume.log content? That will be really useful to understand why this device is made available.

preeefix · 2022-01-14T21:30:36Z

@leseb Just ran CEPH_VOLUME_DEBUG=1 ceph-volume inventory /dev/sda1

[2022-01-14 21:27:55,502][ceph_volume.main][INFO  ] Running command: ceph-volume  inventory /dev/sda1
[2022-01-14 21:27:55,503][ceph_volume.process][INFO  ] Running command: /usr/bin/lsblk -plno KNAME,NAME,TYPE
[2022-01-14 21:27:55,508][ceph_volume.process][INFO  ] stdout /dev/loop1 /dev/loop1                                                                                                      loop
[2022-01-14 21:27:55,508][ceph_volume.process][INFO  ] stdout /dev/loop2 /dev/loop2                                                                                                      loop
[2022-01-14 21:27:55,508][ceph_volume.process][INFO  ] stdout /dev/sda   /dev/sda                                                                                                        disk
[2022-01-14 21:27:55,508][ceph_volume.process][INFO  ] stdout /dev/sda1  /dev/sda1                                                                                                       part
[2022-01-14 21:27:55,508][ceph_volume.process][INFO  ] stdout /dev/sdb   /dev/sdb                                                                                                        disk
[2022-01-14 21:27:55,508][ceph_volume.process][INFO  ] stdout /dev/sdc   /dev/sdc                                                                                                        disk
[2022-01-14 21:27:55,508][ceph_volume.process][INFO  ] stdout /dev/sdd   /dev/sdd                                                                                                        disk
[2022-01-14 21:27:55,508][ceph_volume.process][INFO  ] stdout /dev/sde   /dev/sde                                                                                                        disk
[2022-01-14 21:27:55,508][ceph_volume.process][INFO  ] stdout /dev/sdf   /dev/sdf                                                                                                        disk
[2022-01-14 21:27:55,508][ceph_volume.process][INFO  ] stdout /dev/sdg   /dev/sdg                                                                                                        disk
[2022-01-14 21:27:55,508][ceph_volume.process][INFO  ] stdout /dev/sdh   /dev/sdh                                                                                                        disk
[2022-01-14 21:27:55,509][ceph_volume.process][INFO  ] stdout /dev/sr0   /dev/sr0                                                                                                        rom
[2022-01-14 21:27:55,509][ceph_volume.process][INFO  ] stdout /dev/dm-0  /dev/mapper/ceph--f47c2c56--e8b1--43a9--8ef3--b7dbc48a939a-osd--block--89b9a18f--11f2--47bd--8b34--8fdb83a82242 lvm
[2022-01-14 21:27:55,509][ceph_volume.process][INFO  ] stdout /dev/dm-1  /dev/mapper/ceph--d4c98bae--148e--41d5--9e21--a0d0cf74024d-osd--block--e1aa46ad--8146--4118--8620--1966f7c9f61b lvm
[2022-01-14 21:27:55,509][ceph_volume.process][INFO  ] stdout /dev/dm-2  /dev/mapper/ceph--98f8cccf--0e83--489d--a138--9fd056f75e85-osd--block--629ae4b6--c966--4e68--804f--0f70acbfcf1d lvm
[2022-01-14 21:27:55,509][ceph_volume.process][INFO  ] stdout /dev/dm-3  /dev/mapper/ceph--47294791--355b--437e--aaa2--2d1d216b0db2-osd--block--237c1c3d--a0de--428c--a2ed--59d288cad08b lvm
[2022-01-14 21:27:55,509][ceph_volume.process][INFO  ] stdout /dev/dm-4  /dev/mapper/ceph--1797c0da--6c89--4409--a253--e9226df1bef9-osd--block--02062c5f--c8f0--43f4--a20e--bb7ed15e9899 lvm
[2022-01-14 21:27:55,509][ceph_volume.process][INFO  ] stdout /dev/dm-5  /dev/mapper/ceph--470f8894--67e8--43f4--a86e--697435a70af0-osd--block--ba3a5eb2--96a1--42e7--82f3--90a5da18cc90 lvm
[2022-01-14 21:27:55,509][ceph_volume.process][INFO  ] stdout /dev/dm-6  /dev/mapper/ceph--2aa44172--c97a--4e7c--ad91--5b25c6572980-osd--block--d90e0056--a7b2--4f22--a248--06292eb541e1 lvm
[2022-01-14 21:27:55,515][ceph_volume.process][INFO  ] Running command: /usr/sbin/lvs --noheadings --readonly --separator=";" -a --units=b --nosuffix -S lv_path=/dev/sda1 -o lv_tags,lv_path,lv_name,vg_name,lv_uuid,lv_size
[2022-01-14 21:27:55,621][ceph_volume.process][INFO  ] Running command: /usr/bin/lsblk --nodeps -P -o NAME,KNAME,MAJ:MIN,FSTYPE,MOUNTPOINT,LABEL,UUID,RO,RM,MODEL,SIZE,STATE,OWNER,GROUP,MODE,ALIGNMENT,PHY-SEC,LOG-SEC,ROTA,SCHED,TYPE,DISC-ALN,DISC-GRAN,DISC-MAX,DISC-ZERO,PKNAME,PARTLABEL /dev/sda1
[2022-01-14 21:27:55,625][ceph_volume.process][INFO  ] stdout NAME="sda1" KNAME="sda1" MAJ:MIN="8:1" FSTYPE="" MOUNTPOINT="/etc/resolv.conf" LABEL="" UUID="" RO="0" RM="0" MODEL="" SIZE="279.4G" STATE="" OWNER="root" GROUP="disk" MODE="brw-rw----" ALIGNMENT="0" PHY-SEC="512" LOG-SEC="512" ROTA="1" SCHED="mq-deadline" TYPE="part" DISC-ALN="0" DISC-GRAN="0B" DISC-MAX="0B" DISC-ZERO="0" PKNAME="sda" PARTLABEL=""
[2022-01-14 21:27:55,625][ceph_volume.process][INFO  ] Running command: /usr/sbin/blkid -c /dev/null -p /dev/sda1
[2022-01-14 21:27:55,627][ceph_volume.process][INFO  ] stdout /dev/sda1: LABEL="K3OS_STATE" UUID="5a9e85be-290b-4d7d-9fac-98209739e3d5" VERSION="1.0" BLOCK_SIZE="4096" TYPE="ext4" USAGE="filesystem" PART_ENTRY_SCHEME="dos" PART_ENTRY_UUID="12113f49-01" PART_ENTRY_TYPE="0x83" PART_ENTRY_FLAGS="0x80" PART_ENTRY_NUMBER="1" PART_ENTRY_OFFSET="2048" PART_ENTRY_SIZE="585935452" PART_ENTRY_DISK="8:0"
[2022-01-14 21:27:55,628][ceph_volume.process][INFO  ] Running command: /usr/sbin/pvs --noheadings --readonly --units=b --nosuffix --separator=";" -o vg_name,pv_count,lv_count,vg_attr,vg_extent_count,vg_free_count,vg_extent_size /dev/sda1
[2022-01-14 21:27:55,728][ceph_volume.process][INFO  ] stderr Failed to find physical volume "/dev/sda1".
[2022-01-14 21:27:55,729][ceph_volume.util.disk][INFO  ] opening device /dev/sda1 to check for BlueStore label
[2022-01-14 21:27:55,729][ceph_volume.util.disk][INFO  ] opening device /dev/sda to check for BlueStore label
[2022-01-14 21:27:55,729][ceph_volume.util.disk][INFO  ] opening device /dev/sda1 to check for BlueStore label
[2022-01-14 21:27:55,729][ceph_volume.util.disk][INFO  ] opening device /dev/sda to check for BlueStore label
[2022-01-14 21:27:55,729][ceph_volume.process][INFO  ] Running command: /usr/sbin/udevadm info --query=property /dev/sda1
[2022-01-14 21:27:55,736][ceph_volume.process][INFO  ] stdout DEVNAME=/dev/sda1
[2022-01-14 21:27:55,736][ceph_volume.process][INFO  ] stdout DEVPATH=/devices/pci0000:00/0000:00:01.0/0000:03:00.0/host11/target11:0:0/11:0:0:0/block/sda/sda1
[2022-01-14 21:27:55,736][ceph_volume.process][INFO  ] stdout DEVTYPE=partition
[2022-01-14 21:27:55,736][ceph_volume.process][INFO  ] stdout MAJOR=8
[2022-01-14 21:27:55,736][ceph_volume.process][INFO  ] stdout MINOR=1
[2022-01-14 21:27:55,736][ceph_volume.process][INFO  ] stdout PARTN=1
[2022-01-14 21:27:55,736][ceph_volume.process][INFO  ] stdout SUBSYSTEM=block

leseb · 2022-01-17T09:56:55Z

@crdnl thanks but this isn't the full output, is it?

ozzieba · 2022-01-19T18:44:17Z

Just ran into this as well on k3os, will try to provide a log after I reinstall the affected hosts; a couple hosts were unaffected, probably because they used GPT.
Some preliminary analysis:

rook is calling ceph-volume, which checks for locked volumes by trying to open the device in read-write/exclusive mode
k3os does some strange things to mount its volumes. The root volume is mounted with --make-rprivate followed by pivot_root; /boot and /k3os/system mount parts of the same volume using bind mounts
I am able to mount my device read-write on a different directory after it has been mounted by k3os, not sure how to open it in exclusive mode without hosing that drive as well (note k3os has very few tools, eg no Python; I should be able to use crictl to try things, but figured I'd report what I've seen so far)

leseb · 2022-01-24T14:06:33Z

Just ran into this as well on k3os, will try to provide a log after I reinstall the affected hosts; a couple hosts were unaffected, probably because they used GPT. Some preliminary analysis:

rook is calling ceph-volume, which checks for locked volumes by trying to open the device in read-write/exclusive mode

k3os does some strange things to mount its volumes. The root volume is mounted with --make-rprivate followed by pivot_root; /boot and /k3os/system mount parts of the same volume using bind mounts

I am able to mount my device read-write on a different directory after it has been mounted by k3os, not sure how to open it in exclusive mode without hosing that drive as well (note k3os has very few tools, eg no Python; I should be able to use crictl to try things, but figured I'd report what I've seen so far)

Thanks for the report. If we don't even have Python it will be hard to simulate an O_EXCL on a device. But honestly what we need is a DEBUG log from the ceph-volume inventory, to really understand why the device is marked as available.

github-actions · 2022-03-25T20:02:08Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in a week if no further activity occurs. Thank you for your contributions.

github-actions · 2022-04-01T20:02:21Z

This issue has been automatically closed due to inactivity. Please re-open if this still requires investigation.

mateuszdrab · 2022-04-08T23:52:54Z

This definitely shouldn't be closed

travisn · 2022-04-11T17:26:29Z

We really need to find more checks to ensure unintended devices aren't picked up...

galexrt · 2022-04-11T20:28:50Z

As a quick "fix" to improve the docs, we could look into adding a red warning box to the Rook Ceph prerequisites and quickstart doc pages.
The red warning box would look like this (screenshot contains the "you are on the latest docs" warning box):

We can change the color as well.
What do you think regarding the warning box?

Still as @travisn mentioned in a quick chat, we should find out why ceph-volume thinks the volumes are not in use even though they are in use.
From the comments in this thread so far, this feels like somewhere in the "volumes" being mounted inside the container of Kind/k3os that the info is "lost" / "not correctly accessible, making it seem the volumes/disks are unused.

travisn · 2022-04-11T21:32:22Z

Since this issue was opened, we added a sentence in bold text to the opening section of the Quickstart guide. I wonder if it would be sufficient just to add a "WARNING:" prefix to make it more obvious, or move it to be the first thing in the prerequisites section.

ghost · 2022-04-11T22:11:34Z

I think this error occurs when you use a tool to play with rook that isnt minikube; it may be better to just make it abundantly clear to only use minikube and that usage of any other tool is at the users own risk and then to reference this bug.

fzyzcjy · 2022-04-25T12:06:14Z

Hi, is there any updates? This sounds like a big problem.

travisn · 2022-04-25T17:54:14Z

Hi, is there any updates? This sounds like a big problem.

Would the clarification in the docs suggested here be sufficient from your view?

fzyzcjy · 2022-04-25T23:44:42Z

@travisn Well, not really... Without reading this issue, I almost forgot that sentence (because you know we have to read a lot of documentation and compare multiple storage engine etc in a short time). If it is in a big red box it may be better - but that may make people think Rook very dangerous.

fzyzcjy · 2022-04-25T23:46:14Z

Thus, IMHO this bug should be fixed (and of high priority - it is quite dangerous). It can be either fixed in the Rook level, or fixed by adding some protection in Helm chart, etc.

travisn · 2022-04-26T21:11:37Z

Thus, IMHO this bug should be fixed (and of high priority - it is quite dangerous). It can be either fixed in the Rook level, or fixed by adding some protection in Helm chart, etc.

Fixing it of course is desired, but detecting when it's an invalid environment is the challenge. Do you have a suggestion on how rook can detect when to prevent installation? Or we need to gather more logging as mentioned here.

fzyzcjy · 2022-04-27T00:49:31Z

@travisn I am not an expert here. But I guess Rook should not destroy volumes that originally have data inside it?

travisn · 2022-04-27T14:00:51Z

@travisn I am not an expert here. But I guess Rook should not destroy volumes that originally have data inside it?

Rook does check for existing filesystems or partitions on a device, and will not allow using such devices. So the remaining question is how we can detect other cases of disks that should be skipped where those checks are not sufficient.

fzyzcjy · 2022-04-27T14:40:05Z

@travisn Well I am a bit confused. This bug report seems to mention the files are destroyed. So that is existing filesystem or partitions, so Rook should not allow using them.

galexrt · 2022-04-27T14:59:41Z

@fzyzcjy I'll try to clear up the confusion, the problem seems to be when using Kind/k3os (and similar environment where the host's disks can be seen from the node), the disks look like they are "empty" for some reason and are therefore used by Rook Ceph as OSDs, e.g., #9470 (comment).

@leseb To get the ceph-volume debug logs, do I just need to set the ROOK_LOG_LEVEL: DEBUG on the operator or anything else?

leseb · 2022-04-27T15:01:01Z

@fzyzcjy I'll try to clear up the confusion, the problem seems to be when using Kind/k3os (and similar environment where the host's disks can be seen from the node), the disks look like they are "empty" for some reason and are therefore used by Rook Ceph as OSDs, e.g., #9470 (comment).

@leseb To get the ceph-volume debug logs, do I just need to set the ROOK_LOG_LEVEL: DEBUG on the operator or anything else?

Whereever the CLI runs ROOK_LOG_LEVEL: DEBUG is needed to get DEBUG level logs.

spdfnet · 2022-05-01T08:46:48Z

Rook has always ignored existing ext4 filesystems while using 'useAllDevices: true'.

galexrt · 2022-05-06T12:06:40Z

So... the good news I was able to reproduce the issue on first try, the bad news I need to reinstall my laptop and wasn't able to save the logs...

I'll probably have the ceph-volume debug logs from the rook-ceph-osd-prepare Pod in a few more hours available when I have my laptop ready for round 2.

@leseb Are you available today? I can gladly give you SSH access for you to get more debug info as my laptop seemed quite responsive even with the disk being "re-organized" to being multiple Ceph OSDs.

leseb · 2022-05-06T12:15:07Z

So... the good news I was able to reproduce the issue on first try, the bad news I need to reinstall my laptop and wasn't able to save the logs...

I'll probably have the ceph-volume debug logs from the rook-ceph-osd-prepare Pod in a few more hours available when I have my laptop ready for round 2.

@leseb Are you available today? I can gladly give you SSH access for you to get more debug info as my laptop seemed quite responsive even with the disk being "re-organized" to being multiple Ceph OSDs.

Logs should be enough for now :), please post them here.

galexrt · 2022-05-06T13:50:11Z

@leseb Here you go:

rook-operator-pod.log
rook-ceph-osd-prepare-pod.log

Quick look from my side shows that ceph osd prepare reports the disk as empty (no partitions).

rook-discover correctly on the other detects that nvme0n1 has partitions:

kubectl logs -n rook-ceph -f rook-discover-wjl5h                                                                                                                                                                                  ✔  kind-kind/rook-ceph ⎈  atrost@alexander-21a0007pge
2022-05-06 13:42:43.212580 I | rookcmd: starting Rook v1.9.2 with arguments '/usr/local/bin/rook discover --discover-interval 60m --use-ceph-volume'
2022-05-06 13:42:43.212652 I | rookcmd: flag values: --discover-interval=1h0m0s, --help=false, --log-level=INFO, --operator-image=, --service-account=, --use-ceph-volume=true
2022-05-06 13:42:43.213225 I | rook-discover: updating device configmap
2022-05-06 13:42:43.216638 E | sys: failed to execute lsblk. output: .
2022-05-06 13:42:43.216681 W | inventory: skipping device "loop0". exit status 32
2022-05-06 13:42:43.218013 E | sys: failed to execute lsblk. output: .
2022-05-06 13:42:43.218032 W | inventory: skipping device "loop1". exit status 32
2022-05-06 13:42:43.219239 E | sys: failed to execute lsblk. output: .
2022-05-06 13:42:43.219256 W | inventory: skipping device "loop2". exit status 32
2022-05-06 13:42:43.220562 E | sys: failed to execute lsblk. output: .
2022-05-06 13:42:43.220582 W | inventory: skipping device "loop3". exit status 32
2022-05-06 13:42:43.221820 E | sys: failed to execute lsblk. output: .
2022-05-06 13:42:43.221839 W | inventory: skipping device "loop4". exit status 32
2022-05-06 13:42:43.223266 E | sys: failed to execute lsblk. output: .
2022-05-06 13:42:43.223284 W | inventory: skipping device "loop5". exit status 32
2022-05-06 13:42:43.224544 E | sys: failed to execute lsblk. output: .
2022-05-06 13:42:43.224562 W | inventory: skipping device "loop6". exit status 32
2022-05-06 13:42:43.225716 E | sys: failed to execute lsblk. output: .
2022-05-06 13:42:43.225734 W | inventory: skipping device "loop7". exit status 32
2022-05-06 13:42:43.234470 I | inventory: skipping device "nvme0n1" because it has child, considering the child instead.
2022-05-06 13:42:43.243723 I | rook-discover: localdevices: "nvme0n1p1, nvme0n1p2"
2022-05-06 13:42:43.243741 I | rook-discover: Getting ceph-volume inventory information
2022-05-06 13:42:44.797669 I | rook-discover: available devices: []
2022-05-06 13:42:44.812450 I | rook-discover: using the regular expressions ["(?i)dm-[0-9]+" "(?i)rbd[0-9]+" "(?i)nbd[0-9]+"]

galexrt · 2022-05-06T14:07:24Z

~~Added the ceph-volume.log file from the /var/lib/rook dir: ceph-volume-node-log.log~~ The log file got truncated adding it in a new comment.

galexrt · 2022-05-06T14:09:10Z

ceph-volume-node-log-full.log

leseb · 2022-05-09T09:59:55Z

The issue is still in ceph-volume not detecting the device property correctly.

It looks like this will be fixed by ceph/ceph@9f4b830. There is a pending backport so the next Pacific/Quincy should get the fix.

Also, Rook should have been capable of handling this too, I found a bug so I'm fixing it too.

When detecting the device property the filesystem was not passed but used later to validate if the device should be taken or not. Now we read the filesystem info from "lsblk" and populate it in the device type. Closes: rook#9470 Signed-off-by: Sébastien Han <seb@redhat.com>

When detecting the device property the filesystem was not passed but used later to validate if the device should be taken or not. Now we read the filesystem info from "lsblk" and populate it in the device type. Closes: #9470 Signed-off-by: Sébastien Han <seb@redhat.com> (cherry picked from commit 8c1bf0f)

ghost added the bug label Dec 19, 2021

leseb pinned this issue Jan 7, 2022

preeefix mentioned this issue Jan 10, 2022

[CAUTION] Rook/Ceph has privileges to wipe your host's disks as it has access to all devices by default k3d-io/k3d#859

Closed

ghost changed the title ~~Do NOT install Rook in a Kind cluster (it'll destroy your volumes)~~ Do NOT install Rook in a Kind/k3os cluster(s) (it'll destroy your volumes) Jan 14, 2022

github-actions bot added the wontfix label Mar 25, 2022

github-actions bot closed this as completed Apr 1, 2022

travisn added reliability and removed wontfix labels Apr 11, 2022

travisn reopened this Apr 11, 2022

leseb mentioned this issue May 9, 2022

core: append filesystem property on disk #10230

Merged

7 tasks

leseb added this to To do in v1.9 via automation May 9, 2022

satoru-takeuchi closed this as completed in #10230 May 10, 2022

travisn moved this from To do to Done in v1.9 May 11, 2022

alimaredia unpinned this issue Jun 3, 2022

dpeckett mentioned this issue Jun 2, 2023

Rook on KinD/K3D will happily erase LVM partitions #12321

Closed

Do NOT install Rook in a Kind/k3os cluster(s) (it'll destroy your volumes) #9470

Do NOT install Rook in a Kind/k3os cluster(s) (it'll destroy your volumes) #9470

Comments

ghost commented Dec 19, 2021 • edited by ghost

leseb commented Dec 20, 2021

ghost commented Dec 20, 2021 • edited by ghost

leseb commented Dec 21, 2021

longtian commented Jan 7, 2022

ghost commented Jan 7, 2022

preeefix commented Jan 10, 2022

subhamkrai commented Jan 10, 2022

preeefix commented Jan 10, 2022

leseb commented Jan 12, 2022

preeefix commented Jan 14, 2022

leseb commented Jan 17, 2022

ozzieba commented Jan 19, 2022

leseb commented Jan 24, 2022

github-actions bot commented Mar 25, 2022

github-actions bot commented Apr 1, 2022

mateuszdrab commented Apr 8, 2022

travisn commented Apr 11, 2022

galexrt commented Apr 11, 2022

travisn commented Apr 11, 2022

ghost commented Apr 11, 2022

fzyzcjy commented Apr 25, 2022

travisn commented Apr 25, 2022

fzyzcjy commented Apr 25, 2022

fzyzcjy commented Apr 25, 2022

travisn commented Apr 26, 2022

fzyzcjy commented Apr 27, 2022

travisn commented Apr 27, 2022

fzyzcjy commented Apr 27, 2022

galexrt commented Apr 27, 2022

leseb commented Apr 27, 2022

spdfnet commented May 1, 2022

galexrt commented May 6, 2022

leseb commented May 6, 2022

galexrt commented May 6, 2022 • edited

galexrt commented May 6, 2022 • edited

galexrt commented May 6, 2022

leseb commented May 9, 2022

ghost commented Dec 19, 2021 •

edited by ghost

ghost commented Dec 20, 2021 •

edited by ghost

galexrt commented May 6, 2022 •

edited

galexrt commented May 6, 2022 •

edited