Certain disks skipped for being too small when they are large enough #11474

Closed · MrDrMcCoy opened this issue Dec 22, 2022 · 17 comments

@MrDrMcCoy

Deviation from expected behavior:
Rook / Ceph believes that a certain disk type used on all my nodes is smaller than 5GB and refuses to use it, despite its actual size being 256GB.

Prepare pod log message: cephosd: skipping device "sda": ["Insufficient space (<5GB)"].

Actual size of device:

root@rock5b-13:~# lsblk /dev/sda
NAME MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
sda    8:0    1 238.5G  0 disk

Expected behavior:
Rook / Ceph should use this disk.

How to reproduce it (minimal and precise):

  1. Ensure all traces of Rook are removed from Kubernetes and /var/lib/rook
  2. Nuke the disk (a quick check to verify the wipe follows this list):
    # Remove partition tables
    sgdisk --zap-all /dev/sda
    # Remove filesystem identifiers
    wipefs --all --force /dev/sda
    # Zero first 100MB
    dd if=/dev/zero bs=1M count=100 oflag=direct of=/dev/sda
    # Zero last 100MB
    dd bs=512 if=/dev/zero of=/dev/sda oflag=direct count=204800 seek=$(($(blockdev --getsz /dev/sda) - 204800))
    # Inform kernel of changes
    partprobe /dev/sda
    
  3. Create Rook operator via Helm
  4. Create Rook cluster via Helm
  5. Observe that this disk type is skipped, while the NVMe drives are properly picked up
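
To confirm that the wipe in step 2 actually took effect before reinstalling, a quick check with standard util-linux tools (assuming the same /dev/sda device as above):

# Should show the full size with no partitions, filesystem signatures, or mountpoints
lsblk -o NAME,SIZE,TYPE,FSTYPE,MOUNTPOINTS /dev/sda
# Both should print nothing if all signatures were removed
wipefs /dev/sda
blkid /dev/sda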

File(s) to submit:

Helm values:
rook-ceph-cluster-v1.10.7-values.yml.txt
rook-ceph-v1.10.7-values.yml.txt

Logs to submit:

rook-ceph-osd-prepare-rock5b-13-dz2fd.log
rook-ceph-operator-5bc5659499-ndq4z.log
rook-discover-xr4tw.log

Cluster Status to submit:

status:
  ceph:
    capacity:
      bytesAvailable: 3072491655168
      bytesTotal: 3072628629504
      bytesUsed: 136974336
      lastUpdated: '2022-12-22T04:38:35Z'
    fsid: 491af2f2-017a-4301-853f-c04fc62b9245
    health: HEALTH_OK
    lastChecked: '2022-12-22T04:38:35Z'
    versions:
      mds:
        ceph version 17.2.5 (98318ae89f1a893a6ded3a640405cdbb33e08757) quincy (stable): 2
      mgr:
        ceph version 17.2.5 (98318ae89f1a893a6ded3a640405cdbb33e08757) quincy (stable): 2
      mon:
        ceph version 17.2.5 (98318ae89f1a893a6ded3a640405cdbb33e08757) quincy (stable): 3
      osd:
        ceph version 17.2.5 (98318ae89f1a893a6ded3a640405cdbb33e08757) quincy (stable): 3
      overall:
        ceph version 17.2.5 (98318ae89f1a893a6ded3a640405cdbb33e08757) quincy (stable): 10
  conditions:
    - lastHeartbeatTime: '2022-12-22T04:38:37Z'
      lastTransitionTime: '2022-12-22T03:57:33Z'
      message: Cluster created successfully
      reason: ClusterCreated
      status: 'True'
      type: Ready
  message: Cluster created successfully
  observedGeneration: 2
  phase: Ready
  state: Created
  storage:
    deviceClasses:
      - name: nvme
  version:
    image: quay.io/ceph/ceph:v17.2.5
    version: 17.2.5-0

Environment:

  • OS (e.g. from /etc/os-release): Debian GNU/Linux Bookworm (Sid, custom built Armbian image, needed for certain kernel modules and btrfs rootfs)
  • Kernel (e.g. uname -a): Linux rock5b-13 5.10.110-rockchip-rk3588-gadfc1747e7fe-dirty #trunk SMP Tue Dec 20 11:19:50 UTC 2022 aarch64 GNU/Linux
  • Cloud provider or hardware configuration: Radxa Rock5 Model B
  • Rook version (use rook version inside of a Rook Pod): v1.10.7
  • Storage backend version (e.g. for ceph do ceph -v): ceph version 17.2.5 (98318ae89f1a893a6ded3a640405cdbb33e08757) quincy (stable)
  • Kubernetes version (use kubectl version): v1.26.0
  • Kubernetes cluster type (e.g. Tectonic, GKE, OpenShift): Microk8s
  • Storage backend status (e.g. for Ceph use ceph health in the Rook Ceph toolbox): HEALTH_OK
@MrDrMcCoy MrDrMcCoy added the bug label Dec 22, 2022
@satoru-takeuchi satoru-takeuchi self-assigned this Dec 22, 2022
@satoru-takeuchi (Member)

"rock5b-13" seems not to specify "/dev/sda".

    - name: rock5b-13
      devicePathFilter: (usb|nvme)
      # deviceFilter: (sd[a-z])
      # devices:
      #   - name: /dev/disk/by-path/platform-fe150000.pcie-pci-0000:01:00.0-nvme-1
      #     deviceClass: nvme
      #   - name: /dev/disk/by-path/platform-xhci-hcd.10.auto-usb-0:1:1.0-scsi-0:0:0:0
      #     deviceClass: hdd
      #   - name: sda # /dev/disk/by-path/platform-xhci-hcd.11.auto-usb-0:1:1.0-scsi-0:0:0:0
      #     deviceClass: hdd

Could you change the filter so that "/dev/sda" is picked up by Rook (e.g. devicePathFilter: (usb|nvme|sda)) and restart the operator pod (the restart might not be necessary since you have the discovery daemonset enabled)?
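
For reference, a minimal sketch of where that filter would live in the cluster Helm values (assuming the standard cephClusterSpec.storage layout of the rook-ceph-cluster chart):

storage:
  nodes:
    - name: rock5b-13
      devicePathFilter: (usb|nvme|sda)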

If the OSD prepare pod still doesn't pick up sda, please show me the log of the prepare pod again.

Prepare pod log message: cephosd: skipping device "sda": ["Insufficient space (<5GB)"].

This log was shown while processing the "deviceFilter: (sd[a-z])" filter and doesn't directly correspond to the detection of "/dev/sda".
In addition, IIRC, there is a Ceph bug that causes some devices to be reported as "Insufficient space (<5GB)".

@MrDrMcCoy (Author) commented Dec 22, 2022

According to the documentation, devicePathFilter is a regex that applies to the udev device path. In the commented-out devices list that I had previously tried, you can see that the path (/dev/disk/by-path/platform-xhci-hcd.11.auto-usb-0:1:1.0-scsi-0:0:0:0) contains usb (but not sda) and should be matched by this regex. If that is not the case, would you please clarify what devicePathFilter really matches against?
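
For reference, the by-path symlinks such a filter would presumably be matched against can be listed directly on the node (standard udev tooling; sda assumed as above):

# Show every by-path symlink that points at sda
ls -l /dev/disk/by-path/ | grep -w sda
# Or ask udev for all of the device's symlinks
udevadm info --query=symlink /dev/sda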

Here are the logs from setting devicePathFilter to (usb|nvme|sd[a-z]):

rook-ceph-operator-5bc5659499-bw4lp.log
rook-ceph-osd-prepare-rock5b-13-c5vwd.log
rook-ceph-cluster-v1.10.7-values.yml.txt

[rook@rook-ceph-tools-544cb74cb8-6bq64 /]$ ceph osd tree
ID  CLASS  WEIGHT   TYPE NAME           STATUS  REWEIGHT  PRI-AFF
-1         2.79446  root default                                 
-5         0.93149      host rock5b-11                           
 1   nvme  0.93149          osd.1           up   1.00000  1.00000
-7         0.93149      host rock5b-12                           
 2   nvme  0.93149          osd.2           up   1.00000  1.00000
-3         0.93149      host rock5b-13                           
 0   nvme  0.93149          osd.0           up   1.00000  1.00000

Additionally, I had attempted this previously with deviceFilter and the explicit devices mappings, each giving me the same results. I had also tried setting devices to sda and the udev path with no difference observed.

In addition, IIRC, there is a Ceph's bug that recognizes some devices as "Insufficient space (<5GB)."

This sounds promising! Would you please direct me to the details of this bug and perhaps suggest a version not affected by it?

@satoru-takeuchi (Member)

I read the new logs and this behavior is indeed odd. sda is one of the candidates for OSD creation: it has enough space, has no filesystem, and has a valid device type ("disk").

I'll try to reproduce this problem later (it might be next year).

This sounds promising! Would you please direct me to the details of this bug and perhaps suggest a version not affected by it?

I'll search the Ceph issue tracker and let you know which issue it is.

@nikhiljha commented Dec 27, 2022

We're also running into this. Our full configuration: https://github.com/ocf/kubernetes/blob/main/apps/rook.py

2022-12-27 00:14:14.461309 I | cephosd: old lsblk can't detect bluestore signature, so try to detect here
2022-12-27 00:14:14.461864 D | exec: Running command: lsblk /dev/sda --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME,KNAME,MOUNTPOINT,FSTYPE
2022-12-27 00:14:14.465145 D | sys: lsblk output: "SIZE=\"16000900661248\" ROTA=\"1\" RO=\"0\" TYPE=\"disk\" PKNAME=\"\" NAME=\"/dev/sda\" KNAME=\"/dev/sda\" MOUNTPOINT=\"\" FSTYPE=\"\""
2022-12-27 00:14:14.465162 D | exec: Running command: ceph-volume inventory --format json /dev/sda
2022-12-27 00:14:14.731023 I | cephosd: skipping device "sda": ["Insufficient space (<5GB)"].
2022-12-27 00:14:14.731035 I | cephosd: old lsblk can't detect bluestore signature, so try to detect here
2022-12-27 00:14:14.739724 D | exec: Running command: lsblk /dev/sdb --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME,KNAME,MOUNTPOINT,FSTYPE
2022-12-27 00:14:14.743340 D | sys: lsblk output: "SIZE=\"16000900661248\" ROTA=\"1\" RO=\"0\" TYPE=\"disk\" PKNAME=\"\" NAME=\"/dev/sdb\" KNAME=\"/dev/sdb\" MOUNTPOINT=\"\" FSTYPE=\"\""
2022-12-27 00:14:14.743354 D | exec: Running command: ceph-volume inventory --format json /dev/sdb
2022-12-27 00:14:15.004877 I | cephosd: skipping device "sdb": ["Insufficient space (<5GB)"].

Our drives that are 3.5T aren't affected. Also on ceph 17.2.5.

Did a little bit of debugging; it looks like the left-hand side of this comparison returns 0 because self.sys_api is {}, which in turn is because sys_info.devices doesn't contain the device.

https://github.com/ceph/ceph/blob/8e459c55f4d7afef71d5cdfcba01b9583fb3ff3e/src/ceph-volume/ceph_volume/util/device.py#L586

sys_info.devices doesn't contain it because... (in drive.py)

if get_file_contents(os.path.join(_sys_block_path, dev, 'removable')) == "1":
    continue

and

sh-4.4# cat /sys/block/sda/removable 
1

I think removable is getting set because this is a hot-swap disk, but I actually do want to use the drive in Ceph. Either way, this is definitely an upstream Ceph bug: it shouldn't say the drive is too small when the actual issue is that it thinks the drive is removable.

Funnily enough, this code was actually touched very recently (the change should be in Ceph 18), but the change (ceph/ceph@4b3d174) won't fix this, since the problem happens before the point where it checks removable.

I tried working around this with a udev rule but it did not work :(
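
A condensed paraphrase of the behavior described above (a simplified sketch with approximate names, not the actual ceph-volume source):

import os

_SYS_BLOCK_PATH = "/sys/block"

def _read(path):
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return ""

def get_devices():
    """Collect block devices, silently dropping anything the kernel flags removable."""
    devices = {}
    for dev in os.listdir(_SYS_BLOCK_PATH):
        # Hot-swap bays and USB-attached disks often report removable=1 here,
        # so the whole device vanishes from the inventory at this point.
        if _read(os.path.join(_SYS_BLOCK_PATH, dev, "removable")) == "1":
            continue
        size_sectors = int(_read(os.path.join(_SYS_BLOCK_PATH, dev, "size")) or 0)
        devices["/dev/" + dev] = {"size": size_sectors * 512}
    return devices

# A device missing from get_devices() ends up with sys_api == {} further down, so its
# size reads as 0 and the rejection reason printed is "Insufficient space (<5GB)"
# rather than anything about the device being removable.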

@nikhiljha commented Dec 28, 2022

I ended up doing a kernel patch for fun: ocf/nix@bac6ec2, but this is clearly a hack 🙃

It works great though, my OSDs are all up now 🤣🤣🤣

@satoru-takeuchi (Member)

I read Ceph's code. There are two problems.

  • (a) Your devices couldn't be used to create OSDs because they are removable.
  • (b) The rejection reason is wrong (e.g. "Insufficient space (<5GB)") because this line has a bug: the "size" value should be compared with 5368709120/sector_size (see the arithmetic sketch below).

I'll submit a Ceph issue about (b) because it's clearly a bug. On the other hand, whether (a) is correct or not depends on Ceph's design; this behavior was introduced in this commit. Since this is a Ceph matter rather than a Rook matter, @MrDrMcCoy @nikhiljha could you open a new Ceph issue asking for removable devices to be supported?
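
For concreteness, the unit mismatch in (b) works out as follows (simple arithmetic, assuming 512-byte sectors):

# The 5 GB threshold in bytes vs. in 512-byte sectors (sector size is an assumption)
FIVE_GB_BYTES = 5 * 1024 ** 3          # 5368709120
SECTOR_SIZE = 512
print(FIVE_GB_BYTES // SECTOR_SIZE)    # 10485760 -- the value a sector-based "size"
                                       # field would need to be compared against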

@hub-bag commented Jan 20, 2023

I have the same issue when using 32G USB devices in my Raspberry Pi playground. Is it not possible to use USB devices for a ceph cluster?

@satoru-takeuchi (Member)

It's not possible. Please open a new issue (feature request) in the Ceph issue tracker if necessary.

@hub-bag commented Jan 21, 2023

Good to know, thank you. I understand the limitations after reading some more discussion threads, but it would be good to have something like this in the documentation if possible. Or is this something I really should have known already?

@satoru-takeuchi (Member)

@hub-bag OK, I'll track your suggestion in the following ticket.

#10859

@hub-bag commented Jan 21, 2023

Awesome, thank you!

@satoru-takeuchi (Member)

FYI, there is a rejected feature request to support removable devices.

ceph-volume does not allow the use of removable disks
https://tracker.ceph.com/issues/38833

@satoru-takeuchi (Member)

I'll submit a Ceph issue about (b) because it's clearly a bug.

done.

report "Insufficient space (<5GB)" even when disk size is sufficient
https://tracker.ceph.com/issues/58591

@satoru-takeuchi (Member) commented Mar 22, 2023

This is not a Rook problem. It will be fixed in this Ceph PR:
ceph/ceph#49954

@sfxworks

Weird, these drives used to be detected on this same motherboard/disk combo. Now only the NVMe drives are.

Defaulted container "provision" out of: provision, copy-bins (init)
2023-06-28 09:52:51.829403 D | exec: Running command: lsblk /dev/sda --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME,KNAME,MOUNTPOINT,FSTYPE
2023-06-28 09:52:51.834035 D | sys: lsblk output: "SIZE=\"8001563222016\" ROTA=\"1\" RO=\"0\" TYPE=\"disk\" PKNAME=\"\" NAME=\"/dev/sda\" KNAME=\"/dev/sda\" MOUNTPOINT=\"\" FSTYPE=\"\""
2023-06-28 09:52:51.834067 D | exec: Running command: sgdisk --print /dev/sda
2023-06-28 09:52:51.838875 D | exec: Running command: udevadm info --query=property /dev/sda
2023-06-28 09:52:51.846403 D | sys: udevadm info output: "DEVLINKS=/dev/disk/by-path/pci-0000:83:00.0-ata-1 /dev/disk/by-id/wwn-0x5000c500c4e69f12 /dev/disk/by-path/pci-0000:83:00.0-ata-1.0 /dev/disk/by-diskseq/1 /dev/disk/by-id/ata-ST8000NM002A-2KE102_WKD16BWM\nDEVNAME=/dev/sda\nDEVPATH=/devices/pci0000:80/0000:80:08.2/0000:83:00.0/ata1/host0/target0:0:0/0:0:0:0/block/sda\nDEVTYPE=disk\nDISKSEQ=1\nID_ATA=1\nID_ATA_DOWNLOAD_MICROCODE=1\nID_ATA_FEATURE_SET_PM=1\nID_ATA_FEATURE_SET_PM_ENABLED=1\nID_ATA_FEATURE_SET_PUIS=1\nID_ATA_FEATURE_SET_PUIS_ENABLED=0\nID_ATA_FEATURE_SET_SECURITY=1\nID_ATA_FEATURE_SET_SECURITY_ENABLED=0\nID_ATA_FEATURE_SET_SECURITY_ENHANCED_ERASE_UNIT_MIN=66272\nID_ATA_FEATURE_SET_SECURITY_ERASE_UNIT_MIN=66272\nID_ATA_FEATURE_SET_SECURITY_FROZEN=1\nID_ATA_FEATURE_SET_SMART=1\nID_ATA_FEATURE_SET_SMART_ENABLED=1\nID_ATA_ROTATION_RATE_RPM=7200\nID_ATA_SATA=1\nID_ATA_SATA_SIGNAL_RATE_GEN1=1\nID_ATA_SATA_SIGNAL_RATE_GEN2=1\nID_ATA_WRITE_CACHE=1\nID_ATA_WRITE_CACHE_ENABLED=1\nID_BUS=ata\nID_MODEL=ST8000NM002A-2KE102\nID_MODEL_ENC=ST8000NM002A-2KE102\\x20\\x20\\x20\\x20\\x20\\x20\\x20\\x20\\x20\\x20\\x20\\x20\\x20\\x20\\x20\\x20\\x20\\x20\\x20\\x20\\x20\nID_PATH=pci-0000:83:00.0-ata-1.0\nID_PATH_ATA_COMPAT=pci-0000:83:00.0-ata-1\nID_PATH_TAG=pci-0000_83_00_0-ata-1_0\nID_REVISION=NN02\nID_SERIAL=ST8000NM002A-2KE102_WKD16BWM\nID_SERIAL_SHORT=WKD16BWM\nID_TYPE=disk\nID_WWN=0x5000c500c4e69f12\nID_WWN_WITH_EXTENSION=0x5000c500c4e69f12\nMAJOR=8\nMINOR=0\nSUBSYSTEM=block\nTAGS=:systemd:\nUSEC_INITIALIZED=5223977"
2023-06-28 09:52:51.846424 D | exec: Running command: lsblk --noheadings --path --list --output NAME /dev/sda
2023-06-28 09:52:52.063667 D | inventory: &{Name:sda Parent: HasChildren:false DevLinks:/dev/disk/by-path/pci-0000:83:00.0-ata-1 /dev/disk/by-id/wwn-0x5000c500c4e69f12 /dev/disk/by-path/pci-0000:83:00.0-ata-1.0 /dev/disk/by-diskseq/1 /dev/disk/by-id/ata-ST8000NM002A-2KE102_WKD16BWM Size:8001563222016 UUID:c09e7d98-88bf-4819-80a7-d070636922e4 Serial:ST8000NM002A-2KE102_WKD16BWM Type:disk Rotational:true Readonly:false Partitions:[] Filesystem: Mountpoint: Vendor: Model:ST8000NM002A-2KE102 WWN:0x5000c500c4e69f12 WWNVendorExtension:0x5000c500c4e69f12 Empty:false CephVolumeData: RealPath:/dev/sda KernelName:sda Encrypted:false}
2023-06-28 09:52:52.063863 D | cephosd: &{Name:sda Parent: HasChildren:false DevLinks:/dev/disk/by-path/pci-0000:83:00.0-ata-1 /dev/disk/by-id/wwn-0x5000c500c4e69f12 /dev/disk/by-path/pci-0000:83:00.0-ata-1.0 /dev/disk/by-diskseq/1 /dev/disk/by-id/ata-ST8000NM002A-2KE102_WKD16BWM Size:8001563222016 UUID:c09e7d98-88bf-4819-80a7-d070636922e4 Serial:ST8000NM002A-2KE102_WKD16BWM Type:disk Rotational:true Readonly:false Partitions:[] Filesystem: Mountpoint: Vendor: Model:ST8000NM002A-2KE102 WWN:0x5000c500c4e69f12 WWNVendorExtension:0x5000c500c4e69f12 Empty:false CephVolumeData: RealPath:/dev/sda KernelName:sda Encrypted:false}
2023-06-28 09:52:52.064407 D | exec: Running command: lsblk /dev/sda --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME,KNAME,MOUNTPOINT,FSTYPE
2023-06-28 09:52:52.068568 D | sys: lsblk output: "SIZE=\"8001563222016\" ROTA=\"1\" RO=\"0\" TYPE=\"disk\" PKNAME=\"\" NAME=\"/dev/sda\" KNAME=\"/dev/sda\" MOUNTPOINT=\"\" FSTYPE=\"\""
2023-06-28 09:52:52.068584 D | exec: Running command: ceph-volume inventory --format json /dev/sda
2023-06-28 09:52:52.835040 I | cephosd: skipping device "sda": ["Insufficient space (<5GB)"].

@davidpanic

Heads up for anyone in the same boat:

Create an LVM logical volume spanning the entire disk and pass that to Ceph.

NOTE: For some bizarre reason Ceph doesn't take VG names into account, so the LV names need to be different for all disks on the same node; otherwise it won't create any OSDs because of name conflicts.

DISK="/dev/sda"
DISK_NO="1"

VG_NAME="hdd$DISK_NO"
LV_NAME="lv$DISK_NO"

vgcreate $VG_NAME $DISK
lvcreate -n $LV_NAME -l 100%FREE $VG_NAME

cluster.yaml:

nodes:
  - name: node-1
    devices:
    - name: /dev/mapper/hdd1-lv1
      config:
        deviceClass: hdd
    - name: /dev/mapper/hdd2-lv2
      config:
        deviceClass: hdd
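
The resulting /dev/mapper paths can be double-checked with standard LVM tooling before they go into cluster.yaml (a quick sanity check, device naming as in the workaround above):

# Confirm each LV spans the whole disk and note the path Rook needs
lvs -o vg_name,lv_name,lv_size,lv_path
ls -l /dev/mapper/ | grep hdd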

@oboudry-mvp

For information, I had the same problem and could relate it to the fact that the drives were flagged as removable. Changing the device settings from AHCI to IDE in the BIOS fixed the issue, and Ceph now uses the drives. Writing this in case it can be a solution for someone else.
