Certain disks skipped for being too small when they are large enough #11474

Closed · MrDrMcCoy opened this issue Dec 22, 2022 · 17 comments

@MrDrMcCoy

Deviation from expected behavior:
Rook / Ceph believes that a certain disk type used on all my nodes is smaller than 5GB and refuses to use it, despite its actual size being 256GB.

Prepare pod log message: cephosd: skipping device "sda": ["Insufficient space (<5GB)"].

Actual size of device:

root@rock5b-13:~# lsblk /dev/sda
NAME MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
sda    8:0    1 238.5G  0 disk

Expected behavior:
Rook / Ceph should use this disk.

How to reproduce it (minimal and precise):

  1. Ensure all traces of Rook are removed from Kubernetes and /var/lib/rook
  2. Nuke the disk (a quick check to verify the wipe follows this list):
    # Remove partition tables
    sgdisk --zap-all /dev/sda
    # Remove filesystem identifiers
    wipefs --all --force /dev/sda
    # Zero first 100MB
    dd if=/dev/zero bs=1M count=100 oflag=direct of=/dev/sda
    # Zero last 100MB
    dd bs=512 if=/dev/zero of=/dev/sda oflag=direct count=204800 seek=$(($(blockdev --getsz /dev/sda) - 204800))
    # Inform kernel of changes
    partprobe /dev/sda
    
  3. Create Rook operator via Helm
  4. Create Rook cluster via Helm
  5. Observe that this disk type is skipped, while the NVMe drives are properly picked up
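
To confirm that the wipe in step 2 actually took effect before reinstalling, a quick check with standard util-linux tools (assuming the same /dev/sda device as above):

# Should show the full size with no partitions, filesystem signatures, or mountpoints
lsblk -o NAME,SIZE,TYPE,FSTYPE,MOUNTPOINTS /dev/sda
# Both should print nothing if all signatures were removed
wipefs /dev/sda
blkid /dev/sda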

File(s) to submit:

Helm values:
rook-ceph-cluster-v1.10.7-values.yml.txt
rook-ceph-v1.10.7-values.yml.txt

Logs to submit:

rook-ceph-osd-prepare-rock5b-13-dz2fd.log
rook-ceph-operator-5bc5659499-ndq4z.log
rook-discover-xr4tw.log

Cluster Status to submit:

status:
  ceph:
    capacity:
      bytesAvailable: 3072491655168
      bytesTotal: 3072628629504
      bytesUsed: 136974336
      lastUpdated: '2022-12-22T04:38:35Z'
    fsid: 491af2f2-017a-4301-853f-c04fc62b9245
    health: HEALTH_OK
    lastChecked: '2022-12-22T04:38:35Z'
    versions:
      mds:
        ceph version 17.2.5 (98318ae89f1a893a6ded3a640405cdbb33e08757) quincy (stable): 2
      mgr:
        ceph version 17.2.5 (98318ae89f1a893a6ded3a640405cdbb33e08757) quincy (stable): 2
      mon:
        ceph version 17.2.5 (98318ae89f1a893a6ded3a640405cdbb33e08757) quincy (stable): 3
      osd:
        ceph version 17.2.5 (98318ae89f1a893a6ded3a640405cdbb33e08757) quincy (stable): 3
      overall:
        ceph version 17.2.5 (98318ae89f1a893a6ded3a640405cdbb33e08757) quincy (stable): 10
  conditions:
    - lastHeartbeatTime: '2022-12-22T04:38:37Z'
      lastTransitionTime: '2022-12-22T03:57:33Z'
      message: Cluster created successfully
      reason: ClusterCreated
      status: 'True'
      type: Ready
  message: Cluster created successfully
  observedGeneration: 2
  phase: Ready
  state: Created
  storage:
    deviceClasses:
      - name: nvme
  version:
    image: quay.io/ceph/ceph:v17.2.5
    version: 17.2.5-0

Environment:

  • OS (e.g. from /etc/os-release): Debian GNU/Linux Bookworm (Sid, custom built Armbian image, needed for certain kernel modules and btrfs rootfs)
  • Kernel (e.g. uname -a): Linux rock5b-13 5.10.110-rockchip-rk3588-gadfc1747e7fe-dirty #trunk SMP Tue Dec 20 11:19:50 UTC 2022 aarch64 GNU/Linux
  • Cloud provider or hardware configuration: Radxa Rock5 Model B
  • Rook version (use rook version inside of a Rook Pod): v1.10.7
  • Storage backend version (e.g. for ceph do ceph -v): ceph version 17.2.5 (98318ae89f1a893a6ded3a640405cdbb33e08757) quincy (stable)
  • Kubernetes version (use kubectl version): v1.26.0
  • Kubernetes cluster type (e.g. Tectonic, GKE, OpenShift): Microk8s
  • Storage backend status (e.g. for Ceph use ceph health in the Rook Ceph toolbox): HEALTH_OK
@MrDrMcCoy MrDrMcCoy added the bug label Dec 22, 2022
@satoru-takeuchi satoru-takeuchi self-assigned this Dec 22, 2022
@satoru-takeuchi (Member)

"rock5b-13" seems not to specify "/dev/sda".

    - name: rock5b-13
      devicePathFilter: (usb|nvme)
      # deviceFilter: (sd[a-z])
      # devices:
      #   - name: /dev/disk/by-path/platform-fe150000.pcie-pci-0000:01:00.0-nvme-1
      #     deviceClass: nvme
      #   - name: /dev/disk/by-path/platform-xhci-hcd.10.auto-usb-0:1:1.0-scsi-0:0:0:0
      #     deviceClass: hdd
      #   - name: sda # /dev/disk/by-path/platform-xhci-hcd.11.auto-usb-0:1:1.0-scsi-0:0:0:0
      #     deviceClass: hdd

Could you change the filter so that "/dev/sda" is picked up by Rook (e.g. devicePathFilter: (usb|nvme|sda)) and restart the operator pod (the restart might not be necessary since you have the discovery daemonset enabled)?
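
For reference, a minimal sketch of where that filter would live in the cluster Helm values (assuming the standard cephClusterSpec.storage layout of the rook-ceph-cluster chart):

storage:
  nodes:
    - name: rock5b-13
      devicePathFilter: (usb|nvme|sda)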

If the OSD prepare pod still doesn't pick up sda, please show me the log of the prepare pod again.

Prepare pod log message: cephosd: skipping device "sda": ["Insufficient space (<5GB)"].

This log was shown while processing the "deviceFilter: (sd[a-z])" filter and doesn't directly correspond to the detection of "/dev/sda".
In addition, IIRC, there is a Ceph bug that causes some devices to be reported as "Insufficient space (<5GB)".

@MrDrMcCoy (Author) commented Dec 22, 2022

According to the documentation, devicePathFilter is a regex that applies to the udev device path. In the commented-out devices list that I had previously tried, you can see that the path (/dev/disk/by-path/platform-xhci-hcd.11.auto-usb-0:1:1.0-scsi-0:0:0:0) contains usb (but not sda) and should be matched by this regex. If that is not the case, would you please clarify what devicePathFilter really matches against?
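
For reference, the by-path symlinks such a filter would presumably be matched against can be listed directly on the node (standard udev tooling; sda assumed as above):

# Show every by-path symlink that points at sda
ls -l /dev/disk/by-path/ | grep -w sda
# Or ask udev for all of the device's symlinks
udevadm info --query=symlink /dev/sda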

Here are the logs from setting devicePathFilter to (usb|nvme|sd[a-z]):

rook-ceph-operator-5bc5659499-bw4lp.log
rook-ceph-osd-prepare-rock5b-13-c5vwd.log
rook-ceph-cluster-v1.10.7-values.yml.txt

[rook@rook-ceph-tools-544cb74cb8-6bq64 /]$ ceph osd tree
ID  CLASS  WEIGHT   TYPE NAME           STATUS  REWEIGHT  PRI-AFF
-1         2.79446  root default                                 
-5         0.93149      host rock5b-11                           
 1   nvme  0.93149          osd.1           up   1.00000  1.00000
-7         0.93149      host rock5b-12                           
 2   nvme  0.93149          osd.2           up   1.00000  1.00000
-3         0.93149      host rock5b-13                           
 0   nvme  0.93149          osd.0           up   1.00000  1.00000

Additionally, I had attempted this previously with deviceFilter and the explicit devices mappings, each giving me the same results. I had also tried setting devices to sda and the udev path with no difference observed.

In addition, IIRC, there is a Ceph's bug that recognizes some devices as "Insufficient space (<5GB)."

This sounds promising! Would you please direct me to the details of this bug and perhaps suggest a version not affected by it?

@satoru-takeuchi (Member)

I read the new logs and this behavior is indeed odd. sda is one of the candidates for OSD creation: it has enough space, has no filesystem, and has a valid device type ("disk").

I'll try to reproduce this problem later (it might be next year).

This sounds promising! Would you please direct me to the details of this bug and perhaps suggest a version not affected by it?

I'll search the Ceph issue tracker and let you know which issue it is.

@nikhiljha commented Dec 27, 2022

We're also running into this. Our full configuration: https://github.com/ocf/kubernetes/blob/main/apps/rook.py

2022-12-27 00:14:14.461309 I | cephosd: old lsblk can't detect bluestore signature, so try to detect here
2022-12-27 00:14:14.461864 D | exec: Running command: lsblk /dev/sda --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME,KNAME,MOUNTPOINT,FSTYPE
2022-12-27 00:14:14.465145 D | sys: lsblk output: "SIZE=\"16000900661248\" ROTA=\"1\" RO=\"0\" TYPE=\"disk\" PKNAME=\"\" NAME=\"/dev/sda\" KNAME=\"/dev/sda\" MOUNTPOINT=\"\" FSTYPE=\"\""
2022-12-27 00:14:14.465162 D | exec: Running command: ceph-volume inventory --format json /dev/sda
2022-12-27 00:14:14.731023 I | cephosd: skipping device "sda": ["Insufficient space (<5GB)"].
2022-12-27 00:14:14.731035 I | cephosd: old lsblk can't detect bluestore signature, so try to detect here
2022-12-27 00:14:14.739724 D | exec: Running command: lsblk /dev/sdb --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME,KNAME,MOUNTPOINT,FSTYPE
2022-12-27 00:14:14.743340 D | sys: lsblk output: "SIZE=\"16000900661248\" ROTA=\"1\" RO=\"0\" TYPE=\"disk\" PKNAME=\"\" NAME=\"/dev/sdb\" KNAME=\"/dev/sdb\" MOUNTPOINT=\"\" FSTYPE=\"\""
2022-12-27 00:14:14.743354 D | exec: Running command: ceph-volume inventory --format json /dev/sdb
2022-12-27 00:14:15.004877 I | cephosd: skipping device "sdb": ["Insufficient space (<5GB)"].

Our drives that are 3.5T aren't affected. Also on ceph 17.2.5.

Did a little bit of debugging; it looks like the left-hand side of this comparison returns 0 because self.sys_api is {}, which in turn is because sys_info.devices doesn't contain the device.

https://github.com/ceph/ceph/blob/8e459c55f4d7afef71d5cdfcba01b9583fb3ff3e/src/ceph-volume/ceph_volume/util/device.py#L586

sys_info.devices doesn't contain it because... (in drive.py)

if get_file_contents(os.path.join(_sys_block_path, dev, 'removable')) == "1":
    continue

and

sh-4.4# cat /sys/block/sda/removable 
1

I think removable is getting set because this is a hot-swap disk, but I actually do want to use the drive in Ceph. Either way, this is definitely an upstream Ceph bug: it shouldn't say the drive is too small when the actual issue is that it thinks the drive is removable.

Funnily enough, this code was actually touched very recently (the change should be in Ceph 18), but the change (ceph/ceph@4b3d174) won't fix this, since the problem happens before the point where it checks removable.

I tried working around this with a udev rule but it did not work :(
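
A condensed paraphrase of the behavior described above (a simplified sketch with approximate names, not the actual ceph-volume source):

import os

_SYS_BLOCK_PATH = "/sys/block"

def _read(path):
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return ""

def get_devices():
    """Collect block devices, silently dropping anything the kernel flags removable."""
    devices = {}
    for dev in os.listdir(_SYS_BLOCK_PATH):
        # Hot-swap bays and USB-attached disks often report removable=1 here,
        # so the whole device vanishes from the inventory at this point.
        if _read(os.path.join(_SYS_BLOCK_PATH, dev, "removable")) == "1":
            continue
        size_sectors = int(_read(os.path.join(_SYS_BLOCK_PATH, dev, "size")) or 0)
        devices["/dev/" + dev] = {"size": size_sectors * 512}
    return devices

# A device missing from get_devices() ends up with sys_api == {} further down, so its
# size reads as 0 and the rejection reason printed is "Insufficient space (<5GB)"
# rather than anything about the device being removable.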

@nikhiljha commented Dec 28, 2022

I ended up doing a kernel patch for fun: ocf/nix@bac6ec2, but this is clearly a hack 🙃

It works great though, my OSDs are all up now 🤣🤣🤣

@satoru-takeuchi (Member)

I read Ceph's code. There are two problems.

  • (a) Your devices couldn't be used to create OSDs because they are removable.
  • (b) The rejection reason is wrong (e.g. "Insufficient space (<5GB)") because this line has a bug: the "size" value should be compared with 5368709120/sector_size (see the arithmetic sketch below).

I'll submit a Ceph issue about (b) because it's clearly a bug. On the other hand, whether (a) is correct or not depends on Ceph's design; this behavior was introduced in this commit. Since this is a Ceph matter rather than a Rook matter, @MrDrMcCoy @nikhiljha could you open a new Ceph issue asking for removable devices to be supported?
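
For concreteness, the unit mismatch in (b) works out as follows (simple arithmetic, assuming 512-byte sectors):

# The 5 GB threshold in bytes vs. in 512-byte sectors (sector size is an assumption)
FIVE_GB_BYTES = 5 * 1024 ** 3          # 5368709120
SECTOR_SIZE = 512
print(FIVE_GB_BYTES // SECTOR_SIZE)    # 10485760 -- the value a sector-based "size"
                                       # field would need to be compared against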

@hub-bag commented Jan 20, 2023

I have the same issue when using 32G USB devices in my Raspberry Pi playground. Is it not possible to use USB devices for a ceph cluster?

@satoru-takeuchi (Member)

It's not possible. Please open a new issue (feature request) in the Ceph issue tracker if necessary.

@hub-bag commented Jan 21, 2023

Good to know, thank you. I understand the limitations after reading some more discussion threads, but it would be good to have something like this in the documentation if possible. Or is this something I really should have known already?

@satoru-takeuchi (Member)

@hub-bag OK, I'll track your suggestion in the following ticket.

#10859

@hub-bag commented Jan 21, 2023

Awesome, thank you!

@satoru-takeuchi (Member)

FYI, there is a rejected feature request to support removable devices.

ceph-volume does not allow the use of removable disks
https://tracker.ceph.com/issues/38833

@satoru-takeuchi (Member)

I'll submit a Ceph issue about (b) because it's clearly a bug.

done.

report "Insufficient space (<5GB)" even when disk size is sufficient
https://tracker.ceph.com/issues/58591

@satoru-takeuchi (Member) commented Mar 22, 2023

This is not a Rook problem. It will be fixed in this Ceph PR:
ceph/ceph#49954

@sfxworks

Weird, these drives used to be detected on this same motherboard/disk combo. Now only the NVMe drives are.

Defaulted container "provision" out of: provision, copy-bins (init)
2023-06-28 09:52:51.829403 D | exec: Running command: lsblk /dev/sda --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME,KNAME,MOUNTPOINT,FSTYPE
2023-06-28 09:52:51.834035 D | sys: lsblk output: "SIZE=\"8001563222016\" ROTA=\"1\" RO=\"0\" TYPE=\"disk\" PKNAME=\"\" NAME=\"/dev/sda\" KNAME=\"/dev/sda\" MOUNTPOINT=\"\" FSTYPE=\"\""
2023-06-28 09:52:51.834067 D | exec: Running command: sgdisk --print /dev/sda
2023-06-28 09:52:51.838875 D | exec: Running command: udevadm info --query=property /dev/sda
2023-06-28 09:52:51.846403 D | sys: udevadm info output: "DEVLINKS=/dev/disk/by-path/pci-0000:83:00.0-ata-1 /dev/disk/by-id/wwn-0x5000c500c4e69f12 /dev/disk/by-path/pci-0000:83:00.0-ata-1.0 /dev/disk/by-diskseq/1 /dev/disk/by-id/ata-ST8000NM002A-2KE102_WKD16BWM\nDEVNAME=/dev/sda\nDEVPATH=/devices/pci0000:80/0000:80:08.2/0000:83:00.0/ata1/host0/target0:0:0/0:0:0:0/block/sda\nDEVTYPE=disk\nDISKSEQ=1\nID_ATA=1\nID_ATA_DOWNLOAD_MICROCODE=1\nID_ATA_FEATURE_SET_PM=1\nID_ATA_FEATURE_SET_PM_ENABLED=1\nID_ATA_FEATURE_SET_PUIS=1\nID_ATA_FEATURE_SET_PUIS_ENABLED=0\nID_ATA_FEATURE_SET_SECURITY=1\nID_ATA_FEATURE_SET_SECURITY_ENABLED=0\nID_ATA_FEATURE_SET_SECURITY_ENHANCED_ERASE_UNIT_MIN=66272\nID_ATA_FEATURE_SET_SECURITY_ERASE_UNIT_MIN=66272\nID_ATA_FEATURE_SET_SECURITY_FROZEN=1\nID_ATA_FEATURE_SET_SMART=1\nID_ATA_FEATURE_SET_SMART_ENABLED=1\nID_ATA_ROTATION_RATE_RPM=7200\nID_ATA_SATA=1\nID_ATA_SATA_SIGNAL_RATE_GEN1=1\nID_ATA_SATA_SIGNAL_RATE_GEN2=1\nID_ATA_WRITE_CACHE=1\nID_ATA_WRITE_CACHE_ENABLED=1\nID_BUS=ata\nID_MODEL=ST8000NM002A-2KE102\nID_MODEL_ENC=ST8000NM002A-2KE102\\x20\\x20\\x20\\x20\\x20\\x20\\x20\\x20\\x20\\x20\\x20\\x20\\x20\\x20\\x20\\x20\\x20\\x20\\x20\\x20\\x20\nID_PATH=pci-0000:83:00.0-ata-1.0\nID_PATH_ATA_COMPAT=pci-0000:83:00.0-ata-1\nID_PATH_TAG=pci-0000_83_00_0-ata-1_0\nID_REVISION=NN02\nID_SERIAL=ST8000NM002A-2KE102_WKD16BWM\nID_SERIAL_SHORT=WKD16BWM\nID_TYPE=disk\nID_WWN=0x5000c500c4e69f12\nID_WWN_WITH_EXTENSION=0x5000c500c4e69f12\nMAJOR=8\nMINOR=0\nSUBSYSTEM=block\nTAGS=:systemd:\nUSEC_INITIALIZED=5223977"
2023-06-28 09:52:51.846424 D | exec: Running command: lsblk --noheadings --path --list --output NAME /dev/sda
2023-06-28 09:52:52.063667 D | inventory: &{Name:sda Parent: HasChildren:false DevLinks:/dev/disk/by-path/pci-0000:83:00.0-ata-1 /dev/disk/by-id/wwn-0x5000c500c4e69f12 /dev/disk/by-path/pci-0000:83:00.0-ata-1.0 /dev/disk/by-diskseq/1 /dev/disk/by-id/ata-ST8000NM002A-2KE102_WKD16BWM Size:8001563222016 UUID:c09e7d98-88bf-4819-80a7-d070636922e4 Serial:ST8000NM002A-2KE102_WKD16BWM Type:disk Rotational:true Readonly:false Partitions:[] Filesystem: Mountpoint: Vendor: Model:ST8000NM002A-2KE102 WWN:0x5000c500c4e69f12 WWNVendorExtension:0x5000c500c4e69f12 Empty:false CephVolumeData: RealPath:/dev/sda KernelName:sda Encrypted:false}
2023-06-28 09:52:52.063863 D | cephosd: &{Name:sda Parent: HasChildren:false DevLinks:/dev/disk/by-path/pci-0000:83:00.0-ata-1 /dev/disk/by-id/wwn-0x5000c500c4e69f12 /dev/disk/by-path/pci-0000:83:00.0-ata-1.0 /dev/disk/by-diskseq/1 /dev/disk/by-id/ata-ST8000NM002A-2KE102_WKD16BWM Size:8001563222016 UUID:c09e7d98-88bf-4819-80a7-d070636922e4 Serial:ST8000NM002A-2KE102_WKD16BWM Type:disk Rotational:true Readonly:false Partitions:[] Filesystem: Mountpoint: Vendor: Model:ST8000NM002A-2KE102 WWN:0x5000c500c4e69f12 WWNVendorExtension:0x5000c500c4e69f12 Empty:false CephVolumeData: RealPath:/dev/sda KernelName:sda Encrypted:false}
2023-06-28 09:52:52.064407 D | exec: Running command: lsblk /dev/sda --bytes --nodeps --pairs --paths --output SIZE,ROTA,RO,TYPE,PKNAME,NAME,KNAME,MOUNTPOINT,FSTYPE
2023-06-28 09:52:52.068568 D | sys: lsblk output: "SIZE=\"8001563222016\" ROTA=\"1\" RO=\"0\" TYPE=\"disk\" PKNAME=\"\" NAME=\"/dev/sda\" KNAME=\"/dev/sda\" MOUNTPOINT=\"\" FSTYPE=\"\""
2023-06-28 09:52:52.068584 D | exec: Running command: ceph-volume inventory --format json /dev/sda
2023-06-28 09:52:52.835040 I | cephosd: skipping device "sda": ["Insufficient space (<5GB)"].

@davidpanic

Heads up for anyone in the same boat:

Create an LVM logical volume spanning the entire disk and pass that to Ceph.

NOTE: For some bizarre reason Ceph doesn't take VG names into account, so the LV names need to be different for all disks on the same node; otherwise it won't create any OSDs because of name conflicts.

DISK="/dev/sda"
DISK_NO="1"

VG_NAME="hdd$DISK_NO"
LV_NAME="lv$DISK_NO"

vgcreate $VG_NAME $DISK
lvcreate -n $LV_NAME -l 100%FREE $VG_NAME

cluster.yaml:

nodes:
  - name: node-1
    devices:
    - name: /dev/mapper/hdd1-lv1
      config:
        deviceClass: hdd
    - name: /dev/mapper/hdd2-lv2
      config:
        deviceClass: hdd
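
The resulting /dev/mapper paths can be double-checked with standard LVM tooling before they go into cluster.yaml (a quick sanity check, device naming as in the workaround above):

# Confirm each LV spans the whole disk and note the path Rook needs
lvs -o vg_name,lv_name,lv_size,lv_path
ls -l /dev/mapper/ | grep hdd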

@oboudry-mvp

For information, I had the same problem and could relate it to the fact that the drives were flagged as removable. Changing the device settings from AHCI to IDE in the BIOS fixed the issue, and Ceph now uses the drives. Writing this in case it can be a solution for someone else.
