Broken rook-ceph-osd-prepare job when a metadata device is used with osdsPerDevice greater than 1 #13637

Closed
degorenko opened this issue Jan 29, 2024 · 1 comment · Fixed by #13673

@degorenko
Contributor

The rook-ceph-osd-prepare job fails for a node when a device uses both the metadataDevice and osdsPerDevice > 1 parameters at the same time:

2024-01-22 16:20:10.829583 I | cephosd: using vde as metadataDevice for device /dev/vdd and let ceph-volume lvm batch decide how to create volumes
2024-01-22 16:20:10.829599 D | exec: Running command: stdbuf -oL ceph-volume --log-path /tmp/ceph-log lvm batch --prepare --bluestore --yes --osds-per-device 2 --crush-device-class hdd /dev/vdd --db-devices /dev/vde --report
2024-01-22 16:20:11.494524 D | exec: --> passed data devices: 1 physical, 0 LVM
2024-01-22 16:20:11.495092 D | exec: --> relative data size: 0.5
2024-01-22 16:20:11.495653 D | exec: --> passed block_db devices: 1 physical, 0 LVM
2024-01-22 16:20:11.496807 D | exec:
2024-01-22 16:20:11.496900 D | exec: Total OSDs: 2
2024-01-22 16:20:11.497067 D | exec:
2024-01-22 16:20:11.497097 D | exec:   Type            Path                                                    LV Size         % of device
2024-01-22 16:20:11.497208 D | exec: ----------------------------------------------------------------------------------------------------
2024-01-22 16:20:11.497259 D | exec:   data            /dev/vdd                                                25.00 GB        50.00%
2024-01-22 16:20:11.497372 D | exec:   block_db        /dev/vde                                                25.00 GB        50.00%
2024-01-22 16:20:11.497461 D | exec: ----------------------------------------------------------------------------------------------------
2024-01-22 16:20:11.497507 D | exec:   data            /dev/vdd                                                25.00 GB        50.00%
2024-01-22 16:20:11.497613 D | exec:   block_db        /dev/vde                                                25.00 GB        50.00%
2024-01-22 16:20:11.520805 D | exec: Running command: stdbuf -oL ceph-volume --log-path /tmp/ceph-log lvm batch --prepare --bluestore --yes --osds-per-device 2 --crush-device-class hdd /dev/vdd --db-devices /dev/vde --report --format json
2024-01-22 16:20:12.269088 D | cephosd: ceph-volume reports: [{"data": "/dev/vdd", "data_size": "25.00 GB", "encryption": "None", "block_db": "/dev/vde", "block_db_size": "25.00 GB"}, {"data": "/dev/vdd", "data_size": "25.00 GB", "encryption": "None", "block_db": "/dev/vde", "block_db_size": "25.00 GB"}]
2024-01-22 16:20:12.291122 C | rookcmd: failed to configure devices: failed to initialize osd: failed to create enough required devices, required: [{"data": "/dev/vdd", "data_size": "25.00 GB", "encryption": "None", "block_db": "/dev/vde", "block_db_size": "25.00 GB"}, {"data": "/dev/vdd", "data_size": "25.00 GB", "encryption": "None", "block_db": "/dev/vde", "block_db_size": "25.00 GB"}], actual: [{/dev/vde None /dev/vdd 25.00 GB 25.00 GB} {/dev/vde None /dev/vdd 25.00 GB 25.00 GB}]

Rook version is v1.12.10, Ceph version is v17.2.7.

The issue appears to be an incorrect condition here: https://github.com/rook/rook/blob/v1.12.10/pkg/daemon/ceph/osd/volume.go#L782
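
For illustration, here is a minimal Go sketch (not Rook code; the struct and variable names are assumptions) that parses the ceph-volume report printed in the log above. It shows that the report contains one entry per planned OSD, so any check that expects one entry per data device starts failing as soon as osdsPerDevice > 1:

```go
// Minimal sketch, not taken from Rook: parse the JSON report that
// `ceph-volume lvm batch ... --report --format json` printed in the log above.
package main

import (
	"encoding/json"
	"fmt"
)

// cvReport mirrors one entry of the report; the field names follow the JSON
// keys from the log, the struct itself is hypothetical.
type cvReport struct {
	Data        string `json:"data"`
	DataSize    string `json:"data_size"`
	Encryption  string `json:"encryption"`
	BlockDB     string `json:"block_db"`
	BlockDBSize string `json:"block_db_size"`
}

func main() {
	// Verbatim from the log: one data device (/dev/vdd) with osdsPerDevice=2
	// yields two report entries, one per OSD to be created.
	raw := `[{"data": "/dev/vdd", "data_size": "25.00 GB", "encryption": "None", "block_db": "/dev/vde", "block_db_size": "25.00 GB"},
	         {"data": "/dev/vdd", "data_size": "25.00 GB", "encryption": "None", "block_db": "/dev/vde", "block_db_size": "25.00 GB"}]`

	var reports []cvReport
	if err := json.Unmarshal([]byte(raw), &reports); err != nil {
		panic(err)
	}

	// Prints 2. A condition that compares this length against the number of
	// data devices (1) rejects a perfectly valid plan, which matches the
	// "failed to create enough required devices" error above.
	fmt.Println(len(reports))
}
```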

When the batch prepare command is run manually:

ceph-volume --log-path /tmp/ceph-log lvm batch --prepare --bluestore --yes --osds-per-device 2 --crush-device-class hdd /dev/vdd --db-devices /dev/vde

everything works.

lsblk output:

NAME                                                                                                  MAJ:MIN RM  SIZE RO TYPE MOUNTPOINTS
loop0                                                                                                   7:0    0 63.5M  1 loop
loop1                                                                                                   7:1    0 91.8M  1 loop
loop2                                                                                                   7:2    0 40.8M  1 loop
loop3                                                                                                   7:3    0 40.4M  1 loop
loop4                                                                                                   7:4    0 63.9M  1 loop
nbd0                                                                                                   43:0    0    0B  0 disk
nbd1                                                                                                   43:32   0    0B  0 disk
nbd2                                                                                                   43:64   0    0B  0 disk
nbd3                                                                                                   43:96   0    0B  0 disk
nbd4                                                                                                   43:128  0    0B  0 disk
nbd5                                                                                                   43:160  0    0B  0 disk
nbd6                                                                                                   43:192  0    0B  0 disk
nbd7                                                                                                   43:224  0    0B  0 disk
vda                                                                                                   252:0    0  100G  0 disk
├─vda1                                                                                                252:1    0 99.9G  0 part /usr/local/bin
│                                                                                                                              /etc/hosts
│                                                                                                                              /etc/hostname
│                                                                                                                              /etc/resolv.conf
│                                                                                                                              /dev/termination-log
├─vda14                                                                                               252:14   0    4M  0 part
└─vda15                                                                                               252:15   0  106M  0 part
vdb                                                                                                   252:16   0   50G  0 disk
└─ceph--4fddaca2--a986--4873--b3ff--62fc187b7042-osd--block--8dfe5173--3319--4ca5--bb38--53338fd65f09 253:0    0   50G  0 lvm
vdc                                                                                                   252:32   0   64M  0 disk
vdd                                                                                                   252:48   0   50G  0 disk
vde                                                                                                   252:64   0   50G  0 disk
vdf                                                                                                   252:80   0   50G  0 disk
nbd8                                                                                                   43:256  0    0B  0 disk
nbd9                                                                                                   43:288  0    0B  0 disk
nbd10                                                                                                  43:320  0    0B  0 disk
nbd11                                                                                                  43:352  0    0B  0 disk
nbd12                                                                                                  43:384  0    0B  0 disk
nbd13                                                                                                  43:416  0    0B  0 disk
nbd14                                                                                                  43:448  0    0B  0 disk
nbd15                                                                                                  43:480  0    0B  0 disk

blkid output:

/dev/loop1: TYPE="squashfs"
/dev/vdb: UUID="swquhj-tdIZ-EkZb-gjDL-sKXa-JwxB-UVNQhP" TYPE="LVM2_member"
/dev/loop4: TYPE="squashfs"
/dev/loop2: TYPE="squashfs"
/dev/loop0: TYPE="squashfs"
/dev/mapper/ceph--4fddaca2--a986--4873--b3ff--62fc187b7042-osd--block--8dfe5173--3319--4ca5--bb38--53338fd65f09: TYPE="ceph_bluestore"
/dev/vdc: SEC_TYPE="msdos" LABEL_FATBOOT="config-2" LABEL="config-2" UUID="7CD1-65F8" TYPE="vfat"
/dev/vda15: LABEL_FATBOOT="UEFI" LABEL="UEFI" UUID="FF99-8513" TYPE="vfat" PARTUUID="615b92e7-d07a-4782-abbb-6b3c2448d415"
/dev/vda1: LABEL="cloudimg-rootfs" UUID="00999b20-fe3c-4c8c-a276-fbb809583bb1" TYPE="ext4" PARTUUID="19e5c42f-97d2-4a70-bdcd-5a8d8ab372da"
/dev/loop3: TYPE="squashfs"
/dev/vda14: PARTUUID="fdaac75b-93a4-4156-9d26-27bff81500ae"

logs.txt
cephcluster.txt

degorenko added the bug label Jan 29, 2024
satoru-takeuchi self-assigned this Jan 29, 2024
satoru-takeuchi added a commit to cybozu-go/rook that referenced this issue Feb 2, 2024
The validation logic of checking the number of devices is
wrong when `metadataDevice` is set and `osdsPerDevice` > 1.
`len(cvReports)` is the expected number of OSDs and is
the number of specified data devices multiplied
by `osdsPerDevice`.

Closes: rook#13637

Signed-off-by: Satoru Takeuchi <satoru.takeuchi@gmail.com>
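
As a hedged sketch of the fix described in the commit message above (illustrative names only; the actual patch is in #13673), the validation should compare the report length against the number of data devices multiplied by osdsPerDevice:

```go
// Hedged sketch of the corrected validation; names are illustrative and this
// is not the literal patch from rook#13673.
package main

import "fmt"

// validateCVReports checks that ceph-volume planned the expected number of
// OSDs: one report entry per OSD, i.e. data devices * osdsPerDevice.
func validateCVReports(numReports, numDataDevices, osdsPerDevice int) error {
	expected := numDataDevices * osdsPerDevice
	if numReports != expected {
		return fmt.Errorf("expected %d OSDs in ceph-volume report, got %d", expected, numReports)
	}
	return nil
}

func main() {
	// The scenario from this issue: /dev/vdd with osdsPerDevice=2 and a
	// metadata device, producing a 2-entry report.
	fmt.Println(validateCVReports(2, 1, 2)) // <nil> with the corrected check
}
```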
@satoru-takeuchi
Member

@degorenko This problem will be fixed in #13673.

satoru-takeuchi added a commit to cybozu-go/rook that referenced this issue Feb 6, 2024
satoru-takeuchi added a commit to cybozu-go/rook that referenced this issue Feb 6, 2024
satoru-takeuchi added a commit to cybozu-go/rook that referenced this issue Feb 7, 2024
mergify bot pushed a commit that referenced this issue Feb 7, 2024 (cherry picked from commit 753bbfe)
satoru-takeuchi added a commit that referenced this issue Feb 8, 2024 (cherry picked from commit 753bbfe)