
RSP5 OSD Down after reboot #58

Open
wuast94 opened this issue Mar 13, 2024 · 4 comments

Comments

@wuast94

wuast94 commented Mar 13, 2024

Describe the bug
When I create a Ceph OSD it works without problems, but as soon as I reboot the node the OSD won't come back up.
To Reproduce
Steps to reproduce the behavior:

  1. Install Proxmox and Ceph (using your repos of course)
  2. Create OSD
  3. Reboot
  4. OSD gone

ENV (please complete the following information):

  • OS: Debian GNU/Linux 12 (bookworm)
  • ARCH: arm64
  • Raspberry PI 5
pveversion -v
proxmox-ve: 8.1.0 (running kernel: 6.6.20+rpt-rpi-v8)
pve-manager: 8.1.3+pve1 (running version: 8.1.3+pve1/26764642342c55bb)
proxmox-kernel-helper: 8.1.0
ceph: 18.2.0-pve2
ceph-fuse: 18.2.0-pve2
corosync: 3.1.7-pve3
criu: 3.17.1-2
glusterfs-client: 10.3-5
ifupdown: residual config
ifupdown2: 3.2.0-1+pmx7
libjs-extjs: 7.0.0-4
libknet1: 1.28-pve1
libproxmox-acme-perl: 1.5.0
libproxmox-backup-qemu0: 1.4.0
libproxmox-rs-perl: 0.3.1
libpve-access-control: 8.0.7
libpve-apiclient-perl: 3.3.1
libpve-common-perl: 8.1.0-1
libpve-guest-common-perl: 5.0.6
libpve-http-server-perl: 5.0.4
libpve-network-perl: 0.9.5
libpve-rs-perl: 0.8.7
libpve-storage-perl: 8.0.5
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 5.0.2-4
lxcfs: 5.0.3-pve4
novnc-pve: 1.4.0-3
proxmox-backup-client: 3.1.2-1
proxmox-backup-file-restore: 3.0.4-1
proxmox-kernel-helper: 8.1.0
proxmox-mail-forward: 0.2.0
proxmox-mini-journalreader: 1.4.0
proxmox-offline-mirror-helper: 0.6.2
proxmox-widget-toolkit: 4.1.3
pve-cluster: 8.0.5
pve-container: 5.0.8
pve-docs: 8.1.3
pve-edk2-firmware: not correctly installed
pve-firewall: 5.0.3
pve-firmware: 3.8-1
pve-ha-manager: 4.0.3
pve-i18n: 3.1.4
pve-qemu-kvm: 8.1.2-4
pve-xtermjs: 4.16.0-3
qemu-server: 8.0.10+pve1
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.2.0-pve4

Additional context

systemctl status ceph-osd@*.service doesn't return anything, and journalctl -xeu ceph-osd@2.service shows no entries either.

I double-checked that I am on the right host and using the right OSD number.

Output of the OSD install:


The ZFS modules cannot be auto-loaded.
Try running 'modprobe zfs' as root to manually load them.
command '/sbin/zpool list -HPLv' failed: exit code 1

create OSD on /dev/sda (bluestore)
wiping block device /dev/sda
200+0 records in
200+0 records out
209715200 bytes (210 MB, 200 MiB) copied, 1.044 s, 201 MB/s
Running command: /bin/ceph-authtool --gen-print-key
Running command: /bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new 8331d767-af24-40da-bac0-ccbaf0fcda92
Running command: vgcreate --force --yes ceph-b9cc563f-5758-4ead-bbec-74c6aafb7099 /dev/sda
 stdout: Physical volume "/dev/sda" successfully created.
 stdout: Volume group "ceph-b9cc563f-5758-4ead-bbec-74c6aafb7099" successfully created
Running command: lvcreate --yes -l 476924 -n osd-block-8331d767-af24-40da-bac0-ccbaf0fcda92 ceph-b9cc563f-5758-4ead-bbec-74c6aafb7099
 stdout: Logical volume "osd-block-8331d767-af24-40da-bac0-ccbaf0fcda92" created.
Running command: /bin/ceph-authtool --gen-print-key
Running command: /bin/mount -t tmpfs tmpfs /var/lib/ceph/osd/ceph-2
Running command: /bin/chown -h ceph:ceph /dev/ceph-b9cc563f-5758-4ead-bbec-74c6aafb7099/osd-block-8331d767-af24-40da-bac0-ccbaf0fcda92
Running command: /bin/chown -R ceph:ceph /dev/dm-0
Running command: /bin/ln -s /dev/ceph-b9cc563f-5758-4ead-bbec-74c6aafb7099/osd-block-8331d767-af24-40da-bac0-ccbaf0fcda92 /var/lib/ceph/osd/ceph-2/block
Running command: /bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring mon getmap -o /var/lib/ceph/osd/ceph-2/activate.monmap
 stderr: 2024-03-13T09:48:04.607+0100 7fb083f180 -1 auth: unable to find a keyring on /etc/pve/priv/ceph.client.bootstrap-osd.keyring: (2) No such file or directory
2024-03-13T09:48:04.607+0100 7fb083f180 -1 AuthRegistry(0x7fac063e30) no keyring found at /etc/pve/priv/ceph.client.bootstrap-osd.keyring, disabling cephx
 stderr: got monmap epoch 5
--> Creating keyring file for osd.2
Running command: /bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-2/keyring
Running command: /bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-2/
Running command: /bin/ceph-osd --cluster ceph --osd-objectstore bluestore --mkfs -i 2 --monmap /var/lib/ceph/osd/ceph-2/activate.monmap --keyfile - --osd-data /var/lib/ceph/osd/ceph-2/ --osd-uuid 8331d767-af24-40da-bac0-ccbaf0fcda92 --setuser ceph --setgroup ceph
 stderr: 2024-03-13T09:48:05.071+0100 7f93c67040 -1 bluestore(/var/lib/ceph/osd/ceph-2//block) _read_bdev_label unable to decode label at offset 102: void bluestore_bdev_label_t::decode(ceph::buffer::v15_2_0::list::const_iterator&) decode past end of struct encoding: Malformed input [buffer:3]
 stderr: 2024-03-13T09:48:05.075+0100 7f93c67040 -1 bluestore(/var/lib/ceph/osd/ceph-2//block) _read_bdev_label unable to decode label at offset 102: void bluestore_bdev_label_t::decode(ceph::buffer::v15_2_0::list::const_iterator&) decode past end of struct encoding: Malformed input [buffer:3]
 stderr: 2024-03-13T09:48:05.075+0100 7f93c67040 -1 bluestore(/var/lib/ceph/osd/ceph-2//block) _read_bdev_label unable to decode label at offset 102: void bluestore_bdev_label_t::decode(ceph::buffer::v15_2_0::list::const_iterator&) decode past end of struct encoding: Malformed input [buffer:3]
 stderr: 2024-03-13T09:48:05.079+0100 7f93c67040 -1 bluestore(/var/lib/ceph/osd/ceph-2/) _read_fsid unparsable uuid
--> ceph-volume lvm prepare successful for: /dev/sda
Running command: /bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-2
Running command: /bin/ceph-bluestore-tool --cluster=ceph prime-osd-dir --dev /dev/ceph-b9cc563f-5758-4ead-bbec-74c6aafb7099/osd-block-8331d767-af24-40da-bac0-ccbaf0fcda92 --path /var/lib/ceph/osd/ceph-2 --no-mon-config
Running command: /bin/ln -snf /dev/ceph-b9cc563f-5758-4ead-bbec-74c6aafb7099/osd-block-8331d767-af24-40da-bac0-ccbaf0fcda92 /var/lib/ceph/osd/ceph-2/block
Running command: /bin/chown -h ceph:ceph /var/lib/ceph/osd/ceph-2/block
Running command: /bin/chown -R ceph:ceph /dev/dm-0
Running command: /bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-2
Running command: /bin/systemctl enable ceph-volume@lvm-2-8331d767-af24-40da-bac0-ccbaf0fcda92
 stderr: Created symlink /etc/systemd/system/multi-user.target.wants/ceph-volume@lvm-2-8331d767-af24-40da-bac0-ccbaf0fcda92.service -> /lib/systemd/system/ceph-volume@.service.
Running command: /bin/systemctl enable --runtime ceph-osd@2
 stderr: Created symlink /run/systemd/system/ceph-osd.target.wants/ceph-osd@2.service -> /lib/systemd/system/ceph-osd@.service.
Running command: /bin/systemctl start ceph-osd@2
--> ceph-volume lvm activate successful for osd ID: 2
--> ceph-volume lvm create successful for: /dev/sda
TASK OK
@jiangcuo
Owner

Are there OSD logs in /var/log/ceph?

@wuast94
Author

wuast94 commented Mar 13, 2024

[2024-03-13 13:19:12,115][ceph_volume.main][INFO  ] Running command: ceph-volume  lvm trigger 1-18b2426f-90d1-4992-847c-a52b7ef19dc7
[2024-03-13 13:19:12,120][ceph_volume.util.system][WARNING] Executable lvs not found on the host, will return lvs as-is
[2024-03-13 13:19:12,120][ceph_volume.process][INFO  ] Running command: lvs --noheadings --readonly --separator=";" -a --units=b --nosuffix -S tags={ceph.osd_id=1,ceph.osd_fsid=18b2426f-90d1-4992-847c-a52b7ef19dc7} -o lv_tags,lv_path,lv_name,vg_name,lv_uuid,lv_size
[2024-03-13 13:19:12,151][ceph_volume.main][INFO  ] Running command: ceph-volume  lvm trigger 2-611efe97-8305-4a23-9559-33dd95bce599
[2024-03-13 13:19:12,154][ceph_volume.util.system][WARNING] Executable lvs not found on the host, will return lvs as-is
[2024-03-13 13:19:12,154][ceph_volume.process][INFO  ] Running command: lvs --noheadings --readonly --separator=";" -a --units=b --nosuffix -S tags={ceph.osd_id=2,ceph.osd_fsid=611efe97-8305-4a23-9559-33dd95bce599} -o lv_tags,lv_path,lv_name,vg_name,lv_uuid,lv_size
[2024-03-13 13:19:12,192][ceph_volume][ERROR ] exception caught by decorator
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/ceph_volume/decorators.py", line 59, in newfunc
    return f(*a, **kw)
           ^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/ceph_volume/main.py", line 153, in main
    terminal.dispatch(self.mapper, subcommand_args)
  File "/usr/lib/python3/dist-packages/ceph_volume/terminal.py", line 194, in dispatch
    instance.main()
  File "/usr/lib/python3/dist-packages/ceph_volume/devices/lvm/main.py", line 46, in main
    terminal.dispatch(self.mapper, self.argv)
  File "/usr/lib/python3/dist-packages/ceph_volume/terminal.py", line 194, in dispatch
    instance.main()
  File "/usr/lib/python3/dist-packages/ceph_volume/decorators.py", line 16, in is_root
    return func(*a, **kw)
           ^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/ceph_volume/devices/lvm/trigger.py", line 70, in main
    Activate(['--auto-detect-objectstore', osd_id, osd_uuid]).main()
  File "/usr/lib/python3/dist-packages/ceph_volume/devices/lvm/activate.py", line 281, in main
    self.activate(args)
  File "/usr/lib/python3/dist-packages/ceph_volume/decorators.py", line 16, in is_root
    return func(*a, **kw)
           ^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/ceph_volume/devices/lvm/activate.py", line 197, in activate
    raise RuntimeError('could not find osd.%s with osd_fsid %s' %
RuntimeError: could not find osd.1 with osd_fsid 18b2426f-90d1-4992-847c-a52b7ef19dc7
[2024-03-13 13:19:12,220][ceph_volume][ERROR ] exception caught by decorator
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/ceph_volume/decorators.py", line 59, in newfunc
    return f(*a, **kw)
           ^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/ceph_volume/main.py", line 153, in main
    terminal.dispatch(self.mapper, subcommand_args)
  File "/usr/lib/python3/dist-packages/ceph_volume/terminal.py", line 194, in dispatch
    instance.main()
  File "/usr/lib/python3/dist-packages/ceph_volume/devices/lvm/main.py", line 46, in main
    terminal.dispatch(self.mapper, self.argv)
  File "/usr/lib/python3/dist-packages/ceph_volume/terminal.py", line 194, in dispatch
    instance.main()
  File "/usr/lib/python3/dist-packages/ceph_volume/decorators.py", line 16, in is_root
    return func(*a, **kw)
           ^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/ceph_volume/devices/lvm/trigger.py", line 70, in main
    Activate(['--auto-detect-objectstore', osd_id, osd_uuid]).main()
  File "/usr/lib/python3/dist-packages/ceph_volume/devices/lvm/activate.py", line 281, in main
    self.activate(args)
  File "/usr/lib/python3/dist-packages/ceph_volume/decorators.py", line 16, in is_root
    return func(*a, **kw)
           ^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/ceph_volume/devices/lvm/activate.py", line 197, in activate
    raise RuntimeError('could not find osd.%s with osd_fsid %s' %
RuntimeError: could not find osd.2 with osd_fsid 611efe97-8305-4a23-9559-33dd95bce599
[2024-03-13 13:19:12,365][ceph_volume.main][INFO  ] Running command: ceph-volume  lvm trigger 2-8331d767-af24-40da-bac0-ccbaf0fcda92
[2024-03-13 13:19:12,368][ceph_volume.util.system][WARNING] Executable lvs not found on the host, will return lvs as-is
[2024-03-13 13:19:12,368][ceph_volume.process][INFO  ] Running command: lvs --noheadings --readonly --separator=";" -a --units=b --nosuffix -S tags={ceph.osd_id=2,ceph.osd_fsid=8331d767-af24-40da-bac0-ccbaf0fcda92} -o lv_tags,lv_path,lv_name,vg_name,lv_uuid,lv_size
[2024-03-13 13:19:12,428][ceph_volume.process][INFO  ] stdout ceph.block_device=/dev/ceph-b9cc563f-5758-4ead-bbec-74c6aafb7099/osd-block-8331d767-af24-40da-bac0-ccbaf0fcda92,ceph.block_uuid=OUftbF-UGG7-RZfB-tgrn-2KtY-JJe4-5RT0jM,ceph.cephx_lockbox_secret=,ceph.cluster_fsid=594dd1f3-8f66-4a84-bb9b-ab7b6437e739,ceph.cluster_name=ceph,ceph.crush_device_class=,ceph.encrypted=0,ceph.osd_fsid=8331d767-af24-40da-bac0-ccbaf0fcda92,ceph.osd_id=2,ceph.osdspec_affinity=,ceph.type=block,ceph.vdo=0";"/dev/ceph-b9cc563f-5758-4ead-bbec-74c6aafb7099/osd-block-8331d767-af24-40da-bac0-ccbaf0fcda92";"osd-block-8331d767-af24-40da-bac0-ccbaf0fcda92";"ceph-b9cc563f-5758-4ead-bbec-74c6aafb7099";"OUftbF-UGG7-RZfB-tgrn-2KtY-JJe4-5RT0jM";"2000364240896
[2024-03-13 13:19:12,428][ceph_volume.devices.lvm.activate][INFO  ] auto detecting objectstore
[2024-03-13 13:19:12,432][ceph_volume.devices.lvm.activate][DEBUG ] Found block device (osd-block-8331d767-af24-40da-bac0-ccbaf0fcda92) with encryption: False
[2024-03-13 13:19:12,432][ceph_volume.devices.lvm.activate][DEBUG ] Found block device (osd-block-8331d767-af24-40da-bac0-ccbaf0fcda92) with encryption: False
[2024-03-13 13:19:12,432][ceph_volume.process][INFO  ] Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-2
[2024-03-13 13:19:12,433][ceph_volume.process][INFO  ] Running command: /usr/bin/ceph-bluestore-tool --cluster=ceph prime-osd-dir --dev /dev/ceph-b9cc563f-5758-4ead-bbec-74c6aafb7099/osd-block-8331d767-af24-40da-bac0-ccbaf0fcda92 --path /var/lib/ceph/osd/ceph-2 --no-mon-config
[2024-03-13 13:19:12,464][ceph_volume.process][INFO  ] stderr failed to read label for /dev/ceph-b9cc563f-5758-4ead-bbec-74c6aafb7099/osd-block-8331d767-af24-40da-bac0-ccbaf0fcda92: (2) No such file or directory
2024-03-13T13:19:12.460+0100 7fb6a1a040 -1 bluestore(/dev/ceph-b9cc563f-5758-4ead-bbec-74c6aafb7099/osd-block-8331d767-af24-40da-bac0-ccbaf0fcda92) _read_bdev_label failed to open /dev/ceph-b9cc563f-5758-4ead-bbec-74c6aafb7099/osd-block-8331d767-af24-40da-bac0-ccbaf0fcda92: (2) No such file or directory
[2024-03-13 13:19:12,467][ceph_volume][ERROR ] exception caught by decorator
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/ceph_volume/decorators.py", line 59, in newfunc
    return f(*a, **kw)
           ^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/ceph_volume/main.py", line 153, in main
    terminal.dispatch(self.mapper, subcommand_args)
  File "/usr/lib/python3/dist-packages/ceph_volume/terminal.py", line 194, in dispatch
    instance.main()
  File "/usr/lib/python3/dist-packages/ceph_volume/devices/lvm/main.py", line 46, in main
    terminal.dispatch(self.mapper, self.argv)
  File "/usr/lib/python3/dist-packages/ceph_volume/terminal.py", line 194, in dispatch
    instance.main()
  File "/usr/lib/python3/dist-packages/ceph_volume/decorators.py", line 16, in is_root
    return func(*a, **kw)
           ^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/ceph_volume/devices/lvm/trigger.py", line 70, in main
    Activate(['--auto-detect-objectstore', osd_id, osd_uuid]).main()
  File "/usr/lib/python3/dist-packages/ceph_volume/devices/lvm/activate.py", line 281, in main
    self.activate(args)
  File "/usr/lib/python3/dist-packages/ceph_volume/decorators.py", line 16, in is_root
    return func(*a, **kw)
           ^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/ceph_volume/devices/lvm/activate.py", line 205, in activate
    return activate_bluestore(lvs, args.no_systemd)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/ceph_volume/devices/lvm/activate.py", line 112, in activate_bluestore
    process.run(prime_command)
  File "/usr/lib/python3/dist-packages/ceph_volume/process.py", line 147, in run
    raise RuntimeError(msg)
RuntimeError: command returned non-zero exit status: 1

@wuast94
Author

wuast94 commented Mar 13, 2024

More context:

lvs --version

  LVM version:     2.03.16(2) (2022-05-18)
  Library version: 1.02.185 (2022-05-18)
  Driver version:  4.48.0
  Configuration:   ./configure --build=aarch64-linux-gnu --prefix=/usr --includedir=${prefix}/include --mandir=${prefix}/share/man --infodir=${prefix}/share/info --sysconfdir=/etc --localstatedir=/var --disable-option-checking --disable-silent-rules --libdir=${prefix}/lib/aarch64-linux-gnu --runstatedir=/run --disable-maintainer-mode --disable-dependency-tracking --libdir=/lib/aarch64-linux-gnu --sbindir=/sbin --with-usrlibdir=/usr/lib/aarch64-linux-gnu --with-optimisation=-O2 --with-cache=internal --with-device-uid=0 --with-device-gid=6 --with-device-mode=0660 --with-default-pid-dir=/run --with-default-run-dir=/run/lvm --with-default-locking-dir=/run/lock/lvm --with-thin=internal --with-thin-check=/usr/sbin/thin_check --with-thin-dump=/usr/sbin/thin_dump --with-thin-repair=/usr/sbin/thin_repair --with-udev-prefix=/ --enable-applib --enable-blkid_wiping --enable-cmdlib --enable-dmeventd --enable-editline --enable-lvmlockd-dlm --enable-lvmlockd-sanlock --enable-lvmpolld --enable-notify-dbus --enable-pkgconfig --enable-udev_rules --enable-udev_sync --disable-readline

lsblk

NAME        MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
sda           8:0    0   1.8T  0 disk 
nvme0n1     259:0    0 238.5G  0 disk

vgchange -ay

1 logical volume(s) in volume group "ceph-b9cc563f-5758-4ead-bbec-74c6aafb7099" now active

lsblk after vgchange -ay

NAME                                                                                                  MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
sda                                                                                                     8:0    0   1.8T  0 disk 
└─ceph--b9cc563f--5758--4ead--bbec--74c6aafb7099-osd--block--8331d767--af24--40da--bac0--ccbaf0fcda92 254:0    0   1.8T  0 lvm

/var/lib/ceph/osd/ceph-2 is empty

@wuast94
Author

wuast94 commented Mar 15, 2024

Found a workaround:
After a restart, executing vgchange -ay activates the logical volumes, and then all the automation takes over.

If the restart was longer ago and the automation has already run into problems, running ceph-volume lvm activate --all afterwards brings the OSD back up again.

Adding
@reboot /usr/sbin/vgchange -ay >> /var/log/vgchange.log 2>&1
to my crontab fixes the issue for me.
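The same @reboot job could also be expressed as a systemd oneshot unit that runs before Ceph starts its OSDs. This is only a sketch; the unit name and the ordering directives are my own assumptions, not something tested in this thread:

```ini
# /etc/systemd/system/vgchange-activate.service  (hypothetical unit name)
[Unit]
Description=Activate all LVM volume groups at boot (workaround for missing autoactivation)
# Make sure the LVs exist before Ceph tries to start the OSDs
Before=ceph-osd.target
After=local-fs.target

[Service]
Type=oneshot
# Same command as the cron workaround above
ExecStart=/usr/sbin/vgchange -ay
RemainAfterExit=yes

[Install]
WantedBy=multi-user.target
```

It would be enabled once with `systemctl enable vgchange-activate.service` and then replaces the crontab entry.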

This is a workaround that fixes my specific error, but I think something is off that also impacts hot plugging etc.

I hope all my information helps to get this fixed 😊
