
ceph-volume lvm batch is not creating OSDs on partitions in latest Nautilus v14.2.13 and Octopus v15.2.8 #6849

Closed
travisn opened this issue Dec 17, 2020 · 15 comments
Labels
bug ceph main ceph tag

Comments

@travisn
Member

travisn commented Dec 17, 2020

Is this a bug report or feature request?

  • Bug Report

Deviation from expected behavior:
In the latest Nautilus v14.2.15 and Octopus v15.2.8, ceph-volume lvm batch does not allow an OSD to be created on a raw partition.

In the integration tests we see this failure in the OSD prepare job:

2020-12-17 06:43:11.875525 D | exec: Running command: stdbuf -oL ceph-volume --log-path /tmp/ceph-log lvm batch --prepare --bluestore --yes --osds-per-device 1 /dev/sda1 --report
2020-12-17 06:43:12.515146 D | exec: usage: ceph-volume lvm batch [-h] [--db-devices [DB_DEVICES [DB_DEVICES ...]]]
2020-12-17 06:43:12.515335 D | exec:                              [--wal-devices [WAL_DEVICES [WAL_DEVICES ...]]]
2020-12-17 06:43:12.515413 D | exec:                              [--journal-devices [JOURNAL_DEVICES [JOURNAL_DEVICES ...]]]
2020-12-17 06:43:12.515480 D | exec:                              [--auto] [--no-auto] [--bluestore] [--filestore]
2020-12-17 06:43:12.515619 D | exec:                              [--report] [--yes]
2020-12-17 06:43:12.515976 D | exec:                              [--format {json,json-pretty,pretty}] [--dmcrypt]
2020-12-17 06:43:12.516441 D | exec:                              [--crush-device-class CRUSH_DEVICE_CLASS]
2020-12-17 06:43:12.521079 D | exec:                              [--no-systemd]
2020-12-17 06:43:12.521237 D | exec:                              [--osds-per-device OSDS_PER_DEVICE]
2020-12-17 06:43:12.521308 D | exec:                              [--data-slots DATA_SLOTS]
2020-12-17 06:43:12.521394 D | exec:                              [--block-db-size BLOCK_DB_SIZE]
2020-12-17 06:43:12.521460 D | exec:                              [--block-db-slots BLOCK_DB_SLOTS]
2020-12-17 06:43:12.521524 D | exec:                              [--block-wal-size BLOCK_WAL_SIZE]
2020-12-17 06:43:12.521575 D | exec:                              [--block-wal-slots BLOCK_WAL_SLOTS]
2020-12-17 06:43:12.521656 D | exec:                              [--journal-size JOURNAL_SIZE]
2020-12-17 06:43:12.521738 D | exec:                              [--journal-slots JOURNAL_SLOTS] [--prepare]
2020-12-17 06:43:12.521813 D | exec:                              [--osd-ids [OSD_IDS [OSD_IDS ...]]]
2020-12-17 06:43:12.521880 D | exec:                              [DEVICES [DEVICES ...]]
2020-12-17 06:43:12.521930 D | exec: ceph-volume lvm batch: error: /dev/sda1 is a partition, please pass LVs or raw block devices

Expected behavior:
Raw partitions have been working and are expected to continue working.

How to reproduce it (minimal and precise):

Attempt to create an OSD on a partition with v15.2.8.
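
A minimal reproduction sketch, using the same command as the prepare-job log above (/dev/sda1 is just an example; any raw partition will do):

# Run inside a v14.2.13+/v15.2.8+ Ceph container (or the rook-ceph-osd-prepare pod):
ceph-volume lvm batch --prepare --bluestore --yes --osds-per-device 1 /dev/sda1 --report
# On the affected releases this fails with:
#   ceph-volume lvm batch: error: /dev/sda1 is a partition, please pass LVs or raw block devices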

@travisn travisn added bug ceph main ceph tag labels Dec 17, 2020
@travisn travisn changed the title ceph-volume lvm batch is not creating OSDs on partitions in latest Nautilus v14.2.15 and Octopus v15.2.8 ceph-volume lvm batch is not creating OSDs on partitions in latest Nautilus v14.2.13 and Octopus v15.2.8 Dec 18, 2020
@travisn
Member Author

travisn commented Dec 18, 2020

Looks related to this change in c-v: ceph/ceph#38280

@pavanfhw

Ran into this issue. What should I do for now?
Do I need to change the Ceph version in the cluster deployment? What is the latest working image?

@travisn
Member Author

travisn commented Dec 18, 2020

If you need to create OSDs on partitions, you'll need to use Ceph v14.2.12 or v15.2.7 while we are following up on the issue.
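
For Rook users, one way to do that is to point the CephCluster at one of those releases. A minimal sketch, assuming a CephCluster named rook-ceph in the rook-ceph namespace (adjust names to your deployment):

# spec.cephVersion.image is the field Rook reads for the Ceph container image.
kubectl -n rook-ceph patch cephcluster rook-ceph --type merge \
  -p '{"spec":{"cephVersion":{"image":"ceph/ceph:v15.2.7"}}}'
# Use ceph/ceph:v14.2.12 instead if you are on Nautilus.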

@leseb
Member

leseb commented Jan 5, 2021

Not sure what the right fix for this is in 1.5; #4879 is appealing but has its own limitations...

@haslersn
Contributor

I'm also affected by this problem. Is there a workaround?

@valkmit

valkmit commented Feb 14, 2021

I am also looking for a potential fix for this!

@leseb
Member

leseb commented Feb 15, 2021

@hasheddan @valkmit what about this: #6849 (comment)? Is it not feasible for you?

@haslersn
Contributor

@leseb with v15.2.7 I get a different error. Full log. Tail of the log:

2021-02-15 19:07:33.856933 D | exec: Running command: stdbuf -oL ceph-volume --log-path /tmp/ceph-log lvm batch --prepare --bluestore --yes --dmcrypt --osds-per-device 1 /dev/sdd /dev/sda /dev/sdb /dev/sdc --db-devices /dev/disk/by-path/pci-0000:11:00.0-nvme-1-part3 --report
2021-02-15 19:07:35.856836 D | exec: Traceback (most recent call last):
2021-02-15 19:07:35.856884 D | exec:   File "/usr/sbin/ceph-volume", line 11, in <module>
2021-02-15 19:07:35.856889 D | exec:     load_entry_point('ceph-volume==1.0.0', 'console_scripts', 'ceph-volume')()
2021-02-15 19:07:35.856894 D | exec:   File "/usr/lib/python3.6/site-packages/ceph_volume/main.py", line 40, in __init__
2021-02-15 19:07:35.856898 D | exec:     self.main(self.argv)
2021-02-15 19:07:35.856903 D | exec:   File "/usr/lib/python3.6/site-packages/ceph_volume/decorators.py", line 59, in newfunc
2021-02-15 19:07:35.856907 D | exec:     return f(*a, **kw)
2021-02-15 19:07:35.856911 D | exec:   File "/usr/lib/python3.6/site-packages/ceph_volume/main.py", line 151, in main
2021-02-15 19:07:35.856915 D | exec:     terminal.dispatch(self.mapper, subcommand_args)
2021-02-15 19:07:35.856919 D | exec:   File "/usr/lib/python3.6/site-packages/ceph_volume/terminal.py", line 194, in dispatch
2021-02-15 19:07:35.856923 D | exec:     instance.main()
2021-02-15 19:07:35.856927 D | exec:   File "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/main.py", line 42, in main
2021-02-15 19:07:35.856931 D | exec:     terminal.dispatch(self.mapper, self.argv)
2021-02-15 19:07:35.856934 D | exec:   File "/usr/lib/python3.6/site-packages/ceph_volume/terminal.py", line 194, in dispatch
2021-02-15 19:07:35.856938 D | exec:     instance.main()
2021-02-15 19:07:35.856941 D | exec:   File "/usr/lib/python3.6/site-packages/ceph_volume/decorators.py", line 16, in is_root
2021-02-15 19:07:35.856945 D | exec:     return func(*a, **kw)
2021-02-15 19:07:35.856949 D | exec:   File "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/batch.py", line 322, in main
2021-02-15 19:07:35.856952 D | exec:     self._get_explicit_strategy()
2021-02-15 19:07:35.856956 D | exec:   File "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/batch.py", line 332, in _get_explicit_strategy
2021-02-15 19:07:35.856960 D | exec:     self._filter_devices()
2021-02-15 19:07:35.856963 D | exec:   File "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/batch.py", line 385, in _filter_devices
2021-02-15 19:07:35.856967 D | exec:     raise RuntimeError(err.format(len(devs) - len(usable)))
2021-02-15 19:07:35.856973 D | exec: RuntimeError: 1 devices were filtered in non-interactive mode, bailing out
failed to configure devices: failed to initialize devices: failed ceph-volume report: exit status 1

@leseb
Member

leseb commented Feb 16, 2021

@leseb with v15.2.7 I get a different error. Full log. Tail of the log:

[quoted traceback omitted; identical to the log above]

I don't know what's going on. Could you please open a bug at https://tracker.ceph.com/ under the ceph-volume component?
Thanks

@haslersn
Contributor

@leseb Thanks, I will do so after my registration is approved.

Could it be that it's not allowed to have metadata pools on partitions? See https://tracker.ceph.com/issues/47966#note-3

@leseb
Member

leseb commented Feb 17, 2021

@leseb Thanks, I will do so after my registration is approved.

Could it be that it's not allowed to have metadata pools on partitions? See https://tracker.ceph.com/issues/47966#note-3

Could be, but I'm not sure; c-v's partition support is still confusing.

@DjangoCalendar

@travisn does this issue affect upgrades as well?

Let's say I am using:

  • ceph:v15.2.7
  • rook/ceph:v1.5.3

with this setup I am running on raw partitions.

Will I be able to upgrade to, for example:

  • ceph:v15.2.8
  • rook/ceph:v1.5.7

Asking in a shorter way:
Does this issue affect only fresh deployments, or upgrades as well?

@cbartz

cbartz commented Feb 18, 2021

@DjangoCalendar You can upgrade and keep using your existing OSDs, but if you want to set up a fresh OSD it won't work on partitions.

@mathieu-rossignol

mathieu-rossignol commented Mar 2, 2021

Hi,

I have rook-ceph v1.5.8 and I'm hitting this issue if I use any image other than ceph/ceph:v15.2.7 in my CephCluster definition.
At least with these versions of Ceph I'm facing the issue:

  • v15
  • v15.2.8
  • v15.2.9

It seems there is a regression after v15.2.7 :(. But isn't this in fact a pure Ceph issue? To summarize for others coming here: use ceph/ceph:v15.2.7 for the CephCluster image:

[screenshot: CephCluster spec with image set to ceph/ceph:v15.2.7]

@leseb leseb pinned this issue Mar 3, 2021
@leseb
Member

leseb commented Mar 3, 2021

Let me summarize (hopefully) once and for all.

Problem: ceph-volume lvm batch no longer creates OSDs on partitions as of Nautilus v14.2.13 and Octopus v15.2.8. The error can be seen in the OSD prepare job logs and resembles:

2021-02-25 09:01:42.471002 D | exec: ceph-volume lvm batch: error: /dev/sda1 is a partition, please pass LVs or raw block devices

Solutions:

  1. Use full devices (e.g. /dev/sda) or Logical Volumes on top of your partitions. Rook could happily propose LVs to ceph-volume, but passing dm- devices does not seem to work on v15.2.9, see: https://tracker.ceph.com/issues/49582. A workaround sketch follows this list.
  2. For the time being, stick with v14.2.12 or v15.2.7, and once Rook 1.6 is released, upgrade to at least v14.2.14 or v15.2.9 to get partitions working again. As of 1.6, Rook's OSD implementation for simple scenarios (basically one OSD = one disk) does not use LVM but ceph-volume's RAW mode. See ceph: add raw mode for non-pvc osd #4879 for more details.
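
A rough sketch of option 1, with made-up names (ceph-sda1, osd-data; adapt them to your setup): wrap the partition in an LVM logical volume so that ceph-volume accepts it.

pvcreate /dev/sda1                          # turn the partition into an LVM physical volume
vgcreate ceph-sda1 /dev/sda1                # hypothetical VG name
lvcreate -l 100%FREE -n osd-data ceph-sda1  # one LV spanning the whole partition
# Then hand the LV to ceph-volume (or list it in Rook's storage config), e.g.:
# ceph-volume lvm batch --prepare --bluestore --yes ceph-sda1/osd-data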

I'm converting this to a Discussion so people can continue the conversation while keeping the latest solution highlighted.

@leseb leseb closed this as completed Mar 3, 2021
@rook rook locked and limited conversation to collaborators Mar 3, 2021
@rook rook unlocked this conversation Sep 15, 2021
@leseb leseb unpinned this issue Jan 7, 2022

This issue was moved to a discussion.

You can continue the conversation there.
