osd: support creating OSDs with a metadata partition
Currently, when Rook provisions OSDs (in the OSD prepare job), it effectively runs a `ceph-volume` (c-v) command such as the following:
```console
ceph-volume lvm batch --prepare <deviceA> <deviceB> <deviceC> --db-devices <metadataDevice>
```
However, `ceph-volume lvm batch` only supports whole disks and LVM logical volumes, not disk partitions.

We can resort to `ceph-volume lvm prepare` to support a partition as the metadata device.
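
A minimal sketch of the resulting invocation when the metadata device is a partition (device paths are placeholders; flags such as the DB size and CRUSH device class are appended only when configured):

```console
ceph-volume lvm prepare --bluestore --data <deviceA> --block.db <metadataPartition>
```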

Signed-off-by: Liang Zheng <zhengliang0901@gmail.com>
microyahoo committed Jan 10, 2024
1 parent 1075674 commit efe5e71
Showing 5 changed files with 391 additions and 43 deletions.
46 changes: 46 additions & 0 deletions .github/workflows/canary-integration-test.yml
@@ -377,6 +377,52 @@ jobs:
with:
name: canary

osd-with-metadata-partition-device:
runs-on: ubuntu-20.04
if: "!contains(github.event.pull_request.labels.*.name, 'skip-ci')"
steps:
- name: checkout
uses: actions/checkout@v4
with:
fetch-depth: 0

- name: consider debugging
uses: ./.github/workflows/tmate_debug
with:
use-tmate: ${{ secrets.USE_TMATE }}

- name: setup cluster resources
uses: ./.github/workflows/canary-test-config

- name: validate-yaml
run: tests/scripts/github-action-helper.sh validate_yaml

- name: use local disk as OSD metadata partition
run: |
BLOCK=$(sudo lsblk --paths|awk '/14G/ || /64G/ {print $1}'| head -1)
tests/scripts/github-action-helper.sh use_local_disk
tests/scripts/create-bluestore-partitions.sh --disk "$BLOCK" --osd-count 2
- name: deploy cluster
run: |
export ALLOW_LOOP_DEVICES=true
tests/scripts/github-action-helper.sh deploy_cluster loop_osd_with_metadata_partition_device
- name: wait for prepare pod
run: tests/scripts/github-action-helper.sh wait_for_prepare_pod 1

- name: wait for ceph to be ready
run: tests/scripts/github-action-helper.sh wait_for_ceph_to_be_ready osd 1

- name: check-ownerreferences
run: tests/scripts/github-action-helper.sh check_ownerreferences

- name: collect common logs
if: always()
uses: ./.github/workflows/collect-logs
with:
name: canary

osd-with-metadata-device:
runs-on: ubuntu-20.04
if: "!contains(github.event.pull_request.labels.*.name, 'skip-ci')"
2 changes: 1 addition & 1 deletion Documentation/CRDs/Cluster/ceph-cluster-crd.md
@@ -478,7 +478,7 @@ See the table in [OSD Configuration Settings](#osd-configuration-settings) to kn

The following storage selection settings are specific to Ceph and do not apply to other backends. All variables are key-value pairs represented as strings.

* `metadataDevice`: Name of a device or lvm to use for the metadata of OSDs on each node. Performance can be improved by using a low latency device (such as SSD or NVMe) as the metadata device, while other spinning platter (HDD) devices on a node are used to store data. Provisioning will fail if the user specifies a `metadataDevice` but that device is not used as a metadata device by Ceph. Notably, `ceph-volume` will not use a device of the same device class (HDD, SSD, NVMe) as OSD devices for metadata, resulting in this failure.
* `metadataDevice`: Name of a device, partition or lvm to use for the metadata of OSDs on each node. Performance can be improved by using a low latency device (such as SSD or NVMe) as the metadata device, while other spinning platter (HDD) devices on a node are used to store data. Provisioning will fail if the user specifies a `metadataDevice` but that device is not used as a metadata device by Ceph. Notably, `ceph-volume` will not use a device of the same device class (HDD, SSD, NVMe) as OSD devices for metadata, resulting in this failure.
* `databaseSizeMB`: The size in MB of a bluestore database. Include quotes around the size.
* `walSizeMB`: The size in MB of a bluestore write ahead log (WAL). Include quotes around the size.
* `deviceClass`: The [CRUSH device class](https://ceph.io/community/new-luminous-crush-device-classes/) to use for this selection of storage devices. (By default, if a device's class has not already been set, OSDs will automatically set a device's class to either `hdd`, `ssd`, or `nvme` based on the hardware properties exposed by the Linux kernel.) These storage classes can then be used to select the devices backing a storage pool by specifying them as the value of [the pool spec's `deviceClass` field](../Block-Storage/ceph-block-pool-crd.md#spec).
122 changes: 83 additions & 39 deletions pkg/daemon/ceph/osd/volume.go
@@ -47,6 +47,10 @@ const (
dbDeviceFlag = "--db-devices"
cephVolumeCmd = "ceph-volume"
cephVolumeMinDBSize = 1024 // 1GB

blockDBFlag = "--block.db"
blockDBSizeFlag = "--block.db-size"
dataFlag = "--data"
)

// These are not constants because they are used by the tests
@@ -665,6 +669,9 @@ func (a *OsdAgent) initializeDevicesLVMMode(context *clusterd.Context, devices *
}
metadataDevices[md]["devices"] = deviceArg
}
if metadataDevice.Type == sys.PartType {
metadataDevices[md]["part"] = "true" // ceph-volume lvm batch only supports disk and lvm
}
deviceDBSizeMB := getDatabaseSize(a.storeConfig.DatabaseSizeMB, device.Config.DatabaseSizeMB)
if a.storeConfig.IsValidStoreType() && deviceDBSizeMB > 0 {
if deviceDBSizeMB < cephVolumeMinDBSize {
@@ -721,71 +728,108 @@ func (a *OsdAgent) initializeDevicesLVMMode(context *clusterd.Context, devices *

for md, conf := range metadataDevices {

var hasPart bool
mdArgs := batchArgs
if _, ok := conf["osdsperdevice"]; ok {
mdArgs = append(mdArgs, []string{
osdsPerDeviceFlag,
conf["osdsperdevice"],
}...)
if part, ok := conf["part"]; ok && part == "true" {
hasPart = true
}
if hasPart {
// ceph-volume lvm prepare --data {vg/lv} --block.wal {partition} --block.db {/path/to/device}
baseArgs := []string{"-oL", cephVolumeCmd, "--log-path", logPath, "lvm", "prepare", storeFlag}
if a.storeConfig.EncryptedDevice {
baseArgs = append(baseArgs, encryptedFlag)
}
mdArgs = baseArgs
} else {
if _, ok := conf["osdsperdevice"]; ok {
mdArgs = append(mdArgs, []string{
osdsPerDeviceFlag,
conf["osdsperdevice"],
}...)
}
}
if _, ok := conf["deviceclass"]; ok {
mdArgs = append(mdArgs, []string{
crushDeviceClassFlag,
conf["deviceclass"],
}...)
}
if _, ok := conf["databasesizemb"]; ok {
if hasPart {
devices := strings.Split(conf["devices"], " ")
if len(devices) > 1 {
logger.Warningf("device partition %s can only be used by one device", md)
}
mdArgs = append(mdArgs, []string{
databaseSizeFlag,
conf["databasesizemb"],
dataFlag,
devices[0],
}...)
if _, ok := conf["databasesizemb"]; ok {
mdArgs = append(mdArgs, []string{
blockDBSizeFlag,
conf["databasesizemb"],
}...)
}
} else {
if _, ok := conf["databasesizemb"]; ok {
mdArgs = append(mdArgs, []string{
databaseSizeFlag,
conf["databasesizemb"],
}...)
}
mdArgs = append(mdArgs, strings.Split(conf["devices"], " ")...)
}
mdArgs = append(mdArgs, strings.Split(conf["devices"], " ")...)

// Do not change device names if udev persistent names are passed
mdPath := md
if !strings.HasPrefix(mdPath, "/dev") {
mdPath = path.Join("/dev", md)
}

mdArgs = append(mdArgs, []string{
dbDeviceFlag,
mdPath,
}...)
if hasPart {
mdArgs = append(mdArgs, []string{
blockDBFlag,
mdPath,
}...)
} else {
mdArgs = append(mdArgs, []string{
dbDeviceFlag,
mdPath,
}...)

// Reporting
reportArgs := append(mdArgs, []string{
"--report",
}...)
// Reporting
reportArgs := append(mdArgs, []string{
"--report",
}...)

if err := context.Executor.ExecuteCommand(baseCommand, reportArgs...); err != nil {
return errors.Wrap(err, "failed ceph-volume report") // fail return here as validation provided by ceph-volume
}
if err := context.Executor.ExecuteCommand(baseCommand, reportArgs...); err != nil {
return errors.Wrap(err, "failed ceph-volume report") // fail return here as validation provided by ceph-volume
}

reportArgs = append(reportArgs, []string{
"--format",
"json",
}...)
reportArgs = append(reportArgs, []string{
"--format",
"json",
}...)

cvOut, err := context.Executor.ExecuteCommandWithOutput(baseCommand, reportArgs...)
if err != nil {
return errors.Wrapf(err, "failed ceph-volume json report: %s", cvOut) // fail return here as validation provided by ceph-volume
}
cvOut, err := context.Executor.ExecuteCommandWithOutput(baseCommand, reportArgs...)
if err != nil {
return errors.Wrapf(err, "failed ceph-volume json report: %s", cvOut) // fail return here as validation provided by ceph-volume
}

logger.Debugf("ceph-volume reports: %+v", cvOut)
logger.Debugf("ceph-volume reports: %+v", cvOut)

var cvReports []cephVolReportV2
if err = json.Unmarshal([]byte(cvOut), &cvReports); err != nil {
return errors.Wrap(err, "failed to unmarshal ceph-volume report json")
}
var cvReports []cephVolReportV2
if err = json.Unmarshal([]byte(cvOut), &cvReports); err != nil {
return errors.Wrap(err, "failed to unmarshal ceph-volume report json")
}

if len(strings.Split(conf["devices"], " ")) != len(cvReports) {
return errors.Errorf("failed to create enough required devices, required: %s, actual: %v", cvOut, cvReports)
}
if len(strings.Split(conf["devices"], " ")) != len(cvReports) {
return errors.Errorf("failed to create enough required devices, required: %s, actual: %v", cvOut, cvReports)
}

for _, report := range cvReports {
if report.BlockDB != mdPath && !strings.HasSuffix(mdPath, report.BlockDB) {
return errors.Errorf("wrong db device for %s, required: %s, actual: %s", report.Data, mdPath, report.BlockDB)
for _, report := range cvReports {
if report.BlockDB != mdPath && !strings.HasSuffix(mdPath, report.BlockDB) {
return errors.Errorf("wrong db device for %s, required: %s, actual: %s", report.Data, mdPath, report.BlockDB)
}
}
}

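For reference, a rough sketch of the two command shapes this function assembles, assuming bluestore and placeholder device paths (per-device, sizing, encryption, and logging flags are appended only when configured); note that a metadata partition can back only a single data device, as the warning in the code indicates:

```console
# metadataDevice is a whole disk or LV: batch mode with --db-devices
ceph-volume lvm batch --prepare /dev/sdb /dev/sdc --db-devices /dev/nvme0n1

# metadataDevice is a partition: prepare mode with --block.db (one data device per call)
ceph-volume lvm prepare --bluestore --data /dev/sdb --block.db /dev/nvme0n1p1
```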
