Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ceph: support device selection for OSDs by device path name patterns #4285

Merged
merged 2 commits into from Dec 5, 2019

Conversation

@morimoto-cybozu
Copy link
Contributor

morimoto-cybozu commented Nov 8, 2019

Description of your changes:

Add devicePathFilter to Storage Selection Settings which enables device selection
with a regular expression for device path names.
For example, devicePathFilter: "^/dev/disk/by-path/pci.*-sas-.*" selects all SAS
devices.

This PR contains a small fix for using ceph-volume.
When using devicePathFilter, I found some device caused an error in ceph-volume.
This is because ceph-volume cannot handle a device correctly if the device has an alias
under /dev/mapper/.
This PR fixes it by checking whether the device has that kind of alias
and using the alias as an argument for ceph-volume.

Which issue is resolved by this Pull Request:
Resolves #4275

Checklist:

  • Reviewed the developer guide on Submitting a Pull Request
  • Documentation has been updated, if necessary.
  • Unit tests have been added, if necessary.
  • Integration tests have been added, if necessary.
  • Pending release notes updated with breaking and/or notable changes, if necessary.
  • Upgrade from previous release is tested and upgrade user guide is updated, if necessary.
  • Code generation (make codegen) has been run to update object specifications, if necessary.
  • Comments have been added or updated based on the standards set in CONTRIBUTING.md
  • Add the flag for skipping the CI if this PR does not require a build. See here for more details.
for _, devlink := range strings.Fields(device.DevLinks) {
matched, err = regexp.Match(desiredDevice.Name, []byte(devlink))
if err != nil {
logger.Errorf("regex failed on device %s and filter %s. %+v", device.Name, desiredDevice.Name, err)

This comment has been minimized.

Copy link
@leseb

leseb Nov 8, 2019

Member
Suggested change
logger.Errorf("regex failed on device %s and filter %s. %+v", device.Name, desiredDevice.Name, err)
logger.Errorf("regex failed on device %q and filter %q. %+v", device.Name, desiredDevice.Name, err)
logger.Errorf("regex failed on device %s and filter %s. %+v", device.Name, desiredDevice.Name, err)
continue
}

This comment has been minimized.

Copy link
@leseb

leseb Nov 8, 2019

Member

unnecessary space

break
}
}
logger.Infof("device %s matches device filter %s: %t", device.Name, desiredDevice.Name, matched)

This comment has been minimized.

Copy link
@leseb

leseb Nov 8, 2019

Member
Suggested change
logger.Infof("device %s matches device filter %s: %t", device.Name, desiredDevice.Name, matched)
logger.Infof("device %q matches device filter %q", device.Name, desiredDevice.Name)

I don't think we need to show matched, or else it should be a degug log.

@@ -237,6 +237,12 @@ func (a *OsdAgent) initializeDevices(context *clusterd.Context, devices *DeviceO
if device.Data == -1 {
logger.Infof("configuring new device %s", name)
deviceArg := path.Join("/dev", name)
// ceph-volume prefers to use /dev/mapper/<name> if the device has

This comment has been minimized.

Copy link
@leseb

leseb Nov 8, 2019

Member

if the device has? has what?

@leseb leseb added this to In progress in v1.2 via automation Nov 8, 2019
@leseb leseb added this to the 1.2 milestone Nov 8, 2019
@morimoto-cybozu morimoto-cybozu force-pushed the cybozu-go:use-mapper-name branch from 75df377 to 48bbd4d Nov 11, 2019
@morimoto-cybozu

This comment has been minimized.

Copy link
Contributor Author

morimoto-cybozu commented Nov 11, 2019

@leseb
Thank you, I reflected your comments.

Copy link
Member

travisn left a comment

Could you confirm the steps to try this out in a simple cluster such as minikube? What environments were you able to test it in?

@@ -211,12 +211,14 @@ This feature is only available when `useAllNodes` has been set to `false`.
Below are the settings available, both at the cluster and individual node level, for selecting which storage resources will be included in the cluster.

* `useAllDevices`: `true` or `false`, indicating whether all devices found on nodes in the cluster should be automatically consumed by OSDs. **Not recommended** unless you have a very controlled environment where you will not risk formatting of devices with existing data. When `true`, all devices will be used except those with partitions created or a local filesystem. Is overridden by `deviceFilter` if specified.
* `deviceFilter`: A regular expression that allows selection of devices to be consumed by OSDs. If individual devices have been specified for a node then this filter will be ignored. This field uses [golang regular expression syntax](https://golang.org/pkg/regexp/syntax/). For example:
* `deviceFilter`: A regular expression for short kernel names (e.g. `/dev/sda`) that allows selection of devices to be consumed by OSDs. If individual devices have been specified for a node then this filter will be ignored. This field uses [golang regular expression syntax](https://golang.org/pkg/regexp/syntax/). For example:

This comment has been minimized.

Copy link
@travisn

travisn Nov 11, 2019

Member

The /dev prefix actually isn't used for the device filter. If it is mentioned in the example, someone will probably think they need to specify it. But the filter won't work if you add the /dev prefix since the operator always prepends it as well

}
}

if len(dataDevices) == 0 {

This comment has been minimized.

Copy link
@travisn

travisn Nov 11, 2019

Member

shouldn't this just be in an else block? it seems better for code readability

} else if device.Name == desiredDevice.Name {
logger.Infof("%s found in the desired devices", device.Name)
logger.Debugf("%q found in the desired devices", device.Name)

This comment has been minimized.

Copy link
@travisn

travisn Nov 11, 2019

Member

how about keep this as Infof?

This comment has been minimized.

Copy link
@travisn

travisn Dec 5, 2019

Member

In the osd prepare pod we should log almost everything as Info. It's only a temporary job so the logs aren't going to be very long anyway. It's very helpful for troubleshooting if we include all these details.

continue
}
if matched {
break

This comment has been minimized.

Copy link
@travisn

travisn Nov 11, 2019

Member

I'd suggest an info log here to indicate the device matched the filter

This comment has been minimized.

Copy link
@morimoto-cybozu

morimoto-cybozu Nov 12, 2019

Author Contributor

As commented below, I will log the final result of filtering in the following part of the code.
It would be useful to log the alias and the filter for debugging here, but the current logging
style of "outputting log even if not matched" is somewhat noisy for the inner loop.
So I will record device aliases in one line after this loop.

continue
}
logger.Infof("device %s matches device filter %s: %t", device.Name, desiredDevice.Name, matched)
logger.Debugf("device %q matches device filter %q: %t", device.Name, desiredDevice.Name, matched)

This comment has been minimized.

Copy link
@travisn

travisn Nov 11, 2019

Member

keep this as Infof?

This comment has been minimized.

Copy link
@morimoto-cybozu

morimoto-cybozu Nov 12, 2019

Author Contributor

This log is recorded even if the filter is not matched, per device and per filter.
So we will see many log lines like device sda matches device filter ^sd[bc]: false.
These logs give the information of "which filter caused the device to be selected,"
but I think this type of information is for debugging.

It would be informative to log the final result of filtering, so I will add logging of
"device ... is selected" at the INFO level after processing all filters.

This comment has been minimized.

Copy link
@BlaineEXE

BlaineEXE Nov 13, 2019

Member

I would tend to Info logging here. Contextually, this log appears during OSD provisioning and not in the main operator, so I like to consider what information is likely to be helpful in 80% of cases where users are looking at this log. I imagine that many users might want to validate their selection of device filter in the logs, and debug logging is not on by default. I also don't think this will add overly much logging as Info that it will be a detriment to the user.

Maybe it would be good to output both "matched" and "non-matched" lists after this loop.

This comment has been minimized.

Copy link
@morimoto-cybozu

morimoto-cybozu Nov 15, 2019

Author Contributor

Aha, surely, it's important to provide helpful logs to most users.
I got extra help from these types of logs indeed when I was less experienced in Rook.

I will rearrange the logs as follows.

  • keep logging of matching steps as Infof
  • enrich the log of device %q is selected, which I add in this PR, with the information of the related filter
  • output the summary of matched and non-matched devices after the loop
@@ -237,6 +237,12 @@ func (a *OsdAgent) initializeDevices(context *clusterd.Context, devices *DeviceO
if device.Data == -1 {
logger.Infof("configuring new device %s", name)
deviceArg := path.Join("/dev", name)
// ceph-volume prefers to use /dev/mapper/<name> if the device has this kind of alias
for _, devlink := range device.PersistentDevicePaths {
if strings.HasPrefix(devlink, "/dev/mapper") {

This comment has been minimized.

Copy link
@travisn

travisn Nov 11, 2019

Member

We require the /dev/mapper prefix wherever the devlink is applied? Otherwise it will be ignored? If so, why are the doc examples using other paths rather than /dev/mapper?

This comment has been minimized.

Copy link
@morimoto-cybozu

morimoto-cybozu Nov 12, 2019

Author Contributor

Using the /dev/mapper-prefixed argument is necessary in certain cases,
regardless of whether or not to use devLinksFilter: "/dev/mapper/...".

ceph-volume accepts only a device name under /dev/mapper if the device has
that type of alias.
(See disk.py.)

On the other hand, the current version of Rook calls ceph-volume lvm batch with /dev/<KNAME>.
This causes an error with the device mapper.
For example, if dm-0 has an alias like /dev/mapper/crypt-sda, Rook invokes ceph-volume
with an argument of /dev/dm-0, but ceph-volume returns an error.

This will happen even when using a device filter like deviceFilter: "dm-.*".

So, in this PR, I have arranged the codes to pass /dev/mapper name to ceph-volume if necessary.

// ceph-volume prefers to use /dev/mapper/<name> if the device has this kind of alias
for _, devlink := range device.PersistentDevicePaths {
if strings.HasPrefix(devlink, "/dev/mapper") {
deviceArg = devlink

This comment has been minimized.

Copy link
@travisn

travisn Nov 11, 2019

Member

Could there be multiple devlinks that match? Should we be appending to a list instead of overwriting the var?

This comment has been minimized.

Copy link
@morimoto-cybozu

morimoto-cybozu Nov 12, 2019

Author Contributor

In my Ubuntu 18.04, only one link will be created with the name of the contents of /sys/block/<KNAME>/dm/name.
This rule is stored in /lib/udev/rules.d/55-dm.rules and it is provided by Red Hat.
I think this is a general rule file.

v1.2 automation moved this from In progress to Review in progress Nov 11, 2019
@satoru-takeuchi satoru-takeuchi mentioned this pull request Nov 12, 2019
4 of 4 tasks complete
@morimoto-cybozu

This comment has been minimized.

Copy link
Contributor Author

morimoto-cybozu commented Nov 12, 2019

@travisn
Thank you for your comments.

I construct testing k8s environment on KVMs by using my team's OSS tools (placemat and neco).
Each KVM runs CoreOS and provides /dev/vd[abcd].
The storages are encrypted as /dev/mapper/crypt-vd[abcd].
I specified devLinksFilter: "^/dev/disk/by-id/dm-name-crypt-vd[cd]$" for testing.

I will try to construct testing environment with minikube or kind, and share the testing steps here.

@morimoto-cybozu

This comment has been minimized.

Copy link
Contributor Author

morimoto-cybozu commented Nov 12, 2019

Testing steps are as follows.

  1. Build container images and install k8s with kubeadm, as described in tests/README.md
  2. Setup a loop device by sudo truncate -s 10G /tmp/rook-test-back; sudo losetup --show -f /tmp/rook-test-back
  3. If /dev/loop0 is assigned, setup a crypted device by sudo cryptsetup open --type plain /dev/loop0 rook-test-loop0
  4. Apply common.yaml, operator.yaml, and cluster.yaml
  • common.yaml
    definition of "devLinksFilter" is added to "cluster/examples/kubernetes/ceph/common.yaml"
  • operator.yaml
    image URL is updated from "cluster/examples/kubernetes/ceph/operator.yaml"
  • cluster.yaml
    devLinksFilter: "^/dev/disk/by-id/dm-name-rook-test-.*$" is specified
  1. Confirm logs and OSDs
  2. Clean-up: delete k8s objects and execute sudo cryptsetup close rook-test-loop0; sudo losetup -d /dev/loop0; sudo rm /tmp/rook-test-back; if cryptsetup fails, do lsblk and sudo dmsetup remove /dev/<VG>/<LV>, where <VG> and <LV> are configured by Ceph
@morimoto-cybozu morimoto-cybozu force-pushed the cybozu-go:use-mapper-name branch from 48bbd4d to 14bd36f Nov 12, 2019
@morimoto-cybozu

This comment has been minimized.

Copy link
Contributor Author

morimoto-cybozu commented Nov 12, 2019

@travisn
Updated. Please take a look.

@morimoto-cybozu morimoto-cybozu force-pushed the cybozu-go:use-mapper-name branch from 14bd36f to c734d91 Nov 13, 2019
@@ -135,6 +135,7 @@ spec:
useAllDevices:
type: boolean
deviceFilter: {}
devLinksFilter: {}

This comment has been minimized.

Copy link
@BlaineEXE

BlaineEXE Nov 13, 2019

Member

Would it be safer if this were specified as string type?

@@ -52,7 +53,8 @@ storage:
directories:
- path: "/rook/dir1"
- name: "node2"
deviceFilter: "^foo*"`)
deviceFilter: "^foo*"
devLinksFilter: "^/dev/disk/by-id/.*foo.*"`)

This comment has been minimized.

Copy link
@BlaineEXE

BlaineEXE Nov 13, 2019

Member

Is there a technical reason why the links filter needs to be separate from device filter? From a user perspective, I think I would prefer to use deviceFilter for any of sd[b-z], /dev/sd[b-z], or /dev/disk/by-id/.* strings.

This comment has been minimized.

Copy link
@morimoto-cybozu

morimoto-cybozu Nov 15, 2019

Author Contributor

I agree that extending deviceFilter is preferable to introducing a new filter.
I added a new filter to avoid incompatibility.
If deviceFilter is extended to be compared with long device names while a user has specified deviceFilter: sd[b-z] without caring about long names, the extended filter can mistakenly match long names containing sde substring, e.g. /dev/disk/by-id/dm-name-misdefined-lv.

#2845 applies the strategy of "if a path specification begins with /dev, compare it to the full paths of devices."
I once examined this strategy and thought that it was not applicable to device filters.
A device filter is given as a regular expression, so it may not begin with /dev nor ^/dev even when it is expected to match a full path.

Now I've got an idea of "if a device filter contains /, compare it to full paths."

Do you have any good solutions?

This comment has been minimized.

Copy link
@BlaineEXE

BlaineEXE Nov 15, 2019

Member

Those are good and well-defined thoughts @morimoto-cybozu. Thank you. My main goal is to choose whatever will be less confusing for users (while still considering development and maintenance effort).

With deviceFilter we support filtering disks where we assume the /dev path is present in front. e.g., sd[b-z] matches /dev/sd[b-z]. This is pretty basic and not really sufficient for the real world.

What we want from our filtering is to be able to additionally support matching disks in /dev/disk/.... Do we also want to support matching by /dev/mapper?

Is it safe to make the assumption that disks will always be in /dev?


Now I've got an idea of "if a device filter contains /, compare it to full paths.

I wonder what it looks like to use this from a user perspective. I think there are a few cases.
Cases:

  1. Old/existing way (matching 'basic' disks)
    • sd[b-z]should match /dev/sdb through /dev/sdz
    • sg[a-z][a-z]? should match /dev/sga through /dev/sgzz
  2. New way, but users want to match basic disks
    • /dev/sd[b-z] (match same as above)
    • /dev/sg[a-z][a-z]? (match same as above)
  3. New way, with non-basic disks
    • /dev/disk/by-id/.*foo.* should match all disks that have id containing 'foo'
  4. Potentially aberrant case of 3 assuming /dev is prepended; a user might assume that /dev is optional for disk matching based on the behavior defined in 1 above
    • disk/by-id/.*foo.* should match all disks that have id containing 'foo' still
  5. New way, matching mapper disks
    • /dev/mapper/.*foo.* should match all mapper disk with 'foo'
  6. Potentially aberrant case of 5 assuming /dev is prepended
    • mapper/.*foo.* should match all mapper disks with 'foo'
  7. Try to break the logic
    • disk should not match anything
    • mapper should not match anything
    • /dev/disk should not match anything????
    • /dev/mapper should not match anything????

I also want to consider what exists currently as actual behavior, not ideal behavior. Currently, by looking in the code, devices are listed with lsblk -all, and then they are selected for use from among this list by a pretty simple regexp.Match. On my local system, this reports disks only as sd[a-z]. I'm not sure if this would report sg disks at all even, but I can imagine hoping to match disks by /dev/sg somehow. So the current filter really is just a matcher do disks from lsblk's output.

This also brings to my mind, for matching disk by path (I feel like we are sort of considering matching disks by path here?), we need to also make sure they are disks and not partitions or other unsupported types. I'm not sure how to validate that a given is a disk off the top of my head.

This becomes easier if we know we are only looking at disks in /dev/disk/.... If we have to support any path under /dev/..., things start getting harder I think.


Sorry my thoughts are kind of scattered. I'm hoping to tease out both ideal behavior as well as ideal user interface here. Personally, I think using / as a marker for path matching might work well, but there could be breaking examples I didn't think of. We do need to still make sure we are matching the paths matched back to disks which exist and are actual disk types (e.g., not part, crypt, etc.) so that we aren't matching things Rook can't support. We can probably cross-reference the lsblk output for that. I may be repeating a lot of information you've already gotten down.

This comment has been minimized.

Copy link
@morimoto-cybozu

morimoto-cybozu Nov 27, 2019

Author Contributor

Sorry for the late reply.

Your several cases are thought-provoking for me. Though I proposed the idea of using "/" to switch the path matching mode, it seems confusing to users. The current deviceFilter implies a prefix of "/dev" as you mentioned. The combination of the implicit "/dev" and the matching mode switch may cause users to write aberrant and unreadable filter patterns, as you pointed in examples.

So I re-propose to add a new filter instead of extending deviceFilter. This keeps deviceFilter simple and gives clear definition to the new filter.

By the way, "/dev/sd[b-z]" in your case 2 does not match "/dev/sdb" in the current devLinksFilter implementation. This is because I coded to compare the filter only with DEVLINKS paths in the udevadm output. I introduced devLinksFilter to use predictable and topology-aware device names, so "/dev/sdb" was out of the scope. This is simple in coding but complicated for users with the benefit of hindsight. I will improve the matching to cover the case 2, probably with changing the filter name to more appropriate one.

Currently, by looking in the code, devices are listed with lsblk -all
(snip)
I'm not sure if this would report sg disks at all even, but I can imagine hoping to match disks by /dev/sg somehow.
(snip)
If we have to support any path under /dev/..., things start getting harder I think.

I think that the use of lsblk is reasonable to get the list of devices for Rook/Ceph. lsblk lists all block devices on the system, and Rook/Ceph uses block devices for OSDs. Using KNAMEs and DEVLINKS names returned by udevadm would be sufficient for Rook/Ceph.

/dev/sg* are character special files to send SCSI commands to devices, e.g. in burning a CD-R, and not used by Rook/Ceph.

We do need to still make sure we are matching the paths matched back to disks which exist and are actual disk types (e.g., not part, crypt, etc.)

The types of disks are checked in PopulateDeviceInfo(), which is indirectly called from the OSD Provisioning Pod to give device candidates to getAvailableDevices(). My PR uses these candidates in matching as well as the current deviceFilter does, so the device type constraint will be kept.

Note that this check allows crypt and part even in the original version of Rook. I think these types are valid for Rook. In fact my use case is to construct a Rook/Ceph cluster with pre-encrypted disks.

This comment has been minimized.

Copy link
@travisn

travisn Dec 5, 2019

Member

@BlaineEXE Any more thoughts on keeping the separate devLinksFilter property? Or shall we go with this?

`ceph-volume` cannot handle a device correctly if the device has an alias
under `/dev/mapper/`.
This commit fixes it by checking whether the device has that kind of alias
and using the alias as an argument for `ceph-volume`.

Signed-off-by: Kazuhito MATSUDA <kazuto.jinnai@gmail.com>
Copy link
Member

travisn left a comment

Let's finish this up by end of week before 1.2 feature freeze. Thanks!

@@ -52,7 +53,8 @@ storage:
directories:
- path: "/rook/dir1"
- name: "node2"
deviceFilter: "^foo*"`)
deviceFilter: "^foo*"
devLinksFilter: "^/dev/disk/by-id/.*foo.*"`)

This comment has been minimized.

Copy link
@travisn

travisn Dec 5, 2019

Member

@BlaineEXE Any more thoughts on keeping the separate devLinksFilter property? Or shall we go with this?

} else if device.Name == desiredDevice.Name {
logger.Infof("%s found in the desired devices", device.Name)
logger.Debugf("%q found in the desired devices", device.Name)

This comment has been minimized.

Copy link
@travisn

travisn Dec 5, 2019

Member

In the osd prepare pod we should log almost everything as Info. It's only a temporary job so the logs aren't going to be very long anyway. It's very helpful for troubleshooting if we include all these details.

@morimoto-cybozu morimoto-cybozu force-pushed the cybozu-go:use-mapper-name branch from c734d91 to eb1355b Dec 5, 2019
@morimoto-cybozu

This comment has been minimized.

Copy link
Contributor Author

morimoto-cybozu commented Dec 5, 2019

@travisn @BlaineEXE
Thank you for reminding!
I updated this pull request as follows:

  • revert the log level from Debug to Info
  • enrich the log line to explain which filter is applied
  • rename devLinksFilter to devicePathFilter; it now matches /dev/sda
  • clarify the type of devicePathFilter as string

Please take a look.

Add devicePathFilter to Storage Selection Settings which enables
device selection with a regular expression for device path names.

Fixes: #4275
Signed-off-by: Kazuhito MATSUDA <kazuto.jinnai@gmail.com>
@morimoto-cybozu morimoto-cybozu force-pushed the cybozu-go:use-mapper-name branch from eb1355b to 40d868e Dec 5, 2019
@morimoto-cybozu morimoto-cybozu changed the title ceph: support device selection for OSDs by udev device name patterns ceph: support device selection for OSDs by device path name patterns Dec 5, 2019
@morimoto-cybozu

This comment has been minimized.

Copy link
Contributor Author

morimoto-cybozu commented Dec 5, 2019

Fixed the commit comment and the PR description for renaming of devicePathFilter.

@leseb
leseb approved these changes Dec 5, 2019
@travisn
travisn approved these changes Dec 5, 2019
@travisn travisn merged commit 0a352f6 into rook:master Dec 5, 2019
4 checks passed
4 checks passed
DCO DCO
Details
Summary 1 potential rule
Details
commitlint found 0 problems, 0 warnings
continuous-integration/jenkins/pr-head This commit looks good
Details
v1.2 automation moved this from Review in progress to Done Dec 5, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
v1.2
  
Done
5 participants
You can’t perform that action at this time.