New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
OCPBUGS-10884: add multipath support for volume discovery #394
Conversation
@RomanBednar: This pull request references Jira Issue OCPBUGS-10884, which is invalid:
Comment The bug has been updated to refer to the pull request using the external bug tracker. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
@RomanBednar: This pull request references Jira Issue OCPBUGS-10884, which is invalid:
Comment In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
@RomanBednar: This pull request references Jira Issue OCPBUGS-10884, which is invalid:
Comment In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
Here are my results from manual verification:Multipath configuration on worker node:
Create LocalVolumeDiscovery in web console and check discovery result:
Create LocalVolumeSet in web console and verify:
Create a PVC:
Schedule a Pod:
Check pv, pvc and pod status:
Cross check block device on worker node with crio:
Inspect PersistentVolume and verify volume path:
|
discoveredDevice := v1alpha1.DiscoveredDevice{ | ||
Path: fmt.Sprintf("/dev/%s", blockDevice.Name), | ||
Path: path, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we just get away by changing this line to:
Path: fmt.Sprintf("/dev/%s", blockDevice.KName)
I am not sure why we need device mapper path from /dev/mapper
for any of the stuff, except place where we are determining filesystem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gnufied well, we probably don't want to report invalid path in discovery result. That's the only reason I believe. Without the patch you'll get /dev/mpath
or something like that - such path will never exist for multipath.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, I meant if you used KName
here, it will point to a valid device. Output of lsblk
with KNAME
always points to a valid device. Having said that, there are pros and cons of both the approaches:
Pro of using KName
here is - we can use single device mechanism for every device type.
Pro of using mpath name is, users can easily tell system has selected an mpath device. But then how we show different devices becomes inconsistent. I also think that, mpath device names btw changes between reboots (so can KName
to be fair), so in the end all of the code must use dev/disk/by-id
path anyways. But symlink in /dev/disk/by-id
always points to KName
device in /dev/<kname>
location.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, I meant if you used
KName
here, it will point to a valid device. Output oflsblk
withKNAME
always points to a valid device. Having said that, there are pros and cons of both the approaches:
Ah, I see. I misread that and though you're suggesting to use what the original code was using, so blockDevice.Name
for the path.
Those are valid points, the reason I went with /dev/mapper
was that the original code used Name
instead of KName
and I assumed we'd like to keep this naming - volume discovery result looked like this:
- deviceID: /dev/disk/by-id/dm-name-mpatha
fstype: ""
model: ""
path: /dev/mpatha
Pro of using
KName
here is - we can use single device mechanism for every device type.Pro of using mpath name is, users can easily tell system has selected an mpath device. But then how we show different devices becomes inconsistent. I also think that, mpath device names btw changes between reboots (so can
KName
to be fair), so in the end all of the code must usedev/disk/by-id
path anyways. But symlink in/dev/disk/by-id
always points toKName
device in/dev/<kname>
location.
Not sure what you mean by "single device mechanism" but I assume it's the way we parse device paths in the code. So with KName
approach the advantage would be simpler code for sure.
As for the inconsistency in showing the devices: without further refactoring the discovery is not optimal, regardless of which approach of showing path we choose. See the LocalVolumeDiscoveryResult
I've posted above - we show all underlying devices of mpath (type: disk
) along with multipath device (type: mpath
) which is duplicated for every underlying device. Ideally it should show only one mpath? That means adding code that will reliably verify what mpath devices lead to exactly - wondering if it's even worth dealing with.
@RomanBednar: This pull request references Jira Issue OCPBUGS-10884, which is invalid:
Comment In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
@gnufied I've updated to code to prevent duplicates in the discovery result and use kname path instead. See the current version is ok. We're expected to backport this to 4.12, I'll clone the Jiras and do the backports once we have a lgtm here. One thing to note is that the duplicates are now resolved by removing duplicate device IDs, while this works fine it would make much more sense to use I've tested the patch again, the discovery result now contains only one device mapper device using KNAME (
|
/assign @gnufied |
for _, v := range sample { | ||
k := key{v.DeviceID} | ||
if i, ok := m[k]; ok { | ||
unique[i] = v |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: why did we make a key and type for it? Couldn't we have just used DeviceId
via something like map[string]int
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But another thing is - deviceID
may not be available always. Could we not have used v.path
for uniqueness?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: why did we make a key and type for it? Couldn't we have just used DeviceId via something like map[string]int ?
Sure, but if we would need more criteria for uniqueness instead of just one string using custom type would allow that, like this:
...
type key struct{ value string; value2 string }
m := make(map[key]int)
for _, v := range sample {
k := key{v.DeviceID, v.Path}
...
But another thing is - deviceID may not be available always. Could we not have used v.path for uniqueness?
Correct, deviceID
can be set to ""
- I missed that. That's even worse with this unique filter because that would hide all devices (except one) that have deviceID
set to ""
. I think it makes sense to use what I described above and let uniqueDevices()
use both v.Path
and v.DeviceID
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should work.
Commands executed use CombinedOutput() which returns stdout and stderr in one output value. In order to efficiently debug failed commands we must preserve the output and log it if the command fails. Otherwise we get useless messages/events like this: failed to discover devices. Error: exit status 1 failed to list all the block devices in the node. github.com/openshift/local-storage-operator/diskmaker/discovery.getValidBlockDevices /go/src/github.com/openshift/local-storage-operator/diskmaker/discovery/discovery.go:153
We can eliminate duplicates in discovery result by: 1) adding -M flag to lsblk 2) filtering out devices so we get unique device by-id paths only Option 1 is not available in current base image because it's using old version of lsblk which does not have -M flag.
@RomanBednar: all tests passed! Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
/jira refresh |
@RomanBednar: This pull request references Jira Issue OCPBUGS-10884, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug
No GitHub users were found matching the public email listed for the QA contact in Jira (wduan@redhat.com), skipping review request. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/cherry-pick release-4.13 |
@RomanBednar: once the present PR merges, I will cherry-pick it on top of release-4.13 in a new PR and assign it to you. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/cherry-pick release-4.12 |
@RomanBednar: once the present PR merges, I will cherry-pick it on top of release-4.12 in a new PR and assign it to you. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: gnufied, RomanBednar The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
@RomanBednar: Jira Issue OCPBUGS-10884: Some pull requests linked via external trackers have merged: The following pull requests linked via external trackers have not merged:
These pull request must merge or be unlinked from the Jira bug in order for it to move to the next state. Once unlinked, request a bug refresh with Jira Issue OCPBUGS-10884 has not been moved to the MODIFIED state. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
@RomanBednar: #394 failed to apply on top of branch "release-4.12":
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
@RomanBednar: new pull request created: #407 In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
Adds multipath devices to discovery result.
Unrelated to discovery: we should expose multipath in console to improve user experience when creating
LocalVolumeSet
with multipath: openshift/console#12723cc @openshift/storage