iSCSI doesn't give multipathd enough time to create multipath device #60894
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/remove-lifecycle stale
I can reproduce this issue fairly often (although not reliably). The problem is in pkg/volume/iscsi/iscsi_util.go in the AttachDisk() function. The basic problem is that the method that finds the multipath device looks at each device path only once, and if none of them have an associated multipath device, it just falls back to formatting and mounting the first device path. There is no wait loop for the multipath device to appear. It would be easy to add a wait loop, but that would slow down the non-multipath use case. A possible optimization would be to only wait when 2 or more paths are known, but that still opens up the possibility of incorrect behavior when attaching while all but one path is down, or when only one path is discovered (for whatever reason).
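For illustration only, here is a minimal sketch of what such a wait loop could look like. This is not the actual iscsi_util.go code; the function name, the sysfs-based detection via /sys/block/&lt;dev&gt;/holders, and the timeout value are assumptions:

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
	"time"
)

// waitForMultipathDevice polls sysfs until a device-mapper holder (dm-*)
// shows up for the given SCSI device (e.g. "sdb"), meaning multipathd has
// assembled a multipath device on top of it, or until the timeout expires.
func waitForMultipathDevice(scsiDev string, timeout time.Duration) (string, error) {
	holdersDir := filepath.Join("/sys/block", scsiDev, "holders")
	deadline := time.Now().Add(timeout)
	for {
		entries, err := os.ReadDir(holdersDir)
		if err == nil {
			for _, e := range entries {
				if strings.HasPrefix(e.Name(), "dm-") {
					return "/dev/" + e.Name(), nil
				}
			}
		}
		if time.Now().After(deadline) {
			return "", fmt.Errorf("no multipath device appeared for %s within %v", scsiDev, timeout)
		}
		time.Sleep(1 * time.Second)
	}
}

func main() {
	// Hypothetical usage: only wait when multipath is actually expected.
	if dm, err := waitForMultipathDevice("sdb", 10*time.Second); err == nil {
		fmt.Println("found multipath device:", dm)
	} else {
		fmt.Println(err)
	}
}
```

In the non-multipath case the loop would simply be skipped, so attach latency for single-path volumes would be unchanged.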
After thinking about this over lunch, I think a possible fix would be: add a 10 second wait loop for the multipath device if and only if the PV has 2 or more target portals specified. This puts control in the hands of the provisioner without modifying the API at all, and without breaking backwards compatibility. The one corner case I can see is what would happen if the PV has 2 portals, but the system is only able to log in to one. This would occur when multipath was correctly configured on the storage controller and in the provisioner, but one of the paths was down (for whatever reason) at attach time. I'm inclined to make the attachment FAIL if the multipath device can't be found within 10 seconds, which would be a change to current behavior, but this can easily be worked around by the cluster administrator setting up multipathd to create devices for single-path disks. Provisioners would want to document the recommended way to do that (i.e. set find_multipaths = no, and configure the blacklist appropriately).
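For reference, a rough sketch of the kind of multipathd configuration that recommendation points at. The option names are standard multipath-tools settings, but the blacklist pattern below is only a placeholder and has to match the node's actual local disks:

```
# /etc/multipath.conf (illustrative only)
defaults {
    # create multipath devices even for disks that currently expose a single path
    find_multipaths no
    user_friendly_names yes
}

blacklist {
    # keep multipathd away from local/boot disks; adjust to the real hardware
    devnode "^sda[0-9]*$"
}
```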
Would it break anything? Does anyone run multiple portals for the same volume without multipath? Is it something we should support?
@redbaron was it working perfectly with older kube versions and the issue started to pop up recently, or has it been there from the beginning?
@jsafrane my understanding was/is that
Suppose the backup portal IPs are filled in by the provisioner/admin but the initiator hasn't enabled multipath (dm_multipath is not loaded or the daemon service is not running).
@redbaron my understanding is that, if multipath is enabled, even if it failed (for some reason) for the backup paths, an mpath device will be created for the single path in the system. If that's the case, the above shouldn't be a concern, and the device mapper layer will take care of path discovery later when the path becomes stable/available.
I don't think so.
we started with 1.9.x on kubelets, so can't tell much about mounting behaviour of previous versions.
It is going to fail, but that is fine; nodes should be correctly configured. Let's say you specify fsType, but the given FS module is not loaded by the kernel, or no mkfs binary is available to the kubelet: the mount is going to fail and that's OK IMHO. Kubelet shouldn't try to fall back to anything less than was asked; mounting via a single portal when multiple portals are specified is similar to trying a "default FS" if the required fsType can't be used.
Yes, default configuration for multipathd is
I'm fairly confident that adding a wait loop for the multipath device in the case when there are 2 or more portals in the PV spec won't cause any regressions. The thing that gives me pause is that there are cases when there's a single portal in the PV spec, but the device still supports multipath and the existing sendtargets command is the mechanism for finding those other paths. I'm wondering if it's possible to detect that case (reliably) and use a wait loop for the multipath device in those cases too.
Did you mean Multiple Connections per Session (MC/S)? I don't think this is supported in open-iscsi. Open-iSCSI only supports MPIO (multiple portals).
I don't understand the move by RedHat to default to find_multipaths = no. It basically guarantees bad behavior in cloud-oriented use cases when there's a temporary path failure. My reading of the linked document is that the main motivation for changing the default is to avoid the need to set up a correct blacklist. It seems to me that correct behavior is more important than avoiding some work up front when configuring multipath.
No, I'm referring to the discovery code in iscsi_util.go. If a storage backend has multiple target portals, but the dynamic provisioner only puts one portal in the PV spec, then as long as that one portal is accessible at attach time, open-iscsi will discover the other portals and log into them. The only weakness to this approach is what happens when that one path is inaccessible at attach time due to a network switch/network port/NIC failure, while the other ports are still available.
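To make the sendtargets mechanism concrete, this is roughly what that discovery looks like from the node: a single known portal is queried, and the target advertises all of its portals, which open-iscsi can then log in to. The addresses and IQN below are made up for illustration:

```console
$ iscsiadm -m discovery -t sendtargets -p 192.168.1.10:3260
192.168.1.10:3260,1028 iqn.1992-08.com.example:target0
192.168.1.11:3260,1029 iqn.1992-08.com.example:target0
```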
@rootfs, AFAIK multipathd (binary) in RHEL 7 defaults to
@bswartz, IMHO that should be a direct SCSI mount unconditionally, without multipath involved.
I would be inclined to agree, except that someone added the sendtargets logic to the iscsi code. This code exists for no reason other than to allow finding multiple paths given a single path. In fact the existing NetApp dynamic provisioner (Trident) relies on that behavior currently to get multipath support (this is something I've fixed and will upstream shortly). If we decide that PVs with exactly one portal should not support multipath, then we should remove that sendtargets logic from the iscsi module in kubelet.
Automatic merge from submit-queue (batch tested with PRs 67017, 67190, 67110, 67140, 66873). If you want to cherry-pick this change to another branch, please follow the instructions [here](https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md).

Add wait loop for multipath devices to appear

It takes a variable amount of time for the multipath daemon to create /dev/dm-XX in response to new LUNs being discovered. The old iscsi_util code only discovered the multipath device if it was created quickly enough, but in a significant number of cases, kubelet would grab one of the individual paths and put a filesystem on it before multipathd could construct a multipath device. This change waits for the multipath device to get created for up to 10 seconds, but only if the PV actually had more than one portal.

fixes #60894

```release-note
Dynamic provisioners that create iSCSI PVs can ensure that multipath is used by specifying 2 or more target portals in the PV, which will cause kubelet to wait up to 10 seconds for the multipath device. PVs with just one portal continue to work as before, with kubelet not waiting for the multipath device and just using the first disk it finds.
```
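As a rough illustration of the behavior described in the release note (the name, addresses, and IQN below are placeholders, not values from this issue), a PV that opts into the multipath wait by listing more than one portal could look like this:

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: iscsi-multipath-example        # hypothetical name
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  iscsi:
    targetPortal: 192.168.1.10:3260    # primary portal
    portals:                           # additional portals; 2+ portals total triggers the wait
      - 192.168.1.11:3260
    iqn: iqn.1992-08.com.example:target0
    lun: 0
    fsType: ext4
```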
/kind bug
/sig storage
What happened:
When mounting an iSCSI PV, kubelet doesn't give multipathd enough time to create the multipath device and falls back to mounting a single SCSI device.
What you expected to happen:
Kubelet uses the multipath device for the mount.
How to reproduce it (as minimally and precisely as possible):
Create ~5 pods on a single node, each with a dynamically provisioned PV (we use Trident for that), and observe how some mounts are /dev/sd* and not /dev/mapper/*.
Environment:
Kubelet 1.9.3
Baremetal + NetApp ONTAP over iSCSI + Trident dynamic volume provisioner
CoreOS (kernel 4.14.24, multipath-tools 0.6.4)