
iSCSI doesn't give multipathd enough time to create multipath device #60894

Closed
redbaron opened this issue Mar 7, 2018 · 14 comments · Fixed by #67140
Labels: kind/bug, sig/storage


redbaron (Contributor) commented Mar 7, 2018

/kind bug
/sig storage

What happened:

When mounting an iSCSI PV, kubelet doesn't give multipathd enough time to create the multipath device and falls back to mounting a single SCSI device.

What you expected to happen:
kubelet uses the multipath device for the mount.

How to reproduce it (as minimally and precisely as possible):
Create ~5 pods on a single node, each with a dynamically provisioned PV (we use Trident for that), and observe that some mounts are /dev/sd* rather than /dev/mapper/*.

Environment:
Kubelet 1.9.3
Baremetal + NetApp ONTAP over iSCSI + Trident dynamic volume provisioner
CoreOS (kernel 4.14.24, multipath-tools 0.6.4)

k8s-ci-robot added the kind/bug and sig/storage labels on Mar 7, 2018
fejta-bot commented

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

k8s-ci-robot added the lifecycle/stale label on Jun 5, 2018
redbaron (Contributor, Author) commented Jun 5, 2018

/remove-lifecycle stale

k8s-ci-robot removed the lifecycle/stale label on Jun 5, 2018
bswartz (Contributor) commented Jun 25, 2018

I can reproduce this issue fairly often (although not reliably). The problem is in pkg/volume/iscsi/iscsi_util.go in the AttachDisk() function. The basic problem is that the method that finds the multipath device looks at each device path one time, and if none of them have an associated multipath device, it just falls back to formatting and mounting the first device path. There is no wait loop for the multipath device to appear.

It would be easy to add a wait loop, but that would slow down the non-multipath use case. A possible optimization would be to wait only when 2 or more paths are known, but that still opens up the possibility of incorrect behavior when attaching while all but one path is down, or when only one path is discovered (for whatever reason).
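For illustration, a minimal sketch (in Go) of the kind of wait loop being discussed. This is not the actual iscsi_util.go code; `findMultipathDevice` is a hypothetical helper that resolves a /dev/sdX path to its /dev/dm-XX parent and returns an empty string if multipathd has not created one yet.

```go
package main

import (
	"fmt"
	"time"
)

// findMultipathDevice is a stand-in for a real lookup (e.g. via sysfs).
// It is assumed here, not part of the actual kubelet code.
func findMultipathDevice(devicePath string) string { return "" }

// waitForMultipathDevice polls the discovered SCSI device paths until
// multipathd has created a multipath device on top of one of them, or
// until the timeout expires.
func waitForMultipathDevice(devicePaths []string, timeout time.Duration) (string, error) {
	deadline := time.Now().Add(timeout)
	for {
		for _, p := range devicePaths {
			if dm := findMultipathDevice(p); dm != "" {
				return dm, nil
			}
		}
		if time.Now().After(deadline) {
			return "", fmt.Errorf("multipath device did not appear within %v", timeout)
		}
		time.Sleep(1 * time.Second)
	}
}

func main() {
	dm, err := waitForMultipathDevice([]string{"/dev/sdb", "/dev/sdc"}, 10*time.Second)
	fmt.Println(dm, err)
}
```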

bswartz (Contributor) commented Jun 25, 2018

After thinking about this over lunch, I think a possible fix would be: add a 10 second wait loop for the multipath device if and only if the PV has 2 or more target portals specified. This puts control in the hands of the provisioner without modifying the API at all and without breaking backwards compatibility.

The one corner case I can see is what would happen if the PV has 2 portals, but the system is only able to log in to one. This would occur when multipath was correctly configured on the storage controller and in the provisioner, but one of the paths was down (for whatever reason) at attach time. I'm inclined to make the attachment FAIL if the multipath device can't be found within 10 seconds, which would be a change to current behavior, but this can easily be worked around by the cluster administrator setting up multipathd to create devices for single-path disks. Provisioners would want to document the recommended way to do that (i.e. set find_multipaths = no, and configure the blacklist appropriately).
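For reference, a sketch of the kind of multipath.conf setup alluded to above (create multipath devices even for single-path disks, with an explicit blacklist). The blacklist entries are purely illustrative and would need to be adapted to the node's actual local devices.

```
defaults {
    # create a multipath device for every non-blacklisted block device,
    # even if only one path is currently visible
    find_multipaths no
    user_friendly_names yes
}

blacklist {
    # example entries only: keep local and virtual devices out of multipath
    devnode "^(ram|loop|fd|md|dm-|sr|scd|st)[0-9]*"
    devnode "^sda$"
}
```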

jsafrane (Member) commented

ISCSIVolumeSource.Portals is vaguely described as "target portal ips for high availability", without describing what "high availability" means. Should we assume that it means multipath? Then Kubernetes can always wait for the multipath device to appear when more than one Portal is specified.

Would it break anything? Does anyone run multiple portals for the same volume without multipath? Is it something we should support?

@rootfs @mtanino, any opinion?

rootfs (Contributor) commented Jul 10, 2018

@jsafrane yes, the iSCSI portals were made possible by @humblec for the HA/MPIO use case (see #39345). The lack of a concrete description makes it less obvious, though.

humblec (Contributor) commented Jul 11, 2018

@redbaron was it working perfectly with older kube versions and did the issue start to pop up recently, or from v1.9? The reason I am asking is that multipath support was added in kube v1.6 (on Feb 14, 2017) and at least I haven't seen this issue reported until this report, or very recently. The Gluster team has also been using the multipathing feature for quite some time and we always got the mpath device mounted reliably, but noticed this issue very recently, with the latest OCP builds. If I am correct, a good number of changes happened in between, especially on the device discovery/attach paths (e.g. https://github.com/kubernetes/kubernetes/pull/50334/files). I am not concretely saying it's a regression, a behavioral change, or a side effect of some other patch, but I am trying to figure out whether that's the case. @mtanino @rootfs any thoughts here?

> ISCSIVolumeSource.Portals is vaguely described as "target portal ips for high availability", without describing what "high availability" means. Should we assume that it means multipath?

@jsafrane my understanding was/is that high availability with more target portal IPs means multipathing (at least) in the iSCSI context, but we could mention multipath as well in the description.

> Then Kubernetes can always wait for the multipath device to appear when more than one Portal is specified.

Suppose the backup portal IPs are filled in by the provisioner/admin but the initiator hasn't enabled the multipath software on the kubelet/node (dm_multipath is not loaded or the daemon service is not running), what will happen? Is it going to be an infinite wait for failure?

> The one corner case I can see is what would happen if the PV has 2 portals, but the system is only able to log in to one. This would occur when multipath was correctly configured on the .....

@redbaron my understanding is that, if multipath is enabled, even if it failed (for some reason) for the backup paths, an mpath device will be created for the single path in the system. If that's the case, the above shouldn't be a concern and the device mapper layer will take care of the path discovery later when the path becomes stable/available.

> Would it break anything? Does anyone run multiple portals for the same volume without multipath? Is it something we should support?

I don't think so.

redbaron (Contributor, Author) commented

@humblec,

> was it working perfectly with older kube versions and did the issue start to pop up recently, or from v1.9?

We started with 1.9.x on kubelets, so I can't tell much about the mounting behaviour of previous versions.

> Suppose the backup portal IPs are filled in by the provisioner/admin but the initiator hasn't enabled the multipath software on the kubelet/node (dm_multipath is not loaded or the daemon service is not running), what will happen? Is it going to be an infinite wait for failure?

It is going to fail, but that is fine; nodes should be correctly configured. Let's say you specify fsType, but the given FS module is not loaded by the kernel, or no mkfs binary is available to the kubelet: the mount is going to fail and that's OK IMHO. Kubelet shouldn't try to fall back to anything less than what was asked for; mounting via a single portal when multiple portals are specified is similar to trying a "default FS" when the required fsType can't be used.

> my understanding is that, if multipath is enabled, even if it failed (for some reason) for the backup paths, an mpath device will be created for the single path in the system.

Yes, the default configuration for multipathd is the find_multipaths = no option (note that the default in Linux distributions might differ), which creates a multipath device for every non-blacklisted block device.

bswartz (Contributor) commented Jul 11, 2018

I'm fairly confident that adding a wait loop for the multipath device in the case when there are 2 or more portals in the PV spec won't cause any regressions.

The thing that gives me pause is that there are cases when there's a single portal in the PV spec, but the device still supports multipath and the existing sendtargets command is the mechanism for finding those other paths. I'm wondering if it's possible to detect that case (reliably) and use a wait loop for the multipath device in those cases too.

rootfs (Contributor) commented Jul 11, 2018

@redbaron

> Yes, the default configuration for multipathd is the find_multipaths = no option (note that the default in Linux distributions might differ), which creates a multipath device for every non-blacklisted block device.

find_multipaths = yes is the default on RHEL 7, see here

rootfs (Contributor) commented Jul 11, 2018

@bswartz

> The thing that gives me pause is that there are cases when there's a single portal in the PV spec, but the device still supports multipath and the existing sendtargets command is the mechanism for finding those other paths. I'm wondering if it's possible to detect that case (reliably) and use a wait loop for the multipath device in those cases too.

Did you mean Multiple Connections per Session (MC/S)? I don't think this is supported in open-iscsi. Open-iSCSI only supports MPIO (multiple portals).

bswartz (Contributor) commented Jul 11, 2018

@rootfs

> find_multipaths = yes is the default on RHEL 7, see here

I don't understand the move by RedHat to default to find_multipaths = yes. It basically guarantees bad behavior in cloud-oriented use cases when there's a temporary path failure.

My reading of the linked document is that the main motivation for changing the default is to avoid the need to set up a correct blacklist. It seems to me that correct behavior is more important than avoiding some work up front when configuring multipath.

@rootfs

> Did you mean Multiple Connections per Session (MC/S)? I don't think this is supported in open-iscsi. Open-iSCSI only supports MPIO (multiple portals).

No, I'm referring to the discovery code in iscsi_util.go. If a storage backend has multiple target portals, but the dynamic provisioner only puts one portal in the PV spec, then as long as that one portal is accessible at attach time, open-iscsi will discover the other portals and log into them. The only weakness to this approach is what happens when that one path is inaccessible at attach time due to a network switch/network port/NIC failure, while the other ports are still available.

redbaron (Contributor, Author) commented Jul 11, 2018

@rootfs ,

AFAIK multipathd (the binary) in RHEL 7 defaults to find_multipaths = no, but the configuration utility they ship writes a config with find_multipaths = yes; that utility needs to be called explicitly.

@bswartz ,

> The thing that gives me pause is that there are cases when there's a single portal in the PV spec

IMHO that should be a direct SCSI mount unconditionally, without multipath involved.

bswartz (Contributor) commented Jul 12, 2018

@redbaron

> The thing that gives me pause is that there are cases when there's a single portal in the PV spec

> IMHO that should be a direct SCSI mount unconditionally, without multipath involved.

I would be inclined to agree, except that someone added the sendtargets logic to the iscsi code. This code exists for no reason other than to allow finding multiple paths given a single path. In fact, the existing NetApp dynamic provisioner (Trident) currently relies on that behavior to get multipath support (this is something I've fixed and will upstream shortly).

If we decide that PVs with exactly one portal should not support multipath, then we should remove that sendtargets logic from the iscsi module in kubelet.

k8s-github-robot pushed a commit that referenced this issue Aug 11, 2018
Automatic merge from submit-queue (batch tested with PRs 67017, 67190, 67110, 67140, 66873). If you want to cherry-pick this change to another branch, please follow the instructions [here](https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md).

Add wait loop for multipath devices to appear

It takes a variable amount of time for the multipath daemon
to create /dev/dm-XX in response to new LUNs being discovered.
The old iscsi_util code only discovered the multipath device
if it was created quickly enough, but in a significant number
of cases, kubelet would grab one of the individual paths and
put a filesystem on it before multipathd could construct a
multipath device.

This change waits for the multipath device to get created for
up to 10 seconds, but only if the PV actually had more than
one portal.

fixes #60894

```release-note
Dynamic provisioners that create iSCSI PVs can ensure that multipath is used by specifying 2 or more target portals in the PV, which will cause kubelet to wait up to 10 seconds for the multipath device. PVs with just one portal continue to work as before, with kubelet not waiting for the multipath device and just using the first disk it finds.
```
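As an illustration of what the release note describes, a PV that lists two target portals (and therefore triggers the wait for the multipath device) could be built roughly like this in Go using the core/v1 API types; the name, addresses, and IQN below are made up:

```go
package main

import (
	"fmt"

	v1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// examplePV returns an illustrative iSCSI PV with two target portals, the
// condition under which kubelet now waits for the multipath device.
func examplePV() *v1.PersistentVolume {
	return &v1.PersistentVolume{
		ObjectMeta: metav1.ObjectMeta{Name: "iscsi-multipath-pv"},
		Spec: v1.PersistentVolumeSpec{
			Capacity: v1.ResourceList{
				v1.ResourceStorage: resource.MustParse("10Gi"),
			},
			AccessModes: []v1.PersistentVolumeAccessMode{v1.ReadWriteOnce},
			PersistentVolumeSource: v1.PersistentVolumeSource{
				ISCSI: &v1.ISCSIPersistentVolumeSource{
					TargetPortal: "10.0.0.1:3260",           // primary portal
					Portals:      []string{"10.0.0.2:3260"}, // additional portal: 2 paths total
					IQN:          "iqn.2018-08.com.example:target0",
					Lun:          0,
					FSType:       "ext4",
				},
			},
		},
	}
}

func main() {
	pv := examplePV()
	fmt.Println(pv.Name, pv.Spec.ISCSI.TargetPortal, pv.Spec.ISCSI.Portals)
}
```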