
iscsi: Each target is tried only once in case of multipath #68140

Closed

jsafrane opened this issue Aug 31, 2018 · 3 comments
Labels
sig/storage Categorizes an issue or PR as relevant to SIG Storage.

Comments

@jsafrane
Member

If I have a volume with 3 targets, the iSCSI volume plugin tries to log in to each target. If it succeeds with at least one target, it mounts the volume.

I experience login timeouts quite frequently when I drain a node with >20 iSCSI volumes with 3 paths each - these volumes need to be re-attached to different nodes and iscsiadm --login sometimes times out under such load. In the worst case, two targets for the same device time out and I end up with only one path that's not even part of multipath.

IMO, the volume plugin should retry logging in to all targets, or at least wait until it has two working paths. It should never mount a single path when multipath was requested.

/sig storage
@j-griffith @bswartz @rootfs @humblec

@k8s-ci-robot k8s-ci-robot added the sig/storage Categorizes an issue or PR as relevant to SIG Storage. label Aug 31, 2018
@bswartz
Contributor

bswartz commented Aug 31, 2018

I think your point above is debatable. As long as there is a single path available and the multipath daemon creates a DM device for it, why not go ahead and use it? When the other 2 paths eventually come up, the multipath daemon will find the additional paths and add them too.

I would argue that this is the whole point of multipathing -- to enable the system to continue to work while there are hardware failures. You are suggesting that we actually fail to attach when there is exactly 1 path available, and that seems to me like the worst of both worlds. The system is now more fragile than a single-path solution because if either path goes down, I can't get any work done.

If your problem is that Kubernetes actually formats and mounts the /dev/sdX device, thereby preventing multipath from creating a /dev/dm-X device later on when 2 or more paths are available, then yes, I have seen that issue, and I have a fix for it here:
#67140

The above fix is not perfect, but it's better than the old behavior, and completely backward compatible.

@redbaron
Contributor

@bswartz, if kubelet managed to log in via only one portal and then mounted /dev/dm-X (so via multipath), then who will discover the new paths, and when?

@bswartz
Contributor

bswartz commented Sep 18, 2018

@redbaron, as @jsafrane pointed out, nothing does this today. It's a problem, because even if you wanted to do it manually, you wouldn't know which targets to log in to or which LUNs to scan without a good understanding of the internals of the iSCSI code in kubelet.

We're going to have to think about whether it makes sense to solve this in kubernetes or elsewhere.

One cheap and partial solution would be to make sure that when we fail to log in or scan a LUN, we log enough information that an administrator could grep the logs for those messages and know from them which commands to issue to log in to the missing targets and/or scan the missing LUNs when the paths eventually come back. This could in theory be done in a way that wouldn't disrupt Kubernetes' normal operation, but it's obviously a hack.
