Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Log out from multiple target portals when using iscsi storage plugin #46239

Merged
merged 1 commit into from
Jun 3, 2017

Conversation

mtanino
Copy link

@mtanino mtanino commented May 22, 2017

What this PR does / why we need it:

When using iscsi storage with multiple target portal (TP) addresses
and multipathing the volume manager logs on to the IQN for all
portal addresses, but when a pod gets destroyed the volume manager
only logs out for the primary TP and sessions for another TPs are
always remained.

This patch adds mount points for all TPs, and then log out from all
TPs when a pod is destroyed. If a TP is referred from another pods,
the connection will be remained as usual.

Which issue this PR fixes
fixes #45394

Special notes for your reviewer:

Release note:

iscsi storage plugin: Fix dangling session when using multiple target portal addresses.

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels May 22, 2017
@k8s-ci-robot
Copy link
Contributor

Hi @mtanino. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with @k8s-bot ok to test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@k8s-github-robot k8s-github-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. release-note-none Denotes a PR that doesn't merit a release note. labels May 22, 2017
@deads2k deads2k removed their assignment May 22, 2017
@gnufied
Copy link
Member

gnufied commented May 22, 2017

/assign @rootfs

@rootfs
Copy link
Contributor

rootfs commented May 23, 2017

@k8s-bot ok to test

@k8s-ci-robot k8s-ci-robot removed the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label May 23, 2017
portals = append(portals, tp)
}
}
return portals
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My go fu is not great but this looks like it will return an array with empty elements at the end in case it removes some duplicates which would mean the kubelet tries to execute iscsiadm commands on empty data.

This go playground example avoids it, maybe can use that
https://play.golang.org/p/q3bZ3hpOzD

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your comment. Your suggestion looks good.
My go is not great too, so I'm curious what kind of input causes empty elements return?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My bad
I expected the len of the new array to be 3 and the last element empty in the case
of having a list of three portals with on set of duplicates

See
https://play.golang.org/p/hkm8_bhlOy

return portals
}

// If all mntPaths indicate global mounts, then return ture. Otherwise return false.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: ture = true

return false
}
}
return flag
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can just return true here as flag isn't used anywhere

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct. Thanks.

glog.Errorf("iscsi: failed to delete node record Error: %s", string(out))
}
} else {
glog.Infof("iscsi: log out target %s iqn %s", portal, iqn)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rootfs how long backwards compatibility is promised.
Can this code be removed after X versions of k8s.
If so would be good to add a todo for when it should be removed

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a todo is good for note taking

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. Will do.


for _, mntPath := range mntPaths {
// if device is no longer used, see if need to logout the target
if cnt == 0 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not quite following the logic here
If there is two mntpaths where the first one has a cnt of 0 and the first one the second one a count of 2.
Won't we skip logging out the first one?

Copy link
Author

@mtanino mtanino May 24, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. This logic should be changed like following.

In case of multipath with two portals, mntpaths has two global mount paths like;
/var/lib/kubelet/plugins/kubernetes.io/iscsi/iface-default/x.x.x.10:3260-iqn.aaa
/var/lib/kubelet/plugins/kubernetes.io/iscsi/iface-default/x.x.x.20:3260-iqn.bbb

Then, we should;
(1) Unmount all mntPaths
(2) Count num(cnt) of remained devices with GetDeviceNameFromMount
(3) If cnt == 0,then logout from these two iSCSI sessions

Any concerns?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason we need to wait to log out one device until the other one has cnt == 0
If we don't have to wait and just log out any device with cnt == 0 then we can combine the loops

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The cnt shows that the number of remained devices
If we have two remaining paths like this, then GetDeviceNameFromMount returns '2'.

/var/lib/kubelet/plugins/kubernetes.io/iscsi/iface-default/x.x.x.10:3260-iqn.aaa
/var/lib/kubelet/plugins/kubernetes.io/iscsi/iface-default/x.x.x.20:3260-iqn.bbb

In this case, I think we should unmount these paths at first. Then if these unmount is succeeded and cnt == 0, it means no one is using iSCSI connection so we can logout each iSCSI connections step by step. so I'm using divided for two loop for these steps.

But can we combine these steps into single for loop? Please let me know if you have good idea.
Thanks,

// This portal/iqn/iface is no longer referenced, log out.
// Extract the portal and iqn from device path.
portal, iqn, err := extractPortalAndIqn(device)
if err = c.mounter.Unmount(mntPath); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By doing Unmount first and GetDeviceNameFromMount second can't the cnt, cnt-- be skipped
and only use cnt

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do.

@mtanino mtanino changed the title [WIP] Log out from multiple target portals when using iscsi storage plugin Log out from multiple target portals when using iscsi storage plugin May 26, 2017
@mtanino
Copy link
Author

mtanino commented May 26, 2017

@k8s-bot ok to test

@mtanino
Copy link
Author

mtanino commented May 26, 2017

@rootfs
Can you take a look of this fix?

@mtanino
Copy link
Author

mtanino commented May 27, 2017

@k8s-bot pull-kubernetes-cross test this

if len(bkpPortal) == 1 {
globalPDPath = b.manager.MakeGlobalPDName(*b.iscsiDisk, "")
} else {
//
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

empty line

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll remove it.

@rootfs
Copy link
Contributor

rootfs commented May 30, 2017

@humblec

@@ -167,7 +171,11 @@ func (util *ISCSIUtil) AttachDisk(b iscsiDiskMounter) error {

iscsiTransport = extractTransportname(string(out))

bkpPortal := b.portals
bkpPortal := removeDuplicate(b.portals)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why we need removeDuplicate in attachdisk, can you please give some details on why we need this ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At first I thought if user passes duplicate portal IPs like this, iscsiadm will be called multiple times unnecessarily, but it was my misunderstanding.

    iscsi:
      targetPortal: 192.168.122.85:3260
      portals: ['192.168.122.85:3260', '192.168.122.7:3260']
      iqn: iqn.2017-05.com.example:rhel7

Actually waitForPathToExist() checks existing path so even if user specify duplicate portal IPs in the config, duplicate IPs will be skipped.

I'll remove removeDuplicate() logic from attachdisk()

Thanks,

@@ -167,7 +171,11 @@ func (util *ISCSIUtil) AttachDisk(b iscsiDiskMounter) error {

iscsiTransport = extractTransportname(string(out))

bkpPortal := b.portals
bkpPortal := removeDuplicate(b.portals)
if len(bkpPortal) == 0 {
Copy link
Contributor

@humblec humblec May 30, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its perfectly valid that portal can be empty. I may be missing the logic here, can you please fill me in?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same above.

@@ -27,16 +27,16 @@ import (

// Abstract interface to disk operations.
type diskManager interface {
MakeGlobalPDName(disk iscsiDisk) string
MakeGlobalPDName(disk iscsiDisk, portal string) string
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

portal is a known member here in this code path as []string. May be you could use some other name to avoid confusion. Just a thought.

@humblec
Copy link
Contributor

humblec commented May 30, 2017

@mtanino Thanks for this PR ! I have few comments. Also it would be appreciated if you can explain some more on the bind mount approach followed in attachdisk for better understanding, I failed to get it in quick look.

@humblec
Copy link
Contributor

humblec commented May 30, 2017

@mtanino : The test failure looks to be valid as well.

I0527 08:03:19.329] pkg/volume/util/device_util.go:31: cannot use deviceHandler literal (type *deviceHandler) as type DeviceUtil in return argument:
I0527 08:03:19.329] 	*deviceHandler does not implement DeviceUtil (missing FindDevicemapperName method)
I0527 08:03:19.329] !!! [0527 08:03:19] Call tree:
I0527 08:03:19.330] !!! [0527 08:03:19]  1: /go/src/k8s.io/kubernetes/hack/lib/golang.sh:690 kube::golang::build_binaries_for_platform(...)
I0527 08:03:19.330] !!! [0527 08:03:19]  2: hack/make-rules/build.sh:27 kube::golang::build_binaries(...)
I0527 08:03:19.336] Makefile:89: recipe for target 'all' failed
I0527 08:03:19.339] Makefile:396: recipe for target 'cross' failed

@humblec
Copy link
Contributor

humblec commented May 30, 2017

@mtanino also the PR says, logout from multiple target portals when using iscsi plugin, however it change the code in attachdisk()..etc, where my expectation was that, the change would be more or less in detachdisk(). It could be better if you can split this PR into different commits and organize the changes. It will make the review process bit easier. At time of merging we could squash it .

@mtanino
Copy link
Author

mtanino commented May 30, 2017

@humblec

where my expectation was that, the change would be more or less in detachdisk().

In my understanding,

  • Currently only first path is mounted under kubelet dir even if user specify multiple portals.
/dev/mapper/mpathb on /var/lib/kubelet/plugins/kubernetes.io/iscsi/iface-default/192.168.122.85:3260-iqn.2017-05.com.example:rhel7-lun-0 
  • DetachDisk() collects iSCSi connection information from this mount path.
    But we don't have 2nd and subsequent sessions information so that we can't logout from them.

Is this correct?

In my approach, In order to logout from multiple iSCSI sessions, we need following two fix in this PR.

  • Add mount points for all iSCSI sessions at Attachdisk()
  • Logout from multiple portals using above mount points information at DetachDisk()

As a result, if user specify multiple portals like this,

    iscsi:
      targetPortal: 192.168.122.85:3260
      portals: ['192.168.122.7:3260']
      iqn: iqn.2017-05.com.example:rhel7

we can see mount points like this.

/dev/mapper/mpathb on /var/lib/kubelet/plugins/kubernetes.io/iscsi/iface-default/192.168.122.85:3260-iqn.2017-05.com.example:rhel7-lun-0 type ext4 (ro,relatime,block_validity,delalloc,barrier,user_xattr,acl,stripe=2048)
/dev/mapper/mpathb on /var/lib/kubelet/plugins/kubernetes.io/iscsi/iface-default/192.168.122.7:3260-iqn.2017-05.com.example:rhel7-lun-0 type ext4 (ro,relatime,block_validity,delalloc,barrier,user_xattr,acl,stripe=2048)
/dev/mapper/mpathb on /var/lib/kubelet/pods/f3838bc8-4562-11e7-8fd5-525400645a80/volumes/kubernetes.io~iscsi/iscsipd-rw type ext4 (ro,relatime,block_validity,delalloc,barrier,user_xattr,acl,stripe=2048)

Then we can properly logout using these two iSCSI connection information.

Any thought?

Thanks,
Mitsuhiro

@mtanino
Copy link
Author

mtanino commented May 30, 2017

@humblec
I'll take a look device_util.go as well.

@humblec
Copy link
Contributor

humblec commented May 30, 2017

@mtanino thanks for giving more details about the approach! iirc, when there are entries in portals field, we form paths for each portal. You could validate this in /dev/disk/by-path. Please pay attention to the portal IPs ( 10.0.2.{15,16,17,18} and also the IQN. These ( as in below output) all belongs to same mpath device. Also, we could list the active sessions from iscsiadm -m session . I think once we confirmed there is no active mount/ref exist on this mpath device, we can make use of existing iscsi sessions or parse below path to understand the existing connections and logout accordingly.

# ll /dev/disk/by-path/
lrwxrwxrwx. 1 root root 9 Feb 16 15:58 ip-10.0.2.15:3260-iscsi-iqn.2016-04.test.com:storage.target00-lun-0 -> ../../sdb
lrwxrwxrwx. 1 root root 9 Feb 16 15:41 ip-10.0.2.16:3260-iscsi-iqn.2016-04.test.com:storage.target00-lun-0 -> ../../sdc
lrwxrwxrwx. 1 root root 9 Feb 16 15:58 ip-10.0.2.17:3260-iscsi-iqn.2016-04.test.com:storage.target00-lun-0 -> ../../sdd
lrwxrwxrwx. 1 root root 9 Feb 16 15:41 ip-10.0.2.18:3260-iscsi-iqn.2016-04.test.com:storage.target00-lun-0 -> ../../sde

@rootfs Any thoughts?

@tobad357
Copy link
Contributor

@mtanino if going down the route of saving a json file wont that become an issue in the scenario where the umount goes through but something happens with the iscsi logout.
In that case the kubelet will not be able to logout the sessions in successive tries

@rootfs
Copy link
Contributor

rootfs commented May 31, 2017

@tobad357 the json file lives on the host, so it will survive an unmount and iscsi logout.

file := path.Join(mnt, "iscsi.json")
fp, err := os.Create(file)
if err != nil {
return fmt.Errorf("iscsi: create err %s/%s", file, err)
Copy link
Contributor

@rootfs rootfs May 31, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

iscsi: create %s err %s

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

file := path.Join(mnt, "iscsi.json")
fp, err := os.Open(file)
if err != nil {
return fmt.Errorf("iscsi: open err %s/%s", file, err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@rootfs
Copy link
Contributor

rootfs commented May 31, 2017

/approve

return nil
}

func (util *ISCSIUtil) loadISCSI(conf *iscsiDisk, mnt string) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: make a note here, the iscsi config json is not deleted

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@k8s-github-robot k8s-github-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 31, 2017
When using iscsi storage with multiple target portal (TP)
addresses and multipathing the volume manager logs on to
the IQN for all portal addresses, but when a pod gets
destroyed the volume manager only logs out for the primary
TP and sessions for another TPs are always remained.

This patch adds methods to store and load iscsi disk
configrations, then uses the stored config at DetachDisk
path.

Fix kubernetes#45394
@k8s-ci-robot
Copy link
Contributor

k8s-ci-robot commented May 31, 2017

@mtanino: The following test failed, say /retest to rerun them all:

Test name Commit Details Rerun command
pull-kubernetes-cross 340a78211e46ab0fe91a4edb06516a4d6452963f link @k8s-bot pull-kubernetes-cross test this

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@rootfs
Copy link
Contributor

rootfs commented May 31, 2017

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 31, 2017
@mtanino
Copy link
Author

mtanino commented Jun 1, 2017

@k8s-bot pull-kubernetes-e2e-gce-etcd3 test this

@mtanino
Copy link
Author

mtanino commented Jun 1, 2017

@k8s-bot pull-kubernetes-node-e2e test this

@humblec
Copy link
Contributor

humblec commented Jun 1, 2017

/lgtm. Thanks !!

@mtanino
Copy link
Author

mtanino commented Jun 1, 2017

After merged the fix, can we cherry pick the PR to v1.6.x tree?

@childsb
Copy link
Contributor

childsb commented Jun 1, 2017

/approve

@k8s-github-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: childsb, mtanino, rootfs

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@calebamiles calebamiles modified the milestone: v1.7 Jun 2, 2017
@tobad357
Copy link
Contributor

tobad357 commented Jun 3, 2017

@mtanino I agree this is very important to us as well
Would be great to get into a 1.6.x release

@k8s-github-robot
Copy link

Automatic merge from submit-queue

@k8s-github-robot k8s-github-robot merged commit fcf183d into kubernetes:master Jun 3, 2017
@enisoc enisoc added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed release-note-none Denotes a PR that doesn't merit a release note. labels Jun 5, 2017
k8s-github-robot pushed a commit that referenced this pull request Jun 6, 2017
Automatic merge from submit-queue (batch tested with PRs 46112, 46764, 46727, 46974, 46968)

iscsi storage plugin: bkpPortal should be initialized beforehand

**What this PR does / why we need it**:
This patch is a follow up patch for the PR #46239.
The bkpPortal in DetachDisk() path should be initialized before using it.

**Special notes for your reviewer**:
/cc @rootfs @childsb 

**Release note**:

```
NONE
```
k8s-github-robot pushed a commit that referenced this pull request Jun 9, 2017
…9-upstream-release-1.6

Automatic merge from submit-queue

Automated cherry pick of #46239

Cherry pick of #46239 on release-1.6.

#46239: Log out from multiple portals with iscsi storage plugin

This Cherry pick also includes fix #46968(follow-up fix for #46239)

#46968: iscsi storage plugin: bkpPortal should be initialized beforehand
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

iscsi storage plugin only logs out of primary portal