
Mark the volumes as detached when node does not exist #50239

Merged
merged 1 commit into from Aug 23, 2017

Conversation

@FengyunPan commented Aug 7, 2017

If the node does not exist, its volumes will be detached
automatically and become available. So mark them as detached and do not return an error.

Which issue this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close that issue when PR gets merged): fixes #50200

Release note:

NONE

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Aug 7, 2017
@k8s-ci-robot (Contributor)

Hi @FengyunPan. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@k8s-github-robot k8s-github-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. release-note-none Denotes a PR that doesn't merit a release note. labels Aug 7, 2017
@FengyunPan (Author)

/assign @jingxu97

@FengyunPan (Author)

/unassign @justinsb
/unassign @rootfs
/assign @anguslees
/assign @dims

@dims (Member) commented Aug 7, 2017

/ok-to-test

@k8s-ci-robot k8s-ci-robot removed the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Aug 7, 2017
@FengyunPan (Author)

/test pull-kubernetes-e2e-kops-aws

@dims (Member) commented Aug 7, 2017

/approve

@k8s-github-robot k8s-github-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 7, 2017
@dims (Member) commented Aug 7, 2017

/approve no-issue

@dims (Member) commented Aug 7, 2017

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 7, 2017
@jingxu97 jingxu97 added the do-not-merge DEPRECATED. Indicates that a PR should not merge. Label can only be manually applied/removed. label Aug 7, 2017
@jingxu97 (Contributor) commented Aug 7, 2017

@FengyunPan, could you follow the same pattern as GCE PD or AWS and put the check in the DiskIsAttached function? In that function, if the instance no longer exists, return false without an error.
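A minimal sketch of the pattern suggested here, using hypothetical stand-ins (`fakeCloud`, `errInstanceNotFound`) rather than the real Kubernetes `cloudprovider` types:

```go
package main

import (
	"errors"
	"fmt"
)

// Illustrative stand-in: in Kubernetes the sentinel is cloudprovider.InstanceNotFound.
var errInstanceNotFound = errors.New("instance not found")

// fakeCloud maps instanceID -> attached volume IDs.
type fakeCloud map[string]map[string]bool

func (c fakeCloud) lookup(instanceID string) (map[string]bool, error) {
	vols, ok := c[instanceID]
	if !ok {
		return nil, errInstanceNotFound
	}
	return vols, nil
}

// DiskIsAttached follows the GCE PD/AWS pattern described above:
// if the instance no longer exists, report detached without an error.
func (c fakeCloud) DiskIsAttached(instanceID, volumeID string) (bool, error) {
	vols, err := c.lookup(instanceID)
	if err != nil {
		if errors.Is(err, errInstanceNotFound) {
			// Node was deleted; its volumes were detached automatically.
			return false, nil
		}
		return false, err
	}
	return vols[volumeID], nil
}

func main() {
	cloud := fakeCloud{"node-a": {"vol-1": true}}
	fmt.Println(cloud.DiskIsAttached("node-a", "vol-1"))       // true <nil>
	fmt.Println(cloud.DiskIsAttached("deleted-node", "vol-1")) // false <nil>
}
```

The key design point is that a missing instance is an expected state, not an error: callers can treat (false, nil) as "safe to mark detached".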

@FengyunPan (Author)

@jingxu97 Ok, I will check it.

@dims (Member) commented Aug 8, 2017

/approve cancel
/lgtm cancel

@jingxu97 can you please remove the do-not-merge?

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 8, 2017
@k8s-github-robot k8s-github-robot removed the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 8, 2017
@jingxu97 (Contributor)

@rootfs, I think the fix is: when the node no longer exists (deleted by the cloud provider), we can say the volume's status is detached. GCE PD and AWS currently have this logic. Is that the case for Cinder?

@rootfs (Contributor) commented Aug 10, 2017

@jingxu97 there is a long list of volume statuses:
https://developer.openstack.org/api-ref/block-storage/v2/index.html

| Status | Description |
| --- | --- |
| creating | The volume is being created. |
| available | The volume is ready to attach to an instance. |
| attaching | The volume is attaching to an instance. |
| detaching | The volume is detaching from an instance. |
| in-use | The volume is attached to an instance. |
| maintenance | The volume is locked and being migrated. |
| deleting | The volume is being deleted. |
| awaiting-transfer | The volume is awaiting a transfer. |
| error | A volume creation error occurred. |
| error_deleting | A volume deletion error occurred. |
| backing-up | The volume is being backed up. |
| restoring-backup | A backup is being restored to the volume. |
| error_backing-up | A backup error occurred. |
| error_restoring | A backup restoration error occurred. |
| error_extending | An error occurred while attempting to extend a volume. |
| downloading | The volume is downloading an image. |
| uploading | The volume is being uploaded to an image. |
| retyping | The volume is changing type to another volume type. |
| extending | The volume is being extended. |

```go
// So mark them detached.
if err == cloudprovider.InstanceNotFound {
	for _, spec := range specs {
		volumesAttachedCheck[spec] = false
```
Contributor:
mark it but don't detach it yet.

@rootfs (Contributor) commented Aug 10, 2017

Cannot comment on line 216. Consider something like the following

```go
for volumeID, attached := range attachedResult {
	if !attached {
		spec := volumeSpecMap[volumeID]
		volumesAttachedCheck[spec] = false
		glog.V(2).Infof("VolumesAreAttached: check volume %q (specName: %q) is no longer attached", volumeID, spec.Name())
	} else {
		// volume attached, check if the instance is alive here
	}
}
```

@FengyunPan (Author)

Why do we need to check whether the instance is alive when the volume is attached?
Can you give me more detail about it, or an example? Thank you.

Contributor:

Checking that the instance is alive is not sufficient: the cinder volume could already have been moved to another node, or be in a different state.

We need both the attachment info and the node info to ensure the cinder volume has not moved elsewhere and that the node it is attached to is dead; only then can we safely detach the volume.
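A minimal sketch of the safety condition described above, with illustrative names (`volumeInfo`, `safeToMarkDetached`) that are not part of the real codebase:

```go
package main

import "fmt"

// volumeInfo mirrors the two pieces of state mentioned above:
// the Cinder attachment record, checked together with node liveness.
type volumeInfo struct {
	attachedServerID string // "" means not attached
}

// safeToMarkDetached returns true only when the node is gone AND the
// volume has not been re-attached to some other, still-living node.
func safeToMarkDetached(v volumeInfo, deadInstanceID string, nodeAlive bool) bool {
	if nodeAlive {
		return false // node still exists; let the normal detach path run
	}
	// Node is gone: safe only if the volume did not move to another node.
	return v.attachedServerID == "" || v.attachedServerID == deadInstanceID
}

func main() {
	moved := volumeInfo{attachedServerID: "other-node"}
	fmt.Println(safeToMarkDetached(moved, "dead-node", false)) // false: volume moved elsewhere
	orphan := volumeInfo{attachedServerID: "dead-node"}
	fmt.Println(safeToMarkDetached(orphan, "dead-node", false)) // true
}
```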

Member:

Agree with @rootfs

@FengyunPan (Author)

Oh, I see. Thank you very much @rootfs.
I will update it soon, please take a look.
And I think the change should be inside openstack_volume.go's DiskIsAttached function too. @jingxu97

@jingxu97 (Contributor)

Thanks @rootfs.
@FengyunPan since the behavior of Cinder might differ from GCE PD, could you please double-check? Also, as I mentioned before, the change should be inside openstack_volume.go's DiskIsAttached function.

@FengyunPan (Author)

@rootfs

> I am not convinced this is an accurate fix. It is safe to detach only if the attachment status is 'attached' (deal with in-use and attaching) and the instance id in the attachment cannot be found.

Sorry, I don't understand. Can a volume be attached to an instance that does not exist? I think it can't, so there is no need to check that in this case.
Please let me know if my understanding is wrong, thank you.

```go
@@ -185,7 +186,11 @@ func (attacher *cinderDiskAttacher) VolumesAreAttached(specs []*volume.Spec, nod
	}

	instanceID, err := attacher.nodeInstanceID(nodeName)
	if err != nil {
		if err == cloudprovider.InstanceNotFound {
			glog.V(2).Infof("VolumesAreAttached: node %q does not exist.", nodeName)
```
Contributor:

Print both nodeName and instanceID for better diagnostics.

Contributor:

@jingxu97 should we raise the severity to glog.Warningf too?

@FengyunPan (Author)

@rootfs if the instance is deleted, the openstack cloud provider can't get its instanceID, so attacher.nodeInstanceID() returns ("", err). Printing the nodeName is enough.
Yeah, it should use glog.Warningf, thank you.

@dims (Member) commented Aug 14, 2017

@anguslees - any thoughts on this?

```go
@@ -412,6 +412,14 @@ func (os *OpenStack) DiskIsAttached(instanceID, volumeID string) (bool, error) {
		return false, err
	}

	if instanceID == "" {
		if volume.AttachedServerId != "" {
			return true, nil
```
Contributor:

I don't think the logic here is right. The function checks whether the volume is attached to the instance passed in as a parameter. Here, if instanceID is empty, it means the instance has been deleted from the cloud provider. If volume.AttachedServerId is not empty, the volume is attached to a different instance, so the function should return false because the volume is not attached to the original instance.

For most volume plugins, if the node no longer exists in the cloud provider, we can safely assume the volume is detached from the node. But for cinder, could you please double-check the behavior if you have the environment? Thanks!

@FengyunPan (Author)

@jingxu97 IMO, if instanceID is empty, DiskIsAttached() is checking whether the volume is attached at all. If the volume is attached it returns true; if not, it returns false.

> But the function should return false because the volume is not attached to the original instance.

We can't get the instance id of the original instance, because it no longer exists. If DiskIsAttached() returns true when the instance does not exist, that means the volume is attached to another instance and we should not mark it detached (as rootfs says). If DiskIsAttached() returns false when the instance does not exist, that means the volume is available and we can mark it detached. The code from line 207 already does this. That is also my double check; I have tested it.

> But for cinder, could you please double check the behavior if you have the environment?

I am sorry, I don't understand what you mean by 'double check'. Can you give more detail? Thank you again.

@jingxu97 (Contributor) commented Aug 15, 2017

> We can't get the instance id for the original instance which does not exist. If DiskIsAttached() return true when instance does not exist that means the volume is attached to another instance and we should not mark the volume detached

Your description is not correct.
DiskIsAttached does not just check that the disk is attached to ANY node; it checks whether it is attached to the node you passed into the function. Suppose the disk is currently attached to node B: DiskIsAttached(disk, nodeA) should return false, because the disk is not attached to nodeA, it is attached to node B. Please let me know if this is clear.

Double check means that I want to confirm that node-not-found implies the volume is detached automatically. This seems to be true for most cloud providers, but I don't have a cinder volume setup to test it.

@FengyunPan (Author)

Without my change, DiskIsAttached() just checks whether the disk is attached to the instanceID you pass into the function, and that instanceID can't be empty, right?
With my change, DiskIsAttached() can check whether the disk is attached to a (non-empty) instanceID you pass in, and it can also check whether the disk is attached at all when instanceID is empty.

Yes, the volume will be detached when the instance is not found in openstack cinder.

@FengyunPan (Author)

Is that confusing? Do we need to add another function to check whether the disk is attached?

Contributor:

If we are sure that node-not-found in openstack cinder indicates the volume is already detached, then I suggest the following change in cinder/attacher.go:

```go
if err == cloudprovider.InstanceNotFound {
	// return false with nil error
	...
}
```

You might have had this change in the very first version; sorry for the confusion.
@rootfs mentioned that cinder might have a more complicated set of statuses, and node-not-exist might not mean the volume has already been detached successfully from the node. If that is true, we should not return false directly. But I can't think of a good way to check it. Passing an empty string for the node instance doesn't seem right, because our goal is to check whether the volume is still attached to that specific node (not any other node). What does it mean for this function if the node id is empty?

@FengyunPan (Author)

@jingxu97 @rootfs Yeah, the volume will be detached when the instance is not found in openstack cinder. I'm sorry I can't find a doc from openstack, but I found the code:
https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L899
nova will detach all devices from an instance when the user deletes the instance:
https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L983
detach volume:
https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L1387
If the instance does not exist, nova will detach the volume too:
https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L1389

So we can do this: if the node instance no longer exists (cloudprovider.InstanceNotFound is returned), we can safely assume the volume is no longer attached to the node, so DiskIsAttached() returns not-attached without an error.
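The rule above can be sketched end-to-end; `spec`, `errInstanceNotFound`, and `volumesAreAttached` are hypothetical stand-ins for the real `volume.Spec`, `cloudprovider.InstanceNotFound`, and the cinder attacher:

```go
package main

import (
	"errors"
	"fmt"
)

// Illustrative sentinel, standing in for cloudprovider.InstanceNotFound.
var errInstanceNotFound = errors.New("instance not found")

type spec struct{ name string }

// volumesAreAttached sketches the merged behavior: when the node's
// instance cannot be found, every volume is reported detached and no
// error is returned.
func volumesAreAttached(specs []*spec, lookupInstance func() (string, error)) (map[*spec]bool, error) {
	result := make(map[*spec]bool)
	_, err := lookupInstance()
	if err != nil {
		if errors.Is(err, errInstanceNotFound) {
			for _, s := range specs {
				result[s] = false // node is gone; Nova detached its volumes
			}
			return result, nil
		}
		return nil, err
	}
	// ... a real implementation would query Cinder for each volume here ...
	return result, nil
}

func main() {
	specs := []*spec{{"vol-1"}, {"vol-2"}}
	res, err := volumesAreAttached(specs, func() (string, error) { return "", errInstanceNotFound })
	fmt.Println(len(res), err) // 2 <nil>
}
```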

If node doesn't exist, OpenStack Nova will assume the volumes
are not attached to it. So mark the volumes as detached and
return false without error.
Fix: kubernetes#50200
@rootfs (Contributor) commented Aug 22, 2017

Since we are using the V2 block storage API, it is safe to say the attachment should be updated with a Nova instance id. This could change in v3, per the info here. We'll revisit this issue when we pick up the v3 block storage SDK.

@kubernetes/sig-openstack-pr-reviews

@k8s-ci-robot k8s-ci-robot added the area/provider/openstack Issues or PRs related to openstack provider label Aug 22, 2017
@FengyunPan (Author)

@rootfs Yes, I agree. Cinder V3 is different.
But currently we only support cinder v1 and v2. Can we merge this PR to fix #50200?
After PR gophercloud/gophercloud#415 merges, I will add code to support cinder v3 in the openstack cloud provider, then revisit this issue.

@rootfs (Contributor) commented Aug 23, 2017

/approve

@FengyunPan yes that's my idea too :)

pass to @jingxu97 for lgtm

@FengyunPan (Author)

@rootfs Yeah, thank you, I will focus on openstack cinder v3 later.

@dims (Member) commented Aug 23, 2017

/approve
/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 23, 2017
@k8s-github-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: FengyunPan, dims, rootfs

Associated issue: 50200

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@k8s-github-robot k8s-github-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 23, 2017
@k8s-github-robot

/test all

Tests are more than 96 hours old. Re-running tests.

@k8s-github-robot

Automatic merge from submit-queue (batch tested with PRs 38947, 50239, 51115, 51094, 51116)

@k8s-github-robot k8s-github-robot merged commit 012e94b into kubernetes:master Aug 23, 2017
dims pushed a commit to dims/kubernetes that referenced this pull request Feb 8, 2018
Automatic merge from submit-queue (batch tested with PRs 38947, 50239, 51115, 51094, 51116)

Mark the volumes as detached when node does not exist

If node does not exist, node's volumes will be detached
automatically and become available. So mark them detached and do not return err.

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
kubernetes#50200

**Release note**:
```release-note
NONE
```
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/provider/openstack Issues or PRs related to openstack provider cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note-none Denotes a PR that doesn't merit a release note. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.

8 participants