
Mark the volumes as detached when node does not exist #50239

Merged
merged 1 commit into from Aug 23, 2017

Conversation

@FengyunPan commented Aug 7, 2017

If the node does not exist, its volumes will be detached
automatically and become available. So mark them as detached and do not return an error.

Which issue this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close that issue when PR gets merged): fixes #50200

Release note:

NONE

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Aug 7, 2017
@k8s-ci-robot (Contributor)

Hi @FengyunPan. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@k8s-github-robot k8s-github-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. release-note-none Denotes a PR that doesn't merit a release note. labels Aug 7, 2017
@FengyunPan (Author)

/assign @jingxu97

@FengyunPan (Author)

/unassign @justinsb
/unassign @rootfs
/assign @anguslees
/assign @dims

@dims (Member) commented Aug 7, 2017

/ok-to-test

@k8s-ci-robot k8s-ci-robot removed the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Aug 7, 2017
@FengyunPan (Author)

/test pull-kubernetes-e2e-kops-aws

@dims (Member) commented Aug 7, 2017

/approve

@k8s-github-robot k8s-github-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 7, 2017
@dims (Member) commented Aug 7, 2017

/approve no-issue

@dims (Member) commented Aug 7, 2017

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 7, 2017
@jingxu97 jingxu97 added the do-not-merge DEPRECATED. Indicates that a PR should not merge. Label can only be manually applied/removed. label Aug 7, 2017
@jingxu97 (Contributor) commented Aug 7, 2017

@FengyunPan, could you follow the same pattern as GCE PD or AWS and put the check in the DiskIsAttached function? In that function, if the instance no longer exists, return false without an error.
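A minimal sketch of the pattern suggested here, using hypothetical stand-ins (`fakeCloud`, `errInstanceNotFound`) rather than the real Kubernetes `cloudprovider` types:

```go
package main

import (
	"errors"
	"fmt"
)

// Illustrative stand-in: in Kubernetes the sentinel is cloudprovider.InstanceNotFound.
var errInstanceNotFound = errors.New("instance not found")

// fakeCloud maps instanceID -> attached volume IDs.
type fakeCloud map[string]map[string]bool

func (c fakeCloud) lookup(instanceID string) (map[string]bool, error) {
	vols, ok := c[instanceID]
	if !ok {
		return nil, errInstanceNotFound
	}
	return vols, nil
}

// DiskIsAttached follows the GCE PD/AWS pattern described above:
// if the instance no longer exists, report detached without an error.
func (c fakeCloud) DiskIsAttached(instanceID, volumeID string) (bool, error) {
	vols, err := c.lookup(instanceID)
	if err != nil {
		if errors.Is(err, errInstanceNotFound) {
			// Node was deleted; its volumes were detached automatically.
			return false, nil
		}
		return false, err
	}
	return vols[volumeID], nil
}

func main() {
	cloud := fakeCloud{"node-a": {"vol-1": true}}
	fmt.Println(cloud.DiskIsAttached("node-a", "vol-1"))       // true <nil>
	fmt.Println(cloud.DiskIsAttached("deleted-node", "vol-1")) // false <nil>
}
```

The key design point is that a missing instance is an expected state, not an error: callers can treat (false, nil) as "safe to mark detached".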

@FengyunPan (Author)

@jingxu97 Ok, I will check it.

@dims (Member) commented Aug 8, 2017

/approve cancel
/lgtm cancel

@jingxu97 can you please remove the do-not-merge?

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 8, 2017
@k8s-github-robot k8s-github-robot removed the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 8, 2017
@jingxu97 (Contributor)

@rootfs, I think the fix is: when the node no longer exists (deleted by the cloud provider), we can say the volume's status is detached. GCE PD and AWS currently have this logic. Is that the case for Cinder?

@rootfs (Contributor) commented Aug 10, 2017

@jingxu97 there is a long list of volume statuses:
https://developer.openstack.org/api-ref/block-storage/v2/index.html

| Status | Description |
| --- | --- |
| creating | The volume is being created. |
| available | The volume is ready to attach to an instance. |
| attaching | The volume is attaching to an instance. |
| detaching | The volume is detaching from an instance. |
| in-use | The volume is attached to an instance. |
| maintenance | The volume is locked and being migrated. |
| deleting | The volume is being deleted. |
| awaiting-transfer | The volume is awaiting a transfer. |
| error | A volume creation error occurred. |
| error_deleting | A volume deletion error occurred. |
| backing-up | The volume is being backed up. |
| restoring-backup | A backup is being restored to the volume. |
| error_backing-up | A backup error occurred. |
| error_restoring | A backup restoration error occurred. |
| error_extending | An error occurred while attempting to extend a volume. |
| downloading | The volume is downloading an image. |
| uploading | The volume is being uploaded to an image. |
| retyping | The volume is changing type to another volume type. |
| extending | The volume is being extended. |

```go
// So mark them detached.
if err == cloudprovider.InstanceNotFound {
	for _, spec := range specs {
		volumesAttachedCheck[spec] = false
```
Contributor:
mark it but don't detach it yet.

@rootfs (Contributor) commented Aug 10, 2017

Cannot comment on line 216. Consider something like the following

```go
for volumeID, attached := range attachedResult {
	if !attached {
		spec := volumeSpecMap[volumeID]
		volumesAttachedCheck[spec] = false
		glog.V(2).Infof("VolumesAreAttached: check volume %q (specName: %q) is no longer attached", volumeID, spec.Name())
	} else {
		// volume attached, check if the instance is alive here
	}
}
```

@FengyunPan (Author)

Why do we need to check whether the instance is alive when the volume is attached?
Can you give me more detail about it, or an example? Thank you.

Contributor:

Checking that the instance is alive is not sufficient: the cinder volume could already have been moved to another node, or be in a different state.

We need both the attachment info and the node info to ensure the cinder volume has not moved elsewhere and that the node it is attached to is dead; only then can we safely detach the volume.
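A minimal sketch of the safety condition described above, with illustrative names (`volumeInfo`, `safeToMarkDetached`) that are not part of the real codebase:

```go
package main

import "fmt"

// volumeInfo mirrors the two pieces of state mentioned above:
// the Cinder attachment record, checked together with node liveness.
type volumeInfo struct {
	attachedServerID string // "" means not attached
}

// safeToMarkDetached returns true only when the node is gone AND the
// volume has not been re-attached to some other, still-living node.
func safeToMarkDetached(v volumeInfo, deadInstanceID string, nodeAlive bool) bool {
	if nodeAlive {
		return false // node still exists; let the normal detach path run
	}
	// Node is gone: safe only if the volume did not move to another node.
	return v.attachedServerID == "" || v.attachedServerID == deadInstanceID
}

func main() {
	moved := volumeInfo{attachedServerID: "other-node"}
	fmt.Println(safeToMarkDetached(moved, "dead-node", false)) // false: volume moved elsewhere
	orphan := volumeInfo{attachedServerID: "dead-node"}
	fmt.Println(safeToMarkDetached(orphan, "dead-node", false)) // true
}
```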

Member:

Agree with @rootfs

@FengyunPan (Author)

Oh, I see. Thank you very much @rootfs.
I will update it soon, please take a look.
And I think the change should be inside openstack_volume.go's DiskIsAttached function too. @jingxu97

@jingxu97 (Contributor)

Thanks @rootfs.
@FengyunPan since the behavior of Cinder might differ from GCE PD, could you please double-check? Also, as I mentioned before, the change should be inside openstack_volume.go's DiskIsAttached function.

@FengyunPan (Author)

@rootfs

> I am not convinced this is an accurate fix. It is safe to detach only if the attachment status is 'attached' (deal with in-use and attaching) and the instance id in the attachment cannot be found.

Sorry, I don't understand. Can a volume be attached to an instance that does not exist? I think it can't, so there is no need to check that in this case.
Please let me know if my understanding is wrong, thank you.

```go
@@ -185,7 +186,11 @@ func (attacher *cinderDiskAttacher) VolumesAreAttached(specs []*volume.Spec, nod
	}

	instanceID, err := attacher.nodeInstanceID(nodeName)
	if err != nil {
		if err == cloudprovider.InstanceNotFound {
			glog.V(2).Infof("VolumesAreAttached: node %q does not exist.", nodeName)
```
Contributor:

Print both nodeName and instanceID for better diagnostics.

Contributor:

@jingxu97 should we raise the severity to glog.Warningf too?

@FengyunPan (Author)

@rootfs if the instance is deleted, the openstack cloud provider can't get its instanceID, so attacher.nodeInstanceID() returns ("", err). Printing the nodeName is enough.
Yeah, it should use glog.Warningf, thank you.

@dims (Member) commented Aug 14, 2017

@anguslees - any thoughts on this?

```go
@@ -412,6 +412,14 @@ func (os *OpenStack) DiskIsAttached(instanceID, volumeID string) (bool, error) {
		return false, err
	}

	if instanceID == "" {
		if volume.AttachedServerId != "" {
			return true, nil
```
Contributor:

I don't think the logic here is right. The function checks whether the volume is attached to the instance passed in as a parameter. Here, if instanceID is empty, it means the instance has been deleted from the cloud provider. If volume.AttachedServerId is not empty, the volume is attached to a different instance, so the function should return false because the volume is not attached to the original instance.

For most volume plugins, if the node no longer exists in the cloud provider, we can safely assume the volume is detached from the node. But for cinder, could you please double-check the behavior if you have the environment? Thanks!

@FengyunPan (Author)

@jingxu97 IMO, if instanceID is empty, DiskIsAttached() is checking whether the volume is attached at all. If the volume is attached it returns true; if not, it returns false.

> But the function should return false because the volume is not attached to the original instance.

We can't get the instance id of the original instance, because it no longer exists. If DiskIsAttached() returns true when the instance does not exist, that means the volume is attached to another instance and we should not mark it detached (as rootfs says). If DiskIsAttached() returns false when the instance does not exist, that means the volume is available and we can mark it detached. The code from line 207 already does this. That is also my double check; I have tested it.

> But for cinder, could you please double check the behavior if you have the environment?

I am sorry, I don't understand what you mean by 'double check'. Can you give more detail? Thank you again.

@jingxu97 (Contributor) commented Aug 15, 2017

> We can't get the instance id for the original instance which does not exist. If DiskIsAttached() return true when instance does not exist that means the volume is attached to another instance and we should not mark the volume detached

Your description is not correct.
DiskIsAttached does not just check that the disk is attached to ANY node; it checks whether it is attached to the node you passed into the function. Suppose the disk is currently attached to node B: DiskIsAttached(disk, nodeA) should return false, because the disk is not attached to nodeA, it is attached to node B. Please let me know if this is clear.

Double check means that I want to confirm that node-not-found implies the volume is detached automatically. This seems to be true for most cloud providers, but I don't have a cinder volume setup to test it.

@FengyunPan (Author)

Without my change, DiskIsAttached() just checks whether the disk is attached to the instanceID you pass into the function, and that instanceID can't be empty, right?
With my change, DiskIsAttached() can check whether the disk is attached to a (non-empty) instanceID you pass in, and it can also check whether the disk is attached at all when instanceID is empty.

Yes, the volume will be detached when the instance is not found in openstack cinder.

@FengyunPan (Author)

Is that confusing? Do we need to add another function to check whether the disk is attached?

Contributor:

If we are sure that node-not-found in openstack cinder indicates the volume is already detached, then I suggest the following change in cinder/attacher.go:

```go
if err == cloudprovider.InstanceNotFound {
	// return false with nil error
	...
}
```

You might have had this change in the very first version; sorry for the confusion.
@rootfs mentioned that cinder might have a more complicated set of statuses, and node-not-exist might not mean the volume has already been detached successfully from the node. If that is true, we should not return false directly. But I can't think of a good way to check it. Passing an empty string for the node instance doesn't seem right, because our goal is to check whether the volume is still attached to that specific node (not any other node). What does it mean for this function if the node id is empty?

@FengyunPan (Author)

@jingxu97 @rootfs Yeah, the volume will be detached when the instance is not found in openstack cinder. I'm sorry I can't find a doc from openstack, but I found the code:
https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L899
nova will detach all devices from an instance when the user deletes the instance:
https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L983
detach volume:
https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L1387
If the instance does not exist, nova will detach the volume too:
https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L1389

So we can do this: if the node instance no longer exists (cloudprovider.InstanceNotFound is returned), we can safely assume the volume is no longer attached to the node, so DiskIsAttached() returns not-attached without an error.
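The rule above can be sketched end-to-end; `spec`, `errInstanceNotFound`, and `volumesAreAttached` are hypothetical stand-ins for the real `volume.Spec`, `cloudprovider.InstanceNotFound`, and the cinder attacher:

```go
package main

import (
	"errors"
	"fmt"
)

// Illustrative sentinel, standing in for cloudprovider.InstanceNotFound.
var errInstanceNotFound = errors.New("instance not found")

type spec struct{ name string }

// volumesAreAttached sketches the merged behavior: when the node's
// instance cannot be found, every volume is reported detached and no
// error is returned.
func volumesAreAttached(specs []*spec, lookupInstance func() (string, error)) (map[*spec]bool, error) {
	result := make(map[*spec]bool)
	_, err := lookupInstance()
	if err != nil {
		if errors.Is(err, errInstanceNotFound) {
			for _, s := range specs {
				result[s] = false // node is gone; Nova detached its volumes
			}
			return result, nil
		}
		return nil, err
	}
	// ... a real implementation would query Cinder for each volume here ...
	return result, nil
}

func main() {
	specs := []*spec{{"vol-1"}, {"vol-2"}}
	res, err := volumesAreAttached(specs, func() (string, error) { return "", errInstanceNotFound })
	fmt.Println(len(res), err) // 2 <nil>
}
```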

If node doesn't exist, OpenStack Nova will assume the volumes
are not attached to it. So mark the volumes as detached and
return false without error.
Fix: kubernetes#50200
@rootfs (Contributor) commented Aug 22, 2017

Since we are using the V2 block storage API, it is safe to say the attachment should be updated with a Nova instance id. This could change in v3, per the info here. We'll revisit this issue when we pick up the v3 block storage SDK.

@kubernetes/sig-openstack-pr-reviews

@k8s-ci-robot k8s-ci-robot added the area/provider/openstack Issues or PRs related to openstack provider label Aug 22, 2017
@FengyunPan (Author)

@rootfs Yes, I agree. Cinder V3 is different.
But currently we only support cinder v1 and v2. Can we merge this PR to fix #50200?
After PR gophercloud/gophercloud#415 merges, I will add code to support cinder v3 in the openstack cloud provider, then revisit this issue.

@rootfs (Contributor) commented Aug 23, 2017

/approve

@FengyunPan yes that's my idea too :)

pass to @jingxu97 for lgtm

@FengyunPan (Author)

@rootfs Yeah, thank you, I will focus on openstack cinder v3 later.

@dims (Member) commented Aug 23, 2017

/approve
/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 23, 2017
@k8s-github-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: FengyunPan, dims, rootfs

Associated issue: 50200

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@k8s-github-robot k8s-github-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 23, 2017
@k8s-github-robot

/test all

Tests are more than 96 hours old. Re-running tests.

@k8s-github-robot

Automatic merge from submit-queue (batch tested with PRs 38947, 50239, 51115, 51094, 51116)

@k8s-github-robot k8s-github-robot merged commit 012e94b into kubernetes:master Aug 23, 2017
dims pushed a commit to dims/kubernetes that referenced this pull request Feb 8, 2018
Automatic merge from submit-queue (batch tested with PRs 38947, 50239, 51115, 51094, 51116)

Mark the volumes as detached when node does not exist

If node does not exist, node's volumes will be detached
automatically and become available. So mark them detached and do not return err.

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
kubernetes#50200

**Release note**:
```release-note
NONE
```
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/provider/openstack Issues or PRs related to openstack provider cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note-none Denotes a PR that doesn't merit a release note. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.

8 participants