volumeattachment can stuck on detachhing if volume already detached #52

Closed
kvaps opened this issue Jan 22, 2020 · 4 comments

Comments

Member

kvaps commented Jan 22, 2020

Example: a volumeattachment can't be removed due to a 404 error:

Name:         csi-b858cce844edca0ccc24195483a44f5c51cf041e042963054bcbf8524e117587
Namespace:    
Labels:       <none>
Annotations:  csi.alpha.kubernetes.io/node-id: m9c34
API Version:  storage.k8s.io/v1
Kind:         VolumeAttachment
Metadata:
  Creation Timestamp:             2020-01-22T07:16:29Z
  Deletion Grace Period Seconds:  0
  Deletion Timestamp:             2020-01-22T09:56:27Z
  Finalizers:
    external-attacher/linstor-csi-linbit-com
  Resource Version:  2445479763
  Self Link:         /apis/storage.k8s.io/v1/volumeattachments/csi-b858cce844edca0ccc24195483a44f5c51cf041e042963054bcbf8524e117587
  UID:               912e3700-eae0-4160-9d3d-1d9f2cf93ea9
Spec:
  Attacher:   linstor.csi.linbit.com
  Node Name:  m9c34
  Source:
    Persistent Volume Name:  pvc-a5083d7f-5577-42ef-b535-f9a873da4d1a
Status:
  Attached:  true
  Detach Error:
    Message:  rpc error: code = Internal desc = ControllerpublishVolume failed for pvc-a5083d7f-5577-42ef-b535-f9a873da4d1a: 404 Not Found
    Time:     2020-01-22T12:59:20Z
Events:       <none>

But the m9c34 node does not have this resource attached:

# linstor r l -r pvc-a5083d7f-5577-42ef-b535-f9a873da4d1a
+----------------------------------------------------------------------------+
| ResourceName                             | Node | Port | Usage  |    State |
|============================================================================|
| pvc-a5083d7f-5577-42ef-b535-f9a873da4d1a | m9c4 | 8915 | Unused | UpToDate |
| pvc-a5083d7f-5577-42ef-b535-f9a873da4d1a | m9c5 | 8915 | Unused | UpToDate |
+----------------------------------------------------------------------------+

Workaround: create a diskless resource on m9c34 and retry the detach.

Proposed solution: treat a 404 as a successful response for the detach operation.

kvaps changed the title from "volumeattachment can stuck on detachhing if volume already attached" to "volumeattachment can stuck on detachhing if volume already detached" on Jan 22, 2020
Member

rck commented Jan 22, 2020

Thanks for reporting this. Yes, we already have a special golinstor.client error type for expected 404 "errors", so it should be pretty easy to check for it and ignore it. Still, I'm currently a bit busy with other things, so no guarantee how quickly I can get to it.

Member Author

kvaps commented Jan 22, 2020

@rck a volumeattachment will also get stuck when the node (where the volume was/is mounted) is unreachable, e.g. because it crashed.

rck closed this as completed in ed1f3bc on Feb 11, 2020
Member

rck commented Feb 11, 2020

I guess the first part (Detach) is fixed. Feel free to open another issue for the Attach case. This will need discussion. As of now I'm not sure how the CSI driver should fix your broken cluster.

Member Author

kvaps commented Feb 11, 2020

Hi @rck, thanks. We solved that by removing the existing (broken) volumeattachments and their finalizers.
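
The comment does not say how the finalizers were removed. One common approach, shown here only as an assumption, is a JSON merge patch that sets `metadata.finalizers` to null. The helper name `clearFinalizersPatch` is hypothetical; the sketch only builds the patch body and does not talk to a cluster.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// clearFinalizersPatch returns a JSON merge patch body that sets
// metadata.finalizers to null, which removes every finalizer from the
// object the patch is applied to.
func clearFinalizersPatch() ([]byte, error) {
	patch := map[string]interface{}{
		"metadata": map[string]interface{}{
			"finalizers": nil,
		},
	}
	return json.Marshal(patch)
}

func main() {
	body, err := clearFinalizersPatch()
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
}
```

A body like this could be passed to `kubectl patch volumeattachment <name> --type=merge -p '<body>'`. Clearing a finalizer skips the attacher's own cleanup, so it is only reasonable once the volume is known to be detached from the node.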
