Node reboot causes volume not to be marked as detached normally in GKE #52735
Thanks for writing this up @jingxu97. I also wanted to add that the cloudprovider VerifiedVolumeAttached check is not needed for AWS/EBS, and I think we should either disable it or make it a no-op. The reason the original check was added is that in some cloud providers, shutting down or rebooting a node can cause attached volumes to be detached. That doesn't happen on AWS. On AWS, only termination can cause a volume detach, in which case it is pointless to try to attach those volumes back, as you wrote above.
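For illustration, here is a minimal Go sketch of what a no-op attachment verifier could look like for a provider where a stop or reboot never detaches volumes. The type and method names (`noopVerifier`, `VolumesAreAttached`) are hypothetical and not the real AWS cloud provider interface.

```go
// Hedged sketch, not the actual AWS cloud provider code: a "verify volumes
// are attached" hook that is effectively a no-op, for providers where a
// reboot or stop can never detach a volume.
package main

import "fmt"

type noopVerifier struct{}

// VolumesAreAttached reports every requested volume as still attached,
// disabling the periodic cloud-provider verification in practice.
func (noopVerifier) VolumesAreAttached(volumeIDs []string, nodeName string) (map[string]bool, error) {
	attached := make(map[string]bool, len(volumeIDs))
	for _, id := range volumeIDs {
		attached[id] = true // reboot/stop cannot detach the volume on this provider
	}
	return attached, nil
}

func main() {
	v := noopVerifier{}
	result, _ := v.VolumesAreAttached([]string{"vol-1", "vol-2"}, "node-a")
	fmt.Println(result) // map[vol-1:true vol-2:true]
}
```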
In our current GKE environment, when an instance is stopped, crashes, or is deleted, a new instance is automatically recreated so it can resume its processing tasks. Because the node is deleted from the cloud provider, all the volumes attached to the instance are released (detached) automatically by the cloud provider. The Node object is deleted from the API server by the node controller after it detects that the node has been deleted from the cloud provider. Pods running on that node are considered orphaned and get deleted too. The volume manager has a control loop that periodically checks whether each volume is still attached to the node, as sketched below.
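As a rough illustration of that control loop, the following Go sketch repeatedly checks a simplified node status for a volume name. The types and the check interval are stand-ins, not the actual kubelet volume manager code.

```go
// Hedged sketch (not kubelet's real volume manager): on each pass, consult
// the node's reported attached-volume list to decide whether a volume is
// still attached.
package main

import (
	"fmt"
	"time"
)

type nodeStatus struct {
	VolumesAttached []string // volumes the controller reports as attached
}

// isVolumeAttached reports whether volumeName appears in the node status.
func isVolumeAttached(status nodeStatus, volumeName string) bool {
	for _, v := range status.VolumesAttached {
		if v == volumeName {
			return true
		}
	}
	return false
}

func main() {
	status := nodeStatus{VolumesAttached: []string{"kubernetes.io/gce-pd/disk-1"}}
	// Bounded loop with a short interval, purely for illustration of the
	// periodic re-check described above.
	for i := 0; i < 3; i++ {
		fmt.Println("attached:", isVolumeAttached(status, "kubernetes.io/gce-pd/disk-1"))
		time.Sleep(100 * time.Millisecond)
	}
}
```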
In this scenario, where the node is deleted before its pods are deleted, the current volume manager does not handle it well, causing unnecessary delays or even failures for volume operations. The following sequence of events happens.
Note that around or after step 4, when the new node is added back (normally a few minutes after the node is deleted), the volume is shown as not mounted in the VolumesInUse field. At that stage, the reconciler can pass the safety check and trigger the detach. However, it might still be delayed unless we also solve the next issue (6).
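A minimal Go sketch of that safety check, assuming the reconciler only needs the node's VolumesInUse list to decide whether a detach is safe; the types and helper names are illustrative, not the real attach/detach controller code.

```go
// Hedged sketch of the safety check described above: detach is only allowed
// once the volume no longer appears in the node's VolumesInUse, i.e. the
// kubelet reports it as unmounted.
package main

import "fmt"

type node struct {
	Name         string
	VolumesInUse []string // volumes the kubelet still reports as mounted
}

// safeToDetach returns true only when the volume is no longer listed as in use.
func safeToDetach(n node, volumeName string) bool {
	for _, v := range n.VolumesInUse {
		if v == volumeName {
			return false // still mounted; detaching now could corrupt data
		}
	}
	return true
}

func main() {
	old := node{Name: "node-a", VolumesInUse: []string{"kubernetes.io/gce-pd/disk-1"}}
	recreated := node{Name: "node-a", VolumesInUse: nil} // re-added node reports no mounts
	fmt.Println(safeToDetach(old, "kubernetes.io/gce-pd/disk-1"))       // false: detach blocked
	fmt.Println(safeToDetach(recreated, "kubernetes.io/gce-pd/disk-1")) // true: detach can proceed
}
```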
The main problem here is that when the node is deleted before its pods are deleted, the desired state is not updated to reflect this important change, so the reconciler's behavior diverges from what the real world wants. It attaches the volume back to the deleted node and, because of the design, that in turn makes it very difficult to detach from the deleted node quickly.
Proposed solution.
Node deletion will cause the node's volumes to be removed from the desired state. When the node is added back, the controller will list all pods and update the desired state. (This covers the case where pods are not garbage collected because the node object is quickly added back to the API server.) This step makes sure that the desired state reflects that the volumes should not be attached to the deleted node. A sketch of this flow follows.
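A minimal Go sketch of the proposed flow, assuming the desired state can be modeled as a map from node name to the set of volumes that should be attached; all names here are illustrative, not the actual controller data structures.

```go
// Hedged sketch of the proposal: on node deletion, drop that node's entries
// from the desired state; on node (re-)add, rebuild the entries from the
// pods currently assigned to that node.
package main

import "fmt"

type pod struct {
	Name    string
	Node    string
	Volumes []string
}

// desiredState maps node name -> set of volumes that should be attached.
type desiredState map[string]map[string]bool

// onNodeDelete removes every volume entry for the deleted node, so the
// reconciler stops trying to attach volumes to it.
func (d desiredState) onNodeDelete(nodeName string) {
	delete(d, nodeName)
}

// onNodeAdd re-lists pods and repopulates the desired state for the node,
// covering the case where the node object is quickly re-created.
func (d desiredState) onNodeAdd(nodeName string, pods []pod) {
	volumes := map[string]bool{}
	for _, p := range pods {
		if p.Node != nodeName {
			continue
		}
		for _, v := range p.Volumes {
			volumes[v] = true
		}
	}
	d[nodeName] = volumes
}

func main() {
	d := desiredState{"node-a": {"disk-1": true}}
	d.onNodeDelete("node-a") // node deleted: stop wanting disk-1 attached there
	d.onNodeAdd("node-a", []pod{{Name: "web", Node: "node-a", Volumes: []string{"disk-1"}}})
	fmt.Println(d) // map[node-a:map[disk-1:true]]
}
```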