Unused CSI Volumes stay in use if services using them are removed in the wrong order #45547
Comments
cc @dperny
Note: I am happy to help out with any further debugging. If you can point me to the code that is related to these state transitions, I can also start investigating. (Still a bit new to the swarmkit/moby codebase)
One big weak point in CSI, and by extension Swarm's use of it, is that the state transitions of a CSI volume are strict. We cannot Unpublish a Volume on the Controller side until it has been Unpublished and Unstaged on the Node. To adhere to this restriction, we must get an affirmative signal from the Swarm Agent that a Volume has been successfully Unstaged. If something goes wrong in the Unstage process, the Volume will be stuck "In Use", possibly forever. This could happen if the Node is struck by lightning or falls through a crack in reality into the great nothingness between worlds. I'm unsure how Kubernetes handles such a case. This is present across plugins, though, which makes me strongly suspect the problem is an issue with Swarm's implementation.
Actually, if I recall correctly, Volume removal may be lazy... We don't attempt to remove a Volume from a Node until we need the Volume elsewhere. It's not unlikely that a Task would be scheduled back to the same Node if it failed, and the assumption is that Node CSI operations are somewhat expensive, so we avoid removing and then immediately re-adding the Volume. I think. It's been a while.
The problem with this lazy behaviour is then that, iirc, you are forced to use -f to remove the volume.
I see the problem now. The way freeing volumes works, we look for volumes to remove each time we do a pass of the scheduler. Deleting a Service doesn't cause a scheduling pass, so we don't free the Volumes. In theory, some other scheduling event, like creating a new service, ought to cause the scheduling pass that successfully frees the Volume? I need to check...
So what would be the fix here then? Can we "force" a scheduling pass in the volume rm operation to try it out?
Has anyone made any progress on figuring this out? This issue has made CSI volumes pretty unreliable, and oftentimes services won't start without manual intervention, due to the volume getting "stuck".
I reread this issue @dperny and I think there ought to be something that force-unstages (unrelated to how Kubernetes does it), at least in a somewhat configurable manner. Kubernetes seems to have something outside of the CSI spec that handles this case.
Have you tried to see if another scheduling trigger fixes the volume not being unstaged?
@dperny I can verify this. Simply creating an unrelated service causes the state transition to happen. Now the question is how we trigger the scheduling pass properly. Knowing this, we can come up with all kinds of workarounds, but I don't think we should leave it at that.
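For anyone who needs to unstick a volume in the meantime, a rough workaround based on that observation might look like the following. The service and volume names ("scheduler-kick", "my-csi-volume") are placeholders, not from the original report:

```sh
# The CSI volume is stuck "in use" even though no service uses it anymore.
docker volume ls --cluster

# Create a throwaway service to force a scheduling pass
# (any trivial image works; the name is just a placeholder).
docker service create --name scheduler-kick --detach alpine sleep 60

# After the scheduling pass the volume should show "created" again
# and be removable without -f.
docker volume ls --cluster
docker volume rm my-csi-volume

# Clean up the throwaway service.
docker service rm scheduler-kick
```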
It seems like this PR addresses this already: moby/swarmkit#3144
Has this been released yet?
It's not vendored in moby/moby master yet, so I doubt it; you can see the code is missing here: https://github.com/moby/moby/blob/master/vendor/github.com/moby/swarmkit/v2/manager/scheduler/scheduler.go. I'd say we wait for Drew to weigh in, but from the changes I read in that PR, it looks like this has been fixed there.
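A quick way to check whether a given moby/moby tree has picked up that swarmkit change, assuming a checkout of the repository and that it still keeps its Go module file at vendor.mod, is to look at the vendored swarmkit pseudo-version:

```sh
# From the root of a moby/moby checkout: the pseudo-version encodes the
# date and hash of the vendored swarmkit commit.
grep 'github.com/moby/swarmkit' vendor.mod
```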
That's good, hopefully it'll be released soon.
Looks like this will be in 25.0.0.
Description
CSI volumes have an issue around state transitions which lets them stay "in use" if a service using them is removed without the volume being drained first, leaving the volume unable to be removed without -f. This behaviour was observed with Hetzner Cloud CSI and democratic-csi local hostpath.
This does not work:
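A sketch of the failing order of operations; "my-service" and "my-csi-volume" are placeholder names:

```sh
# Remove the service first, without draining the volume.
docker service rm my-service

# The volume never leaves "in use"...
docker volume ls --cluster

# ...so it can only be removed by forcing it.
docker volume rm my-csi-volume      # fails while the volume is "in use"
docker volume rm -f my-csi-volume
```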
This works:
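A sketch of the working order, matching the drain-first workaround noted under Additional Info (names are again placeholders):

```sh
# Drain the volume first so swarm unpublishes/unstages it from the node.
docker volume update my-csi-volume --availability drain

# Now remove the service; the volume transitions from "in use" to "created".
docker service rm my-service
docker volume ls --cluster

# The volume can then be removed without -f.
docker volume rm my-csi-volume
```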
Note that even in this case, the status change from "in use" to "created" is triggered by the availability update, which leads me to believe that there are some state transition events that we are missing.
Reproduce
docker volume ls --cluster
docker volume ls --cluster
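A minimal sequence that reproduces this, sketched from the description above; the driver, volume, and service names are placeholders, and the flags follow the cluster-volume options of docker volume create and the cluster mount type of docker service create:

```sh
# Create a cluster (CSI) volume; "my-csi-driver" stands in for the actual CSI plugin.
docker volume create --driver my-csi-driver --scope single --sharing none --type mount my-csi-volume

# Run a service that mounts it and wait for the task to start.
docker service create --name my-service \
  --mount type=cluster,source=my-csi-volume,target=/data \
  alpine sleep 3600

docker volume ls --cluster    # the volume shows "in use"

# Remove the service without draining the volume first.
docker service rm my-service

docker volume ls --cluster    # the volume stays "in use" instead of returning to "created"
```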
Expected behavior
Unused CSI volumes should automatically switch from "in use" to "created".
docker version
Client: Docker Engine - Community
 Version:           23.0.6
 API version:       1.42
 Go version:        go1.19.9
 Git commit:        ef23cbc
 Built:             Fri May 5 21:18:22 2023
 OS/Arch:           linux/amd64
 Context:           default

Server: Docker Engine - Community
 Engine:
  Version:          23.0.6
  API version:      1.42 (minimum version 1.12)
  Go version:       go1.19.9
  Git commit:       9dbdbd4
  Built:            Fri May 5 21:18:22 2023
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.6.21
  GitCommit:        3dce8eb055cbb6872793272b4f20ed16117344f8
 runc:
  Version:          1.1.7
  GitCommit:        v1.1.7-0-g860f061
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0
docker info
Additional Info
The workaround of draining the volume before removing the service was originally discovered by @sidpalas with Hetzner Cloud CSI.