
Unused CSI Volumes stay in use if services using them are removed in the wrong order #45547

Open
s4ke opened this issue May 16, 2023 · 16 comments
Labels
area/swarm, area/volumes, kind/bug, status/0-triage, version/23.0

Comments

@s4ke
Contributor

s4ke commented May 16, 2023

Description

CSI volumes have an issue around state transitions that lets them stay "in use" if a service using them is removed without the volume being drained first. The volume then cannot be removed without -f.

This behaviour was observed with the Hetzner Cloud CSI driver and with democratic-csi local-hostpath.

This does not work:

martinb@ubuntu:~/csi-plugins-for-docker-swarm/democratic-csi/local-hostpath$ docker volume create   --driver neuroforgede/swarm-csi-local-path:v1.8.3   --availability active   --scope single   --sharing none   --type mount   my-csi-local-volume
my-csi-local-volume
martinb@ubuntu:~/csi-plugins-for-docker-swarm/democratic-csi/local-hostpath$ docker service create   --name my-service   --mount type=cluster,src=my-csi-local-volume,dst=/usr/share/nginx/html   --publish 8080:80   nginx
boh6bvmtxoq8an04jqkx2ramr
overall progress: 1 out of 1 tasks 
1/1: running   [==================================================>] 
verify: Service converged 
martinb@ubuntu:~/csi-plugins-for-docker-swarm/democratic-csi/local-hostpath$ docker service rm my-service 
my-service
martinb@ubuntu:~/csi-plugins-for-docker-swarm/democratic-csi/local-hostpath$ docker volume update --availability drain my-csi-local-volume
martinb@ubuntu:~/csi-plugins-for-docker-swarm/democratic-csi/local-hostpath$ docker volume ls --cluster
VOLUME NAME           GROUP     DRIVER                                     AVAILABILITY   STATUS
my-csi-local-volume             neuroforgede/swarm-csi-local-path:v1.8.3   drain          in use (1 node)
martinb@ubuntu:~/csi-plugins-for-docker-swarm/democratic-csi/local-hostpath$ docker volume update --availability active my-csi-local-volume
martinb@ubuntu:~/csi-plugins-for-docker-swarm/democratic-csi/local-hostpath$ docker volume ls --cluster
VOLUME NAME           GROUP     DRIVER                                     AVAILABILITY   STATUS
my-csi-local-volume             neuroforgede/swarm-csi-local-path:v1.8.3   active         in use (1 node)
martinb@ubuntu:~/csi-plugins-for-docker-swarm/democratic-csi/local-hostpath$ docker volume rm my-csi-local-volume
Error response from daemon: rpc error: code = FailedPrecondition desc = volume is still in use
martinb@ubuntu:~/csi-plugins-for-docker-swarm/democratic-csi/local-hostpath$ 

This works:

martinb@ubuntu:~/csi-plugins-for-docker-swarm/democratic-csi/local-hostpath$ docker volume create   --driver neuroforgede/swarm-csi-local-path:v1.8.3   --availability active   --scope single   --sharing none   --type mount   my-csi-local-volume
my-csi-local-volume
martinb@ubuntu:~/csi-plugins-for-docker-swarm/democratic-csi/local-hostpath$ docker service create   --name my-service   --mount type=cluster,src=my-csi-local-volume,dst=/usr/share/nginx/html   --publish 8080:80   nginx
ngp9coiy14wresuseasgzri7v
overall progress: 1 out of 1 tasks 
1/1: running   [==================================================>] 
verify: Service converged 
martinb@ubuntu:~/csi-plugins-for-docker-swarm/democratic-csi/local-hostpath$ docker volume update --availability drain my-csi-local-volume
martinb@ubuntu:~/csi-plugins-for-docker-swarm/democratic-csi/local-hostpath$ docker service ls
ID             NAME         MODE         REPLICAS   IMAGE          PORTS
ngp9coiy14wr   my-service   replicated   0/1        nginx:latest   *:8080->80/tcp
martinb@ubuntu:~/csi-plugins-for-docker-swarm/democratic-csi/local-hostpath$ docker service rm my-service 
my-service
martinb@ubuntu:~/csi-plugins-for-docker-swarm/democratic-csi/local-hostpath$ docker volume ls --cluster
VOLUME NAME           GROUP     DRIVER                                     AVAILABILITY   STATUS
my-csi-local-volume             neuroforgede/swarm-csi-local-path:v1.8.3   drain          in use (1 node)
martinb@ubuntu:~/csi-plugins-for-docker-swarm/democratic-csi/local-hostpath$ docker volume update --availability active my-csi-local-volume
martinb@ubuntu:~/csi-plugins-for-docker-swarm/democratic-csi/local-hostpath$ docker volume ls --cluster
VOLUME NAME           GROUP     DRIVER                                     AVAILABILITY   STATUS
my-csi-local-volume             neuroforgede/swarm-csi-local-path:v1.8.3   active         created
martinb@ubuntu:~/csi-plugins-for-docker-swarm/democratic-csi/local-hostpath$ docker volume rm my-csi-local-volume
my-csi-local-volume
martinb@ubuntu:~/csi-plugins-for-docker-swarm/democratic-csi/local-hostpath$ docker volume ls --cluster
VOLUME NAME   GROUP     DRIVER    AVAILABILITY   STATUS
martinb@ubuntu:~/csi-plugins-for-docker-swarm/democratic-csi/local-hostpath$ 

Note that even in this case, the status change from "in use" to "created" is triggered by the availability update, which leads me to believe that there are some state transition events that we are missing.

Reproduce

  1. docker volume create --driver neuroforgede/swarm-csi-local-path:v1.8.3 --availability active --scope single --sharing none --type mount my-csi-local-volume
  2. docker volume ls --cluster
  3. docker service create --name my-service --mount type=cluster,src=my-csi-local-volume,dst=/usr/share/nginx/html --publish 8080:80 nginx
  4. docker service rm my-service
  5. wait 1 minute to see if something changes in docker volume ls --cluster
  6. docker volume update --availability drain my-csi-local-volume
  7. wait 1 minute to see if something changes in docker volume ls --cluster (a consolidated sketch of these steps follows the list)
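
For convenience, the same steps consolidated into a copy-pasteable sketch (same plugin alias, names, and mount options as in the report; the sleeps stand in for the one-minute waits):

docker volume create \
  --driver neuroforgede/swarm-csi-local-path:v1.8.3 \
  --availability active --scope single --sharing none --type mount \
  my-csi-local-volume
docker volume ls --cluster
docker service create --name my-service \
  --mount type=cluster,src=my-csi-local-volume,dst=/usr/share/nginx/html \
  --publish 8080:80 nginx
docker service rm my-service
sleep 60 && docker volume ls --cluster    # volume is still reported as "in use"
docker volume update --availability drain my-csi-local-volume
sleep 60 && docker volume ls --cluster    # still "in use"; "docker volume rm" now fails without -f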

Expected behavior

Unused CSI volumes should automatically switch from "in use" to "created".

docker version

Client: Docker Engine - Community
 Version:           23.0.6
 API version:       1.42
 Go version:        go1.19.9
 Git commit:        ef23cbc
 Built:             Fri May  5 21:18:22 2023
 OS/Arch:           linux/amd64
 Context:           default

Server: Docker Engine - Community
 Engine:
  Version:          23.0.6
  API version:      1.42 (minimum version 1.12)
  Go version:       go1.19.9
  Git commit:       9dbdbd4
  Built:            Fri May  5 21:18:22 2023
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.6.21
  GitCommit:        3dce8eb055cbb6872793272b4f20ed16117344f8
 runc:
  Version:          1.1.7
  GitCommit:        v1.1.7-0-g860f061
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

docker info

Client:
 Context:    default
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.10.4
    Path:     /usr/libexec/docker/cli-plugins/docker-buildx
  compose: Docker Compose (Docker Inc.)
    Version:  v2.17.3
    Path:     /usr/libexec/docker/cli-plugins/docker-compose
  scan: Docker Scan (Docker Inc.)
    Version:  v0.23.0
    Path:     /usr/libexec/docker/cli-plugins/docker-scan

Server:
 Containers: 66
  Running: 10
  Paused: 0
  Stopped: 56
 Images: 109
 Server Version: 23.0.6
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Using metacopy: false
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 1
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: active
  NodeID: jnluxx8q8ghcla3zq2obm3ro6
  Is Manager: true
  ClusterID: vowmrzc70hmfksc5kfj9hckpf
  Managers: 1
  Nodes: 1
  Default Address Pool: 10.0.0.0/8  
  SubnetSize: 24
  Data Path Port: 4789
  Orchestration:
   Task History Retention Limit: 5
  Raft:
   Snapshot Interval: 10000
   Number of Old Snapshots to Retain: 0
   Heartbeat Tick: 1
   Election Tick: 10
  Dispatcher:
   Heartbeat Period: 5 seconds
  CA Configuration:
   Expiry Duration: 3 months
   Force Rotate: 0
  Autolock Managers: false
  Root Rotation In Progress: false
  Node Address: 192.168.116.129
  Manager Addresses:
   192.168.116.129:2377
 Runtimes: io.containerd.runc.v2 runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 3dce8eb055cbb6872793272b4f20ed16117344f8
 runc version: v1.1.7-0-g860f061
 init version: de40ad0
 Security Options:
  apparmor
  seccomp
   Profile: builtin
 Kernel Version: 5.15.0-71-generic
 Operating System: Ubuntu 20.04.6 LTS
 OSType: linux
 Architecture: x86_64
 CPUs: 24
 Total Memory: 38.29GiB
 Name: ubuntu
 ID: 754B:FW3A:TJ3E:V5NJ:63QM:6HAQ:J4KL:K7JK:3XYY:CPCB:OUZK:XFWI
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Username: ancieque
 Registry: https://index.docker.io/v1/
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

Additional Info

The workaround of draining the volume before removing the service was originally discovered by @sidpalas with the Hetzner Cloud CSI driver.
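
For reference, a sketch of the removal order that worked in the "This works" transcript above:

# Drain the volume while the service still exists, then remove the service.
docker volume update --availability drain my-csi-local-volume
docker service rm my-service
# Flipping availability back to active refreshes the status to "created",
# after which the volume can be removed without -f.
docker volume update --availability active my-csi-local-volume
docker volume ls --cluster
docker volume rm my-csi-local-volume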

s4ke added the kind/bug and status/0-triage labels on May 16, 2023
@corhere
Contributor

corhere commented May 17, 2023

cc @dperny

@s4ke
Contributor Author

s4ke commented Jun 11, 2023

Note: I am happy to help out with any further debugging. If you can point me to the code that is related to these state transitions, I can also start investigating. (Still a bit new to the swarmkit/moby codebase)

@dperny
Contributor

dperny commented Jun 22, 2023

One big weak point in CSI, and by extension Swarm's use of it, is that the state transitions of a CSI volume are strict. We cannot Unpublish a Volume on the Controller side until it has been Unpublished and Unstaged on the Node. To adhere to this restriction, we must get an affirmative signal from the Swarm Agent that a Volume has been successfully Unstaged.

If something goes wrong in the Unstage process, the Volume will be stuck "In Use", possibly forever. This could happen if the Node is struck by lightning or falls through a crack in reality into the great nothingness between worlds. I'm unsure how Kubernetes handles such a case.

This is present across plugins, though, which makes me strongly suspect the problem is an issue with Swarm's implementation.

@dperny
Contributor

dperny commented Jun 22, 2023

Actually, if I recall correctly, Volume removal may be lazy... We don't attempt to remove a Volume from a Node until we need the Volume elsewhere. It's not unlikely that a Task would be scheduled back to the same Node if it failed, and the assumption is that Node CSI operations are somewhat expensive, so we avoid removing and then immediately re-adding the Volume. I think. It's been a while.

@s4ke
Contributor Author

s4ke commented Jun 25, 2023

The problem with this lazy behaviour is that, IIRC, you are then forced to use -f for volume removal, which in my experience bypasses the plugin altogether. This then leaves the volume in place at the provider (e.g. Hetzner) while it is no longer present in Swarm.
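
For illustration, the forced removal mentioned above is just the following (note that, per the above, it bypasses the plugin, so the backing volume may be left behind at the provider):

docker volume rm -f my-csi-local-volume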

@dperny
Contributor

dperny commented Jun 26, 2023

I see the problem now. The way freeing volumes works, we look for volumes to remove each time we do a pass of the scheduler. Deleting a Service doesn't cause a scheduling pass, so we don't free the Volumes. In theory, some other scheduling event, like creating a new service, ought to cause the scheduling pass that successfully frees the Volume? I need to check...

@s4ke
Contributor Author

s4ke commented Jul 1, 2023

So what would be the fix here then? Can we "force" a scheduling pass in the volume rm Operation to try it out?

@TheSilkky

Has anyone made any progress on figuring this out? This issue has made CSI volumes pretty unreliable, and oftentimes services won't start without manual intervention due to the volume getting "stuck".

@s4ke
Contributor Author

s4ke commented Jul 29, 2023

This could happen if the Node is struck by lightning or falls through a crack in reality into the great nothingness between worlds. I'm unsure how Kubernetes handles such a case.

I reread this issue @dperny, and I think there ought to be something that force-unstages (unrelated to how Kubernetes does it), at least in a somewhat configurable manner.

Kubernetes seems to have something outside of the CSI spec that handles this case:

hetznercloud/csi-driver#164 (comment)

container-storage-interface/spec#512

@s4ke
Contributor Author

s4ke commented Jul 29, 2023

@TheSilkky

Has anyone made any progress on figuring this out? This issue has made CSI volumes pretty unreliable, and oftentimes services won't start without manual intervention due to the volume getting "stuck".

Have you tried to see if another scheduling trigger fixes the volume not being unstaged?

@s4ke
Contributor Author

s4ke commented Jul 29, 2023

In theory, some other scheduling event, like creating a new service, ought to cause the scheduling pass that successfully frees the Volume? I need to check...

@dperny I can verify this. Simply creating an unrelated service causes the state transition to happen. Now the question is: how do we trigger the scheduling pass properly?

I mean, knowing this, we can come up with all kinds of workarounds, but I don't think we should leave it at that.
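
As a stopgap until a proper fix is vendored, here is a sketch of that workaround; the service name is a placeholder and the image is arbitrary:

# Create and remove an unrelated throwaway service to force a scheduling pass.
docker service create --name schedule-trigger --detach nginx
docker volume ls --cluster          # the stuck volume should now show "created"
docker service rm schedule-trigger
docker volume rm my-csi-local-volume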

@s4ke
Contributor Author

s4ke commented Jul 29, 2023

It seems like this PR is addressing this already? moby/swarmkit#3144

@TheSilkky

It seems like this PR is addressing this already? moby/swarmkit#3144

Has this been released yet?

@s4ke
Contributor Author

s4ke commented Jul 29, 2023

It's not vendored in moby/moby master yet, so I doubt it

See how the code is missing here: https://github.com/moby/moby/blob/master/vendor/github.com/moby/swarmkit/v2/manager/scheduler/scheduler.go

So I'd say we wait for Drew to weigh in, but from the code changes I read in that PR, it looks like this has already been fixed there.

@TheSilkky

It's not vendored in moby/moby master yet, so I doubt it

See how the code is missing here: https://github.com/moby/moby/blob/master/vendor/github.com/moby/swarmkit/v2/manager/scheduler/scheduler.go

So I'd say we wait for Drew to weigh in, but from the code changes I read in that PR, it looks like this has already been fixed there.

That's good, hopefully it'll be released soon

@s4ke
Contributor Author

s4ke commented Sep 6, 2023

Looks like this will be in 25.0.0
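
Once that release is out, checking whether a given engine already carries the vendored fix should just be a matter of comparing the server version:

docker version --format '{{.Server.Version}}'    # expect 25.0.0 or later if the fix lands there as expected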
