
how to remove the DESIRED STATE of Remove in docker swarm #36527

Open
jipengzhu opened this issue Mar 8, 2018 · 5 comments
@jipengzhu

jipengzhu commented Mar 8, 2018

Description

Steps to reproduce the issue:

  1. Bind a service to a wrong hostname in the cluster and deploy the swarm stack.
  2. Remove the stack.
  3. The tasks' DESIRED STATE becomes Remove, and the task history can't be removed.
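For reference, a minimal compose fragment of the kind that triggers this (service name and hostname are illustrative; the point is a constraint that matches no node in the cluster):

```yaml
version: "3.4"
services:
  zookeeper:
    image: zookeeper:latest
    deploy:
      replicas: 3
      placement:
        constraints:
          # This hostname matches no node, so every task stays
          # "pending" with "no suitable node (scheduling constraints
          # not satisfied ...)"
          - node.hostname == wrong-hostname
```

Deploying this with docker stack deploy and then removing the stack leaves the pending tasks behind, as shown in the output below.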

Describe the results you received:

[zhujipeng@s17 swarm_cluster]$ sudo docker stack ps zookeeper
ID                  NAME                          IMAGE               NODE                DESIRED STATE       CURRENT STATE            ERROR                              PORTS
xolkwwhn3fqi        gpo0t6v855jdji2ih8uk6304v.1   zookeeper:latest                        Remove              Pending 34 seconds ago   "no suitable node (scheduling …"
vvvhp030ipqa        s1n2od7nm0vpzfajl3x8mtz5o.1   zookeeper:latest                        Remove              Pending 36 seconds ago   "no suitable node (scheduling …"
s2a8lmfbr0c5        yqenh5lyawdrij8ea2ql08ofi.1   zookeeper:latest                        Remove              Pending 37 seconds ago   "no suitable node (scheduling …"
tzp100pabmrz        wz8jz2x4rgecbvp10ngvcm5yl.1   zookeeper:latest                        Remove              Pending 32 minutes ago   "no suitable node (scheduling …"
l6klw2843ann        x2x0xgx5gtxnl25kj4as0a3sk.1   zookeeper:latest                        Remove              Pending 32 minutes ago   "no suitable node (scheduling …"
vhdivbapo83l        hwlp9os3qe409lzkbzao28ukp.1   zookeeper:latest                        Remove              Pending 32 minutes ago   "no suitable node (scheduling …"

[zhujipeng@s17 swarm_cluster]$ sudo docker inspect xolkwwhn3fqi | grep "\"Status" -A 7
        "Status": {
            "Timestamp": "2018-03-08T04:09:34.000715207Z",
            "State": "pending",
            "Message": "pending task scheduling",
            "Err": "no suitable node (scheduling constraints not satisfied on 3 nodes)",
            "ContainerStatus": {},
            "PortStatus": {}
        },
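As a sketch, the stuck state can be checked without eyeballing the whole dump; the snippet below runs against a saved inspect output (the sample JSON here is a trimmed, illustrative copy of the fields above):

```shell
# In a real run (on a manager node): docker inspect xolkwwhn3fqi > task.json
# A trimmed sample with the same shape, for illustration:
cat > task.json <<'EOF'
[{"Status": {"State": "pending",
             "Err": "no suitable node (scheduling constraints not satisfied on 3 nodes)"},
  "DesiredState": "remove"}]
EOF
# Print the two fields that identify a stuck task:
# a current state of "pending" combined with a desired state of "remove".
grep -oE '"(State|DesiredState)": "[a-z]+"' task.json
```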

[zhujipeng@s17 swarm_cluster]$ sudo docker stack ls
NAME                SERVICES
elk                 2
registry            2

The zookeeper stack has been removed, but the bad task history is left behind.

Describe the results you expected:

The leftover tasks with a DESIRED STATE of Remove should actually be removed.

Additional information you deem important (e.g. issue happens only occasionally):

Output of docker version:

[zhujipeng@s17 swarm_cluster]$ sudo docker version
Client:
 Version:	17.12.1-ce
 API version:	1.35
 Go version:	go1.9.4
 Git commit:	7390fc6
 Built:	Tue Feb 27 22:15:20 2018
 OS/Arch:	linux/amd64

Server:
 Engine:
  Version:	17.12.1-ce
  API version:	1.35 (minimum version 1.12)
  Go version:	go1.9.4
  Git commit:	7390fc6
  Built:	Tue Feb 27 22:17:54 2018
  OS/Arch:	linux/amd64
  Experimental:	false

Output of docker info:

[zhujipeng@s17 swarm_cluster]$ sudo docker info
Containers: 4
 Running: 4
 Paused: 0
 Stopped: 0
Images: 5
Server Version: 17.12.1-ce
Storage Driver: devicemapper
 Pool Name: docker-8:1-2215070851-pool
 Pool Blocksize: 65.54kB
 Base Device Size: 10.74GB
 Backing Filesystem: xfs
 Udev Sync Supported: true
 Data file: /dev/loop0
 Metadata file: /dev/loop1
 Data loop file: /var/lib/docker/devicemapper/devicemapper/data
 Metadata loop file: /var/lib/docker/devicemapper/devicemapper/metadata
 Data Space Used: 2.694GB
 Data Space Total: 107.4GB
 Data Space Available: 104.7GB
 Metadata Space Used: 4.792MB
 Metadata Space Total: 2.147GB
 Metadata Space Available: 2.143GB
 Thin Pool Minimum Free Space: 10.74GB
 Deferred Removal Enabled: true
 Deferred Deletion Enabled: true
 Deferred Deleted Device Count: 0
 Library Version: 1.02.140-RHEL7 (2017-05-03)
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: active
 NodeID: qxwl5v16zl1kkn007xnhplvxh
 Is Manager: true
 ClusterID: p7vkrqq3jjez5ipslit6xmwv4
 Managers: 1
 Nodes: 3
 Orchestration:
  Task History Retention Limit: 5
 Raft:
  Snapshot Interval: 10000
  Number of Old Snapshots to Retain: 0
  Heartbeat Tick: 1
  Election Tick: 3
 Dispatcher:
  Heartbeat Period: 5 seconds
 CA Configuration:
  Expiry Duration: 3 months
  Force Rotate: 0
 Autolock Managers: false
 Root Rotation In Progress: false
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 9b55aab90508bd389d7654c4baf173a981477d55
runc version: 9f9c96235cc97674e935002fc3d78361b696a69e
init version: 949e6fa
Security Options:
 seccomp
  Profile: default
Kernel Version: 3.10.0-327.22.2.el7.ttm.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
CPUs: 16
Total Memory: 47.17GiB
ID: GP5L:DZB5:OJIN:4CA3:P2EK:XCTA:56NK:N2DB:L3WN:YHED:EQ2D:X2L2
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
 registry.my.com:5000
 127.0.0.0/8
Registry Mirrors:
 https://registry.docker-cn.com/
Live Restore Enabled: false

Additional environment details (AWS, VirtualBox, physical, etc.):

@eferley

eferley commented Mar 13, 2018

Hello, I have the same issue.

Since for the moment my setup is only for training and tests (on AWS EC2 instances), I'll destroy and re-create everything from scratch, but that would not be fun in a production environment.

Edit: Perhaps related to moby/swarmkit#2555?

Description:

  • deployed a stack with a wrong node hostname constraint on one of the services
  • a "pending" task appeared
  • was not able to find a way to remove the pending task
  • scaled the bad service down to 0 replicas
  • the "pending" task was still there
  • removed the stack
  • recreated the stack with the same name (with the correct constraint in the compose YAML)
  • the stack deployed OK, but the old "pending" task reappeared as well
  • tried setting the swarm task history limit to 1 and then to 0 with no effect; went back to the default of 5

Result: "pending" tasks that cannot be removed

# docker stack ps test :

ID                  NAME                              IMAGE                      NODE                DESIRED STATE       CURRENT STATE         ERROR                              PORTS
[...]
vtesoy4pzocj        test_traefik.1                    traefik:latest             dockmgr1c           Running             Running 2 hours ago                                      *:8080->8080/tcp,*:443->443/tcp,*:80->80/tcp
[...]
85d93acvwuqt        wrxqffnl7071ekscumvwh6htk.1       traefik:latest                                 Remove              Pending 2 hours ago   "no suitable node (scheduling …"
[...]

# docker inspect 85d9 :

[...]
        "Status": {
            "Timestamp": "2018-03-13T14:09:59.978723021Z",
            "State": "pending",
            "Message": "pending task scheduling",
            "Err": "no suitable node (scheduling constraints not satisfied on 3 nodes)",
            "ContainerStatus": {},
            "PortStatus": {}
        },
        "DesiredState": "remove",
[...]

Expected:

  • Removal of pending tasks during docker stack rm
  • A way to manually remove orphaned pending tasks?
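The orphaned tasks are easy to spot mechanically: their NAME column is a bare <service-id>.<slot> instead of the usual <stack>_<service>.<slot>. A sketch over a saved listing (the sample lines are trimmed from the output above):

```shell
# Two trimmed lines from 'docker stack ps test' above, saved for filtering:
cat > ps.txt <<'EOF'
vtesoy4pzocj  test_traefik.1               traefik:latest  dockmgr1c  Running  Running 2 hours ago
85d93acvwuqt  wrxqffnl7071ekscumvwh6htk.1  traefik:latest             Remove   Pending 2 hours ago
EOF
# Orphaned tasks lack the '<stack>_' prefix in the NAME column:
awk '$2 !~ /_/ {print $1, $2}' ps.txt
```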

Environment:

  • Hosts : AWS AmazonLinux T2-Micro
  • Docker 17.12.0-ce
  • Swarm mode : 3 managers + 0 worker
  • Initial install was Docker 17.09.1-ce, updated to 17.12.0-ce with sudo yum -y update

# docker version

Client:
 Version:       17.12.0-ce
 API version:   1.35
 Go version:    go1.9.2
 Git commit:    3dfb8343b139d6342acfd9975d7f1068b5b1c3d3
 Built: Mon Mar  5 20:42:27 2018
 OS/Arch:       linux/amd64

Server:
 Engine:
  Version:      17.12.0-ce
  API version:  1.35 (minimum version 1.12)
  Go version:   go1.9.2
  Git commit:   402dd4a/17.12.0-ce
  Built:        Mon Mar  5 20:43:34 2018
  OS/Arch:      linux/amd64
  Experimental: false

# docker info

Containers: 7
 Running: 3
 Paused: 0
 Stopped: 4
Images: 4
Server Version: 17.12.0-ce
Storage Driver: overlay2
 Backing Filesystem: extfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: active
 NodeID: ficiwe7e7qyu7pajxmid8s2jl
 Is Manager: true
 ClusterID: w2a2osuql2xbaoms02tlskvc8
 Managers: 3
 Nodes: 3
 Orchestration:
  Task History Retention Limit: 5
 Raft:
  Snapshot Interval: 10000
  Number of Old Snapshots to Retain: 0
  Heartbeat Tick: 1
  Election Tick: 3
 Dispatcher:
  Heartbeat Period: 5 seconds
 CA Configuration:
  Expiry Duration: 3 months
  Force Rotate: 0
 Autolock Managers: false
 Root Rotation In Progress: false
 Node Address: 10.30.53.60
 Manager Addresses:
  10.30.49.89:2377
  10.30.51.25:2377
  10.30.53.60:2377
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 89623f28b87a6004d4b785663257362d1658a729
runc version: b2567b37d7b75eb4cf325b77297b140ea686ce8f
init version: 949e6fa
Security Options:
 seccomp
  Profile: default
Kernel Version: 4.9.85-37.55.amzn1.x86_64
Operating System: Amazon Linux AMI 2017.09
OSType: linux
Architecture: x86_64
CPUs: 1
Total Memory: 993.2MiB
Name: ip-10-30-53-60
ID: VWSJ:6DA7:OC4T:XPCX:MHH4:YNHY:FMFP:N4Y4:CKUT:QLPN:2ORN:ZZ4J
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

compose yaml file

version: "3.4"
services:
  traefik:
[...]
    deploy:
      mode: replicated
      replicas: 1
      placement:
        constraints:
          - node.role == manager
          - node.hostname == **wrong-hostname**
      update_config:
        parallelism: 1
        delay: 10s
      restart_policy:
        condition: on-failure
[...]
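For comparison, a corrected placement section of the kind that deploys cleanly (the hostname must match an actual node, e.g. dockmgr1c from the listing above; illustrative):

```yaml
    deploy:
      mode: replicated
      replicas: 1
      placement:
        constraints:
          - node.role == manager
          # must be the hostname of an existing node in the swarm
          - node.hostname == dockmgr1c
```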

@dmilind

dmilind commented Apr 26, 2019

Is there any resolution to this issue? I am facing the same issue: after removing the stack, the tasks are not removed and the desired state is still "Remove".
We also cannot start the service in another stack or network.

@gypark

gypark commented Oct 29, 2021

I am facing the same problem now... Any help would be appreciated.

la2lt0y6mgsl        uhglwxnf3d1...   postgres_sr:10.3-20200103      API5          Remove              Pending 29 minutes ago   "host-mode port already in use…"

My swarm cluster is being used for production services. So I can't try to remove the stack or the entire cluster. I wish to just remove this specific pending task.

@zucatti

zucatti commented Jan 4, 2023

Same here in 2022!

@pschichtel

I seem to have the same or a similar issue with the docker service scale command when scaling services down to 0. It started with the update to Docker 24: the first docker service scale svc_name=0 sets the replicas option to 0 and changes the tasks' desired state to Remove, but no step is taken towards actually reaching that state. The particular service uses a custom stop signal (SIGUSR2) to perform a graceful shutdown, but that signal never arrives at the container. Running the command twice works around the issue.

7 participants