
docker system prune deletes resources from services in "Created" state #38733

Open

yorap opened this issue Feb 14, 2019 · 1 comment
yorap commented Feb 14, 2019

Description
Running docker system prune --force on a worker node removes resources of swarm services whose tasks are in the "Created" state, which then prevents the service from starting properly. The service will be recreated anyway.
The 60s delay is just for demo purposes. In our environment we use a delay of 24 hours in order to rerun a service every 24h, so if docker system prune --force is also issued on a daily basis, this service will never start properly.

Bret, fyi: @BretFisher
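
For illustration, a minimal single-node sketch of the same behaviour outside of swarm (the container name is just an example):

# create a container without starting it; it sits in the "Created" state
docker create --name prune-demo alpine:3.9 sleep 20
docker ps -a --filter name=prune-demo     # STATUS column shows "Created"

# system prune treats the created-but-never-started container as unused and removes it
docker system prune --force
docker ps -a --filter name=prune-demo     # the container is gone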

Steps to reproduce the issue:

  1. Set up a 2-node swarm cluster (1 manager / 1 worker)
  2. Create a docker-compose file:

version: '3.7'

services:
  test1:
    image: alpine:3.9
    command: |
      ash -c 'echo "start test1: `date`" && sleep 20 && echo "end test1: `date`"'
    deploy:
      restart_policy:
        condition: any
        delay: 60s
        max_attempts: 0
        window: 0s
      placement:
        constraints:
          - node.role == worker

  3. Deploy the stack: docker stack deploy --compose-file docker-compose test
  4. Check on the worker node that the service container is created: docker ps -a

088764be4063        alpine:3.9                       "ash -c 'echo \"start…"   13 seconds ago       Created                                             test_test1.1.dc3ykvthbuvaa03jiypen18a9

  5. Run docker system prune --force on the worker node while the container is in the "Created" state

Describe the results you received:
docker service logs test_test1 --follow

# service is not starting because of lack of resources
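
One way to inspect why the task is not starting (run on the manager node; this output is not captured above):

docker service ps test_test1 --no-trunc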

Describe the results you expected:
docker service logs test_test1 --follow

test_test1.1.vriry1pw67wc@worker-01    | start test1: Thu Feb 14 08:48:46 UTC 2019
test_test1.1.vriry1pw67wc@worker-01    | end test1: Thu Feb 14 08:49:06 UTC 2019
test_test1.1.cx795i38pc5d@worker-01    | start test1: Thu Feb 14 08:50:07 UTC 2019
test_test1.1.cx795i38pc5d@worker-01    | end test1: Thu Feb 14 08:50:27 UTC 2019
test_test1.1.uyfdrkdkm0q3@worker-01    | start test1: Thu Feb 14 08:51:29 UTC 2019
test_test1.1.uyfdrkdkm0q3@worker-01    | end test1: Thu Feb 14 08:51:49 UTC 2019
test_test1.1.tzrkcl3qawdf@worker-01    | start test1: Thu Feb 14 08:52:50 UTC 2019
test_test1.1.tzrkcl3qawdf@worker-01    | end test1: Thu Feb 14 08:53:10 UTC 2019
test_test1.1.czae6d3lbyqf@worker-01    | start test1: Thu Feb 14 08:54:11 UTC 2019
test_test1.1.czae6d3lbyqf@worker-01    | end test1: Thu Feb 14 08:54:31 UTC 2019
test_test1.1.6g0ltlwn86aj@worker-01    | start test1: Thu Feb 14 08:55:33 UTC 2019

Additional information you deem important (e.g. issue happens only occasionally):
Reproducible

Output of docker version:

Client:
 Version:           18.09.1
 API version:       1.39
 Go version:        go1.10.6
 Git commit:        4c52b90
 Built:             Wed Jan  9 19:35:23 2019
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          18.09.2
  API version:      1.39 (minimum version 1.12)
  Go version:       go1.10.6
  Git commit:       6247962
  Built:            Sun Feb 10 03:42:13 2019
  OS/Arch:          linux/amd64
  Experimental:     false

Output of docker info:

Containers: 5
 Running: 5
 Paused: 0
 Stopped: 0
Images: 157
Server Version: 18.09.2
Storage Driver: overlay2
 Backing Filesystem: extfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: active
 NodeID: 81xj0robkzx43mn1cni5nce36
 Is Manager: true
 ClusterID: 7ussq9p95kz0buyiip6eja26s
 Managers: 5
 Nodes: 18
 Default Address Pool: 10.0.0.0/8
 SubnetSize: 24
 Orchestration:
  Task History Retention Limit: 5
 Raft:
  Snapshot Interval: 10000
  Number of Old Snapshots to Retain: 0
  Heartbeat Tick: 1
  Election Tick: 10
 Dispatcher:
  Heartbeat Period: 5 seconds
 CA Configuration:
  Expiry Duration: 3 months
  Force Rotate: 0
 Autolock Managers: false
 Root Rotation In Progress: false
 Node Address: 172.16.44.111
 Manager Addresses:
  172.16.44.111:2377
  172.16.44.112:2377
  172.16.44.113:2377
  172.16.44.114:2377
  172.16.44.115:2377
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 9754871865f7fe2f4e74d43e2fc7ccd237edcbce
runc version: 09c8266bf2fcf9519a651b04ae54c967b9ab86ec
init version: fec3683
Security Options:
 apparmor
 seccomp
  Profile: default
Kernel Version: 4.4.0-139-generic
Operating System: Ubuntu 16.04.5 LTS
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 15.67GiB
Name: manager-01
ID: 6K72:WUAQ:AMXN:6XH5:PI6W:3C7R:LCOI:NAHF:IB7U:6OGW:R3Q7:YBFH
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
 172.16.44.0/24
 127.0.0.0/8
Live Restore Enabled: false
Product License: Community Engine

WARNING: No swap limit support

Additional environment details (AWS, VirtualBox, physical, etc.):
Running on VMware vSphere

@BretFisher commented

@yorap I know we talked about this before, so a few new and old thoughts:

  • prune is likely not swarm-aware, so it won't know about resources that are "not being used but may soon be used". This would include containers that are created but not running.
  • I thought you were doing docker image prune --all --force, which is what I currently recommend. It will clean up the big "space offender" but only removes images with no containers (stopped or started) depending on them.
  • system prune is more likely to cause issues because stopped containers could get removed, along with their images. Swarm also keeps 4 containers for each task around after they have stopped and system prune would likely break that too.
  • Note that docker stack rm doesn't remove named volumes, but people usually don't want those auto-pruned for safety.

Can you replicate this if you use docker image prune --all --force instead?
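
Something like this is what I have in mind for a nightly cleanup instead of system prune (the cron schedule and binary path are only examples):

# removes only images that no container (running or stopped) still references
docker image prune --all --force

# example cron entry for a daily run on each node
0 3 * * * /usr/bin/docker image prune --all --force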
