Unexpected stop-grace-period behaviour during service update #36111

antongocode · 2018-01-25T11:07:35Z

When updating a service with stop-grace-period set to more than 1m, swarm starts a second instance after about one minute instead of waiting for the first instance to shut down.

The following service log shows the behaviour:

grace-test_grace.1.utgh2ncsc4d0@linuxkit-025000000001    | 2018/01/25 10:53:54 Starting new server
grace-test_grace.1.utgh2ncsc4d0@linuxkit-025000000001    | 2018/01/25 10:54:03 Signal terminated received, sleeping for 1m30s
grace-test_grace.1.f5puij54d7jj@linuxkit-025000000001    | 2018/01/25 10:55:01 Starting new server
grace-test_grace.1.utgh2ncsc4d0@linuxkit-025000000001    | 2018/01/25 10:55:33 Exit

The service was set to have a 2m grace period and sleeps for 1m30s before it actually exits.

During the overlap, docker ps and docker service ls give the following output:

$ docker ps
CONTAINER ID        IMAGE               COMMAND                  CREATED              STATUS              PORTS               NAMES
063c1791e518        grace-test:local    "/gracetest --grace=…"   About a minute ago   Up 10 seconds                           grace-test_grace.1.f5puij54d7jjdhczilvicy58q
bd670300df3d        grace-test:local    "/gracetest --grace=…"   About a minute ago   Up About a minute                       grace-test_grace.1.utgh2ncsc4d0targxw53n181f
$ docker service ls
ID                  NAME                MODE                REPLICAS            IMAGE               PORTS
o8vhn4wan8at        grace-test_grace    replicated          2/1                 grace-test:local

Steps to reproduce the issue:

I've created a repo to reproduce the issue.

Describe the results you received:

The number of service instances is greater than one during the service update.

Describe the results you expected:

Docker should wait for the running instance to exit before starting a new instance.

Output of docker version:

Client:
 Version:	17.12.0-ce
 API version:	1.35
 Go version:	go1.9.2
 Git commit:	c97c6d6
 Built:	Wed Dec 27 20:03:51 2017
 OS/Arch:	darwin/amd64

Server:
 Engine:
  Version:	17.12.0-ce
  API version:	1.35 (minimum version 1.12)
  Go version:	go1.9.2
  Git commit:	c97c6d6
  Built:	Wed Dec 27 20:12:29 2017
  OS/Arch:	linux/amd64
  Experimental:	true

Output of docker info:

Containers: 1
 Running: 1
 Paused: 0
 Stopped: 0
Images: 174
Server Version: 17.12.0-ce
Storage Driver: overlay2
 Backing Filesystem: extfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host ipvlan macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: active
 NodeID: w2g9pkg9miox6yli2p49i4txc
 Is Manager: true
 ClusterID: lgw4odktbkldt5s1qv6q5smsx
 Managers: 1
 Nodes: 1
 Orchestration:
  Task History Retention Limit: 5
 Raft:
  Snapshot Interval: 10000
  Number of Old Snapshots to Retain: 0
  Heartbeat Tick: 1
  Election Tick: 3
 Dispatcher:
  Heartbeat Period: 5 seconds
 CA Configuration:
  Expiry Duration: 3 months
  Force Rotate: 0
 Autolock Managers: false
 Root Rotation In Progress: false
 Node Address: 192.168.65.2
 Manager Addresses:
  192.168.65.2:2377
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 89623f28b87a6004d4b785663257362d1658a729
runc version: b2567b37d7b75eb4cf325b77297b140ea686ce8f
init version: 949e6fa
Security Options:
 seccomp
  Profile: default
Kernel Version: 4.9.60-linuxkit-aufs
Operating System: Docker for Mac
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 2.933GiB
Name: linuxkit-025000000001
ID: PJT5:IYFU:5UXH:IT2J:TLSN:7XKD:UFLR:IOBN:YYAM:VYTJ:T7G2:356N
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): true
 File Descriptors: 49
 Goroutines: 201
 System Time: 2018-01-25T11:04:35.386691279Z
 EventsListeners: 2
HTTP Proxy: docker.for.mac.http.internal:3128
HTTPS Proxy: docker.for.mac.http.internal:3129
Registry: https://index.docker.io/v1/
Labels:
Experimental: true
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false


kurtrwall · 2018-11-12T16:28:27Z

Running into this issue right now and the workaround I've come up with isn't very nice. Any guidance or attention to this "bug"?

kevb · 2019-05-05T12:11:18Z

Same issue -

Expected: SIGKILL after stop-grace-period, and expected next container to start after first terminates (soon after SIGKILL).

Actual: SIGKILL is sent and the new container starts around the same time, regardless of whether the first container has terminated completely.

Any workaround I can think of right now involves some sort of 'lock' and feels bad... I'd really appreciate it if someone could validate this bug and/or suggest workarounds.

GordonTheTurtle added area/swarm platform/desktop labels Jan 25, 2018

antongocode changed the title ~~Unexpected stop-grace-period behaviour during update~~ Unexpected stop-grace-period behaviour during service update Jan 25, 2018
