
BUG: update out of sequence with docker stack deploy #39891

Open
leshik opened this issue Sep 10, 2019 · 9 comments

leshik commented Sep 10, 2019

Description

The docker stack deploy command sporadically fails with an update out of sequence error.

Steps to reproduce the issue:

The issue started to occur after I added a few services that aren't replicated initially, like this:

deploy:
  replicas: 0
  restart_policy:
    condition: none

These services are then triggered by https://github.com/crazy-max/swarm-cronjob at 5- and 10-minute intervals. I suspect the issue occurs when a service is launched and a stack deploy happens at the same time.

I use several single-node swarm installations. The issue randomly occurs on any 1-3 out of 12 nodes.

Output of docker version:

Client: Docker Engine - Community
 Version:           19.03.1
 API version:       1.40
 Go version:        go1.12.5
 Git commit:        74b1e89
 Built:             Thu Jul 25 21:21:05 2019
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          19.03.1
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.12.5
  Git commit:       74b1e89
  Built:            Thu Jul 25 21:19:41 2019
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.2.6
  GitCommit:        894b81a4b802e4eb2a91d1ce216b8817763c29fb
 runc:
  Version:          1.0.0-rc8
  GitCommit:        425e105d5a03fabd737a126ad93d62a9eeede87f
 docker-init:
  Version:          0.18.0
  GitCommit:        fec3683

Output of docker info:

Client:
 Debug Mode: false

Server:
 Containers: 30
  Running: 14
  Paused: 0
  Stopped: 16
 Images: 58
 Server Version: 19.03.1
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: active
  NodeID: qyrpyf47byo492puqam7sx76l
  Is Manager: true
  ClusterID: dvsyttzga29c0s6f2jtw2f9ds
  Managers: 1
  Nodes: 1
  Default Address Pool: 10.0.0.0/8
  SubnetSize: 24
  Data Path Port: 4789
  Orchestration:
   Task History Retention Limit: 5
  Raft:
   Snapshot Interval: 10000
   Number of Old Snapshots to Retain: 0
   Heartbeat Tick: 1
   Election Tick: 10
  Dispatcher:
   Heartbeat Period: 5 seconds
  CA Configuration:
   Expiry Duration: 3 months
   Force Rotate: 0
  Autolock Managers: false
  Root Rotation In Progress: false
  Node Address: 192.168.1.74
  Manager Addresses:
   192.168.1.74:2377
 Runtimes: runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 894b81a4b802e4eb2a91d1ce216b8817763c29fb
 runc version: 425e105d5a03fabd737a126ad93d62a9eeede87f
 init version: fec3683
 Security Options:
  apparmor
  seccomp
   Profile: default
 Kernel Version: 4.18.0-25-generic
 Operating System: Ubuntu 18.04.2 LTS
 OSType: linux
 Architecture: x86_64
 CPUs: 4
 Total Memory: 7.683GiB
 Name: server6
 ID: TIZD:ZHVG:DHGA:MFMG:EWVO:NGKO:GBNF:LRDQ:OHOL:DXI5:4EW6:GS4V
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

WARNING: No swap limit support

Additional environment details (AWS, VirtualBox, physical, etc.):

Bare metal. Lenovo TS150.

Related to #30794.


trajano commented Nov 3, 2019

I am also getting this sporadically in AWS. Normally a redeploy fixes it, but for the last three days it hasn't been working, so what I've done is simply delete the stack and recreate it. I would rather understand why this happens and avoid it if possible.


misaon commented Feb 26, 2021

+1

@moraveclukas

+1

cpuguy83 (Member) commented Jul 7, 2021

Out of sequence happens when an incorrect sequence number is used to update a swarm object.
Sequence numbers are used to prevent multiple conflicting updates from being applied at the same time.
So generally, out of sequence means one of:

  1. An API client bug: the update was sent without the correct sequence number
  2. At least one other update occurred after you fetched the object but before you sent your update
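The scheme cpuguy83 describes is optimistic concurrency control. A minimal Python sketch of how a versioned store rejects a stale update (illustrative only, not Swarm's actual code):

```python
# Minimal sketch of optimistic concurrency control, as used by Swarm's
# versioned object updates (illustrative only, not Swarm's actual code).

class OutOfSequenceError(Exception):
    pass

class ObjectStore:
    def __init__(self):
        self._spec = {}
        self._version = 0  # incremented on every accepted update

    def get(self):
        # Clients receive the spec together with its current version.
        return dict(self._spec), self._version

    def update(self, new_spec, expected_version):
        # Reject the update if another writer got in first.
        if expected_version != self._version:
            raise OutOfSequenceError("update out of sequence")
        self._spec = dict(new_spec)
        self._version += 1
        return self._version

store = ObjectStore()

# Clients A and B both fetch the object at version 0.
spec_a, ver_a = store.get()
spec_b, ver_b = store.get()

# A's update succeeds and bumps the version to 1.
store.update({"image": "nginx:1.25"}, ver_a)

# B still holds version 0, so its update is rejected.
try:
    store.update({"image": "nginx:1.24"}, ver_b)
except OutOfSequenceError as e:
    print(e)  # update out of sequence
```

Case 1 above corresponds to a client passing the wrong `expected_version` outright; case 2 is the scenario shown here, where the version legitimately moved on between fetch and update.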

@gfmellado

+1


trajano commented Oct 20, 2021

Out of sequence happens when an incorrect sequence number is used to update a swarm object. Sequence numbers are used to prevent multiple conflicting updates at the same time. So generally, out of sequence means one of:

That would be strange if you're the only one doing a stack deploy for a given stack.

@thaJeztah (Member)

That would be strange if you're the only one doing a stack deploy for a given stack.

If I'm not mistaken, some state changes (e.g. status starting -> running) that happen server-side also increment the sequence number. I know there has been some discussion around separating some of those to not increment the sequence, but there were some complications in doing so.

@renepardon

I have the same problem when deploying multiple stacks at the same time: the build pipeline finishes multiple projects/services and deploys their stacks onto a single-node Docker Swarm. Related output from the GitLab pipeline with Ansible:

STDOUT:
Updating service mystack_media_cron (id: rlrglli46i0fmdspcgk9rx3ge)
Updating service mystack_media_db (id: 15bjupmo3d829k9owhz7ig6r5)
Updating service mystack_media_horizon (id: v0gw27bwh0eb9guodh2pxj4rl)
Updating service mystack_x-cli-healthcheck (id: i4p16j901pes746mpnlq2r83b)
STDERR:
failed to update service mystack_x-cli-healthcheck: Error response from daemon: rpc error: code = Unknown desc = update out of sequence
MSG:
non-zero return code

My executed command with Ansible at this time:

docker stack deploy -c <(docker-compose --env-file './media.env' -f './media.yml' config) --with-registry-auth mystack;

@thaJeztah (Member)

If I understand your comment correctly, and those multiple deploys act on the same stack/services, then that may be a race condition between them (and expected behaviour for the error). What would happen in that case is:

  • job 1 updates a service;
    • it fetches the current definition of the service (which returns, say, version=1)
    • it patches/updates the service
    • it sends the updated definition ("update version=1 with these changes")
  • job 2 updates a service;
    • it fetches the current definition of the service (which returns, say, version=1)
    • it patches/updates the service
    • it sends the updated definition ("update version=1 with these changes")

In the above, both job 1 and 2 try to update the service from "version=1", but one of them will already have sent the updated definition, at which point the swarm manager will reject it (because now "version=1" is no longer the current version).
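One way to tolerate this race is to re-fetch the object and retry after a rejection. A minimal sketch under stated assumptions (the Service and RacyService classes below are hypothetical stand-ins for the Swarm API, not the Docker SDK):

```python
# Sketch of a fetch-modify-retry loop for versioned updates.
# Illustrative only: a real client would call the Swarm API or Docker SDK;
# Service and RacyService are hypothetical stand-ins.

class ConflictError(Exception):
    pass

class Service:
    """Stand-in for a Swarm service carrying a version index."""
    def __init__(self):
        self.spec = {"replicas": 1}
        self.version = 0

    def inspect(self):
        # Return a copy of the spec plus the version it was read at.
        return dict(self.spec), self.version

    def update(self, spec, version):
        # Like the swarm manager, reject updates based on a stale version.
        if version != self.version:
            raise ConflictError("update out of sequence")
        self.spec = dict(spec)
        self.version += 1

class RacyService(Service):
    """Simulates another job sneaking an update in right after our fetch."""
    def __init__(self):
        super().__init__()
        self._raced = False

    def inspect(self):
        spec, version = super().inspect()
        if not self._raced:
            self._raced = True
            super().update({"replicas": 3}, self.version)  # concurrent writer
        return spec, version

def update_with_retry(service, patch, attempts=5):
    """Re-fetch the object and retry whenever the version has moved on."""
    for _ in range(attempts):
        spec, version = service.inspect()
        spec.update(patch)
        try:
            service.update(spec, version)
            return True
        except ConflictError:
            continue  # someone else updated first; fetch the new version
    return False

svc = RacyService()
# The first attempt fails with "update out of sequence"; the retry succeeds.
assert update_with_retry(svc, {"image": "myimage:latest"})
```

At the CLI level the equivalent workaround is simply re-running docker stack deploy (as trajano noted, a redeploy usually fixes it), or serializing concurrent deploy jobs so they never race.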
