
BUG: update out of sequence with docker stack deploy #39891

Open
leshik opened this issue Sep 10, 2019 · 9 comments

leshik commented Sep 10, 2019

Description

The docker stack deploy command sporadically fails with an update out of sequence error.

Steps to reproduce the issue:

The issue started to occur after I added a few services that aren't replicated initially, like this:

deploy:
  replicas: 0
  restart_policy:
    condition: none

These services are then triggered by https://github.com/crazy-max/swarm-cronjob at 5- and 10-minute intervals. I suspect the issue occurs when a service is launched and a stack deploy happens at the same time.

I use several single-node swarm installations. The issue randomly occurs on any 1-3 out of 12 nodes.

Output of docker version:

Client: Docker Engine - Community
 Version:           19.03.1
 API version:       1.40
 Go version:        go1.12.5
 Git commit:        74b1e89
 Built:             Thu Jul 25 21:21:05 2019
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          19.03.1
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.12.5
  Git commit:       74b1e89
  Built:            Thu Jul 25 21:19:41 2019
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.2.6
  GitCommit:        894b81a4b802e4eb2a91d1ce216b8817763c29fb
 runc:
  Version:          1.0.0-rc8
  GitCommit:        425e105d5a03fabd737a126ad93d62a9eeede87f
 docker-init:
  Version:          0.18.0
  GitCommit:        fec3683

Output of docker info:

Client:
 Debug Mode: false

Server:
 Containers: 30
  Running: 14
  Paused: 0
  Stopped: 16
 Images: 58
 Server Version: 19.03.1
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: active
  NodeID: qyrpyf47byo492puqam7sx76l
  Is Manager: true
  ClusterID: dvsyttzga29c0s6f2jtw2f9ds
  Managers: 1
  Nodes: 1
  Default Address Pool: 10.0.0.0/8
  SubnetSize: 24
  Data Path Port: 4789
  Orchestration:
   Task History Retention Limit: 5
  Raft:
   Snapshot Interval: 10000
   Number of Old Snapshots to Retain: 0
   Heartbeat Tick: 1
   Election Tick: 10
  Dispatcher:
   Heartbeat Period: 5 seconds
  CA Configuration:
   Expiry Duration: 3 months
   Force Rotate: 0
  Autolock Managers: false
  Root Rotation In Progress: false
  Node Address: 192.168.1.74
  Manager Addresses:
   192.168.1.74:2377
 Runtimes: runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 894b81a4b802e4eb2a91d1ce216b8817763c29fb
 runc version: 425e105d5a03fabd737a126ad93d62a9eeede87f
 init version: fec3683
 Security Options:
  apparmor
  seccomp
   Profile: default
 Kernel Version: 4.18.0-25-generic
 Operating System: Ubuntu 18.04.2 LTS
 OSType: linux
 Architecture: x86_64
 CPUs: 4
 Total Memory: 7.683GiB
 Name: server6
 ID: TIZD:ZHVG:DHGA:MFMG:EWVO:NGKO:GBNF:LRDQ:OHOL:DXI5:4EW6:GS4V
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

WARNING: No swap limit support

Additional environment details (AWS, VirtualBox, physical, etc.):

Bare metal. Lenovo TS150.

Related to #30794.


trajano commented Nov 3, 2019

I am also getting this sporadically in AWS. Normally a redeploy fixes it, but for the last three days it hasn't been working, so what I've done is simply delete the stack and recreate it. I would rather understand why this happens and avoid it if possible.


misaon commented Feb 26, 2021

+1

@moraveclukas

+1

cpuguy83 (Member) commented Jul 7, 2021

Out of sequence happens when an incorrect sequence number is used to update a swarm object.
Sequence numbers are used to prevent multiple conflicting updates from being applied at the same time.
So generally, out of sequence means one of:

  1. An API client bug: the update was sent without the correct sequence number
  2. At least one other update occurred after you fetched the object but before you sent your update
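The scheme cpuguy83 describes is optimistic concurrency control. A minimal Python sketch of how a versioned store rejects a stale update (illustrative only, not Swarm's actual code):

```python
# Minimal sketch of optimistic concurrency control, as used by Swarm's
# versioned object updates (illustrative only, not Swarm's actual code).

class OutOfSequenceError(Exception):
    pass

class ObjectStore:
    def __init__(self):
        self._spec = {}
        self._version = 0  # incremented on every accepted update

    def get(self):
        # Clients receive the spec together with its current version.
        return dict(self._spec), self._version

    def update(self, new_spec, expected_version):
        # Reject the update if another writer got in first.
        if expected_version != self._version:
            raise OutOfSequenceError("update out of sequence")
        self._spec = dict(new_spec)
        self._version += 1
        return self._version

store = ObjectStore()

# Clients A and B both fetch the object at version 0.
spec_a, ver_a = store.get()
spec_b, ver_b = store.get()

# A's update succeeds and bumps the version to 1.
store.update({"image": "nginx:1.25"}, ver_a)

# B still holds version 0, so its update is rejected.
try:
    store.update({"image": "nginx:1.24"}, ver_b)
except OutOfSequenceError as e:
    print(e)  # update out of sequence
```

Case 1 above corresponds to a client passing the wrong `expected_version` outright; case 2 is the scenario shown here, where the version legitimately moved on between fetch and update.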

@gfmellado

+1


trajano commented Oct 20, 2021

Out of sequence happens when an incorrect sequence number is used to update a swarm object. Sequence numbers are used to prevent multiple conflicting updates at the same time. So generally, out of sequence means one of:

That would be strange if you're the only one doing a stack deploy for a given stack.

@thaJeztah (Member)

That would be strange if you're the only one doing a stack deploy for a given stack.

If I'm not mistaken, some state changes (e.g. status starting -> running) that happen server-side also increment the sequence number. I know there has been some discussion around separating some of those to not increment the sequence, but there were some complications in doing so.

@renepardon

I have the same problem when deploying multiple stacks at the same time: the build pipeline finishes multiple projects/services and deploys their stacks onto a single-node Docker Swarm. Related output from the GitLab pipeline with Ansible:

STDOUT:
Updating service mystack_media_cron (id: rlrglli46i0fmdspcgk9rx3ge)
Updating service mystack_media_db (id: 15bjupmo3d829k9owhz7ig6r5)
Updating service mystack_media_horizon (id: v0gw27bwh0eb9guodh2pxj4rl)
Updating service mystack_x-cli-healthcheck (id: i4p16j901pes746mpnlq2r83b)
STDERR:
failed to update service mystack_x-cli-healthcheck: Error response from daemon: rpc error: code = Unknown desc = update out of sequence
MSG:
non-zero return code

My executed command with Ansible at this time:

docker stack deploy -c <(docker-compose --env-file './media.env' -f './media.yml' config) --with-registry-auth mystack;

@thaJeztah (Member)

If I understand your comment correctly, and those multiple deploys act on the same stack/services, then that may be a race condition between them (and expected behaviour for the error). What would happen in that case is:

  • job 1 updates a service;
    • it fetches the current definition of the service (which returns, say, version=1)
    • it patches/updates the service
    • it sends the updated definition ("update version=1 with these changes")
  • job 2 updates a service;
    • it fetches the current definition of the service (which returns, say, version=1)
    • it patches/updates the service
    • it sends the updated definition ("update version=1 with these changes")

In the above, both job 1 and 2 try to update the service from "version=1", but one of them will already have sent the updated definition, at which point the swarm manager will reject it (because now "version=1" is no longer the current version).
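One way to tolerate this race is to re-fetch the object and retry after a rejection. A minimal sketch under stated assumptions (the Service and RacyService classes below are hypothetical stand-ins for the Swarm API, not the Docker SDK):

```python
# Sketch of a fetch-modify-retry loop for versioned updates.
# Illustrative only: a real client would call the Swarm API or Docker SDK;
# Service and RacyService are hypothetical stand-ins.

class ConflictError(Exception):
    pass

class Service:
    """Stand-in for a Swarm service carrying a version index."""
    def __init__(self):
        self.spec = {"replicas": 1}
        self.version = 0

    def inspect(self):
        # Return a copy of the spec plus the version it was read at.
        return dict(self.spec), self.version

    def update(self, spec, version):
        # Like the swarm manager, reject updates based on a stale version.
        if version != self.version:
            raise ConflictError("update out of sequence")
        self.spec = dict(spec)
        self.version += 1

class RacyService(Service):
    """Simulates another job sneaking an update in right after our fetch."""
    def __init__(self):
        super().__init__()
        self._raced = False

    def inspect(self):
        spec, version = super().inspect()
        if not self._raced:
            self._raced = True
            super().update({"replicas": 3}, self.version)  # concurrent writer
        return spec, version

def update_with_retry(service, patch, attempts=5):
    """Re-fetch the object and retry whenever the version has moved on."""
    for _ in range(attempts):
        spec, version = service.inspect()
        spec.update(patch)
        try:
            service.update(spec, version)
            return True
        except ConflictError:
            continue  # someone else updated first; fetch the new version
    return False

svc = RacyService()
# The first attempt fails with "update out of sequence"; the retry succeeds.
assert update_with_retry(svc, {"image": "myimage:latest"})
```

At the CLI level the equivalent workaround is simply re-running docker stack deploy (as trajano noted, a redeploy usually fixes it), or serializing concurrent deploy jobs so they never race.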
