
Swarm mode: number of replicas exceeding the desired count #36553

Open
vce-xx opened this issue Mar 10, 2018 · 2 comments


vce-xx commented Mar 10, 2018

Description

Notice below that docker stack services reported 2/1 replicas.

 vce > docker stack services blue
ID                  NAME                MODE                REPLICAS            IMAGE                                            PORTS
...
lm3nueuigt0x        blue_info           replicated          2/1                 xxx/info:723                                     *:30018->80/tcp
...
 vce > docker service ps blue_info
ID               NAME                IMAGE            NODE      DESIRED STATE     CURRENT STATE           ERROR               PORTS
jns2wmbzl9vx     blue_info.1         xxx/info:721     node1     Shutdown          Shutdown 14 hours ago
0b5dhaw8g1kb      \_ blue_info.1     xxx/info:626     node1     Shutdown          Running 4 days ago
zrju1v7w3f68      \_ blue_info.1     xxx/info:622     node1     Shutdown          Shutdown 4 days ago
l29nz3bed8pg     blue_info.2         xxx/info:723     node2     Running           Running 2 minutes ago
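
A quick way to cross-check the 2/1 count with standard docker CLI format flags (blue_info is the service from the output above):

 # desired replica count from the service spec (prints 1 here)
 docker service inspect blue_info --format '{{.Spec.Mode.Replicated.Replicas}}'

 # every task whose current state is Running, regardless of desired state
 docker service ps blue_info --format '{{.ID}} {{.DesiredState}} {{.CurrentState}}' | grep -i running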

Steps to reproduce the issue:

No recipe to reproduce yet.
What I have done:

  1. The info service was defined as described below
  2. The stack was deployed and redeployed over and over with docker stack deploy -c (a sketch of this loop follows the compose file below)
  3. Downscaled the managers from 3 to 1 at some point
  4. Continued to redeploy until I noticed this oddity
version: "3.5"

services:

  info:
    image: xxx/info:${TAG}
    deploy:
      restart_policy:
        condition: on-failure
        delay: 5s
        max_attempts: 3
        window: 15s
    healthcheck:
      interval: 10s
    networks:
      - mynet
    ports:
      - 80
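
A minimal sketch of the redeploy loop from step 2, assuming the compose file above is saved as stack.yml and the stack is named blue (file name and tag values are hypothetical; the image tag is injected through the ${TAG} variable the compose file already uses):

 # hypothetical redeploy loop: each pass rolls the service to a new image tag
 for tag in 721 722 723; do
   TAG=$tag docker stack deploy -c stack.yml blue
   sleep 30   # let the previous rollout settle before the next one
 done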

Describe the results you received:

Two tasks are in Running state, even though the desired replica count is 1.

Describe the results you expected:

1 task in Running state is expected.
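
Not from the original report, but two standard commands that should make the orchestrator re-converge on the declared state; whether they actually clear the extra task in this situation is untested:

 # force the orchestrator to re-evaluate and restart the service's tasks
 docker service update --force blue_info

 # or re-assert the replica count explicitly
 docker service scale blue_info=1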

Additional information you deem important (e.g. issue happens only occasionally):

Output of docker version:

docker version
Client:
 Version:	18.03.0-ce-rc1
 API version:	1.35 (downgraded from 1.37)
 Go version:	go1.9.4
 Git commit:	c160c73
 Built:	Thu Feb 22 02:34:03 2018
 OS/Arch:	darwin/amd64
 Experimental:	false
 Orchestrator:	swarm

Server:
 Engine:
  Version:	17.12.0-ce
  API version:	1.35 (minimum version 1.12)
  Go version:	go1.9.2
  Git commit:	c97c6d6
  Built:	Wed Dec 27 20:12:30 2017
  OS/Arch:	linux/amd64
  Experimental:	true

Output of docker info:

Containers: 19
 Running: 16
 Paused: 0
 Stopped: 3
Images: 52
Server Version: 17.12.0-ce
Storage Driver: overlay2
 Backing Filesystem: extfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: awslogs
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host ipvlan macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: active
 NodeID: onoskypeib1tvsu0nheyjwdsg
 Is Manager: true
 ClusterID: mgn365vf830acn90tf81ftmvv
 Managers: 1
 Nodes: 3
 Orchestration:
  Task History Retention Limit: 5
 Raft:
  Snapshot Interval: 10000
  Number of Old Snapshots to Retain: 0
  Heartbeat Tick: 1
  Election Tick: 3
 Dispatcher:
  Heartbeat Period: 5 seconds
 CA Configuration:
  Expiry Duration: 3 months
  Force Rotate: 0
 Autolock Managers: false
 Root Rotation In Progress: false
 Node Address: xxx
 Manager Addresses:
  xxx
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 89623f28b87a6004d4b785663257362d1658a729
runc version: b2567b37d7b75eb4cf325b77297b140ea686ce8f
init version: 949e6fa
Security Options:
 seccomp
  Profile: default
Kernel Version: 4.9.75-moby
Operating System: Alpine Linux v3.5
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 3.853GiB
Name: xxx
ID: xxx
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): true
 File Descriptors: 249
 Goroutines: 429
 System Time: 2018-03-10T09:14:34.020756034Z
 EventsListeners: 12
Registry: https://index.docker.io/v1/
Labels:
 availability_zone=xxx
 instance_type=xxx
 node_type=manager
 os=linux
 region=xxx
Experimental: true
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

Additional environment details (AWS, VirtualBox, physical, etc.):
Swarm cluster with only one manager in this occurrence.

@rnataraja

Seen on 17.06.2-ce too. Potentially the trigger was an NTP time-sync difference of about 15 minutes.
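
A simple way to check for the suspected skew, assuming SSH access to the nodes (node names are hypothetical):

 # print UTC time from each node; a spread of minutes would support the NTP theory
 for n in node1 node2 node3; do
   printf '%s: ' "$n"; ssh "$n" date -u
 done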


vce-xx commented Mar 23, 2018

In this case it happened with Docker for AWS.
Can time desync happen with Docker for AWS?
Does Docker for AWS use the Amazon Time Sync Service?
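
For reference, the Amazon Time Sync Service answers at the link-local address 169.254.169.123. A possible check on a node, assuming the host runs chrony (whether the Docker for AWS AMI ships chrony is exactly the open question):

 # list configured time sources; the Amazon service would appear as 169.254.169.123
 chronyc sources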
