replicated services ignore --replicas #42741
Comments
For a case "replicated-job":
	concurrent := options.maxConcurrent.Value()
	if concurrent == nil {
		concurrent = options.replicas.Value()
	}
	serviceMode.ReplicatedJob = &swarm.ReplicatedJob{
		MaxConcurrent:    concurrent,
		TotalCompletions: options.replicas.Value(),
	}
@dperny PTAL |
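The defaulting in the branch quoted above can be tried in isolation. The following is only a minimal sketch, not the actual CLI code: `replicatedJob`, `buildReplicatedJob` and `u64` are hypothetical stand-ins, and the `*uint64` field types are an assumption about `swarm.ReplicatedJob`.

```go
package main

import "fmt"

// replicatedJob is a hypothetical stand-in for swarm.ReplicatedJob.
// Only the two fields from the quoted snippet are modelled, and their
// pointer types are an assumption.
type replicatedJob struct {
	MaxConcurrent    *uint64
	TotalCompletions *uint64
}

// buildReplicatedJob mirrors the quoted branch: when maxConcurrent is
// unset (nil), it falls back to the replicas value.
func buildReplicatedJob(maxConcurrent, replicas *uint64) replicatedJob {
	concurrent := maxConcurrent
	if concurrent == nil {
		concurrent = replicas
	}
	return replicatedJob{MaxConcurrent: concurrent, TotalCompletions: replicas}
}

// u64 returns a pointer to v, to make the calls below readable.
func u64(v uint64) *uint64 { return &v }

func main() {
	// Equivalent of passing --replicas 1 without --max-concurrent.
	job := buildReplicatedJob(nil, u64(1))
	fmt.Println(*job.MaxConcurrent, *job.TotalCompletions)

	// Equivalent of passing --max-concurrent 1 --replicas 3.
	job = buildReplicatedJob(u64(1), u64(3))
	fmt.Println(*job.MaxConcurrent, *job.TotalCompletions)
}
```

Under these assumptions, `--replicas 1` yields one total completion and a concurrency of one. That would match the report that the service inspection shows the correct parameters, and it suggests the duplicate tasks arise after flag parsing.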
I did a little more troubleshooting with a simpler service, using an alpine image to echo hello:
docker service create \
--restart-max-attempts 0 \
--mode replicated-job \
--max-concurrent 1 \
--replicas 1 \
--restart-condition none \
--name git-gc \
--workdir /git-repo/configs_single_repo.git \
alpine:latest echo hello
It appears that the replicated job functionalities are completely ignored and docker will spin up more and more containers:
$ docker service logs git-gc | wc -l
64
$ docker service logs git-gc | wc -l
72
$ docker service logs git-gc | wc -l
76
$ docker service logs git-gc | wc -l
84
$ docker service logs git-gc | wc -l
88
I expected to find a single "hello" in my logs; instead, more and more are logged until I remove the service. Help is appreciated, thanks. |
I am also hitting this issue... I need to run only 1 job, and docker is starting 2... no matter what values I use for replicas and max-concurrent, there is no way to run only 1 task... I am using 20.10.17 |
@dperny PTAL |
I have the same issue here. Can we have an update on this? Thanks |
Is there work being done here? I can confirm this is still an issue with Docker 23.0.1 |
Can we update the issue label to show the current version? I can also confirm this issue with Docker Engine 23.0.1 |
I can reproduce in Docker 23 if I have multiple managers: |
I can confirm this is also fixed for me now. Trying in version |
Trying to gather more information in order to submit a new issue, but it appears that while this particular issue is fixed, swarm starts all |
We've been using Docker version 24.0.5 from the Ubuntu docker.io package, build 24.0.5-0ubuntu1~20.04.1 in swarm mode on Ubuntu 20.04 since September '23 when this version was released. In the past 14 days the behaviour of replicated-job has changed. With a command such as the one below to create replicated-job with 1 replica, we now see two or more tasks are started instead of just one. If I constrain the service to a particular node, all tasks start on the same node:
boite@mynode-1:~$ sudo docker service create --name my_job --detach --max-concurrent 1 --secret source=env_local_2023_08_17_01,target=/app/.env.local,uid=33,gid=33 --mode replicated-job --limit-memory 2GB --limit-cpu 2.0 --restart-condition none --with-registry-auth --constraint node.hostname==mynode-3 ghcr.io/acme/aservice:worker php /app/bin/console app:multitask:device --recency=PT3M retryFailedUploads && sudo docker service logs -ft my_job
x4uoj3h62i395aaf5ij3nl2r6
2024-03-14T11:41:21.358156432Z my_job.0.k6jie6184rjw@mynode-3 | Successfully dumped .env files in .env.local.php
2024-03-14T11:41:21.362644645Z my_job.0.vz4kwem551f1@mynode-3 | Successfully dumped .env files in .env.local.php
2024-03-14T11:41:21.775511964Z my_job.0.vz4kwem551f1@mynode-3 |
2024-03-14T11:41:21.775567432Z my_job.0.vz4kwem551f1@mynode-3 | I shall task 187 devices to RetryFailedUploads in batches of 10, pausing for 60 seconds between each batch.
2024-03-14T11:41:21.779584492Z my_job.0.k6jie6184rjw@mynode-3 |
2024-03-14T11:41:21.779627370Z my_job.0.k6jie6184rjw@mynode-3 | I shall task 187 devices to RetryFailedUploads in batches of 10, pausing for 60 seconds between each batch.
2024-03-14T11:41:21.792793554Z my_job.0.vz4kwem551f1@mynode-3 | 10 devices so far have been tasked.
2024-03-14T11:41:21.795525639Z my_job.0.k6jie6184rjw@mynode-3 | 10 devices so far have been tasked.
^C
boite@mynode-1:~$ sudo docker service rm my_job
my_job
This was working perfectly fine earlier this month: just one task would start in this scenario. Does anyone have any clue what's wrong and how to work around this issue? |
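The duplication in the log above shows up as two different task IDs under the same `my_job.0` prefix. As a rough way to spot this in captured logs, here is a minimal sketch. It assumes each line has the `<timestamp> <service>.<slot>.<taskID>@<node> | ...` layout seen in the excerpt, and `taskIDsBySlot` is a hypothetical helper, not part of any Docker tooling.

```go
package main

import (
	"fmt"
	"strings"
)

// taskIDsBySlot scans log text in the layout shown in the excerpt above
// and groups the distinct task IDs seen under each "<service>.<slot>"
// prefix. Lines that do not match the layout are skipped.
func taskIDsBySlot(logs string) map[string]map[string]bool {
	seen := make(map[string]map[string]bool)
	for _, line := range strings.Split(logs, "\n") {
		fields := strings.Fields(line)
		if len(fields) < 2 {
			continue
		}
		// fields[0] is the timestamp; fields[1] is assumed to be
		// "<service>.<slot>.<taskID>@<node>".
		name, _, ok := strings.Cut(fields[1], "@")
		if !ok {
			continue
		}
		dot := strings.LastIndex(name, ".")
		if dot < 0 {
			continue
		}
		slot, taskID := name[:dot], name[dot+1:]
		if seen[slot] == nil {
			seen[slot] = make(map[string]bool)
		}
		seen[slot][taskID] = true
	}
	return seen
}

func main() {
	// Two lines taken from the excerpt above.
	sample := `2024-03-14T11:41:21.358156432Z my_job.0.k6jie6184rjw@mynode-3 | Successfully dumped .env files in .env.local.php
2024-03-14T11:41:21.362644645Z my_job.0.vz4kwem551f1@mynode-3 | Successfully dumped .env files in .env.local.php`

	for slot, ids := range taskIDsBySlot(sample) {
		fmt.Printf("%s: %d distinct task IDs\n", slot, len(ids))
	}
}
```

On the excerpt, this reports two distinct IDs (`k6jie6184rjw` and `vz4kwem551f1`) for `my_job.0`, where a single task was expected.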
I can confirm that the bug still exists. It seems to depend not on the version but on some cluster state. Maybe this bug is caused by
If the cluster state is buggy, it is easy to reproduce:
It will run continuously, looping
In the log, multiple messages are repeated:
I've seen it on v25.0.3 with two manager nodes. Workaround: demote the manager node that has errors in its log. |
@Vanav we have been hitting the same issue. Just hit it now, and when inspecting the manager nodes logs, on one of the managers I found the
I don't know much about the overflow error, but I note there are some errors leading up to this event, like "the swarm does not have a leader" - it seems there was a leader re-election. I wonder if this has confused the cluster state, causing one manager to run away with scheduling additional instances of the job?
Here are my logs (I have sanitised machine names, IP addresses, etc.).
Note: the replicated-job that proceeded to "run away" was created at 12:08 UTC.
Manager Node 3 - which is the one with the "overflow" error
Manager Node 2
Manager Node 1 |
Description
When trying to configure a replicated service, I found docker ignores the --replicas 1 and creates more than one task.
Steps to reproduce the issue:
When attempting to create the following service:
docker service create \
--mount 'type=volume,dst=/git-repo,volume-driver=local,volume-opt=type=nfs,volume-opt=device=:/my-device/sub-dir/,"volume-opt=o=addr=10.10.10.10"' \
--with-registry-auth --restart-max-attempts 0 --mode replicated-job --replicas 1 --restart-condition none \
--name git-gc --workdir /git-repo/my-repo \
alpine/git:latest gc
Describe the results you received:
The service runs and completes, and the restart conditions are honoured; however, the replication is not:
despite --replicas 1, docker still creates two duplicate tasks, which leads to the service failing.
The service inspection still shows the correct parameters:
Describe the results you expected:
Docker creates a single task that runs git gc.
Output of docker version:
Output of docker info:
Additional environment details (AWS, VirtualBox, physical, etc.):