Remove containers on swarm service scale down #1372

Closed
igrcic opened this issue Aug 15, 2016 · 16 comments
@igrcic

igrcic commented Aug 15, 2016

Hi everyone,

First, I'd like to ask if this is the right repo for asking questions regarding docker swarm and engine commands?

I am currently testing the docker swarm engine to see if I can use this new functionality for some of our (non-critical) production environments.

I noticed that when scaling down services in 1.12, containers in exited status remain in Docker. When I was testing beta2 or beta3, the containers were automatically removed.

Can one achieve this behavior somehow in v1.12?

Thank you,
Ivan

igrcic changed the title from "Remove containers on scale down" to "Remove containers on swarm service scale down" on Aug 15, 2016
@aaronlehmann
Collaborator

By default, we leave the most recent 5 containers per replica in place to aid troubleshooting (for example, providing the ability to execute a shell in one of these containers and inspect logs). This is a configurable setting, so you could set the number of containers to retain to 0 with docker swarm update --task-history-limit 0. This should provide the behavior you're looking for. Note that it will cause exited or failed tasks not to show up in docker service ps, since those entries are removed when the containers are cleaned up.
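
For example, on a manager node (the Task History Retention Limit line in docker info reflects the current value; the exact output wording may vary between versions):

$ docker swarm update --task-history-limit 0
$ docker info | grep "Task History"
 Task History Retention Limit: 0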

cc @sfsmithcha - not sure if this is well-documented.

@stevvooe
Contributor

@igrcic Note that it is also safe to remove exited containers manually. Something like the following can be run on the worker nodes periodically:

$ docker ps -qa -f label=com.docker.swarm.task -f status=exited | xargs docker rm -f

This can be used if you'd like to have a more aggressive policy about cleaning up resources but would like to have some task history for debugging purposes.
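
For example, a minimal sketch of a crontab entry for each worker node (the 10-minute interval is arbitrary, and the -r flag is GNU xargs' --no-run-if-empty, added so nothing runs when there are no exited task containers):

*/10 * * * * docker ps -qa -f label=com.docker.swarm.task -f status=exited | xargs -r docker rm -f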

@igrcic
Author

igrcic commented Aug 17, 2016

Thank you @aaronlehmann,

I have tried with --task-history-limit 0, but it doesn't really do anything. Scaling from 7 to 1 instances leaves 6 exited ones.

[screenshot]
Docker version 1.12.0, build 8eab29e

@stevvooe thanks, I'm already doing that. I just thought that TaskReaper thingy could do it for us, though ;)

@aaronlehmann
Collaborator

I know there is a bug where --task-history-limit 0 is not honored as an argument to docker swarm init. I'd expect it to work with docker swarm update, though. If it isn't, we should investigate that. Did you update the swarm or create a new one?

@igrcic
Author

igrcic commented Aug 17, 2016

Hi,

I just tried both update and init. The task history limit is not honored (nor is the default value of 5).

@igrcic
Author

igrcic commented Aug 17, 2016

I guess it's related to moby/moby#24394.

@aaronlehmann
Collaborator

The limit is per replica, so you wouldn't see the default limit of 5 come into play until at least one replica restarts 5 times.

When you try with a limit of 0, do the old scaled-down tasks disappear from docker service ps?
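
For example, with a hypothetical service named web (the name and replica counts are just placeholders), the behavior can be observed with:

$ docker service scale web=7
$ docker service scale web=1
$ docker service ps web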

@stevvooe
Contributor

@igrcic For the most part, orphan containers have been mitigated. There may still be a slight race condition, described in moby/moby#24858, but it should rarely come up in practice.

In addition to the troubleshooting suggested by @aaronlehmann, it would also be good to get the daemon log output from the period when you expect removal to happen. There could be some condition on your hosts that is preventing the removal from proceeding, and we'd need to track that down.
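
For example, on a systemd-based host something like this should capture the daemon logs around a scale-down (assuming the unit is named docker.service; adjust the time window to your test):

$ journalctl -u docker.service --since "2016-08-18 17:30" --until "2016-08-18 17:45"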

@igrcic
Author

igrcic commented Aug 18, 2016

Thanks @aaronlehmann, now I see what you mean: this actually applies to the service replica history count (docker service ps {serviceName}).

In that case, no, it doesn't work for me; it always stays at 4+1 replicas.
[screenshot]

@stevvooe the only thing I can see in the logs is:

time="2016-08-18T17:37:45.738067694+02:00" level=info msg="Failed to delete real server 10.255.0.39 for vip 10.255.0.8 fwmark 260: no such file or directory" 
time="2016-08-18T17:37:45.738237636+02:00" level=info msg="Failed to delete real server 10.255.0.39 for vip 10.255.0.8 fwmark 260: no such file or directory" 
time="2016-08-18T17:37:45.738683745+02:00" level=info msg="Failed to delete real server 10.255.0.19 for vip 10.255.0.8 fwmark 260: no such file or directory" 
time="2016-08-18T17:37:45.738881686+02:00" level=info msg="Failed to delete real server 10.255.0.19 for vip 10.255.0.8 fwmark 260: no such file or directory" 
time="2016-08-18T17:37:45.739098041+02:00" level=info msg="Failed to delete real server 10.255.0.34 for vip 10.255.0.8 fwmark 260: no such file or directory" 
time="2016-08-18T17:37:45.739244895+02:00" level=info msg="Failed to delete real server 10.255.0.34 for vip 10.255.0.8 fwmark 260: no such file or directory" 
time="2016-08-18T17:37:45.739463884+02:00" level=info msg="Failed to delete real server 10.255.0.28 for vip 10.255.0.8 fwmark 260: no such file or directory" 
time="2016-08-18T17:37:45.739611479+02:00" level=info msg="Failed to delete real server 10.255.0.28 for vip 10.255.0.8 fwmark 260: no such file or directory" 

I don't know if that is related, though.

Thanks,
Ivan

@aaronlehmann
Collaborator

I tried this out, and changing --task-history-limit seems to work for me. Note that the change won't take effect instantly - tasks need to restart for old containers to get cleaned up according to the new limit. But when I create a swarm with a task history limit set, it looks like that limit is respected. And when I update that limit with swarm update, that's also respected as soon as a task restarts.

BTW, I misremembered how the limit is applied. It actually counts all tasks for a given instance, not just the old tasks. So --task-history-limit 1 should do what you want (although 0 should do the same thing).
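
For example (a sketch; "web" is just a placeholder service name):

$ docker swarm update --task-history-limit 1
$ docker service ps web

Once each slot's task has restarted or been rescheduled at least once, only the current task per slot should remain listed.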

@aluzzardi
Member

aluzzardi commented Aug 28, 2016

@aaronlehmann Should we actually nuke tasks when we scale down? No point in keeping the history for those I guess?

Slot history makes sense for crashes, rolling updates, etc., but when you scale down it's pretty useless.

@aaronlehmann
Collaborator

The only reason I can think of to keep them is to show a record of the scaling itself. It's not a very strong reason. Deleting the tasks immediately when scaling down would probably be fine.

@stevvooe
Contributor

@aaronlehmann @aluzzardi Delete or not, these can be removed from the assignment set.

@igrcic
Author

igrcic commented Sep 26, 2016

Hi all,

+1 for nuking the tasks :)

@stevvooe can you explain what you mean by that?

@stevvooe
Contributor

@igrcic We dispatch tasks by maintaining an "assignment set" for the target node. The dispatcher protocol maintains this assignment set between the worker and the manager. If a task is outside the assignment set, the node can choose to remove the resources associated with that task (i.e. delete the container).

The point here is that the discussion of deletion is moot. All that needs to be done from the perspective of the worker is to not include these tasks in the assignment set, and the node will delete them. The manager can choose to keep them around or delete them separately.

@aluzzardi
Member

The consensus is to reduce the number of slots upon scale down.

/cc @aaronlehmann
