Scheduler limits the # of ctn to 40 per worker node (overlay network limit is 252 ctn) | Swarm 1.12.1 #26702
Comments
GordonTheTurtle added the version/1.12 label on Sep 19, 2016
pascalandy changed the title from "Scheduler is limiting the # of ctn to 40 per worker node | Swarm 1.12.1" to "Scheduler limits the # of ctn to 40 per worker node | Swarm 1.12.1" on Sep 19, 2016
It's a very strange report on a limited container count.
thaJeztah added the status/more-info-needed and area/swarm labels on Sep 19, 2016
Could you provide more information on how to reproduce? How are your services started, and with what options? You say that your nodes should be able to run many more containers, but what's the actual memory use on those nodes? Do the daemon logs on those nodes give more information on why the containers are not running/starting?
pascalandy commented Sep 19, 2016 (edited)
Sure, and thanks for the prompt answers :) Again, the container (g99999003-h) is a basic nginx container with a few HTML files.
Launching the container: …
Scaling up: docker service scale g99999003-h=300
For 250 containers it's basically 7.6 GB, while my workers collectively have 14 GB of RAM. Trust me, there is room for more containers on those workers. They are limited to 40 containers each!! The same behavior happened on the 3 nodes I shared during the dockerswarm2k experience. EDIT: Right now I cannot SSH into each worker and do a … Cheers!
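For anyone wanting to verify the memory claim directly on a worker, a minimal sketch (these commands are illustrative and not taken from the original report):
$ free -h                    # overall RAM usage on the host
$ docker ps -q | wc -l       # number of running containers on this node
$ docker stats --no-stream   # one-shot per-container CPU/memory snapshot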
pascalandy commented Sep 19, 2016 (edited)
Hi @chanwit, did you observe that we could not deploy more than +/- 40 containers per node during the dockerswarm2k experience? This is what I saw on the 3 nodes I shared.
chanwit commented Sep 19, 2016
@pascalandy at that time, I thought it was limited to 40 because that was the average number across Swarm2K.
pascalandy commented Sep 19, 2016
For swarm3k, it would be nice to try to max out, let's say, only 100 workers and see how it reacts. Then go full blast :)
chanwit commented Sep 19, 2016
@pascalandy will do!!
pascalandy commented Sep 19, 2016
You're the man!
pascalandy commented Sep 20, 2016 (edited)
Hey guys, FULL REPORT UPDATE :) I recreated my issue many... many times. Here are the …
On cloud-a (master, 2 GB) I run: …
I run: …
I wait for a while …
root@cloud-a:~/deploy-setup# docker service inspect redisF
cloud-a (master, 2 GB): root@cloud-a:~/deploy-setup# docker ps
cloud-01 (worker, 2 GB): root@cloud-01:~# docker ps
cloud-03 (worker, 2 GB): root@cloud-03:~# docker ps
cloud-04 (worker, 8 GB): root@cloud-04:~# docker ps
cloud-05 (worker, 8 GB): root@cloud-05:~# docker ps
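A quick way to quantify the "about 40 per node" observation is to count on each node (a sketch, not taken from the original outputs):
$ docker ps -q | wc -l       # running containers on this particular node
$ docker node ls             # on the manager: check that every node is Ready/Active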
pascalandy commented Sep 20, 2016
It hurts ... root@cloud-a:~/deploy-setup# docker service ls
web-test4 points to the website
/cc @dperny @nishanttotla. @pascalandy I see you're using … @mrjana @aaronlehmann: can we have better error reporting?
mrjana was assigned by aluzzardi on Sep 20, 2016
pascalandy commented Sep 20, 2016 (edited)
EDIT: First, thank you @aluzzardi. I'll play with this! With docker network create, how do I decide on IPs like 172.28.0.0, 172.28.5.0, 172.28.5.254? EDIT: I run
docker network create --driver=overlay --opt=encrypted --subnet 172.28.1.0/16 frontend
or …
Clearly, I don't know what I'm doing here. I used the docs at https://docs.docker.com/engine/reference/commandline/network_create/ but I completely messed up my cluster. It would be nice to have a production example to accomplish: …
Cheers!
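For what it's worth, a minimal sketch of the kind of setup being discussed here, i.e. an encrypted overlay with a larger subnet plus a service attached to it (network name, service name, image and subnet are placeholders):
$ docker network create --driver overlay --opt encrypted --subnet 172.28.0.0/16 frontend
$ docker service create --name web --network frontend --replicas 300 nginx
$ docker service ps web      # tasks now draw their IPs from the larger subnet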
pascalandy commented Sep 21, 2016 (edited)
Here is the proof that @aluzzardi was right:
@aluzzardi should the default subnet be larger?
@thaJeztah I would think so, yes. @mavenugo @mrjana? Also, no matter what, we need better status reporting. Tasks were stuck in "NEW" (the state before "ALLOCATED"), but it would make more sense to have an "ALLOCATING" state (the tasks were stuck in the allocator because we ran out of IPs).
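Until such a state or error message exists, the closest a user can get is to watch the task list and the service definition (a sketch, using the service name from the report above):
$ docker service ps redisF        # the CURRENT STATE column shows tasks stuck before Running
$ docker service inspect redisF   # requested replica count and attached networks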
Not sure I see the need for an additional state. I think it would be more useful to have the allocator put an error message on the tasks, and make that error clearly visible.
@aaronlehmann However, the task would be in …
Yeah. I think it's mainly a UI problem where the user doesn't have a good way of knowing that a task in a state like …
pascalandy commented Sep 22, 2016
Gents, could you show me how I can create a new network with such a requirement? (I didn't find out how and broke my stack many times, LOL.)
@pascalandy Creating a network with a larger subnet is simple. Just create one like this: …
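A command along these lines, mirroring the syntax used earlier in the thread (network name and subnet are illustrative):
$ docker network create --driver overlay --subnet 10.10.0.0/16 mynet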
@thaJeztah @aluzzardi Surely the submitter of this issue needs a bigger subnet. But it's a tradeoff between the number of networks and the number of containers per network. There was a suggestion to use a /20 default, which would strike a balance, but I don't think we have enough data points yet to change the default subnet size. However, if we cut over to IPv6 as the default for all overlay networks, all of these problems instantly disappear.
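For context on that tradeoff, the raw address budget works out roughly as follows; a few addresses per network go to the network and broadcast addresses, the gateway and service VIPs, which is why a /24 tops out at about the 252 usable task IPs mentioned in this issue:
# total IPv4 addresses per subnet = 2^(32 - prefix length)
#   /24 ->    256 addresses (~252 usable for tasks)
#   /20 ->  4,096 addresses
#   /16 -> 65,536 addresses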
pascalandy commented Sep 23, 2016 (edited)
It works but is totally unstable. The reverse proxy goes nuts: a simple webpage is online/offline randomly, and when it is online the page is mostly blank and there is nothing in the source of that page!? Rampage! I reproduced these issues on DigitalOcean and Scaleway. Using a basic docker network create is solid. My conclusion at this time is to live with a maximum of 252 containers per network.
pascalandy commented Sep 23, 2016
BTW, I think you could remove the status/more-info-needed label :)
thaJeztah removed the status/more-info-needed label on Sep 23, 2016
pascalandy changed the title from "Scheduler limits the # of ctn to 40 per worker node | Swarm 1.12.1" to "Scheduler limits the # of ctn to 40 per worker node (overlay network limit is 252 ctn) | Swarm 1.12.1" on Sep 23, 2016
I'm getting lost on this issue: it started as what seemed like a request for enhancement and has now turned into a potential bug report :-) @pascalandy: are you OK with considering the original issue resolved (i.e., there's no artificial limit, it's only a matter of picking a bigger subnet) and filing a new issue for the load-balancing problem you're seeing? I also recommend testing master or 1.12.2-RC1, as many fixes went into this area. Thanks!
pascalandy commented Sep 27, 2016
@icecrime I agree this is not a bug. At first I thought there was a bug limiting me to about 40 containers per node. It turns out an overlay network only supports 252 containers at the moment. Feel free to close this or keep it open for documentation :)
Well, thanks for reporting.

pascalandy commented Sep 19, 2016 (edited)
Description
It looks like Swarm can only schedule 38 to 40 containers per worker node.
Steps to reproduce the issue:
Create then scale …
At this point nothing is scheduled anymore:
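A hedged sketch of the reproduction, pieced together from the commands mentioned elsewhere in this issue (the image is a placeholder; the original service was a small nginx-based container):
$ docker service create --name g99999003-h --replicas 1 nginx
$ docker service scale g99999003-h=300
$ docker service ps g99999003-h   # scheduling stalls well short of 300; the remaining tasks never start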
Describe the results you received:
docker node ls:
docker service ps g99999003-h (extract of the results)
Describe the results you expected:
That each node maxes out its memory or CPU limits.
My workers have 2 GB of RAM. They can handle many more containers.
Additional information you deem important (e.g. issue happens only occasionally):
Output of docker version:
Output of docker info:
Additional environment details (AWS, VirtualBox, physical, etc.):
Physical; each VPS created from the official 'docker 1.12.1' image from Scaleway.
No firewall while testing.
Cheers!
Pascal