
limit loadbalancer max sockets #2954

Closed
killianmuldoon opened this issue Oct 4, 2022 · 12 comments · Fixed by #3115
Labels
kind/feature Categorizes issue or PR as related to a new feature. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@killianmuldoon
Contributor

I'm currently dealing with this issue with haproxy: docker-library/haproxy#194. It also impacts the kindest/haproxy docker images (which is where I came across it).

Setting a global maxconn in the haproxy.cfg file would fix the impact of the issue, though not the root cause, which is somewhere between haproxy and docker AFAIK.

I was wondering if it would be acceptable to set that value in the haproxy config, and if so, what value would be a good default? For fixing the above issue I think it can be arbitrarily high (the problem seems to be the value being unset rather than the level it's set to).
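For illustration, setting a global maxconn would amount to roughly this in the image's haproxy.cfg (the value is a placeholder, not a proposed default):

global
  # cap concurrent connections explicitly so maxconn is not derived from "ulimit -n"
  maxconn 100000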

@killianmuldoon killianmuldoon added the kind/support Categorizes issue or PR as a support question. label Oct 4, 2022
@aojea
Contributor

aojea commented Oct 6, 2022

Reading the linked issue, it seems we have two options:

  1. set maxconn
  2. run the LB container setting the ulimits to a conservative number

I'm inclined towards 2, WDYT @BenTheElder?
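For reference, option 2 amounts to passing a nofile ulimit when creating the LB container, roughly (tag and number are placeholders):

docker run -d --ulimit nofile=65536:65536 --name kind-external-load-balancer kindest/haproxy:<tag>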

@killianmuldoon
Contributor Author

run the LB container setting the ulimits to a conservative number

This is what the solution for Cluster API does - kubernetes-sigs/cluster-api#7344 - it works, but getting the number right is difficult.

I'm not sure how many other places run kindest/haproxy independently, but those would also need to set the ulimits, whereas setting maxconn in the config file would solve it for all consumers.

@aojea
Contributor

aojea commented Oct 6, 2022

Yeah, but we got bitten by the ulimit thing multiple times. Setting the ulimit is common in all "stable" distros like RHEL and SLES; however, more "edge" distros like Fedora, Arch, ... set it to NoLimit.

#760 (comment)

The main pro of this approach is that it would also remove the dependency on haproxy in case we want to switch the loadbalancer. I don't know, 1048576 or 65536 sounds good enough for Kind users.

I'm fine with one or the other though

@BenTheElder
Member

Sorry for the delayed response -- I think we should set a reasonable ulimit on the container. We only have the API server load-balancing workload and don't support user workloads other than their connections to the API server, so we can pick a pretty reasonable upper bound for supported concurrent connections. +1 for the config as well, but setting the ulimit is more comprehensive.

@BenTheElder
Member

We don't formally support reusing this image without importing kind or closely matching its behavior; any future revision can make any number of breaking changes freely.

At the moment, the fact that we even use haproxy at all is an under-designed internal detail that just happens to be working OK. We once used nginx, and HA probably deserves an overhaul at some point (pending more demand; kubernetes mostly doesn't have staffing to focus on testing HA at the moment).

CAPD is a bit of an odd duck here 😅

@killianmuldoon
Contributor Author

CAPD is a bit of an odd duck here 😅

Fair :D It does copy-paste the folder into a third_party directory (and the version used there is many versions out of date).

@BenTheElder
Member

An odd duck we love :-)

Yeah, I think copying in is the safest approach, so CAPD can continue to update when ready.

Since both projects are kubernetes-org owned with identical licenses, at least there are limited copy-paste license concerns 🙃

@BenTheElder BenTheElder added kind/feature Categorizes issue or PR as related to a new feature. triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed kind/support Categorizes issue or PR as a support question. labels Oct 6, 2022
@BenTheElder BenTheElder changed the title Consider setting maxconn in haproxy loadbalancer config Consider limiting loadbalancer max sockets Oct 6, 2022
@BenTheElder BenTheElder changed the title Consider limiting loadbalancer max sockets limit loadbalancer max sockets Oct 6, 2022
@BenTheElder BenTheElder added this to the v0.17.0 milestone Oct 6, 2022
@BenTheElder BenTheElder modified the milestones: v0.17.0, v0.18.0 Oct 27, 2022
@dlipovetsky
Contributor

dlipovetsky commented Mar 3, 2023

Setting a global maxconn in the haproxy.cfg file would fix the impact of the issue, though not the source issue which is somewhere between haproxy and docker AFAIK.

From what I understand, the root cause is that HAProxy allocates memory for each possible concurrent connection, up to the maximum defined by the maxconn field in its configuration.

Kind's HAProxy configuration does not set maxconn, so HAProxy derives it from the file descriptor limit, which the container inherits:

If this value is not set, it will automatically be calculated based on the current file descriptors limit reported by the "ulimit -n" command, possibly reduced to a lower value if a memory limit is enforced, based on the buffer size, memory allocated to compression, SSL cache size, and use or not of SSL and the associated maxsslconn (which can also be automatic). -- https://cbonte.github.io/haproxy-dconv/2.2/configuration.html#maxconn

Here's a demonstration using the latest image:

docker run --ulimit nofile=1000 --rm -it --memory=1gb --name haproxy-test kindest/haproxy:v20230227-d46f45b6 -d | grep maxconn
Note: setting global.maxconn to 453.

docker run --ulimit nofile=10000 --rm -it --memory=1gb --name haproxy-test kindest/haproxy:v20230227-d46f45b6 -d | grep maxconn
Note: setting global.maxconn to 4953.

docker run --ulimit nofile=100000 --rm -it --memory=1gb --name haproxy-test kindest/haproxy:v20230227-d46f45b6 -d | grep maxconn
Note: setting global.maxconn to 49953.

docker run --ulimit nofile=1000000 --rm -it --memory=1gb --name haproxy-test kindest/haproxy:v20230227-d46f45b6 -d | grep maxconn
Note: setting global.maxconn to 499953.

Please see the gist for a demonstration of how the kindest/haproxy container memory usage scales with the file descriptor limit: https://gist.github.com/dlipovetsky/23443bef17371a56acd8cf0579e3f6b4

There are two ways to control the memory usage: limit maxconn in the configuration, or limit per-process memory usage using the -m flag (explained in http://docs.haproxy.org/2.2/management.html#3).
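For illustration, the -m route would look roughly like this, reusing the invocation from the demonstration above (256 is a placeholder limit in megabytes); per the haproxy docs quoted earlier, a memory limit should also reduce the derived maxconn:

docker run --ulimit nofile=1000000 --rm -it --memory=1gb --name haproxy-test kindest/haproxy:v20230227-d46f45b6 -m 256 -d | grep maxconn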

@dlipovetsky
Contributor

To follow up on my previous comment: it seems reasonable either to limit the maximum connections or to impose a memory limit on HAProxy. Either of these changes would be explicit and clear. Changing the file descriptor limit is, at best, an implicit, indirect way to change the maximum number of connections.

@repnop

repnop commented Apr 5, 2023

Both my coworker (running Fedora) and I (running Arch) are still experiencing this issue when we attempt to create a cluster, even though I'm using the latest release version, which mentions the fix for this being included. Looks like the image version is hard-coded into kind and hasn't been updated to reflect that change, perhaps? (Edit: I manually retagged the newest image (v20230330-2f738c2) as the older image that's referenced in the code (v20230227-d46f45b6) and confirmed that it works now; the hardcoded image version definitely needs to be updated!)

const Image = "docker.io/kindest/haproxy:v20230227-d46f45b6"
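For reference, the retag workaround mentioned in the edit above was roughly:

docker pull kindest/haproxy:v20230330-2f738c2
docker tag kindest/haproxy:v20230330-2f738c2 docker.io/kindest/haproxy:v20230227-d46f45b6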

> kind create cluster --config cluster.yaml
Creating cluster "kind" ...
 ✓ Ensuring node image (kindest/node:v1.26.3) 🖼
 ✓ Preparing nodes 📦 📦 📦 📦 📦 📦
 ✗ Configuring the external load balancer ⚖
Deleted nodes: ["kind-external-load-balancer" "kind-worker3" "kind-worker2" "kind-control-plane" "kind-control-plane3" "kind-control-plane2" "kind-worker"]
ERROR: failed to create cluster: failed to copy loadbalancer config to node: failed to create directory /usr/local/etc/haproxy: command "docker exec --privileged kind-external-load-balancer mkdir -p /usr/local/etc/haproxy" failed with error: exit status 1
Command Output: Error response from daemon: Container edad0ec6d6a7b5a0f9f8845469109c76bded2e345c4c81412fa88b1043c47c9b is not running

cluster.yml

kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
  image: kindest/node:v1.26.3@sha256:61b92f38dff6ccc29969e7aa154d34e38b89443af1a2c14e6cfbd2df6419c66f
- role: control-plane
  image: kindest/node:v1.26.3@sha256:61b92f38dff6ccc29969e7aa154d34e38b89443af1a2c14e6cfbd2df6419c66f
- role: control-plane
  image: kindest/node:v1.26.3@sha256:61b92f38dff6ccc29969e7aa154d34e38b89443af1a2c14e6cfbd2df6419c66f
- role: worker
  image: kindest/node:v1.26.3@sha256:61b92f38dff6ccc29969e7aa154d34e38b89443af1a2c14e6cfbd2df6419c66f
- role: worker
  image: kindest/node:v1.26.3@sha256:61b92f38dff6ccc29969e7aa154d34e38b89443af1a2c14e6cfbd2df6419c66f
- role: worker
  image: kindest/node:v1.26.3@sha256:61b92f38dff6ccc29969e7aa154d34e38b89443af1a2c14e6cfbd2df6419c66f
docker logs -f kind-external-load-balancer
[WARNING] 094/172723 (1) : config : missing timeouts for frontend 'controlPlane'.
   | While not properly invalid, you will certainly encounter various problems
   | with such a configuration. To fix this, please ensure that all following
   | timeouts are set to a non-zero value: 'client', 'connect', 'server'.
[NOTICE] 094/172723 (1) : haproxy version is 2.2.9-2+deb11u4
[NOTICE] 094/172723 (1) : path to executable is /usr/sbin/haproxy
[ALERT] 094/172723 (1) : Not enough memory to allocate 1073741816 entries for fdtab!
[ALERT] 094/172723 (1) : No polling mechanism available.
  It is likely that haproxy was built with TARGET=generic and that FD_SETSIZE
  is too low on this platform to support maxconn and the number of listeners
  and servers. You should rebuild haproxy specifying your system using TARGET=
  in order to support other polling systems (poll, epoll, kqueue) or reduce the
  global maxconn setting to accommodate the system's limitation. For reference,
  FD_SETSIZE=1024 on this system, global.maxconn=536870885 resulting in a maximum of
  1073741816 file descriptors. You should thus reduce global.maxconn by 536870396. Also,
  check build settings using 'haproxy -vv'.

[WARNING] 094/172828 (1) : config : missing timeouts for frontend 'controlPlane'.
   | While not properly invalid, you will certainly encounter various problems
   | with such a configuration. To fix this, please ensure that all following
   | timeouts are set to a non-zero value: 'client', 'connect', 'server'.
[NOTICE] 094/172828 (1) : haproxy version is 2.2.9-2+deb11u4
[NOTICE] 094/172828 (1) : path to executable is /usr/sbin/haproxy
[ALERT] 094/172828 (1) : Not enough memory to allocate 1073741816 entries for fdtab!
[ALERT] 094/172828 (1) : No polling mechanism available.
  It is likely that haproxy was built with TARGET=generic and that FD_SETSIZE
  is too low on this platform to support maxconn and the number of listeners
  and servers. You should rebuild haproxy specifying your system using TARGET=
  in order to support other polling systems (poll, epoll, kqueue) or reduce the
  global maxconn setting to accommodate the system's limitation. For reference,
  FD_SETSIZE=1024 on this system, global.maxconn=536870885 resulting in a maximum of
  1073741816 file descriptors. You should thus reduce global.maxconn by 536870396. Also,
  check build settings using 'haproxy -vv'.

@BenTheElder
Member

Both my coworker (running Fedora) and I (running Arch) are still experiencing this issue when we attempt to create a cluster, even though I'm using the latest release version, which mentions the fix for this being included. Looks like the image version is hard-coded into kind and hasn't been updated to reflect that change, perhaps? (Edit: I manually retagged the newest image (v20230330-2f738c2) as the older image that's referenced in the code (v20230227-d46f45b6) and confirmed that it works now; the hardcoded image version definitely needs to be updated!)

Yes, this is #3159

@BenTheElder
Member

BenTheElder commented Apr 5, 2023

If you use https://kind.sigs.k8s.io/docs/user/quick-start/#installing-from-source with either @latest / @main (go install) or a source checkout from just now, it will contain this fix early.
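For example, something along the lines of the following (per the quick-start page linked above) should pull in the fix from the main branch:

go install sigs.k8s.io/kind@main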

Otherwise it will be rolled up in the next release, TBD.

If you don't mind: Can you share your use case for multiple-control-plane nodes?
This is a relatively rarely used feature and usually not applicable in kind; it's something that needs more attention in the future, and I'd like to make sure we drive improvements with concrete use cases in mind.
