
OCPBUGS-21803: haproxy-template: Add 'no strict-limits' to address HAProxy 2.6 issue #527

Conversation


@frobware frobware commented Oct 18, 2023

In HAProxy 2.6, the 'strict limits' setting, which enforces resource
limits (e.g., the number of open files), became the default behaviour.
Consequently, any failure to configure resource limits now results in
fatal errors. This change introduces 'no strict-limits' in the
template to revert to the behaviour of HAProxy 2.2. This is necessary
to ensure the router starts successfully when a maxconn setting cannot
be satisfied.
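For illustration, the directive lands in the template's global section. A minimal sketch of the intent; the surrounding settings shown here are illustrative, not the template's exact contents:

```haproxy
global
  # Revert to the pre-2.6 behaviour: emit a warning instead of failing
  # fatally when the computed setrlimit cannot be applied.
  no strict-limits
  maxconn 2000000
```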

The HAProxy documentation for 'strict-limits' states:

    Makes process fail at startup when a setrlimit fails. HAProxy
    tries to set the best setrlimit according to what has been
    calculated. If it fails, it will emit a warning. This option is
    here to guarantee an explicit failure of HAProxy when those limits
    fail. It is enabled by default. It may still be forcibly disabled
    by prefixing it with the "no" keyword.

For example, and without this change, if you tune the
IngressController to have maxConnections: 2000000, and if the
requirement for 2000000 connections cannot be satisfied at runtime,
then the router pod will log the following message, and the HAProxy
process will exit with:

    sh-4.4$ ../reload-haproxy
    [NOTICE]   (62) : haproxy version is 2.6.13-234aa6d
    [NOTICE]   (62) : path to executable is /usr/sbin/haproxy
    [ALERT]    (62) : [/usr/sbin/haproxy.main()] Cannot raise FD limit to 4000237, limit is 1048576.

Setting 'no strict-limits' changes this to:

    sh-4.4$ ../reload-haproxy
    [NOTICE]   (50) : haproxy version is 2.6.13-234aa6d
    [NOTICE]   (50) : path to executable is /usr/sbin/haproxy
    [WARNING]  (50) : [/usr/sbin/haproxy.main()] Cannot raise FD limit to 4000237, limit is 1048576.
    [ALERT]    (50) : [/usr/sbin/haproxy.main()] FD limit (1048576) too low for maxconn=2000000/maxsock=4000237. Please raise 'ulimit-n' to 4000237 or more to avoid any trouble.
     - Checking http://localhost:80 ...
     - Health check ok : 0 retry attempt(s).

And, despite the ALERT, the haproxy process will not exit.
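As a sanity check of the numbers in the logs above: HAProxy needs two file descriptors per connection (client side and server side), plus a small configuration-dependent overhead for listeners, health checks, pipes, and so on. The overhead of 237 here is read off the log line, not a fixed constant:

```python
# Reconstruct the maxsock value reported in the ALERT from maxconn.
maxconn = 2_000_000
overhead = 237  # from "maxsock=4000237" minus 2 * maxconn; config-dependent
maxsock = 2 * maxconn + overhead
print(maxsock)  # 4000237, matching the FD limit HAProxy tried to set
```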

I always recommend using -1 (or 'auto') for the maxConnections setting
if you want to increase the limit beyond the current default. This
choice results in HAProxy dynamically calculating the maximum value
based on the available resource limits in the running container.
Importantly, when using the 'auto' setting, there are no warnings
emitted, regardless of whether 'no strict-limits' is used or not.
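That recommendation looks roughly like this on an IngressController; the resource name and namespace are the usual defaults and are illustrative here:

```yaml
apiVersion: operator.openshift.io/v1
kind: IngressController
metadata:
  name: default
  namespace: openshift-ingress-operator
spec:
  tuningOptions:
    # -1 lets HAProxy derive maxconn from the container's actual
    # resource limits, avoiding the warnings described above.
    maxConnections: -1
```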

This solution will be re-evaluated after the 4.14.0 release.

Fixes: https://issues.redhat.com/browse/OCPBUGS-21803

@frobware frobware changed the title haproxy-template: Add 'no strict-limits' to address HAProxy 2.6 issue OCPBUGS-21803: haproxy-template: Add 'no strict-limits' to address HAProxy 2.6 issue Oct 18, 2023
@openshift-ci-robot openshift-ci-robot added jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Oct 18, 2023
@openshift-ci-robot

@frobware: This pull request references Jira Issue OCPBUGS-21803, which is invalid:

  • expected the bug to target the "4.15.0" version, but it targets "4.14.0" instead

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@frobware

/jira refresh

@openshift-ci-robot openshift-ci-robot added jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. and removed jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Oct 18, 2023
@openshift-ci-robot

@frobware: This pull request references Jira Issue OCPBUGS-21803, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.15.0) matches configured target version for branch (4.15.0)
  • bug is in the state New, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @ShudiLi


@openshift-ci-robot

@frobware: This pull request references Jira Issue OCPBUGS-21803, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.15.0) matches configured target version for branch (4.15.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @ShudiLi


frobware added a commit to frobware/cluster-ingress-operator that referenced this pull request Oct 18, 2023
This commit adds a test case to verify that the maxConnections setting
can be configured with the absolute upper limit of 2000000, as defined
by the API

% oc explain ingresscontroller.spec.tuningOptions.maxConnections
KIND:     IngressController
VERSION:  operator.openshift.io/v1

FIELD:    maxConnections <integer>

DESCRIPTION:
     maxConnections defines the maximum number of simultaneous
     connections that can be established per HAProxy process.
     Increasing this value allows each ingress controller pod to
     handle more connections but at the cost of additional system
     resources being consumed. Permitted values are: empty, 0, -1, and
     the range 2000-2000000.

This change requires openshift/router#527.

Fixes: https://issues.redhat.com/browse/OCPBUGS-21803.
@frobware

Added test case for 2,000,000 max connections in openshift/cluster-ingress-operator#983.

@ShudiLi

ShudiLi commented Oct 18, 2023

Ran the case successfully:

    % oc get clusterversion
    NAME      VERSION                                                   AVAILABLE   PROGRESSING   SINCE   STATUS
    version   4.14.0-0.ci.test-2023-10-18-092000-ci-ln-27xjin2-latest   True        False         62m     Cluster version is 4.14.0-0.ci.test-2023-10-18-092000-ci-ln-27xjin2-latest

    % ./bin/extended-platform-tests run all --dry-run | grep 50926 | ./bin/extended-platform-tests run -f -
    ...
    passed: (12m6s) 2023-10-18T10:34:32 "[sig-network-edge] Network_Edge should Author:shudili-NonPreRelease-Longduration-High-50926-Support a Configurable ROUTER_MAX_CONNECTIONS in HAproxy"

    1 pass, 0 skip (12m6s)

/qe-approved
thanks

@openshift-ci

openshift-ci bot commented Oct 18, 2023

@frobware: all tests passed!


frobware added a commit to frobware/cluster-ingress-operator that referenced this pull request Oct 18, 2023
@Miciah

Miciah commented Oct 18, 2023

Thanks!
/approve
/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Oct 18, 2023
@openshift-ci

openshift-ci bot commented Oct 18, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Miciah


@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 18, 2023
@openshift-ci openshift-ci bot merged commit 9abf519 into openshift:master Oct 18, 2023
8 checks passed
@openshift-ci-robot

@frobware: Jira Issue OCPBUGS-21803: Some pull requests linked via external trackers have merged:

The following pull requests linked via external trackers have not merged:

These pull requests must merge or be unlinked from the Jira bug in order for it to move to the next state. Once unlinked, request a bug refresh with /jira refresh.

Jira Issue OCPBUGS-21803 has not been moved to the MODIFIED state.


@frobware

/cherry-pick release-4.14

@openshift-cherrypick-robot

@frobware: new pull request created: #528


openshift-cherrypick-robot pushed a commit to openshift-cherrypick-robot/cluster-ingress-operator that referenced this pull request Oct 18, 2023
@openshift-merge-robot

Fix included in accepted release 4.15.0-0.nightly-2023-10-24-230302

@frobware frobware deleted the OCPBUGS-21803-Ingress-stuck-in-progressing-when-maxConnections-increased-to-2000000 branch May 1, 2024 12:03