
OCPBUGS-21803: haproxy-template: Add 'no strict-limits' to address HAProxy 2.6 issue #527

Conversation


@frobware frobware commented Oct 18, 2023

In HAProxy 2.6, the 'strict limits' setting, which enforces resource
limits (e.g., the number of open files), became the default behaviour.
Consequently, any failure to configure resource limits now results in
fatal errors. This change introduces 'no strict-limits' in the
template to revert to the behaviour of HAProxy 2.2. This is necessary
to ensure the router starts successfully when a maxconn setting cannot
be satisfied.
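For illustration, the directive lands in the template's global section. A minimal sketch of the intent; the surrounding settings shown here are illustrative, not the template's exact contents:

```haproxy
global
  # Revert to the pre-2.6 behaviour: emit a warning instead of failing
  # fatally when the computed setrlimit cannot be applied.
  no strict-limits
  maxconn 2000000
```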

The HAProxy documentation for 'strict-limits' states:

    Makes process fail at startup when a setrlimit fails. HAProxy
    tries to set the best setrlimit according to what has been
    calculated. If it fails, it will emit a warning. This option is
    here to guarantee an explicit failure of HAProxy when those limits
    fail. It is enabled by default. It may still be forcibly disabled
    by prefixing it with the "no" keyword.

For example, and without this change, if you tune the
IngressController to have maxConnections: 2000000, and if the
requirement for 2000000 connections cannot be satisfied at runtime,
then the router pod will log the following message, and the HAProxy
process will exit with:

    sh-4.4$ ../reload-haproxy
    [NOTICE]   (62) : haproxy version is 2.6.13-234aa6d
    [NOTICE]   (62) : path to executable is /usr/sbin/haproxy
    [ALERT]    (62) : [/usr/sbin/haproxy.main()] Cannot raise FD limit to 4000237, limit is 1048576.

Setting 'no strict-limits' changes this to:

    sh-4.4$ ../reload-haproxy
    [NOTICE]   (50) : haproxy version is 2.6.13-234aa6d
    [NOTICE]   (50) : path to executable is /usr/sbin/haproxy
    [WARNING]  (50) : [/usr/sbin/haproxy.main()] Cannot raise FD limit to 4000237, limit is 1048576.
    [ALERT]    (50) : [/usr/sbin/haproxy.main()] FD limit (1048576) too low for maxconn=2000000/maxsock=4000237. Please raise 'ulimit-n' to 4000237 or more to avoid any trouble.
     - Checking http://localhost:80 ...
     - Health check ok : 0 retry attempt(s).

And, despite the ALERT, the haproxy process will not exit.
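As a sanity check of the numbers in the logs above: HAProxy needs two file descriptors per connection (client side and server side), plus a small configuration-dependent overhead for listeners, health checks, pipes, and so on. The overhead of 237 here is read off the log line, not a fixed constant:

```python
# Reconstruct the maxsock value reported in the ALERT from maxconn.
maxconn = 2_000_000
overhead = 237  # from "maxsock=4000237" minus 2 * maxconn; config-dependent
maxsock = 2 * maxconn + overhead
print(maxsock)  # 4000237, matching the FD limit HAProxy tried to set
```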

I always recommend using -1 (or 'auto') for the maxConnections setting
if you want to increase the limit beyond the current default. This
choice results in HAProxy dynamically calculating the maximum value
based on the available resource limits in the running container.
Importantly, when using the 'auto' setting, there are no warnings
emitted, regardless of whether 'no strict-limits' is used or not.
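That recommendation looks roughly like this on an IngressController; the resource name and namespace are the usual defaults and are illustrative here:

```yaml
apiVersion: operator.openshift.io/v1
kind: IngressController
metadata:
  name: default
  namespace: openshift-ingress-operator
spec:
  tuningOptions:
    # -1 lets HAProxy derive maxconn from the container's actual
    # resource limits, avoiding the warnings described above.
    maxConnections: -1
```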

This solution will be re-evaluated after the 4.14.0 release.

Fixes: https://issues.redhat.com/browse/OCPBUGS-21803

@frobware frobware changed the title haproxy-template: Add 'no strict-limits' to address HAProxy 2.6 issue OCPBUGS-21803: haproxy-template: Add 'no strict-limits' to address HAProxy 2.6 issue Oct 18, 2023
@openshift-ci-robot openshift-ci-robot added jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Oct 18, 2023
@openshift-ci-robot

@frobware: This pull request references Jira Issue OCPBUGS-21803, which is invalid:

  • expected the bug to target the "4.15.0" version, but it targets "4.14.0" instead

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@frobware

/jira refresh

@openshift-ci-robot openshift-ci-robot added jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. and removed jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Oct 18, 2023
@openshift-ci-robot

@frobware: This pull request references Jira Issue OCPBUGS-21803, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.15.0) matches configured target version for branch (4.15.0)
  • bug is in the state New, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @ShudiLi


@openshift-ci-robot

@frobware: This pull request references Jira Issue OCPBUGS-21803, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.15.0) matches configured target version for branch (4.15.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @ShudiLi


frobware added a commit to frobware/cluster-ingress-operator that referenced this pull request Oct 18, 2023
This commit adds a test case to verify that the maxConnections setting
can be configured with the absolute upper limit of 2000000, as defined
by the API

% oc explain ingresscontroller.spec.tuningOptions.maxConnections
KIND:     IngressController
VERSION:  operator.openshift.io/v1

FIELD:    maxConnections <integer>

DESCRIPTION:
     maxConnections defines the maximum number of simultaneous
     connections that can be established per HAProxy process.
     Increasing this value allows each ingress controller pod to
     handle more connections but at the cost of additional system
     resources being consumed. Permitted values are: empty, 0, -1, and
     the range 2000-2000000.

This change requires openshift/router#527.

Fixes: https://issues.redhat.com/browse/OCPBUGS-21803.
@frobware

Added test case for 2,000,000 max connections in openshift/cluster-ingress-operator#983.

@ShudiLi

ShudiLi commented Oct 18, 2023

Ran the case successfully:

    % oc get clusterversion
    NAME      VERSION                                                   AVAILABLE   PROGRESSING   SINCE   STATUS
    version   4.14.0-0.ci.test-2023-10-18-092000-ci-ln-27xjin2-latest   True        False         62m     Cluster version is 4.14.0-0.ci.test-2023-10-18-092000-ci-ln-27xjin2-latest

    % ./bin/extended-platform-tests run all --dry-run | grep 50926 | ./bin/extended-platform-tests run -f -
    ...
    passed: (12m6s) 2023-10-18T10:34:32 "[sig-network-edge] Network_Edge should Author:shudili-NonPreRelease-Longduration-High-50926-Support a Configurable ROUTER_MAX_CONNECTIONS in HAproxy"

    1 pass, 0 skip (12m6s)

/qe-approved
thanks

@openshift-ci

openshift-ci bot commented Oct 18, 2023

@frobware: all tests passed!


frobware added a commit to frobware/cluster-ingress-operator that referenced this pull request Oct 18, 2023
@Miciah

Miciah commented Oct 18, 2023

Thanks!
/approve
/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Oct 18, 2023
@openshift-ci

openshift-ci bot commented Oct 18, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Miciah


@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 18, 2023
@openshift-ci openshift-ci bot merged commit 9abf519 into openshift:master Oct 18, 2023
8 checks passed
@openshift-ci-robot

@frobware: Jira Issue OCPBUGS-21803: Some pull requests linked via external trackers have merged:

The following pull requests linked via external trackers have not merged:

These pull requests must merge or be unlinked from the Jira bug in order for it to move to the next state. Once unlinked, request a bug refresh with /jira refresh.

Jira Issue OCPBUGS-21803 has not been moved to the MODIFIED state.


@frobware

/cherry-pick release-4.14

@openshift-cherrypick-robot

@frobware: new pull request created: #528


openshift-cherrypick-robot pushed a commit to openshift-cherrypick-robot/cluster-ingress-operator that referenced this pull request Oct 18, 2023
@openshift-merge-robot

Fix included in accepted release 4.15.0-0.nightly-2023-10-24-230302

@frobware frobware deleted the OCPBUGS-21803-Ingress-stuck-in-progressing-when-maxConnections-increased-to-2000000 branch May 1, 2024 12:03