
Performance Issue: Ingress creation in a cluster with many existing ingresses gets a slow admission webhook response #11115

Closed
bysph opened this issue Mar 13, 2024 · 11 comments
Labels: kind/bug · lifecycle/frozen · needs-priority · needs-triage

bysph commented Mar 13, 2024

What happened:
When creating a new ingress in a cluster with over 1000 ingresses, the operation takes more than 3 seconds.

Even worse, if I create ingresses concurrently (150+), the response time of the ingress controller increases to 10+ seconds, which causes the API server request to time out.

In real-world scenarios we concurrently submit a large number of Spark tasks (because they are scheduled to run at specific times), so this issue can cause some task submissions to fail.

What you expected to happen:
I believe that processing time should ideally be within a few hundred milliseconds.

I have reviewed the admission webhook's code and noticed that it validates not only the new ingress but also all existing ingresses, potentially causing longer processing times.

The PR for the admission webhook (#3802) confirms this is an intentional design. I'd like to understand the reasoning behind this decision and whether you have considered switching to the other option mentioned by the original author:

Another option to get better performances could be to drop the test with all pre-existing ingresses, which could probably be good enough to detect most configuration errors (syntax errors mainly).
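
For reference, a quick way to observe this latency end to end is to time a single Ingress creation; the manifest name below is hypothetical, and the wall-clock time includes the validating webhook round trip:

# Times one Ingress create; with the webhook registered, this includes
# the controller's full configuration test against all existing Ingresses.
time kubectl apply -f test-ingress.yaml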

NGINX Ingress controller version (exec into the pod and run nginx-ingress-controller --version.):
v0.43.0

Kubernetes version (use kubectl version):
v1.19.3

bysph added the kind/bug label Mar 13, 2024
k8s-ci-robot (Contributor) commented

This issue is currently awaiting triage.

If Ingress contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot added the needs-triage and needs-priority labels Mar 13, 2024
bysph (Author) commented Mar 13, 2024

@tjamet Hello, could you please help me take a look at this issue?

longwuyuan (Contributor) commented

What is the output of kubectl -n ingress-nginx describe po | grep -i image:

bysph (Author) commented Mar 13, 2024

kubectl -n ingress-nginx describe po | grep -i image:

{our-internal-harbor-address}/library/ingress-nginx/controller:v0.43.0@sha256:2b29d459bb978bc773fcfc824a837bb52d6223aba342411d22d61522e58f811b

longwuyuan (Contributor) commented Mar 13, 2024 via email

longwuyuan (Contributor) commented
/assign

Gacko (Member) commented Mar 27, 2024

Ingress NGINX always does a full test of all Ingresses: it compiles all of them together and checks whether the resulting nginx.conf is valid. You can disable this behavior via the --disable-full-test flag: https://kubernetes.github.io/ingress-nginx/user-guide/cli-arguments/
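
As a minimal sketch of applying that flag, assuming a Helm-managed install with the default release and namespace names (the chart renders controller.extraArgs keys as controller CLI flags; an empty value should render as a bare flag, though this is chart-version dependent):

# Hedged sketch: pass --disable-full-test to the controller via the chart's
# extraArgs map, keeping all other values as they are.
helm upgrade ingress-nginx ingress-nginx/ingress-nginx \
  --namespace ingress-nginx \
  --reuse-values \
  --set controller.extraArgs.disable-full-test=""

Note the trade-off: with the full test disabled, the webhook only tests the template of the Ingress being created or updated, so conflicts that only appear when all Ingresses are merged may surface later at reload time instead of at admission.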

strongjz (Member) commented

controller:v0.43.0 is an extremely old version; please upgrade (a sketch of the upgrade command follows the linked PRs below).

This is a known issue, and there are several things we are trying to do to help with it:

#11046

#10884
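
As a hedged sketch of the upgrade itself, again assuming a Helm-managed install with the common release and namespace names (in production, pin a specific chart version rather than taking the latest):

# Hedged sketch: refresh the chart repo and upgrade the controller in place,
# preserving the existing values.
helm repo update
helm upgrade ingress-nginx ingress-nginx/ingress-nginx \
  --namespace ingress-nginx \
  --reuse-values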

github-actions (bot) commented

This is stale, but we won't close it automatically; just bear in mind that the maintainers may be busy with other tasks and will reach your issue as soon as possible. If you have any question or request to prioritize this, please reach out in #ingress-nginx-dev on Kubernetes Slack.

github-actions bot added the lifecycle/frozen label Apr 28, 2024
longwuyuan (Contributor) commented

Please re-open the issue when you are ready to re-engage and move forward on this. Thanks. For now we are closing the issue because there has been no traction for a long time and open issues are piling up.

/close

k8s-ci-robot (Contributor) commented
@longwuyuan: Closing this issue.

In response to this:

Please re-open the issue when you are ready to re-engage and move forward on this. Thanks. For now we are closing the issue because there has been no traction for a long time and open issues are piling up.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
