remove client QPS limit #95825

@chensheng0 chensheng0 commented Oct 23, 2020

What type of PR is this?

/kind bug

What this PR does / why we need it:

We recently started using client-go in a cloud server that interacts with a k8s cluster. When many requests reach our server, we allocate one goroutine per request, and each request uses a client-go client to call the k8s apiserver. After a while, we found that many goroutines were stuck in r.tryThrottle() inside Do(), which caused a goroutine leak in our server.

// func (r *Request) Do(ctx context.Context) Result { ... }

if err := r.tryThrottle(ctx); err != nil {
	return err
}
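
To make the pile-up concrete, here is a minimal sketch of how long callers wait behind a token-bucket limiter. It assumes the bucket starts full, a refill rate of 5 QPS (the default figure quoted elsewhere in this thread) and a burst of 10 (an assumption about the client-go default). The function name `queueDelay` is hypothetical and is not part of client-go.

```go
package main

import (
	"fmt"
	"time"
)

// queueDelay estimates how long the k-th request (1-indexed) of a simultaneous
// batch waits for a token in a token bucket that starts full.
// The first `burst` requests pass immediately; every later request waits for
// a token that is refilled at `qps` tokens per second.
func queueDelay(k, burst int, qps float64) time.Duration {
	if k <= burst {
		return 0
	}
	seconds := float64(k-burst) / qps
	return time.Duration(seconds * float64(time.Second))
}

func main() {
	// Assumed values: burst of 10 and 5 QPS.
	for _, k := range []int{10, 100, 1000} {
		fmt.Printf("request %4d waits about %v\n", k, queueDelay(k, 10, 5))
	}
}
```

Under these assumptions the wait grows linearly with the number of queued requests, so a server that spawns one goroutine per incoming request can accumulate blocked goroutines faster than the limiter releases them.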

The client's QPS limit hurts client-go's performance. How about not applying any rate limiting when a new client is created?

The API server already has --max-mutating-requests-inflight and --max-requests-inflight to protect itself from being overloaded by client requests, and in most cases the API server itself uses this client once a client request has been accepted. So I think we should remove the client QPS limit to increase client-go's concurrency.

Otherwise many developers will get stuck on this bug.

This PR is similar to the fix for the same bug in the apiserver: #80465

Which issue(s) this PR fixes:

The same bug as the one mentioned in #78766

Special notes for your reviewer:

Does this PR introduce a user-facing change?

None

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.

None
NONE

@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Oct 23, 2020
@k8s-ci-robot
Contributor

Thanks for your pull request. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please follow instructions at https://git.k8s.io/community/CLA.md#the-contributor-license-agreement to sign the CLA.

It may take a couple minutes for the CLA signature to be fully registered; after that, please reply here with a new comment and we'll verify. Thanks.



@k8s-ci-robot k8s-ci-robot added cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Oct 23, 2020
@k8s-ci-robot k8s-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Oct 23, 2020
@k8s-ci-robot k8s-ci-robot added sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Oct 23, 2020
@chensheng0
Author

chensheng0 commented Oct 23, 2020

I signed it


@fedebongio
Contributor

/assign @DirectXMan12
Solly, I believe you were looking at something related? Could you take a look at this one? Thanks!
/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Oct 27, 2020
@deads2k
Contributor

deads2k commented Oct 27, 2020

The rate limiter can be adjusted by code level integrations and many commands have flags that can specify these values. The ratelimiter itself has worked well and as intended in our code. The max inflight doesn't prioritize requests, so a single accidental actor can still flood the apiserver and this helps.

Rather than remove this, your code could choose to disable it (or maybe you have a bug and want it on?) or perhaps you could make the implementation of the ratelimiter more efficient.

/hold
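
The two options in this comment (tune the limit, or switch it off) can be sketched as follows. This is a standard-library-only model, not client-go code: `clientConfig` and `effectiveLimit` are hypothetical names, and the defaulting rules shown (zero means "use the default", defaults of 5 QPS and burst 10, negative QPS means no limiter) are assumptions about how the `QPS` and `Burst` fields of `rest.Config` behave. Check them against the client-go version in use.

```go
package main

import "fmt"

// Assumed client-go defaults at the time of this thread.
const (
	defaultQPS   = 5.0
	defaultBurst = 10
)

// clientConfig is a hypothetical stand-in for the QPS and Burst fields of rest.Config.
type clientConfig struct {
	QPS   float32
	Burst int
}

// effectiveLimit returns the limit a client would run with under the assumed
// rules: a negative QPS disables client-side limiting, and zero values fall
// back to the defaults.
func effectiveLimit(c clientConfig) (qps float32, burst int, limited bool) {
	if c.QPS < 0 {
		return 0, 0, false
	}
	qps, burst = c.QPS, c.Burst
	if qps == 0 {
		qps = defaultQPS
	}
	if burst == 0 {
		burst = defaultBurst
	}
	return qps, burst, true
}

func main() {
	configs := []clientConfig{
		{},                    // untouched config: defaults apply
		{QPS: 50, Burst: 100}, // raised limit
		{QPS: -1},             // limiter disabled
	}
	for _, c := range configs {
		qps, burst, limited := effectiveLimit(c)
		fmt.Printf("%+v -> qps=%v burst=%d limited=%v\n", c, qps, burst, limited)
	}
}
```

The point of the model is that a caller who never sets these fields still gets a limiter, which is why the limit can surprise developers who were not aware of it.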

@k8s-ci-robot k8s-ci-robot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. release-note-none Denotes a PR that doesn't merit a release note. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Oct 27, 2020
@chensheng0
Author


The question is whether the client should limit QPS by default. For stability, it should.

But in practice, the problem we run into most is that many developers overlook the QPS limit, which causes a series of problems such as goroutine leaks. @aojea also mentioned rate limiting in openshift/origin#25606. So how about adding some tips to client-go's README or somewhere else so that developers notice it?
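
One tip such documentation could include is to give every request a deadline, so a caller stuck behind the limiter gives up instead of blocking indefinitely. The sketch below illustrates the idea with the standard library only. `slowLimiter`, `callWithTimeout`, and `timedOut` are hypothetical names, and it assumes the limiter's wait honors context cancellation, which should be verified for the real client.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// slowLimiter is a hypothetical limiter whose next token is `delay` away.
type slowLimiter struct{ delay time.Duration }

// Wait blocks until the token is available or ctx is done, whichever comes first.
func (l slowLimiter) Wait(ctx context.Context) error {
	timer := time.NewTimer(l.delay)
	defer timer.Stop()
	select {
	case <-timer.C:
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

// callWithTimeout gives one request at most `timeout` to get through the limiter.
func callWithTimeout(l slowLimiter, timeout time.Duration) error {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()
	return l.Wait(ctx)
}

// timedOut reports whether a request with a 10ms budget gives up on a
// limiter that would otherwise make it wait a full second.
func timedOut() bool {
	err := callWithTimeout(slowLimiter{delay: time.Second}, 10*time.Millisecond)
	return errors.Is(err, context.DeadlineExceeded)
}

func main() {
	fmt.Println("gave up instead of blocking:", timedOut())
}
```

A deadline does not raise throughput, but it bounds how long each goroutine can stay blocked, which turns a silent leak into an explicit error the caller can handle.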

@aojea
Member

aojea commented Oct 28, 2020

Absolutely agree with @deads2k
I think that documentation is the way to go ...

@chensheng0
Author


Are there any guidelines for the docs? Maybe I can do some of the work ^_^

@DirectXMan12
Contributor

FWIW, depending on how you use the client, 5 QPS is rather low, and it's generally not obvious to end-users that client-side throttling is occurring. We've had several users hit this issue, and had to manually raise the default in CR, just like controller-manager does.

I'm not certain about removing the limit entirely, but the fact that:

a) controller-manager doesn't use the default, IIRC
b) controller-runtime doesn't use the default
c) several issues across several repos have been filed about this

suggests that maybe the default or the defaulting strategy is not the best
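
One way to make client-side throttling more visible to end users is to wrap the limiter and report any wait that exceeds a threshold. The following is a minimal sketch of that idea using only the standard library. The `waiter` interface, `reportingWaiter`, `fixedDelay`, and `slowWaits` are all hypothetical names and do not describe what client-go or controller-runtime actually do.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// waiter is a hypothetical minimal interface for a blocking rate limiter.
type waiter interface {
	Wait(ctx context.Context) error
}

// reportingWaiter wraps a waiter and calls onSlow whenever a caller was
// delayed for longer than threshold.
type reportingWaiter struct {
	inner     waiter
	threshold time.Duration
	onSlow    func(waited time.Duration)
}

func (r reportingWaiter) Wait(ctx context.Context) error {
	start := time.Now()
	err := r.inner.Wait(ctx)
	if waited := time.Since(start); waited > r.threshold {
		r.onSlow(waited)
	}
	return err
}

// fixedDelay is a stand-in limiter that delays every caller by d, or returns
// early if the context is cancelled first.
type fixedDelay struct{ d time.Duration }

func (f fixedDelay) Wait(ctx context.Context) error {
	select {
	case <-time.After(f.d):
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

// slowWaits sends three calls through a limiter that delays each by 20ms and
// returns how many of them were reported as slower than 5ms.
func slowWaits() int {
	reported := 0
	w := reportingWaiter{
		inner:     fixedDelay{d: 20 * time.Millisecond},
		threshold: 5 * time.Millisecond,
		onSlow:    func(time.Duration) { reported++ },
	}
	for i := 0; i < 3; i++ {
		_ = w.Wait(context.Background())
	}
	return reported
}

func main() {
	fmt.Println("slow waits reported:", slowWaits())
}
```

In a real program the `onSlow` callback would typically log or increment a metric, so operators can tell throttled requests apart from a slow apiserver.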

@chensheng0 chensheng0 closed this Jan 6, 2021
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. kind/bug Categorizes issue or PR as related to a bug. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. release-note-none Denotes a PR that doesn't merit a release note. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.
6 participants