remove client QPS limit #95825

chensheng0 · 2020-10-23T10:06:27Z

What type of PR is this?

/kind bug

What this PR does / why we need it:

We use client-go in our cloud server to interact with a k8s cluster. When many requests reach our server, we allocate one goroutine per request, and each request handler uses a client-go client to call the k8s apiserver. After a while, we found that many goroutines were stuck in r.tryThrottle() in Do(), which caused a goroutine leak in our server.

// func (r *Request) Do(ctx context.Context) Result { ... }

if err := r.tryThrottle(ctx); err != nil {
	return err
}

The client's QPS limit affects client-go's performance. How about using no rate limiting when creating a new client?

The API server already has --max-mutating-requests-inflight and --max-requests-inflight to protect itself from being overloaded by client requests, and in most scenarios the API server itself uses this client once a client request has been accepted. So I think we should remove the client QPS limit to raise client-go's concurrency.

Otherwise many developers will get stuck on this bug.

This PR is similar to the fix for the same bug in the apiserver: #80465

Which issue(s) this PR fixes:

The same bug as the one mentioned in #78766

Special notes for your reviewer:

Does this PR introduce a user-facing change?

None

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.

None

NONE

k8s-ci-robot · 2020-10-23T10:06:31Z

Thanks for your pull request. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please follow instructions at https://git.k8s.io/community/CLA.md#the-contributor-license-agreement to sign the CLA.

It may take a couple minutes for the CLA signature to be fully registered; after that, please reply here with a new comment and we'll verify. Thanks.

If you've already signed a CLA, it's possible we don't have your GitHub username or you're using a different email address. Check your existing CLA data and verify that your email is set on your git commits.
If you signed the CLA as a corporation, please sign in with your organization's credentials at https://identity.linuxfoundation.org/projects/cncf to be authorized.
If you have done the above and are still having issues with the CLA being reported as unsigned, please log a ticket with the Linux Foundation Helpdesk: https://support.linuxfoundation.org/
Should you encounter any issues with the Linux Foundation Helpdesk, send a message to the backup e-mail support address at: login-issues@jira.linuxfoundation.org

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

k8s-ci-robot · 2020-10-23T10:06:35Z

Welcome @chensheng0!

It looks like this is your first PR to kubernetes/kubernetes 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes/kubernetes has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

k8s-ci-robot · 2020-10-23T10:06:36Z

Hi @chensheng0. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot · 2020-10-23T10:07:34Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: chensheng0
To complete the pull request process, please assign lavalamp after the PR has been reviewed.
You can assign the PR to them by writing /assign @lavalamp in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

staging/src/k8s.io/client-go/OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

chensheng0 · 2020-10-23T14:34:14Z

I signed it

fedebongio · 2020-10-27T20:07:38Z

/assign @DirectXMan12
Solly, I believe you were looking at something related? Could you take a look at this one? Thanks!
/triage accepted

deads2k · 2020-10-27T20:25:46Z

The rate limiter can be adjusted by code level integrations and many commands have flags that can specify these values. The ratelimiter itself has worked well and as intended in our code. The max inflight doesn't prioritize requests, so a single accidental actor can still flood the apiserver and this helps.

Rather than remove this, your code could choose to disable it (or maybe you have a bug and want it on?) or perhaps you could make the implementation of the ratelimiter more efficient.

/hold

chensheng0 · 2020-10-28T02:43:01Z

> The rate limiter can be adjusted by code level integrations and many commands have flags that can specify these values. The ratelimiter itself has worked well and as intended in our code. The max inflight doesn't prioritize requests, so a single accidental actor can still flood the apiserver and this helps.
>
> Rather than remove this, your code could choose to disable it (or maybe you have a bug and want it on?) or perhaps you could make the implementation of the ratelimiter more efficient.
>
> /hold

The question is whether the client should limit QPS by default. For stability, it should.

In practice, though, the problem we most often run into is that many developers overlook the QPS limit, which leads to a series of problems such as goroutine leaks. @aojea also mentioned rate limiting in openshift/origin#25606. So how about adding some tips to client-go's README or elsewhere so that developers notice it?

aojea · 2020-10-28T07:45:32Z

Absolutely agree with @deads2k
I think that documentation is the way to go ...

chensheng0 · 2020-10-28T12:19:14Z

> Absolutely agree with @deads2k
> I think that documentation is the way to go ...

Are there any guidelines for the docs? Maybe I can do some of the work ^_^

DirectXMan12 · 2020-10-28T23:30:27Z

FWIW, depending on how you use the client, 5 QPS is rather low, and it's generally not obvious to end-users that client-side throttling is occurring. We've had several users hit this issue, and had to manually raise the default in CR, just like controller-manager does.

I'm not certain about removing the limit entirely, but the fact that:

a) controller-manager doesn't use the default, IIRC
b) controller-runtime doesn't use the default
c) several issues across several repos have been filed about this

suggests that maybe the default or the defaulting strategy is not the best

k8s-ci-robot added kind/bug, do-not-merge/release-note-label-needed, size/XS labels Oct 23, 2020

k8s-ci-robot added cncf-cla: no, needs-sig, needs-triage labels Oct 23, 2020

k8s-ci-robot added needs-ok-to-test, needs-priority labels Oct 23, 2020

k8s-ci-robot added sig/api-machinery and removed needs-sig labels Oct 23, 2020

k8s-ci-robot requested review from juanvallejo and nikhiljindal October 23, 2020 10:07

chensheng0 force-pushed the master branch 3 times, most recently from 43d7c92 to 89b3831 October 23, 2020 14:22

remove client QPS limit

b4b61f5

chensheng0 force-pushed the master branch from 89b3831 to b4b61f5 October 23, 2020 14:30

k8s-ci-robot added cncf-cla: yes and removed cncf-cla: no labels Oct 23, 2020

aojea mentioned this pull request Oct 23, 2020

Bug 1886620: deflake e2e test "Application behind service load balancer with PDB is not disrupted " openshift/origin#25606

Merged

chensheng0 mentioned this pull request Oct 26, 2020

Kubernetes Contribution Guide - 面向信仰编程 · /kubernetes-contributor draveness/blog-comments#205

Closed

k8s-ci-robot assigned DirectXMan12 Oct 27, 2020

k8s-ci-robot added triage/accepted and removed needs-triage labels Oct 27, 2020

chensheng0 closed this Jan 6, 2021
