
Bump discovery burst for kubectl to 300 #105520

Merged: 2 commits merged into kubernetes:master on Nov 17, 2021

Conversation

@soltysh (Contributor) commented on Oct 6, 2021

What type of PR is this?

/kind cleanup
/sig cli
/priority backlog

What this PR does / why we need it:

This bumps the discovery burst for the kubectl command from 100, defined in

// The more groups you have, the more discovery requests you need to make.
// given 25 groups (our groups + a few custom resources) with one-ish version each, discovery needs to make 50 requests
// double it just so we don't end up here again for a while. This config is only used for discovery.
discoveryBurst: 100,
to 150.

Which issue(s) this PR fixes:

Fixes kubernetes/kubectl#1126

Special notes for your reviewer:

/assign @seans3 @lavalamp @justinsb

Does this PR introduce a user-facing change?

NONE
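
For context, here is a minimal sketch (not the literal kubectl wiring) of how discovery client limits like the one discussed here are configured through cli-runtime's genericclioptions.ConfigFlags, assuming a cli-runtime version that has the WithDiscoveryQPS helper; the 300/50.0 values match the merged v1.23 line quoted later in this thread.

```go
// Minimal sketch: configuring kubectl-style discovery limits via
// cli-runtime's ConfigFlags, then issuing a single discovery call.
package main

import (
	"fmt"

	"k8s.io/cli-runtime/pkg/genericclioptions"
)

func main() {
	// Burst controls how many discovery requests may be fired immediately;
	// QPS is the sustained client-side rate once the burst is exhausted.
	flags := genericclioptions.NewConfigFlags(true).
		WithDeprecatedPasswordFlag().
		WithDiscoveryBurst(300).
		WithDiscoveryQPS(50.0)

	// ToDiscoveryClient builds a cached discovery client from the flags
	// (kubeconfig, context, ...) with the limits above applied.
	dc, err := flags.ToDiscoveryClient()
	if err != nil {
		panic(err)
	}

	groups, err := dc.ServerGroups()
	if err != nil {
		panic(err)
	}
	fmt.Printf("discovered %d API groups\n", len(groups.Groups))
}
```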

@k8s-ci-robot added labels on Oct 6, 2021: release-note-none, kind/cleanup, sig/cli, size/XS, priority/backlog, cncf-cla: yes, needs-triage
@soltysh (Contributor, Author) commented on Oct 6, 2021

/triage accept

@k8s-ci-robot (Contributor) replied:

@soltysh: The label(s) triage/accept cannot be applied, because the repository doesn't have them.

In response to this:

/triage accept


@soltysh (Contributor, Author) commented on Oct 6, 2021

/triage accepted

@k8s-ci-robot added the triage/accepted label and removed the needs-triage label on Oct 6, 2021
@k8s-ci-robot added the approved and area/kubectl labels on Oct 6, 2021
@@ -263,9 +263,6 @@ func (f *ConfigFlags) toDiscoveryClient() (discovery.CachedDiscoveryInterface, e
return nil, err
}

// The more groups you have, the more discovery requests you need to make.
// given 25 groups (our groups + a few custom resources) with one-ish version each, discovery needs to make 50 requests
// double it just so we don't end up here again for a while. This config is only used for discovery.
Reviewer (Member):
added in b3dad83 ... I guess 3 years qualifies as "a while"

kubeConfigFlags := genericclioptions.NewConfigFlags(true).WithDeprecatedPasswordFlag()
// The more groups you have, the more discovery requests you need to make.
// given 25 groups (our groups + a few custom resources) with one-ish version each, discovery needs to make 50 requests
// tripple it just so we don't end up here again for a while. This is updated from the
Reviewer (Member):
this burst seems like the kubernetes equivalent of the debt ceiling ... if our response every time we hit it is to raise it, I'm not sure I see the point

Reviewer (Member):
Yeah, I think we should disable this and let APF slow the client if necessary

@soltysh (Contributor, Author) replied on Oct 7, 2021:

sgtm, I'll re-work this to entirely remove this functionality from kubectl

@soltysh (Contributor, Author):
actually, the burst is part of client-go:

// If it's zero, the created RESTClient will use DefaultBurst: 10.

and the default there is even smaller than what we already set in kubectl. With that I'm seeing two options: we either drop it even from client-go (but I'll let you decide that), or we set it artificially big in kubectl, 999, for example. wdyt?
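
For readers following along, here is a minimal sketch of the client-go fallback the quoted comment refers to, assuming client-go's exported defaults rest.DefaultQPS = 5 and rest.DefaultBurst = 10: when QPS and Burst are left at zero on rest.Config, the client-side throttle is a token bucket built from those small defaults.

```go
// Minimal sketch: how zero QPS/Burst on rest.Config fall back to
// client-go's defaults before building the token-bucket throttle.
package main

import (
	"fmt"

	"k8s.io/client-go/rest"
	"k8s.io/client-go/util/flowcontrol"
)

func main() {
	cfg := &rest.Config{Host: "https://example.invalid"} // QPS and Burst left at zero

	qps := cfg.QPS
	if qps == 0 {
		qps = rest.DefaultQPS // 5 requests per second sustained
	}
	burst := cfg.Burst
	if burst == 0 {
		burst = rest.DefaultBurst // 10 requests allowed in a burst
	}

	// Every request waits on this limiter before being sent to the API server.
	limiter := flowcontrol.NewTokenBucketRateLimiter(qps, burst)
	fmt.Printf("effective QPS=%v burst=%v, first request admitted: %v\n",
		qps, burst, limiter.TryAccept())
}
```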

Reviewer (Member):
Setting it to -1 will disable it.

Reviewer (Contributor):
I agree with the core point of this comment that increasing the value is kicking the can down the road.
However, isn't disabling it completely a risk that something like kubectl breaks the API server? That'd be annoying 😄

So I'm unsure whether the real solution shouldn't instead be more like what @justinsb suggested, where we lower the number of API requests needed in the first place.

That being said, the change in this PR probably helps us kick the can a little further down the road and buy more time to implement a root-cause fix.

Reviewer (Contributor), quoting the comment above:
Yeah, I think we should disable this and let APF slow the client if necessary

Without knowing the full details of how APF would handle this, this would be my preference. Along with @justinsb's suggestion that the discovery process potentially be revisited to make fewer requests where possible. Full details in kubernetes/kubectl#1126 (comment), but we have cases where it's possible ~2,000 CRDs may end up installed and I don't think an additional 50 burst qps is going to make a meaningful difference in that situation.

@soltysh (Contributor, Author) commented on Oct 29, 2021

@lavalamp @seans3 disabled it for kubectl, ptal

@negz (Contributor) commented on Oct 30, 2021

I was curious whether this PR would fix issues I've been seeing with discovery taking forever when there are many (hundreds) of CRDs, but found it did not work:

$ KUBECONFIG=~/control/negz/crossplane-scale/cluster-aws.kcfg _output/dockerized/bin/linux/amd64/kubectl get nodes
error: rate: Wait(n=1) exceeds limiter's burst -1

I opened #106016 which takes an alternative approach.
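
The error negz hit comes from the underlying token-bucket limiter (golang.org/x/time/rate, which client-go's flowcontrol limiter is built on): Wait fails whenever the requested token count exceeds the limiter's burst, so a burst of -1 rejects every request rather than disabling throttling. A minimal sketch reproducing it:

```go
// Minimal sketch: a burst of -1 makes x/time/rate reject every Wait call
// with the exact error quoted in the comment above.
package main

import (
	"context"
	"fmt"

	"golang.org/x/time/rate"
)

func main() {
	limiter := rate.NewLimiter(rate.Limit(50), -1) // burst of -1, as in the failing kubectl build
	err := limiter.Wait(context.Background())      // Wait is WaitN(ctx, 1)
	fmt.Println(err)                               // rate: Wait(n=1) exceeds limiter's burst -1
}
```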

@negz (Contributor) commented on Nov 1, 2021

How do folks feel about this (or #106016) as a candidate for cherry-picking? We in @crossplane land have a feature that uses a lot of CRDs and is thus pretty degraded by huge (6+ minute) discovery wait times, so we'd really appreciate being able to get this fix into the hands of our users. That said, I understand that removing client-side rate limits could be a hard sell as a cherry-pick.

@soltysh (Contributor, Author) commented on Nov 2, 2021

/retest

@k8s-ci-robot added the lgtm label on Nov 16, 2021
@k8s-ci-robot (Contributor) commented:

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: eddiezane, soltysh


@k8s-triage-robot commented:

The Kubernetes project has merge-blocking tests that are currently too flaky to consistently pass.

This bot retests PRs for certain kubernetes repos according to the following rules:

  • The PR does not have any do-not-merge/* labels
  • The PR does not have the needs-ok-to-test label
  • The PR is mergeable (does not have a needs-rebase label)
  • The PR is approved (has cncf-cla: yes, lgtm, approved labels)
  • The PR is failing tests required for merge

You can:

/retest

@k8s-ci-robot k8s-ci-robot merged commit 0c47669 into kubernetes:master Nov 17, 2021
@soltysh soltysh deleted the bump_burst branch November 17, 2021 09:38
@jonnylangefeld (Contributor) commented on Dec 10, 2021

I just updated to the latest kubectl version via brew and see no improvement over the previous behavior.

╰─ kubectl version
Client Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.0", GitCommit:"ab69524f795c42094a6630298ff53f3c3ebab7f4", GitTreeState:"clean", BuildDate:"2021-12-07T18:08:39Z", GoVersion:"go1.17.3", Compiler:"gc", Platform:"darwin/amd64"}

The commit ab69524 contains the change from this PR:

kubeConfigFlags = genericclioptions.NewConfigFlags(true).WithDeprecatedPasswordFlag().WithDiscoveryBurst(300).WithDiscoveryQPS(50.0)

After removing the local cache via rm -rf ~/.kube/cache/discovery/<HOST_IP>, I still get throttled

╰─ kubectl get pods
I1209 19:22:06.899491   29710 request.go:665] Waited for 1.190499503s due to client-side throttling, not priority and fairness, request: GET:https://10.216.1.114/apis/cloudscheduler.cnrm.cloud.google.com/v1beta1?timeout=32s
I1209 19:22:17.101035   29710 request.go:665] Waited for 11.391773142s due to client-side throttling, not priority and fairness, request: GET:https://10.216.1.114/apis/bigtable.cnrm.cloud.google.com/v1beta1?timeout=32s

This cluster has 295 CRDs and 125 group versions:

╰─ kubectl get crd | wc -l
     295

╰─ kubectl get crd -o json | jq -r '.items[].spec | "Group: " + .group + "; Version: " + .versions[].name' | sort | uniq | wc -l
     125

This jq query was first introduced in #101634 (comment) and adapted to account for all versions, not just the first, as mentioned in @lavalamp's comment #101634 (comment).

If the cache is not there, 170 GET requests are made. Once it is there, only 4 GET requests are made (3 of the 4 are to /apis/external.metrics.k8s.io/v1beta1).

╰─ kubectl get pods -v 8 2>&1 | grep "GET https" | wc -l
     170

╰─ kubectl get pods -v 8 2>&1 | grep "GET https" | wc -l
       4
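
As a rough back-of-envelope check (assuming the client here was still running with the older 100-request discovery burst and client-go's default 5 QPS, neither of which is stated in this comment): 170 uncached requests leaves roughly 70 requests beyond the burst, and 70 requests at 5 QPS is about 14 seconds of client-side waiting, the same order of magnitude as the ~11 second delay logged above.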

jonnylangefeld added a commit to jonnylangefeld/kubernetes that referenced this pull request Dec 20, 2021
This is a follow up to kubernetes#105520 which only changed the new default config flags in the `NewKubectlCommand` function if `kubeConfigFlags == nil`. However they are not nil because they were initialized before here:
https://github.com/kubernetes/kubernetes/blob/2fe968deb6cef4feea5bd0eb435e71844e397eed/staging/src/k8s.io/kubectl/pkg/cmd/cmd.go#L97

This fix uses the same defaults for both functions

Signed-off-by: Jonny Langefeld <jonny.langefeld@gmail.com>
@jonnylangefeld (Contributor) commented on Dec 20, 2021

I debugged this a bit and noticed that kubeConfigFlags is never nil (or at least not in a regular kubectl command) because it is already initialized here:

ConfigFlags: genericclioptions.NewConfigFlags(true).WithDeprecatedPasswordFlag(),

So this PR's change of adding .WithDiscoveryBurst(300).WithDiscoveryQPS(50.0) never took effect.

I created a fix with #107131. The results are evident: on the current master branch we still get the Waited for 1.190499503s due to client-side throttling messages, and with the fix they no longer appear. It's also a bit faster, since it no longer runs into the rate limiting. It still makes hundreds of unnecessary requests when the cache is invalid, but I opened a separate issue for that.
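
A simplified, hypothetical illustration of the guard described above (the function name below is made up; the real code lives in staging/src/k8s.io/kubectl/pkg/cmd/cmd.go): because the caller always hands in an already-initialized ConfigFlags, the nil branch that would have applied the higher discovery limits is never taken.

```go
// Hypothetical, simplified version of the pattern described in this comment:
// the new burst/QPS defaults sit behind a nil check that the regular kubectl
// call path never triggers.
package main

import "k8s.io/cli-runtime/pkg/genericclioptions"

// newDefaultFlags stands in for the guarded initialization this PR added.
func newDefaultFlags(kubeConfigFlags *genericclioptions.ConfigFlags) *genericclioptions.ConfigFlags {
	if kubeConfigFlags == nil {
		// Only reached when the caller passes nil, which (per the comment
		// above) the regular kubectl path never does, so the higher
		// discovery burst/QPS were silently skipped.
		kubeConfigFlags = genericclioptions.NewConfigFlags(true).
			WithDeprecatedPasswordFlag().
			WithDiscoveryBurst(300).
			WithDiscoveryQPS(50.0)
	}
	return kubeConfigFlags
}

func main() {
	// Mirrors the pre-existing initialization referenced above: the flags
	// arrive non-nil, so the guarded defaults never apply.
	preInitialized := genericclioptions.NewConfigFlags(true).WithDeprecatedPasswordFlag()
	_ = newDefaultFlags(preInitialized)
}
```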

ulucinar pushed a commit to ulucinar/kubernetes that referenced this pull request on Feb 28, 2022, with the same commit message as above.
YitzyD pushed two commits ("Bump discovery burst for kubectl to 300") to YitzyD/kubernetes that referenced this pull request on Mar 1, 2023.
Labels: approved, area/kubectl, cncf-cla: yes, kind/cleanup, lgtm, priority/backlog, release-note-none, sig/cli, size/S, triage/accepted
Linked issue: Discovery is throttled when there are lots of resources (CRDs)
10 participants