Bump discovery burst for kubectl to 300 #105520
Conversation
/triage accept
@soltysh: The label(s) […] In response to this: […]
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/triage accepted
@@ -263,9 +263,6 @@ func (f *ConfigFlags) toDiscoveryClient() (discovery.CachedDiscoveryInterface, error) {
 		return nil, err
 	}

-	// The more groups you have, the more discovery requests you need to make.
-	// given 25 groups (our groups + a few custom resources) with one-ish version each, discovery needs to make 50 requests
-	// double it just so we don't end up here again for a while. This config is only used for discovery.
added in b3dad83 ... I guess 3 years qualifies as "a while"
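For context, the knob under discussion is just the `Burst` field on a `rest.Config`. A minimal sketch with plain client-go (assuming a default kubeconfig; this is not the exact cli-runtime wiring):

```go
package main

import (
	"fmt"

	"k8s.io/client-go/discovery"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Load a rest.Config from the default kubeconfig (~/.kube/config).
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}

	// Discovery fans out roughly one request per group/version, so a low
	// Burst makes the client-side limiter throttle large clusters.
	// 300 is the value this PR settles on.
	config.Burst = 300

	dc, err := discovery.NewDiscoveryClientForConfig(config)
	if err != nil {
		panic(err)
	}
	groups, err := dc.ServerGroups()
	if err != nil {
		panic(err)
	}
	fmt.Printf("discovered %d API groups\n", len(groups.Groups))
}
```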
kubeConfigFlags := genericclioptions.NewConfigFlags(true).WithDeprecatedPasswordFlag()
// The more groups you have, the more discovery requests you need to make.
// given 25 groups (our groups + a few custom resources) with one-ish version each, discovery needs to make 50 requests
// triple it just so we don't end up here again for a while. This is updated from the
this burst seems like the kubernetes equivalent of the debt ceiling ... if our response every time we hit it is to raise it, I'm not sure I see the point
Yeah, I think we should disable this and let APF slow the client if necessary
sgtm, I'll re-work this to entirely remove this functionality from kubectl
actually, the burst is part of client-go:
// If it's zero, the created RESTClient will use DefaultBurst: 10.
and the default there is even smaller than what we already set in kubectl. With that I see two options: we either drop it even from client-go (but I'll let you decide that), or we set it artificially big in kubectl, 999 for example. wdyt?
Setting it to -1 will disable it.
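To make the two options concrete, a minimal client-go sketch. The negative-QPS behavior is per the comment above; the always-admit `RateLimiter` alternative is my addition for illustration, not something this PR ships:

```go
package discoverytuning

import (
	"k8s.io/client-go/rest"
	"k8s.io/client-go/util/flowcontrol"
)

// raiseLimits mirrors what this PR does for kubectl's discovery config:
// lift the ceiling well above client-go's built-in defaults
// (DefaultBurst: 10, DefaultQPS: 5, used when the fields are zero).
func raiseLimits(cfg *rest.Config) {
	cfg.Burst = 300
	cfg.QPS = 50.0
}

// disableLimits removes client-side rate limiting entirely, deferring to
// server-side API Priority and Fairness to pace the client.
func disableLimits(cfg *rest.Config) {
	// Per the comment above, a negative value disables the limiter.
	cfg.QPS = -1
	// An explicitly installed always-admit limiter has the same effect.
	cfg.RateLimiter = flowcontrol.NewFakeAlwaysRateLimiter()
}
```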
I agree with the core point of this comment that increasing the value is kicking the can down the road.
However, isn't disabling it completely a risk that something like kubectl breaks the API server? That'd be annoying 😄
So I'm unsure whether the real solution shouldn't be more along the lines of what @justinsb suggested, where we lower the number of API requests needed in the first place.
That being said, the change in this PR probably helps us kick the can a little further down the road and buy more time to implement a root-cause fix.
> Yeah, I think we should disable this and let APF slow the client if necessary
Without knowing the full details of how APF would handle this, this would be my preference. Along with @justinsb's suggestion that the discovery process potentially be revisited to make fewer requests where possible. Full details in kubernetes/kubectl#1126 (comment), but we have cases where it's possible ~2,000 CRDs may end up installed and I don't think an additional 50 burst qps is going to make a meaningful difference in that situation.
I was curious whether this PR would fix issues I've been seeing with discovery taking forever when there are many (hundreds) of CRDs, but found it did not work:
I opened #106016 which takes an alternative approach.
How do folks feel about this (or #106016) as a candidate for cherry-picking? We in @crossplane land have a feature that uses a lot of CRDs and is thus pretty degraded by huge (6+ minute) discovery wait times, so we'd really appreciate being able to get this fix into the hands of our users. That said, I understand that removing client-side rate limits could be a hard sell as a cherry-pick.
/retest
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: eddiezane, soltysh
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment.
The Kubernetes project has merge-blocking tests that are currently too flaky to consistently pass. This bot retests PRs for certain kubernetes repos according to the following rules:
You can:
/retest
I just updated to the latest kubectl version via brew and see no improvement over the previous behavior.
The commit ab69524 includes the change from this PR:
After removing the local cache via
This cluster has 295 CRDs and 125 group versions:
This jq query was first introduced in #101634 (comment) and adapted to account for all versions, not just the first, as mentioned in @lavalamp's comment #101634 (comment). If the cache is not there, 170 GET requests are made. Once it is there, only 4 GET requests are made (3 of the 4 are to
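For anyone wanting to reproduce the group/version count without jq, here's a rough Go equivalent. The cache path is an assumption on my part (kubectl keeps discovery data under ~/.kube/cache/discovery/):

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func main() {
	// Pass the cached group list, e.g.
	// ~/.kube/cache/discovery/<host>/servergroups.json
	raw, err := os.ReadFile(os.Args[1])
	if err != nil {
		panic(err)
	}
	var list metav1.APIGroupList
	if err := json.Unmarshal(raw, &list); err != nil {
		panic(err)
	}
	versions := 0
	for _, g := range list.Groups {
		// Count every served version, not just the preferred one.
		versions += len(g.Versions)
	}
	fmt.Printf("%d groups, %d group versions\n", len(list.Groups), versions)
}
```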
This is a follow up to kubernetes#105520, which only changed the new default config flags in the `NewKubectlCommand` function if `kubeConfigFlags == nil`. However, they are not nil, because they were initialized earlier (see https://github.com/kubernetes/kubernetes/blob/2fe968deb6cef4feea5bd0eb435e71844e397eed/staging/src/k8s.io/kubectl/pkg/cmd/cmd.go#L97). This fix uses the same defaults for both functions.
Signed-off-by: Jonny Langefeld <jonny.langefeld@gmail.com>
I debugged this a bit and noticed that
So the changes in this PR, adding `.WithDiscoveryBurst(300).WithDiscoveryQPS(50.0)`, never came into effect.
I created a fix with #107131. The results are evident, as on the current master branch we still get the
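In other words, the fix makes both construction paths share the same defaults. A minimal sketch mirroring the builder calls quoted above (the helper name is hypothetical):

```go
package cmd

import "k8s.io/cli-runtime/pkg/genericclioptions"

// defaultConfigFlags builds kubectl's config flags with the bumped
// discovery limits, so the pre-initialized kubeConfigFlags path gets
// the same values as the nil path in NewKubectlCommand.
func defaultConfigFlags() *genericclioptions.ConfigFlags {
	return genericclioptions.NewConfigFlags(true).
		WithDeprecatedPasswordFlag().
		WithDiscoveryBurst(300).
		WithDiscoveryQPS(50.0)
}
```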
What type of PR is this?
/kind cleanup
/sig cli
/priority backlog
What this PR does / why we need it:
This bumps the discovery burst for the kubectl command to 300, from the 100 defined in
kubernetes/staging/src/k8s.io/cli-runtime/pkg/genericclioptions/config_flags.go
Lines 416 to 419 in a861de6
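For reference, a sketch of how a builder like `WithDiscoveryBurst` plausibly hangs off `ConfigFlags` (the struct is abbreviated and the field name is my assumption; cli-runtime's real code may differ):

```go
package genericclioptions

// ConfigFlags is abbreviated here to the one field this sketch needs.
type ConfigFlags struct {
	discoveryBurst int
}

// WithDiscoveryBurst records the burst so toDiscoveryClient can apply it
// to the rest.Config it builds for the cached discovery client.
func (f *ConfigFlags) WithDiscoveryBurst(discoveryBurst int) *ConfigFlags {
	f.discoveryBurst = discoveryBurst
	return f
}
```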
Which issue(s) this PR fixes:
Fixes kubernetes/kubectl#1126
Special notes for your reviewer:
/assign @seans3 @lavalamp @justinsb
Does this PR introduce a user-facing change?