Skip to content
This repository has been archived by the owner on Nov 9, 2022. It is now read-only.

Fix discovery rate limits #112

Merged
merged 3 commits into from
Feb 18, 2021

Conversation

timebertt
Copy link
Contributor

How to categorize this PR?

/area robustness
/kind bug
/priority normal

What this PR does / why we need it:

Follow-up on #111.

We observed many throttled discovery calls (see #109), because the target REST mapper was rediscovering resources with a rate of 1 / 10s, but the default QPS/burst settings of 5 / 100 were used in the discovery client.
For grm's targeting shoot clusters, grm tries to do a rediscovery in every reconciliation loop because it checks if any HVPA resources are present, that scale resources applied by grm.

This becomes a problem in conjunction with unavailable APIServices (the discovery client will run into the default timeout of 32s) and the reconciliation timeouts added in #102. I.e. if there are unavailable APIServices the throttled discovery call will take long enough to exceed the reconciliations timeout, giving the actual reconciliation no chance to do its work.

Additionally this PR:

  • adds the entire k8s scheme to the source client instead of only the core scheme
  • deduplicates client-side warnings about deprecated API versions

Which issue(s) this PR fixes:
Fixes #109

Special notes for your reviewer:

Release note:

A problem with long running ManagedResource reconciliations caused by unavailable `APIServices` was fixed.

@timebertt timebertt requested a review from a team as a code owner February 18, 2021 08:31
@gardener-robot gardener-robot added area/robustness Robustness, reliability, resilience related kind/bug Bug priority/normal needs/review Needs review size/xs Size of pull request is tiny (see gardener-robot robot/bots/size.py) labels Feb 18, 2021
@gardener-robot-ci-1 gardener-robot-ci-1 added reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) needs/ok-to-test Needs approval for testing (check PR in detail before setting this label because PR is run on CI/CD) and removed reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) labels Feb 18, 2021
@@ -90,7 +90,7 @@ func (o *SourceClientOptions) Completed() *SourceClientConfig {

func getSourceScheme() *runtime.Scheme {
scheme := runtime.NewScheme()
utilruntime.Must(corev1.AddToScheme(scheme))
utilruntime.Must(kubernetesscheme.AddToScheme(scheme))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI: We have seen issues with event reporting for Lease objects now that GRM uses configmapsleases as default leader election resource lock. We decided that we can simply add all Kubernetes APIs to our source scheme here.

Copy link
Contributor

@rfranzke rfranzke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@gardener-robot gardener-robot added reviewed/lgtm Has approval for merging and removed needs/review Needs review labels Feb 18, 2021
@timebertt timebertt changed the title Fix/discovery rate limits Fix discovery rate limits Feb 18, 2021
Copy link
Contributor

@timuthy timuthy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
Nice finding!!!!

@timuthy timuthy merged commit a18b9e6 into gardener-attic:master Feb 18, 2021
@timebertt timebertt deleted the fix/discovery-rate-limits branch February 18, 2021 08:47
timebertt added a commit that referenced this pull request Feb 18, 2021
[release-v0.21] Cherry-pick of #112: Fix discovery rate limits
@gardener-robot gardener-robot added priority/3 Priority (lower number equals higher priority) and removed priority/3 Priority (lower number equals higher priority) labels Mar 8, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area/robustness Robustness, reliability, resilience related kind/bug Bug needs/ok-to-test Needs approval for testing (check PR in detail before setting this label because PR is run on CI/CD) reviewed/lgtm Has approval for merging size/xs Size of pull request is tiny (see gardener-robot robot/bots/size.py)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ManagedResource reconciliation takes more than 20m when APIService in unavailable
5 participants