use ClusterCacheSyncTimeout for resources on fed control plane as well #3874

Merged

Conversation

Contributor

@lxtywypc lxtywypc commented Aug 1, 2023

What type of PR is this?
/kind feature

What this PR does / why we need it:
Currently, the default timeout for waiting on cache sync in controller-runtime is 2 minutes. If there are a lot of resources on the fed control plane, this default timeout can be reached, and the process then exits with an error like:

E0801 06:15:22.538099       1 controllermanager.go:154] controller manager exits unexpectedly: [failed to wait for build-resource-informers for work caches to sync: timed out waiting for cache to be synced, failed waiting for all runnables to end within grace period of 30s: context deadline exceeded

Once this happens, it is hard for karmada-controller-manager to restart successfully.

Which issue(s) this PR fixes:
Fixes #

Special notes for your reviewer:
The log printed above is from karmada-controller-manager v1.3.1, but the problem still exists in the latest version.

Does this PR introduce a user-facing change?:

`karmada-controller-manager`: The `--cluster-cache-sync-timeout` flag is now used to specify the sync timeout of the control plane cache in addition to the member cluster's cache. The default value has been increased to 2 minutes.

@karmada-bot karmada-bot added the kind/feature Categorizes issue or PR as related to a new feature. label Aug 1, 2023
@karmada-bot karmada-bot added the size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. label Aug 1, 2023
@lxtywypc lxtywypc force-pushed the enable-cache-sync-timeout-on-fed branch from 5d6df18 to fad172d Compare August 1, 2023 10:22
@@ -68,6 +68,7 @@ type FederatedHPAController struct {
 	RESTMapper meta.RESTMapper
 	EventRecorder record.EventRecorder
 	TypedInformerManager typedmanager.MultiClusterInformerManager
+	ClusterCacheSyncTimeout metav1.Duration

Nice, 100% agree that we need to use the same configuration for informers inside of a controller.

Member

@RainbowMango RainbowMango left a comment

/lgtm
/approve

Thanks @lxtywypc.

I updated the release notes a little bit, by the way.

I also revisited #1112. Thanks @snowplayfire for introducing the flag; it means a lot to Karmada's scalability.

@karmada-bot karmada-bot added the lgtm Indicates that a PR is ready to be merged. label Aug 2, 2023
@karmada-bot
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: RainbowMango

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@karmada-bot karmada-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 2, 2023
@karmada-bot karmada-bot merged commit a2dc2e8 into karmada-io:master Aug 2, 2023
12 checks passed
@lxtywypc lxtywypc deleted the enable-cache-sync-timeout-on-fed branch August 2, 2023 03:34
@liangyuanpeng
Contributor

It's useful, let me cherry-pick it to 1.6.

@RainbowMango
Member

We don't usually cherry-pick features, but considering the severity of this issue (and the fact that you have already done it in #3880 :)), I'm OK with it.

Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files.
kind/feature Categorizes issue or PR as related to a new feature.
lgtm Indicates that a PR is ready to be merged.
size/XS Denotes a PR that changes 0-9 lines, ignoring generated files.
4 participants