
make buildInformerForCluster configurable #1858

Closed

Conversation

snowplayfire
Contributor

@snowplayfire snowplayfire commented May 21, 2022

What type of PR is this?

/kind feature

What this PR does / why we need it:
When a member cluster is large (3k nodes, 100k pods), karmada-controller-manager is easily OOMKilled and retries syncing nodes and pods again and again, while the member cluster stays in the Unknown state.

Whether to execute buildInformerForCluster() to sync nodes and pods should be configurable, especially for large clusters.

Does this PR introduce a user-facing change?:

A new flag `build-informer-for-cluster` is added to karmada-controller-manager and karmada-agent.

@karmada-bot karmada-bot added the kind/feature Categorizes issue or PR as related to a new feature. label May 21, 2022
@karmada-bot
Collaborator

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please assign kevin-wangzefeng after the PR has been reviewed.
You can assign the PR to them by writing /assign @kevin-wangzefeng in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@karmada-bot karmada-bot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label May 21, 2022
@snowplayfire snowplayfire force-pushed the flag-clusterInformer branch 2 times, most recently from 0ea75d7 to 61ea25b on May 21, 2022 19:25
@RainbowMango
Member

When a member cluster is large (3k nodes, 100k pods), karmada-controller-manager is easily OOMKilled

That's very bad.
I'd like to know how much memory is reserved for karmada-controller-manager?

@dddddai
Member

dddddai commented May 23, 2022

Yeah, I have the same concern: the control plane stores all pods and nodes (for push-mode clusters) in the informer cache, which consumes a lot of memory in large-scale scenarios.

IMO we can transform objects in informers; this is a new feature in client-go 0.24, while we are on 0.23 now.
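
For reference, a minimal sketch (not part of this PR) of the transform hook introduced in client-go v0.24: a TransformFunc runs on each object before it is stored in the informer cache, so stripping fields the controller never reads shrinks the cache footprint. The fields dropped below are only illustrative; Karmada's dynamic informers would hand the function *unstructured.Unstructured objects.

package informersketch

import (
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/client-go/tools/cache"
)

// stripUnusedFields drops bulky metadata (managedFields, annotations) that the
// cluster-status controller does not read, before the object enters the cache.
func stripUnusedFields(obj interface{}) (interface{}, error) {
	if u, ok := obj.(*unstructured.Unstructured); ok {
		unstructured.RemoveNestedField(u.Object, "metadata", "managedFields")
		unstructured.RemoveNestedField(u.Object, "metadata", "annotations")
	}
	return obj, nil
}

// registerTransform must be called before the informer is started;
// SetTransform is only available in client-go >= 0.24.
func registerTransform(informer cache.SharedIndexInformer) error {
	return informer.SetTransform(stripUnusedFields)
}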

@RainbowMango RainbowMango mentioned this pull request May 23, 2022
1 task
@RainbowMango
Member

@dddddai Great idea, I like this feature! Just created #1866 to track it. Let's do it in release-1.3.

Before that feature lands, I still want to know more details about how bad the situation is now; that will also help evaluate the effect of the feature in the future.

@snowplayfire
Contributor Author

snowplayfire commented May 23, 2022

When a member cluster is large (3k nodes, 100k pods), karmada-controller-manager is easily OOMKilled

That's very bad. I'd like to know how much memory is reserved for karmada-controller-manager?

One cluster with 3k nodes and 100k pods needs roughly 100 GB (far more than 100 GB when a cluster failure occurs) for kube-controller-manager, so karmada-controller-manager may need more than 100 GB * n, where n is the number of clusters.

@RainbowMango
Member

One cluster with 3k nodes and 100k pods needs roughly 100 GB (far more than 100 GB when a cluster failure occurs) for kube-controller-manager, so karmada-controller-manager may need more than 100 GB * n, where n is the number of clusters.

I'm very surprised! @Poor12 is leading the effort to do performance testing and improvement. He might confirm it later.

But if we don't sync nodes and pods here, we can't get the resource summary, and then the general estimator will always deny workloads because no resources appear to be left.

resourceSummary := cluster.Status.ResourceSummary
if resourceSummary == nil {
	return 0
}
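
For context, a rough sketch (not the actual Karmada estimator code) of why a nil ResourceSummary forces the estimate to zero, assuming the summary's Allocatable/Allocated resource lists are what the estimator works from:

package estimatorsketch

import (
	clusterv1alpha1 "github.com/karmada-io/karmada/pkg/apis/cluster/v1alpha1"
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
)

// maxCPUReplicas estimates how many replicas requesting cpuPerReplica fit into
// the cluster's free CPU. When node/pod syncing is disabled there is no
// ResourceSummary, so the estimator can only answer 0 and dynamic replica
// assignment effectively stops placing replicas on that cluster.
func maxCPUReplicas(cluster *clusterv1alpha1.Cluster, cpuPerReplica resource.Quantity) int64 {
	summary := cluster.Status.ResourceSummary
	if summary == nil || cpuPerReplica.IsZero() {
		return 0
	}
	free := summary.Allocatable[corev1.ResourceCPU].DeepCopy()
	allocated := summary.Allocated[corev1.ResourceCPU]
	free.Sub(allocated)
	if free.Sign() <= 0 {
		return 0
	}
	return free.MilliValue() / cpuPerReplica.MilliValue()
}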

@snowplayfire
Contributor Author

@RainbowMango The member cluster could probably calculate the resource summary itself and report it to the federation control plane.

@RainbowMango
Member

@RainbowMango The member cluster could probably calculate the resource summary itself and report it to the federation control plane.

I'm not sure if you are saying that karmada-agent should now take the responsibility of reporting the resource summary for clusters in Pull mode.

@RainbowMango
Member

I want to let you know that @Poor12 is trying to figure out how serious the issue is.

Let's wait for a while. cc those who have run into this situation here.
cc @likakuli

@likakuli
Contributor

copy that, thx

@RainbowMango
Member

@snowplayfire
We have been doing performance tests on this for the last few weeks. Now we can confirm that there is a performance issue, but it seems not as bad as mentioned in #1858 (comment).

There are several improvement ideas:

I think it's fair enough to add a flag for people to enable/disable the resource-modeling behavior.
Since we are going to make changes to the code this PR also touches, I'd like to move this forward first to avoid conflicts.

I'm sorry for letting this sit so long; there are now conflicts that need to be resolved. Please let me know if you can resolve them and push again.

@@ -108,5 +110,6 @@ func (o *Options) AddFlags(fs *pflag.FlagSet, allControllers []string) {
fs.DurationVar(&o.ResyncPeriod.Duration, "resync-period", 0, "Base frequency the informers are resynced.")
fs.IntVar(&o.ConcurrentClusterSyncs, "concurrent-cluster-syncs", 5, "The number of Clusters that are allowed to sync concurrently.")
fs.IntVar(&o.ConcurrentWorkSyncs, "concurrent-work-syncs", 5, "The number of Works that are allowed to sync concurrently.")
fs.BoolVar(&o.BuildInformerForCluster, "build-informer-for-cluster", true, "Enable this when need cluster informer.")
Member

I suggest changing the flag to --enable-cluster-resource-modeling:

Suggested change
fs.BoolVar(&o.BuildInformerForCluster, "build-informer-for-cluster", true, "Enable this when need cluster informer.")
fs.BoolVar(&o.EnableClusterResourceModeling, "enable-cluster-resource-modeling", true,
"Enable means controller would build resource modeling for each cluster by syncing Nodes and Pods resources.\n"+
"The resource modeling might be used by the scheduler to make scheduling decisions in scenario of dynamic replica assignment based on cluster free resources.\n"+
"Disable if it does not fit your cases for better performance.")

Also, I'm wondering if we should add a field to the Cluster API.

Contributor Author

Changed the flag to --enable-cluster-resource-modeling, done.

Contributor Author

Regarding adding a field to the Cluster API: do you mean a field to declare the synced resources?

Member

Regarding adding a field to the Cluster API: do you mean a field to declare the synced resources?

never mind. Let's finish this first.

Member

Changed the flag to --enable-cluster-resource-modeling, done.

I also proposed the flag usage text; what do you think?

"Enable means controller would build resource modeling for each cluster by syncing Nodes and Pods resources.\n"+
			"The resource modeling might be used by the scheduler to make scheduling decisions in scenario of dynamic replica assignment based on cluster free resources.\n"+
			"Disable if it does not fit your cases for better performance."

Contributor Author

Agreed.

}

// init the lease controller for every cluster
-	c.initLeaseController(clusterInformerManager.Context(), cluster)
+	c.initLeaseController(context.TODO(), cluster)
Member

If context.TODO() is used here, the lease lock will never be released.

Member

Yes, you are right. We expect the lease controller to be stopped after the cluster is gone.
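
As an aside, a minimal sketch (hypothetical, not the actual lease controller) of why the per-cluster context matters: a loop started with the informer manager's context stops when that context is cancelled on cluster removal, whereas context.TODO() is never cancelled and the goroutine keeps renewing the lease forever.

package main

import (
	"context"
	"fmt"
	"time"
)

// startLeaseLoop stands in for the lease controller: it renews a lease until
// its context is cancelled. With context.TODO() the ctx.Done() channel never
// fires, so the loop outlives the cluster it was created for.
func startLeaseLoop(ctx context.Context, cluster string) {
	go func() {
		ticker := time.NewTicker(2 * time.Second)
		defer ticker.Stop()
		for {
			select {
			case <-ctx.Done():
				fmt.Printf("cluster %s removed, stopping lease renewal\n", cluster)
				return
			case <-ticker.C:
				fmt.Printf("renewing lease for cluster %s\n", cluster)
			}
		}
	}()
}

func main() {
	ctx, cancel := context.WithCancel(context.Background())
	startLeaseLoop(ctx, "member1")
	time.Sleep(5 * time.Second)
	cancel() // simulates the member cluster being unregistered
	time.Sleep(time.Second)
}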

Contributor Author

done

@RainbowMango
Member

Hi @snowplayfire, your branch has conflicts with master; you might need to rebase your branch.

By the way, I see you are using your Chinese name as the commit author:

Author: ljx李静雪 <xxx@trip.com>

@karmada-bot karmada-bot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Aug 10, 2022
Signed-off-by: jingxueli <jingxueli@trip.com>
@snowplayfire
Contributor Author

Hi @snowplayfire, your branch has conflicts with master; you might need to rebase your branch.

By the way, I see you are using your Chinese name as the commit author:

Author: ljx李静雪 <xxx@trip.com>

done

@@ -154,7 +156,7 @@ func (c *ClusterStatusController) syncClusterStatus(cluster *clusterv1alpha1.Clu
}

// skip collecting cluster status if not ready
if online && healthy && readyCondition.Status == metav1.ConditionTrue {
Member

Thanks for your quick response.
But I think the flag is meant to disable resource modeling, which means it should only disable collecting Nodes and Pods.

When I looked at the code, I found it's difficult to just disable Nodes/Pods collection because the lease controller is using the context.

Maybe we can do it as follows, and after that I can send another patch to handle the context issue.

diff --git a/pkg/controllers/status/cluster_status_controller.go b/pkg/controllers/status/cluster_status_controller.go                          
index ddc4241a..01ea6585 100644                                                                                                                 
--- a/pkg/controllers/status/cluster_status_controller.go                                                                                       
+++ b/pkg/controllers/status/cluster_status_controller.go                                                                                       
@@ -181,20 +181,22 @@ func (c *ClusterStatusController) syncClusterStatus(cluster *clusterv1alpha1.Clu                                          
                        klog.Warningf("Maybe get partial(%d) APIs installed in Cluster %s. Error: %v.", len(apiEnables), cluster.GetName(), err)
                }

-               nodes, err := listNodes(clusterInformerManager)
-               if err != nil {
-                       klog.Errorf("Failed to list nodes for Cluster %s. Error: %v.", cluster.GetName(), err)
-               }
-
-               pods, err := listPods(clusterInformerManager)
-               if err != nil {
-                       klog.Errorf("Failed to list pods for Cluster %s. Error: %v.", cluster.GetName(), err)
-               }
-
                currentClusterStatus.KubernetesVersion = clusterVersion
                currentClusterStatus.APIEnablements = apiEnables
-               currentClusterStatus.NodeSummary = getNodeSummary(nodes)
-               currentClusterStatus.ResourceSummary = getResourceSummary(nodes, pods)
+
+               if c.EnableClusterResourceModeling {
+                       nodes, err := listNodes(clusterInformerManager)
+                       if err != nil {
+                               klog.Errorf("Failed to list nodes for Cluster %s. Error: %v.", cluster.GetName(), err)
+                       }
+
+                       pods, err := listPods(clusterInformerManager)
+                       if err != nil {
+                               klog.Errorf("Failed to list pods for Cluster %s. Error: %v.", cluster.GetName(), err)
+                       }
+                       currentClusterStatus.NodeSummary = getNodeSummary(nodes)
+                       currentClusterStatus.ResourceSummary = getResourceSummary(nodes, pods)
+               }
        }

        setTransitionTime(currentClusterStatus.Conditions, readyCondition)
@@ -252,7 +254,10 @@ func (c *ClusterStatusController) buildInformerForCluster(cluster *clusterv1alph
                singleClusterInformerManager = c.InformerManager.ForCluster(clusterClient.ClusterName, clusterClient.DynamicClientSet, 0)
        }

-       gvrs := []schema.GroupVersionResource{nodeGVR, podGVR}
+       var gvrs []schema.GroupVersionResource
+       if c.EnableClusterResourceModeling {
+               gvrs = append(gvrs, nodeGVR, podGVR)
+       }

        // create the informer for pods and nodes
        allSynced := true

@RainbowMango
Member

Hi @snowplayfire, I picked your commit and fixed the conflicts, as well as addressed my comments, in #2387.

Labels
kind/feature Categorizes issue or PR as related to a new feature. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.