
make buildInformerForCluster configurable #1858

Closed

Conversation

snowplayfire
Contributor

@snowplayfire snowplayfire commented May 21, 2022

What type of PR is this?

/kind feature

What this PR does / why we need it:
When a member cluster is large (3k nodes, 100k pods), karmada-controller-manager is easily OOMKilled and retries syncing nodes and pods again and again, while the member cluster stays in the Unknown state.

Whether to execute buildInformerForCluster() to sync nodes and pods should be configurable, especially for large clusters.

Does this PR introduce a user-facing change?:

A new flag `build-informer-for-cluster` is added to karmada-controller-manager and karmada-agent.

@karmada-bot karmada-bot added the kind/feature Categorizes issue or PR as related to a new feature. label May 21, 2022
@karmada-bot
Collaborator

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please assign kevin-wangzefeng after the PR has been reviewed.
You can assign the PR to them by writing /assign @kevin-wangzefeng in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@karmada-bot karmada-bot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label May 21, 2022
@snowplayfire snowplayfire force-pushed the flag-clusterInformer branch 2 times, most recently from 0ea75d7 to 61ea25b on May 21, 2022 19:25
@RainbowMango
Member

When a member cluster is large (3k nodes, 100k pods), karmada-controller-manager is easily OOMKilled

That's very bad.
I'd like to know how much memory is reserved for karmada-controller-manager?

@dddddai
Member

dddddai commented May 23, 2022

Yeah, I have the same concern: the control plane stores all pods and nodes (for push-mode clusters) in the informer cache, which consumes a lot of memory in large-scale scenarios.

IMO we can transform objects in informers; this is a new feature in client-go 0.24, while we are on 0.23 now.
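
For reference, a minimal sketch (not part of this PR) of the transform hook introduced in client-go v0.24: a TransformFunc runs on each object before it is stored in the informer cache, so stripping fields the controller never reads shrinks the cache footprint. The fields dropped below are only illustrative; Karmada's dynamic informers would hand the function *unstructured.Unstructured objects.

package informersketch

import (
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/client-go/tools/cache"
)

// stripUnusedFields drops bulky metadata (managedFields, annotations) that the
// cluster-status controller does not read, before the object enters the cache.
func stripUnusedFields(obj interface{}) (interface{}, error) {
	if u, ok := obj.(*unstructured.Unstructured); ok {
		unstructured.RemoveNestedField(u.Object, "metadata", "managedFields")
		unstructured.RemoveNestedField(u.Object, "metadata", "annotations")
	}
	return obj, nil
}

// registerTransform must be called before the informer is started;
// SetTransform is only available in client-go >= 0.24.
func registerTransform(informer cache.SharedIndexInformer) error {
	return informer.SetTransform(stripUnusedFields)
}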

@RainbowMango RainbowMango mentioned this pull request May 23, 2022
1 task
@RainbowMango
Member

@dddddai Great idea, I like this feature! Just created #1866 to track it. Let's do it in release-1.3.

Before that feature lands, I still want to know more details about how bad the situation is now; that will also help evaluate the effect of the feature in the future.

@snowplayfire
Contributor Author

snowplayfire commented May 23, 2022

When a member cluster is large (3k nodes, 100k pods), karmada-controller-manager is easily OOMKilled

That's very bad. I'd like to know how much memory is reserved for karmada-controller-manager?

One cluster with 3k nodes and 100k pods needs roughly 100 GB (far more than 100 GB when a cluster failure occurs) for kube-controller-manager, so karmada-controller-manager may need more than 100 GB * n, where n is the number of clusters.

@RainbowMango
Member

One cluster with 3k nodes and 100k pods needs roughly 100 GB (far more than 100 GB when a cluster failure occurs) for kube-controller-manager, so karmada-controller-manager may need more than 100 GB * n, where n is the number of clusters.

I'm very surprised! @Poor12 is leading the effort to do performance testing and improvement. He might confirm it later.

But if we don't sync nodes and pods here, we can't get the resource summary, and then the general estimator will always deny workloads because no resources appear to be left.

resourceSummary := cluster.Status.ResourceSummary
if resourceSummary == nil {
	return 0
}
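
For context, a rough sketch (not the actual Karmada estimator code) of why a nil ResourceSummary forces the estimate to zero, assuming the summary's Allocatable/Allocated resource lists are what the estimator works from:

package estimatorsketch

import (
	clusterv1alpha1 "github.com/karmada-io/karmada/pkg/apis/cluster/v1alpha1"
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
)

// maxCPUReplicas estimates how many replicas requesting cpuPerReplica fit into
// the cluster's free CPU. When node/pod syncing is disabled there is no
// ResourceSummary, so the estimator can only answer 0 and dynamic replica
// assignment effectively stops placing replicas on that cluster.
func maxCPUReplicas(cluster *clusterv1alpha1.Cluster, cpuPerReplica resource.Quantity) int64 {
	summary := cluster.Status.ResourceSummary
	if summary == nil || cpuPerReplica.IsZero() {
		return 0
	}
	free := summary.Allocatable[corev1.ResourceCPU].DeepCopy()
	allocated := summary.Allocated[corev1.ResourceCPU]
	free.Sub(allocated)
	if free.Sign() <= 0 {
		return 0
	}
	return free.MilliValue() / cpuPerReplica.MilliValue()
}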

@snowplayfire
Contributor Author

@RainbowMango The member cluster could probably calculate the resource summary itself and report it to the federation control plane.

@RainbowMango
Member

@RainbowMango The member cluster could probably calculate the resource summary itself and report it to the federation control plane.

I'm not sure if you are saying that karmada-agent should now take the responsibility of reporting the resource summary for clusters in Pull mode.

@RainbowMango
Member

I want to let you know that @Poor12 is trying to figure out how serious the issue is.

Let's wait for a while. cc those who have run into this situation here.
cc @likakuli

@likakuli
Contributor

copy that, thx

@RainbowMango
Member

@snowplayfire
We have been doing performance tests on this for the last few weeks. Now we can confirm that there is a performance issue, but it seems not as bad as mentioned in #1858 (comment).

There are several improvement ideas:

I think it's fair enough to add a flag for people to enable/disable the resource-modeling behavior.
Since we are going to make changes to the code this PR also touches, I'd like to move this forward first to avoid conflicts.

I'm sorry for letting this sit so long; there are now conflicts that need to be resolved. Please let me know if you can resolve them and push again.

@@ -108,5 +110,6 @@ func (o *Options) AddFlags(fs *pflag.FlagSet, allControllers []string) {
fs.DurationVar(&o.ResyncPeriod.Duration, "resync-period", 0, "Base frequency the informers are resynced.")
fs.IntVar(&o.ConcurrentClusterSyncs, "concurrent-cluster-syncs", 5, "The number of Clusters that are allowed to sync concurrently.")
fs.IntVar(&o.ConcurrentWorkSyncs, "concurrent-work-syncs", 5, "The number of Works that are allowed to sync concurrently.")
fs.BoolVar(&o.BuildInformerForCluster, "build-informer-for-cluster", true, "Enable this when need cluster informer.")
Member

I suggest changing the flag to --enable-cluster-resource-modeling:

Suggested change
fs.BoolVar(&o.BuildInformerForCluster, "build-informer-for-cluster", true, "Enable this when need cluster informer.")
fs.BoolVar(&o.EnableClusterResourceModeling, "enable-cluster-resource-modeling", true,
"Enable means controller would build resource modeling for each cluster by syncing Nodes and Pods resources.\n"+
"The resource modeling might be used by the scheduler to make scheduling decisions in scenario of dynamic replica assignment based on cluster free resources.\n"+
"Disable if it does not fit your cases for better performance.")

Also, I'm wondering if we should add a field to the Cluster API.

Contributor Author

Changed the flag to --enable-cluster-resource-modeling, done.

Contributor Author

Regarding adding a field to the Cluster API: do you mean a field to declare the synced resources?

Member

Regarding adding a field to the Cluster API: do you mean a field to declare the synced resources?

never mind. Let's finish this first.

Member

Changed the flag to --enable-cluster-resource-modeling, done.

I also proposed the flag usage text; what do you think?

"Enable means controller would build resource modeling for each cluster by syncing Nodes and Pods resources.\n"+
			"The resource modeling might be used by the scheduler to make scheduling decisions in scenario of dynamic replica assignment based on cluster free resources.\n"+
			"Disable if it does not fit your cases for better performance."

Contributor Author

Agreed.

}

// init the lease controller for every cluster
-	c.initLeaseController(clusterInformerManager.Context(), cluster)
+	c.initLeaseController(context.TODO(), cluster)
Member

If context.TODO() is used here, the lease lock will never be released.

Member

Yes, you are right. We expect the lease controller to be stopped after the cluster is gone.
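
As an aside, a minimal sketch (hypothetical, not the actual lease controller) of why the per-cluster context matters: a loop started with the informer manager's context stops when that context is cancelled on cluster removal, whereas context.TODO() is never cancelled and the goroutine keeps renewing the lease forever.

package main

import (
	"context"
	"fmt"
	"time"
)

// startLeaseLoop stands in for the lease controller: it renews a lease until
// its context is cancelled. With context.TODO() the ctx.Done() channel never
// fires, so the loop outlives the cluster it was created for.
func startLeaseLoop(ctx context.Context, cluster string) {
	go func() {
		ticker := time.NewTicker(2 * time.Second)
		defer ticker.Stop()
		for {
			select {
			case <-ctx.Done():
				fmt.Printf("cluster %s removed, stopping lease renewal\n", cluster)
				return
			case <-ticker.C:
				fmt.Printf("renewing lease for cluster %s\n", cluster)
			}
		}
	}()
}

func main() {
	ctx, cancel := context.WithCancel(context.Background())
	startLeaseLoop(ctx, "member1")
	time.Sleep(5 * time.Second)
	cancel() // simulates the member cluster being unregistered
	time.Sleep(time.Second)
}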

Contributor Author

done

@RainbowMango
Member

Hi @snowplayfire, your branch has conflicts with master; you might need to rebase your branch.

By the way, I see you are using your Chinese name as the commit author:

Author: ljx李静雪 <xxx@trip.com>

@karmada-bot karmada-bot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Aug 10, 2022
Signed-off-by: jingxueli <jingxueli@trip.com>
@snowplayfire
Contributor Author

Hi @snowplayfire, your branch has conflicts with master; you might need to rebase your branch.

By the way, I see you are using your Chinese name as the commit author:

Author: ljx李静雪 <xxx@trip.com>

done

@@ -154,7 +156,7 @@ func (c *ClusterStatusController) syncClusterStatus(cluster *clusterv1alpha1.Clu
}

// skip collecting cluster status if not ready
if online && healthy && readyCondition.Status == metav1.ConditionTrue {
Member

Thanks for your quick response.
But I think the flag is meant to disable resource modeling, which means it should only disable collecting Nodes and Pods.

When I looked at the code, I found it's difficult to just disable Nodes/Pods collection because the lease controller is using the context.

Maybe we can do it as follows, and after that I can send another patch to handle the context issue.

diff --git a/pkg/controllers/status/cluster_status_controller.go b/pkg/controllers/status/cluster_status_controller.go                          
index ddc4241a..01ea6585 100644                                                                                                                 
--- a/pkg/controllers/status/cluster_status_controller.go                                                                                       
+++ b/pkg/controllers/status/cluster_status_controller.go                                                                                       
@@ -181,20 +181,22 @@ func (c *ClusterStatusController) syncClusterStatus(cluster *clusterv1alpha1.Clu                                          
                        klog.Warningf("Maybe get partial(%d) APIs installed in Cluster %s. Error: %v.", len(apiEnables), cluster.GetName(), err)
                }

-               nodes, err := listNodes(clusterInformerManager)
-               if err != nil {
-                       klog.Errorf("Failed to list nodes for Cluster %s. Error: %v.", cluster.GetName(), err)
-               }
-
-               pods, err := listPods(clusterInformerManager)
-               if err != nil {
-                       klog.Errorf("Failed to list pods for Cluster %s. Error: %v.", cluster.GetName(), err)
-               }
-
                currentClusterStatus.KubernetesVersion = clusterVersion
                currentClusterStatus.APIEnablements = apiEnables
-               currentClusterStatus.NodeSummary = getNodeSummary(nodes)
-               currentClusterStatus.ResourceSummary = getResourceSummary(nodes, pods)
+
+               if c.EnableClusterResourceModeling {
+                       nodes, err := listNodes(clusterInformerManager)
+                       if err != nil {
+                               klog.Errorf("Failed to list nodes for Cluster %s. Error: %v.", cluster.GetName(), err)
+                       }
+
+                       pods, err := listPods(clusterInformerManager)
+                       if err != nil {
+                               klog.Errorf("Failed to list pods for Cluster %s. Error: %v.", cluster.GetName(), err)
+                       }
+                       currentClusterStatus.NodeSummary = getNodeSummary(nodes)
+                       currentClusterStatus.ResourceSummary = getResourceSummary(nodes, pods)
+               }
        }

        setTransitionTime(currentClusterStatus.Conditions, readyCondition)
@@ -252,7 +254,10 @@ func (c *ClusterStatusController) buildInformerForCluster(cluster *clusterv1alph
                singleClusterInformerManager = c.InformerManager.ForCluster(clusterClient.ClusterName, clusterClient.DynamicClientSet, 0)
        }

-       gvrs := []schema.GroupVersionResource{nodeGVR, podGVR}
+       var gvrs []schema.GroupVersionResource
+       if c.EnableClusterResourceModeling {
+               gvrs = append(gvrs, nodeGVR, podGVR)
+       }

        // create the informer for pods and nodes
        allSynced := true

@RainbowMango
Member

Hi @snowplayfire, I picked your commit and fixed the conflicts, as well as addressed my comments, in #2387.

Labels
kind/feature Categorizes issue or PR as related to a new feature. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.