Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance improve for karmada-controller-manager #2034

Closed
wants to merge 1 commit into from

Conversation

Poor12
Copy link
Member

@Poor12 Poor12 commented Jun 20, 2022

Signed-off-by: Poor12 shentiecheng@huawei.com

What type of PR is this?
/kind feature

What this PR does / why we need it:
As is talked in #1858,when member cluster is large(3k nodes, 10w pods), karmada-controller-manager is easy to be OOMkilled and retry to sync nodes and pods again and again.

Based on #2008, I tested the two clusters with 100 nodes and almost 300 pods. We can easily see that in cluster status controllers, list nodes and pods periodicly cost a lot when the cluster is large(a lot of unmarshal works) according to the frame graph.
1

So I modified the mechanism from full amount list to watch based on event to make the memory increase more stable.

Which issue(s) this PR fixes:
Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?:
None

@karmada-bot karmada-bot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. kind/feature Categorizes issue or PR as related to a new feature. labels Jun 20, 2022
@karmada-bot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please assign rainbowmango after the PR has been reviewed.
You can assign the PR to them by writing /assign @rainbowmango in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@karmada-bot karmada-bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jun 20, 2022
@Poor12 Poor12 force-pushed the performance branch 2 times, most recently from 9079179 to 9477ac5 Compare June 20, 2022 07:54
@Poor12 Poor12 marked this pull request as ready for review June 20, 2022 07:55
@karmada-bot karmada-bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 20, 2022
@Poor12
Copy link
Member Author

Poor12 commented Jun 20, 2022

cc @RainbowMango

@Poor12 Poor12 force-pushed the performance branch 3 times, most recently from 4817c1e to 7f00732 Compare June 20, 2022 08:40
n.Lock()
defer n.Unlock()

totalNum := len(n.clusterSummary[clusterName].nodeSummaryMap)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it safe to call read from map? Will it panic without clusterName key?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing out.

runtimeObj, err := helper.GetObjectFromCache(c.RESTMapper, c.InformerManager, fedKey)
if err != nil {
if apierrors.IsNotFound(err) {
c.ClusterSummaryCache.delete(fedKey)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NotFound error shall be returned?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

miss return statement

func (c *ClusterStatusController) genHandlerUpdateFunc(clusterName string) func(oldObj, newObj interface{}) {
return func(oldObj, newObj interface{}) {
curObj := newObj.(runtime.Object)
if !reflect.DeepEqual(oldObj, newObj) {
Copy link
Member

@ikaven1024 ikaven1024 Jun 24, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do you think about comparing with resource version?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's ok to comparing with obj, because we don't know the type of obj. If comparing with resource version, may add the cost of converting type.

Copy link
Member

@ikaven1024 ikaven1024 Jun 24, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I test it on my side with go 1.17.5. It tells type assert is more faster than reflect.DeepEqual

var (
	obj1 interface{}
	obj2 interface{}
)

func init() {
        pod1Yaml := `<skipped>`
        pod2Yaml := `<skipped>`

	unstructuredObj1 := &unstructured.Unstructured{}
	unstructuredObj2 := &unstructured.Unstructured{}
	unstructured.UnstructuredJSONScheme.Decode([]byte(pod1Yaml), nil, unstructuredObj1)
	unstructured.UnstructuredJSONScheme.Decode([]byte(pod2Yaml), nil, unstructuredObj2)

        obj1, obj2 = unstructuredObj1, unstructuredObj2
}

func BenchmarkDeepEqual(b *testing.B) {
	for i := 0; i < b.N; i++ {
		_ = reflect.DeepEqual(obj1, obj2)
	}
}

func BenchmarkAssert(b *testing.B) {
	for i := 0; i < b.N; i++ {
		o1, _ := meta.Accessor(obj1)
		o2, _ := meta.Accessor(obj2)
		_ = o1.GetResourceVersion() == o2.GetResourceVersion()
	}
}
goos: darwin
goarch: arm64
pkg: github.com/ikaven1024/gists/k8s/benchmark
BenchmarkDeepEqual
BenchmarkDeepEqual-8   	 7717909	       145.2 ns/op
BenchmarkAssert
BenchmarkAssert-8        	33085802	        36.39 ns/op
PASS

@Poor12 Poor12 force-pushed the performance branch 2 times, most recently from 1000df9 to 8dcdbf6 Compare June 24, 2022 07:36
Signed-off-by: Poor12 <shentiecheng@huawei.com>
@@ -254,9 +264,9 @@ func (c *ClusterStatusController) buildInformerForCluster(cluster *clusterv1alph
// create the informer for pods and nodes
allSynced := true
for _, gvr := range gvrs {
if !singleClusterInformerManager.IsInformerSynced(gvr) {
if !singleClusterInformerManager.IsInformerSynced(gvr) || !singleClusterInformerManager.IsHandlerExist(gvr, c.getEventHandler(cluster.Name)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Informer synced for gvr doesn't mean the eventHandler exit.
Since all controllers share the MultiClusterInformerManager, the informer might be created(and synced from another place).

Comment on lines +244 to +246
if oldMetaInfo.GetResourceVersion() != newMetaInfo.GetResourceVersion() {
return true
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks weird that .metadata.ResourceVersion is not changed in an update event.
I doubt it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/feature Categorizes issue or PR as related to a new feature. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants