Performance problem about pod informer #1079

Closed

hykych opened this issue Sep 11, 2019 · 7 comments

hykych commented Sep 11, 2019

	// Create pod informer.
	podInformer := kubeInformerFactory.Core().V1().Pods()

	// Set up an event handler for when pod resources change
	podInformer.Informer().AddEventHandler(cache.ResourceEventHandlerFuncs{
		AddFunc:    jc.AddPod,
		UpdateFunc: jc.UpdatePod,
		DeleteFunc: jc.DeletePod,
	})

I've seen this informer in the code. Does it have a performance problem? Since it doesn't filter pods at all, every tf-operator replica receives every pod event; as the number of replicas grows, the api-server sends the same events to each one, which puts a lot of pressure on the api-server and etcd.


gaocegege (Member) commented Sep 11, 2019

Thanks for your issue. If you mean FilteringResourceEventHandler, it applies the provided filter to all incoming events, so it does not solve the performance issue. Many controllers in Kubernetes, such as job_controller, also use ResourceEventHandlerFuncs.

And we have filter logic in jc.AddPod, so the operator avoids redundant reconciles.

Any suggestions and help on this are welcome.

hykych (Author) commented Sep 11, 2019

@gaocegege So do you have any plan or ideas to solve this? Or have you run a benchmark showing how severe the problem is?

gaocegege (Member) commented

The benchmark is here: #829

gaocegege (Member) commented

@hykych Can you give me more info about your cluster? I don't think tf-operator will be the bottleneck.

jtfogarty commented
/area engprod
/priority p2

stale bot commented Apr 20, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
