
add more scheduler benchmark testcases #69898

Merged
1 commit merged into kubernetes:master on Oct 24, 2018

Conversation

Huang-Wei (Member):

What this PR does / why we need it:

This PR adds more benchmark testcases:

  • add benchmark test for PodAffinity
  • add benchmark test for NodeAffinity

Special notes for your reviewer:

The motivation for this PR is to evaluate the performance of the inter-pod affinity implementation when it is calculated across multiple pods. The NodeAffinity test is added as a bonus.

(Based on this PR, here is an example of the test results; check out the "before" sheet.)

Release note:

Add scheduler benchmark tests for PodAffinity and NodeAffinity.

/kind feature
/sig scheduling
/cc @resouer @bsalamat @misterikkit

@k8s-ci-robot added the release-note label on Oct 16, 2018
@k8s-ci-robot added the kind/feature, size/L, sig/scheduling, cncf-cla: yes, and sig/testing labels on Oct 16, 2018
		map[string]string{"foo": ""},
	)
	testStrategy := testutils.NewCustomCreatePodStrategy(testBasePod)
	nodeStrategy := testutils.NewLabelNodePrepareStrategy(apis.LabelZoneFailureDomain, "zone1")
Huang-Wei (Member Author):

I just put in this simple scenario as a start.

But feel free to comment if we'd like a more typical scenario, e.g. "all nodes are split into X regions and Y zones".
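
For illustration only, a minimal sketch (not part of this PR) of how nodes might be split across several zones, reusing NewLabelNodePrepareStrategy from the diff above; the helper name, the import paths, and the PrepareNodeStrategy slice type are assumptions:

import (
	apis "k8s.io/kubernetes/pkg/kubelet/apis"
	testutils "k8s.io/kubernetes/test/utils"
)

// multiZoneNodeStrategies is a hypothetical helper: one node-prepare strategy
// per zone, so different batches of benchmark nodes get different zone labels.
func multiZoneNodeStrategies(zones []string) []testutils.PrepareNodeStrategy {
	strategies := make([]testutils.PrepareNodeStrategy, 0, len(zones))
	for _, zone := range zones {
		strategies = append(strategies,
			testutils.NewLabelNodePrepareStrategy(apis.LabelZoneFailureDomain, zone))
	}
	return strategies
}

For example, multiZoneNodeStrategies([]string{"zone1", "zone2", "zone3"}) would prepare a three-zone cluster instead of the single "zone1" used here.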

	}
	basePod.Spec.Affinity = &v1.Affinity{
		PodAntiAffinity: &v1.PodAntiAffinity{
			RequiredDuringSchedulingIgnoredDuringExecution: []v1.PodAffinityTerm{
Huang-Wei (Member Author):

Similar to https://github.com/kubernetes/kubernetes/pull/69898/files#r225712449, this is the most basic case. We can extend it to have multiple terms, e.g. region in X, zone in Y, etc.
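
A minimal sketch of such a multi-term extension (hypothetical, not in this PR); v1, metav1, and apis refer to the usual k8s.io/api/core/v1, k8s.io/apimachinery/pkg/apis/meta/v1, and kubelet well-known-labels packages, and affinityLabels follows the makeBasePodWithAntiAffinity signature shown later in the thread:

// Hypothetical: two required anti-affinity terms, one scoped to the
// region topology and one to the zone topology.
basePod.Spec.Affinity = &v1.Affinity{
	PodAntiAffinity: &v1.PodAntiAffinity{
		RequiredDuringSchedulingIgnoredDuringExecution: []v1.PodAffinityTerm{
			{
				LabelSelector: &metav1.LabelSelector{MatchLabels: affinityLabels},
				TopologyKey:   apis.LabelZoneRegion,
			},
			{
				LabelSelector: &metav1.LabelSelector{MatchLabels: affinityLabels},
				TopologyKey:   apis.LabelZoneFailureDomain,
			},
		},
	},
}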

@Huang-Wei (Member Author):

/retest

@Huang-Wei (Member Author):

/test pull-kubernetes-integration

// PodAntiAffinity rules when the cluster has various quantities of nodes and
// scheduled pods.
-func BenchmarkSchedulingAntiAffinity(b *testing.B) {
+func BenchmarkSchedulingPodAntiAffinity(b *testing.B) {
Member:

I am not sure where the code for the perf dashboard lives, but it is worth checking to make sure that renaming this function does not affect the dashboard.

Huang-Wei (Member Author):

Sure. From the snapshot below, it's very likely that it runs with the -bench=. option:

[screenshot of the benchmark job configuration omitted]

@wojtek-t @shyamjvs If the function BenchmarkSchedulingAntiAffinity is renamed to BenchmarkSchedulingPodAntiAffinity, I guess the old datapoints would stop growing and become stale? We may need some trick (manually renaming the local dataset file) to connect the old data with the new data, so that the graph stays consistent.

Member:

I'm not familiar with perf-dash.
@krzysied - can you please suggest something?

Contributor:

> I guess old datapoints would stop growing and become stale

Yes. The new datapoints will be assigned to the new function name; the old ones will remain unchanged.

> And maybe we need to do some trick (manually rename the local dataset file) to make old data connect with new data, so that the graph is still consistent.

It can be done; the question is whether we need it. For benchmarks, Perfdash presents the 100 newest results, so assuming one test run per hour, after a week (168 runs) only results with the new function name will be available.

Huang-Wei (Member Author):

Thanks @krzysied @wojtek-t, then we won't bother to "migrate" the old dataset :)

@@ -99,11 +148,66 @@ func makeBasePodWithAntiAffinity(podLabels, affinityLabels map[string]string) *v
	return basePod
}

// makeBasePodWithAffinity creates a Pod object to be used as a template.
// The Pod has a PodAntiAffinity requirement against pods with the given labels.
func makeBasePodWithAffinity(podLabels, affinityZoneLabels map[string]string) *v1.Pod {
Member:

Please consider renaming this function to makeBasePodWithPodAffinity to be consistent.

Huang-Wei (Member Author):

Done.

	for _, test := range tests {
		name := fmt.Sprintf("%vNodes/%vPods", test.nodes, test.existingPods)
		b.Run(name, func(b *testing.B) {
			benchmarkScheduling(test.nodes, test.existingPods, test.minPods, defaultNodeStrategy, setupStrategy, testStrategy, b)
Member:

Shouldn't you use LabelNodePrepareStrategy instead of defaultNodeStrategy to make the test more meaningful?

Huang-Wei (Member Author):

Done.
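
Presumably the fix swaps the zone-labeling nodeStrategy in for defaultNodeStrategy; a sketch of the resulting loop, assuming the names from the snippets in this thread:

for _, test := range tests {
	name := fmt.Sprintf("%vNodes/%vPods", test.nodes, test.existingPods)
	b.Run(name, func(b *testing.B) {
		// nodeStrategy labels each benchmark node with a zone, so the
		// affinity terms have a real topology key to match against.
		benchmarkScheduling(test.nodes, test.existingPods, test.minPods,
			nodeStrategy, setupStrategy, testStrategy, b)
	})
}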

@Huang-Wei (Member Author):

/retest

@resouer (Contributor) left a comment:

Just left a few review comments, thanks!

	}
	// The setup strategy creates pods with no affinity rules.
	setupStrategy := testutils.NewSimpleWithControllerCreatePodStrategy("setup")
	// The test strategy creates pods with affinity for each other.
Contributor:

I'd move this comment down to the testStrategy := ... line below; ditto for BenchmarkSchedulingNodeAffinity.

Huang-Wei (Member Author):

Done.
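
After the move, the snippet presumably reads roughly like this (the testStrategy line is taken from the diff earlier in the thread):

setupStrategy := testutils.NewSimpleWithControllerCreatePodStrategy("setup")
// The test strategy creates pods with affinity for each other.
testStrategy := testutils.NewCustomCreatePodStrategy(testBasePod)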

func BenchmarkSchedulingPodAffinity(b *testing.B) {
	tests := []struct{ nodes, existingPods, minPods int }{
		{nodes: 500, existingPods: 250, minPods: 250},
		{nodes: 500, existingPods: 5000, minPods: 250},
Contributor:

I'm a bit curious why we've had different numbers of nodes and pods in BenchmarkScheduling vs. BenchmarkSchedulingPodAffinity/AntiAffinity from the beginning. It would have been a good chance to do a cross comparison.

Member:

In the past, we could hardly go beyond 500 nodes for inter-pod affinity/anti-affinity benchmarks without hitting test timeouts. With the improved performance of the feature, I think we should be able to run the test on 1000-node clusters with no problem.

		Spec: testutils.MakePodSpec(),
	}
	basePod.Spec.Affinity = &v1.Affinity{
		PodAntiAffinity: &v1.PodAntiAffinity{
@resouer (Contributor), Oct 19, 2018:

Wait ... PodAntiAffinity in makeBasePodWithPodAffinity?

Suggested change:

-	PodAntiAffinity: &v1.PodAntiAffinity{
+	PodAffinity: &v1.PodAffinity{

Also, ditto the comment of this function :-)

Huang-Wei (Member Author):

Apologies, a copy/paste error.

Fixed.
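
For reference, a sketch of the corrected helper assembled from the snippets in this thread; the GenerateName value is an illustrative assumption, not copied from the PR:

// makeBasePodWithPodAffinity creates a Pod object to be used as a template.
// The Pod has a PodAffinity requirement against pods with the given labels.
func makeBasePodWithPodAffinity(podLabels, affinityZoneLabels map[string]string) *v1.Pod {
	basePod := &v1.Pod{
		ObjectMeta: metav1.ObjectMeta{
			GenerateName: "affinity-pod-", // illustrative
			Labels:       podLabels,
		},
		Spec: testutils.MakePodSpec(),
	}
	basePod.Spec.Affinity = &v1.Affinity{
		PodAffinity: &v1.PodAffinity{ // PodAffinity, per the suggested change above
			RequiredDuringSchedulingIgnoredDuringExecution: []v1.PodAffinityTerm{
				{
					LabelSelector: &metav1.LabelSelector{MatchLabels: affinityZoneLabels},
					TopologyKey:   apis.LabelZoneFailureDomain,
				},
			},
		},
	}
	return basePod
}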

@Huang-Wei (Member Author):

/retest

// PodAntiAffinity rules when the cluster has various quantities of nodes and
// scheduled pods.
-func BenchmarkSchedulingAntiAffinity(b *testing.B) {
+func BenchmarkSchedulingPodAntiAffinity(b *testing.B) {
	tests := []struct{ nodes, existingPods, minPods int }{
		{nodes: 500, existingPods: 250, minPods: 250},
		{nodes: 500, existingPods: 5000, minPods: 250},
Member:

Now that you're at it, could you add a {nodes: 1000, existingPods: 1000, minPods: 500} test here and in the next test?

Huang-Wei (Member Author):

@bsalamat To confirm: would you like this 1000-node test entry added to the following tests?

  • BenchmarkSchedulingPodAntiAffinity
  • BenchmarkSchedulingPodAffinity
  • BenchmarkSchedulingNodeAffinity

After that, our cases would look like this (see also the sketch after the list):

  • BenchmarkScheduling (have no affinity)

    {nodes: 100, existingPods: 0, minPods: 100},
    {nodes: 100, existingPods: 1000, minPods: 100},
    {nodes: 1000, existingPods: 0, minPods: 100},
    {nodes: 1000, existingPods: 1000, minPods: 100},
  • BenchmarkSchedulingPodAntiAffinity

    {nodes: 500, existingPods: 250, minPods: 250},
    {nodes: 500, existingPods: 5000, minPods: 250},
    {nodes: 1000, existingPods: 1000, minPods: 500},
  • BenchmarkSchedulingPodAffinity

    {nodes: 500, existingPods: 250, minPods: 250},
    {nodes: 500, existingPods: 5000, minPods: 250},
    {nodes: 1000, existingPods: 1000, minPods: 500},
  • BenchmarkSchedulingNodeAffinity

    {nodes: 500, existingPods: 250, minPods: 250},
    {nodes: 500, existingPods: 5000, minPods: 250},
    {nodes: 1000, existingPods: 1000, minPods: 500},
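
As a sketch, all three affinity benchmarks would then share the same table, mirroring the entries listed above plus the proposed 1000-node case:

tests := []struct{ nodes, existingPods, minPods int }{
	{nodes: 500, existingPods: 250, minPods: 250},
	{nodes: 500, existingPods: 5000, minPods: 250},
	{nodes: 1000, existingPods: 1000, minPods: 500},
}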

Huang-Wei (Member Author):

ping @bsalamat ^^

Contributor:

Seems reasonable from my side :-)

Member:

@Huang-Wei Looks good for now. I want to be able to try a larger number of nodes, but it has the risk of timing out in our CI/CD. Let's go with 1000 for now.

Huang-Wei (Member Author):

Done. Thanks @resouer @bsalamat .

@resouer (Contributor) commented Oct 24, 2018:

/assign

- add benchmark test for PodAffinity
- add benchmark test for NodeAffinity
- add 1000-nodes test for PodAntiAffinity/PodAffinity/NodeAffinity
@bsalamat (Member) left a comment:

/lgtm
/approve

@k8s-ci-robot added the lgtm label on Oct 24, 2018
@k8s-ci-robot (Contributor):

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bsalamat, Huang-Wei

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot added the approved label on Oct 24, 2018
@Huang-Wei (Member Author):

/test pull-kubernetes-e2e-gce

@k8s-ci-robot merged commit 10121e6 into kubernetes:master on Oct 24, 2018
@Huang-Wei deleted the scheudler-perf-more-cases branch on October 24, 2018, 21:49