Add job load test: create multiple jobs based on the number of nodes #1998

Merged (2 commits) on Mar 17, 2022
Changes from 1 commit
2 changes: 1 addition & 1 deletion clusterloader2/pkg/measurement/common/wait_for_jobs.go
@@ -46,7 +46,7 @@ const (
defaultWaitForFinishedJobsTimeout = 10 * time.Minute
waitForFinishedJobsName = "WaitForFinishedJobs"
waitForFinishedJobsWorkers = 1
checkFinishedJobsInterval = 5 * time.Second
checkFinishedJobsInterval = time.Second
)

func init() {
51 changes: 42 additions & 9 deletions clusterloader2/testing/batch/config.yaml
@@ -1,12 +1,22 @@
{{$mode := (DefaultParam .MODE "Indexed")}}
{{$pods_per_node_per_size := 20}}
Contributor:
Why not use 10 as the default and make it a configurable parameter?

30 is the suggested number of Pods per Node in a highly scalable cluster. As there are 3 sizes (small/medium/large), each size would get 10 pods per node.

Member Author:

Got it. Updated.

{{$total_pods_per_size := MultiplyInt .Nodes $pods_per_node_per_size}}
{{$small_job_size := 5}}
{{$small_jobs_count := DivideInt $total_pods_per_size $small_job_size}}
{{$medium_job_size := 20}}
{{$medium_jobs_count := DivideInt $total_pods_per_size $medium_job_size}}
{{$large_job_size := 400}}
{{$large_jobs_count := DivideInt $total_pods_per_size $large_job_size}}
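(For context, not part of the diff: a quick worked example of what these expressions evaluate to on a hypothetical 100-node cluster, using the pods-per-node value of 20 shown in this commit.)

    # Hypothetical 100-node cluster, $pods_per_node_per_size = 20
    # total_pods_per_size = 100 * 20  = 2000
    # small_jobs_count    = 2000 / 5   = 400   (400 jobs of 5 pods)
    # medium_jobs_count   = 2000 / 20  = 100   (100 jobs of 20 pods)
    # large_jobs_count    = 2000 / 400 = 5     (5 jobs of 400 pods)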

name: batch

namespace:
number: 1
Contributor:
We support only 3000 Pods per Namespace, see [1]. This means we need a single Namespace per 100 Nodes (or 3000 Pods).

[1] https://github.com/kubernetes/community/blob/master/sig-scalability/configs-and-limits/thresholds.md

Member Author:

Updated the template to have parameters similar to the load test.
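(A minimal sketch of how the namespace count could be derived from the node count to stay under the ~3000-pods-per-namespace threshold, following the pattern used in the load test; the variable names and the default of 100 nodes per namespace are illustrative, not necessarily what the follow-up commit uses.)

    {{$NODES_PER_NAMESPACE := MinInt .Nodes (DefaultParam .NODES_PER_NAMESPACE 100)}}
    {{$namespaces := DivideInt .Nodes $NODES_PER_NAMESPACE}}

    namespace:
      number: {{$namespaces}}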


tuningSets:
- name: Uniform1qps
- name: Uniform5qps
qpsLoad:
qps: 1
qps: 5
Contributor:
Are you sure you want to use such a low constant value?

For the CL2 load test we use --env=CL2_LOAD_TEST_THROUGHPUT=50 to determine the QPS in the saturation tuning test.

Member Author:

Added a parameter for it.
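(A minimal sketch of how the QPS could be parameterized, mirroring how the load test consumes CL2_LOAD_TEST_THROUGHPUT; the variable name and the fallback of 5 are illustrative.)

    {{$qps := DefaultParam .CL2_LOAD_TEST_THROUGHPUT 5}}

    tuningSets:
    - name: UniformQPS
      qpsLoad:
        qps: {{$qps}}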


steps:
- name: Start measurements
@@ -16,22 +26,45 @@ steps:
Params:
action: start
labelSelector: group = test-job
- name: Create job
- name: Create {{$mode}} jobs
phases:
- namespaceRange:
min: 1
max: 1
replicasPerNamespace: 1
tuningSet: Uniform1qps
replicasPerNamespace: {{$small_jobs_count}}
tuningSet: Uniform5qps
objectBundle:
- basename: small
objectTemplatePath: "job.yaml"
templateFillMap:
Replicas: {{$small_job_size}}
Mode: {{$mode}}
- namespaceRange:
min: 1
max: 1
replicasPerNamespace: {{$medium_jobs_count}}
tuningSet: Uniform5qps
objectBundle:
- basename: medium
objectTemplatePath: "job.yaml"
templateFillMap:
Replicas: {{$medium_job_size}}
Mode: {{$mode}}
- namespaceRange:
min: 1
max: 1
replicasPerNamespace: {{$large_jobs_count}}
tuningSet: Uniform5qps
objectBundle:
- basename: test-job
- basename: large
objectTemplatePath: "job.yaml"
templateFillMap:
Replicas: 10
- name: Wait for jobs to finish
Replicas: {{$large_job_size}}
Mode: {{$mode}}
- name: Wait for {{$mode}} jobs to finish
measurements:
- Identifier: WaitForFinishedJobs
Method: WaitForFinishedJobs
Params:
action: gather
timeout: 1m
timeout: 10m
6 changes: 3 additions & 3 deletions clusterloader2/testing/batch/job.yaml
@@ -7,13 +7,13 @@ metadata:
spec:
parallelism: {{.Replicas}}
completions: {{.Replicas}}
completionMode: {{.Mode}}
template:
metadata:
labels:
group: test-pod
spec:
containers:
- name: {{.Name}}
image: bash
args: ["-c", "exit"]
restartPolicy: Never
image: gcr.io/k8s-staging-perf-tests/sleep:v0.0.3
restartPolicy: Never
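(For reference, a Job rendered from this template for one of the small bundles might look roughly like the following; the name and replica count are illustrative, the top-level group: test-job label is inferred from the measurement's labelSelector, and any container args set by the template are omitted.)

    apiVersion: batch/v1
    kind: Job
    metadata:
      name: small-1
      labels:
        group: test-job
    spec:
      parallelism: 5
      completions: 5
      completionMode: Indexed
      template:
        metadata:
          labels:
            group: test-pod
        spec:
          containers:
          - name: small-1
            image: gcr.io/k8s-staging-perf-tests/sleep:v0.0.3
          restartPolicy: Never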
1 change: 1 addition & 0 deletions clusterloader2/testing/batch/overrides.yaml
@@ -0,0 +1 @@
MODE: Indexed
Contributor:
nit: missing newline at end of file

Member Author:

I decided to remove the file instead.
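(With the checked-in overrides file dropped, the completion mode can still be switched at run time by passing a user-supplied overrides file via clusterloader2's --testoverrides flag; a minimal sketch, with a hypothetical file name.)

    # my-overrides.yaml
    MODE: NonIndexed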