
Load test failing on dedicated runner #27429

Closed · codeboten opened this issue Oct 4, 2023 · 5 comments · Fixed by #27449
Assignees: codeboten
Labels: ci-cd (CI, CD, testing, build issues)

@codeboten (Contributor)

Component(s)

No response

Describe the issue you're reporting

The following failure occurs when running the load test on dedicated runners:

2023/10/04 17:53:04 CPU consumption is 78.3%, max expected is 60%
    test_case.go:292: CPU consumption is 78.3%, max expected is 60%
    scenarios.go:304: 
        	Error Trace:	/home/ghrunner/actions-runner/_work/opentelemetry-collector-contrib/opentelemetry-collector-contrib/testbed/tests/scenarios.go:304
        	            				/home/ghrunner/actions-runner/_work/opentelemetry-collector-contrib/opentelemetry-collector-contrib/testbed/tests/trace_test.go:191
        	Error:      	"50" is not less than "0"
        	Test:       	TestTraceNoBackend10kSPS/MemoryLimit
2023/10/04 17:53:04 Stopped generator. Sent:     29090 items
--- FAIL: TestTraceNoBackend10kSPS (18.48s)
    --- PASS: TestTraceNoBackend10kSPS/NoMemoryLimit (15.28s)
    --- FAIL: TestTraceNoBackend10kSPS/MemoryLimit (3.20s)

https://github.com/open-telemetry/opentelemetry-collector-contrib/actions/runs/6408839554/job/17401287975
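
For context on where the assertion comes from: the testbed compares the agent's observed CPU usage against a per-test ceiling. Below is a minimal sketch of that kind of check; the names `resourceSpec` and `checkResourceUsage` are illustrative stand-ins, not the actual testbed API:

```go
package main

import "fmt"

// resourceSpec is an illustrative stand-in for the testbed's per-test
// resource ceilings; it is not the real testbed type.
type resourceSpec struct {
	ExpectedMaxCPU float64 // percent; 0 means no limit is enforced
}

// checkResourceUsage fails when observed CPU exceeds the configured ceiling,
// producing a message like the one in the failure above.
func checkResourceUsage(spec resourceSpec, observedCPU float64) error {
	if spec.ExpectedMaxCPU > 0 && observedCPU > spec.ExpectedMaxCPU {
		return fmt.Errorf("CPU consumption is %.1f%%, max expected is %.0f%%",
			observedCPU, spec.ExpectedMaxCPU)
	}
	return nil
}

func main() {
	spec := resourceSpec{ExpectedMaxCPU: 60}
	if err := checkResourceUsage(spec, 78.3); err != nil {
		fmt.Println(err) // CPU consumption is 78.3%, max expected is 60%
	}
}
```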

codeboten added the ci-cd (CI, CD, testing, build issues) label Oct 4, 2023
codeboten self-assigned this Oct 4, 2023
codeboten pushed a commit to codeboten/opentelemetry-collector-contrib that referenced this issue Oct 4, 2023
An error appears to be causing the load tests to fail, but it's not being checked, which makes it hard to debug. Helps with open-telemetry#27429

Signed-off-by: Alex Boten <aboten@lightstep.com>
codeboten pushed a commit that referenced this issue Oct 4, 2023
An error appears to be causing the load tests to fail, but it's not being
checked, which makes it hard to debug. Helps with #27429

Signed-off-by: Alex Boten <aboten@lightstep.com>
Co-authored-by: Bogdan Drutu <bogdandrutu@gmail.com>
@codeboten (Contributor, Author)

The numbers reported by the dedicated runner are quite different from what we're seeing on GitHub shared runners. The sample below shows the results on shared runners:

2023/10/04 13:46:35 Writing Agent log to /home/runner/work/opentelemetry-collector-contrib/opentelemetry-collector-contrib/testbed/tests/results/TestMetric10kDPS/OpenCensus/agent.log
2023/10/04 13:46:35 Agent running, pid=2091
2023/10/04 13:46:35 Starting load generator at 10000 items/sec.
2023/10/04 13:46:38 Agent RAM (RES):   0 MiB, CPU: 0.0% | Sent:    204400 items | Received:   203,700 items (67,897/sec)
2023/10/04 13:46:41 Agent RAM (RES):  79 MiB, CPU:39.0% | Sent:    414400 items | Received:   413,700 items (68,947/sec)
2023/10/04 13:46:44 Agent RAM (RES):  79 MiB, CPU:39.7% | Sent:    624057 items | Received:   623,700 items (69,301/sec)
2023/10/04 13:46:47 Agent RAM (RES):  78 MiB, CPU:39.7% | Sent:    834400 items | Received:   833,700 items (69,477/sec)
2023/10/04 13:46:50 Agent RAM (RES):  79 MiB, CPU:38.7% | Sent:   1044400 items | Received: 1,043,700 items (69,578/sec)
2023/10/04 13:46:50 Stopped generator. Sent:   1049300 items

And on the dedicated runner:

 2023/10/04 21:40:23 Writing Agent log to /home/ghrunner/actions-runner/_work/opentelemetry-collector-contrib/opentelemetry-collector-contrib/testbed/tests/results/TestMetric10kDPS/OpenCensus/agent.log
2023/10/04 21:40:23 Agent running, pid=1283643
2023/10/04 21:40:23 Starting load generator at 10000 items/sec.
2023/10/04 21:40:26 Agent RAM (RES):  32 MiB, CPU: 0.0% | Sent:    204400 items | Received:   203,700 items (67,887/sec)
2023/10/04 21:40:29 Agent RAM (RES):  76 MiB, CPU:64.3% | Sent:    413700 items | Received:   413,700 items (68,951/sec)
2023/10/04 21:40:32 Agent RAM (RES):  76 MiB, CPU:68.0% | Sent:    623779 items | Received:   623,700 items (69,302/sec)
2023/10/04 21:40:35 Agent RAM (RES):  76 MiB, CPU:69.3% | Sent:    833700 items | Received:   833,000 items (69,418/sec)
2023/10/04 21:40:38 Agent RAM (RES):  77 MiB, CPU:70.0% | Sent:   1044400 items | Received: 1,043,000 items (69,533/sec)

The dedicated hardware runs with 16 cores, compared to 2 cores for the shared runners. The question I have is: should the testbed be modified to set the number of CPUs used by the tests, or should the expected values of the tests be updated to match the new hardware?

@codeboten (Contributor, Author)

One way to set the number of CPUs is in the childProcessCollector's Start func:

// Inherit the parent environment, then cap the Go runtime's parallelism
// for the spawned collector process.
cp.cmd.Env = append(os.Environ(),
	"GOMAXPROCS=2",
)
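
For illustration, a self-contained sketch of that approach; the collector binary and config paths here are placeholders, and only the `GOMAXPROCS=2` line reflects the snippet above:

```go
package main

import (
	"log"
	"os"
	"os/exec"
)

func main() {
	// Binary and config paths are placeholders for whatever the testbed launches.
	cmd := exec.Command("./otelcol-contrib", "--config", "config.yaml")
	// Inherit the parent environment, then cap the Go runtime's parallelism
	// so CPU numbers stay comparable between 2-core and 16-core runners.
	cmd.Env = append(os.Environ(), "GOMAXPROCS=2")
	if err := cmd.Start(); err != nil {
		log.Fatal(err)
	}
	if err := cmd.Wait(); err != nil {
		log.Printf("collector exited: %v", err)
	}
}
```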

@codeboten (Contributor, Author)

FWIW, the change in performance numbers will impact the reported benchmarks:

[Screenshot: reported benchmark results, 2023-10-04]

@djaglowski (Member)

Setting the number of cores seems like a good idea. Can a runner run multiple jobs simultaneously? If so, cores may end up being restricted inconsistently depending on the number of jobs running.

@codeboten (Contributor, Author)

> Can a runner run multiple jobs simultaneously?

As far as I can tell, the runners currently run only one job at a time, but you're right that it could change in the future.

codeboten pushed a commit that referenced this issue Oct 5, 2023
This will result in more consistent benchmarks across different
environments.

Fixes #27429

Signed-off-by: Alex Boten <aboten@lightstep.com>
dmitryax pushed a commit that referenced this issue Jan 16, 2024
**Description:**
Adding a feature - These changes add a new `WithEnvVar` `ChildProcessOption` to allow influencing the child process environment without acting on the current environment. They also move the `GOMAXPROCS=2` setting added in dd8e010 into each invoking test: while helpful for addressing #27429, the constraint doesn't seem applicable to the helper itself. Limiting the utility as a whole, rather than specifying the constraint in each test context, cannot easily be worked around and interferes with some load-testing efforts.
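
For readers following along, here is a hedged sketch of what such an option could look like using Go's functional-options pattern; only the `WithEnvVar` and `ChildProcessOption` names come from the PR description, everything else is assumed:

```go
package testbed

import "fmt"

// childProcessCollector is a simplified stand-in for the testbed's type;
// the field layout is an assumption for this sketch.
type childProcessCollector struct {
	additionalEnv map[string]string
}

// ChildProcessOption customizes a child process collector before it starts.
type ChildProcessOption func(*childProcessCollector)

// WithEnvVar sets an environment variable for the child process only,
// leaving the parent test process's environment untouched.
func WithEnvVar(key, value string) ChildProcessOption {
	return func(cp *childProcessCollector) {
		if cp.additionalEnv == nil {
			cp.additionalEnv = map[string]string{}
		}
		cp.additionalEnv[key] = value
	}
}

// envVars renders the extra variables in KEY=VALUE form for exec.Cmd.Env.
func (cp *childProcessCollector) envVars() []string {
	vars := make([]string, 0, len(cp.additionalEnv))
	for k, v := range cp.additionalEnv {
		vars = append(vars, fmt.Sprintf("%s=%s", k, v))
	}
	return vars
}
```

A test that needs the cap would then opt in explicitly, e.g. something like `NewChildProcessCollector(WithEnvVar("GOMAXPROCS", "2"))`, rather than every consumer of the helper inheriting the limit.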