
Load test failing on dedicated runner #27429

Closed · codeboten opened this issue Oct 4, 2023 · 5 comments · Fixed by #27449
Assignees: codeboten
Labels: ci-cd (CI, CD, testing, build issues)

@codeboten (Contributor)

Component(s)

No response

Describe the issue you're reporting

The following failure occurs when running the load test on dedicated runners:

2023/10/04 17:53:04 CPU consumption is 78.3%, max expected is 60%
    test_case.go:292: CPU consumption is 78.3%, max expected is 60%
    scenarios.go:304: 
        	Error Trace:	/home/ghrunner/actions-runner/_work/opentelemetry-collector-contrib/opentelemetry-collector-contrib/testbed/tests/scenarios.go:304
        	            				/home/ghrunner/actions-runner/_work/opentelemetry-collector-contrib/opentelemetry-collector-contrib/testbed/tests/trace_test.go:191
        	Error:      	"50" is not less than "0"
        	Test:       	TestTraceNoBackend10kSPS/MemoryLimit
2023/10/04 17:53:04 Stopped generator. Sent:     29090 items
--- FAIL: TestTraceNoBackend10kSPS (18.48s)
    --- PASS: TestTraceNoBackend10kSPS/NoMemoryLimit (15.28s)
    --- FAIL: TestTraceNoBackend10kSPS/MemoryLimit (3.20s)

https://github.com/open-telemetry/opentelemetry-collector-contrib/actions/runs/6408839554/job/17401287975
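
For context on where the assertion comes from: the testbed compares the agent's observed CPU usage against a per-test ceiling. Below is a minimal sketch of that kind of check; the names `resourceSpec` and `checkResourceUsage` are illustrative stand-ins, not the actual testbed API:

```go
package main

import "fmt"

// resourceSpec is an illustrative stand-in for the testbed's per-test
// resource ceilings; it is not the real testbed type.
type resourceSpec struct {
	ExpectedMaxCPU float64 // percent; 0 means no limit is enforced
}

// checkResourceUsage fails when observed CPU exceeds the configured ceiling,
// producing a message like the one in the failure above.
func checkResourceUsage(spec resourceSpec, observedCPU float64) error {
	if spec.ExpectedMaxCPU > 0 && observedCPU > spec.ExpectedMaxCPU {
		return fmt.Errorf("CPU consumption is %.1f%%, max expected is %.0f%%",
			observedCPU, spec.ExpectedMaxCPU)
	}
	return nil
}

func main() {
	spec := resourceSpec{ExpectedMaxCPU: 60}
	if err := checkResourceUsage(spec, 78.3); err != nil {
		fmt.Println(err) // CPU consumption is 78.3%, max expected is 60%
	}
}
```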

codeboten added the ci-cd (CI, CD, testing, build issues) label Oct 4, 2023
codeboten self-assigned this Oct 4, 2023
codeboten pushed a commit to codeboten/opentelemetry-collector-contrib that referenced this issue Oct 4, 2023
An error appears to be causing the load tests to fail, but it's not being checked, which makes it hard to debug. Helps with open-telemetry#27429

Signed-off-by: Alex Boten <aboten@lightstep.com>
codeboten pushed a commit that referenced this issue Oct 4, 2023
An error appears to be causing the load tests to fail, but it's not being
checked, which makes it hard to debug. Helps with #27429

Signed-off-by: Alex Boten <aboten@lightstep.com>
Co-authored-by: Bogdan Drutu <bogdandrutu@gmail.com>
@codeboten (Contributor, Author)

The numbers reported by the dedicated runner are quite different from what we're seeing on GitHub shared runners. The sample below shows the results on shared runners:

2023/10/04 13:46:35 Writing Agent log to /home/runner/work/opentelemetry-collector-contrib/opentelemetry-collector-contrib/testbed/tests/results/TestMetric10kDPS/OpenCensus/agent.log
2023/10/04 13:46:35 Agent running, pid=2091
2023/10/04 13:46:35 Starting load generator at 10000 items/sec.
2023/10/04 13:46:38 Agent RAM (RES):   0 MiB, CPU: 0.0% | Sent:    204400 items | Received:   203,700 items (67,897/sec)
2023/10/04 13:46:41 Agent RAM (RES):  79 MiB, CPU:39.0% | Sent:    414400 items | Received:   413,700 items (68,947/sec)
2023/10/04 13:46:44 Agent RAM (RES):  79 MiB, CPU:39.7% | Sent:    624057 items | Received:   623,700 items (69,301/sec)
2023/10/04 13:46:47 Agent RAM (RES):  78 MiB, CPU:39.7% | Sent:    834400 items | Received:   833,700 items (69,477/sec)
2023/10/04 13:46:50 Agent RAM (RES):  79 MiB, CPU:38.7% | Sent:   1044400 items | Received: 1,043,700 items (69,578/sec)
2023/10/04 13:46:50 Stopped generator. Sent:   1049300 items

And on the dedicated runner:

 2023/10/04 21:40:23 Writing Agent log to /home/ghrunner/actions-runner/_work/opentelemetry-collector-contrib/opentelemetry-collector-contrib/testbed/tests/results/TestMetric10kDPS/OpenCensus/agent.log
2023/10/04 21:40:23 Agent running, pid=1283643
2023/10/04 21:40:23 Starting load generator at 10000 items/sec.
2023/10/04 21:40:26 Agent RAM (RES):  32 MiB, CPU: 0.0% | Sent:    204400 items | Received:   203,700 items (67,887/sec)
2023/10/04 21:40:29 Agent RAM (RES):  76 MiB, CPU:64.3% | Sent:    413700 items | Received:   413,700 items (68,951/sec)
2023/10/04 21:40:32 Agent RAM (RES):  76 MiB, CPU:68.0% | Sent:    623779 items | Received:   623,700 items (69,302/sec)
2023/10/04 21:40:35 Agent RAM (RES):  76 MiB, CPU:69.3% | Sent:    833700 items | Received:   833,000 items (69,418/sec)
2023/10/04 21:40:38 Agent RAM (RES):  77 MiB, CPU:70.0% | Sent:   1044400 items | Received: 1,043,000 items (69,533/sec)

The dedicated hardware runs with 16 cores, compared to 2 cores for the shared runners. The question I have is: should the testbed be modified to set the number of CPUs used by the tests, or should the expected values of the tests be updated to match the new hardware?

@codeboten (Contributor, Author)

One way to set the number of CPUs is in the childProcessCollector's Start func:

// Inherit the parent environment, then cap the Go runtime's parallelism
// for the spawned collector process.
cp.cmd.Env = append(os.Environ(),
	"GOMAXPROCS=2",
)
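
For illustration, a self-contained sketch of that approach; the collector binary and config paths here are placeholders, and only the `GOMAXPROCS=2` line reflects the snippet above:

```go
package main

import (
	"log"
	"os"
	"os/exec"
)

func main() {
	// Binary and config paths are placeholders for whatever the testbed launches.
	cmd := exec.Command("./otelcol-contrib", "--config", "config.yaml")
	// Inherit the parent environment, then cap the Go runtime's parallelism
	// so CPU numbers stay comparable between 2-core and 16-core runners.
	cmd.Env = append(os.Environ(), "GOMAXPROCS=2")
	if err := cmd.Start(); err != nil {
		log.Fatal(err)
	}
	if err := cmd.Wait(); err != nil {
		log.Printf("collector exited: %v", err)
	}
}
```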

@codeboten (Contributor, Author)

FWIW, the change in performance numbers will impact the reported benchmarks:

[Screenshot: reported benchmark results, 2023-10-04]

@djaglowski (Member)

Setting the number of cores seems like a good idea. Can a runner run multiple jobs simultaneously? If so, cores may end up being restricted inconsistently depending on the number of jobs running.

@codeboten (Contributor, Author)

> Can a runner run multiple jobs simultaneously?

As far as I can tell, the runners currently run only one job at a time, but you're right that it could change in the future.

codeboten pushed a commit that referenced this issue Oct 5, 2023
This will result in more consistent benchmarks across different
environments.

Fixes #27429

Signed-off-by: Alex Boten <aboten@lightstep.com>
dmitryax pushed a commit that referenced this issue Jan 16, 2024
**Description:**
Adding a feature - These changes add a new `WithEnvVar` `ChildProcessOption` to allow influencing the child process environment without acting on the current environment. They also move the `GOMAXPROCS=2` setting added in dd8e010 into each invoking test: while helpful for addressing #27429, the constraint doesn't seem applicable to the helper itself. Limiting the utility as a whole, rather than specifying the constraint in each test context, cannot easily be worked around and interferes with some load-testing efforts.
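
For readers following along, here is a hedged sketch of what such an option could look like using Go's functional-options pattern; only the `WithEnvVar` and `ChildProcessOption` names come from the PR description, everything else is assumed:

```go
package testbed

import "fmt"

// childProcessCollector is a simplified stand-in for the testbed's type;
// the field layout is an assumption for this sketch.
type childProcessCollector struct {
	additionalEnv map[string]string
}

// ChildProcessOption customizes a child process collector before it starts.
type ChildProcessOption func(*childProcessCollector)

// WithEnvVar sets an environment variable for the child process only,
// leaving the parent test process's environment untouched.
func WithEnvVar(key, value string) ChildProcessOption {
	return func(cp *childProcessCollector) {
		if cp.additionalEnv == nil {
			cp.additionalEnv = map[string]string{}
		}
		cp.additionalEnv[key] = value
	}
}

// envVars renders the extra variables in KEY=VALUE form for exec.Cmd.Env.
func (cp *childProcessCollector) envVars() []string {
	vars := make([]string, 0, len(cp.additionalEnv))
	for k, v := range cp.additionalEnv {
		vars = append(vars, fmt.Sprintf("%s=%s", k, v))
	}
	return vars
}
```

A test that needs the cap would then opt in explicitly, e.g. something like `NewChildProcessCollector(WithEnvVar("GOMAXPROCS", "2"))`, rather than every consumer of the helper inheriting the limit.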