
Performance evaluation (and possible optimization?) #41

Closed
mariomac opened this issue Mar 30, 2023 · 5 comments

@mariomac
Contributor

In previous projects using a similar architecture, 200K events/second consumed 0.5 CPUs. The impact was caused mostly by the userspace process having to wake up on each message in the ringbuffer. In the case of the linked project, moving from a ringbuffer to a hashmap allowed aggregating millions of events/second with 0.1% CPU.
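As a rough illustration of that hashmap approach (a sketch only, not code from that project or from this repo; the key/value layout and names are made up): the BPF program accumulates per-key counters in a hash map, and userspace scrapes the map on a timer instead of waking up for every event.

```c
// Illustrative sketch: aggregate request metrics in a BPF hash map so
// userspace can read them periodically instead of waking up per event.
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

struct agg_key {                   // hypothetical aggregation key
    __u32 pid;
    __u16 status;
};

struct agg_val {                   // hypothetical aggregated value
    __u64 count;
    __u64 total_duration_ns;
};

struct {
    __uint(type, BPF_MAP_TYPE_HASH);
    __uint(max_entries, 1 << 16);
    __type(key, struct agg_key);
    __type(value, struct agg_val);
} http_aggregates SEC(".maps");

static __always_inline void record_request(struct agg_key *key, __u64 duration_ns)
{
    struct agg_val *val = bpf_map_lookup_elem(&http_aggregates, key);
    if (val) {
        // Existing entry: bump the counters atomically.
        __sync_fetch_and_add(&val->count, 1);
        __sync_fetch_and_add(&val->total_duration_ns, duration_ns);
        return;
    }
    // First event for this key: create the entry.
    struct agg_val init = { .count = 1, .total_duration_ns = duration_ns };
    bpf_map_update_elem(&http_aggregates, key, &init, BPF_NOEXIST);
}
```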

In the case of HTTP/gRPC servers, it seems unlikely that a single host will need to process so many events/second, so our current implementation should be fine. However, we could create performance tests to see how many resources we consume in typical high-load scenarios.

If we decide to optimize the performance, I would recommend addressing the userspace wakeups on each HTTP request message.

We can configure the ringbuffer to accumulate messages and send them to user space in batches of X (user-configurable). If, after a user-configured timeout, the batch hasn't reached size X, we send it anyway. This would decrease the number of userspace wakeups by orders of magnitude.
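A rough sketch of what the BPF side of that could look like (assumptions only: libbpf-style ringbuffer helpers, and a hypothetical batch size, counter and payload; the timeout-based flush of partial batches would live in the userspace reader):

```c
// Sketch only: wake userspace once per batch instead of once per event.
// WAKEUP_BATCH would come from user configuration; a userspace timeout would
// still drain partial batches.
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

#define WAKEUP_BATCH 64            // hypothetical, user-configurable

struct http_event {                // illustrative payload
    __u64 start_ns;
    __u64 end_ns;
    __u16 status;
};

struct {
    __uint(type, BPF_MAP_TYPE_RINGBUF);
    __uint(max_entries, 1 << 20);
} events SEC(".maps");

static __u64 submitted;            // global counter (BPF global data)

static __always_inline void submit_http_event(const struct http_event *src)
{
    struct http_event *e = bpf_ringbuf_reserve(&events, sizeof(*e), 0);
    if (!e)
        return;                    // buffer full: drop the event
    __builtin_memcpy(e, src, sizeof(*e));

    // Suppress the userspace notification unless this record completes a batch.
    __u64 n = __sync_fetch_and_add(&submitted, 1);
    if ((n + 1) % WAKEUP_BATCH == 0)
        bpf_ringbuf_submit(e, BPF_RB_FORCE_WAKEUP);
    else
        bpf_ringbuf_submit(e, BPF_RB_NO_WAKEUP);
}
```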

@grcevski
Contributor

I did a couple of benchmark runs in HTTP mode, using our wrk2-based driver on top of the simple pingserver we have in our tests. I didn't add any delay in the API requests, so this is pretty much the worst-case scenario for instrumentation overhead.

Here are the results for 10,000 QPS (which is very unrealistic for a single service):

Environment

Linux kernel version 5.19.0-38
12th Gen Intel(R) Core(TM) i7-1280P, 2 GHz, 6 performance cores, 8 low-power cores, running without CPU throttling
64 GB total system memory

Notes

Latency varies a bit from run to run, so the latency histograms are on par across runs.

Base

server 85% CPU utilization

Latency Distribution (HdrHistogram - Recorded Latency)
 50.000%  609.00us
 75.000%    0.88ms
 90.000%    1.04ms
 99.000%    1.16ms
 99.900%    1.24ms
 99.990%    1.74ms
 99.999%    4.63ms
100.000%    6.49ms

NOOP tracing, just collecting and ignoring the events

otelhttp 15%
server from 85% to 95%

Latency Distribution (HdrHistogram - Recorded Latency)
 50.000%  632.00us
 75.000%    0.89ms
 90.000%    1.05ms
 99.000%    1.17ms
 99.900%    1.25ms
 99.990%    1.40ms
 99.999%    2.26ms
100.000%    2.63ms

Full OTel HTTP with local Mimir, Tempo, Agent

server from 85% to 95%
otelhttp 95%

Latency Distribution (HdrHistogram - Recorded Latency)
 50.000%  603.00us
 75.000%    0.86ms
 90.000%    1.01ms
 99.000%    1.12ms
 99.900%    1.63ms
 99.990%    2.11ms
 99.999%    2.46ms
100.000%    3.07ms 

@mariomac
Contributor Author

@grcevski do you have the stress scripts somewhere?

@mariomac
Contributor Author

@grcevski would it be possible to repeat your experiments with the version in the following PR?

#61

@grcevski
Contributor

> @grcevski do you have the stress scripts somewhere?

Yes, I committed the script in test/cmd/pingserver/driver.sh. I use wrk2 to create a constant load, regardless of whether the service can keep up. I slightly modified wrk2 locally to close and re-establish the connection each time, thinking it's probably closer to what happens in the real world.

@grcevski
Contributor

> @grcevski would it be possible to repeat your experiments with the version in the following PR?
>
> #61

Definitely, I'll get these runs done today!
